Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 7 |
Descriptor
Scoring | 16 |
Computer Assisted Testing | 8 |
Rating Scales | 4 |
Test Construction | 4 |
Test Scoring Machines | 4 |
Automation | 3 |
Comparative Analysis | 3 |
Correlation | 3 |
Higher Education | 3 |
Performance Based Assessment | 3 |
Scores | 3 |
More ▼ |
Source
Applied Measurement in… | 16 |
Author
Clauser, Brian E. | 2 |
Clyman, Stephen G. | 2 |
Attali, Yigal | 1 |
Ben-Simon, Anat | 1 |
Buzick, Heather | 1 |
Chang, Lucy | 1 |
Christine E. DeMars | 1 |
Cohen, Yoav | 1 |
Davison, Mark L. | 1 |
Dunham, Trudy C. | 1 |
El-Bayoumi, Gigi | 1 |
More ▼ |
Publication Type
Journal Articles | 16 |
Reports - Research | 8 |
Reports - Evaluative | 7 |
Collected Works - General | 1 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 2 |
Postsecondary Education | 2 |
Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Graduate Record Examinations | 1 |
What Works Clearinghouse Rating
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…
Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Lottridge, Susan; Wood, Scott; Shaw, Dan – Applied Measurement in Education, 2018
This study sought to provide a framework for evaluating machine score-ability of items using a new score-ability rating scale, and to determine the extent to which ratings were predictive of observed automated scoring performance. The study listed and described a set of factors that are thought to influence machine score-ability; these factors…
Descriptors: Program Effectiveness, Computer Assisted Testing, Test Scoring Machines, Scoring
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

Mills, Craig N.; Stocking, Martha L. – Applied Measurement in Education, 1996
Issues that must be addressed in the large-scale application of computerized adaptive testing are explored, including considerations of test design, scoring, test administration, item and item bank development, and other aspects of test construction. Possible solutions and areas in which additional work is needed are identified. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Elementary Secondary Education, Higher Education

Wise, Steven L., Ed.; And Others – Applied Measurement in Education, 1988
Six papers on the use of partial credit item response theory score models in applied measurement settings are presented. These applications include the scoring of medical certification examinations using computer-based patient simulations, narrative writing tests, and educational diagnosis. (TJH)
Descriptors: Clinical Diagnosis, Computer Assisted Testing, Computer Simulation, Educational Diagnosis

Hanson, Bradley A. – Applied Measurement in Education, 1996
Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests using loglinear models. Examples are presented of applying tests of distribution differences to decide if equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
McCarty, F. A.; Oshima, T. C.; Raju, Nambury S. – Applied Measurement in Education, 2007
Oshima, Raju, Flowers, and Slinde (1998) described procedures for identifying sources of differential functioning for dichotomous data using differential bundle functioning (DBF) derived from the differential functioning of items and test (DFIT) framework (Raju, van der Linden, & Fleer, 1995). The purpose of this study was to extend the…
Descriptors: Rating Scales, Test Bias, Scoring, Test Items

Clauser, Brian E.; Swanson, David B.; Clyman, Stephen G. – Applied Measurement in Education, 1999
Performed generalizability analyses of expert ratings and computer-produced scores for a computer-delivered performance assessment of physicians' patient management skills. The two automated scoring systems produced scores for the 200 medical students that were approximately as generalizable as those produced by the four expert raters. (SLD)
Descriptors: Comparative Analysis, Computer Assisted Testing, Generalizability Theory, Higher Education

Dunham, Trudy C.; Davison, Mark L. – Applied Measurement in Education, 1990
The effects of packing or skewing the response options of a scale on the common measurement problems of leniency and range restriction in instructor ratings were assessed. Results from a sample of 130 undergraduate education students indicate that packing reduced leniency but had no effect on range restriction. (TJH)
Descriptors: Education Majors, Higher Education, Professors, Rating Scales

Hardy, Roy A. – Applied Measurement in Education, 1995
Cost factors associated with the development, administration, and scoring of performance assessment tasks are examined in the context of a statewide or other large-scale assessment program. Resources of money, time, and expertise are discussed. (SLD)
Descriptors: Cost Estimates, Costs, Educational Assessment, Estimation (Mathematics)

Klein, Stephen P.; And Others – Applied Measurement in Education, 1995
Portfolios are the centerpiece of Vermont's statewide assessment program in mathematics. Portfolio scores in the first two years were not reliable enough to permit the reporting of student-level results, but increasing the number of readers or the number of portfolio pieces is not operationally feasible. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Mathematics Tests, Performance Based Assessment

Clauser, Brian E.; Ross, Linette P.; Clyman, Stephen G.; Rose, Kathie M.; Margolis, Melissa J.; Nungester, Ronald J.; Piemme, Thomas E.; Chang, Lucy; El-Bayoumi, Gigi; Malakoff, Gary L.; Pincetl, Pierre S. – Applied Measurement in Education, 1997
Describes an automated scoring algorithm for a computer-based simulation examination of physicians' patient-management skills. Results with 280 medical students show that scores produced using this algorithm are highly correlated to actual clinician ratings. Scores were also effective in discriminating between case performance judged passing or…
Descriptors: Algorithms, Computer Assisted Testing, Computer Simulation, Evaluators
Previous Page | Next Page »
Pages: 1 | 2