Showing 1 to 15 of 48 results
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
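To make the sampling distinction in this abstract concrete, here is a minimal simulation sketch (the 3x3 table, probabilities, and counts are all invented): in a trend-scoring table the Time A row totals are fixed by design, so each row is an independent multinomial draw, whereas a full multinomial over all cells makes the row margins random as well.

```python
# Contrast product-multinomial vs. multinomial sampling for a rescoring table.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditionals: row = Time A score category, column = Time B rescore.
cond = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.8]])
row_totals = np.array([50, 30, 20])  # fixed by the rescoring design

# Product multinomial: each row is an independent multinomial draw,
# so the row margins are constants, not random variables.
table_pm = np.vstack([rng.multinomial(n, p) for n, p in zip(row_totals, cond)])

# Ordinary multinomial: one draw over all 9 cells; row margins are now random,
# which changes the sampling variance of the usual agreement statistics.
joint = (row_totals / row_totals.sum())[:, None] * cond
table_m = rng.multinomial(row_totals.sum(), joint.ravel()).reshape(3, 3)

print(table_pm.sum(axis=1))  # always [50 30 20]
print(table_m.sum(axis=1))   # varies from draw to draw
```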
Peer reviewed
Direct link
Brian E. Clauser; Victoria Yaneva; Peter Baldwin; Le An Ha; Janet Mee – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, there remains continued interest in constructed-response formats. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
Peer reviewed
Direct link
Rios, Joseph – Applied Measurement in Education, 2022
To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and…
Descriptors: Accuracy, Guessing (Tests), Scoring, Classification
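A toy sketch of the kind of threshold-based rapid-guessing rescoring the abstract refers to, assuming a simple response-time cutoff; the cutoff, the data, and the choice to treat flagged responses as missing are all illustrative, not the paper's procedures.

```python
# Flag responses with very short response times as rapid guesses (RG),
# then rescore with flagged responses treated as missing rather than wrong.
import numpy as np

rng = np.random.default_rng(1)
n_items = 40
correct = rng.integers(0, 2, n_items)   # scored 0/1 responses
rt = rng.exponential(20.0, n_items)     # response times in seconds

THRESHOLD = 3.0                         # illustrative RG cutoff
flagged_rg = rt < THRESHOLD

raw_score = correct.mean()
rescored = correct[~flagged_rg].mean()  # RG responses excluded from scoring

print(f"proportion flagged: {flagged_rg.mean():.2f}")
print(f"raw score {raw_score:.2f} vs rescored {rescored:.2f}")
```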
Peer reviewed
Direct link
Steven L. Wise; G. Gage Kingsbury; Meredith L. Langi – Applied Measurement in Education, 2023
Recent research has provided evidence that performance change during a student's test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance,…
Descriptors: Tests, Weighted Scores, Test Wiseness, Scoring
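One crude way to operationalize within-test performance change, sketched with simulated data; the sliding-window accuracy contrast below is an assumption for illustration, not the authors' statistic.

```python
# Track accuracy across item position to spot performance decline late in a test.
import numpy as np

rng = np.random.default_rng(2)
resp = rng.integers(0, 2, 60)  # 0/1 item scores in administration order

window = 10                    # sliding window of 10 items
rolling = np.convolve(resp, np.ones(window) / window, mode="valid")
print(f"max window accuracy {rolling.max():.2f}, "
      f"min window accuracy {rolling.min():.2f}, "
      f"late-minus-early change {rolling[-1] - rolling[0]:+.2f}")
```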
Peer reviewed
Direct link
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
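A minimal sketch of motivation filtering via response-time effort (RTE), the proportion of an examinee's responses that are not rapid guesses; the 3-second flag and the 0.90 retention cutoff are illustrative values from the response-time literature, not necessarily the filtering methods compared in this study.

```python
# Remove examinees whose response-time effort falls below a cutoff before scoring.
import numpy as np

rng = np.random.default_rng(3)
n_examinees, n_items = 200, 40
rts = rng.exponential(15.0, (n_examinees, n_items))

rapid = rts < 3.0               # rapid-guess flag per response
rte = 1.0 - rapid.mean(axis=1)  # response-time effort per examinee

keep = rte >= 0.90              # illustrative filtering rule
print(f"retained {keep.sum()} of {n_examinees} examinees")
```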
Peer reviewed
Direct link
Wise, Steven; Kuhfeld, Megan – Applied Measurement in Education, 2021
Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not…
Descriptors: Scoring, Achievement Tests, Identification, Validity
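The abstract itself describes the E-M adjustment: drop disengaged responses and estimate ability from the rest. A bare-bones sketch under a Rasch model, with invented item difficulties, responses, and engagement flags:

```python
# Effort-moderated scoring: maximum-likelihood theta from engaged items only.
import numpy as np
from scipy.optimize import minimize_scalar

b = np.linspace(-2, 2, 30)              # known item difficulties (illustrative)
rng = np.random.default_rng(4)
x = rng.integers(0, 2, 30)              # one examinee's 0/1 responses
engaged = rng.random(30) > 0.2          # True = solution behavior

def neg_loglik(theta, resp, diff):
    p = 1.0 / (1.0 + np.exp(-(theta - diff)))
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

# Standard scoring uses all items; E-M scoring uses engaged items only.
theta_all = minimize_scalar(neg_loglik, args=(x, b),
                            bounds=(-4, 4), method="bounded").x
theta_em = minimize_scalar(neg_loglik, args=(x[engaged], b[engaged]),
                           bounds=(-4, 4), method="bounded").x
print(f"theta (all items) {theta_all:.2f}, theta (E-M) {theta_em:.2f}")
```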
Peer reviewed
Direct link
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine if setting scoring rate expectations would encourage raters to score at a faster pace, and if so, if there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Peer reviewed
Direct link
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
Peer reviewed
Direct link
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores.…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
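For reference, a compact sketch of the classic Hofstee compromise (not the paper's specific variations): the cut score is taken where the line through (k_min, f_max) and (k_max, f_min) crosses the observed fail-rate curve. All bounds and score data below are made up.

```python
# Hofstee compromise: intersect the panel's bounding line with the fail-rate curve.
import numpy as np

rng = np.random.default_rng(5)
scores = rng.normal(70, 10, 5000)     # observed test scores (simulated)

k_min, k_max = 55.0, 75.0             # panel cut-score bounds
f_min, f_max = 0.05, 0.40             # panel fail-rate bounds

cuts = np.linspace(k_min, k_max, 201)
fail_rate = np.array([(scores < c).mean() for c in cuts])

# Line from (k_min, f_max) down to (k_max, f_min).
line = f_max + (f_min - f_max) * (cuts - k_min) / (k_max - k_min)

cut = cuts[np.argmin(np.abs(fail_rate - line))]
print(f"Hofstee cut score ~ {cut:.1f}, fail rate {(scores < cut).mean():.2%}")
```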
Peer reviewed
Direct link
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
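A small illustration of two of the scoring methods named above, number-correct (sum) scoring and IRT theta scoring (here EAP under a Rasch model with invented difficulties); the hybrid method is omitted.

```python
# Number-correct score vs. EAP theta for one response pattern.
import numpy as np

b = np.linspace(-1.5, 1.5, 20)          # illustrative item difficulties
resp = np.array([1] * 12 + [0] * 8)     # one examinee's 0/1 responses

sum_score = resp.sum()                  # number-correct score

# EAP theta: posterior mean over a quadrature grid with a N(0,1) prior.
grid = np.linspace(-4, 4, 81)
p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
lik = np.prod(np.where(resp, p, 1 - p), axis=1)
post = lik * np.exp(-grid**2 / 2)
theta_eap = (grid * post).sum() / post.sum()

print(f"number-correct {sum_score}, EAP theta {theta_eap:.2f}")
```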
Peer reviewed
Direct link
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
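A sketch of the kind of persons-by-raters generalizability analysis the abstract points to: variance components from expected mean squares in a crossed p x r design, then a relative G coefficient. The data are simulated and the study's actual design may differ.

```python
# G-theory for a crossed persons-by-raters design via expected mean squares.
import numpy as np

rng = np.random.default_rng(6)
n_p, n_r = 100, 4
true = rng.normal(0, 1.0, (n_p, 1))      # person effects
rater = rng.normal(0, 0.3, (1, n_r))     # rater severity effects
scores = true + rater + rng.normal(0, 0.8, (n_p, n_r))

grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

var_pr = ms_pr                           # residual (p x r interaction + error)
var_p = (ms_p - ms_pr) / n_r             # person (universe score) variance
g_coef = var_p / (var_p + var_pr / n_r)  # relative G for n_r raters
print(f"G coefficient ({n_r} raters): {g_coef:.2f}")
```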
Peer reviewed
Direct link
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient alpha) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
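KR-21 needs only the number of items k, the mean M, and the variance s^2 of the summed scores: rho = (k/(k-1)) * (1 - M(k-M)/(k s^2)). A quick computational sketch on simulated dichotomous-item sum scores:

```python
# KR-21 reliability from summed scores alone.
import numpy as np

rng = np.random.default_rng(7)
k = 30
scores = rng.binomial(k, rng.beta(6, 4, 500))  # simulated summed scores

m, s2 = scores.mean(), scores.var(ddof=1)
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * s2))
print(f"KR-21 = {kr21:.3f}")
```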
Peer reviewed
Direct link
Carney, Michele; Crawford, Angela; Siebert, Carl; Osguthorpe, Rich; Thiede, Keith – Applied Measurement in Education, 2019
The "Standards for Educational and Psychological Testing" recommend an argument-based approach to validation that involves a clear statement of the intended interpretation and use of test scores, the identification of the underlying assumptions and inferences in that statement--termed the interpretation/use argument, and gathering of…
Descriptors: Inquiry, Test Interpretation, Validity, Scores
Peer reviewed
Direct link
Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020
When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes…
Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability
Peer reviewed
Direct link
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
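A schematic of a rater-specific predictive model in the spirit of the abstract: ordinary least squares predicting one rater's essay scores from linguistic attributes. The three features and the linear form are placeholder assumptions, not the study's actual predictors or models.

```python
# Predict a single rater's assigned scores from essay features (OLS).
import numpy as np

rng = np.random.default_rng(8)
n_essays = 300
# Hypothetical linguistic attributes: length, lexical diversity, error rate.
X = np.column_stack([rng.normal(500, 150, n_essays),
                     rng.uniform(0.3, 0.8, n_essays),
                     rng.exponential(0.02, n_essays)])
rater_scores = (2 + 0.004 * X[:, 0] + 2.0 * X[:, 1]
                + rng.normal(0, 0.5, n_essays))

# Fit via the normal equations (least squares with an intercept column).
Xd = np.column_stack([np.ones(n_essays), X])
beta, *_ = np.linalg.lstsq(Xd, rater_scores, rcond=None)
pred = Xd @ beta
r = np.corrcoef(pred, rater_scores)[0, 1]
print(f"predicted-vs-assigned correlation: {r:.2f}")
```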
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4