Showing 1 to 15 of 48 results
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
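To make the sampling distinction in this abstract concrete, here is a minimal simulation sketch (the 3x3 table, probabilities, and counts are all invented): in a trend-scoring table the Time A row totals are fixed by design, so each row is an independent multinomial draw, whereas a full multinomial over all cells makes the row margins random as well.

```python
# Contrast product-multinomial vs. multinomial sampling for a rescoring table.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditionals: row = Time A score category, column = Time B rescore.
cond = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.8]])
row_totals = np.array([50, 30, 20])  # fixed by the rescoring design

# Product multinomial: each row is an independent multinomial draw,
# so the row margins are constants, not random variables.
table_pm = np.vstack([rng.multinomial(n, p) for n, p in zip(row_totals, cond)])

# Ordinary multinomial: one draw over all 9 cells; row margins are now random,
# which changes the sampling variance of the usual agreement statistics.
joint = (row_totals / row_totals.sum())[:, None] * cond
table_m = rng.multinomial(row_totals.sum(), joint.ravel()).reshape(3, 3)

print(table_pm.sum(axis=1))  # always [50 30 20]
print(table_m.sum(axis=1))   # varies from draw to draw
```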
Peer reviewed
Direct link
Brian E. Clauser; Victoria Yaneva; Peter Baldwin; Le An Ha; Janet Mee – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, there remains continued interest in constructed-response formats. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
Peer reviewed
Direct link
Rios, Joseph – Applied Measurement in Education, 2022
To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and…
Descriptors: Accuracy, Guessing (Tests), Scoring, Classification
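A toy sketch of the kind of threshold-based rapid-guessing rescoring the abstract refers to, assuming a simple response-time cutoff; the cutoff, the data, and the choice to treat flagged responses as missing are all illustrative, not the paper's procedures.

```python
# Flag responses with very short response times as rapid guesses (RG),
# then rescore with flagged responses treated as missing rather than wrong.
import numpy as np

rng = np.random.default_rng(1)
n_items = 40
correct = rng.integers(0, 2, n_items)   # scored 0/1 responses
rt = rng.exponential(20.0, n_items)     # response times in seconds

THRESHOLD = 3.0                         # illustrative RG cutoff
flagged_rg = rt < THRESHOLD

raw_score = correct.mean()
rescored = correct[~flagged_rg].mean()  # RG responses excluded from scoring

print(f"proportion flagged: {flagged_rg.mean():.2f}")
print(f"raw score {raw_score:.2f} vs rescored {rescored:.2f}")
```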
Peer reviewed
Direct link
Steven L. Wise; G. Gage Kingsbury; Meredith L. Langi – Applied Measurement in Education, 2023
Recent research has provided evidence that performance change during a student's test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance,…
Descriptors: Tests, Weighted Scores, Test Wiseness, Scoring
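One crude way to operationalize within-test performance change, sketched with simulated data; the sliding-window accuracy contrast below is an assumption for illustration, not the authors' statistic.

```python
# Track accuracy across item position to spot performance decline late in a test.
import numpy as np

rng = np.random.default_rng(2)
resp = rng.integers(0, 2, 60)  # 0/1 item scores in administration order

window = 10                    # sliding window of 10 items
rolling = np.convolve(resp, np.ones(window) / window, mode="valid")
print(f"max window accuracy {rolling.max():.2f}, "
      f"min window accuracy {rolling.min():.2f}, "
      f"late-minus-early change {rolling[-1] - rolling[0]:+.2f}")
```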
Peer reviewed
Direct link
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
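A minimal sketch of motivation filtering via response-time effort (RTE), the proportion of an examinee's responses that are not rapid guesses; the 3-second flag and the 0.90 retention cutoff are illustrative values from the response-time literature, not necessarily the filtering methods compared in this study.

```python
# Remove examinees whose response-time effort falls below a cutoff before scoring.
import numpy as np

rng = np.random.default_rng(3)
n_examinees, n_items = 200, 40
rts = rng.exponential(15.0, (n_examinees, n_items))

rapid = rts < 3.0               # rapid-guess flag per response
rte = 1.0 - rapid.mean(axis=1)  # response-time effort per examinee

keep = rte >= 0.90              # illustrative filtering rule
print(f"retained {keep.sum()} of {n_examinees} examinees")
```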
Peer reviewed
Direct link
Wise, Steven; Kuhfeld, Megan – Applied Measurement in Education, 2021
Effort-moderated (E-M) scoring is intended to estimate how well a disengaged test taker would have performed had they been fully engaged. It accomplishes this adjustment by excluding disengaged responses from scoring and estimating performance from the remaining responses. The scoring method, however, assumes that the remaining responses are not…
Descriptors: Scoring, Achievement Tests, Identification, Validity
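The abstract itself describes the E-M adjustment: drop disengaged responses and estimate ability from the rest. A bare-bones sketch under a Rasch model, with invented item difficulties, responses, and engagement flags:

```python
# Effort-moderated scoring: maximum-likelihood theta from engaged items only.
import numpy as np
from scipy.optimize import minimize_scalar

b = np.linspace(-2, 2, 30)              # known item difficulties (illustrative)
rng = np.random.default_rng(4)
x = rng.integers(0, 2, 30)              # one examinee's 0/1 responses
engaged = rng.random(30) > 0.2          # True = solution behavior

def neg_loglik(theta, resp, diff):
    p = 1.0 / (1.0 + np.exp(-(theta - diff)))
    return -np.sum(resp * np.log(p) + (1 - resp) * np.log(1 - p))

# Standard scoring uses all items; E-M scoring uses engaged items only.
theta_all = minimize_scalar(neg_loglik, args=(x, b),
                            bounds=(-4, 4), method="bounded").x
theta_em = minimize_scalar(neg_loglik, args=(x[engaged], b[engaged]),
                           bounds=(-4, 4), method="bounded").x
print(f"theta (all items) {theta_all:.2f}, theta (E-M) {theta_em:.2f}")
```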
Peer reviewed
Direct link
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine if setting scoring rate expectations would encourage raters to score at a faster pace, and if so, if there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Peer reviewed
Direct link
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
Peer reviewed
Direct link
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores.…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
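For reference, a compact sketch of the classic Hofstee compromise (not the paper's specific variations): the cut score is taken where the line through (k_min, f_max) and (k_max, f_min) crosses the observed fail-rate curve. All bounds and score data below are made up.

```python
# Hofstee compromise: intersect the panel's bounding line with the fail-rate curve.
import numpy as np

rng = np.random.default_rng(5)
scores = rng.normal(70, 10, 5000)     # observed test scores (simulated)

k_min, k_max = 55.0, 75.0             # panel cut-score bounds
f_min, f_max = 0.05, 0.40             # panel fail-rate bounds

cuts = np.linspace(k_min, k_max, 201)
fail_rate = np.array([(scores < c).mean() for c in cuts])

# Line from (k_min, f_max) down to (k_max, f_min).
line = f_max + (f_min - f_max) * (cuts - k_min) / (k_max - k_min)

cut = cuts[np.argmin(np.abs(fail_rate - line))]
print(f"Hofstee cut score ~ {cut:.1f}, fail rate {(scores < cut).mean():.2%}")
```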
Peer reviewed
Direct link
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
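A small illustration of two of the scoring methods named above, number-correct (sum) scoring and IRT theta scoring (here EAP under a Rasch model with invented difficulties); the hybrid method is omitted.

```python
# Number-correct score vs. EAP theta for one response pattern.
import numpy as np

b = np.linspace(-1.5, 1.5, 20)          # illustrative item difficulties
resp = np.array([1] * 12 + [0] * 8)     # one examinee's 0/1 responses

sum_score = resp.sum()                  # number-correct score

# EAP theta: posterior mean over a quadrature grid with a N(0,1) prior.
grid = np.linspace(-4, 4, 81)
p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
lik = np.prod(np.where(resp, p, 1 - p), axis=1)
post = lik * np.exp(-grid**2 / 2)
theta_eap = (grid * post).sum() / post.sum()

print(f"number-correct {sum_score}, EAP theta {theta_eap:.2f}")
```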
Peer reviewed
Direct link
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
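A sketch of the kind of persons-by-raters generalizability analysis the abstract points to: variance components from expected mean squares in a crossed p x r design, then a relative G coefficient. The data are simulated and the study's actual design may differ.

```python
# G-theory for a crossed persons-by-raters design via expected mean squares.
import numpy as np

rng = np.random.default_rng(6)
n_p, n_r = 100, 4
true = rng.normal(0, 1.0, (n_p, 1))      # person effects
rater = rng.normal(0, 0.3, (1, n_r))     # rater severity effects
scores = true + rater + rng.normal(0, 0.8, (n_p, n_r))

grand = scores.mean()
ms_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

var_pr = ms_pr                           # residual (p x r interaction + error)
var_p = (ms_p - ms_pr) / n_r             # person (universe score) variance
g_coef = var_p / (var_p + var_pr / n_r)  # relative G for n_r raters
print(f"G coefficient ({n_r} raters): {g_coef:.2f}")
```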
Peer reviewed
Direct link
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension (coefficient alpha) give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 reliability gives the reliability estimate for summed scores for dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
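KR-21 needs only the number of items k, the mean M, and the variance s^2 of the summed scores: rho = (k/(k-1)) * (1 - M(k-M)/(k s^2)). A quick computational sketch on simulated dichotomous-item sum scores:

```python
# KR-21 reliability from summed scores alone.
import numpy as np

rng = np.random.default_rng(7)
k = 30
scores = rng.binomial(k, rng.beta(6, 4, 500))  # simulated summed scores

m, s2 = scores.mean(), scores.var(ddof=1)
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * s2))
print(f"KR-21 = {kr21:.3f}")
```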
Peer reviewed
Direct link
Carney, Michele; Crawford, Angela; Siebert, Carl; Osguthorpe, Rich; Thiede, Keith – Applied Measurement in Education, 2019
The "Standards for Educational and Psychological Testing" recommend an argument-based approach to validation that involves a clear statement of the intended interpretation and use of test scores, the identification of the underlying assumptions and inferences in that statement--termed the interpretation/use argument, and gathering of…
Descriptors: Inquiry, Test Interpretation, Validity, Scores
Peer reviewed
Direct link
Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020
When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes…
Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability
Peer reviewed
Direct link
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
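A schematic of a rater-specific predictive model in the spirit of the abstract: ordinary least squares predicting one rater's essay scores from linguistic attributes. The three features and the linear form are placeholder assumptions, not the study's actual predictors or models.

```python
# Predict a single rater's assigned scores from essay features (OLS).
import numpy as np

rng = np.random.default_rng(8)
n_essays = 300
# Hypothetical linguistic attributes: length, lexical diversity, error rate.
X = np.column_stack([rng.normal(500, 150, n_essays),
                     rng.uniform(0.3, 0.8, n_essays),
                     rng.exponential(0.02, n_essays)])
rater_scores = (2 + 0.004 * X[:, 0] + 2.0 * X[:, 1]
                + rng.normal(0, 0.5, n_essays))

# Fit via the normal equations (least squares with an intercept column).
Xd = np.column_stack([np.ones(n_essays), X])
beta, *_ = np.linalg.lstsq(Xd, rater_scores, rcond=None)
pred = Xd @ beta
r = np.corrcoef(pred, rater_scores)[0, 1]
print(f"predicted-vs-assigned correlation: {r:.2f}")
```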
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4