Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 10 |
Since 2006 (last 20 years) | 11 |
Descriptor
Evaluators | 27 |
Scoring | 10 |
Standard Setting (Scoring) | 10 |
Evaluation Methods | 9 |
Performance Based Assessment | 7 |
Standards | 7 |
Decision Making | 6 |
Essay Tests | 6 |
Interrater Reliability | 6 |
Accuracy | 5 |
Educational Assessment | 5 |
Source
Applied Measurement in Education | 27 |
Author
Glazer, Nancy | 2 |
Jaeger, Richard M. | 2 |
Plake, Barbara S. | 2 |
Wolfe, Edward W. | 2 |
Arslan, Burcu | 1 |
Attali, Yigal | 1 |
Bejar, Isaac I. | 1 |
Bennett, Randy Elliot | 1 |
Berk, Ronald A. | 1 |
Bridgeman, Brent | 1 |
Buzick, Heather | 1 |
Publication Type
Journal Articles | 27 |
Reports - Research | 16 |
Reports - Evaluative | 9 |
Information Analyses | 4 |
Opinion Papers | 1 |
Reports - Descriptive | 1 |
Education Level
Higher Education | 3 |
Postsecondary Education | 3 |
Grade 7 | 1 |
Assessments and Surveys
Graduate Record Examinations | 2 |
Test of English as a Foreign Language | 1 |
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine whether setting scoring rate expectations would encourage raters to score at a faster pace and, if so, whether there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring process, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
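One monitoring design commonly used in operational scoring, and consistent with what this introduction describes, is to seed pre-scored validity responses into raters' queues and track each rater's agreement with the official scores. The sketch below only illustrates that idea; the flagging threshold, data layout, and function name are assumptions, not the authors' procedure.

```python
# Illustrative rater-monitoring check: compare each rater's scores on
# pre-scored "validity" responses with the official scores and flag
# raters whose exact-agreement rate drops below a chosen threshold.
# The 0.70 threshold and data layout are assumptions for this sketch.
from collections import defaultdict

def flag_raters(validity_scores, ratings, threshold=0.70):
    """validity_scores: {response_id: official_score}
    ratings: list of (rater_id, response_id, assigned_score)"""
    hits = defaultdict(lambda: [0, 0])          # rater -> [agreements, attempts]
    for rater, resp, score in ratings:
        if resp in validity_scores:
            hits[rater][1] += 1
            hits[rater][0] += int(score == validity_scores[resp])
    return {r: a / n for r, (a, n) in hits.items() if n and a / n < threshold}

# Example: rater "R2" misses both seeded responses and is flagged.
truth = {"v1": 3, "v2": 4}
log = [("R1", "v1", 3), ("R1", "v2", 4), ("R2", "v1", 2), ("R2", "v2", 3)]
print(flag_raters(truth, log))   # {'R2': 0.0}
```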
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
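For context on the model named above, the generalized partial credit model (GPCM) expresses the probability of each score category as a function of examinee proficiency, item discrimination, and step difficulties. A minimal numerical sketch, with made-up item parameters:

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.
    theta: proficiency; a: item discrimination; b: step difficulties b_1..b_m."""
    # Numerators are exp of the cumulative sums a*(theta - b_v); category 0 uses 0.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    num = np.exp(steps - steps.max())        # subtract max for numerical stability
    return num / num.sum()

# Illustrative parameters for a 0-3 point item (all values made up).
print(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]))
```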
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
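The rater-specific modeling idea can be pictured as regressing one rater's assigned scores on linguistic attributes of the essays. The sketch below assumes invented features and ordinary least squares; the study's actual predictors and estimation method are not reproduced here.

```python
import numpy as np

# Hypothetical feature matrix: one row per essay scored by ONE rater,
# with columns such as log word count, type-token ratio, error rate.
X = np.array([[5.8, 0.52, 0.03],
              [6.4, 0.61, 0.01],
              [5.1, 0.44, 0.07],
              [6.9, 0.66, 0.02]])
y = np.array([3, 4, 2, 5])          # scores this rater actually assigned

# Fit a rater-specific linear model by least squares, then predict the
# score this rater would likely assign to a new essay.
Xb = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
new_essay = np.array([1.0, 6.0, 0.58, 0.02])   # leading 1.0 is the intercept term
print(float(new_essay @ coef))      # predicted score for this rater
```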
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experience or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Finn, Bridgid; Arslan, Burcu; Walsh, Matthew – Applied Measurement in Education, 2020
To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and scoring criteria. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays…
Descriptors: Epistemology, Essay Tests, Evaluators, Cognitive Processes
Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018
The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…
Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators
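When human-machine agreement is appraised, quadratic weighted kappa and exact agreement are the customary summaries. A brief sketch with invented score vectors:

```python
from sklearn.metrics import cohen_kappa_score
import numpy as np

# Invented parallel scores on the same essays.
human   = np.array([2, 3, 3, 4, 5, 1, 4, 3])
machine = np.array([2, 3, 4, 4, 4, 1, 4, 2])

exact = np.mean(human == machine)                      # exact agreement rate
qwk = cohen_kappa_score(human, machine, weights="quadratic")
print(f"exact agreement = {exact:.2f}, quadratic weighted kappa = {qwk:.2f}")
```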
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects commonly observed in Angoff ratings, where panelists tend to rate hard items as easier, and easy items as harder, than their estimated item difficulties indicate. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
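The regression effect can be made concrete by regressing panelists' Angoff ratings on estimated item difficulties: a fitted slope well below 1 means ratings for hard and easy items are both pulled toward the middle. A small sketch with invented numbers:

```python
import numpy as np

# Estimated item difficulties (proportion correct) and the mean Angoff
# ratings panelists gave the same items, both on a 0-1 scale (invented data).
p_values      = np.array([0.25, 0.40, 0.55, 0.70, 0.85])
angoff_rating = np.array([0.45, 0.50, 0.58, 0.66, 0.72])

slope, intercept = np.polyfit(p_values, angoff_rating, deg=1)
print(f"slope = {slope:.2f}")   # well below 1: ratings regress toward the middle
```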
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
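Comparative judgment typically recovers essay quality estimates from pairwise "which is better" decisions with a Bradley-Terry-type model. The sketch below uses the standard minorization-maximization update on invented judgments; the authors' exact estimation procedure may differ.

```python
import numpy as np

# Invented pairwise judgments: (winner, loser) indices for four essays.
judgments = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3), (0, 1), (2, 1)]
n = 4

wins = np.zeros(n)
pairs = np.zeros((n, n))                 # comparison counts for each pair
for w, l in judgments:
    wins[w] += 1
    pairs[w, l] += 1
    pairs[l, w] += 1

# Bradley-Terry strengths via the MM algorithm (Hunter, 2004).
p = np.ones(n)
for _ in range(200):
    denom = np.array([sum(pairs[i, j] / (p[i] + p[j]) for j in range(n) if j != i)
                      for i in range(n)])
    p = wins / denom
    p = p / p.sum()

print(np.round(p, 3))                    # larger value = essay judged better
```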
Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…
Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring
Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…
Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests
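One way to refine the human-machine agreement model, in the spirit of this abstract, is to ask whether machine scores relate to an external criterion as strongly as human scores do. A small sketch with invented data; the criterion variable here is hypothetical:

```python
import numpy as np

# Invented data: human and machine essay scores plus an external
# criterion (e.g., a course writing grade) for the same examinees.
human     = np.array([2, 3, 3, 4, 5, 1, 4, 3, 2, 5])
machine   = np.array([2, 3, 4, 4, 4, 1, 4, 2, 3, 5])
criterion = np.array([70, 78, 75, 85, 92, 60, 88, 74, 72, 95])

r_human   = np.corrcoef(human, criterion)[0, 1]
r_machine = np.corrcoef(machine, criterion)[0, 1]
print(f"human-criterion r = {r_human:.2f}, machine-criterion r = {r_machine:.2f}")
```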

Myford, Carol M. – Applied Measurement in Education, 2002
Studied the use of descriptive graphic rating scales by 11 raters to evaluate students' work, exploring different design features. Used a Rasch-model based rating scale analysis to determine that all the continuous scales could be considered to have at least five points, and that defined midpoints did not result in higher student separation…
Descriptors: Evaluators, Rating Scales, Reliability, Test Construction

Wise, Steven L., Ed.; And Others – Applied Measurement in Education, 1988
An editorial policy statement on the nature of the new journal "Applied Measurement in Education" is provided. The journal's intent is to improve measurement practice and communication between academicians and educational measurement practitioners in the field. This journal will contain articles with clear implications for applied measurement…
Descriptors: Editorials, Educational Assessment, Educational Researchers, Evaluators

Loyd, Brenda H. – Applied Measurement in Education, 1988
The impact of item response theory (IRT) on the measurement practitioner is discussed, with a review of potential benefits. The complexity of IRT theory and procedures, and the lack of robustness of IRT procedures to violations of assumptions, must be recognized for the measurement practitioner to realize its advantages. (SLD)
Descriptors: Educational Researchers, Evaluation Methods, Evaluators, Latent Trait Theory

Berk, Ronald A. – Applied Measurement in Education, 1995
A brief summary of standard setting knowledge is presented, derived from about 20 methods that utilize a judgmental review process, the approach most relevant to the standard-setting strategies proposed in this special issue. Criteria for judging effectiveness and critiques of the methods discussed in the issue are offered. (SLD)
Descriptors: Criteria, Decision Making, Educational History, Evaluation Methods