ERIC - Search Results

Publication Date

In 2025	6
Since 2024	15
Since 2021 (last 5 years)	43
Since 2016 (last 10 years)	93
Since 2006 (last 20 years)	151

Descriptor

Evaluators	234
Scoring	234
Interrater Reliability	72
Second Language Learning	60
English (Second Language)	51
Language Tests	49
Writing Evaluation	48
Comparative Analysis	43
Correlation	42
Essays	41
Evaluation Methods	41
Foreign Countries	41
Scores	41
Computer Assisted Testing	39
Performance Based Assessment	32
Accuracy	29
Rating Scales	28
Computer Software	27
Essay Tests	26
Student Evaluation	25
Test Construction	24
Elementary Secondary Education	23
Reliability	23
Training	23
Higher Education	22
More ▼

Publication Type

Reports - Research	169
Journal Articles	157
Speeches/Meeting Papers	38
Reports - Evaluative	35
Tests/Questionnaires	18
Reports - Descriptive	14
Dissertations/Theses -…	9
Information Analyses	7
Guides - Non-Classroom	3
Opinion Papers	2
ERIC Digests in Full Text	1
ERIC Publications	1
Numerical/Quantitative Data	1
Reference Materials -…	1
Reports - General	1
More ▼

Education Level

Higher Education	39
Postsecondary Education	34
Elementary Education	11
Elementary Secondary Education	10
Secondary Education	10
Early Childhood Education	7
Primary Education	6
Kindergarten	5
High Schools	3
Intermediate Grades	3
Adult Education	2
Grade 2	2
Grade 4	2
Grade 6	2
Middle Schools	2
Grade 1	1
Grade 3	1
Grade 5	1
Preschool Education	1
More ▼

Audience

Researchers	3
Practitioners	2
Teachers	2
Administrators	1
Community	1

Location

China	7
India	4
Japan	4
California	3
Iran	3
Europe	2
Florida	2
Nigeria	2
Oregon	2
South Korea	2
Turkey	2
United States	2
Alabama	1
Algeria	1
Australia	1
California (Los Angeles)	1
Colombia	1
Georgia	1
Germany	1
Hong Kong	1
Idaho	1
Illinois	1
Japan (Tokyo)	1
Kentucky	1
Maryland	1
More ▼

Laws, Policies, & Programs

Race to the Top

Assessments and Surveys

Test of English as a Foreign…	21
International English…	5
Graduate Record Examinations	4
National Assessment of…	4
Alabama High School…	2
Test of English for…	2
ACTFL Oral Proficiency…	1
Gates MacGinitie Reading Tests	1
General Educational…	1
New Jersey High School…	1
Praxis Series	1
Program for International…	1
Test of Gross Motor…	1
Torrance Tests of Creative…	1
Trends in International…	1
United States Medical…	1
edTPA (Teacher Performance…	1
More ▼

What Works Clearinghouse Rating

Showing 1 to 15 of 234 results Save | Export

Employing a Hierarchical Rater Models for Automated Scoring: Scope Review on the Application in Educational Assessment

Peer reviewed
PDF on ERIC

Download full text

Direct link

Akif Avcu – Malaysian Online Journal of Educational Technology, 2025

This scope-review presents the milestones of how Hierarchical Rater Models (HRMs) become operable to used in automated essay scoring (AES) to improve instructional evaluation. Although essay evaluations--a useful instrument for evaluating higher-order cognitive abilities--have always depended on human raters, concerns regarding rater bias,…

Descriptors: Automation, Scoring, Models, Educational Assessment

The Vulnerability of AI-Based Scoring Systems to Gaming Strategies: A Case Study

Peer reviewed

Direct link

Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025

Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…

Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy

ChatGPT as an Automated Essay Scoring Tool in the Writing Classrooms: How It Compares with Human Scoring

Peer reviewed

Direct link

Ngoc My Bui; Jessie S. Barrot – Education and Information Technologies, 2025

With the generative artificial intelligence (AI) tool's remarkable capabilities in understanding and generating meaningful content, intriguing questions have been raised about its potential as an automated essay scoring (AES) system. One such tool is ChatGPT, which is capable of scoring any written work based on predefined criteria. However,…

Descriptors: Artificial Intelligence, Natural Language Processing, Technology Uses in Education, Automation

One Score, Two Components: Disentangling Appropriateness and Originality in PISA Creative Thinking Judgments Using Generalized Item Response Tree Models

Peer reviewed

Direct link

Nils Myszkowski; Martin Storme – Journal of Creative Behavior, 2025

In the PISA 2022 creative thinking test, students provide a response to a prompt, which is then coded by human raters as no credit, partial credit, or full credit. Like many large-scale educational testing frameworks, PISA uses the generalized partial credit model (GPCM) as a response model for these ordinal ratings. In this paper, we show that…

Descriptors: Creative Thinking, Creativity Tests, Scores, Prompting

Non-Expert Raters' Scoring Behavior and Cognition in Assessing Pragmatic Production in L2 Chinese

Peer reviewed

Direct link

Shuai Li; Xian Li; Yali Feng; Ting Wen – Educational Linguistics, 2023

This chapter reports on a study investigating non-expert raters' scoring behavior and cognitive processes involved in evaluating speech acts and pragmatic routines in L2 Chinese. Pragmatic production data were collected from 51 American learners of Chinese, who completed a 12-item oral Discourse Completion Test (DCT). The learners were divided…

Descriptors: Scoring, Cognitive Processes, Speech Acts, Pragmatics

Examining Inter-Rater Reliability of Evaluators Judging Teacher Performance: Proposing an Alternative to Cohen's Kappa. CEME Technical Report. CEMETR-2022-02

Download full text

Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle – Center for Educational Measurement and Evaluation, 2022

The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…

Descriptors: Interrater Reliability, Evaluators, Rating Scales, Teacher Evaluation

Exploring Difficult-to-Score Essays with a Hyperbolic Cosine Accuracy Model and Coh-Metrix Indices

Peer reviewed

Direct link

Wang, Jue; Engelhard, George; Combs, Trenton – Journal of Experimental Education, 2023

Unfolding models are frequently used to develop scales for measuring attitudes. Recently, unfolding models have been applied to examine rater severity and accuracy within the context of rater-mediated assessments. One of the problems in applying unfolding models to rater-mediated assessments is that the substantive interpretations of the latent…

Descriptors: Writing Evaluation, Scoring, Accuracy, Computational Linguistics

The Impact of Setting Scoring Expectations on Rater Scoring Rates and Accuracy

Peer reviewed

Direct link

Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020

Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine if setting scoring rate expectations would encourage raters to score at a faster pace, and if so, if there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…

Descriptors: Scoring, Expectation, Accuracy, Time

Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation

Peer reviewed

Direct link

Casabianca, Jodi M.; Donoghue, John R.; Shin, Hyo Jeong; Chao, Szu-Fu; Choi, Ikkyu – Journal of Educational Measurement, 2023

Using item-response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large-scale testing scenarios,…

Descriptors: Item Response Theory, Alternative Assessment, Evaluators, Research Problems

Detecting Rater Biases in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Direct link

Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…

Descriptors: Evaluators, Scoring, Data Collection, Design

Evaluating ChatGPT as a Self-Learning Tool in Medical Biochemistry: A Performance Assessment in Undergraduate Medical University Examination

Peer reviewed

Direct link

Krishna Mohan Surapaneni; Anusha Rajajagadeesan; Lakshmi Goudhaman; Shalini Lakshmanan; Saranya Sundaramoorthi; Dineshkumar Ravi; Kalaiselvi Rajendiran; Porchelvan Swaminathan – Biochemistry and Molecular Biology Education, 2024

The emergence of ChatGPT as one of the most advanced chatbots and its ability to generate diverse data has given room for numerous discussions worldwide regarding its utility, particularly in advancing medical education and research. This study seeks to assess the performance of ChatGPT in medical biochemistry to evaluate its potential as an…

Descriptors: Biochemistry, Science Instruction, Artificial Intelligence, Teaching Methods

Effect of Immediate Elaborated Feedback on Rater Accuracy. Research Report. ETS RR-20-09

Peer reviewed
PDF on ERIC

Download full text

Attali, Yigal – ETS Research Report Series, 2020

Principles of skill acquisition dictate that raters should be provided with frequent feedback about their ratings. However, in current operational practice, raters rarely receive immediate feedback about their scores owing to the prohibitive effort required to generate such feedback. An approach for generating and administering feedback responses…

Descriptors: Feedback (Response), Evaluators, Accuracy, Scores

Understanding and Interpreting Human Scoring

Peer reviewed

Direct link

Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020

This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…

Descriptors: Scoring, Test Format, Responses, Predictor Variables

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Direct link

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Exploring an Alternative to Record Motor Competence Assessment: Interrater and Intrarater Audio-Video Reliability

Peer reviewed

Direct link

Cristina Menescardi; Aida Carballo-Fazanes; Núria Ortega-Benavent; Isaac Estevan – Journal of Motor Learning and Development, 2024

The Canadian Agility and Movement Skill Assessment (CAMSA) is a valid and reliable circuit-based test of motor competence which can be used to assess children's skills in a live or recorded performance and then coded. We aimed to analyze the intrarater reliability of the CAMSA scores (total, time, and skill score) and time measured, by comparing…

Descriptors: Interrater Reliability, Evaluators, Scoring, Psychomotor Skills

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ... | 16

Language Testing	21
ETS Research Report Series	14
Applied Measurement in…	10
Educational and Psychological…	9
ProQuest LLC	9
Educational Measurement:…	8
Grantee Submission	5
Journal of Educational…	4
Language Assessment Quarterly	4
Language Testing in Asia	4
Advances in Health Sciences…	3
Assessing Writing	2
Assessment in Education:…	2
Education and Information…	2
Educational Assessment	2
International Journal of…	2
Journal of Applied Measurement	2
Journal of Educational and…	2
Journal of Experimental…	2
Journal of Motor Learning and…	2
Language Learning	2
Measurement:…	2
Studies in Educational…	2
AERA Online Paper Repository	1
Advances in Physiology…	1
More ▼

Wind, Stefanie A.	6
Wolfe, Edward W.	6
Attali, Yigal	4
Xi, Xiaoming	4
Bejar, Isaac I.	3
Bridgeman, Brent	3
Engelhard, George, Jr.	3
Linn, Robert L.	3
McNamara, Danielle S.	3
Myford, Carol M.	3
Allen, Laura K.	2
Arslan, Burcu	2
Baker, Eva L.	2
Barkaoui, Khaled	2
Casabianca, Jodi M.	2
Choi, Ikkyu	2
Crossley, Scott A.	2
Eckes, Thomas	2
Finn, Bridgid	2
Glazer, Nancy	2
Goldberg, Gail Lynn	2
Guo, Wenjing	2
Han, Chao	2
Jaeger, Richard M.	2
More ▼