Showing 1 to 15 of 41 results
Peer reviewed
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design in which there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have evaluated the relationship between human and machine scores to establish the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, particularly between human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
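Note: the agreement and discrepancy indices referred to in the entry above are not defined in the snippet. As a rough illustration only, the sketch below computes a few commonly reported ones (exact agreement, adjacent agreement, and mean absolute difference) on hypothetical human and machine scores; it is not drawn from the article.

```python
# Illustrative only: common human-machine agreement/discrepancy indices
# reported in automated essay scoring studies (scores below are hypothetical).
import numpy as np

human = np.array([3, 4, 2, 5, 3, 4, 2, 5])
machine = np.array([3, 3, 2, 5, 4, 4, 2, 4])

exact_agreement = np.mean(human == machine)                 # identical scores
adjacent_agreement = np.mean(np.abs(human - machine) <= 1)  # within one score point
mean_abs_diff = np.mean(np.abs(human - machine))            # a simple discrepancy index

print(f"exact={exact_agreement:.2f}, adjacent={adjacent_agreement:.2f}, MAD={mean_abs_diff:.2f}")
```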
Peer reviewed
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019
The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015), are the most widely used observational assessments in early childhood learning environments. The most recent…
Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring
Peer reviewed
Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021
Scoring language learners' speaking skills is open to a number of measurement errors, since raters' personal judgements are involved in the process. Different grading designs, in which raters score a student's whole speaking performance or a specific dimension of it, could be adopted to control and minimize the amount of the error…
Descriptors: Language Tests, Scoring, Speech Communication, State Universities
Peer reviewed
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests
Peer reviewed
Bronkhorst, Hugo; Roorda, Gerrit; Suhre, Cor; Goedhart, Martin – Research in Mathematics Education, 2022
Logical reasoning as part of critical thinking is becoming increasingly important for preparing students for their future life in society, work, and study. This article presents the results of a quasi-experimental study with a pre-test-post-test control group design focusing on the effective use of formalisations to support logical reasoning. The…
Descriptors: Mathematics Instruction, Teaching Methods, Logical Thinking, Critical Thinking
Peer reviewed
Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca – Practical Assessment, Research & Evaluation, 2018
Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…
Descriptors: Interrater Reliability, Measurement, Observation, Educational Research
Peer reviewed
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Wagner, Kyle; Smith, Alex; Allen, Abigail; McMaster, Kristen; Poch, Apryl; Lembke, Erica – Assessment for Effective Intervention, 2019
Researchers and practitioners have questioned whether scoring procedures used with curriculum-based measures of writing (CBM-W) capture growth in complexity of writing. We analyzed data from six independent samples to examine two potential scoring metrics for picture word CBM-W (PW), a sentence-level CBM task. Correct word sequences per response…
Descriptors: Curriculum Based Assessment, Writing Evaluation, Comparative Analysis, Scoring
Peer reviewed
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automated and the teacher raters demonstrate good inter-rater reliability, though the automated rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
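Note: for context on the Many-Facet Rasch analysis mentioned above, the formulation below is a standard rating-scale form of the model with a rater facet; it is not taken from the article. With examinee ability \(\theta_n\), item difficulty \(\delta_i\), rater severity \(\alpha_j\), and threshold \(\tau_k\) for category \(k\):

```latex
\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
```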
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessments, they have examined the relationship between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Peer reviewed
Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016
This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are reviewed: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…
Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests
Peer reviewed
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M. – ETS Research Report Series, 2015
Automated scoring models were trained and evaluated for the essay task in the "Praxis I"® writing test. Prompt-specific and generic "e-rater"® scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate the…
Descriptors: Writing Tests, Licensing Examinations (Professions), Teacher Competency Testing, Scoring
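Note: as a hedged illustration of two of the evaluation statistics named in the entry above (quadratic weighted kappa and the standardized difference in mean scores), the sketch below uses hypothetical scores; it is not the authors' evaluation code.

```python
# Illustrative only: quadratic weighted kappa and standardized mean-score
# difference between hypothetical human and automated scores.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = np.array([2, 3, 4, 3, 5, 4, 2, 3])
machine = np.array([2, 4, 4, 3, 4, 4, 3, 3])

qwk = cohen_kappa_score(human, machine, weights="quadratic")
pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
std_mean_diff = (machine.mean() - human.mean()) / pooled_sd

print(f"QWK={qwk:.2f}, standardized mean difference={std_mean_diff:.2f}")
```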
Nicole B. Kersting; Bruce L. Sherin; James W. Stigler – Educational and Psychological Measurement, 2014
In this study, we explored the potential for machine scoring of short written responses to the Classroom-Video-Analysis (CVA) assessment, which is designed to measure teachers' usable mathematics teaching knowledge. We created naïve Bayes classifiers for CVA scales assessing three different topic areas and compared computer-generated scores to…
Descriptors: Scoring, Automation, Video Technology, Teacher Evaluation
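Note: the study above trained naïve Bayes classifiers on short written responses. The sketch below is a minimal, hypothetical example of that general approach (a bag-of-words naïve Bayes text classifier), not the authors' implementation; the responses and scores are invented.

```python
# Minimal, hypothetical sketch: scoring short written responses with a
# bag-of-words naive Bayes classifier (data and labels are invented).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

responses = [
    "the teacher probes why the student's strategy works",
    "the teacher restates the procedure without follow-up",
    "the teacher asks students to compare two solution methods",
    "the teacher moves on after a short confirmation",
]
scores = [2, 1, 2, 1]  # hypothetical human-assigned scores

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(responses, scores)
print(model.predict(["the teacher asks a student to explain the reasoning"]))
```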