ERIC - Search Results

Showing 1 to 15 of 43 results

Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments

Peer reviewed

Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025

As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…

Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy

The Whole Is More than the Sum of Its Parts -- Assessing Writing Using the Consensual Assessment Technique

Peer reviewed

Zahn, Daniela; Canton, Ursula; Boyd, Victoria; Hamilton, Laura; Mamo, Josianne; McKay, Jane; Proudfoot, Linda; Telfer, Dickson; Williams, Kim; Wilson, Colin – Studies in Higher Education, 2021

Evaluating the impact of Academic Literacies teaching (Lea and Street [1998. "Student Writing in Higher Education: An Academic Literacies Approach." "Studies in Higher Education" 23 (2): 157-72. doi:10.1080/03075079812331380364]) is difficult, as it involves gauging whether writers: (1) gain better understanding of what…

Descriptors: Writing Evaluation, Evaluation Methods, Undergraduate Students, Foreign Countries

Rater Connections and the Detection of Bias in Performance Assessment

Peer reviewed

Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022

In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…

Descriptors: Evaluators, Bias, Identification, Performance Based Assessment

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

Perceptual and Acoustic Assessment of Strain Using Synthetically Modified Voice Samples

Peer reviewed

Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020

Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…

Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

Inter-Rater Agreement for the Milestones and Barriers Assessments of the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP)

Peer reviewed

Montallana, Khrystle L.; Gard, Brendan M.; Lotfizadeh, Amin D.; Poling, Alan – Journal of Autism and Developmental Disorders, 2019

We determined inter-rater agreement for the VB-MAPP, an instrument sometimes used in planning educational goals and evaluating intervention effects for young people with autism. A pair of raters independently rated each of 32 children diagnosed with autism. Intraclass correlation coefficients for the total Milestones and Barrier scores were 0.876…

Descriptors: Barriers, Interrater Reliability, Autism, Educational Objectives

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

The Use of Semantic Similarity Tools in Automated Content Scoring of Fact-Based Essays Written by EFL Learners

Peer reviewed

Wang, Qiao – Education and Information Technologies, 2022

This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Using Subjective and Objective Measures to Predict Level of Reading Fluency at the End of First Grade

Peer reviewed

Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow – Reading Psychology, 2018

This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…

Descriptors: Reading Fluency, Prediction, Grade 1, Elementary School Students

Item Response Models for Local Dependence among Multiple Ratings

Peer reviewed

Wang, Wen-Chung; Su, Chi-Ming; Qiu, Xue-Lan – Journal of Educational Measurement, 2014

Ratings given to the same item response may have a stronger correlation than those given to different item responses, especially when raters interact with one another before giving ratings. The rater bundle model was developed to account for such local dependence by forming multiple ratings given to an item response as a bundle and assigning…

Descriptors: Item Response Theory, Interrater Reliability, Models, Correlation

The Influence of Training and Experience on Rater Performance in Scoring Spoken Language

Peer reviewed

Davis, Larry – Language Testing, 2016

Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…

Descriptors: Evaluators, Oral Language, Scores, Language Tests

Functional Adequacy in L2 Writing: Towards a New Rating Scale

Peer reviewed

Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017

The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…

Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
