ERIC - Search Results

Publication Date

In 2026	0
Since 2025	5
Since 2022 (last 5 years)	15
Since 2017 (last 10 years)	33
Since 2007 (last 20 years)	52

Descriptor

Interrater Reliability	58
Second Language Learning	58
English (Second Language)	44
Scoring	36
Language Tests	34
Foreign Countries	30
Evaluators	25
Second Language Instruction	23
Scoring Rubrics	22
Scores	18
Writing Evaluation	16
Computer Assisted Testing	13
Correlation	13
Comparative Analysis	12
Language Proficiency	12
Oral Language	12
Essays	11
College Students	9
Evaluation Criteria	9
Speech Communication	9
Computer Software	8
Undergraduate Students	8
Rating Scales	7
Student Evaluation	7
Writing Tests	7
More ▼

Publication Type

Journal Articles	53
Reports - Research	48
Tests/Questionnaires	13
Reports - Descriptive	3
Reports - Evaluative	3
Dissertations/Theses -…	2
Information Analyses	2
Guides - Non-Classroom	1
Speeches/Meeting Papers	1

Education Level

Higher Education	22
Postsecondary Education	21
Secondary Education	4
Elementary Education	3
Adult Education	2
Early Childhood Education	2
High Schools	2
Primary Education	2
Elementary Secondary Education	1
Grade 3	1
Grade 5	1
Intermediate Grades	1
Kindergarten	1
Middle Schools	1
Preschool Education	1
More ▼

Audience

Practitioners	1
Researchers	1

Location

China	7
Turkey	5
Japan	4
India	3
Iran	3
South Korea	3
Germany	2
Hong Kong	2
Arizona	1
Colombia	1
Europe	1
Georgia	1
Iran (Tehran)	1
Japan (Tokyo)	1
Jordan	1
Malaysia	1
Mexico	1
New York (New York)	1
Saudi Arabia	1
Switzerland	1
Taiwan	1
Thailand	1
Turkey (Istanbul)	1
United States	1
Vietnam	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…	13
International English…	5
Peabody Picture Vocabulary…	2
Expressive One Word Picture…	1
Flesch Kincaid Grade Level…	1
Graduate Record Examinations	1
Mean Length of Utterance	1
Test of English for…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 58 results Save | Export

Comparison of Traditional Machine Learning and Neural Network Approaches for Automated Scoring of Second Language English Essays

Peer reviewed

Direct link

Erik Voss – Language Testing, 2025

An increasing number of language testing companies are developing and deploying deep learning-based automated essay scoring systems (AES) to replace traditional approaches that rely on handcrafted feature extraction. However, there is hesitation to accept neural network approaches to automated essay scoring because the features are automatically…

Descriptors: Artificial Intelligence, Automation, Scoring, English (Second Language)

Artificial Intelligence in International English Language Testing System Writing Assessments: A Comparative Study of Human Ratings and DeepAI

Peer reviewed
PDF on ERIC

Download full text

Somayeh Fathali; Fatemeh Mohajeri – Technology in Language Teaching & Learning, 2025

The International English Language Testing System (IELTS) is a high-stakes exam where Writing Task 2 significantly influences the overall scores, requiring reliable evaluation. While trained human raters perform this task, concerns about subjectivity and inconsistency have led to growing interest in artificial intelligence (AI)-based assessment…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Artificial Intelligence

ChatGPT4o as an AI Peer Assessor in EFL Speaking Classrooms: Examining Scoring Reliability and Feedback Effectiveness

Peer reviewed

Direct link

Junfei Li; Jinyan Huang; Thomas Sheeran – SAGE Open, 2025

This study investigated the role of ChatGPT4o as an AI peer assessor in English-as-a-foreign-language (EFL) speaking classrooms, with a focus on its scoring reliability and the effectiveness of its feedback. The research involved 40 first-year English major students from two parallel classes at a Chinese university. Twenty from one class served as…

Descriptors: Artificial Intelligence, Technology Uses in Education, Peer Evaluation, English (Second Language)

Examining AI-Based Accuracy Assessment in L2 Learners' Writing

Peer reviewed

Direct link

On-Soon Lee – Journal of Pan-Pacific Association of Applied Linguistics, 2024

Despite the increasing interest in using AI tools as assistant agents in instructional settings, the effectiveness of ChatGPT, the generative pretrained AI, for evaluating the accuracy of second language (L2) writing has been largely unexplored in formative assessment. Therefore, the current study aims to examine how ChatGPT, as an evaluator,…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

The Rashomon Effect: Which Features of a Speaker's Talk Do Listeners Notice?

Peer reviewed

Direct link

Seedhouse, Paul; Satar, Müge – Classroom Discourse, 2023

The same L2 speaking performance may be analysed and evaluated in very different ways by different teachers or raters. We present a new, technology-assisted research design which opens up to investigation the trajectories of convergence and divergence between raters. We tracked and recorded what different raters noticed when, whilst grading a…

Descriptors: Language Tests, English (Second Language), Second Language Learning, Oral Language

Examining Rater Reliability When Using an Analytical Rubric for Oral Presentation Assessments

Peer reviewed
PDF on ERIC

Download full text

Sasithorn Limgomolvilas; Patsawut Sukserm – LEARN Journal: Language Education and Acquisition Research Network, 2025

The assessment of English speaking in EFL environments can be inherently subjective and influenced by various factors beyond linguistic ability, including choice of assessment criteria, and even the rubric type. In classroom assessment, the type of rubric recommended for English speaking tasks is the analytical rubric. Driven by three aims, this…

Descriptors: Oral Language, Speech Communication, English (Second Language), Second Language Learning

How Many Raters Can Be Enough: G Theory Applied to Assessment and Measurement of L2 Speech Perception

Peer reviewed
PDF on ERIC

Download full text

Kevin Hirschi; Okim Kang – Language Teaching Research Quarterly, 2023

This paper extends the use of Generalizability Theory to the measurement of extemporaneous L2 speech through the lens of speech perception. Using six datasets of previous studies, it reports on "G studies"--a method of breaking down measurement variance--and "D studies"--a predictive study of the impact on reliability when…

Descriptors: Evaluators, Generalization, Evaluation Methods, Speech Communication

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Download full text

Direct link

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Examining Consistency among Different Rubrics for Assessing Writing

Peer reviewed

Direct link

Shabani, Enayat A.; Panahi, Jaleh – Language Testing in Asia, 2020

The literature on using scoring rubrics in writing assessment denotes the significance of rubrics as practical and useful means to assess the quality of writing tasks. This study tries to investigate the agreement among rubrics endorsed and used for assessing the essay writing tasks by the internationally recognized tests of English language…

Descriptors: Writing Evaluation, Scoring Rubrics, Scores, Interrater Reliability

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Direct link

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Revolutionising Essay Evaluation: A Cutting-Edge Rubric for AI-Assisted Writing

Peer reviewed

Direct link

Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025

This study aims to develop a robust rubric for evaluating artificial intelligence (AI)--assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…

Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence

A Rasch Analysis of Rater Behaviour in Speaking Assessment

Peer reviewed
PDF on ERIC

Download full text

Polat, Murat – International Online Journal of Education and Teaching, 2020

The assessment of speaking skills in foreign language testing has always had some pros (testing learners' speaking skills doubles the validity of any language test) and cons (many testrelevant/irrelevant variables interfere) since it is a multi-dimensional process. In the meantime, exploring grader behaviours while scoring learners' speaking…

Descriptors: Item Response Theory, Interrater Reliability, Speech Skills, Second Language Learning

Scoring Rubric Reliability and Internal Validity in Rater-Mediated EFL Writing Assessment: Insights from Many-Facet Rasch Measurement

Peer reviewed

Direct link

Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022

Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…

Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods

Use of Automated Scoring in Spoken Language Assessments for Test Takers with Speech Impairments. Research Report. ETS RR-17-42

Peer reviewed
PDF on ERIC

Download full text

Loukina, Anastassia; Buzick, Heather – ETS Research Report Series, 2017

This study is an evaluation of the performance of automated speech scoring for speakers with documented or suspected speech impairments. Given that the use of automated scoring of open-ended spoken responses is relatively nascent and there is little research to date that includes test takers with disabilities, this small exploratory study focuses…

Descriptors: Automation, Scoring, Language Tests, Speech Tests

Applying Generalizability Theory in Language Testing: Comparing Nested and Crossed Scoring Designs in the Assessment of Speaking Skills

Peer reviewed
PDF on ERIC

Download full text

Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021

Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…

Descriptors: Language Tests, Scoring, Speech Communication, State Universities

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4

ETS Research Report Series	7
Language Testing	5
Language Assessment Quarterly	3
SAGE Open	3
English Language Teaching	2
Journal of Pan-Pacific…	2
Modern Language Journal	2
ProQuest LLC	2
Working Papers in TESOL &…	2
Advances in Language and…	1
Assessing Writing	1
Classroom Discourse	1
ELT Journal	1
Education and Information…	1
Educational and Psychological…	1
English Teaching	1
IEEE Transactions on Learning…	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
International Journal of…	1
International Online Journal…	1
JALT CALL Journal	1
Journal of Education and…	1
Journal of Speech, Language,…	1
More ▼

Davis, Larry	2
Polat, Murat	2
Ahmadi Shirazi, Masoumeh	1
Ahmadi, Alireza	1
Ahmed Alkhateeb	1
Alt, Mary	1
Barkhuizen, Gary	1
Bejar, Isaac I.	1
Beltrán, Jorge	1
Bogorevich, Valeriia	1
Breyer, F. Jay	1
Buzick, Heather	1
Carey, Michael D.	1
Carlson, Sybil B.	1
Casabianca, Jodi M.	1
Chan, Stephanie W. Y.	1
Cheung, Wai Ming	1
Coniam, David	1
Dunn, Peter K.	1
Edwards, Alison L.	1
Elder, Catherine	1
Eng, Lin Siew	1
Erik Voss	1
Fatemeh Mohajeri	1
More ▼