Showing 1 to 15 of 34 results
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
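As a point of reference for the kinds of human-machine agreement indices this meta-analysis surveys, the short Python sketch below computes two common measures for essay scores: quadratic weighted kappa (via scikit-learn's cohen_kappa_score) and the exact agreement rate. The scores are hypothetical and the snippet is illustrative only; it is not drawn from the study itself.

```python
# Illustrative only: hypothetical human and machine essay scores,
# not data from the cited meta-analysis.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 4, 2, 5, 3, 4, 1, 3]    # hypothetical human ratings (1-5 scale)
machine_scores = [3, 4, 3, 5, 2, 4, 1, 3]  # hypothetical automated-scoring outputs

# Quadratic weighted kappa: a standard agreement index for ordinal essay scores.
qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")

# Exact agreement rate: proportion of essays on which the two raters match.
exact = sum(h == m for h, m in zip(human_scores, machine_scores)) / len(human_scores)

print(f"Quadratic weighted kappa: {qwk:.3f}")
print(f"Exact agreement rate:     {exact:.3f}")
```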
Peer reviewed
PDF on ERIC Download full text
Ahmet Can Uyar; Dilek Büyükahiska – International Journal of Assessment Tools in Education, 2025
This study explores the effectiveness of using ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading English as a Foreign Language (EFL) learners' essays. The corpus consists of 50 essays representing various types including analysis, compare and contrast, descriptive, narrative, and opinion…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, Teaching Methods
Peer reviewed
Direct link
Yuko Hayashi; Yusuke Kondo; Yutaka Ishii – Innovation in Language Learning and Teaching, 2024
Purpose: This study builds a new system for automatically assessing learners' speech elicited from an oral discourse completion task (DCT), and evaluates the prediction capability of the system with a view to better understanding factors deemed influential in predicting speaking proficiency scores and the pedagogical implications of the system.…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Japanese
Peer reviewed
Direct link
Cox, Troy L.; Brown, Alan V.; Thompson, Gregory L. – Language Testing, 2023
The rating of proficiency tests that use the Interagency Language Roundtable (ILR) and American Council on the Teaching of Foreign Languages (ACTFL) guidelines rests on the claim that each major level is based on hierarchical linguistic functions requiring mastery of multidimensional traits, such that each level subsumes the levels beneath it. These…
Descriptors: Oral Language, Language Fluency, Scoring, Cues
Peer reviewed
Direct link
Xu, Jing; Jones, Edmund; Laxton, Victoria; Galaczi, Evelina – Assessment in Education: Principles, Policy & Practice, 2021
Recent advances in machine learning have made automated scoring of learner speech widespread, and yet validation research that provides support for applying automated scoring technology to assessment is still in its infancy. Both the educational measurement and language assessment communities have called for greater transparency in describing…
Descriptors: Second Language Learning, Second Language Instruction, English (Second Language), Computer Software
Peer reviewed
Direct link
Han, Chao; Xiao, Xiaoyan – Language Testing, 2022
The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…
Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators
Peer reviewed
PDF on ERIC Download full text
Karim Sadeghi; Neda Bakhshi – International Journal of Language Testing, 2025
Assessing language skills in an integrative form has drawn the attention of assessment experts in recent years. While some research data exists on integrative listening/reading-to-write assessment, there is comparatively little research literature on listening-to-speak integrated assessment. Also, little attention has been devoted to the role of…
Descriptors: Language Tests, Second Language Learning, English (Second Language), Computer Assisted Testing
Peer reviewed
PDF on ERIC Download full text
Rupp, André A.; Casabianca, Jodi M.; Krüger, Maleika; Keller, Stefan; Köller, Olaf – ETS Research Report Series, 2019
In this research report, we describe the design and empirical findings for a large-scale study of essay writing ability with approximately 2,500 high school students in Germany and Switzerland on the basis of 2 tasks with 2 associated prompts, each from a standardized writing assessment whose scoring involved both human and automated components.…
Descriptors: Automation, Foreign Countries, English (Second Language), Language Tests
Peer reviewed
Direct link
Gu, Lin; Davis, Larry; Tao, Jacob; Zechner, Klaus – Assessment in Education: Principles, Policy & Practice, 2021
Recent technology advancements have increased the prospects for automated spoken language technology to provide feedback on speaking performance. In this study we examined user perceptions of using an automated feedback system for preparing for the TOEFL iBT® test. Test takers and language teachers evaluated three types of machine-generated…
Descriptors: Audio Equipment, Test Preparation, Feedback (Response), Scores
Peer reviewed
PDF on ERIC Download full text
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Peer reviewed
Direct link
Li, Shuai; Taguchi, Naoko; Xiao, Feng – Language Assessment Quarterly, 2019
Adopting Linacre's guidelines for evaluating rating scale effectiveness, we examined whether and how a six-point rating scale functioned differently across raters, speech acts, and second language (L2) proficiency levels. We developed a 12-item Computerized Oral Discourse Completion Task (CODCT) for assessing the production of requests, refusals,…
Descriptors: Speech Acts, Rating Scales, Guidelines, Evaluators
Peer reviewed
Direct link
Burton, John Dylan – Language Assessment Quarterly, 2020
An assumption underlying speaking tests is that scores reflect the ability to produce online, non-rehearsed speech. Speech produced in testing situations may, however, be less spontaneous if extensive test preparation takes place, resulting in memorized or rehearsed responses. If raters detect these patterns, they may conceptualize speech as…
Descriptors: Language Tests, Oral Language, Scores, Speech Communication
Peer reviewed
PDF on ERIC Download full text
Li, Xuelian – English Language Teaching, 2019
Based on articles written by mainland Chinese scholars and published in the most influential Chinese and international journals, the present article analyzed language testing research, compared trends across seven categories between 2000-2009 and 2010-2019, and put forward future research directions by referring to international hot…
Descriptors: Language Tests, Testing, Educational History, Futures (of Society)
Peer reviewed
Direct link
Kang, Okim; Rubin, Don; Kermad, Alyssa – Language Testing, 2019
Because judgments of non-native speech are closely tied to social biases, oral proficiency ratings are susceptible to error arising from rater background and social attitudes. In the present study we seek first to estimate the variance attributable to rater background and attitudinal variables in novice raters' assessments of L2…
Descriptors: Evaluators, Second Language Learning, Language Tests, English (Second Language)
Peer reviewed
Direct link
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests