ERIC - Search Results

Publication Date

In 2025	2
Since 2024	3
Since 2021 (last 5 years)	9
Since 2016 (last 10 years)	20
Since 2006 (last 20 years)	39

Descriptor

Evaluation Criteria	41
Evaluators	41
Second Language Learning	41
English (Second Language)	33
Language Tests	24
Foreign Countries	16
Second Language Instruction	16
Writing Evaluation	13
Comparative Analysis	12
Oral Language	12
Scores	12
Language Proficiency	10
Language Teachers	10
Scoring	10
Statistical Analysis	10
Correlation	9
Decision Making	9
Pronunciation	9
Essays	8
Rating Scales	8
Speech Communication	8
Interrater Reliability	7
Computer Assisted Testing	6
Discourse Analysis	6
Language Fluency	6

Publication Type

Journal Articles	38
Reports - Research	37
Tests/Questionnaires	3
Dissertations/Theses -…	2
Information Analyses	2
Reports - Evaluative	1

Education Level

Higher Education	12
Postsecondary Education	10
Secondary Education	2
Adult Education	1
Grade 12	1
High Schools	1

Location

China	4
Australia	3
Iran	2
Turkey	2
Argentina	1
Illinois (Urbana)	1
Mexico	1
Pakistan	1
Thailand	1
Turkey (Istanbul)	1

Assessments and Surveys

Test of English as a Foreign…	7
International English…	4

Showing 1 to 15 of 41 results

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Scoring Rubric Reliability and Internal Validity in Rater-Mediated EFL Writing Assessment: Insights from Many-Facet Rasch Measurement

Peer reviewed

Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022

Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…

Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods

Assessing Second-Language Academic Writing: AI vs. Human Raters

Peer reviewed
PDF on ERIC

Vasfiye Geçkin; Ebru Kiziltas; Çagatay Çinar – Journal of Educational Technology and Online Learning, 2023

The quality of writing in a second language (L2) is one of the indicators of the level of proficiency for many college students to be eligible for departmental studies. Although certain software programs, such as Intelligent Essay Assessor or IntelliMetric, have been introduced to evaluate second-language writing quality, an overall assessment of…

Descriptors: Writing Evaluation, Second Language Learning, Second Language Instruction, Language Proficiency

Detecting Differential Rater Severity in a High-Stakes EFL Classroom Writing Assessment: A Many-Facets Rasch Measurement Approach

Peer reviewed
PDF on ERIC

Apichat Khamboonruang – PASAA: Journal of Language Teaching and Learning in Thailand, 2023

Differential rater severity (DRS), one prevalent case of differential rater functioning (aka rater bias or rater interaction) effects, manifests itself when a rater assigns unusually severe or lenient ratings, threatening the validity and fairness of rater-mediated assessment. Building on a many-facets Rasch measurement (MFRM) approach, this study…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring Rubrics

Effects of Benchmarking and Peer-Assessment on French Learners' Self-Assessments of Accentedness, Comprehensibility, and Fluency

Peer reviewed

Tsunemoto, Aki; Trofimovich, Pavel; Blanchet, Josée; Bertrand, Juliane; Kennedy, Sara – Foreign Language Annals, 2022

This study examined the effect of benchmarking and peer-assessment activities on second language (L2) French learners' self-assessments of accentedness, comprehensibility, and fluency. The learners, who included 25 L2 French students enrolled in a 15-week university-level French course, recorded two oral presentations at the beginning and the end…

Descriptors: Benchmarking, French, Self Evaluation (Individuals), Second Language Learning

Raters' Perceptions of Rating Scales Criteria and Its Effect on the Process and Outcome of Their Rating

Peer reviewed

Heidari, Nasim; Ghanbari, Nasim; Abbasi, Abbas – Language Testing in Asia, 2022

It is widely believed that human rating performance is influenced by an array of different factors. Among these, rater-related variables such as experience, language background, perceptions, and attitudes have been mentioned. One of the important rater-related factors is the way the raters interact with the rating scales. In particular, how raters…

Descriptors: Evaluators, Rating Scales, Language Tests, English (Second Language)

Comparing Rating Modes: Analysing Live, Audio, and Video Ratings of IELTS Speaking Test Performances

Peer reviewed

Nakatsuhara, Fumiyo; Inoue, Chihiro; Taylor, Lynda – Language Assessment Quarterly, 2021

This mixed methods study compared IELTS examiners' scores when assessing spoken performances under live and two 'non-live' testing conditions using audio and video recordings. Six IELTS examiners assessed 36 test-takers' performances under the live, audio, and video rating conditions. Scores in the three rating modes were calibrated using the…

Descriptors: Video Technology, Audio Equipment, English (Second Language), Language Tests

The Effects of Test Type, Pronunciation, and Proficiency Level on EFL Learners' Speaking Exam Scores

Peer reviewed
PDF on ERIC

Kilinc, Kardelen; Yildirim, Ozgur – World Journal of Education, 2020

The present study aims to reveal the effects of test type, pronunciation and proficiency levels of the students on speaking test scores. A total of 147 Turkish EFL students consisting of 38 beginner, 36 elementary, 37 pre-intermediate and 36 intermediate levels participated in the study. Presentation as planned, and paired speaking test as…

Descriptors: Test Format, Pronunciation, Scores, Language Proficiency

Developing Tools for Learning Oriented Assessment of Interactional Competence: Bridging Theory and Practice

Peer reviewed

May, Lyn; Nakatsuhara, Fumiyo; Lam, Daniel; Galaczi, Evelina – Language Testing, 2020

In this paper we report on a project in which we developed tools to support the classroom assessment of learners' interactional competence (IC) and provided learning oriented feedback in the context of preparation for a high-stakes face-to-face speaking test. Six trained examiners provided stimulated verbal reports (n = 72) on 12 paired…

Descriptors: Intercultural Communication, High Stakes Tests, Feedback (Response), Evaluators

Rater Cognition in L2 Speaking Assessment: A Review of the Literature

Peer reviewed
PDF on ERIC

Han, Qie – Working Papers in TESOL & Applied Linguistics, 2016

This literature review attempts to survey representative studies within the context of L2 speaking assessment that have contributed to the conceptualization of rater cognition. Two types of studies are looked at: 1) studies that examine "how" raters differ (and sometimes agree) in their cognitive processes and rating behaviors, in terms…

Descriptors: Second Language Learning, Student Evaluation, Evaluators, Speech Tests

A Generalizability Theory Study of Optimal Measurement Design for a Summative Assessment of English/Chinese Consecutive Interpreting

Peer reviewed

Han, Chao – Language Testing, 2019

Summative assessment of interpretation is widely conducted in interpreting courses/programs to inform high-stakes decision making, such as the selection, certification, and conferral of academic degrees. Yet there has been very limited empirical research to investigate the score dependability of summative interpretation assessment. The present…

Descriptors: Generalization, Decision Making, Summative Evaluation, Evaluators

"How Scripted Is This Going to Be?" Raters' Views of Authenticity in Speaking-Performance Tests

Peer reviewed

Burton, John Dylan – Language Assessment Quarterly, 2020

An assumption underlying speaking tests is that scores reflect the ability to produce online, non-rehearsed speech. Speech produced in testing situations may, however, be less spontaneous if extensive test preparation takes place, resulting in memorized or rehearsed responses. If raters detect these patterns, they may conceptualize speech as…

Descriptors: Language Tests, Oral Language, Scores, Speech Communication

Assessing Individual and Group Oral Exams: Scoring Criteria and Rater Interaction

Peer reviewed
PDF on ERIC

Yalçin-Çolakoglu, Özlem; Selçuk, Merve – Advances in Language and Literary Studies, 2019

Criterion referenced tests of second language speaking performance are administered in different institutions using different procedures. The present study reports raters' practices of second language speaking tests, in particular the correspondence between test-takers' grades when assessed individually and in groups. Data derived from…

Descriptors: Oral Language, Language Tests, Test Validity, Inferences

Source

Language Testing	10
Language Assessment Quarterly	6
ETS Research Report Series	4
Language Testing in Asia	3
ProQuest LLC	2
TESL Canada Journal	2
Advances in Language and…	1
Asia-Pacific Education…	1
British Journal of…	1
Foreign Language Annals	1
Hispania	1
Iranian Journal of Language…	1
Journal of Educational…	1
PASAA: Journal of Language…	1
Reading & Writing Quarterly	1
Reading and Writing: An…	1
Studies in Higher Education	1
Working Papers in TESOL &…	1
World Journal of Education	1

Author

Alemi, Minoo	3
Eckes, Thomas	2
Nakatsuhara, Fumiyo	2
Pill, John	2
Tajeddin, Zia	2
Abbasi, Abbas	1
Apichat Khamboonruang	1
Attali, Yigal	1
Barkaoui, Khaled	1
Bertrand, Juliane	1
Blanchet, Josée	1
Bridgeman, Brent	1
Briggs, Sarah L.	1
Brown, Annie	1
Burton, John Dylan	1
Cai, Hongwen	1
Davey, Tim	1
Ebru Kiziltas	1
Fatih Yavuz	1
Fernandez, Miguel	1
Galaczi, Evelina	1
Gamze Yavas Çelik	1
Ghanbari, Nasim	1
Guntly, Erin	1
Han, Chao	1