ERIC - Search Results

Publication Date

In 2025	1
Since 2024	2
Since 2021 (last 5 years)	13
Since 2016 (last 10 years)	31
Since 2006 (last 20 years)	47

Descriptor

Evaluators	50
Interrater Reliability	50
Second Language Learning	50
English (Second Language)	42
Language Tests	25
Foreign Countries	24
Oral Language	20
Second Language Instruction	18
Language Proficiency	17
Scores	15
Scoring	15
Comparative Analysis	14
Writing Evaluation	13
Computer Assisted Testing	12
Rating Scales	12
Scoring Rubrics	11
Speech Communication	11
Correlation	10
Statistical Analysis	10
Essays	9
Pronunciation	9
Native Language	8
Native Speakers	8
College Students	7
Evaluation Criteria	7

Publication Type

Journal Articles	47
Reports - Research	42
Tests/Questionnaires	11
Information Analyses	3
Dissertations/Theses -…	2
Reports - Evaluative	2
Guides - Non-Classroom	1
Reports - Descriptive	1

Education Level

Higher Education	17
Postsecondary Education	16
Secondary Education	5
Adult Education	2
High Schools	2
Early Childhood Education	1
Elementary Education	1
Grade 11	1
Grade 2	1
Primary Education	1

Audience

Practitioners

Location

Japan	5
Iran	4
China	3
Europe	3
Hong Kong	2
Canada	1
China (Beijing)	1
Germany	1
Illinois (Urbana)	1
India	1
Iran (Tehran)	1
Japan (Tokyo)	1
Ohio	1
South Korea	1
Switzerland	1
Turkey (Istanbul)	1
United Kingdom (Great Britain)	1
Vietnam	1

Assessments and Surveys

Test of English as a Foreign…	11
International English…	2
Flesch Kincaid Grade Level…	1
Modern Language Aptitude Test	1
Test of English for…	1

Showing 1 to 15 of 50 results

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Impact of Self-Construal on Rater Severity in Peer Assessments of Oral Presentations

Peer reviewed

Tanaka, Mitsuko; Ross, Steven J. – Assessment in Education: Principles, Policy & Practice, 2023

Raters vary from each other in their severity and leniency in rating performance. This study examined the factors affecting rater severity in peer assessments of oral presentations in English as a Foreign Language (EFL), focusing on peer raters' self-construal and presentation abilities. Japanese university students enrolled in EFL classes…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Peer Evaluation

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

How Many Raters Can Be Enough: G Theory Applied to Assessment and Measurement of L2 Speech Perception

Peer reviewed
PDF on ERIC

Kevin Hirschi; Okim Kang – Language Teaching Research Quarterly, 2023

This paper extends the use of Generalizability Theory to the measurement of extemporaneous L2 speech through the lens of speech perception. Using six datasets of previous studies, it reports on "G studies"--a method of breaking down measurement variance--and "D studies"--a predictive study of the impact on reliability when…

Descriptors: Evaluators, Generalization, Evaluation Methods, Speech Communication

Meta-Analysis of Inter-Rater Agreement and Discrepancy Between Human and Automated English Essay Scoring

Peer reviewed
PDF on ERIC

Jiyeo Yun – English Teaching, 2023

Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…

Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring

Depth-Perception-Based Representation in Holistic Rating on ESL Essay Writing

Peer reviewed

Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024

This paper proposes using depth perception to represent raters' decisions in holistic evaluation of ESL essays, as an alternative medium to the conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…

Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy

Scoring Rubric Reliability and Internal Validity in Rater-Mediated EFL Writing Assessment: Insights from Many-Facet Rasch Measurement

Peer reviewed

Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022

Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…

Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods

Measurement Properties of a Standardized Elicited Imitation Test: An Integrative Data Analysis

Peer reviewed

Isbell, Daniel R.; Son, Young-A – Studies in Second Language Acquisition, 2022

Elicited Imitation Tests (EITs) are commonly used in second language acquisition (SLA)/bilingualism research contexts to assess the general oral proficiency of study participants. While previous studies have provided valuable EIT construct-related validity evidence, some key gaps remain. This study uses an integrative data analysis to further…

Descriptors: Bilingualism, Imitation, Language Tests, Second Language Learning

Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies

Peer reviewed

Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki – Studies in Second Language Acquisition, 2023

Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward…

Descriptors: Second Language Learning, Second Language Instruction, Interrater Reliability, Speech Communication

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

The Processes of Rating L2 Speaking Performance Using an Analytic Rating Scale -- A Qualitative Exploration

Peer reviewed
PDF on ERIC

Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022

In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…

Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction

The Use of Semantic Similarity Tools in Automated Content Scoring of Fact-Based Essays Written by EFL Learners

Peer reviewed

Wang, Qiao – Education and Information Technologies, 2022

This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring

Identifying Language Disorder in Bilingual Children Using Automatic Speech Recognition

Peer reviewed

Albudoor, Nahar; Peña, Elizabeth D. – Journal of Speech, Language, and Hearing Research, 2022

Purpose: The differential diagnosis of developmental language disorder (DLD) in bilingual children represents a unique challenge due to their distributed language exposure and knowledge. The current evidence indicates that dual-language testing yields the most accurate classification of DLD among bilinguals, but there are limited personnel and…

Descriptors: Language Impairments, Bilingualism, Clinical Diagnosis, Language Tests

Writing Scale Effects on Raters: An Exploratory Study

Peer reviewed

Jeong, Heejeong – Language Testing in Asia, 2019

In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Rater Dominance in Discussion as a Resolution Method

Peer reviewed
PDF on ERIC

Ahmadi, Alireza – Taiwan Journal of TESOL, 2020

Rater subjectivity has long been an intriguing topic. The use of discussion as a resolution method is a practical way to reduce this subjectivity. However, the efficacy of discussion depends on whether different raters get equally engaged in it or one rater tends to dominate others. This study investigated whether and how rater dominance occurs in…

Descriptors: Evaluators, Interrater Reliability, Discussion, Discourse Analysis

Source

Language Testing	9
Language Assessment Quarterly	6
ETS Research Report Series	5
Studies in Second Language…	4
English Language Teaching	2
Language Testing in Asia	2
ProQuest LLC	2
Advances in Language and…	1
Assessment in Education:…	1
Education and Information…	1
Educational Research and…	1
English Teaching	1
Foreign Language Annals	1
JALT CALL Journal	1
Journal of Pan-Pacific…	1
Journal of Speech, Language,…	1
Language Education &…	1
Language Learning Journal	1
Language Teaching Research…	1
New Horizons in Education	1
Online Submission	1
Reading & Writing Quarterly	1
Reading and Writing: An…	1
SAGE Open	1
Taiwan Journal of TESOL	1

Author

Ahmadi, Alireza	2
Coniam, David	2
Saito, Kazuya	2
Ahmadi Shirazi, Masoumeh	1
Albudoor, Nahar	1
Barkhuizen, Gary	1
Beh-Afarin, Seyed Reza	1
Bejar, Isaac I.	1
Bogorevich, Valeriia	1
Breyer, F. Jay	1
Carey, Michael D.	1
Casabianca, Jodi M.	1
Chuang, Ping-Lin	1
Clevinger, Amanda	1
Crossley, Scott	1
Davis, Larry	1
Dunn, Peter K.	1
Elder, Catherine	1
Han, Qie	1
Heidari, Jamshid	1
Hemat, Ramin	1
Hijikata-Someya, Yuko	1
Hsu, Lung-hsun	1
Huang, Becky H.	1
Huang, Lan-fen	1