ERIC - Search Results

Publication Date

In 2026	0
Since 2025	3
Since 2022 (last 5 years)	8
Since 2017 (last 10 years)	16
Since 2007 (last 20 years)	26

Descriptor

Reliability	29
Second Language Learning	29
English (Second Language)	24
Scoring	20
Foreign Countries	14
Validity	13
Second Language Instruction	11
Evaluators	10
Scoring Rubrics	10
Language Tests	9
Scores	9
Writing Evaluation	9
Comparative Analysis	7
Language Teachers	7
Correlation	6
Language Proficiency	6
Writing Tests	6
Essays	5
Student Evaluation	5
Writing Skills	5
Accuracy	4
Automation	4
College Faculty	4
College Students	4
Computer Assisted Testing	4

Publication Type

Journal Articles	23
Reports - Research	23
Tests/Questionnaires	6
Reports - Evaluative	3
Dissertations/Theses -…	2
Books	1
Information Analyses	1
Speeches/Meeting Papers	1

Education Level

Higher Education	10
Postsecondary Education	10
Secondary Education	5
High Schools	3
Elementary Education	2
Early Childhood Education	1
Elementary Secondary Education	1
Grade 10	1
Grade 11	1
Grade 12	1
Grade 2	1
Grade 6	1
Grade 7	1
Grade 8	1
Grade 9	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Preschool Education	1
Primary Education	1

Location

Japan	3
Australia	2
China	2
Taiwan	2
Turkey	2
Austria	1
California	1
Canada	1
Colombia	1
Hong Kong	1
Iran	1
Mexico	1
United Kingdom	1

Assessments and Surveys

Test of English as a Foreign…	4
Graduate Management Admission…	1
Test of English for…	1

Showing 1 to 15 of 29 results

Utilizing Large Language Models for EFL Essay Grading: An Examination of Reliability and Validity in Rubric-Based Assessments

Peer reviewed

Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025

This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics

Comparative Judgement for Evaluating Young Learners' EFL Writing Performances: Reliability and Teacher Perceptions of Holistic and Dimension-Based Judgements

Peer reviewed

Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025

Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…

Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction

Revisiting the Effectiveness of a Performance Decision Tree-Style Rubric Compared to a Grid-Style Rubric

Peer reviewed

Yuichiro Yokouchi – Language Testing in Asia, 2025

The performance decision tree (PDT; Fulcher et al., 2011) is a rubric style that is applicable to performance assessment, with origins in Upshur and Turner's (1995) empirically derived binary-choice, boundary-definition (EBB) scale. It is easier for raters to assess performance by evaluating multiple binary-choice descriptors. Additionally,…

Descriptors: Scoring Rubrics, Second Language Learning, Second Language Instruction, Language Teachers

Rubric Rating with MFRM versus Randomly Distributed Comparative Judgment: A Comparison of Two Approaches to Second-Language Writing Assessment

Peer reviewed

Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020

The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…

Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)

Scoring Rubric Reliability and Internal Validity in Rater-Mediated EFL Writing Assessment: Insights from Many-Facet Rasch Measurement

Peer reviewed

Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022

Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…

Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods

Developing a Rating Scale for Integrated Assessment of Reading-into-Writing Skills

Peer reviewed

O'Grady, Stefan; Taskesen, Özgür – Language Learning in Higher Education, 2022

An important aspect of language assessment development is to create tasks that engage the competencies required in the target situation. For this reason, English-medium university entrance tests increasingly feature integrated reading-into-writing tasks as a way of enhancing target domain representation. Despite increased use of this task type,…

Descriptors: Writing Evaluation, Scoring Rubrics, Rating Scales, English (Second Language)

Measuring Bilingual Language Dominance: An Examination of the Reliability of the Bilingual Language Profile

Peer reviewed

Olson, Daniel J. – Language Testing, 2023

Measuring language dominance, broadly defined as the relative strength of each of a bilingual's two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous…

Descriptors: Bilingualism, Language Dominance, Native Language, Second Language Learning

The Internal Consistency and Accuracy of Automatically Scored Written Receptive Meaning-Recall Data: A Preliminary Study

Peer reviewed

Stuart McLean; Paul Raine; Geoffrey Pinchbeck; Laura Huston; Young Ae Kim; Suzuka Nishiyama; Shotaro Ueno – Vocabulary Learning and Instruction, 2021

Vocableveltest.org is a testing platform on which users can create online self-marking meaning-recall (reading or listening) and form-recall (typing) tests that address a number of limitations of the existing vocabulary level tests and vocabulary size tests. A major limitation of many existing vocabulary tests is the written receptive…

Descriptors: Accuracy, Automation, Scoring, Writing (Composition)

Towards Scalable Vocabulary Acquisition Assessment with BERT

Peer reviewed

Zhongdi Wu; Eric Larson; Makoto Sano; Doris Baker; Nathan Gage; Akihito Kamata – Grantee Submission, 2023

In this investigation we propose new machine learning methods for automated scoring models that predict the vocabulary acquisition in science and social studies of second grade English language learners, based upon free-form spoken responses. We evaluate performance on an existing dataset and use transfer learning from a large pre-trained language…

Descriptors: Prediction, Vocabulary Development, English (Second Language), Second Language Learning

Automated L2 Writing Performance Assessment: A Literature Review

Peer reviewed

Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021

Providing both effective feedback applications and reliable assessment practices are two central issues in ESL/EFL writing instruction contexts. Giving individual feedback is very difficult in crowded classes as it requires a great amount of time and effort for instructors. Moreover, instructors likely employ inconsistent assessment procedures,…

Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing

The Consistency of "TOEIC"® Speaking Scores across Ratings and Tasks. Research Report. ETS RR-17-46

Peer reviewed

Schmidgall, Jonathan E. – ETS Research Report Series, 2017

This report briefly reviews the design and scoring procedure for the "TOEIC"® Speaking test and summarizes existing evidence about the consistency of TOEIC Speaking test scores. It then describes several analyses conducted using generalizability theory to provide additional information about the consistency of scores across different…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Speech Tests

Translation Quality Assessment Rubric: A Rasch Model-Based Validation

Peer reviewed

Samir, Aynaz; Tabatabaee-Yazdi, Mona – International Journal of Language Testing, 2020

The present study aimed to examine and validate a rubric for translation quality assessment using Rasch analysis. To this end, the researchers interviewed 20 expert translation instructors to identify the factors they consider important for assessing the quality of students' translation. Based on the specific commonalities found throughout the…

Descriptors: Translation, Scoring Rubrics, Second Language Learning, Second Language Instruction

Bilingual Language Use Is Context Dependent: Using the Language and Social Background Questionnaire to Assess Language Experiences and Test-Retest Reliability

Peer reviewed

Mann, Aaron; de Bruin, Angela – International Journal of Bilingual Education and Bilingualism, 2022

Bilingualism is a multi-faceted experience and bilinguals differ in how they use their languages in daily life. Therefore, assessments of bilingualism that consider the role of (social) context are needed when describing bilinguals. In this study, we evaluated how (reliably) the Language and Social Background Questionnaire (LSBQ; Anderson et al.…

Descriptors: Bilingualism, Foreign Countries, Native Language, Second Language Learning

Rater Reliability and Score Discrepancy under Holistic and Analytic Scoring of Second Language Writing

Peer reviewed

Zhang, Bo; Xiao, Yunnan; Luo, Juan – Language Testing in Asia, 2015

Previous studies comparing holistic scoring to analytic scoring of second language writing have given mixed results. Some of them suffer from methodological drawbacks, such as limited writing sample size, limited number of raters, and lack of direct comparison of the two methods. Based on 300 writing samples graded by 14 raters, this research…

Descriptors: Evaluators, Reliability, Scores, Holistic Approach

Measuring the Impact of Rater Negotiation in Writing Performance Assessment

Peer reviewed

Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017

Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators

Source

ETS Research Report Series	3
Language Testing	3
Language Testing in Asia	2
ProQuest LLC	2
Applied Linguistics	1
Babel	1
British Journal of…	1
Brookes Publishing Company	1
CALICO Journal	1
College Entrance Examination…	1
Educational Measurement:…	1
Educational Psychology	1
English Language Teaching	1
Grantee Submission	1
International Journal of…	1
International Journal of…	1
Journal of Language and…	1
Language Assessment Quarterly	1
Language Learning in Higher…	1
National Center for Research…	1
Reading Matrix: An…	1
Reading and Writing: An…	1
Vocabulary Learning and…	1

Author

Kantor, Robert	2
Lee, Yong-Won	2
Akihito Kamata	1
Aliaga, Pablo	1
Attali, Yigal	1
Baram-Tsabari, Ayelet	1
Barrueco, Sandra	1
Boscardin, Christy Kim	1
Burstein, Jill	1
Chen, Jin	1
Chen, Yuan-shan	1
Cox, Troy L.	1
Davis, Lawrence Edward	1
Doris Baker	1
Eckstein, Grant T.	1
Eric Larson	1
Fatih Yavuz	1
Fercsey, Andrea	1
Gamze Yavas Çelik	1
Gentile, Claudia	1
Geoffrey Pinchbeck	1
Han, Turgay	1
Hart, Judson M.	1
Hartshorn, K. James	1
Janssen, Gerriet	1