Publication Date
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 8 |
| Since 2017 (last 10 years) | 16 |
| Since 2007 (last 20 years) | 26 |
Descriptor
| Reliability | 29 |
| Second Language Learning | 29 |
| English (Second Language) | 24 |
| Scoring | 20 |
| Foreign Countries | 14 |
| Validity | 13 |
| Second Language Instruction | 11 |
| Evaluators | 10 |
| Scoring Rubrics | 10 |
| Language Tests | 9 |
| Scores | 9 |
Author
| Kantor, Robert | 2 |
| Lee, Yong-Won | 2 |
| Akihito Kamata | 1 |
| Aliaga, Pablo | 1 |
| Attali, Yigal | 1 |
| Baram-Tsabari, Ayelet | 1 |
| Barrueco, Sandra | 1 |
| Boscardin, Christy Kim | 1 |
| Burstein, Jill | 1 |
| Chen, Jin | 1 |
| Chen, Yuan-shan | 1 |
Publication Type
| Journal Articles | 23 |
| Reports - Research | 23 |
| Tests/Questionnaires | 6 |
| Reports - Evaluative | 3 |
| Dissertations/Theses -… | 2 |
| Books | 1 |
| Information Analyses | 1 |
| Speeches/Meeting Papers | 1 |
Education Level
| Higher Education | 10 |
| Postsecondary Education | 10 |
| Secondary Education | 5 |
| High Schools | 3 |
| Elementary Education | 2 |
| Early Childhood Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 10 | 1 |
| Grade 11 | 1 |
| Grade 12 | 1 |
| Grade 2 | 1 |
Location
| Japan | 3 |
| Australia | 2 |
| China | 2 |
| Taiwan | 2 |
| Turkey | 2 |
| Austria | 1 |
| California | 1 |
| Canada | 1 |
| Colombia | 1 |
| Hong Kong | 1 |
| Iran | 1 |
Assessments and Surveys
| Test of English as a Foreign… | 4 |
| Graduate Management Admission… | 1 |
| Test of English for… | 1 |
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Yuichiro Yokouchi – Language Testing in Asia, 2025
The performance decision tree (PDT; Fulcher et al., 2011) is a rubric style that is applicable to performance assessment, with origins in Upshur and Turner's (1995) empirically derived binary-choice, boundary-definition (EBB) scale. It is easier for raters to assess performance by evaluating multiple binary-choice descriptors. Additionally,…
Descriptors: Scoring Rubrics, Second Language Learning, Second Language Instruction, Language Teachers
Sims, Maureen E.; Cox, Troy L.; Eckstein, Grant T.; Hartshorn, K. James; Wilcox, Matthew P.; Hart, Judson M. – Educational Measurement: Issues and Practice, 2020
The purpose of this study is to explore the reliability of a potentially more practical approach to direct writing assessment in the context of ESL writing. Traditional rubric rating (RR) is a common yet resource-intensive evaluation practice when performed reliably. This study compared the traditional rubric model of ESL writing assessment and…
Descriptors: Scoring Rubrics, Item Response Theory, Second Language Learning, English (Second Language)
Li, Wentao – Reading and Writing: An Interdisciplinary Journal, 2022
Scoring rubrics are known to be effective for assessing writing for both testing and classroom teaching purposes. How raters interpret the descriptors in a rubric can significantly impact the subsequent final score, and further, the descriptors may also color a rater's judgment of a student's writing quality. Little is known, however, about how…
Descriptors: Scoring Rubrics, Interrater Reliability, Writing Evaluation, Teaching Methods
O'Grady, Stefan; Taskesen, Özgür – Language Learning in Higher Education, 2022
An important aspect of language assessment development is to create tasks that engage the competencies required in the target situation. For this reason, English-medium university entrance tests increasingly feature integrated reading-into-writing tasks as a way of enhancing target domain representation. Despite increased use of this task type,…
Descriptors: Writing Evaluation, Scoring Rubrics, Rating Scales, English (Second Language)
Olson, Daniel J. – Language Testing, 2023
Measuring language dominance, broadly defined as the relative strength of each of a bilingual's two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous…
Descriptors: Bilingualism, Language Dominance, Native Language, Second Language Learning
Stuart McLean; Paul Raine; Geoffrey Pinchbeck; Laura Huston; Young Ae Kim; Suzuka Nishiyama; Shotaro Ueno – Vocabulary Learning and Instruction, 2021
Vocableveltest.org is a testing platform on which users can create online self-marking meaning-recall (reading or listening) and form-recall (typing) tests that address a number of limitations of the existing vocabulary level tests and vocabulary size tests. A major limitation of many existing vocabulary tests is the written receptive…
Descriptors: Accuracy, Automation, Scoring, Writing (Composition)
Zhongdi Wu; Eric Larson; Makoto Sano; Doris Baker; Nathan Gage; Akihito Kamata – Grantee Submission, 2023
In this investigation we propose new machine learning methods for automated scoring models that predict the vocabulary acquisition in science and social studies of second grade English language learners, based upon free-form spoken responses. We evaluate performance on an existing dataset and use transfer learning from a large pre-trained language…
Descriptors: Prediction, Vocabulary Development, English (Second Language), Second Language Learning
Sari, Elif; Han, Turgay – Reading Matrix: An International Online Journal, 2021
Providing both effective feedback applications and reliable assessment practices are two central issues in ESL/EFL writing instruction contexts. Giving individual feedback is very difficult in crowded classes as it requires a great amount of time and effort for instructors. Moreover, instructors likely employ inconsistent assessment procedures,…
Descriptors: Automation, Writing Evaluation, Artificial Intelligence, Natural Language Processing
Schmidgall, Jonathan E. – ETS Research Report Series, 2017
This report briefly reviews the design and scoring procedure for the "TOEIC"® Speaking test and summarizes existing evidence about the consistency of TOEIC Speaking test scores. It then describes several analyses conducted using generalizability theory to provide additional information about the consistency of scores across different…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Speech Tests
Samir, Aynaz; Tabatabaee-Yazdi, Mona – International Journal of Language Testing, 2020
The present study aimed to examine and validate a rubric for translation quality assessment using Rasch analysis. To this end, the researchers interviewed 20 expert translation instructors to identify the factors they consider important for assessing the quality of students' translation. Based on the specific commonalities found throughout the…
Descriptors: Translation, Scoring Rubrics, Second Language Learning, Second Language Instruction
Mann, Aaron; de Bruin, Angela – International Journal of Bilingual Education and Bilingualism, 2022
Bilingualism is a multi-faceted experience and bilinguals differ in how they use their languages in daily life. Therefore, assessments of bilingualism that consider the role of (social) context are needed when describing bilinguals. In this study, we evaluated how (reliably) the Language and Social Background Questionnaire (LSBQ; Anderson et al.…
Descriptors: Bilingualism, Foreign Countries, Native Language, Second Language Learning
Zhang, Bo; Xiao, Yunnan; Luo, Juan – Language Testing in Asia, 2015
Previous studies comparing holistic scoring to analytic scoring of second language writing have given mixed results. Some of them suffer from methodological drawbacks, such as limited writing sample size, limited number of raters, and lack of direct comparison of the two methods. Based on 300 writing samples graded by 14 raters, this research…
Descriptors: Evaluators, Reliability, Scores, Holistic Approach
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators