Publication Date
In 2025: 2
Since 2024: 2
Since 2021 (last 5 years): 7
Since 2016 (last 10 years): 14
Since 2006 (last 20 years): 20
Source
Language Testing: 22
Author
Schmitt, Norbert: 2
Zechner, Klaus: 2
August, Diane: 1
Brown, Anne: 1
Carlo, Maria: 1
Eckes, Thomas: 1
Babaii, Esmat: 1
Effatpanah, Farshad: 1
Garras, John: 1
Gierl, Mark J.: 1
Ginther, April: 1
Publication Type
Journal Articles: 22
Reports - Research: 16
Reports - Evaluative: 6
Information Analyses: 1
Education Level
Higher Education: 10
Postsecondary Education: 6
Secondary Education: 3
Elementary Education: 1
Assessments and Surveys
Test of English as a Foreign…: 2
Rebecca Sickinger; Tineke Brunfaut; John Pill – Language Testing, 2025
Comparative Judgement (CJ) is an evaluation method, typically conducted online, whereby a rank order is constructed, and scores calculated, from judges' pairwise comparisons of performances. CJ has been researched in various educational contexts, though only rarely in English as a Foreign Language (EFL) writing settings, and is generally agreed to…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
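For readers unfamiliar with how CJ turns pairwise judgements into a rank order and scores, the sketch below fits a simple Bradley-Terry model, the model commonly used by CJ tools, to illustrative comparison data. The judgement data, function name, and iteration count are assumptions made for illustration, not details taken from this study.

```python
# Minimal sketch: estimating CJ scores from pairwise judgements with a
# Bradley-Terry model (standard MM updates). All data here are hypothetical.
from collections import defaultdict
import math

def bradley_terry(comparisons, n_iter=200):
    """Estimate a strength score per essay from (winner, loser) judgements."""
    essays = {e for pair in comparisons for e in pair}
    wins = defaultdict(int)    # number of comparisons each essay won
    games = defaultdict(int)   # number of comparisons between each pair
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    strength = {e: 1.0 for e in essays}
    for _ in range(n_iter):   # iterative MM updates for the Bradley-Terry model
        new = {}
        for i in essays:
            denom = sum(games[frozenset((i, j))] / (strength[i] + strength[j])
                        for j in essays if j != i)
            new[i] = wins[i] / denom
        total = sum(new.values())
        strength = {e: v * len(essays) / total for e, v in new.items()}
    # Report on a log scale (as CJ tools usually do), best essay first.
    return sorted(((math.log(v), e) for e, v in strength.items()), reverse=True)

# Hypothetical judgements on five EFL essays; the first essay in each pair won.
judgements = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "E"),
              ("C", "D"), ("C", "E"), ("D", "E"), ("D", "A"), ("E", "D")]
for score, essay in bradley_terry(judgements):
    print(f"Essay {essay}: {score:+.2f}")
```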
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as a sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
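The scoring idea described in this abstract, collapsing binary gap results into one polytomous score per sentence, can be illustrated with a short sketch. The response data and names below are hypothetical; only the aggregation step reflects the approach described above.

```python
# Minimal sketch: sentence-level polytomous scoring of a C-Test.
# Each test taker's responses: per sentence, a list of 0/1 gap results.
responses = {
    "taker_01": {"s1": [1, 1, 0, 1], "s2": [1, 0, 0], "s3": [1, 1, 1, 1, 0]},
    "taker_02": {"s1": [0, 1, 0, 0], "s2": [1, 1, 1], "s3": [0, 1, 0, 1, 0]},
}

def sentence_scores(gap_results):
    """Collapse binary gap results into one polytomous score per sentence.

    A sentence with k gaps yields an item with score categories 0..k, which can
    then be analysed with a polytomous IRT model (e.g., a partial credit model).
    """
    return {sentence: sum(gaps) for sentence, gaps in gap_results.items()}

for taker, gaps in responses.items():
    print(taker, sentence_scores(gaps))
# taker_01 {'s1': 3, 's2': 1, 's3': 4}
# taker_02 {'s1': 1, 's2': 3, 's3': 2}
```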
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Assessing the Speaking Proficiency of L2 Chinese Learners: Review of the Hanyu Shuiping Kouyu Kaoshi
Li, Albert W. – Language Testing, 2023
The Hanyu Shuiping Kaoshi (HSK) is a multi-level, multi-purpose Chinese proficiency test developed by the Center for Language Education and Cooperation (previously the Office of Chinese Language Council International and, henceforth, referred to by its colloquial name "Hanban"). It assesses reading, writing, and listening skills of…
Descriptors: Language Tests, Language Proficiency, Chinese, Second Language Learning
Warnby, Marcus; Malmström, Hans; Hansen, Kajsa Yang – Language Testing, 2023
The academic section of the Vocabulary Levels Test (VLT-Ac) and the Academic Vocabulary Test (AVT) both assess meaning-recognition knowledge of written receptive academic vocabulary, deemed central for engagement in academic activities. Depending on the purpose and context of the testing, either of the tests can be appropriate, but for research…
Descriptors: Foreign Countries, Scores, Written Language, Receptive Language
Sahan, Özgür; Razi, Salim – Language Testing, 2020
This study examines the decision-making behaviors of raters with varying levels of experience while assessing EFL essays of distinct qualities. The data were collected from 28 raters with varying levels of rating experience and working at the English language departments of different universities in Turkey. Using a 10-point analytic rubric, each…
Descriptors: Decision Making, Essays, Writing Evaluation, Evaluators
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests
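The kind of subgroup analysis this abstract mentions, comparing automated and human scores within groups of test takers, might look like the sketch below. The data frame, column names, and agreement statistics are assumptions for illustration, not the measures used in the study.

```python
# Minimal sketch: human-machine score agreement overall and by subgroup.
import pandas as pd

df = pd.DataFrame({
    "native_language": ["ko", "ko", "es", "es", "zh", "zh", "zh", "es"],
    "human_score":     [3,    4,    2,    3,    4,    3,    2,    4],
    "machine_score":   [3.2,  3.8,  2.4,  2.9,  3.7,  3.1,  2.2,  3.6],
})

def agreement(group):
    """Simple agreement summary between human and automated scores."""
    return pd.Series({
        "n": len(group),
        "pearson_r": group["human_score"].corr(group["machine_score"]),
        "mean_diff": (group["machine_score"] - group["human_score"]).mean(),
    })

print(agreement(df))                                  # overall agreement
print(df.groupby("native_language")[["human_score", "machine_score"]]
        .apply(agreement))                            # agreement per subgroup
```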
Vandeweerd, Nathan; Housen, Alex; Paquot, Magali – Language Testing, 2023
This study investigates whether re-thinking the separation of lexis and grammar in language testing could lead to more valid inferences about proficiency across modes. As argued by Römer, typical scoring rubrics ignore important information about proficiency encoded at the lexis-grammar interface, in particular how the co-selection of lexical and…
Descriptors: French, Language Tests, Grammar, Second Language Learning
Han, Chao; Xiao, Xiaoyan – Language Testing, 2022
The quality of sign language interpreting (SLI) is a gripping construct among practitioners, educators and researchers, calling for reliable and valid assessment. There has been a diverse array of methods in the extant literature to measure SLI quality, ranging from traditional error analysis to recent rubric scoring. In this study, we want to…
Descriptors: Comparative Analysis, Sign Language, Deaf Interpreting, Evaluators
Eckes, Thomas – Language Testing, 2017
This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…
Descriptors: English (Second Language), Language Tests, Cutting Scores, Standard Setting (Scoring)
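To make the ROC side of the PGM-ROC approach concrete: expert judgements supply a binary "at or above the target level" label per learner, and a ROC analysis locates the test score that best separates the two groups. The sketch below uses Youden's J as the selection criterion and invented data; both are illustrative assumptions, not details taken from Eckes' study.

```python
# Minimal sketch: choosing a cut score from a ROC analysis (Youden's J).
def roc_cut_score(scores, at_level):
    """Return the candidate cut score maximising sensitivity + specificity - 1."""
    positives = sum(at_level)
    negatives = len(at_level) - positives
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, at_level) if s >= cut and y)
        fp = sum(1 for s, y in zip(scores, at_level) if s >= cut and not y)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical placement test scores and expert classifications (1 = at target level).
scores   = [32, 35, 38, 41, 44, 47, 50, 53, 56, 59, 62, 65]
at_level = [0,  0,  0,  0,  1,  0,  1,  1,  1,  1,  1,  1]
cut, j = roc_cut_score(scores, at_level)
print(f"Suggested cut score: {cut} (Youden's J = {j:.2f})")
```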
Llosa, Lorena; Malone, Margaret E. – Language Testing, 2019
Investigating the comparability of students' performance on TOEFL writing tasks and actual academic writing tasks is essential to provide backing for the extrapolation inference in the TOEFL validity argument (Chapelle, Enright, & Jamieson, 2008). This study compared 103 international non-native-English-speaking undergraduate students'…
Descriptors: Computer Assisted Testing, Language Tests, English (Second Language), Second Language Learning
Elicited Imitation as a Measure of Second Language Proficiency: A Narrative Review and Meta-Analysis
Yan, Xun; Maeda, Yukiko; Lv, Jing; Ginther, April – Language Testing, 2016
Elicited imitation (EI) has been widely used to examine second language (L2) proficiency and development and was an especially popular method in the 1970s and early 1980s. However, as the field embraced more communicative approaches to both instruction and assessment, the use of EI diminished, and the construct-related validity of EI scores as a…
Descriptors: Second Language Learning, Language Proficiency, Meta Analysis, Effect Size
Shin, Sun-Young; Lidster, Ryan – Language Testing, 2017
In language programs, it is crucial to place incoming students into appropriate levels to ensure that course curriculum and materials are well targeted to their learning needs. Deciding how and where to set cutscores on placement tests is thus of central importance to programs, but previous studies in educational measurement disagree as to which…
Descriptors: Language Tests, English (Second Language), Standard Setting (Scoring), Student Placement
Harsch, Claudia; Hartig, Johannes – Language Testing, 2016
Placement and screening tests serve important functions, not only with regard to placing learners at appropriate levels of language courses but also with a view to maximizing the effectiveness of administering test batteries. We examined two widely reported formats suitable for these purposes, the discrete decontextualized Yes/No vocabulary test…
Descriptors: Comparative Analysis, Secondary School Students, Vocabulary Development, Student Placement
Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David – Language Testing, 2012
This paper compares two alternative scoring methods--multiple regression and classification trees--for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models…
Descriptors: Scoring, Classification, Weighted Scores, Comparative Analysis
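The comparison this abstract describes, multiple regression versus tree-based scoring models predicting human scores from automatically extracted speech features, can be sketched as follows. The synthetic features, the use of a regression tree (the study used classification trees on discrete score levels), and the evaluation statistics are assumptions for illustration, not the features or criteria of the actual system.

```python
# Minimal sketch: regression vs. tree model for predicting human speaking scores.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical delivery/fluency features: speaking rate, pause ratio, pronunciation.
X = np.column_stack([
    rng.normal(3.5, 0.6, n),    # words per second
    rng.uniform(0.05, 0.4, n),  # proportion of pause time
    rng.normal(0.7, 0.1, n),    # acoustic-model pronunciation score
])
# Simulated human scores on a 1-4 scale, driven by the features plus noise.
y = np.clip(1 + 0.5 * X[:, 0] - 2.0 * X[:, 1] + 2.0 * X[:, 2]
            + rng.normal(0, 0.3, n), 1, 4)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in [("multiple regression", LinearRegression()),
                    ("decision tree", DecisionTreeRegressor(max_depth=4))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    r = np.corrcoef(pred, y_te)[0, 1]              # agreement with human scores
    rmse = np.sqrt(np.mean((pred - y_te) ** 2))    # prediction error
    print(f"{name}: r = {r:.2f}, RMSE = {rmse:.2f}")
```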