ERIC - Search Results

Showing 1 to 15 of 115 results

A Comparison of Yen's Q3 Coefficient and Rasch Testlet Modeling for Identifying Local Item Dependence: Evidence from Two Vocabulary Matching Tests

Peer reviewed

Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025

This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…

Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis

Examining the Effect of Item Difficulty and Rater Leniency on Iranian Test Takers' Performance on WDCT and DSAT: A Comparative Study

Peer reviewed
PDF on ERIC

Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025

The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…

Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

Estimating the Impact of Local Item Dependency in a Test of Second Language Reading Comprehension

Peer reviewed
PDF on ERIC

Tim Stoeckel; Liang Ye Tan; Hung Tan Ha; Nam Thi Phuong Ho; Tomoko Ishii; Young Ae Kim; Chunmei Huang; Stuart McLean – Vocabulary Learning and Instruction, 2024

Local item dependency (LID) occurs when test-takers' responses to one test item are affected by their responses to another. It can be problematic if it causes inflated reliability estimates or distorted person and item measures. The cued-recall reading comprehension test in Hu and Nation's (2000) well-known and influential coverage--comprehension…

Descriptors: Reading Comprehension, English (Second Language), Second Language Instruction, Second Language Learning

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed
PDF on ERIC

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

Crowdsourced Adaptive Comparative Judgment: A Community-Based Solution for Proficiency Rating

Peer reviewed

Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022

The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…

Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

The Effects of Multimodal Teaching on English Vocabulary Knowledge of Thai Primary School Students

Peer reviewed
PDF on ERIC

Kasikarn Bansong; Somkiet Poopatwiboon; Apisak Sukying – Journal of Education and Learning, 2023

It is increasingly prevalent in digital learning and teaching strategies for discerning a global perspective on creating the student learning experience. Multimodality is an emergent phenomenon that may influence how digital learning is designed, especially during the COVID-19 pandemic in which immersive learning environments, such as a virtual…

Descriptors: Elementary School Students, English (Second Language), Second Language Learning, Second Language Instruction

Applying Generalizability Theory in Language Testing: Comparing Nested and Crossed Scoring Designs in the Assessment of Speaking Skills

Peer reviewed
PDF on ERIC

Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021

Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…

Descriptors: Language Tests, Scoring, Speech Communication, State Universities

Monitoring the Performance of Human and Automated Scores for Spoken Responses

Peer reviewed

Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018

As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…

Descriptors: Automation, Scoring, Speech Tests, Language Tests

Investigating the Impact of Rater Training on Rater Errors in the Process of Assessing Writing Skill

Peer reviewed
PDF on ERIC

Sata, Mehmet; Karakaya, Ismail – International Journal of Assessment Tools in Education, 2022

In the process of measuring and assessing high-level cognitive skills, interference of rater errors in measurements brings about a constant concern and low objectivity. The main purpose of this study was to investigate the impact of rater training on rater errors in the process of assessing individual performance. The study was conducted with a…

Descriptors: Evaluators, Training, Comparative Analysis, Academic Language

Assessment by Comparative Judgement: An Application to Secondary Statistics and English in New Zealand

Peer reviewed

Marshall, Neil; Shaw, Kirsten; Hunter, Jodie; Jones, Ian – New Zealand Journal of Educational Studies, 2020

There is growing interest in using comparative judgement to assess student work as an alternative to traditional marking. Comparative judgement requires no rubrics and is instead grounded in experts making pairwise judgements about the relative 'quality' of students' work according to a high level criterion. The resulting decision data are fitted…

Descriptors: Comparative Analysis, Decision Making, Student Evaluation, Evaluation Methods

Calibrated Parsing Items Evaluation: A Step towards Objectifying the Translation Assessment

Peer reviewed

Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019

The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…

Descriptors: Test Items, Translation, Computer Software, Evaluators

Is Putting SUGAR (Sampling Utterances of Grammatical Analysis Revised) into Language Sample Analysis a Good Thing? A Response to Pavelko and Owens (2017)

Peer reviewed

Guo, Ling-Yu; Eisenberg, Sarita; Bernstein Ratner, Nan; MacWhinney, Brian – Language, Speech, and Hearing Services in Schools, 2018

Purpose: In this letter, the authors respond to Pavelko and Owens' (2017) newly advanced set of procedures for language sample analysis: Sampling Utterances and Grammatical Analysis Revised (SUGAR). Method: The authors contrast some of the new guidelines for transcription, morpheme segmentation, and language sample elicitation in SUGAR with…

Descriptors: Sampling, Grammar, Transcripts (Written Records), Morphemes

Measuring the Development of General Language Skills in English as a Foreign Language--Longitudinal Invariance of the C-Test

Peer reviewed

Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023

Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies
