Publication Date
In 2025 | 14 |
Since 2024 | 44 |
Descriptor
Source
Author
Tim Stoeckel | 3 |
Emma Marsden | 2 |
Hung Tan Ha | 2 |
James S. Kim | 2 |
Joshua B. Gilbert | 2 |
Luke W. Miratrix | 2 |
Tomoko Ishii | 2 |
Afsar Rouhi | 1 |
Ali Zahabi | 1 |
Allan S. Cohen | 1 |
Amber Dudley | 1 |
More ▼ |
Publication Type
Journal Articles | 42 |
Reports - Research | 40 |
Tests/Questionnaires | 5 |
Information Analyses | 3 |
Dissertations/Theses -… | 1 |
Reports - Evaluative | 1 |
Education Level
Higher Education | 26 |
Postsecondary Education | 26 |
Elementary Education | 5 |
Secondary Education | 4 |
Early Childhood Education | 2 |
Grade 1 | 2 |
Grade 2 | 2 |
Grade 3 | 2 |
High Schools | 2 |
Primary Education | 2 |
Adult Education | 1 |
More ▼ |
Audience
Location
Iran | 6 |
China | 4 |
Japan | 3 |
United Kingdom | 3 |
Iran (Tehran) | 2 |
Spain | 2 |
Thailand | 2 |
Vietnam | 2 |
Europe | 1 |
Japan (Tokyo) | 1 |
New Zealand | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
International English… | 5 |
Pearson Test of English… | 1 |
Test of English for… | 1 |
What Works Clearinghouse Rating
Jiawei Xiong; George Engelhard; Allan S. Cohen – Measurement: Interdisciplinary Research and Perspectives, 2025
It is common to find mixed-format data results from the use of both multiple-choice (MC) and constructed-response (CR) questions on assessments. Dealing with these mixed response types involves understanding what the assessment is measuring, and the use of suitable measurement models to estimate latent abilities. Past research in educational…
Descriptors: Responses, Test Items, Test Format, Grade 8
Kaja Haugen; Cecilie Hamnes Carlsen; Christine Möller-Omrani – Language Awareness, 2025
This article presents the process of constructing and validating a test of metalinguistic awareness (MLA) for young school children (age 8-10). The test was developed between 2021 and 2023 as part of the MetaLearn research project, financed by The Research Council of Norway. The research team defines MLA as using metalinguistic knowledge at a…
Descriptors: Language Tests, Test Construction, Elementary School Students, Metalinguistics
Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
Hung Tan Ha; Duyen Thi Bich Nguyen; Tim Stoeckel – Language Assessment Quarterly, 2025
This article compares two methods for detecting local item dependence (LID): residual correlation examination and Rasch testlet modeling (RTM), in a commonly used 3:6 matching format and an extended matching test (EMT) format. The two formats are hypothesized to facilitate different levels of item dependency due to differences in the number of…
Descriptors: Comparative Analysis, Language Tests, Test Items, Item Analysis
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
Paula Elosua – Language Assessment Quarterly, 2024
In sociolinguistic contexts where standardized languages coexist with regional dialects, the study of differential item functioning is a valuable tool for examining certain linguistic uses or varieties as threats to score validity. From an ecological perspective, this paper describes three stages in the study of differential item functioning…
Descriptors: Reading Tests, Reading Comprehension, Scores, Test Validity
Dongkwang Shin; Jang Ho Lee – ELT Journal, 2024
Although automated item generation has gained a considerable amount of attention in a variety of fields, it is still a relatively new technology in ELT contexts. Therefore, the present article aims to provide an accessible introduction to this powerful resource for language teachers based on a review of the available research. Particularly, it…
Descriptors: Language Tests, Artificial Intelligence, Test Items, Automation
Apichat Khamboonruang – Language Testing in Asia, 2025
Chulalongkorn University Language Institute (CULI) test was developed as a local standardised test of English for professional and international communication. To ensure that the CULI test fulfils its intended purposes, this study employed Kane's argument-based validation and Rasch measurement approaches to construct the validity argument for the…
Descriptors: Universities, Second Language Learning, Second Language Instruction, Language Tests
Ingela Holmström; Krister Schönström; Magnus Ryttervik – Language Assessment Quarterly, 2024
There is a lack of tests available for assessing sign language proficiency among L2 learners. We have therefore developed a sign repetition test, SignRepL2, with a specific focus on the phonological features of signs. This paper describes the two phases of developing this test. In the first phase, content was developed in the form of 50 items with…
Descriptors: Sign Language, Novices, Task Analysis, Second Language Learning
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Barno S. Abdullaeva; Diyorjon Abdullaev; Nurislom I. Khursanov; Khurshida B. Kadirova; Laylo Djuraeva – International Journal of Language Testing, 2024
Cloze tests are commonly used in language testing as a quick measure of overall language ability or reading comprehension. A problem for the analysis of cloze tests with item response theory models is that cloze test items are locally dependent. This leads to the violation of the conditional or local independence assumption of IRT models. In this…
Descriptors: Cloze Procedure, Language Tests, Test Items, Correlation
Junlan Pan; Emma Marsden – Language Testing, 2024
"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…
Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction
Shadi Noroozi; Hossein Karami – Language Testing in Asia, 2024
Recently, psychometricians and researchers have voiced their concern over the exploration of language test items in light of Messick's validation framework. Validity has been central to test development and use; however, it has not received due attention in language tests having grave consequences for test takers. The present study sought to…
Descriptors: Foreign Countries, Doctoral Students, Graduate Students, Language Proficiency
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction