Publication Date
In 2025 | 8 |
Since 2024 | 31 |
Since 2021 (last 5 years) | 121 |
Since 2016 (last 10 years) | 227 |
Since 2006 (last 20 years) | 324 |
Descriptor
Difficulty Level | 404 |
Language Tests | 404 |
Second Language Learning | 249 |
English (Second Language) | 242 |
Foreign Countries | 233 |
Second Language Instruction | 150 |
Test Items | 133 |
Language Proficiency | 124 |
Comparative Analysis | 81 |
Scores | 78 |
Reading Comprehension | 67 |
Author
Al-Jarf, Reima | 6 |
Perkins, Kyle | 5 |
Papageorgiou, Spiros | 4 |
Cox, Troy L. | 3 |
Huntley, Renee M. | 3 |
Kitao, Kenji | 3 |
Kitao, S. Kathleen | 3 |
Alderson, J. Charles | 2 |
Baghaei, Purya | 2 |
Bagheri, Mohammad Sadegh | 2 |
Brunfaut, Tineke | 2 |
Audience
Practitioners | 15 |
Teachers | 12 |
Researchers | 3 |
Students | 1 |
Location
Iran | 30 |
China | 20 |
Japan | 20 |
Germany | 11 |
Taiwan | 11 |
Thailand | 10 |
United Kingdom | 10 |
Saudi Arabia | 9 |
South Korea | 9 |
Turkey | 9 |
Europe | 8 |
Tia M. Fechter; Heeyeon Yoon – Language Testing, 2024
This study evaluated the efficacy of two proposed methods in an operational standard-setting study conducted for a high-stakes language proficiency test of the U.S. government. The goal was to seek low-cost modifications to the existing Yes/No Angoff method to increase the validity and reliability of the recommended cut scores using a convergent…
Descriptors: Standard Setting, Language Proficiency, Language Tests, Evaluation Methods
Neda Kianinezhad; Mohsen Kianinezhad – Language Education & Assessment, 2025
This study presents a comparative analysis of classical reliability measures, including Cronbach's alpha, test-retest, and parallel forms reliability, alongside modern psychometric methods such as the Rasch model and Mokken scaling, to evaluate the reliability of C-tests in language proficiency assessment. Utilizing data from 150 participants…
Descriptors: Psychometrics, Test Reliability, Language Proficiency, Language Tests
Anja Riemenschneider; Zarah Weiss; Pauline Schröter; Detmar Meurers – TESOL Quarterly: A Journal for Teachers of English to Speakers of Other Languages and of Standard English as a Second Dialect, 2024
The linguistic characteristics of text productions depend on various factors, including individual language proficiency as well as the tasks used to elicit the production. To date, little attention has been paid to whether some writing tasks are more suitable than others to represent and differentiate students' proficiency levels. This issue is…
Descriptors: English (Second Language), Writing (Composition), Difficulty Level, Language Proficiency
Apichat Khamboonruang – Language Testing in Asia, 2025
The Chulalongkorn University Language Institute (CULI) test was developed as a local standardised test of English for professional and international communication. To ensure that the CULI test fulfils its intended purposes, this study employed Kane's argument-based validation and Rasch measurement approaches to construct the validity argument for the…
Descriptors: Universities, Second Language Learning, Second Language Instruction, Language Tests
Ludewig, Ulrich; Schwerter, Jakob; McElvany, Nele – Journal of Psychoeducational Assessment, 2023
A better understanding of how distractor features influence the plausibility of distractors is essential for an efficient multiple-choice (MC) item construction in educational assessment. The plausibility of distractors has a major influence on the psychometric characteristics of MC items. Our analysis utilizes the nominal categories model to…
Descriptors: Vocabulary, Language Tests, German, Grade 4
Yu, Qiaona – Applied Linguistics, 2021
Language complexity reveals the ability to use a wide and varied range of sophisticated structures and vocabulary. Although different languages compose complexity differently, complexity measures such as the T-unit have typically been based on clause subordination, which may underrepresent complexity and threaten the validity of studies. This…
Descriptors: Chinese, Difficulty Level, Syntax, Language Proficiency
Thirakunkovit, Suthathip; Rhee, Seongha – THAITESOL Journal, 2021
This study explores the extent to which the difficulty levels of grammar items in an English test can be predicted by the complexity of grammatical structures. The researchers carried out two sets of analyses. In the first analysis, the item facility and item discrimination indices of 175 multiple-choice items were examined. In the second…
Descriptors: Grammar, Test Items, Difficulty Level, English (Second Language)
De Cat, Cécile; Melia, Tara – Journal of Child Language, 2022
The Sentence Structure sub-test (SST) of the Clinical Evaluation of Language Fundamentals (CELF) aims to "measure the acquisition of grammatical (structural) rules at the sentence level". Although originally designed for clinical practice with monolingual children, components of the CELF, such as the SST, are often used to inform…
Descriptors: Sentence Structure, Language Tests, Reading Comprehension, Cognitive Processes
Alan Shaw – PASAA: Journal of Language Teaching and Learning in Thailand, 2023
Although the TOEFL iBT Listening test is sometimes used for other purposes, it was designed primarily for use as a college entrance examination. Item difficulty in TOEFL iBT Listening tests is the product of interactions between two sets of complex relationships: 1) relationships among numerous item characteristics themselves, and 2) relationships…
Descriptors: English (Second Language), Second Language Instruction, Listening Skills, Language Tests
Yoshiki Fujiwara; Hiroyuki Shimada – Language Acquisition: A Journal of Developmental Linguistics, 2024
The goal of this paper is to tease apart two approaches to the source of children's consistent scope assignment in negative sentences containing logical connectives: the Semantic Subset Principle and the Semantic Subset Maxim. Previous developmental work has observed that four- to six-year-old children across languages have difficulty with…
Descriptors: Semantics, Language Acquisition, Form Classes (Languages), Morphemes
Noboru Sakai – Journal of Educators Online, 2025
This study aims to investigate ChatGPT's ability to comprehend input from nonnative speakers, specifically those learning English as a second language, with Japanese speakers serving as the model population. The experiment examines how ChatGPT evaluates the difficulty levels of the Test of English for International Communication (TOEIC), which is…
Descriptors: Foreign Countries, Artificial Intelligence, Native Speakers, English (Second Language)
Kuo-Zheng Feng – Language Testing in Asia, 2024
This study addressed a gap in existing research on Multiple-Choice (MC) cloze tests by focusing on the learners' perspective, specifically examining the difficulties faced by vocational high school students (VHSs). A nationwide sample of 293 VHSs participated, providing both quantitative and qualitative data through a self-developed questionnaire.…
Descriptors: Language Tests, Multiple Choice Tests, Cloze Procedure, Student Attitudes
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Jin, Kuan-Yu; Eckes, Thomas – Measurement: Interdisciplinary Research and Perspectives, 2022
Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale's middle categories. In the present paper, we adopted Jin and Wang's (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters…
Descriptors: Performance Based Assessment, Evaluators, Scoring, Sample Size
Ali Akbar Boori; Mohammad Ghazanfari; Behzad Ghonsooly; Purya Baghaei – International Journal of Language Testing, 2024
The purpose of this study was to compare the functioning of five restrictive CDMs, including DINA, DINO, A-CDM, LLM, and RRUM, against the G-DINA model to identify the best-fitting CDM which can better explain the interaction underlying the attributes of the reading comprehension section of an Iranian high-stakes language proficiency test. To this…
Descriptors: Foreign Countries, Doctoral Students, Reading Comprehension, Language Tests