Showing 1 to 15 of 107 results
Peer reviewed
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty frequently rely on human judgments or shallow metrics, often failing to accurately capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Peer reviewed
Andrés Christiansen; Rianne Janssen – Educational Assessment, Evaluation and Accountability, 2024
In international large-scale assessments, students may not be compelled to answer every test item: a student can decide to skip a seemingly difficult item or may drop out before the end of the test is reached. The way these missing responses are treated will affect the estimation of the item difficulty and student ability, and ultimately affect…
Descriptors: Test Items, Item Response Theory, Grade 4, International Assessment
Peer reviewed
Haladyna, Thomas M.; Rodriguez, Michael C. – Educational Assessment, 2021
Full-information item analysis provides item developers and reviewers comprehensive empirical evidence of item quality, including option response frequency, point-biserial index (PBI) for distractors, mean-scores of respondents selecting each option, and option trace lines. The multi-serial index (MSI) is introduced as a more informative…
Descriptors: Test Items, Item Analysis, Reading Tests, Mathematics Tests
Peer reviewed
Joshua B. Gilbert; Luke W. Miratrix; Mridul Joshi; Benjamin W. Domingue – Journal of Educational and Behavioral Statistics, 2025
Analyzing heterogeneous treatment effects (HTEs) plays a crucial role in understanding the impacts of educational interventions. A standard practice for HTE analysis is to examine interactions between treatment status and preintervention participant characteristics, such as pretest scores, to identify how different groups respond to treatment.…
Descriptors: Causal Models, Item Response Theory, Statistical Inference, Psychometrics
Peer reviewed
Dhyaaldian, Safa Mohammed Abdulridah; Kadhim, Qasim Khlaif; Mutlak, Dhameer A.; Neamah, Nour Raheem; Kareem, Zaidoon Hussein; Hamad, Doaa A.; Tuama, Jassim Hassan; Qasim, Mohammed Saad – International Journal of Language Testing, 2022
A C-Test is a gap-filling test for measuring language competence in the first and second language. C-Tests are usually analyzed with polytomous Rasch models by considering each passage as a super-item or testlet. This strategy helps overcome the local dependence inherent in C-Test gaps. However, there is little research on the best polytomous…
Descriptors: Item Response Theory, Cloze Procedure, Reading Tests, Language Tests
Peer reviewed
Ferrara, Steve; Steedle, Jeffrey T.; Frantz, Roger S. – Applied Measurement in Education, 2022
Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13…
Descriptors: Reading Comprehension, Reading Tests, Test Items, Item Response Theory
Joshua B. Gilbert; Luke W. Miratrix; Mridul Joshi; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2024
Analyzing heterogeneous treatment effects (HTE) plays a crucial role in understanding the impacts of educational interventions. A standard practice for HTE analysis is to examine interactions between treatment status and pre-intervention participant characteristics, such as pretest scores, to identify how different groups respond to treatment.…
Descriptors: Causal Models, Item Response Theory, Statistical Inference, Psychometrics
He, Wei – NWEA, 2021
New MAP® Growth™ assessments are being developed that administer items more closely matched to the grade level of the student. However, MAP Growth items are calibrated with samples that typically consist of students from a variety of grades, including the target grade to which an item is aligned. While this choice of calibration sample is…
Descriptors: Achievement Tests, Test Items, Instructional Program Divisions, Difficulty Level
Peer reviewed
Huu Thanh Minh Nguyen; Nguyen Van Anh Le – TESL-EJ, 2024
Comparing language tests and test preparation materials holds important implications for the latter's validity and reliability. However, not enough studies compare such materials across a wide range of indices. Therefore, this study investigated the text complexity of IELTS academic reading tests (IRT) and IELTS reading practice tests (IRPrT).…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Readability
Peer reviewed
Budi Waluyo; Ali Zahabi; Luksika Ruangsung – rEFLections, 2024
The increasing popularity of the Common European Framework of Reference (CEFR) in non-native English-speaking countries has generated a demand for concrete examples in the creation of CEFR-based tests that assess the four main English skills. In response, this research endeavors to provide insight into the development and validation of a…
Descriptors: Language Tests, Language Proficiency, Undergraduate Students, Language Skills
He, Wei – NWEA, 2022
To ensure that student academic growth in a subject area is accurately captured, it is imperative that the underlying scale remains stable over time. As item parameter stability constitutes one of the factors that affects scale stability, NWEA® periodically conducts studies to check for the stability of the item parameter estimates for MAP®…
Descriptors: Achievement Tests, Test Items, Test Reliability, Academic Achievement
Peer reviewed
Monika Grotek; Agnieszka Slezak-Swiat – Reading in a Foreign Language, 2024
The study investigates the effect of the perception of text and task difficulty on adults' performance in reading tests in L1 and L2. The relationship between the following variables is studied: (a) readers' perception of text and task difficulty in L1 and L2 measured in a self-reported post-task questionnaire, (b) the number of correct answers to…
Descriptors: Difficulty Level, Second Language Learning, Eye Movements, Task Analysis
Peer reviewed
Cifci, Musa; Kaplan, Kadir – Turkish Online Journal of Educational Technology - TOJET, 2020
An achievement test was prepared to determine students' caricature reading skills. In the first draft of the achievement test, 32 test items were prepared, each with four answer choices. An item analysis of the data obtained from the pre-application was conducted, and the internal consistency coefficient (KR-20) was calculated as 0.67 for the…
Descriptors: Reading Tests, Achievement Tests, Reading Skills, Literary Devices
Peer reviewed
Becker, Anthony; Nekrasova-Beker, Tatiana – Educational Assessment, 2018
While previous research has identified numerous factors that contribute to item difficulty, studies involving large-scale reading tests have provided mixed results. This study examined five selected-response item types used to measure reading comprehension in the Pearson Test of English Academic: a) multiple-choice (choose one answer), b)…
Descriptors: Reading Comprehension, Test Items, Reading Tests, Test Format
Peer reviewed
Geramipour, Masoud – Language Testing in Asia, 2021
Rasch testlet and bifactor models are two measurement models that could deal with local item dependency (LID) in assessing the dimensionality of reading comprehension testlets. This study aimed to apply the measurement models to real item response data of the Iranian EFL reading comprehension tests and compare the validity of the bifactor models…
Descriptors: Foreign Countries, Second Language Learning, English (Second Language), Reading Tests