Publication Date
In 2025 | 3 |
Since 2024 | 6 |
Since 2021 (last 5 years) | 23 |
Since 2016 (last 10 years) | 45 |
Since 2006 (last 20 years) | 64 |
Descriptor
Accuracy | 65 |
Difficulty Level | 65 |
Test Items | 65 |
Item Response Theory | 28 |
Item Analysis | 15 |
Comparative Analysis | 14 |
Correlation | 13 |
Computation | 12 |
Foreign Countries | 12 |
Multiple Choice Tests | 11 |
Language Tests | 10 |
Author
Benton, Tom | 2 |
Bulut, Okan | 2 |
He, Wei | 2 |
Jiao, Hong | 2 |
Nelson, Gena | 2 |
Schatschneider, Christopher | 2 |
Steedle, Jeffrey T. | 2 |
Wang, Shudong | 2 |
Wood, Carla | 2 |
A. Alexander Beaujean | 1 |
Adam E. Green | 1 |
Publication Type
Reports - Research | 56 |
Journal Articles | 52 |
Dissertations/Theses -… | 7 |
Information Analyses | 2 |
Reports - Evaluative | 2 |
Speeches/Meeting Papers | 2 |
Tests/Questionnaires | 2 |
Numerical/Quantitative Data | 1 |
Location
Florida | 2 |
United Kingdom | 2 |
Australia | 1 |
Chile | 1 |
Indonesia | 1 |
Iran | 1 |
Japan (Tokyo) | 1 |
Kansas | 1 |
Malaysia | 1 |
New York | 1 |
Nigeria | 1 |
Rodgers, Emily; D'Agostino, Jerome V.; Berenbon, Rebecca; Johnson, Tracy; Winkler, Christa – Journal of Early Childhood Literacy, 2023
Running Records are thought to be an excellent formative assessment tool because they generate results that educators can use to make their teaching more responsive. Despite the technical nature of scoring Running Records and the kinds of important decisions that are attached to their analysis, few studies have investigated assessor accuracy. We…
Descriptors: Formative Evaluation, Scoring, Accuracy, Difficulty Level
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to reveal the accuracy of estimation of multiple-choice test items parameters following the models of the item-response theory in measurement. Materials/methods: The researchers depended on the measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
Apichat Khamboonruang – Language Testing in Asia, 2025
Chulalongkorn University Language Institute (CULI) test was developed as a local standardised test of English for professional and international communication. To ensure that the CULI test fulfils its intended purposes, this study employed Kane's argument-based validation and Rasch measurement approaches to construct the validity argument for the…
Descriptors: Universities, Second Language Learning, Second Language Instruction, Language Tests
Dimitrov, Dimiter M.; Atanasov, Dimitar V. – Measurement: Interdisciplinary Research and Perspectives, 2021
This study offers an approach to test equating under the latent D-scoring method (DSM-L) using the nonequivalent groups with anchor tests (NEAT) design. The accuracy of the test equating was examined via a simulation study under a 3 × 3 design by two conditions: group ability at three levels and test difficulty at three levels. The results for…
Descriptors: Equated Scores, Scoring, Test Items, Accuracy
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study utilizes an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Battelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
Reza Shahi; Hamdollah Ravand; Golam Reza Rohani – International Journal of Language Testing, 2025
The current paper intends to exploit the Many Facet Rasch Model to investigate and compare the impact of situations (items) and raters on test takers' performance on the Written Discourse Completion Test (WDCT) and Discourse Self-Assessment Tests (DSAT). In this study, the participants were 110 English as a Foreign Language (EFL) students at…
Descriptors: Comparative Analysis, English (Second Language), Second Language Learning, Second Language Instruction
Clariana, Roy B.; Park, Eunsung – Educational Technology Research and Development, 2021
Cognitive and metacognitive processes during learning depend on accurate monitoring. This investigation examines the influence of immediate item-level knowledge of correct response feedback on cognition monitoring accuracy. In an optional end-of-course computer-based review lesson, participants (n = 68) were randomly assigned to groups to receive…
Descriptors: Feedback (Response), Cognitive Processes, Accuracy, Difficulty Level
Benton, Tom – Research Matters, 2020
This article reviews the evidence on the extent to which experts' perceptions of item difficulties, captured using comparative judgement, can predict empirical item difficulties. This evidence is drawn from existing published studies on this topic and also from statistical analysis of data held by Cambridge Assessment. Having reviewed the…
Descriptors: Test Items, Difficulty Level, Expertise, Comparative Analysis
Das, Syaamantak; Mandal, Shyamal Kumar Das; Basu, Anupam – Contemporary Educational Technology, 2020
Cognitive learning complexity identification of assessment questions is an essential task in the domain of education, as it helps both the teacher and the learner to discover the thinking process required to answer a given question. Bloom's Taxonomy cognitive levels are considered as a benchmark standard for the classification of cognitive…
Descriptors: Classification, Difficulty Level, Test Items, Identification
Walker, Grant M.; Basilakos, Alexandra; Fridriksson, Julius; Hickok, Gregory – Journal of Speech, Language, and Hearing Research, 2022
Purpose: Meaningful changes in picture naming responses may be obscured when measuring accuracy instead of quality. A statistic that incorporates information about the severity and nature of impairments may be more sensitive to the effects of treatment. Method: We analyzed data from repeated administrations of a naming test to 72 participants with…
Descriptors: Naming, Change, Aphasia, Severity (of Disability)
Tibbits, Nicole; Lancaster, Hope Sparks; de Diego-Lázaro, Beatriz – Language, Speech, and Hearing Services in Schools, 2023
Purpose: This study examined the effect of phonological overlap on English and Spanish expressive vocabulary accuracy as measured by the bilingual Expressive One-Word Picture Vocabulary Test--Fourth Edition (EOWPVT-IV). We hypothesized that if languages interact during an expressive vocabulary task, then higher phonological overlap will predict…
Descriptors: Phonology, English, Spanish, Bilingual Students
Yoo Jeong Jang – ProQuest LLC, 2022
Despite the increasing demand for diagnostic information, observed subscores have been often reported to lack adequate psychometric qualities such as reliability, distinctiveness, and validity. Therefore, several statistical techniques based on CTT and IRT frameworks have been proposed to improve the quality of subscores. More recently, DCM has…
Descriptors: Classification, Accuracy, Item Response Theory, Correlation
Jose A. Diaz; Steven M. Nelson; A. Alexander Beaujean; Adam E. Green; Michael K. Scullin – Creativity Research Journal, 2024
The compound Remote Associates Test (RAT) is a classic measure of creativity. Participants are shown three cue words (sore-shoulder-sweat) and asked to generate a word that connects them (cold). Theoretical views of RAT performance differ in the degree to which they conceptualize performance as depending on automatic spreading activation across…
Descriptors: Test Items, Creative Thinking, Creativity Tests, Performance
Gregory J. Crowther; Usha Sankar; Leena S. Knight; Deborah L. Myers; Kevin T. Patton; Lekelia D. Jenkins; Thomas A. Knight – Journal of Microbiology & Biology Education, 2023
The biology education literature includes compelling assertions that unfamiliar problems are especially useful for revealing students' true understanding of biology. However, there is only limited evidence that such novel problems have different cognitive requirements than more familiar problems. Here, we sought additional evidence by using…
Descriptors: Science Instruction, Artificial Intelligence, Scoring, Molecular Structure
Musa Adekunle Ayanwale; Jamiu Oluwadamilare Amusa; Adekunle Ibrahim Oladejo; Funmilayo Ayedun – Interchange: A Quarterly Review of Education, 2024
The study focuses on assessing the proficiency levels of higher education students, specifically the physics achievement test (PHY 101) at the National Open University of Nigeria (NOUN). This test, like others, evaluates various aspects of knowledge and skills simultaneously. However, relying on traditional models for such tests can result in…
Descriptors: Item Response Theory, Difficulty Level, Item Analysis, Test Items