Showing all 11 results
Peer reviewed
Direct link
Berenbon, Rebecca F.; McHugh, Bridget C. – Educational Measurement: Issues and Practice, 2023
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ…
Descriptors: Test Items, Multiple Choice Tests, Standards, Difficulty Level
Peer reviewed
Direct link
Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, little research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…
Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items
Peer reviewed
Direct link
Arikan, Serkan; Aybek, Eren Can – Educational Measurement: Issues and Practice, 2022
Many scholars have compared various item discrimination indices in real or simulated data. Item discrimination indices, such as the item-total correlation, item-rest correlation, and IRT item discrimination parameter, provide information about individual differences among all participants. However, there are tests that aim to select a very limited number…
Descriptors: Monte Carlo Methods, Item Analysis, Correlation, Individual Differences
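The classical indices this abstract compares are straightforward to compute from a scored response matrix. A minimal sketch for dichotomously scored (0/1) data; the function and variable names are my own, not the authors':

```python
import numpy as np

def item_discrimination(responses):
    """Classical item discrimination indices for a 0/1 scored
    examinee-by-item response matrix (rows = examinees).

    Returns (item_total, item_rest) correlation arrays.
    """
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    n_items = responses.shape[1]
    item_total = np.empty(n_items)
    item_rest = np.empty(n_items)
    for j in range(n_items):
        item = responses[:, j]
        item_total[j] = np.corrcoef(item, total)[0, 1]
        # Item-rest excludes the item itself from the total score,
        # removing the spurious part-whole inflation.
        item_rest[j] = np.corrcoef(item, total - item)[0, 1]
    return item_total, item_rest
```

The item-rest correlation is typically a bit lower than the item-total correlation for the same item, since the item no longer correlates with its own contribution to the score. The IRT discrimination parameter the abstract also mentions would require fitting an IRT model and is not sketched here.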
Peer reviewed
Direct link
Sinharay, Sandip – Educational Measurement: Issues and Practice, 2018
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested "miditests," which are anchor tests that are content-representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties.…
Descriptors: Test Content, Difficulty Level, Test Items, Test Construction
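The defining feature of a "miditest" anchor, as described in the abstract, is matching the total test's mean item difficulty while shrinking the spread of difficulties. A toy illustration of that selection rule (ignoring the content-representativeness requirement, which a real miditest must also satisfy; names are my own):

```python
import numpy as np

def pick_midi_anchor(difficulties, n_anchor):
    """Toy 'miditest' selection: choose the anchor items whose
    difficulties lie closest to the full test's mean difficulty,
    so the anchor matches the mean but has a smaller spread.
    Content balancing is deliberately omitted here.
    """
    difficulties = np.asarray(difficulties, dtype=float)
    target = difficulties.mean()
    order = np.argsort(np.abs(difficulties - target))
    return np.sort(order[:n_anchor])
```

Selecting the items nearest the mean difficulty necessarily yields a smaller standard deviation of difficulties than the full test, which is the property that distinguishes a miditest from a "minitest" that mirrors the full spread.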
Peer reviewed
Direct link
Rutkowski, David; Rutkowski, Leslie; Liaw, Yuan-Ling – Educational Measurement: Issues and Practice, 2018
Participation in international large-scale assessments has grown over time, with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To help accommodate large achievement differences among participants, in 2009 PISA offered…
Descriptors: Educational Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Peer reviewed
Direct link
Soland, James – Educational Measurement: Issues and Practice, 2019
As computer-based tests become more common, there is a growing wealth of metadata related to examinees' response processes, which include solution strategies, concentration, and operating speed. One common type of metadata is item response time. While response times have been used extensively to improve estimates of achievement, little work…
Descriptors: Test Items, Item Response Theory, Metadata, Self Efficacy
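One widely used way of turning item response times into an examinee-level index is a response-time effort score: the proportion of items on which the examinee's time exceeds an item-specific rapid-guessing threshold. This is one common approach from the response-time literature (in the spirit of Wise and Kong's response-time effort), not necessarily the one used in this article; the thresholds are assumed to be given:

```python
def response_time_effort(response_times, thresholds):
    """Proportion of items answered with 'solution behavior',
    i.e., a response time at or above the item's rapid-guessing
    threshold (both in the same time units, e.g., seconds)."""
    flags = [t >= th for t, th in zip(response_times, thresholds)]
    return sum(flags) / len(flags)
```

Scores near 1.0 suggest engaged responding; low scores flag records whose achievement estimates may be distorted by rapid guessing.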
Peer reviewed
Direct link
Sheehan, Kathleen M. – Educational Measurement: Issues and Practice, 2017
Automated text complexity measurement tools (also called readability metrics) have been proposed as a way to help teachers, textbook publishers, and assessment developers select texts that are closely aligned with the new, more demanding text complexity expectations specified in the Common Core State Standards. This article examines a critical…
Descriptors: Reading Material Selection, Difficulty Level, Common Core State Standards, Validity
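The readability metrics the abstract discusses are simple functions of surface text features. A minimal sketch of one classic example, the Flesch-Kincaid grade level, using a crude vowel-group syllable count (real tools use dictionaries or better heuristics, and the article's point is precisely that such metrics need critical examination):

```python
import re

def syllables(word):
    # Crude proxy: count maximal vowel groups, minimum one per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syl / len(words) - 15.59)
```

Longer sentences and longer words both push the estimated grade level up, which is why such formulas can rank a syntactically simple but technical text as "harder" than a nuanced literary one.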
Peer reviewed
Direct link
Tiffin-Richards, Simon P.; Pant, Hans Anand; Köller, Olaf – Educational Measurement: Issues and Practice, 2013
Cut-scores were set by expert judges on assessments of reading and listening comprehension of English as a foreign language (EFL), using the bookmark standard-setting method to differentiate proficiency levels defined by the Common European Framework of Reference (CEFR). Assessments contained stratified item samples drawn from extensive item…
Descriptors: Foreign Countries, English (Second Language), Language Tests, Standard Setting (Scoring)
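The core mechanic of the Bookmark method used in this study can be sketched simply: items are arranged in an ordered item booklet from easiest to hardest, a panelist places a bookmark after the last item a minimally proficient examinee would likely answer correctly, and the cut score is derived from the difficulty at that point. A toy version (real implementations map the bookmark through an IRT model at a chosen response probability, e.g., RP67; names here are my own):

```python
def bookmark_cut_score(item_difficulties, bookmark_position):
    """Toy Bookmark method: sort item difficulties into an ordered
    item booklet and return the difficulty of the item the bookmark
    is placed after (1-indexed position)."""
    ordered = sorted(item_difficulties)
    return ordered[bookmark_position - 1]
```

Separate bookmarks for each proficiency level (here, each CEFR level) yield the multiple cut-scores the abstract describes.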
Peer reviewed
Albanese, Mark A. – Educational Measurement: Issues and Practice, 1993
A comprehensive review is given of evidence bearing on the recommendation to avoid the use of complex multiple-choice (CMC) items. Avoiding Type K items (four primary responses and five secondary choices) seems warranted, but the evidence against CMC in general is less clear. (SLD)
Descriptors: Cues, Difficulty Level, Multiple Choice Tests, Responses
Peer reviewed
Linn, Robert L.; Drasgow, Fritz – Educational Measurement: Issues and Practice, 1987
This article discusses the application of the Golden Rule procedure to items of the Scholastic Aptitude Test. Using item response theory, the analyses indicate that the Golden Rule procedures are ineffective in detecting biased items and may undermine the reliability and validity of tests. (Author/JAZ)
Descriptors: College Entrance Examinations, Difficulty Level, Item Analysis, Latent Trait Theory
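The Golden Rule procedure screened items by comparing classical proportion-correct (p) values between demographic groups and preferring items with small differences. A minimal sketch of that screening rule; the 0.15 cutoff and the function name are illustrative assumptions, not taken from the article, whose IRT analyses argue this confounds genuine group differences (impact) with item bias:

```python
def golden_rule_flag(p_focal, p_reference, max_diff=0.15):
    """Flag an item whose proportion-correct values differ between
    a focal and a reference group by more than max_diff.
    Illustrative threshold; the historical procedure's details varied."""
    return abs(p_reference - p_focal) > max_diff
```

The article's critique is that two groups can differ in p-values for an unbiased item simply because the groups differ in ability, which is why IRT-based differential item functioning methods condition on ability instead.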
Peer reviewed
Frisbie, David A. – Educational Measurement: Issues and Practice, 1992
Literature related to the multiple true-false (MTF) item format is reviewed. Each answer cluster of an MTF item may have several true statements, and the correctness of each is judged independently. MTF tests appear efficient and reliable, although they are a bit harder than multiple-choice items for examinees. (SLD)
Descriptors: Achievement Tests, Difficulty Level, Literature Reviews, Multiple Choice Tests
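The scoring idea behind the MTF format, judging each statement in a cluster independently, can be sketched in a few lines. This illustrates one common scoring rule (a point per correct true/false judgment); other rules exist, and the names are my own:

```python
def score_mtf(key, answers):
    """Score one multiple true-false cluster: each statement is
    judged true/false independently, and every judgment that
    matches the key earns one point."""
    return sum(k == a for k, a in zip(key, answers))
```

Because each statement is an independent judgment, a cluster of n statements yields n scoreable responses, which is the source of the format's efficiency noted in the abstract.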