Showing all 15 results
Peer reviewed
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple-choice (MC) and mixed-format tests within the common-item nonequivalent groups design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
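The bifactor structure that these linking methods operate on can be made concrete with a small sketch. The snippet below is illustrative only (the abstract does not specify a parameterization, and all values are invented): it computes a bifactor 2PL response probability from one general and one specific dimension.

```python
import numpy as np

def bifactor_2pl(theta_g, theta_s, a_g, a_s, d):
    """P(correct) when an item loads on one general and one specific factor."""
    z = a_g * theta_g + a_s * theta_s + d
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item loading strongly on the general factor:
print(bifactor_2pl(theta_g=0.5, theta_s=-0.2, a_g=1.4, a_s=0.4, d=-0.3))
```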
Peer reviewed
Fu, Yanyan; Strachan, Tyler; Ip, Edward H.; Willse, John T.; Chen, Shyh-Huei; Ackerman, Terry – International Journal of Testing, 2020
This research examined correlation estimates between latent abilities when using the two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that the recovery of the latent correlation was best when the test contained 100% of simple structure items for all models and…
Descriptors: Item Response Theory, Models, Test Items, Simulation
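To see why item structure matters for recovering the latent correlation, compare the two response models the abstract names. A minimal sketch, assuming a two-dimensional 2PL-type parameterization and a Sympson-style product form for the noncompensatory model (all parameter values are invented):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def compensatory(theta, a, d):
    # Abilities add: strength on one dimension can offset weakness on the other.
    return logistic(a @ theta + d)

def noncompensatory(theta, a, b):
    # Probabilities multiply: the examinee must clear every dimension.
    return np.prod(logistic(a * (theta - b)))

theta = np.array([1.5, -1.0])  # strong on dimension 1, weak on dimension 2
print(compensatory(theta, np.array([1.0, 1.0]), 0.0))             # ~0.62
print(noncompensatory(theta, np.array([1.0, 1.0]), np.zeros(2)))  # ~0.22
```

The same examinee succeeds far more often under the compensatory model, which is why mixing item structures complicates estimating the correlation between the two abilities.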
Peer reviewed
Aksu Dunya, Beyza – International Journal of Testing, 2018
This study was conducted to analyze the potential impact of item parameter drift (IPD) on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) the percentage of IPD items in the computerized adaptive test (CAT), (b) the percentage of examinees affected by IPD, and (c) item pool…
Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing
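The mechanics of the drift effect can be sketched in a few lines: generate responses under drifted difficulties but score them with the original bank, and ability estimates come out biased. This is a toy fixed-form version, not the study's CAT design; all values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

n_items, n_people = 40, 2000
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
drift = np.where(np.arange(n_items) < 8, 0.5, 0.0)  # 20% of items drift harder
theta = rng.normal(0.0, 1.0, n_people)

# These examinees stand in for the affected subgroup: their responses
# follow the drifted difficulties ...
resp = (rng.random((n_people, n_items)) < p_2pl(theta[:, None], a, b + drift)).astype(float)

# ... but are scored against the original bank by grid-search maximum likelihood.
grid = np.linspace(-4.0, 4.0, 161)
p = p_2pl(grid[:, None], a, b)                      # shape: (grid points, items)
loglik = resp @ np.log(p).T + (1.0 - resp) @ np.log(1.0 - p).T
theta_hat = grid[loglik.argmax(axis=1)]
print(f"mean bias in theta-hat: {np.mean(theta_hat - theta):+.3f}")  # negative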
Peer reviewed
Rutkowski, Leslie; Rutkowski, David; Zhou, Yan – International Journal of Testing, 2016
Using an empirically-based simulation study, we show that typically used methods of choosing an item calibration sample have significant impacts on achievement bias and system rankings. We examine whether recent PISA accommodations, especially for lower performing participants, can mitigate some of this bias. Our findings indicate that standard…
Descriptors: Simulation, International Programs, Adolescents, Student Evaluation
Peer reviewed
Oshima, T. C.; Wright, Keith; White, Nick – International Journal of Testing, 2015
Raju, van der Linden, and Fleer (1995) introduced a framework for differential functioning of items and tests (DFIT) for unidimensional dichotomous models. Since then, DFIT has proven to be a versatile framework, as it can handle polytomous as well as multidimensional models at both the item and test levels. However, DFIT is still limited…
Descriptors: Test Bias, Item Response Theory, Test Items, Simulation
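The DFIT framework quantifies item-level differential functioning with indices such as NCDIF, the expected squared gap between the focal- and reference-group response functions taken over the focal group's ability distribution. A minimal Monte Carlo sketch for one 2PL item (parameter values invented for illustration):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
theta_focal = rng.normal(0.0, 1.0, 100_000)  # focal group's ability distribution

# NCDIF: mean squared difference between the item's focal- and
# reference-calibrated response functions, averaged over focal thetas.
ncdif = np.mean((p_2pl(theta_focal, 1.2, 0.3) - p_2pl(theta_focal, 1.2, 0.0)) ** 2)
print(f"NCDIF = {ncdif:.4f}")
```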
Peer reviewed
Wei, Hua; Lin, Jie – International Journal of Testing, 2015
Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate…
Descriptors: Test Items, Computer Assisted Testing, Adaptive Testing, Instructional Program Divisions
Peer reviewed
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length degrades decision quality due to the increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
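The reliability cost of shortening a test is easy to quantify with the Spearman-Brown prophecy formula, one standard way to reason about the trade-off the abstract describes (the 30-item baseline here is invented, not taken from the study):

```python
def spearman_brown(rho, k):
    """Projected reliability after changing test length by factor k."""
    return k * rho / (1.0 + (k - 1.0) * rho)

# Cutting a 30-item test with reliability .90 down to 10 items (k = 1/3):
print(round(spearman_brown(0.90, 10 / 30), 3))  # 0.75 -- noticeably noisier decisions
```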
Peer reviewed
Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D. – International Journal of Testing, 2012
Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item level,…
Descriptors: Foreign Countries, Psychometrics, Test Bias, Test Items
Peer reviewed
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
Peer reviewed
DeMars, Christine E.; Wise, Steven L. – International Journal of Testing, 2010
This investigation examined whether different rates of rapid guessing between groups could lead to detectable levels of differential item functioning (DIF) in situations where the item parameters were the same for both groups. Two simulation studies were designed to explore this possibility. The groups in Study 1 were simulated to reflect…
Descriptors: Guessing (Tests), Test Bias, Motivation, Gender Differences
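The mechanism the study simulates can be written down directly: if one group rapid-guesses more often, its observed success probability departs from the model-implied probability even though the item parameters are identical, which is exactly what DIF statistics pick up. A minimal mixture sketch (rates and parameters invented):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_observed(theta, a, b, rapid_rate, chance=0.25):
    # With probability rapid_rate the examinee guesses at the chance level;
    # otherwise the response follows the group-invariant 2PL.
    return rapid_rate * chance + (1.0 - rapid_rate) * p_2pl(theta, a, b)

theta = 1.0
print(p_observed(theta, 1.0, 0.0, rapid_rate=0.00))  # reference group: ~0.731
print(p_observed(theta, 1.0, 0.0, rapid_rate=0.15))  # focal group, same item: ~0.659
```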
Peer reviewed
Wells, Craig S.; Cohen, Allan S.; Patton, Jeffrey – International Journal of Testing, 2009
A primary concern with testing differential item functioning (DIF) using a traditional point-null hypothesis is that a statistically significant result does not imply that the magnitude of DIF is of practical interest. Similarly, for a given sample size, a non-significant result does not allow the researcher to conclude the item is free of DIF. To…
Descriptors: Test Bias, Test Items, Statistical Analysis, Hypothesis Testing
Peer reviewed
Wyse, Adam E.; Mapuranga, Raymond – International Journal of Testing, 2009
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when the data fit the Rasch model. Through simulations and an international…
Descriptors: Test Bias, Evaluation Methods, Test Items, Educational Assessment
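Under the Rasch model the item information function has the closed form I(θ) = P(θ)(1 − P(θ)), so comparing an item's information curves across groups is straightforward. The published ISI formula is not given in the abstract; the overlap ratio below is only an illustrative stand-in for the idea of an information-similarity comparison.

```python
import numpy as np

def rasch_info(theta, b):
    """Rasch item information: I(theta) = P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

grid = np.linspace(-4.0, 4.0, 81)
info_ref = rasch_info(grid, 0.0)  # item difficulty calibrated in group 1
info_foc = rasch_info(grid, 0.4)  # same item calibrated in group 2

# Overlap of the two curves (1.0 = identical information, i.e. no sign of DIF).
similarity = np.minimum(info_ref, info_foc).sum() / np.maximum(info_ref, info_foc).sum()
print(f"similarity = {similarity:.3f}")
```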
Peer reviewed
Veldkamp, Bernard P. – International Journal of Testing, 2008
Integrity™, an online application for testing both the statistical integrity of the test and the academic integrity of the examinees, was evaluated for this review. Program features and the program output are described. An overview of the statistics in Integrity™ is provided, and the application is illustrated with a small simulation study.…
Descriptors: Simulation, Integrity, Statistics, Computer Assisted Testing
Peer reviewed
Papanastasiou, Elena C.; Reckase, Mark D. – International Journal of Testing, 2007
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in…
Descriptors: Simulation, Adaptive Testing, Computer Assisted Testing, Test Items
Peer reviewed
Muniz, Jose; Hambleton, Ronald K.; Xing, Dehui – International Journal of Testing, 2001
Studied two procedures for detecting potentially flawed items in translated tests with small samples: (1) conditional item p-value comparisons; and (2) delta plots. Varied several factors in this simulation study. Findings show that the two procedures can be valuable in identifying flawed test items, especially when the size of the…
Descriptors: Identification, Sample Size, Simulation, Test Items
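Of the two procedures, the delta plot is the more mechanical: each item's p value in each group is mapped to the classic ETS delta scale, the group-1 deltas are plotted against the group-2 deltas, and items falling far from the principal-axis line are flagged. A sketch of the transform (the p values are illustrative, not from the study):

```python
from statistics import NormalDist

def delta(p):
    """ETS delta transform of an item p value; harder items get larger deltas."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)

# One item's proportion-correct in each language group:
p_group1, p_group2 = 0.70, 0.55
print(round(delta(p_group1), 2), round(delta(p_group2), 2))  # 10.9 vs 12.5
```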