Showing all 15 results
Peer reviewed
Sohee Kim; Ki Lynn Cole – International Journal of Testing, 2025
This study conducted a comprehensive comparison of Item Response Theory (IRT) linking methods applied to a bifactor model, examining their performance on both multiple-choice (MC) and mixed-format tests within the common-item nonequivalent groups design framework. Four distinct multidimensional IRT linking approaches were explored, consisting of…
Descriptors: Item Response Theory, Comparative Analysis, Models, Item Analysis
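The bifactor structure that these linking methods operate on can be made concrete with a small sketch. The snippet below is illustrative only (the abstract does not specify a parameterization, and all values are invented): it computes a bifactor 2PL response probability from one general and one specific dimension.

```python
import numpy as np

def bifactor_2pl(theta_g, theta_s, a_g, a_s, d):
    """P(correct) when an item loads on one general and one specific factor."""
    z = a_g * theta_g + a_s * theta_s + d
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item loading strongly on the general factor:
print(bifactor_2pl(theta_g=0.5, theta_s=-0.2, a_g=1.4, a_s=0.4, d=-0.3))
```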
Peer reviewed
Fu, Yanyan; Strachan, Tyler; Ip, Edward H.; Willse, John T.; Chen, Shyh-Huei; Ackerman, Terry – International Journal of Testing, 2020
This research examined correlation estimates between latent abilities when using the two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that the recovery of the latent correlation was best when the test contained 100% of simple structure items for all models and…
Descriptors: Item Response Theory, Models, Test Items, Simulation
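To see why item structure matters for recovering the latent correlation, compare the two response models the abstract names. A minimal sketch, assuming a two-dimensional 2PL-type parameterization and a Sympson-style product form for the noncompensatory model (all parameter values are invented):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def compensatory(theta, a, d):
    # Abilities add: strength on one dimension can offset weakness on the other.
    return logistic(a @ theta + d)

def noncompensatory(theta, a, b):
    # Probabilities multiply: the examinee must clear every dimension.
    return np.prod(logistic(a * (theta - b)))

theta = np.array([1.5, -1.0])  # strong on dimension 1, weak on dimension 2
print(compensatory(theta, np.array([1.0, 1.0]), 0.0))             # ~0.62
print(noncompensatory(theta, np.array([1.0, 1.0]), np.zeros(2)))  # ~0.22
```

The same examinee succeeds far more often under the compensatory model, which is why mixing item structures complicates estimating the correlation between the two abilities.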
Peer reviewed
Aksu Dunya, Beyza – International Journal of Testing, 2018
This study was conducted to analyze the potential impact of item parameter drift (IPD) on person ability estimates and classification accuracy when drift affects an examinee subgroup. Using a series of simulations, three factors were manipulated: (a) the percentage of IPD items in the computerized adaptive test (CAT), (b) the percentage of examinees affected by IPD, and (c) item pool…
Descriptors: Adaptive Testing, Classification, Accuracy, Computer Assisted Testing
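The mechanics of the drift effect can be sketched in a few lines: generate responses under drifted difficulties but score them with the original bank, and ability estimates come out biased. This is a toy fixed-form version, not the study's CAT design; all values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

n_items, n_people = 40, 2000
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)
drift = np.where(np.arange(n_items) < 8, 0.5, 0.0)  # 20% of items drift harder
theta = rng.normal(0.0, 1.0, n_people)

# These examinees stand in for the affected subgroup: their responses
# follow the drifted difficulties ...
resp = (rng.random((n_people, n_items)) < p_2pl(theta[:, None], a, b + drift)).astype(float)

# ... but are scored against the original bank by grid-search maximum likelihood.
grid = np.linspace(-4.0, 4.0, 161)
p = p_2pl(grid[:, None], a, b)                      # shape: (grid points, items)
loglik = resp @ np.log(p).T + (1.0 - resp) @ np.log(1.0 - p).T
theta_hat = grid[loglik.argmax(axis=1)]
print(f"mean bias in theta-hat: {np.mean(theta_hat - theta):+.3f}")  # negative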
Peer reviewed
Rutkowski, Leslie; Rutkowski, David; Zhou, Yan – International Journal of Testing, 2016
Using an empirically-based simulation study, we show that typically used methods of choosing an item calibration sample have significant impacts on achievement bias and system rankings. We examine whether recent PISA accommodations, especially for lower performing participants, can mitigate some of this bias. Our findings indicate that standard…
Descriptors: Simulation, International Programs, Adolescents, Student Evaluation
Peer reviewed
Oshima, T. C.; Wright, Keith; White, Nick – International Journal of Testing, 2015
Raju, van der Linden, and Fleer (1995) introduced a framework for differential functioning of items and tests (DFIT) for unidimensional dichotomous models. Since then, DFIT has proven to be a versatile framework, as it can handle polytomous as well as multidimensional models at both the item and test levels. However, DFIT is still limited…
Descriptors: Test Bias, Item Response Theory, Test Items, Simulation
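The DFIT framework quantifies item-level differential functioning with indices such as NCDIF, the expected squared gap between the focal- and reference-group response functions taken over the focal group's ability distribution. A minimal Monte Carlo sketch for one 2PL item (parameter values invented for illustration):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
theta_focal = rng.normal(0.0, 1.0, 100_000)  # focal group's ability distribution

# NCDIF: mean squared difference between the item's focal- and
# reference-calibrated response functions, averaged over focal thetas.
ncdif = np.mean((p_2pl(theta_focal, 1.2, 0.3) - p_2pl(theta_focal, 1.2, 0.0)) ** 2)
print(f"NCDIF = {ncdif:.4f}")
```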
Peer reviewed
Wei, Hua; Lin, Jie – International Journal of Testing, 2015
Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate…
Descriptors: Test Items, Computer Assisted Testing, Adaptive Testing, Instructional Program Divisions
Peer reviewed
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length degrades decision quality due to the increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
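The reliability cost of shortening a test is easy to quantify with the Spearman-Brown prophecy formula, one standard way to reason about the trade-off the abstract describes (the 30-item baseline here is invented, not taken from the study):

```python
def spearman_brown(rho, k):
    """Projected reliability after changing test length by factor k."""
    return k * rho / (1.0 + (k - 1.0) * rho)

# Cutting a 30-item test with reliability .90 down to 10 items (k = 1/3):
print(round(spearman_brown(0.90, 10 / 30), 3))  # 0.75 -- noticeably noisier decisions
```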
Peer reviewed
Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D. – International Journal of Testing, 2012
Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item level,…
Descriptors: Foreign Countries, Psychometrics, Test Bias, Test Items
Peer reviewed
Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011
This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…
Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis
Peer reviewed
DeMars, Christine E.; Wise, Steven L. – International Journal of Testing, 2010
This investigation examined whether different rates of rapid guessing between groups could lead to detectable levels of differential item functioning (DIF) in situations where the item parameters were the same for both groups. Two simulation studies were designed to explore this possibility. The groups in Study 1 were simulated to reflect…
Descriptors: Guessing (Tests), Test Bias, Motivation, Gender Differences
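The mechanism the study simulates can be written down directly: if one group rapid-guesses more often, its observed success probability departs from the model-implied probability even though the item parameters are identical, which is exactly what DIF statistics pick up. A minimal mixture sketch (rates and parameters invented):

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_observed(theta, a, b, rapid_rate, chance=0.25):
    # With probability rapid_rate the examinee guesses at the chance level;
    # otherwise the response follows the group-invariant 2PL.
    return rapid_rate * chance + (1.0 - rapid_rate) * p_2pl(theta, a, b)

theta = 1.0
print(p_observed(theta, 1.0, 0.0, rapid_rate=0.00))  # reference group: ~0.731
print(p_observed(theta, 1.0, 0.0, rapid_rate=0.15))  # focal group, same item: ~0.659
```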
Peer reviewed
Wells, Craig S.; Cohen, Allan S.; Patton, Jeffrey – International Journal of Testing, 2009
A primary concern with testing differential item functioning (DIF) using a traditional point-null hypothesis is that a statistically significant result does not imply that the magnitude of DIF is of practical interest. Similarly, for a given sample size, a non-significant result does not allow the researcher to conclude the item is free of DIF. To…
Descriptors: Test Bias, Test Items, Statistical Analysis, Hypothesis Testing
Peer reviewed
Wyse, Adam E.; Mapuranga, Raymond – International Journal of Testing, 2009
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when the data fit the Rasch model. Through simulations and an international…
Descriptors: Test Bias, Evaluation Methods, Test Items, Educational Assessment
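Under the Rasch model the item information function has the closed form I(θ) = P(θ)(1 − P(θ)), so comparing an item's information curves across groups is straightforward. The published ISI formula is not given in the abstract; the overlap ratio below is only an illustrative stand-in for the idea of an information-similarity comparison.

```python
import numpy as np

def rasch_info(theta, b):
    """Rasch item information: I(theta) = P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

grid = np.linspace(-4.0, 4.0, 81)
info_ref = rasch_info(grid, 0.0)  # item difficulty calibrated in group 1
info_foc = rasch_info(grid, 0.4)  # same item calibrated in group 2

# Overlap of the two curves (1.0 = identical information, i.e. no sign of DIF).
similarity = np.minimum(info_ref, info_foc).sum() / np.maximum(info_ref, info_foc).sum()
print(f"similarity = {similarity:.3f}")
```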
Peer reviewed
Veldkamp, Bernard P. – International Journal of Testing, 2008
Integrity™, an online application for testing both the statistical integrity of the test and the academic integrity of the examinees, was evaluated for this review. Program features and the program output are described. An overview of the statistics in Integrity™ is provided, and the application is illustrated with a small simulation study.…
Descriptors: Simulation, Integrity, Statistics, Computer Assisted Testing
Peer reviewed
Papanastasiou, Elena C.; Reckase, Mark D. – International Journal of Testing, 2007
Because of the increased popularity of computerized adaptive testing (CAT), many admissions tests, as well as certification and licensure examinations, have been transformed from their paper-and-pencil versions to computerized adaptive versions. A major difference between paper-and-pencil tests and CAT from an examinee's point of view is that in…
Descriptors: Simulation, Adaptive Testing, Computer Assisted Testing, Test Items
Peer reviewed
Muniz, Jose; Hambleton, Ronald K.; Xing, Dehui – International Journal of Testing, 2001
Studied two procedures for detecting potentially flawed items in translated tests with small samples: (1) conditional item p-value comparisons; and (2) delta plots. Varied several factors in this simulation study. Findings show that the two procedures can be valuable in identifying flawed test items, especially when the size of the…
Descriptors: Identification, Sample Size, Simulation, Test Items
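Of the two procedures, the delta plot is the more mechanical: each item's p value in each group is mapped to the classic ETS delta scale, the group-1 deltas are plotted against the group-2 deltas, and items falling far from the principal-axis line are flagged. A sketch of the transform (the p values are illustrative, not from the study):

```python
from statistics import NormalDist

def delta(p):
    """ETS delta transform of an item p value; harder items get larger deltas."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)

# One item's proportion-correct in each language group:
p_group1, p_group2 = 0.70, 0.55
print(round(delta(p_group1), 2), round(delta(p_group2), 2))  # 10.9 vs 12.5
```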