ERIC - Search Results

Showing 1 to 15 of 29 results

Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

Peer reviewed

Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level

Five Methods for Estimating Angoff Cut Scores with IRT

Peer reviewed

Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017

This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a posteriori (EAP), modal a posteriori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…

Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics

DIF Analysis with Multilevel Data: A Simulation Study Using the Latent Variable Approach

Peer reviewed

Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…

Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis

Anchor Selection Strategies for DIF Analysis: Review, Assessment, and New Approaches

Peer reviewed

Kopf, Julia; Zeileis, Achim; Strobl, Carolin – Educational and Psychological Measurement, 2015

Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…

Descriptors: Test Items, Equated Scores, Test Bias, Item Response Theory

Effect of Differential Item Functioning on Test Equating

Peer reviewed

Kabasakal, Kübra Atalay; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2015

This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…

Descriptors: Test Bias, Equated Scores, Item Response Theory, Simulation

Multiple-Group Noncompensatory Differential Item Functioning in Raju's Differential Functioning of Items and Tests

Peer reviewed

Oshima, T. C.; Wright, Keith; White, Nick – International Journal of Testing, 2015

Raju, van der Linden, and Fleer (1995) introduced a framework for differential functioning of items and tests (DFIT) for unidimensional dichotomous models. Since then, DFIT has been shown to be a quite versatile framework as it can handle polytomous as well as multidimensional models both at the item and test levels. However, DFIT is still limited…

Descriptors: Test Bias, Item Response Theory, Test Items, Simulation

Centering, Scale Indeterminacy, and Differential Item Functioning Detection in Hierarchical Generalized Linear and Generalized Linear Mixed Models

Peer reviewed

Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013

In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…

Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items

Longitudinal Multistage Testing

Peer reviewed

Pohl, Steffi – Journal of Educational Measurement, 2013

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for…

Descriptors: Adaptive Testing, Longitudinal Studies, Difficulty Level, Comparative Analysis

Comparing DIF Methods for Data with Dual Dependency

Peer reviewed

Jin, Ying; Kang, Minsoo – Large-scale Assessments in Education, 2016

Background: The current study compared four differential item functioning (DIF) methods to examine their performances in terms of accounting for dual dependency (i.e., person and item clustering effects) simultaneously by a simulation study, which is not sufficiently studied under the current DIF literature. The four methods compared are logistic…

Descriptors: Comparative Analysis, Test Bias, Simulation, Regression (Statistics)

Differential Item Functioning Assessment in Cognitive Diagnostic Modeling: Application of the Wald Test to Investigate DIF in the DINA Model

Peer reviewed

Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014

Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…

Descriptors: Test Bias, Models, Simulation, Error Patterns

Comparing Performances (Type I Error and Power) of IRT Likelihood Ratio SIBTEST and Mantel-Haenszel Methods in the Determination of Differential Item Functioning

Peer reviewed

Atalay Kabasakal, Kübra; Arsan, Nihan; Gök, Bilge; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2014

This simulation study compared the performances (Type I error and power) of Mantel-Haenszel (MH), SIBTEST, and item response theory-likelihood ratio (IRT-LR) methods under certain conditions. Manipulated factors were sample size, ability differences between groups, test length, the percentage of differential item functioning (DIF), and underlying…

Descriptors: Comparative Analysis, Item Response Theory, Statistical Analysis, Test Bias

The Langer-Improved Wald Test for DIF Testing with Multiple Groups: Evaluation and Comparison to Two-Group IRT

Peer reviewed

Woods, Carol M.; Cai, Li; Wang, Mian – Educational and Psychological Measurement, 2013

Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ² Wald test for…

Descriptors: Test Bias, Item Response Theory, Computation, Comparative Analysis

A Comparison of Linking Methods for Estimating National Trends in International Comparative Large-Scale Assessments in the Presence of Cross-national DIF

Peer reviewed

Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016

Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…

Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation

An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

Peer reviewed

Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014

Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items

Iterative Linking with the Differential Functioning of Items and Tests (DFIT) Method: Comparison of Testwide and Item Parameter Replication (IPR) Critical Values

Peer reviewed

Seybert, Jacob; Stark, Stephen – Applied Psychological Measurement, 2012

A Monte Carlo study was conducted to examine the accuracy of differential item functioning (DIF) detection using the differential functioning of items and tests (DFIT) method. Specifically, the performance of DFIT was compared using "testwide" critical values suggested by Flowers, Oshima, and Raju, based on simulations involving large numbers of…

Descriptors: Test Bias, Monte Carlo Methods, Form Classes (Languages), Simulation
