Weese, James D.; Turner, Ronna C.; Liang, Xinya; Ames, Allison; Crawford, Brandon – Educational and Psychological Measurement, 2023
A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and…
Descriptors: Effect Size, Classification, Guidelines, Statistical Analysis
Paul J. Walter; Edward Nuhfer; Crisel Suarez – Numeracy, 2021
We introduce an approach for making a quantitative comparison of the item response curves (IRCs) of any two populations on a multiple-choice test instrument. In this study, we employ simulated and actual data. We apply our approach to a dataset of 12,187 participants on the 25-item Science Literacy Concept Inventory (SLCI), which includes ample…
Descriptors: Item Analysis, Multiple Choice Tests, Simulation, Data Analysis
Dimitrov, Dimiter M. – Measurement and Evaluation in Counseling and Development, 2017
This article offers an approach to examining differential item functioning (DIF) under its item response theory (IRT) treatment in the framework of confirmatory factor analysis (CFA). The approach is based on integrating IRT- and CFA-based testing of DIF and using bias-corrected bootstrap confidence intervals with a syntax code in Mplus.
Descriptors: Test Bias, Item Response Theory, Factor Analysis, Evaluation Methods
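The bias-corrected bootstrap confidence interval mentioned in this abstract can be sketched as follows (a minimal Python illustration of the generic BC bootstrap, not the authors' Mplus syntax; the statistic and data here are arbitrary examples):

```python
import numpy as np
from scipy.stats import norm

def bc_bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected (BC) bootstrap confidence interval for a statistic."""
    rng = np.random.default_rng(seed)
    theta_hat = stat(data)
    boot = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    # Bias-correction factor z0: measures median bias of the bootstrap distribution.
    z0 = norm.ppf(np.mean(boot < theta_hat))
    z = norm.ppf(1 - alpha / 2)
    # Adjusted percentile levels replace the usual alpha/2 and 1 - alpha/2.
    lo = norm.cdf(2 * z0 - z)
    hi = norm.cdf(2 * z0 + z)
    return np.quantile(boot, lo), np.quantile(boot, hi)

# Example: BC interval for the mean of a skewed (exponential) sample.
rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=200)
lo, hi = bc_bootstrap_ci(sample, np.mean)
```

The bias correction matters precisely in skewed settings like DIF effect estimates, where the plain percentile interval can be systematically off-center.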
Magis, David; De Boeck, Paul – Educational and Psychological Measurement, 2014
It is known that sum score-based methods for the identification of differential item functioning (DIF), such as the Mantel-Haenszel (MH) approach, can be affected by Type I error inflation in the absence of any DIF effect. This may happen when the items differ in discrimination and when there is item impact. On the other hand, outlier DIF methods…
Descriptors: Test Bias, Statistical Analysis, Test Items, Simulation
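The sum score-based Mantel-Haenszel approach this abstract refers to can be sketched in a few lines (an illustrative implementation under a no-DIF simulation, not the authors' code; the data-generating choices are mine):

```python
import numpy as np

def mantel_haenszel_dif(correct, group, total_score):
    """Mantel-Haenszel common odds ratio for one studied item.

    correct:     0/1 responses to the studied item
    group:       0 = reference group, 1 = focal group
    total_score: matching criterion (e.g., rest score on the other items)
    """
    num, den = 0.0, 0.0
    for s in np.unique(total_score):
        m = total_score == s
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference, right
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference, wrong
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal, right
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal, wrong
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha = num / den              # common odds ratio (1 = no DIF)
    delta = -2.35 * np.log(alpha)  # ETS delta scale
    return alpha, delta

# Null simulation: both groups share the same ability distribution, no DIF.
rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)
ability = rng.normal(size=n)
items = (rng.random((n, 20)) < 1 / (1 + np.exp(-ability[:, None]))).astype(int)
studied = items[:, 0]
score = items[:, 1:].sum(axis=1)   # rest score as the matching criterion
alpha, delta = mantel_haenszel_dif(studied, group, score)
```

Under no DIF the odds ratio should sit near 1 (delta near 0); the Type I error inflation the abstract discusses arises when items differ in discrimination, which this equal-discrimination sketch deliberately avoids.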
Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017
The growing use of scales in survey questionnaires warrants addressing how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent samples t-test on the observed total scale scores. A…
Descriptors: Test Items, Test Bias, Item Response Theory, Surveys
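The kind of Type I error study this abstract describes can be sketched as a small Monte Carlo (my own illustrative simulation, not the authors' design; dichotomous rather than polytomous items, and all parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import ttest_ind

def t_test_rejection_rate(n_per_group=200, n_items=20, n_dif=4,
                          dif_shift=0.8, n_reps=500, seed=0):
    """Rejection rate of the two-sample t-test on total scores when the
    groups have *equal* latent means but some items carry DIF."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        theta_ref = rng.normal(size=n_per_group)
        theta_foc = rng.normal(size=n_per_group)   # same latent distribution
        b = rng.normal(size=n_items)               # item difficulties
        b_foc = b.copy()
        b_foc[:n_dif] += dif_shift                 # DIF: harder for focal group
        p_ref = 1 / (1 + np.exp(-(theta_ref[:, None] - b)))
        p_foc = 1 / (1 + np.exp(-(theta_foc[:, None] - b_foc)))
        x_ref = (rng.random(p_ref.shape) < p_ref).sum(axis=1)
        x_foc = (rng.random(p_foc.shape) < p_foc).sum(axis=1)
        if ttest_ind(x_ref, x_foc).pvalue < 0.05:
            rejections += 1
    return rejections / n_reps

rate = t_test_rejection_rate()
```

Because the DIF items depress the focal group's observed scores despite equal latent means, the empirical rejection rate climbs well above the nominal 5% level, which is the inflation the study quantifies.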
Tay, Louis; Huang, Qiming; Vermunt, Jeroen K. – Educational and Psychological Measurement, 2016
In large-scale testing, the use of multigroup approaches is limited for assessing differential item functioning (DIF) across multiple variables as DIF is examined for each variable separately. In contrast, the item response theory with covariate (IRT-C) procedure can be used to examine DIF across multiple variables (covariates) simultaneously. To…
Descriptors: Item Response Theory, Test Bias, Simulation, College Entrance Examinations
Jin, Ying; Eason, Hershel – Journal of Educational Issues, 2016
The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under multilevel data structure. MAD was frequently observed in large-scale cross-country comparison studies where the primary sampling…
Descriptors: Test Bias, Simulation, Hierarchical Linear Modeling, Comparative Analysis
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
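The asymptotic power and sample size calculations for the Wald test can be illustrated with the standard normal-approximation formulas (a generic sketch, not the paper's derived formulas; `sigma_beta` stands in for the per-observation standard deviation of the group coefficient, which in practice comes from the logistic model's information matrix):

```python
import numpy as np
from scipy.stats import norm

def wald_sample_size(beta, sigma_beta, alpha=0.05, power=0.80):
    """Sample size for a two-sided Wald test of H0: beta = 0,
    assuming se(beta_hat) = sigma_beta / sqrt(n)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    n = ((z_a + z_b) * sigma_beta / abs(beta)) ** 2
    return int(np.ceil(n))

def wald_power(beta, sigma_beta, n, alpha=0.05):
    """Asymptotic power of the two-sided Wald test at sample size n
    (ignoring the negligible lower-tail term)."""
    z_a = norm.ppf(1 - alpha / 2)
    ncp = abs(beta) * np.sqrt(n) / sigma_beta
    return norm.cdf(ncp - z_a)

# Hypothetical DIF effect beta = 0.4 with sigma_beta = 2.0:
n_needed = wald_sample_size(beta=0.4, sigma_beta=2.0)
achieved = wald_power(beta=0.4, sigma_beta=2.0, n=n_needed)
```

Rounding n up means the achieved power lands at or just above the 0.80 target.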
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
Zwick, Rebecca; Ye, Lei; Isham, Steven – ETS Research Report Series, 2013
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. Although it is often assumed that refinement of the matching criterion always provides more accurate DIF results, the actual situation proves to be more complex. To explore the effectiveness of refinement, we…
Descriptors: Test Bias, Statistical Analysis, Simulation, Educational Testing
Wang, Wei; Tay, Louis; Drasgow, Fritz – Applied Psychological Measurement, 2013
There has been growing use of ideal point models to develop scales measuring important psychological constructs. For meaningful comparisons across groups, it is important to identify items on such scales that exhibit differential item functioning (DIF). In this study, the authors examined several methods for assessing DIF on polytomous items…
Descriptors: Test Bias, Effect Size, Item Response Theory, Statistical Analysis
Kim, Jihye; Oshima, T. C. – Educational and Psychological Measurement, 2013
In a typical differential item functioning (DIF) analysis, a significance test is conducted for each item. As a test consists of multiple items, such multiple testing may increase the possibility of making a Type I error at least once. The goal of this study was to investigate how to control a Type I error rate and power using adjustment…
Descriptors: Test Bias, Test Items, Statistical Analysis, Error of Measurement
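Two of the standard adjustment procedures such a study compares, Bonferroni (family-wise error control) and Benjamini-Hochberg (false discovery rate control), can be sketched as follows (the p-values are a made-up example; the abstract does not specify which adjustments were studied):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0_i iff p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals)
    return p <= alpha / len(p)

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the hypotheses with the k smallest p-values, where k is the
    largest index with p_(k) <= (k/m) * alpha (controls the FDR)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

# Ten items' DIF p-values: a few small effects among mostly null items.
pvals = [0.001, 0.004, 0.012, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
n_bonf = int(bonferroni(pvals).sum())
n_bh = int(benjamini_hochberg(pvals).sum())
```

On these p-values Bonferroni flags two items while Benjamini-Hochberg flags three, illustrating the familiar trade-off: stricter Type I error control buys lower power, which is exactly the balance the study investigates.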
Atalay Kabasakal, Kübra; Arsan, Nihan; Gök, Bilge; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2014
This simulation study compared the performances (Type I error and power) of Mantel-Haenszel (MH), SIBTEST, and item response theory-likelihood ratio (IRT-LR) methods under certain conditions. Manipulated factors were sample size, ability differences between groups, test length, the percentage of differential item functioning (DIF), and underlying…
Descriptors: Comparative Analysis, Item Response Theory, Statistical Analysis, Test Bias
Woods, Carol M.; Cai, Li; Wang, Mian – Educational and Psychological Measurement, 2013
Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ² Wald test for…
Descriptors: Test Bias, Item Response Theory, Computation, Comparative Analysis
Beretvas, S. Natasha; Walker, Cindy M. – Educational and Psychological Measurement, 2012
This study extends the multilevel measurement model to handle testlet-based dependencies. A flexible two-level testlet response model (the MMMT-2 model) for dichotomous items is introduced that permits assessment of differential testlet functioning (DTLF). A distinction is made between this study's conceptualization of DTLF and that of…
Descriptors: Test Bias, Simulation, Test Items, Item Response Theory