ERIC - Search Results

Showing 1 to 15 of 19 results

Detecting Differential Item Functioning: Item Response Theory Methods versus the Mantel-Haenszel Procedure

Peer reviewed

Diaz, Emily; Brooks, Gordon; Johanson, George – International Journal of Assessment Tools in Education, 2021

This Monte Carlo study assessed Type I error in differential item functioning analyses using Lord's chi-square (LC), Likelihood Ratio Test (LRT), and Mantel-Haenszel (MH) procedure. Two research interests were investigated: item response theory (IRT) model specification in LC and the LRT and continuity correction in the MH procedure. This study…

Descriptors: Test Bias, Item Response Theory, Statistical Analysis, Comparative Analysis
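
As a reader's aid for the abstract above, here is a minimal sketch of how a continuity correction enters the Mantel-Haenszel chi-square. It uses the standard textbook form of the statistic, not code or data from the Diaz, Brooks, and Johanson study. The function name `mantel_haenszel`, the `(a, b, c, d)` table layout, and the example counts are all assumptions made for illustration.

```python
def mantel_haenszel(tables, continuity=True):
    """Mantel-Haenszel chi-square and common odds ratio for one item.

    tables: one (a, b, c, d) tuple per matched score stratum, where
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    This layout is an illustrative assumption, not taken from the study.
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 examinees carries no information
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m_right, m_wrong = a + c, b + d  # correct / incorrect totals
        sum_a += a
        sum_expected += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    if sum_var == 0 or den == 0:
        raise ValueError("statistic undefined for these tables")
    diff = abs(sum_a - sum_expected)
    if continuity:
        # The continuity correction shrinks the deviation by 0.5 (floored at 0).
        diff = max(diff - 0.5, 0.0)
    chi_square = diff ** 2 / sum_var
    common_odds_ratio = num / den
    return chi_square, common_odds_ratio


# Hypothetical single-stratum example: 15/20 reference and 5/20 focal correct.
example = [(15, 5, 5, 15)]
print(mantel_haenszel(example, continuity=False))
print(mantel_haenszel(example, continuity=True))
```

For this made-up table, the uncorrected statistic is 9.75 and the corrected one is about 7.90, with a common odds ratio of 9 in both cases. The correction always lowers the chi-square, which is why it can affect Type I error rates.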

Tree-Based Global Model Tests for Polytomous Rasch Models

Peer reviewed

Komboz, Basil; Strobl, Carolin; Zeileis, Achim – Educational and Psychological Measurement, 2018

Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning and differential step functioning. However, these…

Descriptors: Item Response Theory, Models, Tests, Measurement

Standard Errors for National Trends in International Large-Scale Assessments in the Case of Cross-National Differential Item Functioning

Peer reviewed

Sachse, Karoline A.; Haag, Nicole – Applied Measurement in Education, 2017

Standard errors computed according to the operational practices of international large-scale assessment studies such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS) may be biased when cross-national differential item functioning (DIF) and item parameter drift are…

Descriptors: Error of Measurement, Test Bias, International Assessment, Computation

An NCME Instructional Module on Latent DIF Analysis Using Mixture Item Response Models

Peer reviewed

Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol – Educational Measurement: Issues and Practice, 2016

The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…

Descriptors: Test Bias, Research Methodology, Evaluation Methods, Models

Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

Peer reviewed

Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level

An Empirical Investigation of the Potential Impact of Item Misfit on Test Scores. Research Report. ETS RR-17-60

Peer reviewed

Kim, Sooyeon; Robin, Frederic – ETS Research Report Series, 2017

In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of 3 major subgroups with different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…

Descriptors: Scores, Test Items, Test Bias, International Assessment

The Impact of Model Parameterization and Estimation Methods on Tests of Measurement Invariance with Ordered Polytomous Data

Peer reviewed

Koziol, Natalie A.; Bovaird, James A. – Educational and Psychological Measurement, 2018

Evaluations of measurement invariance provide essential construct validity evidence--a prerequisite for seeking meaning in psychological and educational research and ensuring fair testing procedures in high-stakes settings. However, the quality of such evidence is partly dependent on the validity of the resulting statistical conclusions. Type I or…

Descriptors: Computation, Tests, Error of Measurement, Comparative Analysis

Rasch Mixture Models for DIF Detection: A Comparison of Old and New Score Specifications

Peer reviewed

Frick, Hannah; Strobl, Carolin; Zeileis, Achim – Educational and Psychological Measurement, 2015

Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…

Descriptors: Item Response Theory, Test Bias, Comparative Analysis, Scores

Longitudinal Multistage Testing

Peer reviewed

Pohl, Steffi – Journal of Educational Measurement, 2013

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for…

Descriptors: Adaptive Testing, Longitudinal Studies, Difficulty Level, Comparative Analysis

The Langer-Improved Wald Test for DIF Testing with Multiple Groups: Evaluation and Comparison to Two-Group IRT

Peer reviewed

Woods, Carol M.; Cai, Li; Wang, Mian – Educational and Psychological Measurement, 2013

Differential item functioning (DIF) occurs when the probability of responding in a particular category to an item differs for members of different groups who are matched on the construct being measured. The identification of DIF is important for valid measurement. This research evaluates an improved version of Lord's χ² Wald test for…

Descriptors: Test Bias, Item Response Theory, Computation, Comparative Analysis

Toward Increasing Fairness in Score Scale Calibrations Employed in International Large-Scale Assessments

Peer reviewed

Oliveri, Maria Elena; von Davier, Matthias – International Journal of Testing, 2014

In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, which often…

Descriptors: Test Bias, Scores, International Programs, Educational Assessment

A Comparison of Strategies for Estimating Conditional DIF

Peer reviewed

Moses, Tim; Miao, Jing; Dorans, Neil J. – Journal of Educational and Behavioral Statistics, 2010

In this study, the accuracies of four strategies were compared for estimating conditional differential item functioning (DIF), including raw data, logistic regression, log-linear models, and kernel smoothing. Real data simulations were used to evaluate the estimation strategies across six items, DIF and No DIF situations, and four sample size…

Descriptors: Test Bias, Statistical Analysis, Computation, Comparative Analysis

The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

Peer reviewed

Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

Descriptors: Test Format, Test Items, Responses, Computation

Estimation Methods for One-Parameter Testlet Models

Peer reviewed

Jiao, Hong; Wang, Shudong; He, Wei – Journal of Educational Measurement, 2013

This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…

Descriptors: Computation, Item Response Theory, Models, Monte Carlo Methods

A Generalized Logistic Regression Procedure to Detect Differential Item Functioning among Multiple Groups

Peer reviewed

Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul – International Journal of Testing, 2011

We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…

Descriptors: Language Skills, Identification, Foreign Countries, Evaluation Methods
