Publication Date
In 2025: 0
Since 2024: 4
Since 2021 (last 5 years): 8
Since 2016 (last 10 years): 19
Since 2006 (last 20 years): 52
Descriptor
Test Bias: 125
Test Items: 45
Models: 24
Item Response Theory: 22
Item Analysis: 20
Simulation: 18
Comparative Analysis: 17
Evaluation Methods: 17
Scores: 16
Higher Education: 15
Test Validity: 15
Source
Journal of Educational…: 125
Author
Linn, Robert L.: 4
Novick, Melvin R.: 4
Penfield, Randall D.: 4
Bolt, Daniel M.: 3
Goldman, Roy D.: 3
Kim, Sooyeon: 3
Camilli, Gregory: 2
Chase, Clinton I.: 2
Darlington, Richard B.: 2
Dorans, Neil J.: 2
Finch, W. Holmes: 2
Education Level
Secondary Education: 3
Elementary Secondary Education: 2
Higher Education: 2
Postsecondary Education: 2
Grade 4: 1
Grade 8: 1
Audience
Researchers: 3
Laws, Policies, & Programs
DeFunis v. Odegaard: 1
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
Chalmers, R. Philip – Journal of Educational Measurement, 2023
Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and…
Descriptors: Test Bias, Item Response Theory, Definitions, Monte Carlo Methods
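The signed and unsigned item difference statistics named in the abstract above are easy to compute once item characteristic curves for the reference and focal groups are in hand. A minimal sketch under the 2PL model, assuming the standard DFIT-style definitions (SIDS as the mean signed difference in response probability over a focal-group ability sample, UIDS as the mean absolute difference); all item parameters and sample sizes here are illustrative, not from the article:

```python
import numpy as np

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
theta_focal = rng.normal(size=5000)      # focal-group ability sample

# Item 1: same discrimination, focal version harder by 0.4 logits (uniform DIF).
d1 = icc(theta_focal, a=1.2, b=0.0) - icc(theta_focal, a=1.2, b=0.4)
sids1, uids1 = d1.mean(), np.abs(d1).mean()

# Item 2: equal difficulty but different discrimination (crossing, nonuniform DIF).
d2 = icc(theta_focal, a=1.5, b=0.0) - icc(theta_focal, a=0.8, b=0.0)
sids2, uids2 = d2.mean(), np.abs(d2).mean()

print(f"uniform DIF:    SIDS = {sids1:.3f}, UIDS = {uids1:.3f}")
print(f"nonuniform DIF: SIDS = {sids2:.3f}, UIDS = {uids2:.3f}")
```

For uniform DIF the two statistics coincide (the curves never cross, so every signed difference has the same sign); under crossing DIF the signed differences cancel and SIDS shrinks toward zero while UIDS does not, which is why both are reported.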
Corinne Huggins-Manley; Anthony W. Raborn; Peggy K. Jones; Ted Myers – Journal of Educational Measurement, 2024
The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the…
Descriptors: Nonparametric Statistics, Test Bias, Scores, Statistical Significance
Bolt, Daniel M.; Liao, Xiangyi – Journal of Educational Measurement, 2021
We revisit the empirically observed positive correlation between DIF and difficulty studied by Freedle and commonly seen in tests of verbal proficiency when comparing populations of different mean latent proficiency levels. It is shown that a positive correlation between DIF and difficulty estimates is actually an expected result (absent any true…
Descriptors: Test Bias, Difficulty Level, Correlation, Verbal Tests
Lanrong Li; Betsy Jane Becker – Journal of Educational Measurement, 2021
Differential bundle functioning (DBF) has been proposed to quantify the accumulated amount of differential item functioning (DIF) in an item cluster/bundle (Douglas, Roussos, and Stout). The simultaneous item bias test (SIBTEST, Shealy and Stout) has been used to test for DBF (e.g., Walker, Zhang, and Surber). Research on DBF may have the…
Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size
Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020
This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…
Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Lee, Sunbok – Journal of Educational Measurement, 2020
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare event data. The bias of ML…
Descriptors: Regression (Statistics), Test Bias, Maximum Likelihood Statistics, Simulation
Wolkowitz, Amanda A.; Wright, Keith D. – Journal of Educational Measurement, 2019
This article explores the amount of equating error at a passing score when equating scores from exams with small sample sizes. It focuses on equating using classical test theory methods: Tucker linear, Levine linear, frequency estimation, and chained equipercentile equating. Both simulation and real data studies were used in the…
Descriptors: Error Patterns, Sample Size, Test Theory, Test Bias
Shear, Benjamin R. – Journal of Educational Measurement, 2018
When contextual features of test-taking environments differentially affect item responding for different test takers and these features vary across test administrations, they may cause differential item functioning (DIF) that varies across test administrations. Because many common DIF detection methods ignore potential DIF variance, this article…
Descriptors: Test Bias, Regression (Statistics), Hierarchical Linear Modeling
Zwick, Rebecca; Ye, Lei; Isham, Steven – Journal of Educational Measurement, 2018
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
Descriptors: Test Bias, Testing, Test Items, Bayesian Statistics
Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019
Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…
Descriptors: Test Items, Test Bias, Test Construction, Scores
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory