Publication Date
In 2025: 0
Since 2024: 4
Since 2021 (last 5 years): 8
Since 2016 (last 10 years): 19
Since 2006 (last 20 years): 52
Descriptor
Test Bias: 125
Test Items: 45
Models: 24
Item Response Theory: 22
Item Analysis: 20
Simulation: 18
Comparative Analysis: 17
Evaluation Methods: 17
Scores: 16
Higher Education: 15
Test Validity: 15
Source
Journal of Educational…: 125
Author
Linn, Robert L.: 4
Novick, Melvin R.: 4
Penfield, Randall D.: 4
Bolt, Daniel M.: 3
Goldman, Roy D.: 3
Kim, Sooyeon: 3
Camilli, Gregory: 2
Chase, Clinton I.: 2
Darlington, Richard B.: 2
Dorans, Neil J.: 2
Finch, W. Holmes: 2
Education Level
Secondary Education: 3
Elementary Secondary Education: 2
Higher Education: 2
Postsecondary Education: 2
Grade 4: 1
Grade 8: 1
Audience
Researchers: 3
Laws, Policies, & Programs
DeFunis v. Odegaard: 1
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
Hwanggyu Lim; Danqi Zhu; Edison M. Choe; Kyung T. Han – Journal of Educational Measurement, 2024
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of…
Descriptors: Item Response Theory, Test Bias, Test Reliability, Test Construction
Chalmers, R. Philip – Journal of Educational Measurement, 2023
Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and…
Descriptors: Test Bias, Item Response Theory, Definitions, Monte Carlo Methods
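The signed and unsigned item difference statistics named in the abstract above are easy to compute once item characteristic curves for the reference and focal groups are in hand. A minimal sketch under the 2PL model, assuming the standard DFIT-style definitions (SIDS as the mean signed difference in response probability over a focal-group ability sample, UIDS as the mean absolute difference); all item parameters and sample sizes here are illustrative, not from the article:

```python
import numpy as np

def icc(theta, a, b):
    """2PL item characteristic curve: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

rng = np.random.default_rng(1)
theta_focal = rng.normal(size=5000)      # focal-group ability sample

# Item 1: same discrimination, focal version harder by 0.4 logits (uniform DIF).
d1 = icc(theta_focal, a=1.2, b=0.0) - icc(theta_focal, a=1.2, b=0.4)
sids1, uids1 = d1.mean(), np.abs(d1).mean()

# Item 2: equal difficulty but different discrimination (crossing, nonuniform DIF).
d2 = icc(theta_focal, a=1.5, b=0.0) - icc(theta_focal, a=0.8, b=0.0)
sids2, uids2 = d2.mean(), np.abs(d2).mean()

print(f"uniform DIF:    SIDS = {sids1:.3f}, UIDS = {uids1:.3f}")
print(f"nonuniform DIF: SIDS = {sids2:.3f}, UIDS = {uids2:.3f}")
```

For uniform DIF the two statistics coincide (the curves never cross, so every signed difference has the same sign); under crossing DIF the signed differences cancel and SIDS shrinks toward zero while UIDS does not, which is why both are reported.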
Corinne Huggins-Manley; Anthony W. Raborn; Peggy K. Jones; Ted Myers – Journal of Educational Measurement, 2024
The purpose of this study is to develop a nonparametric DIF method that (a) compares focal groups directly to the composite group that will be used to develop the reported test score scale, and (b) allows practitioners to explore for DIF related to focal groups stemming from multicategorical variables that constitute a small proportion of the…
Descriptors: Nonparametric Statistics, Test Bias, Scores, Statistical Significance
Bolt, Daniel M.; Liao, Xiangyi – Journal of Educational Measurement, 2021
We revisit the empirically observed positive correlation between DIF and difficulty studied by Freedle and commonly seen in tests of verbal proficiency when comparing populations of different mean latent proficiency levels. It is shown that a positive correlation between DIF and difficulty estimates is actually an expected result (absent any true…
Descriptors: Test Bias, Difficulty Level, Correlation, Verbal Tests
Lanrong Li; Betsy Jane Becker – Journal of Educational Measurement, 2021
Differential bundle functioning (DBF) has been proposed to quantify the accumulated amount of differential item functioning (DIF) in an item cluster/bundle (Douglas, Roussos, and Stout). The simultaneous item bias test (SIBTEST, Shealy and Stout) has been used to test for DBF (e.g., Walker, Zhang, and Surber). Research on DBF may have the…
Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size
Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020
This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…
Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis
Carmen Köhler; Lale Khorramdel; Artur Pokropek; Johannes Hartig – Journal of Educational Measurement, 2024
For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The…
Descriptors: Measures (Individuals), Test Bias, Models, Item Response Theory
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Lee, Sunbok – Journal of Educational Measurement, 2020
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare event data. The bias of ML…
Descriptors: Regression (Statistics), Test Bias, Maximum Likelihood Statistics, Simulation
Wolkowitz, Amanda A.; Wright, Keith D. – Journal of Educational Measurement, 2019
This article explores the amount of equating error at a passing score when equating scores from exams with small sample sizes. It focuses on equating using classical test theory methods: Tucker linear, Levine linear, frequency estimation, and chained equipercentile equating. Both simulation and real data studies were used in the…
Descriptors: Error Patterns, Sample Size, Test Theory, Test Bias
Shear, Benjamin R. – Journal of Educational Measurement, 2018
When contextual features of test-taking environments differentially affect item responding for different test takers and these features vary across test administrations, they may cause differential item functioning (DIF) that varies across test administrations. Because many common DIF detection methods ignore potential DIF variance, this article…
Descriptors: Test Bias, Regression (Statistics), Hierarchical Linear Modeling
Zwick, Rebecca; Ye, Lei; Isham, Steven – Journal of Educational Measurement, 2018
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
Descriptors: Test Bias, Testing, Test Items, Bayesian Statistics
Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019
Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…
Descriptors: Test Items, Test Bias, Test Construction, Scores
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory