ERIC - Search Results

Showing 1 to 15 of 19 results

The Multidimensionality of Measurement Bias in High-Stakes Testing: Using Machine Learning to Evaluate Complex Sources of Differential Item Functioning

Peer reviewed

Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023

Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…

Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items

Adjusting for Ability Differences of Equating Samples When Randomization Is Suboptimal

Peer reviewed

Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022

Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…

Descriptors: Ability, Tests, Equated Scores, Testing Problems

Assessing the Impact of a Test Question: Evidence from the "Underground Railroad" Controversy

Peer reviewed

Dee, Thomas S.; Domingue, Benjamin W. – Educational Measurement: Issues and Practice, 2021

On the second day of a 2019 high-stakes English Language Arts assessment, Massachusetts 10th graders faced an essay question that was based on a passage from the novel "The Underground Railroad" and publicly characterized as racially insensitive. Though the state excluded the essay responses from student scores, an unresolved public…

Descriptors: High School Students, Grade 10, Language Arts, High Stakes Tests

Towards an Integrated Framework of Bias in Noncognitive Assessment in International Large-Scale Studies: Challenges and Prospects

Peer reviewed

van de Vijver, Fons J. R. – Educational Measurement: Issues and Practice, 2018

A conceptual framework of measurement bias in cross-cultural comparisons, distinguishing between construct, method, and item bias (differential item functioning), is used to describe a methodological framework addressing assessment of noncognitive variables in international large-scale studies. It is argued that the treatment of bias, coming from…

Descriptors: Educational Assessment, Achievement Tests, Foreign Countries, International Assessment

Measuring Widening Proficiency Differences in International Assessments: Are Current Approaches Enough?

Peer reviewed

Rutkowski, David; Rutkowski, Leslie; Liaw, Yuan-Ling – Educational Measurement: Issues and Practice, 2018

Participation in international large-scale assessments has grown over time with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To help accommodate for large achievement differences among participants, in 2009 PISA offered…

Descriptors: Educational Assessment, Foreign Countries, Achievement Tests, Secondary School Students

Five Methods for Estimating Angoff Cut Scores with IRT

Peer reviewed

Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017

This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a posteriori (EAP), modal a posteriori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…

Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics

A Synthesis of the Peer-Reviewed Differential Bundle Functioning Research

Peer reviewed

Banks, Kathleen – Educational Measurement: Issues and Practice, 2013

The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research that has been conducted to date. A total of 16 studies were synthesized according to the following characteristics: tests used and learner groups, organizing principles used for developing bundles, DBF detection methods used,…

Descriptors: Test Bias, Research, Tests, Student Characteristics

The Contestant Perspective on Taking Tests: Emanations from the Statue within

Peer reviewed

Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012

Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…

Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability

First Language of Test Takers and Fairness Assessment Procedures

Peer reviewed

Sinharay, Sandip; Dorans, Neil J.; Liang, Longjuan – Educational Measurement: Issues and Practice, 2011

Over the past few decades, those who take tests in the United States have exhibited increasing diversity with respect to native language. Standard psychometric procedures for ensuring item and test fairness that have existed for some time were developed when test-taking groups were predominantly native English speakers. A better understanding of…

Descriptors: Test Bias, Testing Programs, Psychometrics, Language Proficiency

An NCME Instructional Module on Using Differential Step Functioning to Refine the Analysis of DIF in Polytomous Items

Peer reviewed

Penfield, Randall D.; Gattamorta, Karina; Childs, Ruth A. – Educational Measurement: Issues and Practice, 2009

Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item-level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has…

Descriptors: Test Bias, Test Items, Evaluation Methods, Scores

Raju's Differential Functioning of Items and Tests (DFIT)

Peer reviewed

Oshima, T. C.; Morris, S. B. – Educational Measurement: Issues and Practice, 2008

Nambury S. Raju (1937-2005) developed two model-based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach…

Descriptors: Test Bias, Psychometrics, Methods, Test Items

Differentials of a State Reading Assessment: Item Functioning, Distractor Functioning, and Omission Frequency for Disability Categories

Peer reviewed

Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2009

Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…

Descriptors: Learning Disabilities, Language Impairments, Behavior Disorders, Affective Behavior

Screening for Potentially Biased Items in Testing Programs

Peer reviewed

Hills, John R. – Educational Measurement: Issues and Practice, 1989

Test bias detection methods based on item response theory (IRT) are reviewed. Five such methods are commonly used: (1) equality of item parameters; (2) area between item characteristic curves; (3) sums of squares; (4) pseudo-IRT; and (5) one-parameter-IRT. A table compares these and six newer or less tested methods. (SLD)

Descriptors: Item Analysis, Test Bias, Test Items, Testing Programs

The Golden Rule Bias Reduction Principle: A Practical Reform

Peer reviewed

Weiss, John – Educational Measurement: Issues and Practice, 1987

Differences in test scores can be attributed to various causes, including genuine knowledge differences, test-taking abilities, and irrelevant and biased questions. The Golden Rule reform is a safeguard to ensure that standardized tests measure relevant knowledge differences between test takers and not irrelevant, culturally specific factors. (JAZ)

Descriptors: Culture Fair Tests, Minority Groups, Standardized Tests, Standards

Using Dimensionality-Based DIF Analyses to Identify and Interpret Constructs That Elicit Group Differences

Peer reviewed

Gierl, Mark J. – Educational Measurement: Issues and Practice, 2005

In this paper I describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis paradigm, with emphasis on its implication for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a…

Descriptors: Models, Test Items, Test Bias, Statistical Analysis
