Showing 1 to 15 of 49 results
Peer reviewed
Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023
Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…
Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items
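To make the problem concrete, here is a minimal sketch of logistic-regression DIF probed across two background variables jointly, in the spirit of the intersectional bias the abstract raises. The data, the two grouping variables, and the single-item setup are all invented for illustration; this is one standard screening approach, not Belzak's method.

```python
# Simulated logistic-regression DIF check across two variables at once.
# The item is biased only for the g1 = 1 AND g2 = 1 intersection, which a
# single-variable DIF analysis would tend to miss.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 2000
total = rng.normal(0, 1, n)            # stand-in for the matching criterion
g1 = rng.integers(0, 2, n)             # e.g., gender
g2 = rng.integers(0, 2, n)             # e.g., age group
logit = 0.8 * total - 0.5 - 0.7 * (g1 * g2)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X0 = sm.add_constant(np.column_stack([total]))                   # no DIF terms
X1 = sm.add_constant(np.column_stack([total, g1, g2, g1 * g2]))  # mains + interaction

ll0 = sm.Logit(y, X0).fit(disp=0).llf
ll1 = sm.Logit(y, X1).fit(disp=0).llf
lr = 2 * (ll1 - ll0)                   # likelihood-ratio test of all DIF terms
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=3):.4f}")
```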
Peer reviewed
Jung Yeon Park; Sean Joo; Zikun Li; Hyejin Yoon – Educational Measurement: Issues and Practice, 2025
This study examines potential assessment bias based on students' primary language status in PISA 2018. Specifically, multilingual (MLs) and nonmultilingual (non-MLs) students in the United States are compared with regard to their response time as well as scored responses across three cognitive domains (reading, mathematics, and science).…
Descriptors: Achievement Tests, Secondary School Students, International Assessment, Test Bias
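As a toy illustration of the kind of group contrast described, the sketch below computes a standardized mean difference for response time and score between two simulated language-status groups. The actual PISA 2018 analysis involves sampling weights, plausible values, and domain-specific modeling that this deliberately omits.

```python
# Standardized mean differences (Cohen's d) for response time and score
# between simulated multilingual (ML) and non-ML groups.
import numpy as np

def cohens_d(a, b):
    """Mean difference scaled by the pooled standard deviation."""
    na, nb = len(a), len(b)
    sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                 / (na + nb - 2))
    return (a.mean() - b.mean()) / sp

rng = np.random.default_rng(1)
rt_ml, rt_non = rng.lognormal(4.1, 0.4, 500), rng.lognormal(4.0, 0.4, 800)
sc_ml, sc_non = rng.normal(480, 90, 500), rng.normal(500, 90, 800)

print(f"response time d = {cohens_d(rt_ml, rt_non):+.2f}")
print(f"score d         = {cohens_d(sc_ml, sc_non):+.2f}")
```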
Peer reviewed
Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022
Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…
Descriptors: Ability, Tests, Equated Scores, Testing Problems
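A minimal sketch of the setting the abstract problematizes: linear equating in a random-groups design. With no common items, the method must assume the two samples are equivalent, so any true group difference leaks into the conversion as bias. The data are simulated, not the study's operational forms.

```python
# Linear (mean-sigma) equating of Form Y onto the Form X scale.
import numpy as np

def linear_equate(y_scores, x_scores):
    """Map Form-Y scores to the Form-X scale by matching mean and SD."""
    a = x_scores.std(ddof=1) / y_scores.std(ddof=1)
    b = x_scores.mean() - a * y_scores.mean()
    return lambda y: a * y + b

rng = np.random.default_rng(2)
form_x = rng.normal(30, 6, 3000)       # group taking Form X
form_y = rng.normal(29, 6, 3000)       # this group is less able AND the form is
                                       # easier; without common items the design
                                       # cannot separate the two effects
to_x = linear_equate(form_y, form_x)
print(f"Form-Y raw 29 -> Form-X scale {to_x(29):.1f}")
```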
Peer reviewed
Vo, Thao T.; French, Brian F. – Educational Measurement: Issues and Practice, 2021
The use and interpretation of educational and psychological test scores are paramount to individual outcomes and opportunities. Methods for detecting differential item functioning (DIF) are imperative for item analysis when developing and revising assessments, particularly as it pertains to fairness across populations, languages, and cultures. We…
Descriptors: Risk Assessment, Needs Assessment, Test Bias, Youth
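For context, here is a minimal sketch of one of the standard DIF detection methods the abstract refers to: the Mantel-Haenszel procedure, which stratifies examinees on a matching criterion and compares item performance of reference and focal groups within strata. Data and stratification are simulated and cruder than operational practice.

```python
# Mantel-Haenszel common odds ratio for a single dichotomous item.
import numpy as np

def mantel_haenszel_or(item, group, strata):
    """MH odds ratio; group = 1 reference, 0 focal; item scored True/False."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum(item[m] & (group[m] == 1))    # reference correct
        b = np.sum(~item[m] & (group[m] == 1))   # reference incorrect
        c = np.sum(item[m] & (group[m] == 0))    # focal correct
        d = np.sum(~item[m] & (group[m] == 0))   # focal incorrect
        t = a + b + c + d
        if t:
            num += a * d / t
            den += b * c / t
    return num / den

rng = np.random.default_rng(3)
n = 4000
group = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n)
strata = np.digitize(theta, [-1, 0, 1])              # crude ability strata
p = 1 / (1 + np.exp(-(theta - 0.3 * (group == 0))))  # item harder for focal group
item = rng.random(n) < p

or_mh = mantel_haenszel_or(item, group, strata)
print(f"MH odds ratio = {or_mh:.2f}  (ETS delta = {-2.35 * np.log(or_mh):+.2f})")
```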
Peer reviewed
Xuelan Qiu; Jimmy de la Torre; You-Gan Wang; Jinran Wu – Educational Measurement: Issues and Practice, 2024
Multidimensional forced-choice (MFC) items have been found to be useful to reduce response biases in personality assessments. However, conventional scoring methods for the MFC items result in ipsative data, hindering the wider applications of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed,…
Descriptors: Item Response Theory, Personality Traits, Personality Measures, Personality Assessment
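A quick demonstration of why conventional scoring of MFC blocks yields ipsative data, the problem the abstract says IRT models were developed to overcome: rank-based trait scores sum to the same constant for every respondent, so they only compare traits within a person. The block structure and ranks below are invented, not a real instrument.

```python
# Rank-scoring forced-choice blocks produces constant person totals.
import numpy as np

rng = np.random.default_rng(4)
n_people, n_blocks, block_size = 5, 10, 3   # each block ranks one statement
                                            # per trait
# ranks[i, j] is respondent i's ranking (a permutation of 1..3) in block j
ranks = np.array([[rng.permutation(block_size) + 1 for _ in range(n_blocks)]
                  for _ in range(n_people)])

trait_scores = ranks.sum(axis=1)            # sum ranks per trait (column)
print(trait_scores)
print("row sums:", trait_scores.sum(axis=1))  # identical for everyone -> ipsative
```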
Peer reviewed
Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024
Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…
Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement
Peer reviewed
Leighton, Jacqueline P.; Lehman, Blair – Educational Measurement: Issues and Practice, 2020
In this digital ITEMS module, Dr. Jacqueline Leighton and Dr. Blair Lehman review differences between think-aloud interviews to measure problem-solving processes and cognitive labs to measure comprehension processes. Learners are introduced to historical, theoretical, and procedural differences between these methods and how to use and analyze…
Descriptors: Protocol Analysis, Interviews, Problem Solving, Cognitive Processes
Peer reviewed
Jones, Andrew T.; Kopp, Jason P.; Ong, Thai Q. – Educational Measurement: Issues and Practice, 2020
Studies investigating invariance have often been limited to measurement or prediction invariance. Selection invariance, wherein the use of test scores for classification results in equivalent classification accuracy between groups, has received comparatively little attention in the psychometric literature. Previous research suggests that some form…
Descriptors: Test Construction, Test Bias, Classification, Accuracy
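A minimal sketch of selection invariance as the abstract frames it: does a cut score classify with equal accuracy across groups? Here true status is known by construction because everything is simulated; real studies must infer it through a psychometric model.

```python
# Classification accuracy of a cut score, compared across two groups.
import numpy as np

rng = np.random.default_rng(5)
n = 10000
group = rng.integers(0, 2, n)
true_theta = rng.normal(0.2 * group, 1.0)        # groups differ in ability
observed = true_theta + rng.normal(0, 0.6, n)    # unreliable test score
cut = 0.0
truly_above = true_theta >= cut
selected = observed >= cut

for g in (0, 1):
    m = group == g
    acc = np.mean(selected[m] == truly_above[m])
    sens = np.mean(selected[m & truly_above])    # correctly selected among qualified
    print(f"group {g}: accuracy {acc:.3f}, sensitivity {sens:.3f}")
```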
Peer reviewed
Dee, Thomas S.; Domingue, Benjamin W. – Educational Measurement: Issues and Practice, 2021
On the second day of a 2019 high-stakes English Language Arts assessment, Massachusetts 10th graders faced an essay question that was based on a passage from the novel "The Underground Railroad" and publicly characterized as racially insensitive. Though the state excluded the essay responses from student scores, an unresolved public…
Descriptors: High School Students, Grade 10, Language Arts, High Stakes Tests
Peer reviewed
Vijver, Fons J. R. – Educational Measurement: Issues and Practice, 2018
A conceptual framework of measurement bias in cross-cultural comparisons, distinguishing between construct, method, and item bias (differential item functioning), is used to describe a methodological framework addressing assessment of noncognitive variables in international large-scale studies. It is argued that the treatment of bias, coming from…
Descriptors: Educational Assessment, Achievement Tests, Foreign Countries, International Assessment
Peer reviewed
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol – Educational Measurement: Issues and Practice, 2016
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Descriptors: Test Bias, Research Methodology, Evaluation Methods, Models
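A toy illustration of the idea behind the module: item profiles can differ across latent classes rather than observed groups. Below, the class labels are known by construction; a real mixture IRT analysis estimates class membership and item parameters jointly (e.g., via EM), which this sketch does not attempt.

```python
# Item p-value profiles diverge across two simulated latent classes.
import numpy as np

rng = np.random.default_rng(6)
n, n_items = 2000, 6
cls = rng.integers(0, 2, n)                       # latent class (unobserved in practice)
theta = rng.normal(0, 1, n)
b = np.tile(np.linspace(-1, 1, n_items), (2, 1))  # class-specific difficulties
b[1, 2] += 0.8                                    # item 3 is harder in class 1
p = 1 / (1 + np.exp(-(theta[:, None] - b[cls])))
resp = rng.random((n, n_items)) < p

for c in (0, 1):
    profile = resp[cls == c].mean(axis=0)         # class-wise proportion correct
    print(f"class {c} p-values:", np.round(profile, 2))
```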
Peer reviewed
Rutkowski, David; Rutkowski, Leslie; Liaw, Yuan-Ling – Educational Measurement: Issues and Practice, 2018
Participation in international large-scale assessments has grown over time with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To help accommodate for large achievement differences among participants, in 2009 PISA offered…
Descriptors: Educational Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Peer reviewed
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics
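A minimal sketch of the approach the abstract calls the most common: sum the panelists' Angoff item ratings and find the theta at which the test characteristic curve (TCC) equals that sum. The 2PL item parameters and ratings are invented; the article's ML, EAP, MAP, and WML estimators are not shown here.

```python
# Angoff cut score via the test characteristic curve under a 2PL model.
import numpy as np

a = np.array([1.0, 0.8, 1.2, 1.5, 0.9])             # discriminations
b = np.array([-0.5, 0.0, 0.3, 0.8, 1.2])            # difficulties
ratings = np.array([0.70, 0.60, 0.55, 0.40, 0.35])  # Angoff probabilities

def tcc(theta):
    """Expected raw score at a given theta."""
    return np.sum(1 / (1 + np.exp(-a * (theta - b))))

target = ratings.sum()          # expected score of the minimally competent examinee
grid = np.linspace(-4, 4, 8001)
theta_cut = grid[np.argmin([abs(tcc(t) - target) for t in grid])]
print(f"target raw score {target:.2f} -> theta cut {theta_cut:.2f}")
```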
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017
Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains
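A rough empirical sketch of student growth percentiles (SGPs) and a teacher-level median (MGP): each student's current score is ranked among students with similar prior scores. Operational SGPs are estimated with quantile regression; this binned version only conveys the idea, with simulated measurement error in the prior score echoing the abstract's concern.

```python
# Binned approximation to SGPs, aggregated to a teacher-level median (MGP).
import numpy as np

rng = np.random.default_rng(7)
n = 5000
prior_true = rng.normal(0, 1, n)
prior = prior_true + rng.normal(0, 0.4, n)           # prior score with ME
current = 0.7 * prior_true + rng.normal(0, 0.7, n)   # current score
teacher = rng.integers(0, 10, n)

bins = np.digitize(prior, np.quantile(prior, np.linspace(0.1, 0.9, 9)))
sgp = np.empty(n)
for b_ in np.unique(bins):
    m = bins == b_
    # percentile rank of current score within the prior-score bin
    sgp[m] = 100.0 * (np.argsort(np.argsort(current[m])) + 0.5) / m.sum()

mgp = [np.median(sgp[teacher == t]) for t in range(10)]
print(np.round(mgp, 1))
```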
Peer reviewed
Huggins, Anne C.; Penfield, Randall D. – Educational Measurement: Issues and Practice, 2012
A goal for any linking or equating of two or more tests is that the linking function be invariant to the population used in conducting the linking or equating. Violations of population invariance in linking and equating jeopardize the fairness and validity of test scores, and pose particular problems for test-based accountability programs that…
Descriptors: Equated Scores, Tests, Test Bias, Validity
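A minimal sketch of checking population invariance with an RMSD-style index: a linking function is estimated in each subpopulation, and its disagreement with the total-group function is summarized across score points, weighted by subgroup proportions and scaled by the score SD. Scores are simulated; the article's setting and indices are richer than this.

```python
# RMSD index of population invariance for a linear linking.
import numpy as np

def linear_link(y, x):
    a = x.std(ddof=1) / y.std(ddof=1)
    return lambda s: a * (s - y.mean()) + x.mean()

rng = np.random.default_rng(8)
groups = {"A": 0.6, "B": 0.4}                     # subpopulation proportions
y = {"A": rng.normal(50, 10, 3000), "B": rng.normal(46, 10, 2000)}
x = {"A": rng.normal(52, 9, 3000), "B": rng.normal(47, 9, 2000)}

y_all = np.concatenate(list(y.values()))
x_all = np.concatenate(list(x.values()))
overall = linear_link(y_all, x_all)               # total-group linking

score_points = np.linspace(30, 70, 41)
rmsd = np.sqrt(sum(w * (linear_link(y[g], x[g])(score_points)
                        - overall(score_points)) ** 2
                   for g, w in groups.items())) / x_all.std(ddof=1)
print(f"max RMSD over score range: {rmsd.max():.3f}")
```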