Belzak, William C. M. – Educational Measurement: Issues and Practice, 2023
Test developers and psychometricians have historically examined measurement bias and differential item functioning (DIF) across a single categorical variable (e.g., gender), independently of other variables (e.g., race, age, etc.). This is problematic when more complex forms of measurement bias may adversely affect test responses and, ultimately,…
Descriptors: Test Bias, High Stakes Tests, Artificial Intelligence, Test Items
Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022
Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…
Descriptors: Ability, Tests, Equated Scores, Testing Problems
Dee, Thomas S.; Domingue, Benjamin W. – Educational Measurement: Issues and Practice, 2021
On the second day of a 2019 high-stakes English Language Arts assessment, Massachusetts 10th graders faced an essay question that was based on a passage from the novel "The Underground Railroad" and publicly characterized as racially insensitive. Though the state excluded the essay responses from student scores, an unresolved public…
Descriptors: High School Students, Grade 10, Language Arts, High Stakes Tests
Vijver, Fons J. R. – Educational Measurement: Issues and Practice, 2018
A conceptual framework of measurement bias in cross-cultural comparisons, distinguishing between construct, method, and item bias (differential item functioning), is used to describe a methodological framework addressing assessment of noncognitive variables in international large-scale studies. It is argued that the treatment of bias, coming from…
Descriptors: Educational Assessment, Achievement Tests, Foreign Countries, International Assessment
Rutkowski, David; Rutkowski, Leslie; Liaw, Yuan-Ling – Educational Measurement: Issues and Practice, 2018
Participation in international large-scale assessments has grown over time with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To help accommodate for large achievement differences among participants, in 2009 PISA offered…
Descriptors: Educational Assessment, Foreign Countries, Achievement Tests, Secondary School Students
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics
Banks, Kathleen – Educational Measurement: Issues and Practice, 2013
The purpose of this article was to present a synthesis of the peer-reviewed differential bundle functioning (DBF) research that has been conducted to date. A total of 16 studies were synthesized according to the following characteristics: tests used and learner groups, organizing principles used for developing bundles, DBF detection methods used,…
Descriptors: Test Bias, Research, Tests, Student Characteristics
Dorans, Neil J. – Educational Measurement: Issues and Practice, 2012
Views on testing--its purpose and uses and how its data are analyzed--are related to one's perspective on test takers. Test takers can be viewed as learners, examinees, or contestants. I briefly discuss the perspective of test takers as learners. I maintain that much of psychometrics views test takers as examinees. I discuss test takers as a…
Descriptors: Testing, Test Theory, Item Response Theory, Test Reliability
Sinharay, Sandip; Dorans, Neil J.; Liang, Longjuan – Educational Measurement: Issues and Practice, 2011
Over the past few decades, those who take tests in the United States have exhibited increasing diversity with respect to native language. Standard psychometric procedures for ensuring item and test fairness that have existed for some time were developed when test-taking groups were predominantly native English speakers. A better understanding of…
Descriptors: Test Bias, Testing Programs, Psychometrics, Language Proficiency
Penfield, Randall D.; Gattamorta, Karina; Childs, Ruth A. – Educational Measurement: Issues and Practice, 2009
Traditional methods for examining differential item functioning (DIF) in polytomously scored test items yield a single item-level index of DIF and thus provide no information concerning which score levels are implicated in the DIF effect. To address this limitation of DIF methodology, the framework of differential step functioning (DSF) has…
Descriptors: Test Bias, Test Items, Evaluation Methods, Scores
Oshima, T. C.; Morris, S. B. – Educational Measurement: Issues and Practice, 2008
Nambury S. Raju (1937-2005) developed two model-based indices for differential item functioning (DIF) during his prolific career in psychometrics. Both methods, Raju's area measures (Raju, 1988) and Raju's DFIT (Raju, van der Linden, & Fleer, 1995), are based on quantifying the gap between item characteristic functions (ICFs). This approach…
Descriptors: Test Bias, Psychometrics, Methods, Test Items
Kato, Kentaro; Moen, Ross E.; Thurlow, Martha L. – Educational Measurement: Issues and Practice, 2009
Large data sets from a state reading assessment for third and fifth graders were analyzed to examine differential item functioning (DIF), differential distractor functioning (DDF), and differential omission frequency (DOF) between students with particular categories of disabilities (speech/language impairments, learning disabilities, and emotional…
Descriptors: Learning Disabilities, Language Impairments, Behavior Disorders, Affective Behavior

Hills, John R. – Educational Measurement: Issues and Practice, 1989
Test bias detection methods based on item response theory (IRT) are reviewed. Five such methods are commonly used: (1) equality of item parameters; (2) area between item characteristic curves; (3) sums of squares; (4) pseudo-IRT; and (5) one-parameter-IRT. A table compares these and six newer or less tested methods. (SLD)
Descriptors: Item Analysis, Test Bias, Test Items, Testing Programs

Weiss, John – Educational Measurement: Issues and Practice, 1987
Differences in test scores can be attributed to various causes, including genuine knowledge differences, test-taking abilities, and irrelevant and biased questions. The Golden Rule reform is a safeguard to ensure that standardized tests measure relevant knowledge differences between test takers and not irrelevant, culturally specific factors. (JAZ)
Descriptors: Culture Fair Tests, Minority Groups, Standardized Tests, Standards
Gierl, Mark J. – Educational Measurement: Issues and Practice, 2005
In this paper I describe and illustrate the Roussos-Stout (1996) multidimensionality-based DIF analysis paradigm, with emphasis on its implication for the selection of a matching and studied subtest for DIF analyses. Standard DIF practice encourages an exploratory search for matching subtest items based on purely statistical criteria, such as a…
Descriptors: Models, Test Items, Test Bias, Statistical Analysis