Russell, Michael; Szendey, Olivia; Li, Zhushan – Educational Assessment, 2022
Recent research provides evidence that an intersectional approach to defining reference and focal groups results in a higher percentage of comparisons flagged for potential DIF. The study presented here examined the generalizability of this pattern across methods for examining DIF. While the level of DIF detection differed among the four methods…
Descriptors: Comparative Analysis, Item Analysis, Test Items, Test Construction
Cho, Sun-Joo; Suh, Youngsuk; Lee, Woo-yeol – Educational Measurement: Issues and Practice, 2016
The purpose of this ITEMS module is to provide an introduction to differential item functioning (DIF) analysis using mixture item response models. The mixture item response models for DIF analysis involve comparing item profiles across latent groups, instead of manifest groups. First, an overview of DIF analysis based on latent groups, called…
Descriptors: Test Bias, Research Methodology, Evaluation Methods, Models
Wyse, Adam E. – Educational Measurement: Issues and Practice, 2017
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Descriptors: Cutting Scores, Item Response Theory, Bayesian Statistics, Maximum Likelihood Statistics
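The "translating ratings through the test characteristic curve" approach mentioned in the abstract can be sketched as follows: panelists' Angoff item ratings are summed to an expected raw score for a borderline examinee, and the ability value at which the test characteristic curve (TCC) equals that sum is taken as the cut. A minimal 2PL sketch; the item parameters and ratings below are made-up illustrations, not values from the article.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected raw score at ability theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def angoff_theta_cut(items, ratings, lo=-6.0, hi=6.0, tol=1e-8):
    """Find the theta where the TCC equals the summed Angoff ratings.

    The TCC is monotone increasing in theta (all a > 0), so simple
    bisection suffices.
    """
    target = sum(ratings)  # panelists' expected raw score for a borderline examinee
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical 2PL parameters (a, b) and Angoff ratings for a 4-item test
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.4), (1.0, 1.1)]
ratings = [0.8, 0.6, 0.5, 0.4]
theta_cut = angoff_theta_cut(items, ratings)
```

The theta cut found this way can then be converted to a reported scale score; the ML, EAP, MAP, and WML estimators the article compares differ in how examinee ability itself is estimated, not in this TCC inversion step.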
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Descriptors: Test Bias, Models, Simulation, Error Patterns
Liu, Yan; Zumbo, Bruno D.; Gustafson, Paul; Huang, Yi; Kroc, Edward; Wu, Amery D. – Practical Assessment, Research & Evaluation, 2016
A variety of differential item functioning (DIF) methods have been proposed and used for ensuring that a test is fair to all test takers in a target population in the situations of, for example, a test being translated to other languages. However, once a method flags an item as DIF, it is difficult to conclude that the grouping variable (e.g.,…
Descriptors: Test Items, Test Bias, Probability, Scores
Reardon, Sean F.; Kalogrides, Demetra; Ho, Andrew D. – Stanford Center for Education Policy Analysis, 2017
There is no comprehensive database of U.S. district-level test scores that is comparable across states. We describe and evaluate a method for constructing such a database. First, we estimate linear, reliability-adjusted linking transformations from state test score scales to the scale of the National Assessment of Educational Progress (NAEP). We…
Descriptors: School Districts, Scores, Statistical Distributions, Database Design
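The linking step the abstract describes can be sketched generically: estimate a linear transformation that matches a state scale's mean and reliability-adjusted (true-score) standard deviation to the reference scale. The disattenuation shown here (shrinking the observed SD by the square root of reliability) is a standard textbook device, not necessarily the authors' exact estimator, and all scores and parameters below are invented for illustration.

```python
import statistics

def linear_link(state_scores, state_reliability, ref_mean, ref_sd):
    """Map state-scale scores onto a reference scale with a linear
    transformation that matches the mean and the reliability-adjusted
    (true-score) standard deviation. Generic sketch only."""
    m = statistics.mean(state_scores)
    s = statistics.stdev(state_scores)
    # True-score SD = observed SD * sqrt(reliability): observed variance
    # is inflated by measurement error, so deflate before matching.
    s_true = s * state_reliability ** 0.5
    slope = ref_sd / s_true
    intercept = ref_mean - slope * m
    return [slope * x + intercept for x in state_scores]

# Hypothetical district means on a state scale, linked to a NAEP-like scale
state = [310.0, 325.0, 340.0, 355.0, 370.0]
linked = linear_link(state, state_reliability=0.9, ref_mean=250.0, ref_sd=35.0)
```

Note that the linked scores' observed SD ends up slightly larger than the reference SD, since the transformation targets the error-free (true-score) spread.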
Liu, Jinghua; Zu, Jiyun; Curley, Edward; Carey, Jill – ETS Research Report Series, 2014
The purpose of this study is to investigate the impact of discrete anchor items versus passage-based anchor items on observed score equating using empirical data. This study compares an "SAT"® critical reading anchor that contains proportionally more discrete items, relative to the total tests to be equated, to another anchor that…
Descriptors: Equated Scores, Test Items, College Entrance Examinations, Comparative Analysis
Sachse, Karoline A.; Roppelt, Alexander; Haag, Nicole – Journal of Educational Measurement, 2016
Trend estimation in international comparative large-scale assessments relies on measurement invariance between countries. However, cross-national differential item functioning (DIF) has been repeatedly documented. We ran a simulation study using national item parameters, which required trends to be computed separately for each country, to compare…
Descriptors: Comparative Analysis, Measurement, Test Bias, Simulation
Seybert, Jacob; Stark, Stephen – Applied Psychological Measurement, 2012
A Monte Carlo study was conducted to examine the accuracy of differential item functioning (DIF) detection using the differential functioning of items and tests (DFIT) method. Specifically, the performance of DFIT was compared using "testwide" critical values suggested by Flowers, Oshima, and Raju, based on simulations involving large numbers of…
Descriptors: Test Bias, Monte Carlo Methods, Form Classes (Languages), Simulation
Roth, Wolff-Michael; Oliveri, Maria Elena; Sandilands, Debra Dallie; Lyons-Thomas, Juliette; Ercikan, Kadriye – International Journal of Science Education, 2013
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is to examine the presence of DIF and, if DIF is found, to investigate its sources to ensure that differentially functioning items do not lead to bias.…
Descriptors: Test Bias, Evaluation Methods, Protocol Analysis, Science Achievement
Ferjencík, Ján; Slavkovská, Miriam; Kresila, Juraj – Journal of Pedagogy, 2015
The paper reports on the adaptation of a D-KEFS test battery for Slovakia. Drawing on concrete examples, it describes and illustrates the key issues relating to the transfer of test items from one socio-cultural environment to another. The standardisation sample of the population of Slovak pupils in the fourth year of primary school included 250…
Descriptors: Executive Function, Foreign Countries, Test Construction, Test Items
Cho, Sun-Joo; Bottge, Brian A.; Cohen, Allan S.; Kim, Seock-Ho – Journal of Special Education, 2011
Current methods for detecting growth of students' problem-solving skills in math focus mainly on analyzing changes in test scores. Score-level analysis, however, may fail to reflect subtle changes that might be evident at the item level. This article demonstrates a method for studying item-level changes using data from a multiwave experiment with…
Descriptors: Test Bias, Group Membership, Mathematics Skills, Ability
Magis, David; Raîche, Gilles; Béland, Sébastien; Gérard, Paul – International Journal of Testing, 2011
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Descriptors: Language Skills, Identification, Foreign Countries, Evaluation Methods
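The logistic-regression DIF procedure this entry extends is, in its standard two-group form (Swaminathan and Rogers), a likelihood-ratio comparison of nested models: item response predicted from the matching score alone versus from score, group, and their interaction. A minimal dependency-free sketch with simulated data; all numbers are illustrative, and the gradient-ascent fitter stands in for the maximum-likelihood routines a real analysis would use.

```python
import math
import random

def fit_logistic(X, y, iters=1500, lr=0.5):
    """Fit a logistic regression by batch gradient ascent; return the
    coefficients and the maximized log-likelihood.
    Rows of X must already include the intercept term (leading 1.0)."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        ll += yi * math.log(p) + (1.0 - yi) * math.log(1.0 - p)
    return w, ll

def lr_dif_statistic(scores, groups, responses):
    """Likelihood-ratio DIF test for one item: matching-score-only model
    versus one adding group and score-by-group terms (uniform plus
    nonuniform DIF). Roughly chi-square with 2 df under no DIF."""
    base = [[1.0, s] for s in scores]
    full = [[1.0, s, g, s * g] for s, g in zip(scores, groups)]
    _, ll_base = fit_logistic(base, responses)
    _, ll_full = fit_logistic(full, responses)
    return 2.0 * (ll_full - ll_base)

# Simulated item responses with uniform DIF against the focal group (g = 1)
random.seed(1)
scores = [random.gauss(0.0, 1.0) for _ in range(300)]
groups = [i % 2 for i in range(300)]
responses = [1 if random.random() < 1.0 / (1.0 + math.exp(-(s - 1.0 * g))) else 0
             for s, g in zip(scores, groups)]
stat = lr_dif_statistic(scores, groups, responses)  # large values flag DIF
```

The multi-group extension the article proposes generalizes the single 0/1 group indicator to several focal-group contrasts; the nested-model comparison itself is unchanged.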
Kim, Do-Hong; Lambert, Richard G.; Burts, Diane C. – Early Education and Development, 2013
Research Findings: This study examined the measurement equivalence of the "Teaching Strategies GOLD[R]" assessment system across subgroups of children based on their primary language and disability status. This study is based on teacher-collected assessment data for 3-, 4-, and 5-year-old children for the fall of 2010, winter of 2010, and spring…
Descriptors: English Language Learners, Teaching Methods, Educational Strategies, Special Needs Students