Wiberg, Marie – International Journal of Testing, 2009
The aim of this study was to compare log-linear modelling (LLM) with logistic regression (LR) and the Mantel-Haenszel (MH) test for detecting differential item functioning (DIF) in a mastery test. The three methods were chosen because they have similar components. The results showed fairly high matching percentages together with high…
Descriptors: Test Bias, Mastery Tests, Comparative Analysis, Regression (Statistics)
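The Mantel-Haenszel procedure named in this abstract is simple enough to sketch. The snippet below is a minimal illustration with simulated counts, not the study's data or code: each ability stratum contributes a 2x2 (group x correct/incorrect) table, and the common odds ratio across strata is converted to the ETS delta metric.

```python
# Minimal Mantel-Haenszel DIF sketch (illustrative; simulated counts,
# not the data or code from the Wiberg (2009) study).
import numpy as np

# One 2x2 table per matched ability stratum:
# rows = (reference, focal) group, cols = (correct, incorrect).
tables = np.array([
    [[40, 10], [35, 15]],   # stratum 1
    [[30, 20], [22, 28]],   # stratum 2
    [[20, 30], [12, 38]],   # stratum 3
])

a = tables[:, 0, 0].astype(float)  # reference correct
b = tables[:, 0, 1].astype(float)  # reference incorrect
c = tables[:, 1, 0].astype(float)  # focal correct
d = tables[:, 1, 1].astype(float)  # focal incorrect
n = a + b + c + d                  # stratum totals

# Common odds ratio across strata, then the ETS delta metric;
# |delta_MH| < 1.0 is negligible DIF, > 1.5 (and significant) is large.
alpha_mh = (a * d / n).sum() / (b * c / n).sum()
delta_mh = -2.35 * np.log(alpha_mh)

print(f"alpha_MH = {alpha_mh:.3f}, delta_MH = {delta_mh:.3f}")
```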
Chulu, Bob Wajizigha; Sireci, Stephen G. – International Journal of Testing, 2011
Many examination agencies, policy makers, media houses, and the public at large make high-stakes decisions based on test scores. Unfortunately, in some cases educational tests are not statistically equated to account for test differences over time, which leads to inappropriate interpretations of students' performance. In this study we illustrate…
Descriptors: Classification, Foreign Countries, Item Response Theory, High Stakes Tests
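For readers unfamiliar with the term, "statistically equated" means that scores from different test forms are mapped onto a common scale before being compared. A minimal sketch of one classical approach, linear (mean-sigma) equating, is shown below with simulated scores; the study itself concerns IRT-based methods, so this is a generic illustration rather than the authors' procedure.

```python
# Minimal linear (mean-sigma) equating sketch: maps Form Y scores onto
# the Form X scale. Illustrative only; scores are simulated.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 2000)   # Form X total scores (base year)
y = rng.normal(46, 12, 2000)   # Form Y total scores (later year)

def linear_equate(y_score, x, y):
    """Place a Form Y score on the Form X scale by matching means and SDs."""
    slope = x.std(ddof=1) / y.std(ddof=1)
    return x.mean() + slope * (y_score - y.mean())

print(linear_equate(60.0, x, y))  # a Form Y score of 60, on the X scale
```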
Le, Luc T. – International Journal of Testing, 2009
This study uses PISA cycle 3 field trial data to investigate how gender differential item functioning (DIF) in science items, across countries and test languages, relates to item format and to the four other dimensions defined in the PISA framework: focus, context, competency, and scientific knowledge. The data used were collected from 60…
Descriptors: Test Bias, Gender Bias, Science Tests, Test Items
Wyse, Adam E.; Mapuranga, Raymond – International Journal of Testing, 2009
Differential item functioning (DIF) analysis is a statistical technique used to help ensure the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when the data fit the Rasch model. Through simulations and an international…
Descriptors: Test Bias, Evaluation Methods, Test Items, Educational Assessment
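Under the Rasch model, an item's information function is I(theta) = P(theta)(1 - P(theta)). The abstract does not give the ISI formula, so the sketch below compares two groups' information curves using an illustrative overlap measure that stands in for the authors' index.

```python
# Rasch item information curves for one item estimated separately in two
# groups. The comparison index below is an illustrative stand-in: the
# abstract does not give the ISI formula, so this is not the authors' ISI.
import numpy as np

def rasch_info(theta, b):
    """Item information under the Rasch model: I = P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return p * (1.0 - p)

theta = np.linspace(-4, 4, 401)
info_ref = rasch_info(theta, b=0.0)   # difficulty estimated in reference group
info_foc = rasch_info(theta, b=0.6)   # difficulty estimated in focal group

# Overlap of the two curves relative to their average area: 1 = identical.
overlap = np.trapz(np.minimum(info_ref, info_foc), theta) / \
          np.trapz(0.5 * (info_ref + info_foc), theta)
print(f"information-curve overlap = {overlap:.3f}")
```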
Yildirim, Huseyin Husnu; Berberoglu, Giray – International Journal of Testing, 2009
Comparisons of human characteristics across different language groups and cultures have become increasingly important in today's educational assessment practice, as evidenced by the growing interest in international comparative studies. Within this context, the fairness of the results across different language and cultural groups draws the attention of…
Descriptors: Test Bias, Cross Cultural Studies, Comparative Analysis, Factor Analysis
Hauger, Jeffrey B.; Sireci, Stephen G. – International Journal of Testing, 2008
In this study, we examined the presence of differential item functioning (DIF) among groups of students who were tested in their native language or in a different language when participating in the 1999 Trends in International Mathematics and Science Study. Data from 18,837 examinees from three countries (Singapore, United States, and Iran) were…
Descriptors: Test Bias, Language Dominance, Second Languages, Language Proficiency
Lamprianou, Iasonas – International Journal of Testing, 2008
This study investigates the effect of reporting unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of…
Descriptors: High Stakes Tests, Raw Scores, Item Response Theory, Language Tests
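The "meaningful facets and parameters" referred to here are typically a many-facet Rasch structure, in which the log-odds of success depend on examinee ability, question difficulty, and rater severity. The sketch below illustrates that structure with hypothetical parameter values; it is not the study's model.

```python
# Facets-style (many-facet Rasch) success probability: the log-odds of an
# acceptable response is examinee ability minus question difficulty minus
# rater severity. All parameter values here are hypothetical.
import math

def p_success(ability, difficulty, severity):
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))

# Same examinee and question, lenient vs. severe rater:
print(p_success(1.0, 0.5, -0.4))  # lenient rater -> higher expected score
print(p_success(1.0, 0.5, 0.8))   # severe rater  -> lower expected score
```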
Allalouf, Avi; Rapp, Joel; Stoller, Reuven – International Journal of Testing, 2009
When a test is adapted from a source language (SL) into a target language (TL), the two forms are usually not psychometrically equivalent. If linking between test forms is necessary, those items that have had their psychometric characteristics altered by the translation (differential item functioning [DIF] items) should be eliminated from the…
Descriptors: Test Items, Test Format, Verbal Tests, Psychometrics
Brown, Richard S.; Villarreal, Julio C. – International Journal of Testing, 2007
There has been considerable research on the extent to which psychometrically sound assessments sometimes yield individual score estimates that are inconsistent with the individual's response pattern. It has been suggested that individual response patterns may differ from expectations for a number of reasons, including subject motivation,…
Descriptors: Psychometrics, Test Bias, Testing, Simulation
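One common way to quantify the mismatch between a score estimate and a response pattern is a person-fit statistic such as the standardized log-likelihood l_z. The abstract does not name the authors' index, so the Rasch-based sketch below, with simulated parameters, is offered only as a representative example.

```python
# Person-fit sketch: the standardized log-likelihood statistic l_z under a
# Rasch model. Illustrative; the abstract does not name the authors' index.
import numpy as np

def lz(responses, theta, b):
    """l_z person-fit: large negative values flag aberrant response patterns."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    u = np.asarray(responses, dtype=float)
    loglik = np.sum(u * np.log(p) + (1 - u) * np.log(1 - p))
    expected = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (loglik - expected) / np.sqrt(var)

b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])    # item difficulties, easy to hard
print(lz([1, 1, 1, 0, 0], theta=0.0, b=b))   # consistent pattern
print(lz([0, 0, 0, 1, 1], theta=0.0, b=b))   # aberrant: misses easy, gets hard
```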
Ross, Steven J.; Okabe, Junko – International Journal of Testing, 2006
Test validity is predicated on the absence of bias in tasks, items, and test content. It is well known that factors such as test candidates' mother tongue, life experiences, and the socialization practices of the wider community may introduce subtle interactions between an individual's background and the test content. When the gender of the…
Descriptors: Gender Bias, Language Tests, Test Validity, Reading Comprehension