Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 8
Since 2006 (last 20 years): 23
Descriptor
Comparative Analysis: 31
Difficulty Level: 31
Test Bias: 31
Test Items: 26
Item Analysis: 9
Item Response Theory: 9
Foreign Countries: 8
Scores: 7
Statistical Analysis: 6
Latent Trait Theory: 5
Mathematics Tests: 5
Author
Kamata, Akihito: 2
Magis, David: 2
Ahn, Soyeon: 1
Ali, Usama: 1
Atar, Burcu: 1
Babiar, Tasha Calvert: 1
Baghaei, Purya: 1
Batty, Aaron Olaf: 1
Bauer, Daniel: 1
Beretvas, S. Natasha: 1
Brown, Ted: 1
Publication Type
Journal Articles: 25
Reports - Research: 25
Reports - Evaluative: 4
Reports - Descriptive: 2
Numerical/Quantitative Data: 1
Speeches/Meeting Papers: 1
Education Level
Elementary Education: 5
Grade 4: 3
Higher Education: 3
Postsecondary Education: 3
Secondary Education: 3
Grade 5: 2
Grade 8: 2
Intermediate Grades: 2
Junior High Schools: 2
Middle Schools: 2
Early Childhood Education: 1
Location
Germany: 2
United States: 2
Austria: 1
Belgium: 1
Colombia: 1
Denmark: 1
Iran: 1
Japan: 1
Luxembourg: 1
Norway: 1
Spain: 1
Laws, Policies, & Programs
Assessments and Surveys
Trends in International Mathematics and Science Study: 3
Progress in International Reading Literacy Study: 2
SAT (College Admission Test): 2
Graduate Record Examinations: 1
National Assessment of Educational Progress: 1
Program for International Student Assessment: 1
Kuang, Huan; Sahin, Fusun – Large-scale Assessments in Education, 2023
Background: Examinees may not make enough effort when responding to test items if the assessment has no consequence for them. These disengaged responses can be problematic in low-stakes, large-scale assessments because they can bias item parameter estimates. However, the amount of bias, and whether this bias is similar across administrations, is…
Descriptors: Test Items, Comparative Analysis, Mathematics Tests, Reaction Time
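A common way work in this area operationalizes disengagement is a response-time threshold: responses faster than some small fraction of an item's typical time are flagged as rapid guesses. The sketch below illustrates only that general idea; the threshold rule, fraction, and example times are assumptions, not the authors' procedure.

```python
from statistics import median

def flag_disengaged(response_times, threshold_fraction=0.10):
    """Flag responses as likely rapid guesses when their response time
    falls below a fixed fraction of the item's median response time
    (a simple threshold rule assumed here for illustration)."""
    cutoff = threshold_fraction * median(response_times)
    return [rt < cutoff for rt in response_times]

# Example: one examinee answers in 1.2 s while most take ~30 s.
times = [28.4, 31.0, 1.2, 29.8, 33.5]
print(flag_disengaged(times))  # [False, False, True, False, False]
```

Excluding or down-weighting flagged responses before item calibration is one way such studies probe how much disengagement biases parameter estimates.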
Bundsgaard, Jeppe – Large-scale Assessments in Education, 2019
International large-scale assessments like the International Computer and Information Literacy Study (ICILS) (Fraillon et al. in International Association for the Evaluation of Educational Achievement (IEA), 2015) provide important empirically based knowledge, through the proficiency scales, of what characterizes tasks at different difficulty levels,…
Descriptors: Test Bias, International Assessment, Test Items, Difficulty Level
Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016
When constructing multiple test forms, test developers often match the number of items and the total test difficulty across forms. Not all of them, however, also match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level
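The contrast the simulation turns on, forms matched on overall difficulty but not within subcontent areas, can be made concrete with a small check. A minimal sketch follows; the area names and b-values are hypothetical.

```python
def mean_difficulty_by_area(items):
    """Average item difficulty (b) per subcontent area for one form.
    `items` is a list of (area, b) pairs."""
    by_area = {}
    for area, b in items:
        by_area.setdefault(area, []).append(b)
    return {area: sum(bs) / len(bs) for area, bs in by_area.items()}

# Both forms have mean b = 0.0 overall, but only form_a is also
# balanced within each subcontent area.
form_a = [("algebra", -0.5), ("algebra", 0.5), ("geometry", -0.5), ("geometry", 0.5)]
form_b = [("algebra", -1.0), ("algebra", -1.0), ("geometry", 1.0), ("geometry", 1.0)]
print(mean_difficulty_by_area(form_a))  # {'algebra': 0.0, 'geometry': 0.0}
print(mean_difficulty_by_area(form_b))  # {'algebra': -1.0, 'geometry': 1.0}
```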
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
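Two scoring algorithms commonly compared for MTF items are all-or-nothing (dichotomous) scoring and partial-credit scoring. The sketch below shows both for an assumed four-statement item; it is a generic illustration, not the specific algorithms evaluated in the study.

```python
def score_dichotomous(marks, key):
    """All-or-nothing: full credit only if every true/false mark matches the key."""
    return 1.0 if marks == key else 0.0

def score_partial(marks, key):
    """Partial credit: fraction of statements marked correctly."""
    hits = sum(m == k for m, k in zip(marks, key))
    return hits / len(key)

# Hypothetical MTF item with four true/false statements:
key   = [True, False, True, True]
marks = [True, False, False, True]   # one statement marked wrong
print(score_dichotomous(marks, key))  # 0.0
print(score_partial(marks, key))      # 0.75
```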
Ozdemir, Burhanettin – International Journal of Progressive Education, 2017
The purpose of this study is to equate Trends in International Mathematics and Science Study (TIMSS) mathematics subtest scores obtained from TIMSS 2011 to scores obtained from the TIMSS 2007 form, using different nonlinear observed-score equating methods under the Non-Equivalent Anchor Test (NEAT) design, in which common items are used to link two or more test…
Descriptors: Achievement Tests, Elementary Secondary Education, Foreign Countries, International Assessment
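For orientation, equipercentile equating is a standard nonlinear observed-score method of the kind compared in such studies: a score x on form X is mapped to the form-Y score with the same percentile rank. In a NEAT design the two score distributions are estimated for a synthetic population via the common (anchor) items.

```latex
% Equipercentile equating function: F_X and F_Y are the cumulative
% observed-score distributions on forms X and Y.
e_Y(x) \;=\; F_Y^{-1}\!\bigl(F_X(x)\bigr)
```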
Facon, Bruno; Magis, David – Journal of Speech, Language, and Hearing Research, 2016
Purpose: An item analysis of Bishop's (1983) Test for Reception of Grammar (TROG) in its French version (F-TROG; Lecocq, 1996) was conducted to determine whether the difficulty of items is similar for participants with or without intellectual disability (ID). Method: In Study 1, responses to the 92 F-TROG items by 55 participants with Down…
Descriptors: Item Analysis, Grammar, Children, Adolescents
Frick, Hannah; Strobl, Carolin; Zeileis, Achim – Educational and Psychological Measurement, 2015
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…
Descriptors: Item Response Theory, Test Bias, Comparative Analysis, Scores
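In outline, a Rasch mixture model posits K latent classes, each with its own item difficulty vector, so that measurement invariance corresponds to the single-class case (K = 1). The schematic below suppresses how person ability is handled within classes (e.g., via conditional likelihood):

```latex
% Mixture over K latent classes with class weights \pi_k and
% class-specific item difficulties \beta_{ki}; within each class the
% ordinary Rasch model holds.
f(\mathbf{y}_p) \;=\; \sum_{k=1}^{K} \pi_k \, f_k(\mathbf{y}_p \mid \boldsymbol{\beta}_k),
\qquad
P(y_{pi} = 1 \mid \theta_p, \beta_{ki}) \;=\; \frac{\exp(\theta_p - \beta_{ki})}{1 + \exp(\theta_p - \beta_{ki})}
```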
Baghaei, Purya; Ravand, Hamdollah – SAGE Open, 2019
In many reading comprehension tests, different test formats are employed. Two commonly used test formats to measure reading comprehension are sustained passages followed by some questions and cloze items. Individual differences in handling test format peculiarities could constitute a source of score variance. In this study, a bifactor Rasch model…
Descriptors: Cloze Procedure, Test Bias, Individual Differences, Difficulty Level
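Schematically, a bifactor Rasch model of the kind described adds an orthogonal format-specific factor to the general ability, with s(i) denoting the format of item i (e.g., cloze vs. passage-based):

```latex
% Bifactor Rasch: general ability \theta_p, format-specific factor
% \theta_{p,s(i)} orthogonal to it, and item difficulty \delta_i.
\operatorname{logit} P(X_{pi} = 1) \;=\; \theta_p + \theta_{p,s(i)} - \delta_i
```

Score variance carried by the format-specific factors, over and above the general factor, is what would mark test format as a source of score variance.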
Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013
In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…
Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items
Pohl, Steffi – Journal of Educational Measurement, 2013
This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, and scores on a pretest determine the routing to these forms. Since lMST allows for…
Descriptors: Adaptive Testing, Longitudinal Studies, Difficulty Level, Comparative Analysis
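The routing step in lMST reduces to assigning examinees to difficulty-tiered forms based on their pretest values. A minimal sketch, with made-up cut scores and form labels:

```python
def route_to_form(pretest_score, cut_low=10, cut_high=20):
    """Route an examinee to a later test form by pretest score,
    mirroring the lMST idea of difficulty-tiered forms.
    Cut scores and labels are hypothetical."""
    if pretest_score < cut_low:
        return "easy form"
    if pretest_score < cut_high:
        return "medium form"
    return "hard form"

print(route_to_form(7))   # easy form
print(route_to_form(15))  # medium form
print(route_to_form(24))  # hard form
```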
Liu, Junhui; Brown, Terran; Chen, Jianshen; Ali, Usama; Hou, Likun; Costanzo, Kate – Partnership for Assessment of Readiness for College and Careers, 2016
The Partnership for Assessment of Readiness for College and Careers (PARCC) is a state-led consortium working to develop next-generation assessments that more accurately, compared to previous assessments, measure student progress toward college and career readiness. The PARCC assessments include both English Language Arts/Literacy (ELA/L) and…
Descriptors: Testing, Achievement Tests, Test Items, Test Bias
Jin, Ying; Myers, Nicholas D.; Ahn, Soyeon; Penfield, Randall D. – Educational and Psychological Measurement, 2013
The Rasch model, a member of a larger group of models within item response theory, is widely used in empirical studies. Detection of uniform differential item functioning (DIF) within the Rasch model typically employs null hypothesis testing with a concomitant consideration of effect size (e.g., signed area [SA]). Parametric equivalence between…
Descriptors: Test Bias, Effect Size, Item Response Theory, Comparative Analysis
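Under the Rasch model the signed area (SA) between the reference- and focal-group item characteristic curves has a closed form: it reduces to the difference in the item's difficulty between groups (the sign convention varies by author):

```latex
% Signed area between group-specific Rasch ICCs for item i; b_{iR} and
% b_{iF} are the reference- and focal-group difficulties.
\mathrm{SA}_i \;=\; \int_{-\infty}^{\infty} \bigl[ P_{iR}(\theta) - P_{iF}(\theta) \bigr]\, d\theta \;=\; b_{iF} - b_{iR}
```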
DeMars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: the log of the odds-ratio (differences in log-odds), differences in probability-correct, and the proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
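Two of the three effect sizes reviewed can be computed directly from matched-group proportions correct; the sketch below uses made-up proportions and is only an illustration of the definitions:

```python
import math

def dif_effect_sizes(p_ref, p_foc):
    """Log odds-ratio and probability-correct difference between
    reference and focal groups matched on ability; inputs are
    proportions correct (0 < p < 1)."""
    log_odds_ratio = math.log((p_ref / (1 - p_ref)) / (p_foc / (1 - p_foc)))
    prob_difference = p_ref - p_foc
    return log_odds_ratio, prob_difference

# Hypothetical matched groups: 70% vs. 60% correct on one item.
lor, dp = dif_effect_sizes(0.70, 0.60)
print(round(lor, 3), round(dp, 3))  # 0.442 0.1
```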
Batty, Aaron Olaf – Language Testing, 2015
The rise in the affordability of quality video production equipment has resulted in increased interest in video-mediated tests of foreign language listening comprehension. Although research on such tests has continued fairly steadily since the early 1980s, studies have relied on analyses of raw scores, despite the growing prevalence of item…
Descriptors: Listening Comprehension Tests, Comparative Analysis, Video Technology, Audio Equipment
Beretvas, S. Natasha; Cawthon, Stephanie W.; Lockhart, L. Leland; Kaye, Alyssa D. – Educational and Psychological Measurement, 2012
This pedagogical article is intended to explain the similarities and differences between the parameterizations of two multilevel measurement model (MMM) frameworks. The conventional two-level MMM that includes item indicators and models item scores (Level 1) clustered within examinees (Level 2) and the two-level cross-classified MMM (in which item…
Descriptors: Test Bias, Comparative Analysis, Test Items, Difficulty Level
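Schematically, the conventional two-level MMM nests item responses (level 1) within examinees (level 2) and recovers the Rasch form with a random person effect and fixed item effects; the cross-classified variant additionally treats items as a random classification crossed with persons. A compact way to write the contrast:

```latex
% Two-level MMM: person effect random, item effects fixed.
\operatorname{logit} P(y_{ip} = 1) \;=\; \theta_p - b_i,
\qquad \theta_p \sim N(0, \sigma_\theta^2)
% Cross-classified MMM: items also random, b_i \sim N(0, \sigma_b^2),
% crossed with persons rather than nested within them.
```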