Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
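Rapid guessing of the kind studied above is commonly flagged with a response-time threshold. A minimal sketch, assuming the widely used normative-threshold approach (flag responses faster than 10% of the item's median response time) rather than the authors' exact procedure; the function name and data are hypothetical:

```python
from statistics import median

def flag_rapid_guesses(response_times, fraction=0.10):
    """Return True for each response whose time falls below a fraction
    of the item's median response time (a normative threshold)."""
    threshold = fraction * median(response_times)
    return [t < threshold for t in response_times]

# Hypothetical per-examinee response times (seconds) on one item:
flags = flag_rapid_guesses([42.0, 38.5, 51.2, 2.1, 40.0, 1.5, 47.3])
# The 2.1 s and 1.5 s responses fall below the 4.0 s threshold.
```

Examinees with a high proportion of flagged responses are then typically filtered or modeled separately before invariance analyses.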
Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald – Applied Measurement in Education, 2017
Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…
Descriptors: Robustness (Statistics), Item Response Theory, Mathematics Tests, Grade 6
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
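For reference, the dichotomous Rasch model mentioned in this entry gives the probability of a correct response as a logistic function of ability minus item difficulty; polytomous models (e.g., the partial credit model) generalize this to multiple score categories. A minimal sketch of the dichotomous case (names hypothetical):

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model:
    1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

p = rasch_prob(theta=0.0, b=0.0)  # 0.5 when ability equals item difficulty
```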
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2006
Four item response theory linking methods (2 moment methods and 2 characteristic curve methods) were compared to concurrent (CO) calibration with the focus on the degree of robustness to format effects (FEs) when applying the methods to multidimensional data that reflected the FEs associated with mixed-format tests. Based on the quantification of…
Descriptors: Item Response Theory, Robustness (Statistics), Test Format, Comparative Analysis
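One of the moment linking methods compared in studies like this, the mean/sigma method, places one form's item-difficulty estimates on another's scale with a linear transformation fit from the common items' means and standard deviations. A hedged sketch of that general method, not necessarily the exact variants the authors examined:

```python
from statistics import mean, pstdev

def mean_sigma_link(b_old, b_new):
    """Estimate linear linking constants (A, B) from common-item
    difficulties so that b_linked = A * b_new + B is on the old scale."""
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B

# Hypothetical common-item difficulties on two separately calibrated forms:
A, B = mean_sigma_link([0.0, 1.0, 2.0], [-1.0, 0.0, 1.0])
```

Characteristic curve methods (e.g., Stocking-Lord) instead choose A and B to minimize the difference between test characteristic curves.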

Norcini, John; Shea, Judy – Applied Measurement in Education, 1992
Two studies involving a total of 99 experts examined the reproducibility of standards for 2 medical certifying examinations set under different conditions. Together, results of both studies provide evidence that a modified version of the Angoff method is quite reliable and produces stable results under varying conditions. (SLD)
Descriptors: Academic Standards, Evaluators, Groups, Higher Education
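In the standard (unmodified) Angoff method, each judge estimates, for every item, the probability that a minimally competent examinee answers correctly, and the cut score is typically the mean across judges of the expected number-correct score. A minimal sketch of that aggregation step; the study's specific modification is not reproduced here:

```python
def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly. The cut score is the
    mean, over judges, of the expected number-correct score."""
    per_judge = [sum(judge) for judge in ratings]
    return sum(per_judge) / len(per_judge)

# Hypothetical ratings: two judges, two items.
cut = angoff_cut_score([[0.5, 0.5], [0.7, 0.3]])  # expected score of 1.0
```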

Norcini, John; And Others – Applied Measurement in Education, 1994
Whether anchor item sets varying in difficulty and discrimination affect precision of cutting score equivalents generated through judge rescaling as much as equivalents from score equating was studied with 4 groups of experts and 250 and 1,000 examinees. Results indicate the robustness of judge rescaling and its superiority over equating. (SLD)
Descriptors: Cutting Scores, Decision Making, Difficulty Level, Equated Scores