Publication Date
  In 2025: 0
  Since 2024: 1
  Since 2021 (last 5 years): 3
  Since 2016 (last 10 years): 11
  Since 2006 (last 20 years): 23
Descriptor
  Comparative Analysis: 31
  Test Items: 31
  Item Response Theory: 15
  Test Bias: 10
  Simulation: 8
  Mathematics Tests: 6
  Accuracy: 5
  Difficulty Level: 5
  Evaluation Methods: 5
  Item Analysis: 5
  Monte Carlo Methods: 5
Source
  Applied Measurement in Education: 31
Author
  Lee, Won-Chan: 3
  Ercikan, Kadriye: 2
  Finch, Holmes: 2
  Penfield, Randall D.: 2
  Abulela, Mohammed A. A.: 1
  Baldwin, Su: 1
  Ban, Jae-Chun: 1
  Banks, Kathleen: 1
  Boyer, Michelle: 1
  Bridgeman, Brent: 1
  Carlton, Sydell T.: 1
Publication Type
  Journal Articles: 31
  Reports - Research: 25
  Reports - Evaluative: 7
  Information Analyses: 1
  Speeches/Meeting Papers: 1
Education Level
  Elementary Secondary Education: 2
  Grade 5: 2
  Grade 8: 2
  Higher Education: 2
  Postsecondary Education: 1
  Secondary Education: 1
Assessments and Surveys
  Program for International Student Assessment: 2
  ACT Assessment: 1
  Graduate Record Examinations: 1
  SAT (College Admission Test): 1
  Trends in International Mathematics and Science Study: 1
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
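As background for the characteristic curve methods mentioned above (a sketch of the standard, unweighted Stocking-Lord criterion; the information-weighted variants in the cited study modify it in ways the snippet does not specify), the linking slope A and intercept B are chosen to minimize the squared distance between the two test characteristic curves over a grid of ability points:

  SL(A, B) = \sum_q \Big[ \sum_{j} P_j(\theta_q; \hat a_{jY}, \hat b_{jY}, \hat c_{jY}) - \sum_{j} P_j(\theta_q; \hat a_{jX}/A,\; A\hat b_{jX} + B,\; \hat c_{jX}) \Big]^2

where the sum over j runs over the common (anchor) items.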
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods, including number-correct scoring, IRT theta scoring, and hybrid scoring, in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
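For reference (a minimal sketch using standard definitions; the hybrid method and the full set of five scoring methods are specific to the study), number-correct scoring sums the item scores, while IRT theta scoring estimates ability from the full response pattern and is often reported through the test characteristic curve:

  X = \sum_{j=1}^{n} u_j, \qquad \hat\tau = \sum_{j=1}^{n} P_j(\hat\theta)

where \hat\theta is a maximum likelihood or EAP estimate based on (u_1, \dots, u_n).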
Kim, Kyung Yong; Lee, Won-Chan – Applied Measurement in Education, 2017
This article provides a detailed description of three factors (specification of the ability distribution, numerical integration, and frame of reference for the item parameter estimates) that might affect the item parameter estimation of the three-parameter logistic model, and compares five item calibration methods, which are combinations of the…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Methods
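As context for "specification of the ability distribution" and "numerical integration" (a sketch assuming standard marginal maximum likelihood with a quadrature approximation), the response-pattern likelihood integrates ability out against an assumed population density g(\theta), typically approximated by Q quadrature points X_q with weights w_q:

  P(\mathbf{u}_i) = \int \prod_{j=1}^{n} P_j(\theta)^{u_{ij}} [1 - P_j(\theta)]^{1 - u_{ij}} g(\theta)\, d\theta \approx \sum_{q=1}^{Q} w_q \prod_{j=1}^{n} P_j(X_q)^{u_{ij}} [1 - P_j(X_q)]^{1 - u_{ij}}

The choice of g(\theta) and of the quadrature scheme correspond to two of the factors the comparison varies.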
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019
The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard 3 parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…
Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level
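The three-parameter logistic model referenced here, with discrimination a_j, difficulty b_j, and pseudo-chance (lower asymptote) c_j, gives the probability of a correct response as

  P_j(\theta) = c_j + (1 - c_j) \frac{\exp[D a_j(\theta - b_j)]}{1 + \exp[D a_j(\theta - b_j)]}

where D is the usual scaling constant (about 1.7, sometimes set to 1).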
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017
Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…
Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items
Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Descriptors: Item Response Theory, Test Items, Bias, Computation
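The recovery criteria named in this entry are conventionally computed over R replications against the generating (true) value; for a difficulty parameter b_j, for example,

  \mathrm{Bias}(\hat b_j) = \frac{1}{R} \sum_{r=1}^{R} (\hat b_{jr} - b_j), \qquad \mathrm{RMSE}(\hat b_j) = \sqrt{\frac{1}{R} \sum_{r=1}^{R} (\hat b_{jr} - b_j)^2}

with confidence-interval coverage taken as the proportion of replications whose 95% interval contains b_j (a sketch of the standard definitions, not the study's exact notation).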
Dadey, Nathan; Lyons, Susan; DePascale, Charles – Applied Measurement in Education, 2018
Evidence of comparability is generally needed whenever there are variations in the conditions of an assessment administration, including variations introduced by the administration of an assessment on multiple digital devices (e.g., tablet, laptop, desktop). This article is meant to provide a comprehensive examination of issues relevant to the…
Descriptors: Evaluation Methods, Computer Assisted Testing, Educational Technology, Technology Uses in Education
Suh, Youngsuk; Talley, Anna E. – Applied Measurement in Education, 2015
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Descriptors: Test Bias, Multiple Choice Tests, Test Items, Methods
Oliveri, Maria Elena; Lawless, Rene; Robin, Frederic; Bridgeman, Brent – Applied Measurement in Education, 2018
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel and item response theory. DIF items were systematically examined to identify possible sources of DIF by item type, content, and wording. DIF was…
Descriptors: Test Bias, Comparative Analysis, Item Banks, Item Response Theory
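For reference, the Mantel-Haenszel procedure named here estimates a common odds ratio across matched score strata k, with A_k and B_k the counts of correct and incorrect responses in the reference group, C_k and D_k the corresponding counts in the focal group, and N_k the stratum size:

  \hat\alpha_{MH} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}, \qquad \Delta_{MH} = -2.35 \ln \hat\alpha_{MH}

where \Delta_{MH} is the ETS delta-scale effect size commonly used to classify DIF severity.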
Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013
In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…
Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items
Koziol, Natalie A. – Applied Measurement in Education, 2016
Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…
Descriptors: Classification, Accuracy, Comparative Analysis, Models
Demars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: log of the odds ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
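Schematically (a sketch at a single matched ability level; the review itself considers how such indices are aggregated and affected by study conditions), the first two effect-size types are

  \ln\!\Big(\frac{P_R}{1 - P_R}\Big) - \ln\!\Big(\frac{P_F}{1 - P_F}\Big) \ \ \text{(log-odds difference)}, \qquad P_R - P_F \ \ \text{(probability-correct difference)}

where P_R and P_F are the probabilities of a correct response for matched reference- and focal-group examinees.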
Ho, Tsung-Han; Dodd, Barbara G. – Applied Measurement in Education, 2012
In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
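As a baseline for the selection criteria listed (a schematic of the standard forms, not necessarily the study's exact implementations), maximum information selects the next item with the largest Fisher information at the interim ability estimate, while posterior weighted information averages the information over the interim posterior:

  j^* = \arg\max_{j \in \mathcal{A}} I_j(\hat\theta), \qquad j^*_{\mathrm{PWI}} = \arg\max_{j \in \mathcal{A}} \int I_j(\theta)\, \pi(\theta \mid \mathbf{u})\, d\theta

with \mathcal{A} the set of items still available in the pool.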