ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	12

Descriptor

Comparative Analysis	16
Simulation	12
Item Response Theory	11
Test Items	10
Computer Simulation	4
Difficulty Level	4
Accuracy	3
Adaptive Testing	3
Bayesian Statistics	3
Computation	3
Computer Assisted Testing	3
Educational Assessment	3
Error of Measurement	3
Evaluation Criteria	3
Evaluation Methods	3
Models	3
Nonparametric Statistics	3
Sample Size	3
Scaling	3
Ability	2
Achievement Tests	2
Classification	2
Goodness of Fit	2
Guessing (Tests)	2
Item Analysis	2

Source

Applied Measurement in…

Publication Type

Journal Articles	16
Reports - Research	13
Reports - Evaluative	4

Education Level

Elementary Secondary Education	1
Kindergarten	1
Secondary Education	1

Assessments and Surveys

ACT Assessment	1
Program for International…	1
Trends in International…	1

Showing 1 to 15 of 16 results

Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data

Peer reviewed

Finch, Holmes – Applied Measurement in Education, 2022

Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…

Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

IRT Item Parameter Scaling for Developing New Item Pools

Peer reviewed

Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017

Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…

Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items

Are the Nonparametric Person-Fit Statistics More Powerful than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)

Peer reviewed

Sinharay, Sandip – Applied Measurement in Education, 2017

Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristic curves and found the "H^T" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…

Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis

The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

Peer reviewed

Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017

Using a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias, root mean square error (RMSE), and the percentage of time the 95% confidence interval covered…

Descriptors: Item Response Theory, Test Items, Bias, Computation

Centering, Scale Indeterminacy, and Differential Item Functioning Detection in Hierarchical Generalized Linear and Generalized Linear Mixed Models

Peer reviewed

Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013

In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…

Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items

Parameter Recovery and Classification Accuracy under Conditions of Testlet Dependency: A Comparison of the Traditional 2PL, Testlet, and Bi-Factor Models

Peer reviewed

Koziol, Natalie A. – Applied Measurement in Education, 2016

Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…

Descriptors: Classification, Accuracy, Comparative Analysis, Models

A Comparison of IRT Linking Procedures

Peer reviewed

Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010

Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…

Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques

Vertical Scaling with the Rasch Model Utilizing Default and Tight Convergence Settings with WINSTEPS and BILOG-MG

Peer reviewed

Custer, Michael; Omar, Md Hafidz; Pomplun, Mark – Applied Measurement in Education, 2006

This study compared vertical scaling results for the Rasch model from BILOG-MG and WINSTEPS. The item and ability parameters for the simulated vocabulary tests were scaled across 11 grades: kindergarten through 10th. The simulated data were based on real data and generated under normal and skewed distribution assumptions. WINSTEPS and BILOG-MG were each…

Descriptors: Models, Scaling, Computer Software, Vocabulary

A Simulation Comparison of Parametric and Nonparametric Dimensionality Detection Procedures

Peer reviewed

Mroch, Andrew A.; Bolt, Daniel M. – Applied Measurement in Education, 2006

Recently, nonparametric methods have been proposed that provide a dimensionally based description of test structure for tests with dichotomous items. Because such methods are based on different notions of dimensionality than are assumed when using a psychometric model, it remains unclear whether these procedures might lead to a different…

Descriptors: Simulation, Comparative Analysis, Psychometrics, Methods Research

A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests.

Peer reviewed

Kingsbury, G. Gage; Zara, Anthony R. – Applied Measurement in Education, 1991

This simulation investigated two procedures that reduce differences between paper-and-pencil testing and computerized adaptive testing (CAT) by making CAT content sensitive. Results indicate that using constrained CAT for content balancing costs far fewer additional test items than using testlets.

Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Computer Simulation

Applying Bayesian Item Selection Approaches to Adaptive Tests Using Polytomous Items

Peer reviewed

Penfield, Randall D. – Applied Measurement in Education, 2006

This study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability…

Descriptors: Bayesian Statistics, Adaptive Testing, Computer Assisted Testing, Test Items

The Utility of a Modified One-Parameter IRT Model with Small Samples.

Peer reviewed

Barnes, Laura L. B.; Wise, Steven L. – Applied Measurement in Education, 1991

One-parameter and three-parameter item response theory (IRT) model estimates were compared with estimates obtained from two modified one-parameter models that incorporated a constant nonzero guessing parameter. With small-sample simulation data (50, 100, and 200 simulated examinees), the modified one-parameter models were most effective in estimating…

Descriptors: Ability, Achievement Tests, Comparative Analysis, Computer Simulation

The Effects of Purification of the Matching Criterion on the Identification of DIF Using the Mantel-Haenszel Procedure.

Peer reviewed

Clauser, Brian; And Others – Applied Measurement in Education, 1993

The usefulness of a two-step version of the Mantel-Haenszel procedure for distinguishing between differential item functioning (DIF) and item impact was studied by comparing the single-step and two-step procedures on a simulated data set. Results show changes in the identification rate for the two-step methods.

Descriptors: Comparative Analysis, Evaluation Methods, Identification, Item Bias

Author

Lee, Won-Chan	2
Abulela, Mohammed A. A.	1
Ban, Jae-Chun	1
Barnes, Laura L. B.	1
Bolt, Daniel M.	1
Chang, Hua-Hua	1
Cheong, Yuk Fai	1
Cho, Sun-Joo	1
Clauser, Brian	1
Custer, Michael	1
De Ayala, R. J.	1
Finch, Holmes	1
Kamata, Akihito	1
Kang, Hyeon-Ah	1
Kim, Stella Yun	1
Kingsbury, G. Gage	1
Koziol, Natalie A.	1
Lee, Wooyeol	1
Lu, Ying	1
Mroch, Andrew A.	1
Omar, Md Hafidz	1
Penfield, Randall D.	1
Pomplun, Mark	1
Rios, Joseph A.	1