ERIC - Search Results

Showing 1 to 15 of 28 results

The Impact of Non-Effortful Responding on Item and Person Parameters in Item-Pool Scale Linking

Peer reviewed

Yue Liu; Zhen Li; Hongyun Liu; Xiaofeng You – Applied Measurement in Education, 2024

Low test-taking effort of examinees has been considered a source of construct-irrelevant variance in item response modeling, leading to serious consequences on parameter estimation. This study aims to investigate how non-effortful response (NER) influences the estimation of item and person parameters in item-pool scale linking (IPSL) and whether…

Descriptors: Item Response Theory, Computation, Simulation, Responses

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning

Peer reviewed

Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024

Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…

Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

IRT Item Parameter Scaling for Developing New Item Pools

Peer reviewed

Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017

Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent…

Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items

The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

Peer reviewed

Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

Descriptors: Item Response Theory, Test Items, Bias, Computation

Effects of Population Heterogeneity on Accuracy of DIF Detection

Peer reviewed

Oliveri, María Elena; Ercikan, Kadriye; Zumbo, Bruno D. – Applied Measurement in Education, 2014

Heterogeneity within English language learners (ELLs) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on…

Descriptors: Test Bias, Accuracy, English Language Learners, Simulation

The Effect of Anchor Test Construction on Scale Drift

Peer reviewed

Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014

In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…

Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory

Centering, Scale Indeterminacy, and Differential Item Functioning Detection in Hierarchical Generalized Linear and Generalized Linear Mixed Models

Peer reviewed

Cheong, Yuk Fai; Kamata, Akihito – Applied Measurement in Education, 2013

In this article, we discuss and illustrate two centering and anchoring options available in differential item functioning (DIF) detection studies based on the hierarchical generalized linear and generalized linear mixed modeling frameworks. We compared and contrasted the assumptions of the two options, and examined the properties of their DIF…

Descriptors: Test Bias, Hierarchical Linear Modeling, Comparative Analysis, Test Items

Parameter Recovery and Classification Accuracy under Conditions of Testlet Dependency: A Comparison of the Traditional 2PL, Testlet, and Bi-Factor Models

Peer reviewed

Koziol, Natalie A. – Applied Measurement in Education, 2016

Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…

Descriptors: Classification, Accuracy, Comparative Analysis, Models

The Impact of Multidirectional Item Parameter Drift on IRT Scaling Coefficients and Proficiency Estimates

Peer reviewed

Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2012

Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…

Descriptors: Item Response Theory, Test Items, Scaling, Methods

Multistage Computerized Adaptive Testing with Uniform Item Exposure

Peer reviewed

Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012

This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items

A Comparison of IRT Linking Procedures

Peer reviewed

Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010

Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…

Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques

Item Position and Item Difficulty Change in an IRT-Based Common Item Equating Design

Peer reviewed

Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009

In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…

Descriptors: Test Items, Test Content, Testing Programs, Simulation
