ERIC - Search Results

Publication Date

In 2025	0
Since 2024	3
Since 2021 (last 5 years)	10
Since 2016 (last 10 years)	18
Since 2006 (last 20 years)	34

Descriptor

Error of Measurement	40
Item Response Theory	20
Test Items	18
Comparative Analysis	10
Simulation	9
Computation	8
Scores	8
Generalizability Theory	7
Item Analysis	7
Sample Size	7
Accuracy	6
Difficulty Level	6
Equated Scores	6
Monte Carlo Methods	6
Reliability	6
Sampling	6
Test Bias	6
Achievement Tests	5
Mathematics Tests	5
Models	5
Test Construction	5
Test Reliability	5
Evaluation Methods	4
Foreign Countries	4
Grade 8	4
More ▼

Source

Applied Measurement in…

Publication Type

Journal Articles	40
Reports - Research	40
Information Analyses	1
Speeches/Meeting Papers	1

Education Level

Elementary Secondary Education	6
Secondary Education	5
Grade 8	3
Junior High Schools	3
Middle Schools	3
Early Childhood Education	2
Elementary Education	2
Grade 10	2
Grade 3	2
Grade 5	2
Grade 1	1
Grade 2	1
Grade 4	1
Grade 6	1
Grade 7	1
Grade 9	1
High Schools	1
Higher Education	1
Intermediate Grades	1
Postsecondary Education	1
Preschool Education	1
Primary Education	1
More ▼

Audience

Researchers

Location

California	1
Canada	1
Georgia	1
Iran	1
North Carolina	1

Laws, Policies, & Programs

Assessments and Surveys

Program for International…	3
Trends in International…	3
National Assessment of…	2
Iowa Tests of Basic Skills	1
Progress in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 40 results Save | Export

Bayesian Maximal Reliability Evaluation Using Latent Variable Modeling

Peer reviewed

Direct link

Tenko Raykov; George A. Marcoulides; Natalja Menold – Applied Measurement in Education, 2024

We discuss an application of Bayesian factor analysis for estimation of the optimal linear combination and associated maximal reliability of a multi-component measuring instrument. The described procedure yields point and credibility interval estimates of this reliability coefficient, which are readily obtained in educational and behavioral…

Descriptors: Bayesian Statistics, Test Reliability, Error of Measurement, Measurement Equipment

Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST

Peer reviewed

Direct link

Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023

This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To…

Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement

Analyzing Complete Generalizability Theory Designs Using Structural Equation Models

Peer reviewed

Direct link

Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023

We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…

Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Direct link

Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data

Peer reviewed

Direct link

Finch, Holmes – Applied Measurement in Education, 2022

Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…

Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation

Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation

Peer reviewed

Direct link

Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022

This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…

Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy

Performance of Infit and Outfit Confidence Intervals Calculated via Parametric Bootstrapping

Peer reviewed

Direct link

Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes – Applied Measurement in Education, 2022

Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. "Infit" and "outfit" fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the…

Descriptors: Intervals, Item Response Theory, Item Analysis, Inferences

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Direct link

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Gauging Uncertainty in Test-to-Curriculum Alignment Indices

Peer reviewed

Direct link

Traynor, Anne; Li, Tingxuan; Zhou, Shuqi – Applied Measurement in Education, 2020

During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test's items and performance objective statements. The individual experts' ratings may then be used to compute summary indices to quantify the match between a…

Descriptors: Alignment (Education), Achievement Tests, Curriculum, Error of Measurement

Equating with Small and Unbalanced Samples

Peer reviewed

Direct link

Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020

Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…

Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores

Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model

Peer reviewed

Direct link

Zhang, Zhonghua – Applied Measurement in Education, 2020

The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…

Descriptors: Error of Measurement, Computation, Equated Scores, True Scores

Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning

Peer reviewed

Direct link

Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024

Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…

Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Direct link

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Direct link

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

A Comparison of Estimation Techniques for IRT Models with Small Samples

Peer reviewed

Direct link

Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019

The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard 3 parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…

Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level

Previous Page | Next Page »

Pages: 1 | 2 | 3

Finch, Holmes	4
Lee, Won-Chan	2
Abulela, Mohammed A. A.	1
Antal, Judit	1
Bergstrom, Betty A.	1
Brennan, Robert L.	1
Briggs, Derek C.	1
Cao, Yi	1
Chalmers, R. Philip	1
Dallas, Andrew D.	1
DeMars, Christine	1
Engelhard, George, Jr.	1
Fan, Fen	1
Feldt, Leonard S.	1
French, Brian F.	1
Gao, Xiaohong	1
George A. Marcoulides	1
Goodman, Joshua T.	1
Haag, Nicole	1
Haertel, Edward H.	1
Hartig, Johannes	1
Hou, Liling	1
Hyeri Hong	1
Hyeryung Lee	1
More ▼