ERIC - Search Results

Showing 1 to 15 of 50 results

Bayesian Maximal Reliability Evaluation Using Latent Variable Modeling

Peer reviewed

Tenko Raykov; George A. Marcoulides; Natalja Menold – Applied Measurement in Education, 2024

We discuss an application of Bayesian factor analysis for estimation of the optimal linear combination and associated maximal reliability of a multi-component measuring instrument. The described procedure yields point and credibility interval estimates of this reliability coefficient, which are readily obtained in educational and behavioral…

Descriptors: Bayesian Statistics, Test Reliability, Error of Measurement, Measurement Equipment

New Tests of Rater Drift in Trend Scoring

Peer reviewed

John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024

Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…

Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics

Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST

Peer reviewed

Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023

This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To…

Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement

Analyzing Complete Generalizability Theory Designs Using Structural Equation Models

Peer reviewed

Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023

We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…

Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data

Peer reviewed

Finch, Holmes – Applied Measurement in Education, 2022

Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…

Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation

Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation

Peer reviewed

Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022

This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…

Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy

Performance of Infit and Outfit Confidence Intervals Calculated via Parametric Bootstrapping

Peer reviewed

Silva Diaz, John Alexander; Köhler, Carmen; Hartig, Johannes – Applied Measurement in Education, 2022

Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. "Infit" and "outfit" fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the…

Descriptors: Intervals, Item Response Theory, Item Analysis, Inferences

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Some Methods and Evaluation for Linking and Equating with Small Samples

Peer reviewed

Peabody, Michael R. – Applied Measurement in Education, 2020

The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…

Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory

Gauging Uncertainty in Test-to-Curriculum Alignment Indices

Peer reviewed

Traynor, Anne; Li, Tingxuan; Zhou, Shuqi – Applied Measurement in Education, 2020

During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test's items and performance objective statements. The individual experts' ratings may then be used to compute summary indices to quantify the match between a…

Descriptors: Alignment (Education), Achievement Tests, Curriculum, Error of Measurement

Equating with Small and Unbalanced Samples

Peer reviewed

Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020

Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…

Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores

Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests

Peer reviewed

Visser, Linda; Cartschau, Friederike; von Goldammer, Ariane; Brandenburg, Janin; Timmerman, Marieke; Hasselhorn, Marcus; Mähler, Claudia – Applied Measurement in Education, 2023

The growing number of children in primary schools in Germany who have German as their second language (L2) has raised questions about the fairness of performance assessment. Fair tests are a prerequisite for distinguishing between L2 learning delay and a specific learning disability. We evaluated five commonly used reading and spelling tests for…

Descriptors: Foreign Countries, Error of Measurement, Second Language Learning, German

Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model

Peer reviewed

Zhang, Zhonghua – Applied Measurement in Education, 2020

The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…

Descriptors: Error of Measurement, Computation, Equated Scores, True Scores

Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning

Peer reviewed

Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024

Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…

Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development
