Showing all 11 results
Peer reviewed
Donoghue, John R.; Eckerly, Carol – Applied Measurement in Education, 2024
Trend scoring constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
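For readers unfamiliar with the distinction this abstract draws, a standard formulation (not taken from the article itself) is: under the usual multinomial model, all N (Time A, Time B) score pairs vary freely over the I x J table,

(n_{11}, \dots, n_{IJ}) \sim \mathrm{Multinomial}(N, \{p_{ij}\}),

whereas in trend scoring the Time A counts n_{i\cdot} are fixed by the rescoring design, so each row is an independent multinomial,

(n_{i1}, \dots, n_{iJ}) \sim \mathrm{Multinomial}(n_{i\cdot}, \{p_{j \mid i}\}), \quad i = 1, \dots, I.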
Peer reviewed
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
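As background for the abstract above, the GPCM is conventionally written as follows (a standard parameterization; the article's own notation may differ). For item j with score categories k = 0, 1, \dots, m_j, discrimination a_j, and step parameters b_{jv},

P(X_{ij} = k \mid \theta_i) = \frac{\exp \sum_{v=1}^{k} a_j(\theta_i - b_{jv})}{\sum_{c=0}^{m_j} \exp \sum_{v=1}^{c} a_j(\theta_i - b_{jv})},

with the empty sum for k = 0 taken to be zero. Rater effects enter as systematic distortions of the observed category assignments, which double ratings are intended to average out.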
Peer reviewed
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
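For reference, the dichotomous Rasch model at issue here takes the standard form

P(X_{pi} = 1 \mid \theta_p) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}.

Because all discriminations are fixed at one, forms can be equated with a single additive constant estimated from common items, e.g. the mean shift \hat{c} = \bar{b}_{\mathrm{base}} - \bar{b}_{\mathrm{new}} over anchor items (one common approach, not necessarily the article's). Item parameter drift means a common item's b_i changes across administrations, so no single shift fits every anchor.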
Peer reviewed
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores
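One plausible specification of the model described, in standard HLM notation (the abstract does not give the study's exact equations): with W_{ti} the scale score serving as the time-varying predictor for person i at time t,

Level 1: Y_{ti} = \pi_{0i} + \pi_{1i} W_{ti} + e_{ti}
Level 2: \pi_{0i} = \beta_{00} + r_{0i}, \qquad \pi_{1i} = \beta_{10} + r_{1i}.

IPD in the items composing the short scale perturbs W_{ti} differently at different time points, which is how drift can propagate into the \beta estimates.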
Peer reviewed
DeMars, Christine – Applied Measurement in Education, 2015
In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…
Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory
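To make the design issue concrete (standard G-theory results, not specific to the article): when every rater scores every student (a crossed p x r design), the relative-error variance and generalizability coefficient are

\sigma^2_\delta = \frac{\sigma^2_{pr,e}}{n'_r}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta},

whereas when raters are nested within students (r : p) the rater main effect is confounded with the interaction, giving the larger term \sigma^2_\delta = (\sigma^2_r + \sigma^2_{pr,e}) / n'_r. Sparse, systematically designed crossing leaves the analyst somewhere between these two cases.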
Peer reviewed
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Peer reviewed
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis
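As context for the invariance analysis (a generic multigroup factor model; the article's model may differ): writing the observed item vector for group g as

x_g = \tau_g + \Lambda_g \xi_g + \delta_g,

configural invariance requires the same loading pattern across groups, metric invariance requires \Lambda_1 = \Lambda_2, and scalar invariance additionally requires \tau_1 = \tau_2. Score-based comparisons across the subgroups studied are defensible only to the degree such constraints hold.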
Peer reviewed
Miller, Tamara B.; Kane, Michael – Applied Measurement in Education, 2001
Examined the precision of change scores in terms of error-tolerance (E/T) ratios for both relative and absolute interpretations of change scores. Used E/T ratios to evaluate the error in estimating the change relative to tolerance for error in a particular context. Illustrated the results with achievement test data. (SLD)
Descriptors: Achievement Tests, Error of Measurement, Estimation (Mathematics), Scores
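The E/T logic, in a standard form consistent with the abstract (exact definitions may differ in the article): an error-tolerance ratio compares the standard error attached to a score interpretation with the largest error that can be tolerated in the context of use,

E/T = \frac{\sigma_E}{T}.

For a change score d = X_2 - X_1 with independent errors of measurement across occasions,

\sigma_{E_d} = \sqrt{\sigma_{E_1}^2 + \sigma_{E_2}^2},

so a change score carries more error than either single score and its precision must be judged against a correspondingly explicit tolerance.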
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction
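A concrete instance of Feldt's point (standard classical test theory, not reproduced from the article): if the test can be split only into two parts with effective-length proportions \lambda_1 and \lambda_2 (\lambda_1 + \lambda_2 = 1), the total-score reliability is

\rho_{XX'} = \frac{\sigma_{12}}{\lambda_1 \lambda_2 \sigma_X^2},

which is single-valued only once the \lambda's are assumed. With \lambda_1 = \lambda_2 = 1/2 it reduces to the familiar split-half form 4\sigma_{12} / \sigma_X^2.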
Peer reviewed
Sykes, Robert C.; Hou, Liling – Applied Measurement in Education, 2003
Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effects of various types of item weighting on an IRT-based mixed-format writing examination were investigated…
Descriptors: Item Response Theory, Weighted Scores, Responses, Scores
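What weighting means operationally here, in a generic form (the article compares several variants not detailed in the abstract): with multiple-choice items scored x_j and constructed-response items scored x_k, a weighted total is

X_w = \sum_{j \in \mathrm{MC}} x_j + \sum_{k \in \mathrm{CR}} w_k x_k, \qquad w_k > 1,

so the CR items contribute more to the total without added testing time, though their measurement error is magnified along with their signal.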
Peer reviewed
Ferrara, Steve; Johnson, Eugene; Chen, Wen-Hung – Applied Measurement in Education, 2005
Psychometricians continue to develop and evaluate methods for linking test scores, both horizontally and vertically. This article describes a social moderation process for articulating (i.e., linking) performance standards across grade levels for an operational state assessment program. The researchers used generated data to evaluate the likely…
Descriptors: Grade 2, Grade 3, Scores, Error of Measurement