ERIC - Search Results

Showing all 13 results

Growth across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model

Peer reviewed

Student, Sanford R.; Briggs, Derek C.; Davis, Laurie – Educational Measurement: Issues and Practice, 2025

Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates' i-Ready Diagnostic in math in…

Descriptors: Elementary School Mathematics, Elementary School Students, Middle School Mathematics, Middle School Students

It's Not Just Angoff: Misperceptions of Hard and Easy Items in Bookmark-Type Ratings

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020

A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, limited research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…

Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items

The Invariance Paradox: Using Optimal Test Design to Minimize Bias

Peer reviewed

Jones, Andrew T.; Kopp, Jason P.; Ong, Thai Q. – Educational Measurement: Issues and Practice, 2020

Studies investigating invariance have often been limited to measurement or prediction invariance. Selection invariance, wherein the use of test scores for classification results in equivalent classification accuracy between groups, has received comparatively little attention in the psychometric literature. Previous research suggests that some form…

Descriptors: Test Construction, Test Bias, Classification, Accuracy

Covariate Measurement Error Correction for Student Growth Percentiles Using the SIMEX Method

Peer reviewed

Shang, Yi; VanIwaarden, Adam; Betebenner, Damian W. – Educational Measurement: Issues and Practice, 2015

In this study, we examined the impact of covariate measurement error (ME) on the estimation of quantile regression and student growth percentiles (SGPs), and found that SGPs tend to be overestimated among students with higher prior achievement and underestimated among those with lower prior achievement, a problem we describe as ME endogeneity in…

Descriptors: Error of Measurement, Regression (Statistics), Achievement Gains, Students

The Impact of Measurement Error on the Accuracy of Individual and Aggregate SGP

Peer reviewed

McCaffrey, Daniel F.; Castellano, Katherine E.; Lockwood, J. R. – Educational Measurement: Issues and Practice, 2015

Student growth percentiles (SGPs) express students' current observed scores as percentile ranks in the distribution of scores among students with the same prior-year scores. A common concern about SGPs at the student level, and mean or median SGPs (MGPs) at the aggregate level, is potential bias due to test measurement error (ME). Shang,…

Descriptors: Error of Measurement, Accuracy, Achievement Gains, Students

The Accuracy of Aggregate Student Growth Percentiles as Indicators of Educator Performance

Peer reviewed

Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017

Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains

Uncovering Multivariate Structure in Classroom Observations in the Presence of Rater Errors

Peer reviewed

McCaffrey, Daniel F.; Yuan, Kun; Savitsky, Terrance D.; Lockwood, J. R.; Edelen, Maria O. – Educational Measurement: Issues and Practice, 2015

We examine the factor structure of scores from the CLASS-S protocol obtained from observations of middle school classroom teaching. Factor analysis has been used to support both interpretations of scores from classroom observation protocols, like CLASS-S, and the theories about teaching that underlie them. However, classroom observations contain…

Descriptors: Factor Structure, Multivariate Analysis, Scores, Factor Analysis

Quantifying Error and Uncertainty Reductions in Scaling Functions: An ITEMS Module

Peer reviewed

Moses, Tim – Educational Measurement: Issues and Practice, 2014

This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…

Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis

Exploring the Utility of Sequential Analysis in Studying Informal Formative Assessment Practices

Peer reviewed

Furtak, Erin Marie; Ruiz-Primo, Maria Araceli; Bakeman, Roger – Educational Measurement: Issues and Practice, 2017

Formative assessment is a classroom practice that has received much attention in recent years for its established potential at increasing student learning. A frequent analytic approach for determining the quality of formative assessment practices is to develop a coding scheme and determine frequencies with which the codes are observed; however,…

Descriptors: Sequential Approach, Formative Evaluation, Alternative Assessment, Incidence

The Effect of Ignoring Classroom-Level Variance in Estimating the Generalizability of School Mean Scores

Peer reviewed

Wei, Xin; Haertel, Edward – Educational Measurement: Issues and Practice, 2011

Contemporary educational accountability systems, including state-level systems prescribed under No Child Left Behind as well as those envisioned under the "Race to the Top" comprehensive assessment competition, rely on school-level summaries of student test scores. The precision of these score summaries is almost always evaluated using models that…

Descriptors: Scores, Reliability, Computation, Generalizability Theory

Mean Effects of Test Accommodations for ELLs and Non-ELLs: A Meta-Analysis of Experimental Studies

Peer reviewed

Pennock-Roman, Maria; Rivera, Charlene – Educational Measurement: Issues and Practice, 2011

The objective was to examine the impact of different types of accommodations on performance in content tests such as mathematics. The meta-analysis included 14 U.S. studies that randomly assigned school-aged English language learners (ELLs) to test accommodation versus control conditions or used repeated measures in counter-balanced order.…

Descriptors: Testing Accommodations, Printed Materials, Second Language Learning, Glossaries

Generalizability of Cognitive Interview-Based Measures across Cultural Groups

Peer reviewed

Solano-Flores, Guillermo; Li, Min – Educational Measurement: Issues and Practice, 2009

We addressed the challenge of scoring cognitive interviews in research involving multiple cultural groups. We interviewed 123 fourth- and fifth-grade students from three cultural groups to probe how they related a mathematics item to their personal lives. Item meaningfulness--the tendency of students to relate the content and/or context of an item…

Descriptors: Generalizability Theory, Scoring, Error of Measurement, Grade 5

Exemplary LEA Practice: The Great Pencil Panic of 1984.

Peer reviewed

Bauer, Ernest A. – Educational Measurement: Issues and Practice, 1985

Misreadings of pencil answer marks on test answer sheets by optical scanners cause scoring errors. Twenty-six different pencils were tested for readability differences when optically scanned. Complete light and dark marks scanned perfectly for 18 pencils. Totals for all six mark types ranged from 947 to 1726 out of 1800. (BS)

Descriptors: Answer Sheets, Elementary Secondary Education, Error of Measurement, Optical Scanners
