ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	1
Since 2006 (last 20 years)	5

Descriptor

Error of Measurement	10
Reliability	10
Generalizability Theory	5
Sampling	3
Scores	3
Educational Assessment	2
Scoring	2
Test Items	2
Achievement Tests	1
Bilingual Teachers	1
Classroom Environment	1
Comparative Analysis	1
Computation	1
Cutting Scores	1
Data	1
Data Analysis	1
Design	1
Educational Environment	1
Elementary School Students	1
Elementary Secondary Education	1
English	1
English Language Learners	1
Error Patterns	1
Estimation (Mathematics)	1
Generalization	1
More ▼

Source

Applied Measurement in…

Author

Feldt, Leonard S.	3
Qualls, Audrey L.	2
Carol Eckerly	1
DeMars, Christine	1
Gao, Xiaohong	1
John R. Donoghue	1
Kachchaf, Rachel	1
Kane, Michael	1
Kannan, Priya	1
Katz, Irvin R.	1
Schweig, Jonathan David	1
Sgammato, Adrienne	1
Solano-Flores, Guillermo	1
Tannenbaum, Richard J.	1
More ▼

Publication Type

Journal Articles	10
Reports - Research	6
Reports - Evaluative	3
Reports - Descriptive	1

Education Level

Elementary Secondary Education	1
Grade 4	1
Grade 5	1

Audience

Location

California	1
North Carolina	1

Laws, Policies, & Programs

Assessments and Surveys

National Assessment of…

What Works Clearinghouse Rating

Showing all 10 results Save | Export

New Tests of Rater Drift in Trend Scoring

Peer reviewed

Direct link

John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024

Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…

Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics

Estimating Variance Components from Sparse Data Matrices in Large-Scale Educational Assessments

Peer reviewed

Direct link

DeMars, Christine – Applied Measurement in Education, 2015

In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…

Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Direct link

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Quantifying Error in Survey Measures of School and Classroom Environments

Peer reviewed

Direct link

Schweig, Jonathan David – Applied Measurement in Education, 2014

Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…

Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Direct link

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Approximating Scale Score Standard Error of Measurement from the Raw Score Standard Error.

Peer reviewed

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1998

Two relatively simple methods for estimating the condition standard error of measurement (SEM) for nonlinearly derived score scales are proposed. Applications indicate that these two procedures produce fairly consistent estimates that tend to peak near the high end of the scale and reach a minimum in the middle of the raw score scale. (SLD)

Descriptors: Error of Measurement, Estimation (Mathematics), Raw Scores, Reliability

Reliability Estimation When a Test Is Split into Two Parts of Unknown Effective Length.

Peer reviewed

Feldt, Leonard S. – Applied Measurement in Education, 2002

Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…

Descriptors: Error of Measurement, Reliability, Scores, Test Construction

Variability in Reliability Coefficients and the Standard Error of Measurement from School District to District.

Peer reviewed

Feldt, Leonard S.; Qualls, Audrey L. – Applied Measurement in Education, 1999

Examined the stability of the standard error of measurement and the relationship between the reliability coefficient and the variance of both true scores and error scores for 170 school districts in a state. As expected, reliability coefficients varied as a function of group variability, but the variation in split-half coefficients from school to…

Descriptors: Elementary Secondary Education, Error of Measurement, Reliability, School Districts

The Precision of Measurements.

Peer reviewed

Kane, Michael – Applied Measurement in Education, 1996

This overview of the role of error and tolerance for error in measurement asserts that the generic precision associated with a measurement procedure is defined as the root mean square error, or standard error, in some relevant population. This view of precision is explored in several applications of measurement. (SLD)

Descriptors: Error of Measurement, Error Patterns, Generalizability Theory, Measurement Techniques

Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems.

Peer reviewed

Gao, Xiaohong; And Others – Applied Measurement in Education, 1994

This study provides empirical evidence about the sampling variability and generalizability (reliability) of a statewide performance assessment for grade six. Results for 600 students at individual and school levels indicate that task-sampling variability was the major source of measurement error. Rater-sampling variability was negligible. (SLD)

Descriptors: Achievement Tests, Educational Assessment, Elementary School Students, Error of Measurement