Showing all 10 results
Peer reviewed
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
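One common way to make such item-by-item comparisons concrete is an agreement statistic such as quadratic weighted kappa, computed against the human scores for each item and then aggregated per rater. A minimal sketch in Python, using simulated scores rather than anything from the article:

```python
# Minimal sketch (not the article's method): comparing automated raters to a
# human rater with quadratic weighted kappa (QWK), then ranking raters by
# agreement. All data below are invented for illustration.
import numpy as np

def quadratic_weighted_kappa(a, b, n_categories):
    """QWK between two integer-scored vectors on 0..n_categories-1."""
    a, b = np.asarray(a), np.asarray(b)
    observed = np.zeros((n_categories, n_categories))
    for i, j in zip(a, b):
        observed[i, j] += 1
    observed /= observed.sum()
    pa = observed.sum(axis=1)          # marginal distribution, rater a
    pb = observed.sum(axis=0)          # marginal distribution, rater b
    expected = np.outer(pa, pb)
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1 - (weights * observed).sum() / (weights * expected).sum()

rng = np.random.default_rng(0)
human = rng.integers(0, 4, size=200)                    # human scores on one item
raters = {"engine_A": np.clip(human + rng.integers(-1, 2, 200), 0, 3),
          "engine_B": rng.integers(0, 4, size=200)}     # a weaker engine
ranking = sorted(raters, reverse=True,
                 key=lambda r: quadratic_weighted_kappa(human, raters[r], 4))
print(ranking)
```

Ranking engines by mean kappa across items is exactly the kind of procedure-dependent ranking the abstract cautions about; a different agreement index or aggregation rule can reorder the engines.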
Peer reviewed
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
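For illustration, one standard way to aggregate pairwise "which essay is better" judgments into scale values is the Bradley-Terry model. The sketch below uses invented judgments, not the study's data or necessarily its model, and fits it with simple Zermelo/MM updates:

```python
# Minimal sketch: Bradley-Terry scale values from pairwise comparative judgments.
import numpy as np

def bradley_terry(n_essays, comparisons, iters=200):
    """comparisons: list of (winner, loser) index pairs."""
    wins = np.zeros(n_essays)
    pair_counts = np.zeros((n_essays, n_essays))
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[w, l] += 1
        pair_counts[l, w] += 1
    p = np.ones(n_essays)
    for _ in range(iters):                       # Zermelo/MM updates
        denom = (pair_counts / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins / np.maximum(denom, 1e-12)
        p /= p.sum()
    return np.log(p)                             # log-odds scale values

judgments = [(0, 1), (0, 2), (1, 2), (0, 3), (3, 2), (1, 3), (2, 3)]
print(bradley_terry(4, judgments).round(2))      # higher = judged better
```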
Peer reviewed
Schweig, Jonathan David – Applied Measurement in Education, 2014
Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…
Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys
Peer reviewed
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
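As a concrete illustration of the kind of decomposition G theory provides, the sketch below estimates variance components for a simulated one-facet persons x raters design and forms a generalizability coefficient; it is a generic textbook example, not the article's analysis:

```python
# Minimal sketch: one-facet p x r G study from ANOVA mean squares (simulated data).
import numpy as np

rng = np.random.default_rng(1)
n_p, n_r = 50, 4
person = rng.normal(0, 1.0, (n_p, 1))            # person "universe-score" effects
rater = rng.normal(0, 0.3, (1, n_r))             # rater leniency/severity
scores = person + rater + rng.normal(0, 0.5, (n_p, n_r))   # interaction/error

grand = scores.mean()
ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r
ms_p, ms_r = ss_p / (n_p - 1), ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

var_pr = ms_pr                                   # interaction + error
var_p = (ms_p - ms_pr) / n_r                     # person variance
var_r = (ms_r - ms_pr) / n_p                     # rater variance
g_coef = var_p / (var_p + var_pr / n_r)          # relative G coefficient, n_r raters
print(round(var_p, 3), round(var_r, 3), round(var_pr, 3), round(g_coef, 3))
```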
Peer reviewed
Oliveri, Maria E.; Ercikan, Kadriye – Applied Measurement in Education, 2011
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of…
Descriptors: Foreign Countries, English, French, Tests
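Item-level comparability between language versions is often checked with differential item functioning (DIF) statistics. The sketch below shows a Mantel-Haenszel DIF check on simulated dichotomous responses; it is offered as a generic example, not necessarily one of the approaches used in the study, and the group labels are assumptions:

```python
# Minimal sketch: Mantel-Haenszel DIF for one item, stratifying on total score.
import numpy as np

rng = np.random.default_rng(2)
n, n_items = 1000, 20
group = rng.integers(0, 2, n)                    # 0 = reference, 1 = focal (hypothetical groups)
ability = rng.normal(0, 1, n)
resp = (rng.random((n, n_items)) < 1 / (1 + np.exp(-ability[:, None]))).astype(int)
item = resp[:, 0]                                # the studied item
total = resp.sum(axis=1)

num = den = 0.0
for k in np.unique(total):                       # stratify on total score
    m = total == k
    a = ((group[m] == 0) & (item[m] == 1)).sum()   # reference correct
    b = ((group[m] == 0) & (item[m] == 0)).sum()   # reference incorrect
    c = ((group[m] == 1) & (item[m] == 1)).sum()   # focal correct
    d = ((group[m] == 1) & (item[m] == 0)).sum()   # focal incorrect
    nk = m.sum()
    num += a * d / nk
    den += b * c / nk
alpha_mh = num / den
print("MH delta:", round(-2.35 * np.log(alpha_mh), 2))   # near 0 suggests no DIF
```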
Peer reviewed
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Will subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
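A simple way to pose the first question is to ask which observed quantity predicts a (simulated) true subscore with smaller error. The sketch below does this with least-squares regression on invented data; it is illustrative only, not the authors' method:

```python
# Minimal sketch: is a true subscore predicted better by its own observed
# subscore or by the total score? (simulated scores, hypothetical parameters)
import numpy as np

rng = np.random.default_rng(3)
n = 5000
general = rng.normal(0, 1, n)                    # shared ability
specific = rng.normal(0, 0.4, n)                 # subscale-specific ability
true_sub = general + specific
obs_sub = true_sub + rng.normal(0, 0.8, n)       # short, noisy subscale
obs_total = 3 * general + specific + rng.normal(0, 0.6, n)   # longer, more reliable total

def mse_from_predictor(x, target):
    slope, intercept = np.polyfit(x, target, 1)  # least-squares line
    return np.mean((target - (slope * x + intercept)) ** 2)

print("from subscore:", round(mse_from_predictor(obs_sub, true_sub), 3))
print("from total   :", round(mse_from_predictor(obs_total, true_sub), 3))
```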
Peer reviewed
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
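Wainer et al.'s augmentation regresses each observed subscore toward the information carried by the full subscore profile. The sketch below implements a simplified version of that idea; the reliabilities, covariances, and score values are invented for illustration, not taken from the study:

```python
# Minimal sketch: regressed ("augmented") subscore estimates in the spirit of
# Wainer et al., using an assumed covariance matrix and subscale reliabilities.
import numpy as np

obs_cov = np.array([[4.0, 2.0, 1.5],
                    [2.0, 3.5, 1.6],
                    [1.5, 1.6, 3.0]])            # observed subscore covariances (hypothetical)
reliab = np.array([0.55, 0.60, 0.50])            # subscale reliabilities (hypothetical)
means = np.array([12.0, 15.0, 9.0])

true_cov = obs_cov.copy()
np.fill_diagonal(true_cov, reliab * np.diag(obs_cov))   # true-score variances
B = true_cov @ np.linalg.inv(obs_cov)            # regression of true on observed scores

x = np.array([16.0, 11.0, 10.0])                 # one examinee's observed subscores
augmented = means + B @ (x - means)              # each subscore borrows from the others
print(augmented.round(2))
```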
Peer reviewed
Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009
Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…
Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring
Peer reviewed
Linn, Robert L.; Kiplinger, Vonda L. – Applied Measurement in Education, 1995
The adequacy of linking statewide standardized test results to the National Assessment of Educational Progress by using equipercentile equating procedures was investigated using statewide mathematics data from four states. Results suggest that the linkings are not sufficiently trustworthy to make comparisons based on the tails of the distribution.…
Descriptors: Comparative Analysis, Educational Assessment, Equated Scores, Mathematics Tests
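Equipercentile linking maps a score on one test to the score on the other test that has the same percentile rank. A minimal unsmoothed sketch on simulated distributions (not the study's state or NAEP data):

```python
# Minimal sketch: equipercentile link from a state-test scale to a reference scale.
import numpy as np

rng = np.random.default_rng(4)
state = np.clip(rng.normal(30, 8, 4000).round(), 0, 60)     # state test scores (simulated)
ref = np.clip(rng.normal(150, 35, 4000).round(), 0, 300)    # reference-scale scores (simulated)

def equipercentile_link(x, from_scores, to_scores):
    """Map score x on the 'from' test to the 'to' score with the same percentile rank."""
    pr = np.mean(from_scores <= x)               # percentile rank of x
    return np.quantile(to_scores, pr)

for s in (15, 30, 45):
    print(s, "->", round(equipercentile_link(s, state, ref), 1))
```

Because the mapping leans on the empirical distributions, it is least stable where data are sparse, which is one reason linkings tend to break down in the tails, as the abstract notes.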
Peer reviewed
Linn, Robert L.; And Others – Applied Measurement in Education, 1992
Ten states participated in a cross-state scoring workshop in 1991, evaluating writing from elementary school, middle school, and high school students. Correlations of scores assigned by readers from one state with those assigned by readers from another state were generally quite high. Implications for defining common standards are discussed. (SLD)
Descriptors: Comparative Analysis, Correlation, Elementary School Students, Elementary Secondary Education