Showing all 6 results
Peer reviewed
Direct link
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
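The entry above concerns inter-rater reliability for human-scored constructed-response items. As a hedged illustration of that general theme (not the authors' method, which is generalizability-based), here is a minimal Python sketch of Cohen's kappa for two raters; the rater names and score values are invented for the example.

```python
# Illustrative sketch: Cohen's kappa between two raters scoring the same
# constructed-response items. Data below are hypothetical.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of exact agreements.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over score categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # degenerate case: all marginals in one category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical example: ten responses scored on a 0-3 rubric.
a = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
b = [0, 1, 2, 2, 2, 1, 1, 3, 2, 1]
print(f"kappa = {cohens_kappa(a, b):.3f}")
```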
Peer reviewed
Direct link
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Peer reviewed
Direct link
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Peer reviewed
Direct link
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring
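The entry above evaluates the stability of an Angoff cut score within a generalizability theory framework. The sketch below is one hedged, simplified formulation of that idea, not the study's actual analysis: a judges-crossed-with-items design in which variance components come from two-way ANOVA mean squares, the items are treated as fixed, and the ratings matrix is simulated.

```python
# Illustrative sketch: variance components for Angoff ratings and the
# standard error of the panel's cut score. Data are simulated, and the
# SE formula assumes a judges x items crossed design with items fixed.
import numpy as np

def angoff_cut_score_se(ratings):
    """ratings: judges x items matrix of Angoff probability estimates."""
    n_j, n_i = ratings.shape
    grand = ratings.mean()
    judge_means = ratings.mean(axis=1)
    item_means = ratings.mean(axis=0)
    # Mean squares for a two-way design with one observation per cell.
    ms_j = n_i * np.sum((judge_means - grand) ** 2) / (n_j - 1)
    ss_res = np.sum(
        (ratings - judge_means[:, None] - item_means[None, :] + grand) ** 2
    )
    ms_res = ss_res / ((n_j - 1) * (n_i - 1))
    # Expected-mean-square estimate of the judge variance component.
    var_j = max((ms_j - ms_res) / n_i, 0.0)
    # SE of the cut score over replications of the judge panel.
    se = np.sqrt(var_j / n_j + ms_res / (n_j * n_i))
    return grand, se

rng = np.random.default_rng(0)
ratings = np.clip(0.6 + 0.05 * rng.standard_normal((8, 40)), 0, 1)
cut, se = angoff_cut_score_se(ratings)
print(f"cut score = {cut:.3f}, SE = {se:.3f}")
```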
Peer reviewed
Clauser, Brian E.; Swanson, David B.; Clyman, Stephen G. – Applied Measurement in Education, 1999
Performed generalizability analyses of expert ratings and computer-produced scores for a computer-delivered performance assessment of physicians' patient management skills. The two automated scoring systems produced scores for the 200 medical students that were approximately as generalizable as those produced by the four expert raters. (SLD)
Descriptors: Comparative Analysis, Computer Assisted Testing, Generalizability Theory, Higher Education
Peer reviewed
Marzano, Robert J. – Applied Measurement in Education, 2002
Two studies, each involving 10 eighth graders, compared the findings from generalizability (G) studies and alternative decision (D) studies for 4 approaches to scoring classroom assessments. In terms of less rater x person variability and higher G and D coefficients, the methods ranked in this order: topic-specific rubric, constrained point,…
Descriptors: Comparative Analysis, Decision Making, Generalizability Theory, Junior High School Students
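The Marzano entry above reports G and D coefficients for competing scoring approaches. As a hedged sketch of what a D-study projection computes, not a reproduction of that study, the snippet below applies the standard persons x raters formulas for the generalizability (relative) and dependability (absolute) coefficients; the variance-component values and the helper name d_study are invented for illustration.

```python
# Illustrative D-study: project reliability coefficients for panels of
# different sizes from persons x raters variance components.
def d_study(var_p, var_r, var_pr, n_raters):
    """G (relative) and Phi (absolute) coefficients when scores are
    averaged over n_raters raters."""
    rel_err = var_pr / n_raters            # error for norm-referenced use
    abs_err = (var_r + var_pr) / n_raters  # error for criterion-referenced use
    g = var_p / (var_p + rel_err)
    phi = var_p / (var_p + abs_err)
    return g, phi

# Hypothetical variance components: persons, raters, person x rater.
for n in range(1, 5):
    g, phi = d_study(var_p=0.50, var_r=0.05, var_pr=0.20, n_raters=n)
    print(f"{n} rater(s): G = {g:.2f}, Phi = {phi:.2f}")
```

Adding raters shrinks both error terms, so both coefficients rise with panel size; Phi is always the smaller of the two because it also charges the rater main effect against the score.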