ERIC - Search Results

Showing all 6 results

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

An Empirical Examination of the Impact of Group Discussion and Examinee Performance Information on Judgments Made in the Angoff Standard-Setting Procedure

Peer reviewed

Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009

Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…

Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring

Estimating Reliability under a Generalizability Theory Model for Test Scores Composed of Testlets

Peer reviewed

Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999

Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…

Descriptors: Generalizability Theory, Models, National Surveys, Reliability

The Generalizability of Ratings of Item Relevance

Peer reviewed

Norcini, John; Grosso, Lou – Applied Measurement in Education, 1998

Ratings of test item relevance were collected from 57 practitioners from a pretest of a medical certifying examination. Ratings were correlated with item difficulty, but the relationship between ratings and item discrimination was less clear. Application of generalizability theory shows that reasonable estimates of item, stem, and total test…

Descriptors: Certification, Difficulty Level, Estimation (Mathematics), Generalizability Theory