Kamis, Ömer; Dogan, C. Deha – Journal of Education and Learning, 2018
This research aimed to compare the G and Phi coefficients estimated in decision studies in generalizability theory with those obtained in actual cases for the same conditions of similar facets, using a crossed design. The research was conducted as pure research on 120 individuals (students), six items, and 12 raters. An achievement test composed of six…
Descriptors: Generalizability Theory, Decision Making, Reliability, Computation
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed-response items in large-scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time-consuming when the test has a large number of items. In this study, a G-theory framework was used to determine whether a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Lombardi, Allison; Seburn, Mary; Conley, David; Snow, Eric – Online Submission, 2010
In alignment studies, expert raters evaluate assessment items against standards and ratings are used to compute various alignment indices. Questions about rater reliability, however, are often ignored or inadequately addressed. This paper reports the results of a generalizability theory study of cognitive demand and rigor ratings of assessment…
Descriptors: Generalizability Theory, Test Items, College Entrance Examinations, Readiness
Solano-Flores, Guillermo – Educational Researcher, 2008
The testing of English language learners (ELLs) is, to a large extent, a random process because of poor implementation and factors that are uncertain or beyond control. Yet current testing practices and policies appear to be based on deterministic views of language and linguistic groups and erroneous assumptions about the capacity of assessment…
Descriptors: Generalizability Theory, Testing, Second Language Learning, Error of Measurement
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis

Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999
Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…
Descriptors: Generalizability Theory, Models, National Surveys, Reliability
Lee, Guemin; Frisbie, David A. – 1997
Previous studies have indicated that the reliability of test scores composed of testlets might be overestimated by conventional item-based reliability estimation methods (R. Thorndike, 1953; A. Anastasi, 1988; S. Sireci, D. Thissen, and H. Wainer, 1991; H. Wainer and D. Thissen, 1996). This study used generalizability theory to investigate the…
Descriptors: Estimation (Mathematics), Generalizability Theory, Reliability, Scores
Lee, Yong-Won; Kantor, Robert; Mollaun, Pam – 2002
This paper reports the results of generalizability theory (G) analyses done for new writing and speaking tasks for the Test of English as a Foreign Language (TOEFL). For writing, a special focus was placed on evaluating the impact on the reliability of the number of raters (or ratings) per essay (one or two) and the number of tasks (one, two, or…
Descriptors: English (Second Language), Generalizability Theory, Reliability, Scores