Publication Date
In 2025 | 2 |
Since 2024 | 3 |
Since 2021 (last 5 years) | 7 |
Since 2016 (last 10 years) | 14 |
Since 2006 (last 20 years) | 37 |
Descriptor
Generalizability Theory | 56 |
Test Items | 56 |
Error of Measurement | 15 |
Foreign Countries | 14 |
Scores | 13 |
Test Reliability | 13 |
Reliability | 11 |
Test Construction | 11 |
Difficulty Level | 10 |
Item Response Theory | 9 |
English (Second Language) | 8 |
More ▼ |
Source
Author
Solano-Flores, Guillermo | 5 |
Lee, Guemin | 3 |
Brennan, Robert L. | 2 |
Clauser, Brian E. | 2 |
Frisbie, David A. | 2 |
Harik, Polina | 2 |
Li, Min | 2 |
Margolis, Melissa J. | 2 |
Webb, Noreen M. | 2 |
Alonzo, Julie | 1 |
Anderson, Dan | 1 |
More ▼ |
Publication Type
Journal Articles | 41 |
Reports - Research | 37 |
Reports - Evaluative | 16 |
Speeches/Meeting Papers | 9 |
Information Analyses | 3 |
Numerical/Quantitative Data | 3 |
Reports - Descriptive | 2 |
Opinion Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 10 |
Postsecondary Education | 7 |
Secondary Education | 6 |
Grade 8 | 4 |
Elementary Education | 3 |
Grade 5 | 3 |
Grade 7 | 3 |
Junior High Schools | 3 |
Middle Schools | 3 |
Grade 3 | 2 |
Grade 4 | 2 |
More ▼ |
Audience
Researchers | 2 |
Location
Turkey | 3 |
Turkey (Ankara) | 3 |
United Kingdom | 2 |
Alabama | 1 |
California | 1 |
Colorado | 1 |
Germany | 1 |
Haiti | 1 |
Illinois (Chicago) | 1 |
Indiana | 1 |
Mexico | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign… | 2 |
ACT Assessment | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Test of English for… | 1 |
Trends in International… | 1 |
United States Medical… | 1 |
What Works Clearinghouse Rating
Stella Y. Kim; Sungyeun Kim – Educational Measurement: Issues and Practice, 2025
This study presents several multivariate Generalizability theory designs for analyzing automatic item-generated (AIG) based test forms. The study used real data to illustrate the analysis procedure and discuss practical considerations. We collected the data from two groups of students, each group receiving a different form generated by AIG. A…
Descriptors: Generalizability Theory, Automation, Test Items, Students
Joshua B. Gilbert; Zachary Himmelsbach; Luke W. Miratrix; Andrew D. Ho; Benjamin W. Domingue – Annenberg Institute for School Reform at Brown University, 2025
Value added models (VAMs) attempt to estimate the causal effects of teachers and schools on student test scores. We apply Generalizability Theory to show how estimated VA effects depend upon the selection of test items. Standard VAMs estimate causal effects on the items that are included on the test. Generalizability demands consideration of how…
Descriptors: Value Added Models, Reliability, Effect Size, Test Items
Maria Bolsinova; Jesper Tijmstra; Leslie Rutkowski; David Rutkowski – Journal of Educational and Behavioral Statistics, 2024
Profile analysis is one of the main tools for studying whether differential item functioning can be related to specific features of test items. While relevant, profile analysis in its current form has two restrictions that limit its usefulness in practice: It assumes that all test items have equal discrimination parameters, and it does not test…
Descriptors: Test Items, Item Analysis, Generalizability Theory, Achievement Tests
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study utilizes an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Batelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Deniz, Kaan Zulfikar; Ilican, Emel – International Journal of Assessment Tools in Education, 2021
This study aims to compare the G and Phi coefficients as estimated by D studies for a measurement tool with the G and Phi coefficients obtained from real cases in which items of differing difficulty levels were added and also to determine the conditions under which the D studies estimated reliability coefficients closer to reality. The study group…
Descriptors: Generalizability Theory, Test Items, Difficulty Level, Test Reliability
Atilgan, Hakan; Demir, Elif Kübra; Ogretmen, Tuncay; Basokcu, Tahsin Oguz – International Journal of Progressive Education, 2020
It has become a critical question what the reliability level would be when open-ended questions are used in large-scale selection tests. One of the aims of the present study is to determine what the reliability would be in the event that the answers given by test-takers are scored by experts when open-ended short answer questions are used in…
Descriptors: Foreign Countries, Secondary School Students, Test Items, Test Reliability
Shin, Ji-young – Language Testing, 2022
With the present study I investigated the sources of score variance and dependability in a local oral English proficiency test for potential international teaching assistants (ITAs) across four first language (L1) groups, and suggested alternative test designs. Using generalizability theory, I examined the relative importance of L1s (i.e., Indian,…
Descriptors: Foreign Students, Language Tests, Language Proficiency, Oral Language
Kamis, Ömer; Dogan, C. Deha – Journal of Education and Learning, 2018
This research aimed to compare the G and Phi coefficients estimated in Decision studies in Generalizability theory and obtained in actual cases for the same conditions of similar facets by using crossed design. The research was conducted as pure research on 120 individuals (students), six items and 12 raters. An achievement test composed of six…
Descriptors: Generalizability Theory, Decision Making, Reliability, Computation
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods
Byram, Jessica N.; Seifert, Mark F.; Brooks, William S.; Fraser-Cotlin, Laura; Thorp, Laura E.; Williams, James M.; Wilson, Adam B. – Anatomical Sciences Education, 2017
With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in…
Descriptors: Anatomy, Science Tests, Test Items, Test Reliability
Li, Feifei – ETS Research Report Series, 2017
An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items