Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 5 |
Since 2016 (last 10 years) | 12 |
Since 2006 (last 20 years) | 35 |
Descriptor
Generalizability Theory | 54 |
Test Items | 54 |
Error of Measurement | 15 |
Foreign Countries | 13 |
Scores | 13 |
Test Reliability | 13 |
Test Construction | 11 |
Difficulty Level | 10 |
Reliability | 10 |
Item Response Theory | 9 |
English (Second Language) | 8 |
Author
Solano-Flores, Guillermo | 5 |
Lee, Guemin | 3 |
Brennan, Robert L. | 2 |
Clauser, Brian E. | 2 |
Frisbie, David A. | 2 |
Harik, Polina | 2 |
Li, Min | 2 |
Margolis, Melissa J. | 2 |
Webb, Noreen M. | 2 |
Alonzo, Julie | 1 |
Anderson, Dan | 1 |
Publication Type
Journal Articles | 40 |
Reports - Research | 35 |
Reports - Evaluative | 16 |
Speeches/Meeting Papers | 9 |
Information Analyses | 3 |
Numerical/Quantitative Data | 3 |
Reports - Descriptive | 2 |
Opinion Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 10 |
Postsecondary Education | 7 |
Secondary Education | 5 |
Grade 8 | 4 |
Elementary Education | 3 |
Grade 5 | 3 |
Grade 7 | 3 |
Junior High Schools | 3 |
Middle Schools | 3 |
Grade 3 | 2 |
Grade 4 | 2 |
Audience
Researchers | 2 |
Location
Turkey | 3 |
Turkey (Ankara) | 3 |
United Kingdom | 2 |
Alabama | 1 |
California | 1 |
Colorado | 1 |
Germany | 1 |
Haiti | 1 |
Illinois (Chicago) | 1 |
Indiana | 1 |
Mexico | 1 |
Assessments and Surveys
Test of English as a Foreign… | 2 |
ACT Assessment | 1 |
National Assessment of… | 1 |
Program for International… | 1 |
Test of English for… | 1 |
Trends in International… | 1 |
United States Medical… | 1 |
Maria Bolsinova; Jesper Tijmstra; Leslie Rutkowski; David Rutkowski – Journal of Educational and Behavioral Statistics, 2024
Profile analysis is one of the main tools for studying whether differential item functioning can be related to specific features of test items. While relevant, profile analysis in its current form has two restrictions that limit its usefulness in practice: It assumes that all test items have equal discrimination parameters, and it does not test…
Descriptors: Test Items, Item Analysis, Generalizability Theory, Achievement Tests
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study uses an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when applying the Masters' Partial Credit Model to polytomous items. Item data from the standardization of the Battelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
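The Masters' Partial Credit Model referenced in the entry above gives the probability of each score category from a person ability θ and the item's step difficulties. A minimal sketch, not taken from the article; the function name and parameterization are illustrative:

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities under Masters' Partial Credit Model.

    theta: person ability; deltas: step difficulties delta_1..delta_m.
    Returns probabilities for score categories 0..m.
    """
    # Cumulative numerators: psi_0 = 0, psi_k = sum_{j<=k} (theta - delta_j)
    psis = [0.0]
    for d in deltas:
        psis.append(psis[-1] + (theta - d))
    exps = [math.exp(p) for p in psis]
    total = sum(exps)  # normalizing constant over all categories
    return [e / total for e in exps]
```

For a two-step item with both step difficulties at 0, a person at θ = 0 is equally likely to score 0, 1, or 2; raising θ shifts probability mass toward the higher categories.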
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Deniz, Kaan Zulfikar; Ilican, Emel – International Journal of Assessment Tools in Education, 2021
This study aims to compare the G and Phi coefficients estimated by D studies for a measurement tool with the G and Phi coefficients obtained in real cases where items of differing difficulty levels were added, and to determine the conditions under which D studies estimate reliability coefficients closer to those observed in practice. The study group…
Descriptors: Generalizability Theory, Test Items, Difficulty Level, Test Reliability
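The G (relative) and Phi (absolute) coefficients that several of these entries compare can be sketched for the simplest persons × items crossed design. This is an illustrative sketch, not code from any of the studies: variance components come from the standard expected-mean-square equations, and the D-study step projects them to a hypothetical number of items:

```python
import numpy as np

def g_study(scores):
    """Variance components for a persons x items (p x i) crossed design.

    scores: 2-D array, rows = persons, columns = items.
    Returns (var_persons, var_items, var_residual).
    """
    n_p, n_i = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    i_means = scores.mean(axis=0)
    ss_p = n_i * ((p_means - grand) ** 2).sum()
    ss_i = n_p * ((i_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    # Expected-mean-square solutions, truncated at zero
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    return var_p, var_i, var_res

def d_study(var_p, var_i, var_res, n_items):
    """G (relative) and Phi (absolute) coefficients for a test of n_items."""
    g = var_p / (var_p + var_res / n_items)
    phi = var_p / (var_p + (var_i + var_res) / n_items)
    return g, phi
```

Because the Phi coefficient counts item variance as error while the G coefficient does not, Phi can never exceed G for the same design, which is why the two are reported side by side in D-study comparisons.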
Atilgan, Hakan; Demir, Elif Kübra; Ogretmen, Tuncay; Basokcu, Tahsin Oguz – International Journal of Progressive Education, 2020
What level of reliability can be achieved when open-ended questions are used in large-scale selection tests has become a critical question. One aim of the present study is to determine the reliability obtained when test-takers' answers to open-ended short-answer questions are scored by experts in…
Descriptors: Foreign Countries, Secondary School Students, Test Items, Test Reliability
Shin, Ji-young – Language Testing, 2022
With the present study I investigated the sources of score variance and dependability in a local oral English proficiency test for potential international teaching assistants (ITAs) across four first language (L1) groups, and suggested alternative test designs. Using generalizability theory, I examined the relative importance of L1s (i.e., Indian,…
Descriptors: Foreign Students, Language Tests, Language Proficiency, Oral Language
Kamis, Ömer; Dogan, C. Deha – Journal of Education and Learning, 2018
This research aimed to compare the G and Phi coefficients estimated in decision (D) studies in generalizability theory with those obtained in actual cases under the same conditions and similar facets, using a crossed design. The study was conducted with 120 students, six items, and 12 raters. An achievement test composed of six…
Descriptors: Generalizability Theory, Decision Making, Reliability, Computation
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Zaidi, Nikki L.; Swoboda, Christopher M.; Kelcey, Benjamin M.; Manuel, R. Stephen – Advances in Health Sciences Education, 2017
The extant literature has largely ignored a potentially significant source of variance in multiple mini-interview (MMI) scores by "hiding" the variance attributable to the sample of attributes used on an evaluation form. This potential source of hidden variance can be defined as rating items, which typically comprise an MMI evaluation…
Descriptors: Interviews, Scores, Generalizability Theory, Monte Carlo Methods
Byram, Jessica N.; Seifert, Mark F.; Brooks, William S.; Fraser-Cotlin, Laura; Thorp, Laura E.; Williams, James M.; Wilson, Adam B. – Anatomical Sciences Education, 2017
With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in…
Descriptors: Anatomy, Science Tests, Test Items, Test Reliability
Li, Feifei – ETS Research Report Series, 2017
An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Solano-Flores, Guillermo; Backhoff, Eduardo; Contreras-Niño, Luis A.; Vázquez-Muñoz, Mariana – International Journal of Testing, 2015
Indicators of academic achievement for bilingual students can be inaccurate due to linguistic heterogeneity. For indigenous populations, language shift (the gradual replacement of one language by another) is a factor that can increase this heterogeneity and poses an additional challenge for valid testing. We investigated whether and how indigenous…
Descriptors: Foreign Countries, Maya (People), Preschool Children, Mathematics Tests
Teker, Gulsen Tasdelen; Dogan, Nuri – Educational Sciences: Theory and Practice, 2015
Reliability and differential item functioning (DIF) analyses were conducted on testlets displaying local item dependence in this study. The data set employed in the research was obtained from the answers given by 1,500 students to the 20 items in six testlets on the English Proficiency Exam administered by the School of Foreign Languages of a state…
Descriptors: Foreign Countries, Test Items, Test Bias, Item Response Theory