Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 19 |
Since 2006 (last 20 years) | 34 |
Descriptor
Generalizability Theory | 59 |
Scoring | 59 |
Interrater Reliability | 27 |
Scores | 21 |
Reliability | 17 |
Error of Measurement | 15 |
Test Reliability | 10 |
Performance Based Assessment | 9 |
Evaluation Methods | 8 |
Foreign Countries | 8 |
Scoring Rubrics | 8 |
More ▼ |
Source
Author
Clauser, Brian E. | 3 |
Clyman, Stephen G. | 2 |
Crawford, Angela R. | 2 |
Haertel, Edward H. | 2 |
Harik, Polina | 2 |
Johnson, Evelyn S. | 2 |
Moylan, Laura A. | 2 |
Solano-Flores, Guillermo | 2 |
Zheng, Yuzhu | 2 |
Aksu, Gökhan | 1 |
Aktas, Mehtap | 1 |
More ▼ |
Publication Type
Education Level
Higher Education | 6 |
Postsecondary Education | 6 |
Elementary Education | 5 |
Grade 4 | 3 |
Secondary Education | 3 |
Grade 5 | 2 |
Early Childhood Education | 1 |
Elementary Secondary Education | 1 |
Grade 6 | 1 |
Grade 7 | 1 |
Grade 8 | 1 |
More ▼ |
Audience
Researchers | 3 |
Location
Turkey | 2 |
United Kingdom | 2 |
Australia | 1 |
Canada | 1 |
Hong Kong | 1 |
Iowa | 1 |
Japan | 1 |
Mexico | 1 |
Netherlands | 1 |
Taiwan | 1 |
United States | 1 |
More ▼ |
Laws, Policies, & Programs
Assessments and Surveys
National Assessment of… | 1 |
Teacher Performance… | 1 |
Test of English as a Foreign… | 1 |
Test of English for… | 1 |
Texas Assessment of Academic… | 1 |
Trends in International… | 1 |
United States Medical… | 1 |
Work Keys (ACT) | 1 |
What Works Clearinghouse Rating
Chan, Wendy – American Journal of Evaluation, 2022
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact…
Descriptors: Probability, Scores, Scoring, Generalization
Comparison of the Results of the Generalizability Theory with the Inter-Rater Agreement Coefficients
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Khodi, Ali – Language Testing in Asia, 2021
The present study attempted to to investigate factors which affect EFL writing scores through using generalizability theory (G-theory). To this purpose, one hundred and twenty students participated in one independent and one integrated writing tasks. Proceeding, their performances were scored by six raters: one self-rating, three peers,-rating and…
Descriptors: Writing Tests, Scores, Generalizability Theory, English (Second Language)
Lyrica Lucas; Anum Khushal; Robert Mayes; Brian A. Couch; Joseph Dauer – International Journal of Science Education, 2025
Educational reform priorities such as emphasis on quantitative modelling (QM) have positioned undergraduate biology instructors as designers of QM experiences to engage students in authentic science practices that support the development of data-driven and evidence-based reasoning. Yet, little is known about how biology instructors adapt to the…
Descriptors: Undergraduate Students, College Science, Biology, Classroom Observation Techniques
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Grantee Submission, 2020
In this study, we examined the scoring and generalizability assumptions of an Explicit Instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Evaluation, Classroom Observation Techniques, Validity
Pua, Daisy J.; Peyton, David J.; Brownell, Mary T.; Contesse, Valentina A.; Jones, Nathan D. – Journal of Learning Disabilities, 2021
Advancing teacher candidates' overall competence through use of valid teacher observation systems should be an essential element of teacher preparation. Yet, the field of special education has not provided observation protocols designed specifically for preservice teachers that are founded in theoretical perspectives and research on effective…
Descriptors: Preservice Teachers, Preservice Teacher Education, Observation, Special Education
Uzun, N. Bilge; Alici, Devrim; Aktas, Mehtap – European Journal of Educational Research, 2019
The purpose of study is to examine the reliability of analytical rubrics and checklists developed for the assessment of story writing skills by means of generalizability theory. The study group consisted of 52 students attending the 5th grade at primary school and 20 raters in Mersin University. The G study was carried out with the fully crossed…
Descriptors: Foreign Countries, Scoring Rubrics, Check Lists, Writing Tests
Borowiec, Katrina; Castle, Courtney – Practical Assessment, Research & Evaluation, 2019
Rater cognition or "think-aloud" studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the "Next Generation Science Standards (NGSS)," the present study illustrates the utility of extending…
Descriptors: Evaluators, Scoring, Scoring Rubrics, Protocol Analysis
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Schmidgall, Jonathan E. – ETS Research Report Series, 2017
This report briefly reviews the design and scoring procedure for the "TOEIC"® Speaking test and summarizes existing evidence about the consistency of TOEIC Speaking test scores. It then describes several analyses conducted using generalizability theory to provide additional information about the consistency of scores across different…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Speech Tests
Johnson, Austin H.; Chafouleas, Sandra M.; Briesch, Amy M. – School Psychology Quarterly, 2017
In this study, generalizability theory was used to examine the extent to which (a) time-sampling methodology, (b) number of simultaneous behavior targets, and (c) individual raters influenced variance in ratings of academic engagement for an elementary-aged student. Ten graduate-student raters, with an average of 7.20 hr of previous training in…
Descriptors: Generalizability Theory, Sampling, Elementary School Students, Learner Engagement
McLaughlin, Tara W.; Snyder, Patricia A.; Algina, James – Grantee Submission, 2017
The Learning Target Rating Scale (LTRS) is a measure designed to evaluate the quality of teacher-developed learning targets for embedded instruction for early learning. In the present study, we examined the measurement dependability of LTRS scores by conducting a generalizability study (G-study). We used a partially nested, three-facet model to…
Descriptors: Generalizability Theory, Scores, Rating Scales, Evaluation Methods
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy