Showing 1 to 15 of 59 results
Peer reviewed
Chan, Wendy – American Journal of Evaluation, 2022
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact…
Descriptors: Probability, Scores, Scoring, Generalization
Peer reviewed
PDF on ERIC
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although there are clear definitions of the concepts of agreement between raters and reliability between raters, there is no clear information about the conditions under which agreement and reliability level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
Peer reviewed
Khodi, Ali – Language Testing in Asia, 2021
The present study investigated factors that affect EFL writing scores using generalizability theory (G-theory). To this end, one hundred and twenty students completed one independent and one integrated writing task. Their performances were then scored by six raters: one self-rating, three peer-rating, and…
Descriptors: Writing Tests, Scores, Generalizability Theory, English (Second Language)
Peer reviewed
Lyrica Lucas; Anum Khushal; Robert Mayes; Brian A. Couch; Joseph Dauer – International Journal of Science Education, 2025
Educational reform priorities such as emphasis on quantitative modelling (QM) have positioned undergraduate biology instructors as designers of QM experiences to engage students in authentic science practices that support the development of data-driven and evidence-based reasoning. Yet, little is known about how biology instructors adapt to the…
Descriptors: Undergraduate Students, College Science, Biology, Classroom Observation Techniques
Peer reviewed
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity
Peer reviewed
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Grantee Submission, 2020
In this study, we examined the scoring and generalizability assumptions of an Explicit Instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Evaluation, Classroom Observation Techniques, Validity
Peer reviewed
Pua, Daisy J.; Peyton, David J.; Brownell, Mary T.; Contesse, Valentina A.; Jones, Nathan D. – Journal of Learning Disabilities, 2021
Advancing teacher candidates' overall competence through use of valid teacher observation systems should be an essential element of teacher preparation. Yet, the field of special education has not provided observation protocols designed specifically for preservice teachers that are founded in theoretical perspectives and research on effective…
Descriptors: Preservice Teachers, Preservice Teacher Education, Observation, Special Education
Peer reviewed
PDF on ERIC
Uzun, N. Bilge; Alici, Devrim; Aktas, Mehtap – European Journal of Educational Research, 2019
The purpose of this study is to examine the reliability of analytical rubrics and checklists developed for the assessment of story writing skills by means of generalizability theory. The study group consisted of 52 students attending the 5th grade at primary school and 20 raters at Mersin University. The G study was carried out with the fully crossed…
Descriptors: Foreign Countries, Scoring Rubrics, Check Lists, Writing Tests
Peer reviewed
PDF on ERIC
Borowiec, Katrina; Castle, Courtney – Practical Assessment, Research & Evaluation, 2019
Rater cognition or "think-aloud" studies have historically been used to enhance rater accuracy and consistency in writing and language assessments. As assessments are developed for new, complex constructs from the "Next Generation Science Standards (NGSS)," the present study illustrates the utility of extending…
Descriptors: Evaluators, Scoring, Scoring Rubrics, Protocol Analysis
Peer reviewed
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated systems for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Peer reviewed
PDF on ERIC
Schmidgall, Jonathan E. – ETS Research Report Series, 2017
This report briefly reviews the design and scoring procedure for the "TOEIC"® Speaking test and summarizes existing evidence about the consistency of TOEIC Speaking test scores. It then describes several analyses conducted using generalizability theory to provide additional information about the consistency of scores across different…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Speech Tests
Peer reviewed
Johnson, Austin H.; Chafouleas, Sandra M.; Briesch, Amy M. – School Psychology Quarterly, 2017
In this study, generalizability theory was used to examine the extent to which (a) time-sampling methodology, (b) number of simultaneous behavior targets, and (c) individual raters influenced variance in ratings of academic engagement for an elementary-aged student. Ten graduate-student raters, with an average of 7.20 hr of previous training in…
Descriptors: Generalizability Theory, Sampling, Elementary School Students, Learner Engagement
McLaughlin, Tara W.; Snyder, Patricia A.; Algina, James – Grantee Submission, 2017
The Learning Target Rating Scale (LTRS) is a measure designed to evaluate the quality of teacher-developed learning targets for embedded instruction for early learning. In the present study, we examined the measurement dependability of LTRS scores by conducting a generalizability study (G-study). We used a partially nested, three-facet model to…
Descriptors: Generalizability Theory, Scores, Rating Scales, Evaluation Methods
Peer reviewed
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy