Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Robert Schoen; Lanrong Li; Xiaotong Yang; Ahmet Guven; Claire Riddell – Society for Research on Educational Effectiveness, 2021
Many classroom-observation instruments have been developed (e.g., Gleason et al., 2017; Nava et al., 2019; Sawada et al., 2002), but a very small number of studies published in refereed journals have rigorously examined the quality of the ratings and the instrument using measurement models. For example, Gleason et al. developed a mathematics…
Descriptors: Item Response Theory, Models, Measurement, Mathematics Instruction
Garte, Rebecca – International Journal of Progressive Education, 2017
This paper provides a historical analysis of the past century of progressive education, within the general socio-political context of schooling within the US. The purpose of this review is to create a social, historical and philosophical context for understanding the current narrative of progressive education that exists in educational policy…
Descriptors: Progressive Education, Educational History, Educational Practices, Philosophy
Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013
A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…
Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance
Leclerc, Bernard-Simon; Dassa, Clement – Canadian Journal of Program Evaluation, 2009
This study examines the usefulness of the Montreal Service Concept framework of service quality measurement, when it was used as a predefined set of codes in content analysis of patients' responses. As well, the study quantifies the interrater agreement of coded data. Two raters independently reviewed each of the responses from a mail survey of…
Descriptors: Interrater Reliability, Content Analysis, Health Services, Mail Surveys
Henry, Beverly W.; Smith, Thomas J. – Journal of Nutrition Education and Behavior, 2010
Objective: To develop an instrument to assess client-centered counseling behaviors (skills) of student-counselors in a standardized patient (SP) exercise. Methods: Descriptive study of the accuracy and utility of a newly developed counseling evaluation instrument. Study participants included 11 female student-counselors at a Midwestern…
Descriptors: Feedback (Response), Generalizability Theory, Nutrition, Diseases
Rodriguez-Campos, Liliana; Rincones-Gomez, Rigoberto; Shen, Jianping – Frontiers of Education in China, 2008
Structural Equation Modeling (SEM) was used in this study to determine the extent to which teachers, principals, and superintendents perceive the leadership construct in the same way. The researchers found that the two-factor model fits the principal group and particularly the superintendent group better than does the four-factor model. The…
Descriptors: Structural Equation Models, Superintendents, Principals, Teacher Attitudes
Martinez, Jose Felipe; Goldschmidt, Pete; Niemi, David; Baker, Eva L.; Sylvester, Roxanne M. – Educational Assessment, 2007
We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the…
Descriptors: Assignments, Language Arts, Academic Achievement, Urban Areas

Li, Mao-Neng Fred; Lautenschlager, Gary – Educational and Psychological Measurement, 1997
Illustrates a link between the multiple-rater kappa of J. Fleiss (1971) or other analogues and the generalizability (G) coefficient for a single facet design, and discusses the use and interpretation of G theory in the study of interrater agreement when data are measured on a nominal scale. (SLD)
Descriptors: Classification, Generalizability Theory, Interrater Reliability, Research Design
Fan, Xitao; Chen, Michael – 1999
It is erroneous to extend or generalize the inter-rater reliability coefficient estimated from only a (small) proportion of the sample to the rest of the sample data where only one rater is used for scoring, although such generalization is often made implicitly in practice. It is shown that if inter-rater reliability estimate from part of a sample…
Descriptors: Estimation (Mathematics), Generalizability Theory, Interrater Reliability, Sample Size
Hintze, John M.; Matthews, William J. – School Psychology Review, 2004
This study examined the generalizability of systematic direct observation across setting and time. Participants included 14 students from an intact inclusionary fifth grade classroom. On-task/off-task behavior was directly observed using momentary time-sampling recording, twice a day, for 10 school days. Using Generalizability (G) theory, results…
Descriptors: Grade 5, Psychometrics, Classroom Observation Techniques, Interrater Reliability

Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters are an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
Arnold, Margery E. – 1996
It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…
Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability

Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1995
Two longitudinal studies involving 29 raters concerning the construct validity, temporal stability, and interrater reliability of the latent common factors underlying subjective assessments by human raters of personality traits in the stumptail macaque and the zebra finch illustrate the use of generalizability analysis to test prespecified…
Descriptors: Animal Behavior, Construct Validity, Evaluation Methods, Generalizability Theory

Marsh, Herbert W. – International Journal of Educational Research, 1987
The reliability, long-term stability, and generalizability of student ratings of teacher effectiveness are discussed. The Students' Evaluation of Educational Quality (SEEQ) instrument is examined from these perspectives. The multidimensionality of student responses to such evaluation instruments must be recognized. (SLD)
Descriptors: College Students, Generalizability Theory, Interrater Reliability, Postsecondary Education