Showing 1 to 15 of 32 results
Peer reviewed
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Peer reviewed
Robert Schoen; Lanrong Li; Xiaotong Yang; Ahmet Guven; Claire Riddell – Society for Research on Educational Effectiveness, 2021
Many classroom-observation instruments have been developed (e.g., Gleason et al., 2017; Nava et al., 2019; Sawada et al., 2002), but a very small number of studies published in refereed journals have rigorously examined the quality of the ratings and the instrument using measurement models. For example, Gleason et al. developed a mathematics…
Descriptors: Item Response Theory, Models, Measurement, Mathematics Instruction
Peer reviewed
Garte, Rebecca – International Journal of Progressive Education, 2017
This paper provides a historical analysis of the past century of progressive education, within the general socio-political context of schooling within the US. The purpose of this review is to create a social, historical and philosophical context for understanding the current narrative of progressive education that exists in educational policy…
Descriptors: Progressive Education, Educational History, Educational Practices, Philosophy
Peer reviewed
Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013
A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…
Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance
Peer reviewed
Leclerc, Bernard-Simon; Dassa, Clement – Canadian Journal of Program Evaluation, 2009
This study examines the usefulness of the Montreal Service Concept framework of service quality measurement, when it was used as a predefined set of codes in content analysis of patients' responses. As well, the study quantifies the interrater agreement of coded data. Two raters independently reviewed each of the responses from a mail survey of…
Descriptors: Interrater Reliability, Content Analysis, Health Services, Mail Surveys
Peer reviewed
Henry, Beverly W.; Smith, Thomas J. – Journal of Nutrition Education and Behavior, 2010
Objective: To develop an instrument to assess client-centered counseling behaviors (skills) of student-counselors in a standardized patient (SP) exercise. Methods: Descriptive study of the accuracy and utility of a newly developed counseling evaluation instrument. Study participants included 11 female student-counselors at a Midwestern…
Descriptors: Feedback (Response), Generalizability Theory, Nutrition, Diseases
Peer reviewed
Rodriguez-Campos, Liliana; Rincones-Gomez, Rigoberto; Shen, Jianping – Frontiers of Education in China, 2008
Structural Equation Modeling (SEM) was used in this study to determine the extent to which teachers, principals, and superintendents perceive the leadership construct in the same way. The researchers found that the two-factor model fits the principal group and particularly the superintendent group better than does the four-factor model. The…
Descriptors: Structural Equation Models, Superintendents, Principals, Teacher Attitudes
Peer reviewed
Martinez, Jose Felipe; Goldschmidt, Pete; Niemi, David; Baker, Eva L.; Sylvester, Roxanne M. – Educational Assessment, 2007
We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the…
Descriptors: Assignments, Language Arts, Academic Achievement, Urban Areas
Peer reviewed
Li, Mao-Neng Fred; Lautenschlager, Gary – Educational and Psychological Measurement, 1997
Illustrates a link between the multiple-rater kappa of J. Fleiss (1971) or other analogues and the generalizability (G) coefficient for a single-facet design, and discusses the use and interpretation of G theory in the study of interrater agreement when data are measured on a nominal scale. (SLD)
Descriptors: Classification, Generalizability Theory, Interrater Reliability, Research Design
Fan, Xitao; Chen, Michael – 1999
It is erroneous to extend or generalize the inter-rater reliability coefficient estimated from only a (small) proportion of the sample to the rest of the sample data where only one rater is used for scoring, although such generalization is often made implicitly in practice. It is shown that if inter-rater reliability estimate from part of a sample…
Descriptors: Estimation (Mathematics), Generalizability Theory, Interrater Reliability, Sample Size
Peer reviewed
Hintze, John M.; Matthews, William J. – School Psychology Review, 2004
This study examined the generalizability of systematic direct observation across setting and time. Participants included 14 students from an intact inclusionary fifth grade classroom. On-task/off-task behavior was directly observed using momentary time-sampling recording, twice a day, for 10 school days. Using Generalizability (G) theory, results…
Descriptors: Grade 5, Psychometrics, Classroom Observation Techniques, Interrater Reliability
Peer reviewed
Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
Arnold, Margery E. – 1996
It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…
Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability
Peer reviewed
Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1995
Two longitudinal studies involving 29 raters concerning the construct validity, temporal stability, and interrater reliability of the latent common factors underlying subjective assessments by human raters of personality traits in the stumptail macaque and the zebra finch illustrate the use of generalizability analysis to test prespecified…
Descriptors: Animal Behavior, Construct Validity, Evaluation Methods, Generalizability Theory
Peer reviewed
Marsh, Herbert W. – International Journal of Educational Research, 1987
The reliability, long-term stability, and generalizability of student ratings of teacher effectiveness are discussed. The Students' Evaluation of Educational Quality (SEEQ) instrument is examined from these perspectives. The multidimensionality of student response to such evaluation instruments must be recognized. (SLD)
Descriptors: College Students, Generalizability Theory, Interrater Reliability, Postsecondary Education