ERIC - Search Results

Showing 1 to 15 of 32 results

Evaluating Human Scoring Using Generalizability Theory

Peer reviewed

Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020

Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…

Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries

Using a Many-Facet Rasch Model to Gain Insight into Measurement of Instructional Practice in Mathematics

Peer reviewed

Robert Schoen; Lanrong Li; Xiaotong Yang; Ahmet Guven; Claire Riddell – Society for Research on Educational Effectiveness, 2021

Many classroom-observation instruments have been developed (e.g., Gleason et al., 2017; Nava et al., 2019; Sawada et al., 2002), but a very small number of studies published in refereed journals have rigorously examined the quality of the ratings and the instrument using measurement models. For example, Gleason et al. developed a mathematics…

Descriptors: Item Response Theory, Models, Measurement, Mathematics Instruction

American Progressive Education and the Schooling of Poor Children: A Brief History of a Philosophy in Practice

Peer reviewed

Garte, Rebecca – International Journal of Progressive Education, 2017

This paper provides a historical analysis of the past century of progressive education, within the general socio-political context of schooling within the US. The purpose of this review is to create a social, historical and philosophical context for understanding the current narrative of progressive education that exists in educational policy…

Descriptors: Progressive Education, Educational History, Educational Practices, Philosophy

The Number of Feedbacks Needed for Reliable Evaluation. A Multilevel Analysis of the Reliability, Stability and Generalisability of Students' Evaluation of Teaching

Peer reviewed

Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013

A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…

Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance

Interrater Reliability in Content Analysis of Healthcare Service Quality Using Montreal's Conceptual Framework

Peer reviewed

Leclerc, Bernard-Simon; Dassa, Clement – Canadian Journal of Program Evaluation, 2009

This study examines the usefulness of the Montreal Service Concept framework of service quality measurement, when it was used as a predefined set of codes in content analysis of patients' responses. As well, the study quantifies the interrater agreement of coded data. Two raters independently reviewed each of the responses from a mail survey of…

Descriptors: Interrater Reliability, Content Analysis, Health Services, Mail Surveys

Evaluation of the FOCUS (Feedback on Counseling Using Simulation) Instrument for Assessment of Client-Centered Nutrition Counseling Behaviors

Peer reviewed

Henry, Beverly W.; Smith, Thomas J. – Journal of Nutrition Education and Behavior, 2010

Objective: To develop an instrument to assess client-centered counseling behaviors (skills) of student-counselors in a standardized patient (SP) exercise. Methods: Descriptive study of the accuracy and utility of a newly developed counseling evaluation instrument. Study participants included 11 female student-counselors at a Midwestern…

Descriptors: Feedback (Response), Generalizability Theory, Nutrition, Diseases

Do Teachers, Principals, and Superintendents Perceive Leadership the Same Way? A Structural Equation Modeling Test of the Equivalence of a Multi-Dimensional Construct across Groups

Peer reviewed

Rodriguez-Campos, Liliana; Rincones-Gomez, Rigoberto; Shen, Jianping – Frontiers of Education in China, 2008

Structural Equation Modeling (SEM) was used in this study to determine the extent to which teachers, principals, and superintendents perceive the leadership construct in the same way. The researchers found that the two-factor model fits the principal group and particularly the superintendent group better than does the four-factor model. The…

Descriptors: Structural Equation Models, Superintendents, Principals, Teacher Attitudes

Language Arts Performance Assignments: Generalizability Studies of Local and Central Ratings

Peer reviewed

Martinez, Jose Felipe; Goldschmidt, Pete; Niemi, David; Baker, Eva L.; Sylvester, Roxanne M. – Educational Assessment, 2007

We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the…

Descriptors: Assignments, Language Arts, Academic Achievement, Urban Areas

Generalizability Theory Applied to Categorical Data.

Peer reviewed

Li, Mao-Neng Fred; Lautenschlager, Gary – Educational and Psychological Measurement, 1997

Illustrates a link between the multiple-rater kappa of J. Fleiss (1971) or other analogues and the generalizability (G) coefficient for a single facet design, and discusses the use and interpretation of G theory in the study of interrater agreement when data are measured on a nominal scale. (SLD)

Descriptors: Classification, Generalizability Theory, Interrater Reliability, Research Design

When Inter-Rater Reliability Is Obtained from Only Part of a Sample.

Fan, Xitao; Chen, Michael – 1999

It is erroneous to extend or generalize the inter-rater reliability coefficient estimated from only a (small) proportion of the sample to the rest of the sample data where only one rater is used for scoring, although such generalization is often made implicitly in practice. It is shown that if inter-rater reliability estimate from part of a sample…

Descriptors: Estimation (Mathematics), Generalizability Theory, Interrater Reliability, Sample Size

The Generalizability of Systematic Direct Observations across Time and Setting: A Preliminary Investigation of the Psychometrics of Behavioral Observation. General Articles

Peer reviewed

Hintze, John M.; Matthews, William J. – School Psychology Review, 2004

This study examined the generalizability of systematic direct observation across setting and time. Participants included 14 students from an intact inclusionary fifth grade classroom. On-task/off-task behavior was directly observed using momentary time-sampling recording, twice a day, for 10 school days. Using Generalizability (G) theory, results…

Descriptors: Grade 5, Psychometrics, Classroom Observation Techniques, Interrater Reliability

How Many Raters Should Be Used for Establishing Cutoff Scores with the Angoff Method? A Generalizability Theory Study.

Peer reviewed

Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999

Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)

Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability

Influences on and Limitations of Classical Test Theory Reliability Estimates.

Arnold, Margery E. – 1996

It is incorrect to say "the test is reliable" because reliability is a function not only of the test itself, but of many factors. The present paper explains how different factors affect classical reliability estimates such as test-retest, interrater, internal consistency, and equivalent forms coefficients. Furthermore, the limits of classical test…

Descriptors: Estimation (Mathematics), Generalizability Theory, Heuristics, Interrater Reliability

A Generalizability Analysis of Subjective Personality Assessments in the Stumptail Macaque and the Zebra Finch.

Peer reviewed

Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1995

Two longitudinal studies involving 29 raters concerning the construct validity, temporal stability, and interrater reliability of the latent common factors underlying subjective assessments by human raters of personality traits in the stumptail macaque and the zebra finch illustrate the use of generalizability analysis to test prespecified…

Descriptors: Animal Behavior, Construct Validity, Evaluation Methods, Generalizability Theory

Reliability, Stability and Generalizability.

Peer reviewed

Marsh, Herbert W. – International Journal of Educational Research, 1987

The reliability, long-term stability, and generalizability of student ratings of teacher effectiveness are discussed. The Students' Evaluation of Educational Quality (SEEQ) instrument is examined from these perspectives. The multidimensionality of student response to such evaluation instruments must be recognized. (SLD)

Descriptors: College Students, Generalizability Theory, Interrater Reliability, Postsecondary Education
