Showing all 15 results
Peer reviewed
PDF on ERIC (download full text)
Eser, Mehmet Taha; Aksu, Gökhan – International Journal of Curriculum and Instruction, 2022
The agreement between raters is examined within the scope of the concept of "inter-rater reliability". Although inter-rater agreement and inter-rater reliability are clearly defined concepts, there is little clear guidance on the conditions under which agreement-level and reliability-level methods are appropriate to…
Descriptors: Generalizability Theory, Interrater Reliability, Evaluation Methods, Test Theory
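The distinction this entry raises, agreement versus reliability between raters, can be made concrete with a small sketch (the data below are invented for illustration and are not from the article): two raters whose scores differ by a constant offset show perfect consistency (reliability) yet never agree exactly.

```python
# Invented toy ratings: rater_b always scores one point higher than rater_a.
rater_a = [3, 4, 5, 2, 4]
rater_b = [4, 5, 6, 3, 5]

# Exact agreement: proportion of identical scores.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Consistency: Pearson correlation, computed by hand (no dependencies).
n = len(rater_a)
ma = sum(rater_a) / n
mb = sum(rater_b) / n
cov = sum((a - ma) * (b - mb) for a, b in zip(rater_a, rater_b))
var_a = sum((a - ma) ** 2 for a in rater_a)
var_b = sum((b - mb) ** 2 for b in rater_b)
r = cov / (var_a * var_b) ** 0.5

print(agreement)      # 0.0 -- the raters never give the same score
print(round(r, 3))    # 1.0 -- yet their rankings agree perfectly
```

A method that reports only reliability would call these raters interchangeable; an agreement-level method would not, which is why the choice of method matters.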
Peer reviewed
Direct link
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Peer reviewed
Direct link
Praetorius, Anna-Katharina; Lenske, Gerlinde; Helmke, Andreas – Learning and Instruction, 2012
Despite considerable interest in the topic of instructional quality in research as well as practice, little is known about the quality of its assessment. Using generalizability analysis as well as content analysis, the present study investigates how reliably and validly instructional quality is measured by observer ratings. Twelve trained raters…
Descriptors: Student Teachers, Interrater Reliability, Content Analysis, Observation
Peer reviewed
Direct link
Rantanen, Pekka – Assessment & Evaluation in Higher Education, 2013
A multilevel analysis approach was used to analyse students' evaluation of teaching (SET). The low inter-rater reliability indicates that no solid conclusions about teaching can be drawn from a single round of feedback. To assess a teacher's general teaching effectiveness, one needs to evaluate four randomly chosen course implementations.…
Descriptors: Test Reliability, Feedback (Response), Generalizability Theory, Student Evaluation of Teacher Performance
Peer reviewed
PDF on ERIC (download full text)
Hathcoat, John D.; Penn, Jeremy D. – Research & Practice in Assessment, 2012
Critics of standardized testing have recommended replacing standardized tests with more authentic assessment measures, such as classroom assignments, projects, or portfolios rated by a panel of raters using common rubrics. Little research has examined the consistency of scores across multiple authentic assignments or the implications of this…
Descriptors: Generalizability Theory, Performance Based Assessment, Writing Across the Curriculum, Standardized Tests
Peer reviewed
Direct link
Henry, Beverly W.; Smith, Thomas J. – Journal of Nutrition Education and Behavior, 2010
Objective: To develop an instrument to assess client-centered counseling behaviors (skills) of student-counselors in a standardized patient (SP) exercise. Methods: Descriptive study of the accuracy and utility of a newly developed counseling evaluation instrument. Study participants included 11 female student-counselors at a Midwestern…
Descriptors: Feedback (Response), Generalizability Theory, Nutrition, Diseases
Peer reviewed
Direct link
Leung, Kai-Kuen; Wang, Wei-Dan; Chen, Yen-Yuan – Advances in Health Sciences Education, 2012
There is a lack of information on the use of multi-source evaluation to assess trainees' interpersonal and communication skills in Oriental settings. This study is conducted to assess the reliability and applicability of assessing the interpersonal and communication skills of family medicine residents by patients, peer residents, nurses, and…
Descriptors: Foreign Countries, Clinical Teaching (Health Professions), Communication Skills, Patients
Peer reviewed
Direct link
Martinez, Jose Felipe; Goldschmidt, Pete; Niemi, David; Baker, Eva L.; Sylvester, Roxanne M. – Educational Assessment, 2007
We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the…
Descriptors: Assignments, Language Arts, Academic Achievement, Urban Areas
Peer reviewed
Abedi, Jamal – Multivariate Behavioral Research, 1996
The Interrater/Test Reliability System (ITRS) is described. The ITRS is a comprehensive computer tool used to address questions of interrater reliability that computes several different indices of interrater reliability and the generalizability coefficient over raters and topics. The system is available in IBM compatible or Macintosh format. (SLD)
Descriptors: Computer Software, Computer Software Evaluation, Evaluation Methods, Evaluators
Peer reviewed
Hux, Karen; And Others – Journal of Communication Disorders, 1997
A study evaluated and compared four methods of assessing reliability on one discourse analysis procedure--a modified version of Damico's Clinical Discourse Analysis. The methods were Pearson product-moment correlations; interobserver agreement; Cohen's kappa; and generalizability coefficients. The strengths and weaknesses of the methods are…
Descriptors: Communication Disorders, Discourse Analysis, Evaluation Methods, Evaluation Problems
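Two of the four reliability methods this entry compares, interobserver agreement and Cohen's kappa, can be sketched on toy categorical codings (the data below are invented for illustration; the study applied these statistics to Clinical Discourse Analysis codings, not to these values):

```python
from collections import Counter

# Invented codes from two observers for eight utterances.
obs1 = ["err", "ok", "ok", "err", "ok", "ok", "ok", "err"]
obs2 = ["err", "ok", "err", "err", "ok", "ok", "ok", "ok"]
n = len(obs1)

# Observed agreement: proportion of matching codes.
p_o = sum(a == b for a, b in zip(obs1, obs2)) / n

# Expected chance agreement, from each observer's category marginals.
c1, c2 = Counter(obs1), Counter(obs2)
p_e = sum(c1[k] * c2[k] for k in set(obs1) | set(obs2)) / n**2

# Cohen's kappa corrects observed agreement for chance.
kappa = (p_o - p_e) / (1 - p_e)

print(p_o)              # 0.75
print(round(kappa, 3))  # 0.467
```

The gap between the two numbers is the point of the comparison: raw agreement is inflated by chance matches, which kappa discounts.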
Ben-Simon, Anat; Bennett, Randy Elliott – Journal of Technology, Learning, and Assessment, 2007
This study evaluated a "substantively driven" method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a "brute-empirical" approach in which variables are selected and weighted solely according to…
Descriptors: Writing Evaluation, Writing Tests, Scoring, Essays
Peer reviewed
Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1995
Two longitudinal studies involving 29 raters concerning the construct validity, temporal stability, and interrater reliability of the latent common factors underlying subjective assessments by human raters of personality traits in the stumptail macaque and the zebra finch illustrate the use of generalizability analysis to test prespecified…
Descriptors: Animal Behavior, Construct Validity, Evaluation Methods, Generalizability Theory
Crehan, Kevin D. – 1997
Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…
Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability
Peer reviewed
Direct link
Hafner, John C.; Hafner, Patti M. – International Journal of Science Education, 2003
Although the rubric has emerged as one of the most popular assessment tools in progressive educational programs, there is an unfortunate dearth of information in the literature quantifying the actual effectiveness of the rubric as an assessment tool "in the hands of the students." This study focuses on the validity and reliability of the rubric as…
Descriptors: Interrater Reliability, Generalizability Theory, Biology, Scoring Rubrics
Micceri, Theodore – 1984
This paper investigates the reliability of the Florida Performance Measurement Systems' Summative Observation instrument. Developed for the Florida Beginning Teacher Evaluation Program, it provides behavioral ratings for teachers in a classroom setting. Data came from ratings of videotapes of nine teachers conducting actual lessons by nine teams…
Descriptors: Analysis of Variance, Classroom Observation Techniques, Elementary Secondary Education, Evaluation Methods