The Attending Round Observation System: A Procedure for Describing Teaching During Attending Rounds.
Weinholtz, Donn; And Others – Evaluation and the Health Professions, 1986 (peer reviewed)
Two separate reliability studies were conducted on an observational instrument derived from previous qualitative research and designed for collecting data on teaching behaviors during attending rounds. The reliability estimates from both studies were quite high, indicating that the instrument shows promise for use in both research and evaluation…
Descriptors: Clinical Teaching (Health Professions), Graduate Medical Education, Higher Education, Interrater Reliability
Epstein, Michael H.; Nieminen, Gayla S. – School Psychology Review, 1983 (peer reviewed)
Teachers and classroom aides of learning disabled students completed the Conners Abbreviated Teacher Rating Scale (CATRS) on two separate occasions. The study investigated the inter-rater and intra-rater reliability of this instrument. CATRS appeared to have sufficient reliability to recommend its continued frequent use. (Author/DWH)
Descriptors: Behavior Rating Scales, Elementary Education, Elementary School Students, Hyperactivity
Iramaneerat, Cherdsak; Myford, Carol M. – Online Submission, 2006
A multi-faceted Rasch measurement (MFRM) approach was used to analyze clinical performance ratings of 24 first-year residents in one surgery residency program in Thailand to investigate three types of rater effects: leniency, rater inconsistency, and restriction of range. Faculty from 14 surgical services rated the clinical performance of…
Descriptors: Foreign Countries, Measures (Individuals), Job Performance, Interrater Reliability
Iramaneerat, Cherdsak; Yudkowsky, Rachel – Online Submission, 2006
A multi-faceted Rasch measurement (MFRM) model was used to analyze a clinical skills assessment of 173 fourth-year medical students in a Midwestern medical school to investigate four types of rater errors: leniency, inconsistency, halo, and restriction of range. Each student performed six clinical tasks with six standardized patients (SPs), who…
Descriptors: Patients, Physical Examinations, Medical Students, Clinical Experience
Halpin, Gerald; And Others – Educational and Psychological Measurement, 1983 (peer reviewed)
Although arbitrary, whenever multiple judgmental standard-setting procedures are utilized by different groups concurrently, stability across raters can be achieved and decisions can be made in a relatively judicious manner. Greater stability across methods (Ebel, Nedelsky, Angoff) may be effected by slightly modifying the Ebel approach. (Author/PN)
Descriptors: Admission Criteria, College Entrance Examinations, Cutting Scores, Higher Education
Johnson, David W.; And Others – Review of Educational Research, 1983 (peer reviewed)
A theoretical model is presented with a review of supportive literature to establish the conditions under which desegregation and mainstreaming will result in constructive or destructive outcomes. Meta-analysis procedures examine all the available research relevant to the model, and point toward practical intergroup procedures based on the…
Descriptors: Desegregation Effects, Disabilities, Elementary Secondary Education, Ethnic Relations
Orsmond, Paul; Merry, Stephen; Reiling, Kevin – Assessment & Evaluation in Higher Education, 1997 (peer reviewed)
Reports on a study of a student self-assessment method in college biology, comparing students' self-evaluation, students' peer evaluation, and the teacher's evaluation criteria. Results illustrate potential problems in making assumptions about student ability to self-evaluate but also support previous findings about the instructional usefulness of…
Descriptors: Biology, College Faculty, College Instruction, College Students
Orsmond, Paul; And Others – Assessment & Evaluation in Higher Education, 1996 (peer reviewed)
A study comparing peer and teacher evaluations of British university biology students' (n=39) performance found such comparison misleading as a guide to the validity of peer assessment. When individual criteria were analyzed, agreement of peers and teacher ranged from 31-62%, with specific areas of the criteria prone to over- and undervaluation.…
Descriptors: Bias, Biology, College Students, Comparative Analysis
Grant, Leslie – Language Testing, 1997 (peer reviewed)
Describes current procedures used for testing bilingual teachers in the United States and focuses on one means of assessment used in Arizona. Examinee questionnaire responses, teacher questionnaire responses and test section analysis all contributed evidence for validity. (33 references) (Author/CK)
Descriptors: Bilingualism, Criterion Referenced Tests, Interrater Reliability, Language Teachers
Tomada, Giovanna; Schneider, Barry H. – Developmental Psychology, 1997 (peer reviewed)
Replicated and extended American research on overt and relational aggression with Italian children. Found that peer and teacher nominations for aggression and prosocial behavior were highly stable, although with very poor concordance between them. Peer nominations for overt and relational aggression were linked to peer rejection. Boys' scores were…
Descriptors: Aggression, Bullying, Child Behavior, Children
Pugh, Malcolm; Lock, Roger – Research in Science and Technological Education, 1989 (peer reviewed)
The development of a framework for analyzing pupil talk is described, and the reliability of scoring transcribed conversations using the framework is discussed. Definitions and examples of the terms used in the framework are appended. (Author/YP)
Descriptors: Biology, Foreign Countries, Group Discussion, Interrater Reliability
Reid, William J.; And Others – Journal of Social Work Education, 1996 (peer reviewed)
In a study with 13 social work and counseling interns, field supervisors' ratings of students' field performance were compared to an independent judge's content analysis of performance. Results revealed significant correlations between the evaluations, providing evidence of validity of the supervisors' assessments. Validity may have been enhanced…
Descriptors: Evaluation Methods, Field Experience Programs, Higher Education, Interrater Reliability
Levine, Phyllis; Edgar, Eugene – Exceptional Children, 1994 (peer reviewed)
High school graduates in regular (n=280) and special education (n=223) and their parents were interviewed. Parent-student agreement percentages were high for the variables of attending postsecondary school, employment status, type of residence, marital status, and number of children. Low agreement rates were obtained for salary level, hours…
Descriptors: Disabilities, Employment, Followup Studies, Graduate Surveys
Sigafoos, Jeff; Pennell, Donna – Education and Training in Mental Retardation and Developmental Disabilities, 1995 (peer reviewed)
Comparison using paired t-tests of parent and teacher ratings for 16 preschool children on the Receptive-Expressive Emergent Language Scale found no significant differences between parent and teacher ratings of expressive language, but a significant difference on the receptive language subscale. However, interrater reliability was relatively low…
Descriptors: Developmental Disabilities, Expressive Language, Interrater Reliability, Language Skills
Thompson, Irene – Foreign Language Annals, 1995 (peer reviewed)
Considers the interrater reliability of certified testers in five European languages, the relationship between interviewer-assigned ratings and second ratings based on audio replay, interrater reliability as a function of proficiency level, effect of different languages on interrater agreement, and interrater disagreements with regard to…
Descriptors: Audiotape Recordings, English (Second Language), Evaluators, French


