Publication Date

| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 60 |
| Since 2022 (last 5 years) | 286 |
| Since 2017 (last 10 years) | 782 |
| Since 2007 (last 20 years) | 2044 |
Descriptor

| Descriptor | Records |
| --- | --- |
| Interrater Reliability | 3126 |
| Foreign Countries | 655 |
| Test Reliability | 504 |
| Evaluation Methods | 503 |
| Test Validity | 411 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience

| Audience | Records |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location

| Location | Records |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating

| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Kreiman, Jody; And Others – Journal of Speech and Hearing Research, 1992
Sixteen listeners (10 expert, 6 naive) judged the dissimilarity of pairs of voices drawn from pathological and normal populations. Only parameters that showed substantial variability were perceptually salient across listeners. Results suggest that traditional means of assessing listener reliability in voice perception tasks may not be appropriate.…
Descriptors: Evaluation Methods, Individual Differences, Interrater Reliability, Perception
Peer reviewed: Tindley, Howard E. A.; And Others – Career Development Quarterly, 1994
Describes investigation employing within-counselor design. Investigators analyzed audio recordings of career counseling interviews with clients who held either relatively negative expectations or relatively positive expectations regarding counseling. Clients who held relatively positive expectations were rated significantly higher on global…
Descriptors: Career Counseling, Expectation, Higher Education, Interrater Reliability
Peer reviewed: Ingham, Roger J.; And Others – Journal of Speech and Hearing Research, 1993
Two experiments investigating interval-by-interval interjudge and intrajudge agreement for stuttered and nonstuttered speech intervals found that training of judges could improve reliability levels; judges with relatively high intrajudge agreement also showed relatively higher interjudge agreement; and interval-by-interval interjudge agreement was…
Descriptors: Evaluation Methods, Interrater Reliability, Performance Factors, Speech Evaluation
Peer reviewed: Cox, Maureen V.; Perara, Julian – Educational Psychology: An International Journal of Experimental Educational Psychology, 1998
Devises a nine-point scale for scoring drawings of a cube. Provides detailed criteria and examples for each category. Shows that interrater reliability of the scale is high, and scores trace a linear trend through a sample age-range. Suggests that the scale is suitable for use as a diagnostic or assessment tool. (DSK)
Descriptors: Art Education, Evaluation Methods, Foreign Countries, Geometric Constructions
Peer reviewed: Dyson, Maree; Allen, Felicity; Duckett, Stephen – Evaluation and Program Planning, 2000
Reports on the interrater reliability of the Educational Needs Questionnaire (Victoria Department of Education, Australia), which was applied to 70 school-age children by their parents and 2 therapists. Results indicate that six of the subscales are reliable when evaluated by therapists and parents, but three subscales did not achieve the…
Descriptors: Children, Disabilities, Foreign Countries, Interrater Reliability
Peer reviewed: MacMillan, Peter D. – Journal of Experimental Education, 2000
Compared classical test theory (CTT), generalizability theory (GT), and multifaceted Rasch model (MFRM) approaches to detecting and correcting for rater variability using responses of 4,930 high school students graded by 3 raters on 9 scales. The MFRM approach identified far more raters as different than did the CTT analysis. GT and Rasch…
Descriptors: Generalizability Theory, High School Students, High Schools, Interrater Reliability
Peer reviewed: Hellawell, D. J.; Signorini, D. F. – International Journal of Rehabilitation Research, 1997
Describes pilot studies of the Edinburgh Extended Glasgow Outcome Scale (EEGOS), designed to retain the advantages of the GOS (a measure commonly used in head injury research) but to allow comparison of recovery patterns in behavioral, cognitive, and physical function. Studies show that the interrater reliability of the EEGOS is comparable to that…
Descriptors: Head Injuries, Interrater Reliability, Neurological Impairments, Outcomes of Treatment
Peer reviewed: Penny, Jim; Johnson, Robert L.; Gordon, Belita – Journal of Experimental Education, 2000
Used an analytic rubric to score 120 writing samples from Georgia's 11th grade writing assessment. Raters augmented scores by adding a "+" or "-" to the score. Results indicate that this method of augmentation tends to improve most indices of interrater reliability, although the percentage of exact and adjacent agreement…
Descriptors: High School Students, High Schools, Interrater Reliability, Scoring Rubrics
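The score augmentation described in the Penny, Johnson, and Gordon abstract turns a discrete rubric score like "3+" or "4-" into a finer-grained value before reliability indices are computed. A minimal sketch of one common convention (treating "+" as +1/3 and "-" as -1/3; the function name and exact offsets are illustrative, not taken from the paper):

```python
def augmented_to_numeric(score: str) -> float:
    """Convert an augmented rubric score such as '3', '3+', or '4-'
    to a numeric value, mapping '+' to +1/3 and '-' to -1/3."""
    base = int(score.rstrip("+-"))  # the underlying rubric level
    if score.endswith("+"):
        return base + 1 / 3
    if score.endswith("-"):
        return base - 1 / 3
    return float(base)
```

Interrater statistics (correlations, agreement indices) can then be computed on the augmented values just as on the original integer scores.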
Peer reviewed: Canivez, Gary L.; Watkins, Marley W.; Schaefer, Barbara A. – Psychology in the Schools, 2002
Investigation of interrater agreement for the Adjustment Scales for Children and Adolescents (ASCA) discriminant classifications is reported. Two teaching professionals provided independent ratings of the same child using the ASCA. A total of 119 students ranging in age from 7 to 18 years were independently rated. Results indicated significant and…
Descriptors: Adolescents, Children, Elementary Secondary Education, Interrater Reliability
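Agreement studies of categorical classifications, like the Canivez, Watkins, and Schaefer investigation above, typically report a chance-corrected statistic rather than raw percent agreement. A minimal sketch of Cohen's kappa for two raters (a standard such statistic; this implementation is not drawn from any of the listed papers):

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters'
    categorical classifications of the same cases."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is no better than chance, which is why studies like these can report "significant" agreement even when raw percent agreement looks modest.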
Peer reviewed: Kaufman, James C.; Gentile, Claudia A.; Baer, John – Gifted Child Quarterly, 2005
Little research has been conducted on how gifted novices compare to experts in their judgments of creative writing. If novices and experts assign similar ratings, it could be argued that gifted novices are able to offer their peers feedback of a similar quality to that provided by experts. Such a finding would support the use of collaborative…
Descriptors: Psychologists, Literary Genres, Interrater Reliability, Feedback
Raghavan, R.; Marshall, M.; Lockwood, A.; Duggan, L. – Journal of Intellectual Disability Research, 2004
People with learning disability (LD) experience a range of mental health problems. They are a complex population, whose needs are not well understood. This study focuses on the development of a systematic process of needs assessment for this population. The Cardinal Needs Schedule used in general psychiatry was adapted for people with learning…
Descriptors: Psychiatry, Needs Assessment, Mental Disorders, Interrater Reliability
van der Schaaf, Marieke; Stokking, Karel; Verloop, Nico – Studies in Educational Evaluation, 2005
Portfolios are frequently used to assess teachers' competences. In portfolio assessment, the issue of rater reliability is a notorious problem. To improve the quality of assessments insight into raters' judgment processes is crucial. Using a mixed quantitative and qualitative approach we studied cognitive processes underlying raters' judgments and…
Descriptors: Portfolios (Background Materials), Systems Approach, Cognitive Processes, Portfolio Assessment
Jobes, David A.; Nelson, Kathryn N.; Peterson, Erin M.; Pentiuc, Daniel; Downing, Vanessa; Francini, Kristen; Kiernan, Amy – Suicide and Life-Threatening Behavior, 2004
Given the incidence and seriousness of suicidality in clinical practice, the need for new and better ways to assess suicide risk is clear. While there are many published assessment instruments in the literature, survey data suggest that these measures are not widely used. One possible explanation is that current quantitatively developed assessment…
Descriptors: Patients, Research Methodology, Interrater Reliability, Suicide
Livingston, Samuel A. – Journal of Educational and Behavioral Statistics, 2004
A performance assessment consisting of 10 separate exercises was scored with a randomized scoring procedure. All responses to each exercise were rated once; in addition, a randomly selected subset of the responses to each exercise received an independent second rating. Each second rating was averaged with the corresponding first rating before the…
Descriptors: Scoring, Performance Based Assessment, Interrater Reliability, Computation
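The Livingston abstract describes a randomized double-scoring design: every response is rated once, a random subset receives an independent second rating, and the two ratings are averaged before scores are reported. A minimal sketch of that procedure under assumed details (the function name, the sampling fraction, and the `rate` callback are illustrative, not from the paper):

```python
import random


def randomized_double_score(responses, rate, second_rating_fraction=0.2, seed=0):
    """Rate every response once; give a randomly selected subset an
    independent second rating and average the two before reporting."""
    rng = random.Random(seed)
    n_second = int(len(responses) * second_rating_fraction)
    second_ids = set(rng.sample(range(len(responses)), n_second))
    scores = []
    for i, resp in enumerate(responses):
        first = rate(resp)
        if i in second_ids:
            second = rate(resp)  # independent second rating of the same response
            scores.append((first + second) / 2)
        else:
            scores.append(first)
    return scores
```

The doubly rated subset is what makes an interrater reliability estimate possible without paying for two ratings of every response.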
Doabler, Christian; Smolkowski, Keith; Fien, Hank; Kosty, Derek B.; Cary, Mari Strand – Society for Research on Educational Effectiveness, 2010
In this paper, the authors report research focused directly on the validation of the Coding of Academic Teacher-Student interactions (CATS) direct observation instrument. They use classroom information gathered by the CATS instrument to better understand the potential mediating variables hypothesized to influence student achievement. Their study's…
Descriptors: Feedback (Response), Curriculum Based Assessment, Observation, Construct Validity
