Peer reviewed. Tate, Richard L. – Journal of Educational Measurement, 1999
Suggests that a modification of traditional linking is necessary when tests consist of constructed response items judged by raters and a possibility of year-to-year variation in rating discrimination and severity exists. Illustrates this situation with an artificial example. (SLD)
Descriptors: Equated Scores, Interrater Reliability, Item Response Theory, Multiple Choice Tests
Peer reviewed. Brutus, Stephane; Fleenor, John W.; London, Manuel – Journal of Management Development, 1998
Self, subordinate, peer, and supervisor ratings of 1,080 managers in education, military, government, manufacturing, finance, and health were analyzed for leniency, interrater agreement, and effectiveness. In the private sector, more poor performing managers tended to overestimate their performance. Interrater agreement was lowest in government…
Descriptors: Comparative Analysis, Feedback, Interrater Reliability, Job Performance
Peer reviewed. Van Noord, Robert G.; Prevatt, Frances F. – Journal of School Psychology, 2002
Evaluates the effects of rater reliability of common IQ and achievement tests on subsequent learning disorder eligibility determinations, particularly with respect to difficulty level of individual subtests and expertise of the scorer. The study corroborates previous findings of strong interrater reliability on most subtests of common IQ and…
Descriptors: Achievement Tests, Disability Identification, Intelligence Tests, Interrater Reliability
Peer reviewed. Cordes, Anne K. – Journal of Speech, Language, and Hearing Research, 2000
In this study, 30 judges identified disfluency types they perceived in audiovisually recorded speech stimuli, first individually and then with a partner. Although intrapair and interpair agreement was higher in the partner than the individual condition, agreement for occurrences still averaged below 50 percent. Findings suggest caution in use of…
Descriptors: Adults, Evaluation Methods, Interrater Reliability, Speech Acts
Falk, Ruma; Lann, Avital – Teaching Statistics: An International Journal for Teachers, 2006
A coefficient of unfairness in the allocation of goods to people can be extended to measuring consensus among judges. The notion of relative variability underlies the formation of these measures.
Descriptors: Judges, Measures (Individuals), Interrater Reliability, Measurement Techniques
Moon, Tonya R.; Brighton, Catherine M.; Callahan, Carolyn M.; Robinson, Ann – Journal of Secondary Gifted Education, 2005
This article discusses the rationale for, and explicates the process used in, developing differentiated authentic assessments for middle school classrooms (many of which contain gifted students) that are aligned with state academic standards. The assessments were developed based on learner-centered psychological principles and revised based on a…
Descriptors: Instruction, Classrooms, Academic Standards, Interrater Reliability
A Measure of Agreement for Interval or Nominal Multivariate Observations by Different Sets of Judges
Janson, Harald; Olsson, Ulf – Educational and Psychological Measurement, 2004
This article addresses the problem of accounting for overall multivariate chance-corrected interobserver agreement when targets have been rated by different sets of judges (not necessarily equal in number). The proposed approach builds on Janson and Olsson's multivariate generalization of Cohen's kappa but incorporates weighting for number of judges…
Descriptors: Interrater Reliability, Multivariate Analysis, Evaluation Methods, Measurement Techniques
Taylor, Steven; McKay, Dean; Abramowitz, Jonathan S. – Psychological Review, 2005
This paper comments on the response offered by Szechtman and Woody to Taylor et al.'s initial comments on Szechtman and Woody's original article. Taylor et al. highlight one problem with their model that Woody and Szechtman seem to think is unimportant: the treatment relevance of their model. The analogy of aspirin and colds was used, suggesting…
Descriptors: Motivation, Item Analysis, Reader Response, Criticism
Sulzen, James; Young, Michael F. – Online Submission, 2007
This study describes a rubric supporting fast and reliable assessment of preservice teacher electronic portfolios. The assessment calls for raters to quickly scan a portfolio to gain an overall impression, then dichotomously score a large number of indicators (e.g., educational philosophy, educational technology use, imaginative use of…
Descriptors: Portfolios (Background Materials), Educational Technology, Preservice Teachers, Interrater Reliability
Lam, Kristen S. L.; Aman, Michael G. – Journal of Autism and Developmental Disorders, 2007
A key feature of autism is restricted repetitive behavior (RRB). Despite the significance of RRBs, little is known about their phenomenology, assessment, and treatment. The Repetitive Behavior Scale-Revised (RBS-R) is a recently developed questionnaire that captures the breadth of RRB in autism. To validate the RBS-R in an independent sample, we…
Descriptors: Caregivers, Phenomenology, Interrater Reliability, Factor Analysis
Loughry, Misty L.; Ohland, Matthew W.; Moore, D. DeWayne – Educational and Psychological Measurement, 2007
This article describes the development of the Comprehensive Assessment of Team Member Effectiveness. The authors used the teamwork literature to create potential items, which they tested using two surveys of college students (Ns = 2,777 and 1,157). The authors used exploratory factor analysis and confirmatory factor analysis to help them select…
Descriptors: Factor Analysis, Factor Structure, College Students, School Surveys
Laforest, Sophie; Goldin, Benita; Nour, Kareen; Roy, Marie-Andree; Payette, Helene – Canadian Journal on Aging, 2007
Nutrition screening and early intervention in home-bound older adults are key to preventing unfavourable health outcomes and functional decline. This pilot study's objectives were (a) to test the reliability of the Elderly Nutrition Screening Tool (ENS©) when administered by dietician-trained and supervised nutrition volunteers, and (b) to…
Descriptors: Early Intervention, Nutrition, Interrater Reliability, Older Adults
Martinez, Jose Felipe; Goldschmidt, Pete; Niemi, David; Baker, Eva L.; Sylvester, Roxanne M. – Educational Assessment, 2007
We conducted generalizability studies to examine the extent to which ratings of language arts performance assignments, administered in a large, diverse, urban district to students in second through ninth grades, result in reliable and precise estimates of true student performance. The results highlight three important points when considering the…
Descriptors: Assignments, Language Arts, Academic Achievement, Urban Areas
Griffith, Annette K.; Trout, Alexandra L.; Hagaman, Jessica L.; Harper, John – Behavioral Disorders, 2008
This review examines interventions intended to improve the literacy functioning of adolescent students with emotional and/or behavior disorders. Seventeen studies met inclusion criteria and included a variety of interventions designed to affect a variety of literacy areas, including spelling, writing, and reading fluency. Findings from these…
Descriptors: Intervention, Reading Fluency, Behavior Disorders, Emotional Disturbances
Yick, Alice G.; Oomen-Early, Jody – Journal of Interpersonal Violence, 2008
Until recently, research studies have implied that domestic violence does not affect Asian American and immigrant communities, or even Asians abroad, because ethnicity or culture has not been addressed. In this content analysis, the authors examined trends in publications in leading scholarly journals on violence relating to Asian women and…
Descriptors: Family Violence, Asian Culture, Interrater Reliability, Family Structure

