| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
| Descriptor | Records |
| --- | --- |
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
| Audience | Records |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
| Location | Records |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
| What Works Clearinghouse Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed
Goodwin, Laura D.; Goodwin, William L. – Evaluation and the Health Professions, 1984
The views of prominent qualitative methodologists on the appropriateness of validity and reliability estimation for the measurement strategies employed in qualitative evaluations are summarized. A case is made for the relevance of validity and reliability estimation. Definitions of validity and reliability for qualitative measurement are presented…
Descriptors: Evaluation Methods, Experimenter Characteristics, Interrater Reliability, Reliability
Peer reviewed
Cornelius, Edwin T.; And Others – Personnel Psychology, 1984
Questions the observed correlation between job experts and naive raters using the Position Analysis Questionnaire (PAQ), and conducts a replication of the Smith and Hakel study (1979) with college students (N=39). Concludes that PAQ ratings from job experts and college students are not equivalent and therefore are not interchangeable. (LLL)
Descriptors: College Students, Higher Education, Interrater Reliability, Job Analysis
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
De Champlain, Andre F.; Gessaroli, Marc E.; Floreck, Lisa M. – 2000
The purpose of this study was to estimate the extent to which recording variability among standardized patients (SPs) has an impact on classification consistency with data sets simulated to reflect performances on a large-scale clinical skills examination. SPs are laypersons trained to portray patients in clinical encounters (cases) and to record…
Descriptors: Classification, Interrater Reliability, Licensing Examinations (Professions), Medical Education
Peer reviewed
Sandberg, Jorgen – Higher Education Research and Development, 1997
Argues that interrater reliability, traditionally used in phenomenographic research, is unreliable for establishing the reliability of research results; it does not take into account the researcher's procedures for achieving fidelity to the individuals' conceptions investigated, and use of interrater reliability based on objectivist epistemology…
Descriptors: Educational Research, Epistemology, Interrater Reliability, Qualitative Research
Peer reviewed
Lewis, Chad T.; Stevens, Cynthia Kay – Public Personnel Management, 1990
A total of 204 business students organized in committees evaluated jobs for accountability, knowledge and skills, and mental demands. The same position was rated more highly when held by a male rather than a female, regardless of whether the committee was predominantly male or female. The importance of anonymity of job holders when conducting job…
Descriptors: College Students, Interrater Reliability, Job Analysis, Sex Bias
Peer reviewed
Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
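The idea of a ceiling on Cohen's kappa can be sketched in code. The functions below use the standard kappa algebra (observed agreement minus chance agreement, scaled by one minus chance agreement) and the usual maximum-agreement bound given fixed marginals; they are a generic illustration, not a reproduction of the specific calculations tabulated by Umesh et al.:

```python
def cohens_kappa(table):
    """Cohen's (1960) kappa for a square agreement matrix of raw counts.

    table[i][j] = number of items rater A put in category i
    and rater B put in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    po = sum(table[i][i] for i in range(k)) / n            # observed agreement
    rows = [sum(row) / n for row in table]                 # rater A marginals
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater B marginals
    pe = sum(r * c for r, c in zip(rows, cols))            # chance agreement
    return (po - pe) / (1 - pe)


def kappa_max(table):
    """Largest kappa attainable without changing either rater's marginals.

    Maximum possible observed agreement is the sum over categories of
    min(row marginal, column marginal); kappa_max substitutes that for
    the observed agreement in the kappa formula.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    rows = [sum(row) / n for row in table]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    po_max = sum(min(r, c) for r, c in zip(rows, cols))
    pe = sum(r * c for r, c in zip(rows, cols))
    return (po_max - pe) / (1 - pe)
```

For example, for the 2x2 table `[[20, 5], [10, 15]]` the observed kappa is 0.4 while the maximum attainable with those marginals is 0.8, so the raters reached only half of the agreement their marginal distributions permit.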
Peer reviewed
Cordes, Anne K.; Ingham, Roger J. – Journal of Speech and Hearing Research, 1994
This paper reviews the prominent concepts of the stuttering event and concerns about the reliability of stuttering event measurements, specifically interjudge agreement. Recent attempts to resolve the stuttering measurement problem are reviewed, and the implications of developing an improved measurement system are discussed. (Author/JDD)
Descriptors: Data Collection, Interrater Reliability, Measurement Techniques, Observation
Peer reviewed
Marcoulides, George A.; Simkin, Mark G. – Journal of Education for Business, 1995
Each paper written by 60 sophomores in computer classes received 3 peer evaluations using a structured evaluation process. Overall, students were able to grade efficiently and consistently in terms of overall score and selected criteria (subject matter, content, and mechanics). (SK)
Descriptors: Higher Education, Interrater Reliability, Peer Evaluation, Undergraduate Students
Peer reviewed
Driessen, Marie-Jose; And Others – Occupational Therapy Journal of Research, 1995
Two occupational therapists in an interrater test and 9 in an intrarater test used a form based on the International Classification of Impairments, Disabilities, and Handicaps to evaluate 50 patients in a psychiatric hospital and 50 in a rehabilitation center. Based on percentage of agreement and Cohen's kappa, the reliability of the diagnoses was…
Descriptors: Clinical Diagnosis, Disabilities, Interrater Reliability, Occupational Therapy
Peer reviewed
Kvalseth, Tarald O. – Educational and Psychological Measurement, 1991
An asymmetric version of J. Cohen's kappa statistic is presented as an appropriate measure for the agreement between two observers classifying items into nominal categories, when one observer represents the "standard." A numerical example with three categories is provided. (SLD)
Descriptors: Classification, Equations (Mathematics), Interrater Reliability, Mathematical Models
Peer reviewed
Hubbard, Carol P. – Journal of Communication Disorders, 1998
This study examined interjudge agreement levels for five adult listeners assessing either overt stuttering or disfluency types in the spontaneous speech of eight young children. Results showed that the interjudge reliability for judgments based on a disfluency taxonomy was not significantly different from that based on stuttering. The importance…
Descriptors: Interrater Reliability, Phonology, Speech Evaluation, Speech Impairments
Peer reviewed
Frederiksen, John R.; Sipusic, Mike; Sherin, Miriam; Wolfe, Edward W. – Educational Assessment, 1998
Developed a video portfolio technique of teacher assessment and evaluated the technique through studies of six teachers and their raters. Results show that teachers are consistent in observing teaching functions and using their observations to evaluate teaching. (SLD)
Descriptors: Evaluation Methods, Interrater Reliability, Portfolio Assessment, Teacher Evaluation
Peer reviewed
Berr, Seth A.; Church, Allan H.; Waclawski, Janine – Human Resource Development Quarterly, 2000
Behavior measures and the Myers-Briggs Type Indicator were completed by 343 senior managers; 3,158 of their peers, supervisees, and supervisors rated the managers' behavior. A modest correlation appeared between personality type and manager behavior. Differences related to raters' perceptions were found. (SK)
Descriptors: Administrator Behavior, Feedback, Interprofessional Relationship, Interrater Reliability
Peer reviewed
Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R. – Journal of Autism and Developmental Disorders, 2000
This study examined the inter-rater reliability of clinician-assigned diagnosis of autism using or not using the criteria specified in the Diagnostic and Statistical Manual IV (DSM-IV). For experienced raters there was little difference in reliability in the two conditions. However, a clinically significant improvement in diagnostic reliability…
Descriptors: Autism, Clinical Diagnosis, Clinical Experience, Developmental Disabilities


