Griffith, Annette K.; Trout, Alexandra L.; Hagaman, Jessica L.; Harper, John – Behavioral Disorders, 2008
This review examines interventions intended to improve the literacy functioning of adolescent students with emotional and/or behavior disorders. Seventeen studies met inclusion criteria and included a variety of interventions designed to affect a variety of literacy areas, including spelling, writing, and reading fluency. Findings from these…
Descriptors: Intervention, Reading Fluency, Behavior Disorders, Emotional Disturbances
Yick, Alice G.; Oomen-Early, Jody – Journal of Interpersonal Violence, 2008
Until recently, research studies have implied that domestic violence does not affect Asian American and immigrant communities, or even Asians abroad, because ethnicity or culture has not been addressed. In this content analysis, the authors examined trends in publications in leading scholarly journals on violence relating to Asian women and…
Descriptors: Family Violence, Asian Culture, Interrater Reliability, Family Structure
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessments shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Stoddard, Sarah A.; Kubik, Martha Y.; Skay, Carol – Journal of School Nursing, 2008
The Institute of Medicine recommends school-based body mass index (BMI) screening as an obesity prevention strategy. While school nurses have provided height/weight screening for years, little has been published describing measurement reliability or process. This study evaluated the reliability of height/weight measures collected by school nurses…
Descriptors: Obesity, Body Composition, School Nurses, Interrater Reliability
Peer reviewed: Wyatt, W. Joseph; And Others – Psychology in the Schools, 1985
Compared traditional percentage and correlational methods of estimating reliability of duration recording to reliability obtained with event-by-event examination of observers' records in which the actual percentage of time that the observers were in agreement was calculated. Traditional percentage reliability scores were found to be significantly…
Descriptors: College Students, Correlation, Higher Education, Interrater Reliability
Peer reviewed: Harrington, Robert G.; And Others – Psychology in the Schools, 1985
Evaluated interscorer reliability of the Spatial Memory subtest, which appears on the Simultaneous Processing scale of the Kaufman Assessment Battery for Children. Responses from 19 gifted children were scored by two independent examiners. Results showed this subtest may be prone to scoring errors because no permanent record of responses exists.…
Descriptors: Elementary Education, Gifted, Interrater Reliability, Preadolescents
Peer reviewed: Weider-Hatfield, Deborah; Hatfield, John D. – Communication Quarterly, 1984
Evaluates approaches to measuring reliability in interaction analysis by (1) presenting criteria for a sound reliability estimate, (2) evaluating currently used tests against these criteria, and (3) discussing application of appropriate tests to interaction data. (PD)
Descriptors: Communication Research, Evaluation Criteria, Interaction Process Analysis, Interrater Reliability
Peer reviewed: Orwin, Robert G.; Cordray, David S. – Psychological Bulletin, 1985
Identifies three sources of reporting deficiency for meta-analytic results: quality (adequacy) of publicizing, quality of macrolevel reporting, and quality of microlevel reporting. Reanalysis of 25 reports from the Smith, Glass and Miller (1980) psychotherapy meta-analysis established two sources of misinformation, interrater reliabilities and…
Descriptors: Confidence Testing, Interrater Reliability, Meta Analysis, Psychotherapy
Miller-Whitehead, Marie – 2001
A hypothetical case study provides examples of the inter-rater reliability issues involved in complex performance assessment, focusing on the Baldrige model. A hypothetical team of five evaluators was asked to rate a Baldrige model performance assessment along the seven defined criteria or performance dimensions that comprise the Baldrige model…
Descriptors: Case Studies, Criteria, Evaluators, Interrater Reliability
Fan, Xitao; Chen, Michael – 1999
It is erroneous to extend or generalize the inter-rater reliability coefficient estimated from only a (small) proportion of the sample to the rest of the sample data where only one rater is used for scoring, although such generalization is often made implicitly in practice. It is shown that if inter-rater reliability estimate from part of a sample…
Descriptors: Estimation (Mathematics), Generalizability Theory, Interrater Reliability, Sample Size
Michaelides, Michalis P.; Haertel, Edward H. – Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2004
There is variability in the estimation of an equating transformation because common-item parameters are obtained from responses of samples of examinees. The most commonly used standard error of equating quantifies this source of sampling error, which decreases as the sample size of examinees used to derive the transformation increases. In a…
Descriptors: Test Items, Testing, Error Patterns, Interrater Reliability
Peer reviewed: Kennison, Monica Metrick; Misselwitz, Shirley – Nursing Education Perspectives, 2002
Samples from 17 reflective journals of nursing students were evaluated by 6 faculty. Results indicate a lack of consistency in grading reflective writing, lack of consensus regarding evaluation, and differences among faculty regarding their view of such exercises. (Contains 26 references.) (JOW)
Descriptors: Grading, Higher Education, Interrater Reliability, Nursing Education
Peer reviewed: Maurer, Steven D.; Fay, Charles – Personnel Psychology, 1988
Examined degree to which agreement in interviewer ratings may be influenced by training, use of structured conventional interviews, or situational interviews. Results from 42 managers experienced as interviewers revealed no training effect on rating agreement; impact of situational format on consistency in assessments of applicant suitability was…
Descriptors: Administrators, Employment Interviews, Examiners, Experimenter Characteristics
Peer reviewed: Cordes, Anne K. – Journal of Speech and Hearing Research, 1994
This paper contends that behavior observation data relating to speech-language pathology are reliable if they are not affected by differences among observers or other variations in the recording context. The theoretical bases of methods used to estimate reliability for observational data are reviewed, and suggestions are provided for improving the…
Descriptors: Data Collection, Interrater Reliability, Observation, Reliability
Readers' Responses to the Rating of Non-Uniform Portfolios: Are There Limits on Portfolios' Utility?
Peer reviewed: Despain, LaRene; Hilgers, Thomas L. – WPA: Writing Program Administration, 1992
Describes readers' responses to the task of assigning scores to nonuniform portfolios of student writing. Suggests that reaching the goal of reliability in reading practices will not be easy. Concludes that writing program administrators should greet suggestions for the use of nonuniform portfolios with questioning restraint. (RS)
Descriptors: Higher Education, Interrater Reliability, Portfolios (Background Materials), Student Evaluation

