Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 1
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 17
Descriptor
Interrater Reliability: 55
Judges: 55
Scoring: 14
Test Items: 13
Evaluation Methods: 12
Standard Setting (Scoring): 12
Comparative Analysis: 10
Measurement Techniques: 10
Standards: 9
Difficulty Level: 7
Rating Scales: 7
Author
Lunz, Mary E.: 5
Bothe, Anne K.: 2
Chang, Lei: 2
McGinty, Dixie: 2
Neel, John H.: 2
O'Neill, Thomas R.: 2
Alvermann, Donna E.: 1
Andrich, David: 1
Ansorge, Charles J.: 1
Baer, John: 1
Bennett, Randy Elliot: 1
Publication Type
Reports - Research: 36
Journal Articles: 29
Speeches/Meeting Papers: 19
Reports - Evaluative: 13
Reports - Descriptive: 5
Opinion Papers: 3
Information Analyses: 1
Numerical/Quantitative Data: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 6
Adult Education: 3
Postsecondary Education: 2
Secondary Education: 2
Elementary Secondary Education: 1
High Schools: 1
Audience
Researchers: 7
Location
Belgium: 1
Netherlands: 1
New Jersey: 1
South Korea: 1
Sweden: 1
United Kingdom: 1
United Kingdom (Liverpool): 1
Assessments and Surveys
National Assessment of…: 2
Multidimensional…: 1
Praxis Series: 1
Álvarez-Díaz, Marcos; Muñiz-Bascón, Luis Magín; Soria-Alemany, Antonio; Veintimilla-Bonet, Alberto; Fernández-Alonso, Rubén – International Journal of Music Education, 2021
Evaluation of music performance in competitive contexts often produces discrepancies between the expert judges. These discrepancies can be reduced by using appropriate rubrics that minimise the differences between judges. The objective of this study was the design and validation of an analytical evaluation rubric, which would allow the most…
Descriptors: Competition, Music Activities, Performance, Scoring Rubrics
Benton, Tom; Leech, Tony; Hughes, Sarah – Cambridge Assessment, 2020
In the context of examinations, the phrase "maintaining standards" usually refers to any activity designed to ensure that it is no easier (or harder) to achieve a given grade in one year than in another. Specifically, it tends to mean activities associated with setting examination grade boundaries. Benton et al. (2020) describes a method…
Descriptors: Mathematics Tests, Equated Scores, Comparative Analysis, Difficulty Level
van Daal, Tine; Lesterhuis, Marije; Coertjens, Liesje; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Recently, comparative judgement has been introduced as an alternative method for scoring essays. Although this method is promising in terms of obtaining reliable scores, empirical evidence concerning its validity is lacking. The current study examines implications resulting from two critical assumptions underpinning the use of comparative…
Descriptors: Academic Discourse, Validity, Writing Evaluation, Value Judgment
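Comparative judgement tools commonly turn many pairwise "which essay is better?" decisions into a score scale by fitting a Bradley-Terry model. The abstract does not say exactly which model van Daal et al. used, so the following is only a minimal Python sketch of that general idea, using the standard minorise-maximise (MM) update. The function name `bradley_terry` and the essay data are hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry strengths to (winner, loser) pairs with MM updates.

    Returns a logit-scale score per item, centred so the scores sum to zero.
    Assumes the comparisons are well connected; the basic check below only
    verifies that every item wins at least once and loses at least once,
    which the maximum-likelihood estimate needs in order to exist.
    """
    wins = defaultdict(int)         # total wins per item
    losses = defaultdict(int)       # total losses per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        if winner == loser:
            raise ValueError("an item cannot be compared with itself")
        wins[winner] += 1
        losses[loser] += 1
        pair_counts[frozenset((winner, loser))] += 1
    items = sorted(set(wins) | set(losses))
    if any(wins[i] == 0 or losses[i] == 0 for i in items):
        raise ValueError("every item needs at least one win and one loss")

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # MM update: s_i <- wins_i / sum_j n_ij / (s_i + s_j)
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom
        # Fix the arbitrary scale: geometric mean of the strengths equals 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(items)
        updated = {i: v / math.exp(log_mean) for i, v in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break
    return {i: math.log(s) for i, s in strength.items()}


# Hypothetical judgements: each tuple is (preferred essay, other essay).
comparisons = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
scores = bradley_terry(comparisons)
for essay in sorted(scores, key=scores.get, reverse=True):
    print(essay, round(scores[essay], 2))
```

Reporting scores on the logit scale mirrors how comparative judgement results are often presented; the reliability and validity questions the study raises are not addressed by the fit itself.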
From Aggregation to Interpretation: How Assessors Judge Complex Data in a Competency-Based Portfolio
Oudkerk Pool, Andrea; Govaerts, Marjan J. B.; Jaarsma, Debbie A. D. C.; Driessen, Erik W. – Advances in Health Sciences Education, 2018
While portfolios are increasingly used to assess competence, the validity of such portfolio-based assessments has hitherto remained unconfirmed. The purpose of the present research is therefore to further our understanding of how assessors form judgments when interpreting the complex data included in a competency-based portfolio. Eighteen…
Descriptors: Undergraduate Students, Medical Students, Medical Education, Competency Based Education
Kaufman, James C.; Baer, John – Creativity Research Journal, 2012
The Consensual Assessment Technique (CAT) is a common creativity assessment. According to this technique, the best judges of creativity are qualified experts. Yet what does it mean to be an expert in a domain? What level of expertise is needed to rate creativity? This article reviews the literature on novice, expert, and quasi-expert creativity…
Descriptors: Creativity, Expertise, Creativity Tests, Literature Reviews
How Good Is Good Enough? Educational Standard Setting and Its Effect on African American Test Takers
Caines, Jade; Engelhard, George, Jr. – Journal of Negro Education, 2012
Standard setting (the process of establishing minimum passing scores on high-stakes exams) is a highly evaluative and policy-driven process. It is a common belief that standard setting panels should be diverse and representative. There is concern, however, that panelists with varying characteristics may differentially influence the results of the…
Descriptors: Geographic Regions, Cutting Scores, Standard Setting, African American Achievement
Winkler, Karen J. – Chronicle of Higher Education, 2009
Most professors have mixed feelings about participating on peer-review panels. It's an honor. It helps the discipline. It's a waste of time. It's biased. Michele Lamont wanted to know whether it works: specifically, whether, and how, professors identify excellence. So the multi-titled Harvard University scholar--professor of European studies,…
Descriptors: Social Sciences, Humanities, College Faculty, Peer Evaluation
Bothe, Anne K. – Journal of Speech, Language, and Hearing Research, 2008
Purpose: The purposes of this study were (a) to determine whether highly experienced clinicians and researchers agreed with each other in judging the presence or absence of stuttering in the speech of children who stutter and (b) to determine how those binary stuttered/nonstuttered judgments related to categorizations of the same speech based on…
Descriptors: Stuttering, Identification, Young Children, Speech
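The abstract does not name the agreement statistic Bothe used. As one illustration of how agreement among several judges making binary stuttered/nonstuttered decisions can be quantified, here is a minimal Python sketch of Fleiss' kappa, a common chance-corrected index for multiple raters. The function name `fleiss_kappa` and the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for several raters assigning subjects to categories.

    counts[i][j] is the number of raters who placed subject i in category j.
    Every subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")
    # Observed agreement per subject: share of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall proportion of each category.
    n_categories = len(counts[0])
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 4 speech samples, 3 judges; columns are [stuttered, nonstuttered].
counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(counts)
print(round(kappa, 3))
```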
Napoles, Jessica – Journal of Research in Music Education, 2009
The purpose of this study was to determine whether viewing a musical score while listening (as opposed to not viewing the score) would affect musicians' ratings of choral performance excerpts. University musicians (N = 240) listened to four excerpts of choral music (from Vivaldi's "Gloria") and rated them on a 10-point Likert-type scale for…
Descriptors: Singing, Likert Scales, Musical Composition, Statistical Analysis
Nasstrom, Gunilla – International Journal of Research & Method in Education, 2009
In education, standards have to be interpreted, for planning of teaching, for development of assessments and for alignment analysis. In most cases, it is important that there is an agreement between individuals and organizations about how to interpret standards. However, there is a lack of studies of how consistent different groups of judges are…
Descriptors: Classification, Standards, Evaluation Criteria, Interrater Reliability
Gotwals, John K.; Dunn, John G. H. – Measurement in Physical Education and Exercise Science, 2009
This article presents a chronology of three empirical studies that outline the measurement process by which two new subscales ("Doubts about Actions" and "Organization") were developed and integrated into a revised version of Dunn, Causgrove Dunn, and Syrotuik's (2002) "Sport Multidimensional Perfectionism Scale"…
Descriptors: Construct Validity, Measures (Individuals), Multidimensional Scaling, Multitrait Multimethod Techniques
Crisp, Victoria; Novakovic, Nadezda – Research in Post-Compulsory Education, 2009
The consistency of assessment demands is important to validity. This research investigated the comparability of the demands of college-assessed units within a vocationally related qualification, drawing on methodological approaches that have previously been used to compare assessments. Assessment materials from five colleges were obtained. After…
Descriptors: Student Evaluation, Qualifications, Task Analysis, Item Analysis
Fowell, S. L.; Fewtrell, R.; McLaughlin, P. J. – Advances in Health Sciences Education, 2008
Absolute standard setting procedures are recommended for assessment in medical education. Absolute, test-centred standard setting procedures were introduced for written assessments in the Liverpool MBChB in 2001. The modified Angoff and Ebel methods have been used for short answer question-based and extended matching question-based papers,…
Descriptors: Medical Education, Standard Setting (Scoring), Judges, Interrater Reliability
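In its basic form, the Angoff approach asks each judge to estimate, item by item, the probability that a borderline (minimally competent) candidate would answer correctly; summing those estimates gives that judge's cut score, and the panel's cut score is the mean across judges. "Modified" Angoff procedures vary (for example, adding discussion rounds or performance data), and the abstract does not give the Liverpool details, so this minimal Python sketch covers only that core arithmetic. The function name `angoff_cut_score` and the ratings are hypothetical.

```python
from statistics import mean, stdev


def angoff_cut_score(ratings):
    """Return (panel cut score, SD of judges' cut scores).

    ratings[j][k] is judge j's estimate of the probability that a borderline
    candidate answers item k correctly.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    if any(not 0.0 <= p <= 1.0 for judge in ratings for p in judge):
        raise ValueError("ratings must be probabilities in [0, 1]")
    # Each judge's implied cut score is a borderline candidate's expected raw score.
    judge_cuts = [sum(judge) for judge in ratings]
    # The spread across judges gives a rough sense of how closely the panel agreed.
    spread = stdev(judge_cuts) if len(judge_cuts) > 1 else 0.0
    return mean(judge_cuts), spread


# Hypothetical panel: 3 judges rating a 4-item paper (1 mark per item).
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.4, 0.7],
    [0.7, 0.6, 0.5, 0.9],
]
cut, spread = angoff_cut_score(ratings)
print(f"cut score {cut:.2f} of {len(ratings[0])} marks (SD across judges {spread:.2f})")
```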

Lunz, Mary E. – Popular Measurement, 1999
Describes a study of judge leniency and consistency using a Rasch approach and involving 4,683 candidates and 53 judges.
Descriptors: Interrater Reliability, Judges, Longitudinal Studies

Janson, Harald; Olsson, Ulf – Educational and Psychological Measurement, 2001
Proposes a generalization of Cohen's kappa coefficient (J. Cohen, 1960) to address the problem of accounting for overall chance-corrected interobserver agreement among the multivariate ratings of several judges. The statistic's metric is conventional and in the univariate case it is equivalent to existing extensions of the kappa coefficient to…
Descriptors: Interrater Reliability, Judges, Multivariate Analysis
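Janson and Olsson's statistic generalizes Cohen's kappa, so the two-rater, single-variable case is the natural baseline. The following minimal Python sketch computes that baseline only; it does not implement the multivariate generalization proposed in the article. The function name `cohens_kappa` and the pass/fail judgements are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases given the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgements of four candidates by two judges.
judge_1 = ["pass", "pass", "fail", "fail"]
judge_2 = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(judge_1, judge_2)
print(kappa)
```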