van Daal, Tine; Lesterhuis, Marije; Coertjens, Liesje; Donche, Vincent; De Maeyer, Sven – Assessment in Education: Principles, Policy & Practice, 2019
Recently, comparative judgement has been introduced as an alternative method for scoring essays. Although this method is promising in terms of obtaining reliable scores, empirical evidence concerning its validity is lacking. The current study examines implications resulting from two critical assumptions underpinning the use of comparative…
Descriptors: Academic Discourse, Validity, Writing Evaluation, Value Judgment
Barber, Jill – Practitioner Research in Higher Education, 2018
Adaptive Comparative Judgement (ACJ) is an alternative to conventional marking in which the assessor (judge) merely compares two answers and chooses a winner. (Scripts are typically uploaded to the CompareAssess interface as PDF files and are presented side-by-side.) Repeated comparisons and application of the sorting algorithm lead to scripts…
Descriptors: Student Evaluation, Alternative Assessment, Scoring, Test Bias
Ebuoh, Casmir N.; Ezeudu, S. A. – Journal of Education and Practice, 2015
The study investigated the effects of scoring by section, use of independent scorers, and conventional patterns on scorer reliability in Biology essay tests. The literature review revealed that the conventional pattern of scoring all items at a time in essay tests had been criticized as unreliable. The study was a true experimental study…
Descriptors: Foreign Countries, Biology, Science Instruction, Reliability
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2012
The purpose of this study was to determine if the 2010 Olympic figure skating judges had trouble scoring Plushenko and the transitions program component, and if the International Skating Union's (ISU) "corridor" method flagged the same judging anomalies as the Rasch analyses. A 3-facet (skater by program component by judge) Rasch rating…
Descriptors: Rating Scales, Scoring, Factor Analysis, Item Response Theory
Pollitt, Alastair – Assessment in Education: Principles, Policy & Practice, 2012
Adaptive Comparative Judgement (ACJ) is a modification of Thurstone's method of comparative judgement that exploits the power of adaptivity, but in scoring rather than testing. Professional judgement by teachers replaces the marking of tests; a judge is asked to compare the work of two students and simply to decide which of them is the better.…
Descriptors: Value Judgment, Comparative Analysis, Scoring, Teachers
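Illustration: paired "which is better" decisions can be turned into a scale. This is a minimal sketch that assumes a Bradley-Terry model, a logistic relative of Thurstone's method and not necessarily the estimator used in the article. The function name `bradley_terry` and the sample judgements are hypothetical, and the adaptive choice of which pairs to present is left out.

```python
def bradley_terry(n_items, comparisons, iterations=200):
    """Estimate relative quality from (winner, loser) judgements.

    Uses the standard iterative (minorization-maximization) update for the
    Bradley-Terry model. Strengths are rescaled to sum to n_items.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            denom = 0.0
            for (a, b), count in pair_counts.items():
                if i == a or i == b:
                    denom += count / (strength[a] + strength[b])
            # An item that was never compared keeps its current value.
            updated.append(wins[i] / denom if denom > 0 else strength[i])
        total = sum(updated)
        strength = [s * n_items / total for s in updated]
    return strength


# Hypothetical judgements on three scripts (indices 0, 1, 2).
judgements = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (2, 0),
]
scores = bradley_terry(3, judgements)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A script that never wins a comparison gets strength 0 under this update, so real data usually needs more comparisons or some regularization.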

Kay, Jack; Aden, Roger – National Forensic Journal, 1984
Describes and analyzes the agreement rate of judges at the 1984 National Individual Events Tournament. Reveals a low agreement rate (65.22%) among judges; discusses ramifications for forensic education. (PD)
Descriptors: Communication Research, Competition, Evaluation Methods, Higher Education
Sykes, Robert C.; Ito, Kyoko – 1998
A common procedure for obtaining multiple readings (ratings) for a constructed response item, especially in high-stakes tests, is to have two readers read the papers independently, with a third reading if the results differ by more than one point. This necessitates a scoring rule that specifies how the ratings will be aggregated into a single item…
Descriptors: Ability, Constructed Response, High Stakes Tests, Judges
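Illustration: the two-reader procedure with a third reading can be written as a small scoring rule. This is a sketch only. The abstract is truncated before it names any aggregation rule, so the rule used here (average the two readings when they agree within one point, otherwise average the third reading with whichever original reading is closer to it) is an assumption, as is the function name `resolve_item_score`.

```python
def resolve_item_score(first, second, third=None, max_gap=1):
    """Combine independent readings of one constructed-response item.

    Assumed rule (hypothetical, not taken from the report):
    - readings within `max_gap` points: return their mean;
    - otherwise a third reading is required, and the result is the mean of
      the third reading and the closer of the two original readings.
    """
    if abs(first - second) <= max_gap:
        return (first + second) / 2
    if third is None:
        raise ValueError("readings differ by more than max_gap; third reading required")
    closer = min((first, second), key=lambda reading: abs(reading - third))
    return (third + closer) / 2


print(resolve_item_score(3, 4))           # readings agree within one point
print(resolve_item_score(1, 4, third=3))  # discrepant readings, third reader used
```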
Shelton, Karen; Shelton, Michael W. – 1993
The question of what variables affect success in debate has long been an area of interest and concern in the forensic community. For many years, it was thought that traditional performance variables--delivery, reasoning, organization, analysis, refutation and use of evidence--were the key factors influencing evaluations of debaters. Some…
Descriptors: Communication Research, Debate, Higher Education, Holistic Evaluation
Livingston, Samuel A.; Kastrinos, William – 1982
Leo Nedelsky developed a method for determining absolute grading standards for multiple choice tests. His method required a group of judges to examine each test question and eliminate those responses which the lowest D- student should be able to reject as incorrect. The correct answer probabilities remaining were used in computing an expected test…
Descriptors: Cutting Scores, Judges, Multiple Choice Tests, Real Estate
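Illustration: the Nedelsky computation described above can be sketched as follows, assuming the borderline examinee guesses at random among the options a judge did not eliminate, so each item contributes one over the number of remaining options. The function name and the sample judge data are hypothetical.

```python
from fractions import Fraction


def nedelsky_expected_score(remaining_options_per_item):
    """Expected test score for a borderline examinee under one judge.

    Each entry is the number of options (including the correct answer) left
    after the judge removes those the borderline student should reject.
    """
    total = Fraction(0)
    for remaining in remaining_options_per_item:
        if remaining < 1:
            raise ValueError("each item must keep at least the correct option")
        total += Fraction(1, remaining)
    return total


# Hypothetical judge: four items with 4, 2, 1, and 3 options remaining.
expected = nedelsky_expected_score([4, 2, 1, 3])
print(expected)  # 25/12
```

With several judges, a common choice is to average the per-judge expected scores. The abstract does not say which aggregation the study used.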
Scheer, John K.; Ansorge, Charles, Jr. – Research Quarterly, 1975
The results of this study revealed that a gymnast performing the same routine is at a significant advantage if he is judged as the fourth competitor for his team rather than the first competitor and that regional differences existed among the scores awarded by gymnastics judges. (RC)
Descriptors: Athletes, Athletics, College Students, Exercise (Physiology)
Lunz, Mary E.; O'Neill, Thomas R. – 1997
This retrospective longitudinal study was designed to show grading leniency patterns of judges within and across clinical examination administrations. Data from 17 different administrations of the histology examination of the American Society of Clinical Pathologists over 10 years were studied. Over the 10 years there were 4,683 candidates and 57…
Descriptors: Higher Education, Interrater Reliability, Item Response Theory, Judges
Buckendahl, Chad; Impara, James C.; Giraud, Gerald; Irwin, Patrick M. – 2000
School districts and credentialing agencies use information gathered in standard setting studies to establish minimum passing scores (MPS) for a variety of purposes. These scores may be used to make decisions ranging from subject remediation to licensure. Multiple standard setting methods may be used to provide a range of scores to the…
Descriptors: Attitudes, Certification, Cutting Scores, Elementary Secondary Education

Williamson, David M.; Bejar, Isaac I.; Hone, Anne S. – Journal of Educational Measurement, 1999
Contrasts "mental models" used by automated scoring for the simulation division of the computerized Architect Registration Examination with those used by experienced human graders for 3,613 candidate solutions. Discusses differences in the models used and the potential of automated scoring to enhance the validity evidence of scores. (SLD)
Descriptors: Architects, Comparative Analysis, Computer Assisted Testing, Judges

Lunz, Mary E.; And Others – Applied Measurement in Education, 1990
An extension of the Rasch model is used to obtain objective measurements for examinations graded by judges. The model calibrates elements of each facet of the examination on a common log-linear scale. Real examination data illustrate the way correcting for judge severity improves fairness of examinee measures. (SLD)
Descriptors: Certification, Difficulty Level, Interrater Reliability, Judges
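Illustration: the abstract does not give the model equation. A commonly cited form of the many-facet Rasch model, shown here as an assumption about what "an extension of the Rasch model" refers to, places examinee ability, item difficulty, and judge severity on one logit scale:

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that judge $j$ awards examinee $n$ category $k$ on item $i$, $B_n$ is examinee ability, $D_i$ is item difficulty, $C_j$ is judge severity, and $F_k$ is the difficulty of the step from category $k-1$ to $k$. All symbols are illustrative and are not taken from the article. Under this form, subtracting the estimated $C_j$ is what "correcting for judge severity" amounts to.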

Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994
In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)
Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators