Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 2 |
Since 2006 (last 20 years) | 13 |
Descriptor
Interrater Reliability | 60 |
Standard Setting (Scoring) | 60 |
Cutting Scores | 30 |
Standards | 23 |
Evaluators | 20 |
Scoring | 18 |
Test Items | 18 |
Evaluation Methods | 12 |
Higher Education | 12 |
Judges | 12 |
Minimum Competency Testing | 12 |
More ▼ |
Source
Author
Plake, Barbara S. | 6 |
Jaeger, Richard M. | 4 |
Wyse, Adam E. | 4 |
Chang, Lei | 3 |
Busch, John Christian | 2 |
Cope, Ronald T. | 2 |
Hambleton, Ronald K. | 2 |
Hsieh, Mingchuan | 2 |
Linn, Robert L. | 2 |
McGinty, Dixie | 2 |
Neel, John H. | 2 |
More ▼ |
Publication Type
Reports - Research | 35 |
Journal Articles | 32 |
Speeches/Meeting Papers | 27 |
Reports - Evaluative | 25 |
Information Analyses | 2 |
Opinion Papers | 2 |
Reports - Descriptive | 1 |
Education Level
Elementary Education | 4 |
Intermediate Grades | 4 |
Elementary Secondary Education | 2 |
Grade 6 | 2 |
Grade 4 | 1 |
Grade 5 | 1 |
Grade 7 | 1 |
Higher Education | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Secondary Education | 1 |
More ▼ |
Audience
Researchers | 4 |
Laws, Policies, & Programs
Assessments and Surveys
National Assessment of… | 3 |
Alabama High School… | 1 |
National Teacher Examinations | 1 |
Praxis Series | 1 |
What Works Clearinghouse Rating
Wyse, Adam E. – Practical Assessment, Research & Evaluation, 2018
One common modification to the Angoff standard-setting method is to have panelists round their ratings to the nearest 0.05 or 0.10 instead of 0.01. Several reasons have been offered as to why it may make sense to have panelists round their ratings to the nearest 0.05 or 0.10. In this article, we examine one reason that has been suggested, which is…
Descriptors: Interrater Reliability, Evaluation Criteria, Scoring Formulas, Achievement Rating
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
Harrison, George M. – Journal of Educational Measurement, 2015
The credibility of standard-setting cut scores depends in part on two sources of consistency evidence: intrajudge and interjudge consistency. Although intrajudge consistency feedback has often been provided to Angoff judges in practice, more evidence is needed to determine whether it achieves its intended effect. In this randomized experiment with…
Descriptors: Interrater Reliability, Standard Setting (Scoring), Cutting Scores, Feedback (Response)
O'Neill, Thomas R.; Peabody, Michael R.; Stelter, Keith L.; Hagen, Michael D. – Online Submission, 2015
(Purpose) The purpose of our study was to assess the need for an external searchable resource to be used in conjunction with the American Board of Family Medicine's (ABFM) Maintenance of Certification for Family Physicians (MC-FP) Examination, discuss the philosophical question of whether an ESR should be allowed on the examination, and outline…
Descriptors: Licensing Examinations (Professions), Family Practice (Medicine), Physicians, Online Searching
Skaggs, Gary – Measurement: Interdisciplinary Research and Perspectives, 2013
The construct map is a particularly good way to approach instrument development, and this author states that he was delighted to read Adam Wyse's thoughts about how to use construct maps for standard setting. For a number of popular standard-setting methods, Wyse shows how typical feedback to panelists fits within a construct map framework.…
Descriptors: Standard Setting (Scoring), Maps, Test Construction, Measurement
Wyse, Adam E.; Bunch, Michael B.; Deville, Craig; Viger, Steven G. – Educational and Psychological Measurement, 2014
This article describes a novel variation of the Body of Work method that uses construct maps to overcome problems of transparency, rater inconsistency, and scores gaps commonly occurring with the Body of Work method. The Body of Work method with construct maps was implemented to set cut-scores for two separate K-12 assessment programs in a large…
Descriptors: Standard Setting (Scoring), Educational Assessment, Elementary Secondary Education, Measurement
Hsieh, Mingchuan – Language Testing, 2013
When implementing standard setting procedures, there are two major concerns: variance between panelists and efficiency in conducting multiple rounds of judgments. With regard to the former, there is concern over the consistency of the cutoff scores made by different panelists. If the cut scores show an inordinately wide range then further rounds…
Descriptors: Item Response Theory, Standard Setting (Scoring), Language Tests, English (Second Language)
Wyse, Adam E. – Measurement: Interdisciplinary Research and Perspectives, 2013
Construct maps are tools that display how the underlying achievement construct upon which one is trying to set cut-scores is related to other information used in the process of standard setting. This article reviews what construct maps are, uses construct maps to provide a conceptual framework to view commonly used standard-setting procedures (the…
Descriptors: Standard Setting (Scoring), Maps, Cutting Scores, Methods
Hsieh, Mingchuan – Language Assessment Quarterly, 2013
The Yes/No Angoff and Bookmark method for setting standards on educational assessment are currently two of the most popular standard-setting methods. However, there is no research into the comparability of these two methods in the context of language assessment. This study compared results from the Yes/No Angoff and Bookmark methods as applied to…
Descriptors: Standard Setting (Scoring), Comparative Analysis, Language Tests, Multiple Choice Tests
Wyatt-Smith, Claire; Klenowski, Val; Gunn, Stephanie – Assessment in Education: Principles, Policy & Practice, 2010
There is a strong quest in several countries including Australia for greater national consistency in education and intensifying interest in standards for reporting. Given this, it is important to make explicit the intended and unintended consequences of assessment reform strategies and the pressures to pervert and conform. In a policy context that…
Descriptors: Foreign Countries, Student Evaluation, Teacher Attitudes, Teachers
Fowell, S. L.; Fewtrell, R.; McLaughlin, P. J. – Advances in Health Sciences Education, 2008
Absolute standard setting procedures are recommended for assessment in medical education. Absolute, test-centred standard setting procedures were introduced for written assessments in the Liverpool MBChB in 2001. The modified Angoff and Ebel methods have been used for short answer question-based and extended matching question-based papers,…
Descriptors: Medical Education, Standard Setting (Scoring), Judges, Interrater Reliability
Stone, Gregory Ethan; Beltyukova, Svetlana; Fox, Christine M. – International Journal of Testing, 2008
Judge-mediated examinations are defined as those for which expert evaluation (using rubrics) is required to determine correctness, completeness, and reasonability of test-taker responses. The use of multifaceted Rasch modeling has led to improvements in the reliability of scoring such examinations. The establishment of criterion-referenced…
Descriptors: Interrater Reliability, High Stakes Tests, Standard Setting, Minimum Competencies

Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
Abbott, Marilyn L. – Alberta Journal of Educational Research, 2006
The purpose of this article is to promote an increased awareness of the processes for setting cut-scores for complex performance assessments by (a) describing the Analytic Judgment Method (AJM) for setting cut-scores, and (b) critically evaluating the technical adequacy and practicability of the AJM by focusing on one investigation where the AJM…
Descriptors: Interrater Reliability, Cutting Scores, Performance Based Assessment, Standard Setting (Scoring)

Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 1996
Data from two standard-setting exercises were analyzed using the logistic regression model that assumes no variation in severity of raters, and results were compared with those obtained by logistic regression that allowed for severity variation. Results illustrate the importance of taking between-rater differences into account. (SLD)
Descriptors: Cutting Scores, Decision Making, Evaluators, Individual Differences