Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
Alexander, David E.; Davis, Ian R. – Quality Assurance in Education: An International Perspective, 2019
Purpose: The purpose of this paper is to review the issues and challenges associated with examining PhD theses in the modern, rapidly changing academic world. The PhD degree has been described as the "pinnacle of academic qualifications", but it is under threat in terms of the quality of supervision and the outcome of examinations. By…
Descriptors: Doctoral Dissertations, Doctoral Degrees, Educational Quality, Difficulty Level
Greatorex, Jackie; Rushton, Nicky; Coleman, Tori; Darlington, Ellie; Elliott, Gill – Cambridge Assessment, 2019
A curriculum map is a visualisation of relationships within and between a curriculum or curricula. Curriculum mapping refers to the method for creating and using the curriculum map; however, this term is used broadly and encompasses a variety of methodological approaches. Often, researchers in the field of curriculum studies conduct curriculum…
Descriptors: Comparative Analysis, Visualization, Curriculum, Maps
Friedberg, Solomon; Shanahan, Tim; Fennell, Francis; Fisher, Douglas; Howe, Roger – Thomas B. Fordham Institute, 2020
A decade ago, states across the nation adopted the Common Core State Standards (CCSS) in an effort to raise the academic bar for their students. This has provoked countless political battles since then--including an especially intense one in Florida. That fight culminated in Governor Ron DeSantis vowing to eliminate and replace the Common Core,…
Descriptors: Common Core State Standards, Academic Achievement, Benchmarking, Academic Standards
Crisp, Victoria; Novakovic, Nadezda – Evaluation & Research in Education, 2009
Maintaining standards over time is a much-debated topic in the context of national examinations in the UK. This study used a pilot method to compare the demands, over time, of two examination units testing administration. The method involved 15 experts revising a framework of demand types and making paired comparisons of examinations from…
Descriptors: Pilot Projects, Test Reliability, Difficulty Level, Comparative Analysis
Lamprianou, Iasonas – International Journal of Testing, 2008
This study investigates the effect of reporting the unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of…
Descriptors: High Stakes Tests, Raw Scores, Item Response Theory, Language Tests

Faggen, Jane; And Others – 1995
The objective of this study was to determine the degree to which recommendations for passing scores, calculated on the basis of a traditional standard-setting methodology, might be affected by the mode (paper versus computer-screen prints) in which test items were presented to standard setting panelists. Results were based on the judgments of 31…
Descriptors: Computer Assisted Testing, Cutting Scores, Difficulty Level, Evaluators
Lunz, Mary E.; Stahl, John A. – 1990
Three examinations administered to medical students were analyzed to determine differences among severities of judges' assessments and among grading periods. The examinations included essay, clinical, and oral forms of the tests. Twelve judges graded the three essays for 32 examinees during a 4-day grading session, which was divided into eight…
Descriptors: Clinical Diagnosis, Comparative Testing, Difficulty Level, Essay Tests

Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991
An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in ratings result from differences in perceptions of item difficulty, not from differences of opinion about the definition of minimal competence. (SLD)
Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level

Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991
Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)
Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability
Smith, Robert L.; Carlson, Alfred B. – 1995
The feasibility of constructing test forms with practically equivalent cut scores using judges' estimates of item difficulty as target "statistical" specifications was investigated. Test forms with equivalent judgmental cut scores (based on judgments of item difficulty) were assembled using items from six operational forms of the…
Descriptors: Cutting Scores, Decision Making, Difficulty Level, Equated Scores