Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 8 |
Since 2006 (last 20 years) | 16 |
Descriptor
Scoring | 75 |
Standard Setting (Scoring) | 75 |
Cutting Scores | 32 |
Standards | 25 |
Elementary Secondary Education | 24 |
Test Items | 21 |
Evaluators | 18 |
Interrater Reliability | 18 |
Test Construction | 18 |
Evaluation Methods | 16 |
Higher Education | 16 |
More ▼ |
Source
Author
Publication Type
Education Level
Audience
Researchers | 4 |
Policymakers | 1 |
Practitioners | 1 |
Location
Tennessee | 2 |
Australia | 1 |
California | 1 |
China | 1 |
Kansas | 1 |
Maine | 1 |
Minnesota | 1 |
Nebraska | 1 |
New Hampshire | 1 |
United Kingdom | 1 |
United Kingdom (England) | 1 |
More ▼ |
Laws, Policies, & Programs
Comprehensive Education… | 2 |
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
National Assessment of… | 10 |
National Teacher Examinations | 4 |
Alabama High School… | 1 |
General Educational… | 1 |
New Jersey College Basic… | 1 |
What Works Clearinghouse Rating
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS). The goal of the QTS is to provide guidance to states that are interested in including content from the New Meridian item bank and intend to make comparability claims with "other assessments" that include New…
Descriptors: Testing, Standards, Comparative Analysis, Guidelines
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS). The goal of the QTS is to provide guidance to states that are interested in including content from the New Meridian item bank and intend to make comparability claims with "other assessments" that include New…
Descriptors: Testing, Standards, Comparative Analysis, Guidelines
Winter, Phoebe C.; Hansen, Mark; McCoy, Michelle – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2019
In order to accurately assess the English language proficiency of special populations of English learners, student assessment programs must maintain the comparability of standard and modified assessment formats, allowing for equivalent inferences to be made across student classifications. However, given the typically small size of special…
Descriptors: English Language Learners, Language Proficiency, Student Evaluation, Evaluation Methods
New Meridian Corporation, 2020
New Meridian Corporation has developed the "Quality Testing Standards and Criteria for Comparability Claims" (QTS) to provide guidance to states that are interested in including New Meridian content and would like to either keep reporting scores on the New Meridian Scale or use the New Meridian performance levels; that is, the state…
Descriptors: Testing, Standards, Comparative Analysis, Test Content
Grabovsky, Irina; Wainer, Howard – Journal of Educational and Behavioral Statistics, 2017
In this essay, we describe the construction and use of the Cut-Score Operating Function in aiding standard setting decisions. The Cut-Score Operating Function shows the relation between the cut-score chosen and the consequent error rate. It allows error rates to be defined by multiple loss functions and will show the behavior of each loss…
Descriptors: Cutting Scores, Standard Setting (Scoring), Decision Making, Error Patterns
Foley, Brett P. – Practical Assessment, Research & Evaluation, 2016
There is always a chance that examinees will answer multiple choice (MC) items correctly by guessing. Design choices in some modern exams have created situations where guessing at random through the full exam--rather than only for a subset of items where the examinee does not know the answer--can be an effective strategy to pass the exam. This…
Descriptors: Guessing (Tests), Multiple Choice Tests, Case Studies, Test Construction
Jiang, Yu; Zhang, Jiahui; Xin, Tao – Journal of Educational and Behavioral Statistics, 2019
This article is an overview of the National Assessment of Education Quality (NAEQ) of China in reading, mathematics, sciences, arts, physical education, and moral education at Grades 4 and 8. After a review of the background and history of NAEQ, we present the assessment framework with students' holistic development at the core and the design for…
Descriptors: Foreign Countries, Educational Quality, Educational Improvement, National Competency Tests
Nebraska Department of Education, 2018
The 2018 Nebraska Student-Centered Assessment System (NSCAS) Summative technical report documents the processes and procedures implemented to support the Spring 2018 NSCAS Summative English Language Arts (ELA), Mathematics, and Science assessments by NWEA under the supervision of the Nebraska Department of Education (NDE). The technical report…
Descriptors: Summative Evaluation, Language Tests, English, Mathematics Tests
GED Testing Service, 2014
This manual was written to provide technical information regarding the General Educational Development (GED®) test as evidence that the GED® test is technically sound. Throughout this manual, documentation is provided regarding the development of the GED® test and data collection activities, as well as evidence of reliability and validity. This…
Descriptors: High School Equivalency Programs, Equivalency Tests, Testing Programs, Test Validity
Northwest Evaluation Association, 2014
Recently, Northwest Evaluation Association (NWEA) completed a study to connect the scale of the Minnesota Comprehensive Assessments (MCA) Testing Program used for Minnesota's mathematics and reading assessments with NWEA's RIT (Rasch Unit) scale. Information from the state assessments was used in a study to establish performance-level scores on…
Descriptors: Alignment (Education), Testing Programs, State Programs, Mathematics Tests
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring
Judd, Wallace – Practical Assessment, Research & Evaluation, 2009
Over the past twenty years in performance testing a specific item type with distinguishing characteristics has arisen time and time again. It's been invented independently by dozens of test development teams. And yet this item type is not recognized in the research literature. This article is an invitation to investigate the item type, evaluate…
Descriptors: Test Items, Test Format, Evaluation, Item Analysis
Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam – Educational Testing Service, 2010
Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…
Descriptors: Testing Programs, Standard Setting (Scoring), Test Interpretation, Certification
Ferdous, Abdullah A.; Plake, Barbara S. – Educational and Psychological Measurement, 2008
Even when the scoring of an examination is based on item response theory (IRT), standard-setting methods seldom use this information directly when determining the minimum passing score (MPS) for an examination from an Angoff-based standard-setting study. Often, when IRT scoring is used, the MPS value for a test is converted to an IRT-based theta…
Descriptors: Standard Setting (Scoring), Scoring, Cutting Scores, Item Response Theory
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel