ERIC - Search Results

Showing 1 to 15 of 77 results

Evaluating Panelists' Understanding of Standard Setting Data

Peer reviewed

Baron, Patricia; Sireci, Stephen G.; Slater, Sharon C. – Educational Measurement: Issues and Practice, 2021

Since the No Child Left Behind Act (No Child Left Behind [NCLB], 2001) was enacted, the Bookmark method has been used in many state standard setting studies (Karantonis and Sireci; Zieky, Perie, and Livingston). The purpose of the current study is to evaluate the criticism that when panelists are presented with data during the Bookmark standard…

Descriptors: State Standards, Standard Setting, Evaluators, Training

Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments

Peer reviewed

Shear, Benjamin R. – Educational Measurement: Issues and Practice, 2023

In the spring of 2021, just 1 year after schools were forced to close for COVID-19, state assessments were administered at great expense to provide data about impacts of the pandemic on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to…

Descriptors: COVID-19, Pandemics, School Closing, Measurement

An Application of Text Embeddings to Support Alignment of Educational Content Standards

Peer reviewed

Butterfuss, Reese; Doran, Harold – Educational Measurement: Issues and Practice, 2025

Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper…

Descriptors: Alignment (Education), Academic Standards, Content Analysis, Concept Mapping

Do Subject Matter Experts' Judgments of Multiple-Choice Format Suitability Predict Item Quality?

Peer reviewed

Berenbon, Rebecca F.; McHugh, Bridget C. – Educational Measurement: Issues and Practice, 2023

To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ…

Descriptors: Test Items, Multiple Choice Tests, Standards, Difficulty Level

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on multistage-adaptive tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

Demystifying Adequate Growth Percentiles

Peer reviewed

Castellano, Katherine E.; McCaffrey, Daniel F.; Martineau, Joseph A. – Educational Measurement: Issues and Practice, 2025

Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of…

Descriptors: Student Evaluation, Growth Models, Student Educational Objectives, Educational Indicators

A Problem with the Bookmark Procedure's Correction for Guessing

Peer reviewed

Baldwin, Peter – Educational Measurement: Issues and Practice, 2021

In the Bookmark standard-setting procedure, panelists are instructed to consider what examinees know rather than what they might attain by guessing; however, because examinees sometimes do guess, the procedure includes a correction for guessing. Like other corrections for guessing, the Bookmark's correction assumes that examinees either know the…

Descriptors: Guessing (Tests), Student Evaluation, Evaluation Methods, Standard Setting (Scoring)

Supporting the Interpretive Validity of Student-Level Claims in Science Assessment with Tiered Claim Structures

Peer reviewed

Student, Sanford R.; Gong, Brian – Educational Measurement: Issues and Practice, 2022

We address two persistent challenges in large-scale assessments of the Next Generation Science Standards: (a) the validity of score interpretations that target the standards broadly and (b) how to structure claims for assessments of this complex domain. The NGSS pose a particular challenge for specifying claims about students that evidence from…

Descriptors: Science Tests, Test Validity, Test Items, Test Construction

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Peer reviewed

Baldwin, Peter; Margolis, Melissa J.; Clauser, Brian E.; Mee, Janet; Winward, Marcia – Educational Measurement: Issues and Practice, 2020

Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the…

Descriptors: Standard Setting (Scoring), Probability, Cutting Scores, Evaluation Methods

A Critical Look into the Beuk Standard-Setting Method

Peer reviewed

Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020

One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of the Beuk line reflects the emphasis that panelists…

Descriptors: Standard Setting (Scoring), Cutting Scores, Weighted Scores, Evaluation Methods

Condensed Mastery Profile Method for Setting Standards for Diagnostic Assessment Systems

Peer reviewed

Clark, A. K.; Nash, B.; Karvonen, M.; Kingston, N. – Educational Measurement: Issues and Practice, 2017

The purpose of this study was to develop a standard-setting method appropriate for use with a diagnostic assessment that produces profiles of student mastery rather than a single raw or scale score value. The condensed mastery profile method draws from established holistic standard-setting methods to use rounds of range finding and pinpointing to…

Descriptors: Diagnostic Tests, Standard Setting (Scoring), Cutting Scores, Performance

Generating Performance-Level Descriptors under a Principled Assessment Design Paradigm: An Example for Assessments under the Next-Generation Science Standards

Peer reviewed

Luecht, Richard M. – Educational Measurement: Issues and Practice, 2020

The educational testing landscape is changing in many significant ways as evidence-based, principled assessment design (PAD) approaches are formally adopted. This article discusses the challenges and presents some score scale- and task-focused strategies for developing useful performance-level descriptors (PLDs) under a PAD approach. Details of…

Descriptors: Test Construction, Academic Standards, Science Education, Educational Testing

Using Diagnostic Profiles to Describe Borderline Performance in Standard Setting

Peer reviewed

Skaggs, Gary; Hein, Serge F.; Wilkins, Jesse L. M. – Educational Measurement: Issues and Practice, 2020

In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can…

Descriptors: Standard Setting (Scoring), Cutting Scores, Testing Problems, Profiles

Measuring Textbook Content Coverage: Efficient Content Analysis with Lesson Sampling

Peer reviewed

Zhang, Jiahui; Cogan, Leland S.; Schmidt, William H. – Educational Measurement: Issues and Practice, 2020

This study addresses measurement issues around a standards-based content analysis of mathematics textbooks' coverage of standards for use in large-scale monitoring of standards implementation as proposed in a 2013 report by the National Research Council. An earlier study produced an exhaustive content analysis of textbooks using the 2012 Common…

Descriptors: Textbook Content, Academic Standards, Mathematics Curriculum, Content Analysis
