ERIC - Search Results

Publication Date

In 2025	3
Since 2024	3
Since 2021 (last 5 years)	11
Since 2016 (last 10 years)	22
Since 2006 (last 20 years)	25

Source

Educational Measurement:…

Publication Type

Journal Articles	30
Reports - Research	30
Information Analyses	1
Reports - Descriptive	1

Education Level

Elementary Education	2
Elementary Secondary Education	2
Early Childhood Education	1
Grade 3	1
Grade 4	1
Higher Education	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Primary Education	1
Secondary Education	1
More ▼

Audience

Location

Florida	1
Idaho	1
New Hampshire	1
Washington	1
Wisconsin	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Progress in International…

What Works Clearinghouse Rating

Showing 1 to 15 of 30 results Save | Export

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Direct link

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

An Application of Text Embeddings to Support Alignment of Educational Content Standards

Peer reviewed

Direct link

Reese Butterfuss; Harold Doran – Educational Measurement: Issues and Practice, 2025

Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper…

Descriptors: Alignment (Education), Academic Standards, Content Analysis, Concept Mapping

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Direct link

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Direct link

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Direct link

Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality persists to be a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Validation as Evaluating Desired and Undesired Effects: Insights from Cross-Classified Mixed Effects Model

Peer reviewed

Direct link

Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023

The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…

Descriptors: Measurement, Validity, Reliability, Models

Evaluating Panelists' Understanding of Standard Setting Data

Peer reviewed

Direct link

Baron, Patricia; Sireci, Stephen G.; Slater, Sharon C. – Educational Measurement: Issues and Practice, 2021

Since the No Child Left Behind Act (No Child Left Behind [NCLB], 2001) was enacted, the Bookmark method has been used in many state standard setting studies (Karantonis and Sireci; Zieky, Perie, and Livingston). The purpose of the current study is to evaluate the criticism that when panelists are presented with data during the Bookmark standard…

Descriptors: State Standards, Standard Setting, Evaluators, Training

Does Special Educator Effectiveness Vary Depending on the Observation Instrument Used?

Peer reviewed

Direct link

Johnson, Evelyn S.; Crawford, Angela R.; Zheng, Yuzhu; Moylan, Laura A. – Educational Measurement: Issues and Practice, 2021

In this study, we compared the results of 27 special education teachers' evaluations using two different observation instruments, the Framework for Teaching (FFT), and the Explicit Instruction observation protocol of the Recognizing Effective Special Education Teachers (RESET) observation system. Results indicate differences in the rank-ordering…

Descriptors: Special Education Teachers, Teacher Evaluation, Teacher Effectiveness, Evaluation Methods

The Choice of Response Probability in Bookmark Standard Setting: An Experimental Study

Peer reviewed

Direct link

Baldwin, Peter; Margolis, Melissa J.; Clauser, Brian E.; Mee, Janet; Winward, Marcia – Educational Measurement: Issues and Practice, 2020

Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the…

Descriptors: Standard Setting (Scoring), Probability, Cutting Scores, Evaluation Methods

A Critical Look into the Beuk Standard-Setting Method

Peer reviewed

Direct link

Wyse, Adam E. – Educational Measurement: Issues and Practice, 2020

One commonly used compromise standard-setting method is the Beuk (1984) method. A key assumption of the Beuk method is that the emphasis given to the pass rate and the percent correct ratings should be proportional to the extent that the panelists agree on their ratings. However, whether the slope of Beuk line reflects the emphasis that panelists…

Descriptors: Standard Setting (Scoring), Cutting Scores, Weighted Scores, Evaluation Methods

Exploring the Impact of Rater Effects on Person Fit in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020

Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…

Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

Direct link

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

It's Not Just Angoff: Misperceptions of Hard and Easy Items in Bookmark-Type Ratings

Peer reviewed

Direct link

Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020

A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelist's ability to perform well the Bookmark method, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…

Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items

Weighing the Value of Complex Growth Estimation Methods to Evaluate Individual Student Response to Instruction

Peer reviewed

Direct link

Van Norman, Ethan R. – Educational Measurement: Issues and Practice, 2023

Sophisticated analytic strategies have been proposed as viable methods to improve the quantification of student improvement and to assist educators in making treatment decisions. The performance of three categories of latent growth modeling techniques (linear, quadratic, and dual change) to capture growth in oral reading fluency in response to a…

Descriptors: Evaluation Methods, Student Reaction, Teaching Methods, Structural Equation Models

Embedded Standard Setting: Aligning Standard-Setting Methodology with Contemporary Assessment Design Principles

Peer reviewed

Direct link

Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020

In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously, Engineered Cut Scores. Lewis, 2016), an approach to establish performance…

Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment

Previous Page | Next Page »

Pages: 1 | 2

Evaluation Methods	30
Cutting Scores	8
Test Items	7
Test Validity	7
Standard Setting (Scoring)	5
Comparative Analysis	4
Performance Based Assessment	4
Scores	4
Test Interpretation	4
Testing Problems	4
Validity	4
Academic Achievement	3
Achievement Tests	3
Alternative Assessment	3
Decision Making	3
Error of Measurement	3
Evaluation Criteria	3
Evaluators	3
School Districts	3
Scoring	3
Standard Setting	3
Test Bias	3
Achievement	2
Achievement Gains	2
Artificial Intelligence	2
More ▼

Wyse, Adam E.	4
Babcock, Ben	2
Sireci, Stephen G.	2
Wind, Stefanie A.	2
Abedi, Jamal	1
An, Lily Shiao	1
Aray, Henry	1
Bakeman, Roger	1
Baldwin, Peter	1
Baron, Patricia	1
Burkett, Ruth S.	1
Castellano, Katherine E.	1
Childs, Ruth A.	1
Choi, Kilchan	1
Clauser, Brian E.	1
Cook, Robert	1
Crawford, Angela R.	1
Davis, Laurie Laughlin	1
Downing, Joyce Anderson	1
Evans, Carla M.	1
Furtak, Erin Marie	1
Gattamorta, Karina	1
Guher Gorgun	1
Gullickson, Arlen R.	1
More ▼