Publication Date
  In 2025 (3)
  Since 2024 (8)
  Since 2021, last 5 years (22)
  Since 2016, last 10 years (45)
  Since 2006, last 20 years (67)
Descriptor
  Evaluation Methods (119)
  Educational Assessment (27)
  Student Evaluation (26)
  Test Construction (22)
  Elementary Secondary Education (21)
  Test Use (19)
  Measurement Techniques (18)
  Test Validity (17)
  Measurement (15)
  Models (15)
  Validity (12)
Source
  Educational Measurement:… (119)
Author
  Wyse, Adam E. (4)
  Linn, Robert L. (3)
  Penfield, Randall D. (3)
  Reckase, Mark D. (3)
  Shepard, Lorrie A. (3)
  Sireci, Stephen G. (3)
  Wind, Stefanie A. (3)
  Babcock, Ben (2)
  Baldwin, Peter (2)
  Bandalos, Deborah L. (2)
  Buckendahl, Chad W. (2)
Audience
  Teachers (1)
Location
  Nebraska (4)
  California (2)
  China (1)
  Florida (1)
  Hungary (1)
  Idaho (1)
  New Hampshire (1)
  Ohio (1)
  Poland (1)
  USSR (1)
  Washington (1)
Laws, Policies, & Programs
  No Child Left Behind Act 2001 (2)
  Race to the Top (1)
Assessments and Surveys
  ACT Assessment (1)
  Program for International… (1)
  Progress in International… (1)
  SAT (College Admission Test) (1)
  Trends in International… (1)
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
W. Jake Thompson; Amy K. Clark – Educational Measurement: Issues and Practice, 2024
In recent years, educators, administrators, policymakers, and measurement experts have called for assessments that support educators in making better instructional decisions. One promising approach to measurement to support instructional decision-making is diagnostic classification models (DCMs). DCMs are flexible psychometric models that…
Descriptors: Decision Making, Instructional Improvement, Evaluation Methods, Models
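To make the DCM entry above concrete, here is a minimal sketch of one well-known diagnostic classification model, the DINA model; the function name and all parameter values are illustrative assumptions, not drawn from the article:

```python
def dina_prob(alpha, q, slip, guess):
    """DINA model, a common DCM: an examinee answers an item correctly
    with probability 1 - slip if they have mastered every attribute the
    item requires (per the item's Q-matrix row q), and with probability
    guess otherwise. alpha and q are 0/1 attribute vectors."""
    eta = all(a >= required for a, required in zip(alpha, q))
    return (1.0 - slip) if eta else guess

# Hypothetical two-attribute item requiring both attributes.
p_master = dina_prob(alpha=[1, 1], q=[1, 1], slip=0.1, guess=0.2)     # 0.9
p_nonmaster = dina_prob(alpha=[1, 0], q=[1, 1], slip=0.1, guess=0.2)  # 0.2
```

The attribute-level (rather than continuous-trait) output is what makes DCM results directly interpretable for instructional decisions.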
Buzick, Heather M.; Casabianca, Jodi M.; Gholson, Melissa L. – Educational Measurement: Issues and Practice, 2023
The article describes practical suggestions for measurement researchers and psychometricians to respond to calls for social responsibility in assessment. The underlying assumption is that personalizing large-scale assessment improves the chances that assessment and the use of test scores will contribute to equity in education. This article…
Descriptors: Achievement Tests, Individualized Instruction, Evaluation Methods, Equal Education
Reese Butterfuss; Harold Doran – Educational Measurement: Issues and Practice, 2025
Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper…
Descriptors: Alignment (Education), Academic Standards, Content Analysis, Concept Mapping
Kim, Stella Y. – Educational Measurement: Issues and Practice, 2022
In this digital ITEMS module, Dr. Stella Kim provides an overview of multidimensional item response theory (MIRT) equating. Traditional unidimensional item response theory (IRT) equating methods impose the sometimes untenable restriction on data that only a single ability is assessed. This module discusses potential sources of multidimensionality…
Descriptors: Item Response Theory, Models, Equated Scores, Evaluation Methods
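The single-ability restriction the module discusses can be seen in a compensatory MIRT item response function, where a vector of abilities replaces the single theta of unidimensional IRT; this sketch and its numbers are illustrative, not taken from the module:

```python
import math

def mirt_prob(theta, a, d):
    """Compensatory MIRT response probability:
    P(correct) = 1 / (1 + exp(-(a . theta + d))),
    where theta is an ability vector, a the per-dimension
    discriminations, and d the intercept."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical item loading on two abilities (e.g., algebra, reading).
p = mirt_prob(theta=[0.5, -1.0], a=[1.2, 0.4], d=0.3)  # ~0.62
```

When equating tests that tap several such dimensions, collapsing theta to one number can distort the score correspondence, which is the motivation for MIRT equating.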
Jiangang Hao; Alina A. von Davier; Victoria Yaneva; Susan Lottridge; Matthias von Davier; Deborah J. Harris – Educational Measurement: Issues and Practice, 2024
The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely,…
Descriptors: Evaluation Methods, Artificial Intelligence, Educational Change, Computer Software
Bennett, Randy E. – Educational Measurement: Issues and Practice, 2022
This commentary focuses on one of the positive impacts of COVID-19, which was to tie societal inequity to testing in a manner that could motivate the reimagining of our field. That reimagining needs to account for our nation's dramatically changing demographics so that assessment generally, and standardized testing specifically, better fit the…
Descriptors: COVID-19, Pandemics, Social Justice, Testing
Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023
The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…
Descriptors: Item Response Theory, Standard Setting, Testing, Sampling
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025
Automatic item generation can supply many items instantly and efficiently to assessment and learning environments. Yet evaluating item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
Baldwin, Peter – Educational Measurement: Issues and Practice, 2021
In the Bookmark standard-setting procedure, panelists are instructed to consider what examinees know rather than what they might attain by guessing; however, because examinees sometimes do guess, the procedure includes a correction for guessing. Like other corrections for guessing, the Bookmark's correction assumes that examinees either know the…
Descriptors: Guessing (Tests), Student Evaluation, Evaluation Methods, Standard Setting (Scoring)
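The Bookmark correction for guessing can be illustrated with a 3PL item: the response-probability criterion (e.g., RP67) is applied to the knowledge component of the curve rather than to the raw probability, which includes guessing. A minimal sketch under assumed item parameters (not the article's):

```python
import math

RP = 0.67  # response-probability criterion used to order Bookmark items

def rp_location(a, b, c, rp=RP, correct_for_guessing=True):
    """Theta at which a 3PL item reaches the RP criterion.
    With the guessing correction, rp applies to the non-guessing part
    of the curve, so the overall target is c + (1 - c) * rp.
    Inverts P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    target = c + (1.0 - c) * rp if correct_for_guessing else rp
    return b - (1.0 / a) * math.log((1.0 - c) / (target - c) - 1.0)

loc_corrected = rp_location(a=1.0, b=0.0, c=0.2)
loc_naive = rp_location(a=1.0, b=0.0, c=0.2, correct_for_guessing=False)
```

For an item with nonzero guessing, the corrected location sits higher on the scale than the naive one, because part of the raw 0.67 success rate is attributable to guessing rather than knowledge.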
Tong, Ye – Educational Measurement: Issues and Practice, 2022
COVID-19 is disrupting assessment practices and accelerating changes. With special focus on K-12 and credentialing exams, this article describes the series of changes observed during the pandemic, the solutions assessment providers have implemented, and the long-term impact on future practices. Additionally, this article highlights the importance…
Descriptors: COVID-19, Pandemics, Elementary Secondary Education, Evaluation Methods
Middleton, Kyndra V. – Educational Measurement: Issues and Practice, 2022
The onset of the coronavirus pandemic forced schools and universities across the nation and world to close and move to distance learning virtually overnight. Almost two years later, colleges and universities have reopened, and most students have returned to campuses, but distance learning still occurs at a much higher rate than before the beginning…
Descriptors: Computer Assisted Testing, Internet, Student Evaluation, College Students
Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023
Measurement specialists have demonstrated that the Cross-Classified Mixed Effects Model (CCMEM) is a flexible framework for evaluating reliability: reliability can be estimated from the variance components of the test scores. Building on this work, the present study extends the CCMEM to the evaluation of validity evidence.…
Descriptors: Measurement, Validity, Reliability, Models
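The variance-component logic behind the CCMEM entry can be sketched with a simple generalizability-style coefficient: reliability as the share of score variance due to persons, with error shrinking as scores average over more items. The function and values are illustrative assumptions, not the article's model:

```python
def vc_reliability(var_person, var_residual, n_items):
    """Variance-component reliability for a persons-by-items design:
    person (true-score) variance over person variance plus error
    variance, where averaging over n_items items divides the
    residual variance by n_items."""
    return var_person / (var_person + var_residual / n_items)

# Hypothetical variance components from a fitted mixed-effects model.
rel = vc_reliability(var_person=0.60, var_residual=0.90, n_items=30)  # ~0.95
```

The appeal of the mixed-effects framing is that the same fitted variance components support both reliability estimates and, as the study argues, evidence bearing on validity.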
Baron, Patricia; Sireci, Stephen G.; Slater, Sharon C. – Educational Measurement: Issues and Practice, 2021
Since the No Child Left Behind Act (No Child Left Behind [NCLB], 2001) was enacted, the Bookmark method has been used in many state standard setting studies (Karantonis and Sireci; Zieky, Perie, and Livingston). The purpose of the current study is to evaluate the criticism that when panelists are presented with data during the Bookmark standard…
Descriptors: State Standards, Standard Setting, Evaluators, Training