Publication Date
  In 2025 (3)
  Since 2024 (8)
  Since 2021, last 5 years (22)
  Since 2016, last 10 years (45)
  Since 2006, last 20 years (67)
Descriptor
  Evaluation Methods (119)
  Educational Assessment (27)
  Student Evaluation (26)
  Test Construction (22)
  Elementary Secondary Education (21)
  Test Use (19)
  Measurement Techniques (18)
  Test Validity (17)
  Measurement (15)
  Models (15)
  Validity (12)
Source
  Educational Measurement:… (119)
Author
  Wyse, Adam E. (4)
  Linn, Robert L. (3)
  Penfield, Randall D. (3)
  Reckase, Mark D. (3)
  Shepard, Lorrie A. (3)
  Sireci, Stephen G. (3)
  Wind, Stefanie A. (3)
  Babcock, Ben (2)
  Baldwin, Peter (2)
  Bandalos, Deborah L. (2)
  Buckendahl, Chad W. (2)
Audience
  Teachers (1)
Location
  Nebraska (4)
  California (2)
  China (1)
  Florida (1)
  Hungary (1)
  Idaho (1)
  New Hampshire (1)
  Ohio (1)
  Poland (1)
  USSR (1)
  Washington (1)
Laws, Policies, & Programs
  No Child Left Behind Act 2001 (2)
  Race to the Top (1)
Assessments and Surveys
  ACT Assessment (1)
  Program for International… (1)
  Progress in International… (1)
  SAT (College Admission Test) (1)
  Trends in International… (1)
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
W. Jake Thompson; Amy K. Clark – Educational Measurement: Issues and Practice, 2024
In recent years, educators, administrators, policymakers, and measurement experts have called for assessments that support educators in making better instructional decisions. One promising approach to measurement to support instructional decision-making is diagnostic classification models (DCMs). DCMs are flexible psychometric models that…
Descriptors: Decision Making, Instructional Improvement, Evaluation Methods, Models
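To make the DCM entry above concrete, here is a minimal sketch of one well-known diagnostic classification model, the DINA model; the function name and all parameter values are illustrative assumptions, not drawn from the article:

```python
def dina_prob(alpha, q, slip, guess):
    """DINA model, a common DCM: an examinee answers an item correctly
    with probability 1 - slip if they have mastered every attribute the
    item requires (per the item's Q-matrix row q), and with probability
    guess otherwise. alpha and q are 0/1 attribute vectors."""
    eta = all(a >= required for a, required in zip(alpha, q))
    return (1.0 - slip) if eta else guess

# Hypothetical two-attribute item requiring both attributes.
p_master = dina_prob(alpha=[1, 1], q=[1, 1], slip=0.1, guess=0.2)     # 0.9
p_nonmaster = dina_prob(alpha=[1, 0], q=[1, 1], slip=0.1, guess=0.2)  # 0.2
```

The attribute-level (rather than continuous-trait) output is what makes DCM results directly interpretable for instructional decisions.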
Buzick, Heather M.; Casabianca, Jodi M.; Gholson, Melissa L. – Educational Measurement: Issues and Practice, 2023
The article describes practical suggestions for measurement researchers and psychometricians to respond to calls for social responsibility in assessment. The underlying assumption is that personalizing large-scale assessment improves the chances that assessment and the use of test scores will contribute to equity in education. This article…
Descriptors: Achievement Tests, Individualized Instruction, Evaluation Methods, Equal Education
Reese Butterfuss; Harold Doran – Educational Measurement: Issues and Practice, 2025
Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper…
Descriptors: Alignment (Education), Academic Standards, Content Analysis, Concept Mapping
Kim, Stella Y. – Educational Measurement: Issues and Practice, 2022
In this digital ITEMS module, Dr. Stella Kim provides an overview of multidimensional item response theory (MIRT) equating. Traditional unidimensional item response theory (IRT) equating methods impose the sometimes untenable restriction on data that only a single ability is assessed. This module discusses potential sources of multidimensionality…
Descriptors: Item Response Theory, Models, Equated Scores, Evaluation Methods
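The single-ability restriction the module discusses can be seen in a compensatory MIRT item response function, where a vector of abilities replaces the single theta of unidimensional IRT; this sketch and its numbers are illustrative, not taken from the module:

```python
import math

def mirt_prob(theta, a, d):
    """Compensatory MIRT response probability:
    P(correct) = 1 / (1 + exp(-(a . theta + d))),
    where theta is an ability vector, a the per-dimension
    discriminations, and d the intercept."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical item loading on two abilities (e.g., algebra, reading).
p = mirt_prob(theta=[0.5, -1.0], a=[1.2, 0.4], d=0.3)  # ~0.62
```

When equating tests that tap several such dimensions, collapsing theta to one number can distort the score correspondence, which is the motivation for MIRT equating.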
Jiangang Hao; Alina A. von Davier; Victoria Yaneva; Susan Lottridge; Matthias von Davier; Deborah J. Harris – Educational Measurement: Issues and Practice, 2024
The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely,…
Descriptors: Evaluation Methods, Artificial Intelligence, Educational Change, Computer Software
Bennett, Randy E. – Educational Measurement: Issues and Practice, 2022
This commentary focuses on one of the positive impacts of COVID-19, which was to tie societal inequity to testing in a manner that could motivate the reimagining of our field. That reimagining needs to account for our nation's dramatically changing demographics so that assessment generally, and standardized testing specifically, better fit the…
Descriptors: COVID-19, Pandemics, Social Justice, Testing
Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023
The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…
Descriptors: Item Response Theory, Standard Setting, Testing, Sampling
Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022
Setting cut scores on multistage tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…
Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis
Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025
Automatic item generation can supply many items instantly and efficiently to assessment and learning environments. Yet evaluating item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large language models, specifically Llama 3-8B, for…
Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation
Baldwin, Peter – Educational Measurement: Issues and Practice, 2021
In the Bookmark standard-setting procedure, panelists are instructed to consider what examinees know rather than what they might attain by guessing; however, because examinees sometimes do guess, the procedure includes a correction for guessing. Like other corrections for guessing, the Bookmark's correction assumes that examinees either know the…
Descriptors: Guessing (Tests), Student Evaluation, Evaluation Methods, Standard Setting (Scoring)
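The Bookmark correction for guessing can be illustrated with a 3PL item: the response-probability criterion (e.g., RP67) is applied to the knowledge component of the curve rather than to the raw probability, which includes guessing. A minimal sketch under assumed item parameters (not the article's):

```python
import math

RP = 0.67  # response-probability criterion used to order Bookmark items

def rp_location(a, b, c, rp=RP, correct_for_guessing=True):
    """Theta at which a 3PL item reaches the RP criterion.
    With the guessing correction, rp applies to the non-guessing part
    of the curve, so the overall target is c + (1 - c) * rp.
    Inverts P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    target = c + (1.0 - c) * rp if correct_for_guessing else rp
    return b - (1.0 / a) * math.log((1.0 - c) / (target - c) - 1.0)

loc_corrected = rp_location(a=1.0, b=0.0, c=0.2)
loc_naive = rp_location(a=1.0, b=0.0, c=0.2, correct_for_guessing=False)
```

For an item with nonzero guessing, the corrected location sits higher on the scale than the naive one, because part of the raw 0.67 success rate is attributable to guessing rather than knowledge.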
Tong, Ye – Educational Measurement: Issues and Practice, 2022
COVID-19 is disrupting assessment practices and accelerating changes. With special focus on K-12 and credentialing exams, this article describes the series of changes observed during the pandemic, the solutions assessment providers have implemented, and the long-term impact on future practices. Additionally, this article highlights the importance…
Descriptors: COVID-19, Pandemics, Elementary Secondary Education, Evaluation Methods
Middleton, Kyndra V. – Educational Measurement: Issues and Practice, 2022
The onset of the coronavirus pandemic forced schools and universities across the nation and world to close and move to distance learning virtually overnight. Almost two years later, colleges and universities have reopened, and most students have returned to campuses, but distance learning still occurs at a much higher rate than before the beginning…
Descriptors: Computer Assisted Testing, Internet, Student Evaluation, College Students
Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023
Measurement specialists have demonstrated that the Cross-Classified Mixed Effects Model (CCMEM) is a flexible framework for evaluating reliability: reliability can be estimated from the variance components of the test scores. Building on this work, the present study extends the CCMEM to the evaluation of validity evidence.…
Descriptors: Measurement, Validity, Reliability, Models
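The variance-component logic behind the CCMEM entry can be sketched with a simple generalizability-style coefficient: reliability as the share of score variance due to persons, with error shrinking as scores average over more items. The function and values are illustrative assumptions, not the article's model:

```python
def vc_reliability(var_person, var_residual, n_items):
    """Variance-component reliability for a persons-by-items design:
    person (true-score) variance over person variance plus error
    variance, where averaging over n_items items divides the
    residual variance by n_items."""
    return var_person / (var_person + var_residual / n_items)

# Hypothetical variance components from a fitted mixed-effects model.
rel = vc_reliability(var_person=0.60, var_residual=0.90, n_items=30)  # ~0.95
```

The appeal of the mixed-effects framing is that the same fitted variance components support both reliability estimates and, as the study argues, evidence bearing on validity.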
Baron, Patricia; Sireci, Stephen G.; Slater, Sharon C. – Educational Measurement: Issues and Practice, 2021
Since the No Child Left Behind Act (No Child Left Behind [NCLB], 2001) was enacted, the Bookmark method has been used in many state standard setting studies (Karantonis and Sireci; Zieky, Perie, and Livingston). The purpose of the current study is to evaluate the criticism that when panelists are presented with data during the Bookmark standard…
Descriptors: State Standards, Standard Setting, Evaluators, Training