ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	13
Since 2016 (last 10 years)	25
Since 2006 (last 20 years)	37

Descriptor

Evaluation Methods	72
Test Construction	22
Test Use	19
Educational Assessment	18
Student Evaluation	17
Test Validity	17
Elementary Secondary Education	15
Test Interpretation	11
Test Items	11
Models	10
Achievement Tests	9
Measurement	9
Measurement Techniques	9
Scores	9
Standardized Tests	9
Educational Testing	8
Testing Problems	8
Test Bias	7
Validity	7
Computer Assisted Testing	6
Cutting Scores	6
Evaluation Criteria	6
Evaluation Utilization	6
Item Response Theory	6
Program Evaluation	6

Source

Educational Measurement:…

Publication Type

Journal Articles	72
Reports - Evaluative	26
Reports - Descriptive	19
Reports - Research	16
Opinion Papers	10
Speeches/Meeting Papers	5
Information Analyses	4
Tests/Questionnaires	3
Book/Product Reviews	1
Guides - Non-Classroom	1
Historical Materials	1

Education Level

Elementary Secondary Education	4
Adult Education	1
Elementary Education	1
Grade 4	1
Higher Education	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

California	2
Hungary	1
Nebraska	1
Ohio	1

Assessments and Surveys

ACT Assessment	1
Progress in International…	1
SAT (College Admission Test)	1

Showing 1 to 15 of 72 results

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet the evaluation of item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Evolving Educational Testing to Meet Students' Needs: Design-in-Real-Time Assessment

Peer reviewed

Stephen G. Sireci; Javier Suárez-Álvarez; April L. Zenisky; Maria Elena Oliveri – Educational Measurement: Issues and Practice, 2024

The goal in personalized assessment is to best fit the needs of each individual test taker, given the assessment purposes. Design-in-Real-Time (DIRTy) assessment reflects the progressive evolution in testing from a single test, to an adaptive test, to an adaptive assessment "system." In this article, we lay the foundation for DIRTy…

Descriptors: Educational Assessment, Student Needs, Test Format, Test Construction

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

Reframing Research and Assessment Practices: Advancing an Antiracist and Anti-Ableist Research Agenda

Peer reviewed

Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024

Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…

Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement

Validation as Evaluating Desired and Undesired Effects: Insights from Cross-Classified Mixed Effects Model

Peer reviewed

Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023

The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…

Descriptors: Measurement, Validity, Reliability, Models

Improving Instructional Decision-Making Using Diagnostic Classification Models

Peer reviewed

W. Jake Thompson; Amy K. Clark – Educational Measurement: Issues and Practice, 2024

In recent years, educators, administrators, policymakers, and measurement experts have called for assessments that support educators in making better instructional decisions. One promising approach to measurement to support instructional decision-making is diagnostic classification models (DCMs). DCMs are flexible psychometric models that…

Descriptors: Decision Making, Instructional Improvement, Evaluation Methods, Models

Personalizing Large-Scale Assessment in Practice

Peer reviewed

Buzick, Heather M.; Casabianca, Jodi M.; Gholson, Melissa L. – Educational Measurement: Issues and Practice, 2023

The article describes practical suggestions for measurement researchers and psychometricians to respond to calls for social responsibility in assessment. The underlying assumption is that personalizing large-scale assessment improves the chances that assessment and the use of test scores will contribute to equity in education. This article…

Descriptors: Achievement Tests, Individualized Instruction, Evaluation Methods, Equal Education

Exploring the Impact of Rater Effects on Person Fit in Rater-Mediated Assessments

Peer reviewed

Wind, Stefanie A. – Educational Measurement: Issues and Practice, 2020

Researchers have documented the impact of rater effects, or raters' tendencies to give different ratings than would be expected given examinee achievement levels, in performance assessments. However, the degree to which rater effects influence person fit, or the reasonableness of test-takers' achievement estimates given their response patterns,…

Descriptors: Performance Based Assessment, Evaluators, Achievement, Influences

The Good Side of COVID-19

Peer reviewed

Bennett, Randy E. – Educational Measurement: Issues and Practice, 2022

This commentary focuses on one of the positive impacts of COVID-19, which was to tie societal inequity to testing in a manner that could motivate the reimagining of our field. That reimagining needs to account for our nation's dramatically changing demographics so that assessment generally, and standardized testing specifically, better fit the…

Descriptors: COVID-19, Pandemics, Social Justice, Testing

Embedded Standard Setting: Aligning Standard-Setting Methodology with Contemporary Assessment Design Principles

Peer reviewed

Lewis, Daniel; Cook, Robert – Educational Measurement: Issues and Practice, 2020

In this paper we assert that the practice of principled assessment design renders traditional standard-setting methodology redundant at best and contradictory at worst. We describe the rationale for, and methodological details of, Embedded Standard Setting (ESS; previously, Engineered Cut Scores. Lewis, 2016), an approach to establish performance…

Descriptors: Standard Setting, Evaluation, Cutting Scores, Performance Based Assessment

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on multistage-adaptive tests (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

A Problem with the Bookmark Procedure's Correction for Guessing

Peer reviewed

Baldwin, Peter – Educational Measurement: Issues and Practice, 2021

In the Bookmark standard-setting procedure, panelists are instructed to consider what examinees know rather than what they might attain by guessing; however, because examinees sometimes do guess, the procedure includes a correction for guessing. Like other corrections for guessing, the Bookmark's correction assumes that examinees either know the…

Descriptors: Guessing (Tests), Student Evaluation, Evaluation Methods, Standard Setting (Scoring)

NCME Presidential Address 2021: Assessment Research and Practice in the Post-COVID-19 Era

Peer reviewed

Tong, Ye – Educational Measurement: Issues and Practice, 2022

COVID-19 is disrupting assessment practices and accelerating changes. With special focus on K-12 and credentialing exams, this article describes the series of changes observed during the pandemic, the solutions assessment providers have implemented, and the long-term impact on future practices. Additionally, this article highlights the importance…

Descriptors: COVID-19, Pandemics, Elementary Secondary Education, Evaluation Methods

It's Not Just Angoff: Misperceptions of Hard and Easy Items in Bookmark-Type Ratings

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Educational Measurement: Issues and Practice, 2020

A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark…

Descriptors: Standard Setting (Scoring), Evaluation Methods, Testing Problems, Test Items

Author

Linn, Robert L.	3
Shepard, Lorrie A.	3
Wind, Stefanie A.	3
Nichols, Paul D.	2
Nitko, Anthony J.	2
Reckase, Mark D.	2
Sireci, Stephen G.	2
Wyse, Adam E.	2
Airasian, Peter W.	1
Amy K. Clark	1
An, Lily Shiao	1
Angela Johnson	1
April L. Zenisky	1
Aray, Henry	1
Babcock, Ben	1
Bakeman, Roger	1
Baldwin, Peter	1
Bennett, Randy E.	1
Benson, Jeri	1
Bolt, Daniel M.	1
Burling, Kelly S.	1
Buzick, Heather M.	1
Casabianca, Jodi M.	1
Castellano, Katherine E.	1