ERIC - Search Results

Publication Date

In 2025	1
Since 2024	3
Since 2021 (last 5 years)	7
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	16

Descriptor

Evaluation Methods	30
Test Validity	17
Validity	12
Test Construction	8
Achievement Tests	6
Test Use	6
Educational Assessment	5
Program Evaluation	5
Standardized Tests	5
Student Evaluation	5
Test Interpretation	5
Educational Testing	4
Elementary Secondary Education	4
Evaluation Criteria	4
Measurement	4
Scores	4
Test Items	4
Testing Problems	4
Educational Improvement	3
Elementary Education	3
Error of Measurement	3
Evaluation Utilization	3
Formative Evaluation	3
Models	3
National Norms	3
More ▼

Source

Educational Measurement:…

Publication Type

Journal Articles	30
Reports - Research	11
Reports - Descriptive	9
Reports - Evaluative	7
Opinion Papers	4
Information Analyses	2
Speeches/Meeting Papers	2

Education Level

Higher Education	2
Elementary Education	1
Elementary Secondary Education	1
Grade 4	1
Intermediate Grades	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Audience

Location

California	1
Ohio	1

Laws, Policies, & Programs

Assessments and Surveys

ACT Assessment	1
Progress in International…	1
SAT (College Admission Test)	1

What Works Clearinghouse Rating

Showing 1 to 15 of 30 results Save | Export

Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI

Peer reviewed

Direct link

Jiangang Hao; Alina A. von Davier; Victoria Yaneva; Susan Lottridge; Matthias von Davier; Deborah J. Harris – Educational Measurement: Issues and Practice, 2024

The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely,…

Descriptors: Evaluation Methods, Artificial Intelligence, Educational Change, Computer Software

Applying a Mixture Rasch Model-Based Approach to Standard Setting

Peer reviewed

Direct link

Peabody, Michael R.; Muckle, Timothy J.; Meng, Yu – Educational Measurement: Issues and Practice, 2023

The subjective aspect of standard-setting is often criticized, yet data-driven standard-setting methods are rarely applied. Therefore, we applied a mixture Rasch model approach to setting performance standards across several testing programs of various sizes and compared the results to existing passing standards derived from traditional…

Descriptors: Item Response Theory, Standard Setting, Testing, Sampling

Setting and Validating Multiple Standards on a Multistage-Adaptive Test

Peer reviewed

Direct link

Lewis, Jennifer; Lim, Hwanggyu; Padellaro, Frank; Sireci, Stephen G.; Zenisky, April L. – Educational Measurement: Issues and Practice, 2022

Setting cut scores on (MSTs) is difficult, particularly when the test spans several grade levels, and the selection of items from MST panels must reflect the operational test specifications. In this study, we describe, illustrate, and evaluate three methods for mapping panelists' Angoff ratings into cut scores on the scale underlying an MST. The…

Descriptors: Cutting Scores, Adaptive Testing, Test Items, Item Analysis

Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study

Peer reviewed

Direct link

Guher Gorgun; Okan Bulut – Educational Measurement: Issues and Practice, 2025

Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality persists to be a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for…

Descriptors: Artificial Intelligence, Quality Control, Technology Uses in Education, Automation

Validation as Evaluating Desired and Undesired Effects: Insights from Cross-Classified Mixed Effects Model

Peer reviewed

Direct link

Ji, Xuejun Ryan; Wu, Amery D. – Educational Measurement: Issues and Practice, 2023

The Cross-Classified Mixed Effects Model (CCMEM) has been demonstrated to be a flexible framework for evaluating reliability by measurement specialists. Reliability can be estimated based on the variance components of the test scores. Built upon their accomplishment, this study extends the CCMEM to be used for evaluating validity evidence.…

Descriptors: Measurement, Validity, Reliability, Models

Disrupted Data: Using Longitudinal Assessment Systems to Monitor Test Score Quality

Peer reviewed

Direct link

An, Lily Shiao; Ho, Andrew Dean; Davis, Laurie Laughlin – Educational Measurement: Issues and Practice, 2022

Technical documentation for educational tests focuses primarily on properties of individual scores at single points in time. Reliability, standard errors of measurement, item parameter estimates, fit statistics, and linking constants are standard technical features that external stakeholders use to evaluate items and individual scale scores.…

Descriptors: Documentation, Scores, Evaluation Methods, Longitudinal Studies

Reframing Research and Assessment Practices: Advancing an Antiracist and Anti-Ableist Research Agenda

Peer reviewed

Direct link

Angela Johnson; Elizabeth Barker; Marcos Viveros Cespedes – Educational Measurement: Issues and Practice, 2024

Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy…

Descriptors: Equal Education, Inclusion, Evaluation Methods, Error of Measurement

The Accuracy of Aggregate Student Growth Percentiles as Indicators of Educator Performance

Peer reviewed

Direct link

Castellano, Katherine E.; McCaffrey, Daniel F. – Educational Measurement: Issues and Practice, 2017

Mean or median student growth percentiles (MGPs) are a popular measure of educator performance, but they lack rigorous evaluation. This study investigates the error in MGP due to test score measurement error (ME). Using analytic derivations, we find that errors in the commonly used MGP are correlated with average prior latent achievement: Teachers…

Descriptors: Teacher Evaluation, Teacher Effectiveness, Value Added Models, Achievement Gains

Exploring the Utility of Sequential Analysis in Studying Informal Formative Assessment Practices

Peer reviewed

Direct link

Furtak, Erin Marie; Ruiz-Primo, Maria Araceli; Bakeman, Roger – Educational Measurement: Issues and Practice, 2017

Formative assessment is a classroom practice that has received much attention in recent years for its established potential at increasing student learning. A frequent analytic approach for determining the quality of formative assessment practices is to develop a coding scheme and determine frequencies with which the codes are observed; however,…

Descriptors: Sequential Approach, Formative Evaluation, Alternative Assessment, Incidence

A Critical Review of Some Qualitative Research Methods Used to Explore Rater Cognition

Peer reviewed

Direct link

Suto, Irenka – Educational Measurement: Issues and Practice, 2012

Internationally, many assessment systems rely predominantly on human raters to score examinations. Arguably, this facilitates the assessment of multiple sophisticated educational constructs, strengthening assessment validity. It can introduce subjectivity into the scoring process, however, engendering threats to accuracy. The present objectives…

Descriptors: Evaluation Methods, Scoring, Qualitative Research, Protocol Analysis

Reliability as Argument

Peer reviewed

Direct link

Parkes, Jay – Educational Measurement: Issues and Practice, 2007

Reliability consists of both important social and scientific values and methods for evidencing those values, though in practice methods are often conflated with the values. With the two distinctly understood, a reliability argument can be made that articulates the particular reliability values most relevant to the particular measurement situation…

Descriptors: Validity, Reliability, Evaluation Methods, Measurement

Validity Issues in Test Speededness

Peer reviewed

Direct link

Lu, Ying; Sireci, Stephen G. – Educational Measurement: Issues and Practice, 2007

Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe…

Descriptors: Test Items, Timed Tests, Standardized Tests, Test Validity

Validity Evidence of an Electronic Portfolio for Preservice Teachers

Peer reviewed

Direct link

Yao, Yuankun; Thomas, Matt; Nickens, Nicole; Downing, Joyce Anderson; Burkett, Ruth S.; Lamson, Sharon – Educational Measurement: Issues and Practice, 2008

This study applied Messick's unified, multifaceted concept of construct validity to an electronic portfolio system used in a teacher education program. The subjects included 128 preservice teachers who recently completed their final portfolio reviews and student teaching experiences. Four of Messick's six facets of validity were investigated for…

Descriptors: Student Teaching, Portfolios (Background Materials), Preservice Teachers, Preservice Teacher Education

Construct-Irrelevant Variance in High-Stakes Testing

Peer reviewed

Direct link

Haladyna, Thomas M.; Downing, Steven M. – Educational Measurement: Issues and Practice, 2004

There are many threats to validity in high-stakes achievement testing. One major threat is construct-irrelevant variance (CIV). This article defines CIV in the context of the contemporary, unitary view of validity and presents logical arguments, hypotheses, and documentation for a variety of CIV sources that commonly threaten interpretations of…

Descriptors: Student Evaluation, Evaluation Methods, High Stakes Tests, Construct Validity

A Framework for Evaluating and Planning Assessments Intended to Improve Student Achievement

Peer reviewed

Direct link

Nichols, Paul D.; Meyers, Jason L.; Burling, Kelly S. – Educational Measurement: Issues and Practice, 2009

Assessments labeled as formative have been offered as a means to improve student achievement. But labels can be a powerful way to miscommunicate. For an assessment use to be appropriately labeled "formative," both empirical evidence and reasoned arguments must be offered to support the claim that improvements in student achievement can be linked…

Descriptors: Academic Achievement, Tutoring, Student Evaluation, Evaluation Methods

Previous Page | Next Page »

Pages: 1 | 2

Shepard, Lorrie A.	3
Linn, Robert L.	2
Sireci, Stephen G.	2
Abedi, Jamal	1
Alina A. von Davier	1
An, Lily Shiao	1
Angela Johnson	1
Bakeman, Roger	1
Benson, Jeri	1
Burkett, Ruth S.	1
Burling, Kelly S.	1
Castellano, Katherine E.	1
Chester, Mitchell D.	1
Davis, Laurie Laughlin	1
Deborah J. Harris	1
Downing, Joyce Anderson	1
Downing, Steven M.	1
Elizabeth Barker	1
Furtak, Erin Marie	1
Gorin, Joanna S.	1
Guher Gorgun	1
Haladyna, Thomas M.	1
Herman, Joan L.	1
Ho, Andrew Dean	1
More ▼