Publication Date
In 2025: 4
Since 2024: 4
Since 2021 (last 5 years): 7
Since 2016 (last 10 years): 10
Since 2006 (last 20 years): 14
Descriptor
Evaluation Methods: 26
Test Validity: 17
Educational Assessment: 6
Scores: 6
Test Reliability: 5
Validity: 5
Elementary Secondary Education: 4
Error of Measurement: 4
Item Response Theory: 4
Models: 4
Student Evaluation: 4
Source
Journal of Educational Measurement: 26
Author
Amery D. Wu: 1
Anderson, Ronald E.: 1
Baldwin, Peter: 1
Binici, Salih: 1
Cantor, Nancy K.: 1
Carl Westine: 1
Cizek, Gregory J.: 1
Clauser, Brian E.: 1
Cuhadar, Ismail: 1
Dorsey, David W.: 1
Dwyer, Andrew C.: 1
Publication Type
Journal Articles: 22
Reports - Research: 15
Reports - Evaluative: 4
Reports - Descriptive: 2
Information Analyses: 1
Opinion Papers: 1
Reports - General: 1
Education Level
Higher Education: 2
Postsecondary Education: 2
Location
Israel: 1
Assessments and Surveys
National Assessment of…: 2
Sequential Tests of…: 1
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
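The Binici and Cuhadar abstract names latent class analysis as the classification machinery but, being an abstract, gives no procedure. The sketch below is a generic two-class latent class model for binary item responses fit by EM, with each examinee assigned to the class with the larger posterior probability; it illustrates the general technique under stated assumptions and is not the authors' standard-setting method. Every name and parameter value in it is hypothetical.

```python
import numpy as np

def fit_lca(X, n_classes=2, n_iter=200, seed=0):
    """Fit a latent class model to binary responses X (n x j) via EM.

    Returns class proportions, per-class item success probabilities,
    and posterior class memberships for each examinee.
    """
    rng = np.random.default_rng(seed)
    n, j = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)        # class proportions
    p = rng.uniform(0.3, 0.7, size=(n_classes, j))  # P(correct | class)

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each examinee
        log_post = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class proportions and item probabilities
        pi = post.mean(axis=0)
        p = np.clip((post.T @ X) / post.sum(axis=0)[:, None], 1e-4, 1 - 1e-4)

    return pi, p, post

# Hypothetical data: 500 examinees, 20 dichotomous items, two true groups.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=500)
X = rng.binomial(1, np.where(group[:, None] == 1, 0.8, 0.4), size=(500, 20))

pi, p, post = fit_lca(X)
labels = post.argmax(axis=1)   # candidate "meets standard" classification
```

In a standard-setting context, the class with the higher expected score would play the role of the proficient group, and the boundary between posterior memberships would inform where a cut score is placed.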
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
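The He (2023) entry builds on the change point analysis approach of Yu and Cheng (2019); the specific CPA-WR weighting is not reproduced here. As a rough illustration of the underlying idea only, the sketch below scans every candidate split point in a response string and reports the largest standardized gap between mean item residuals before and after the split under known 2PL parameters; a large value is consistent with a behavioral shift such as back random responding. All parameter values are simulated and hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def change_point_stat(x, theta, a, b):
    """Largest standardized gap between mean residuals before and after
    each candidate split point; returns the statistic and the split."""
    p = p_2pl(theta, a, b)
    resid = x - p            # item-level residuals
    var = p * (1 - p)        # Bernoulli variances
    n = len(x)
    best, best_k = 0.0, None
    for k in range(1, n):    # split after item k
        gap = resid[:k].mean() - resid[k:].mean()
        se = np.sqrt(var[:k].sum() / k**2 + var[k:].sum() / (n - k)**2)
        if abs(gap) / se > best:
            best, best_k = abs(gap) / se, k
    return best, best_k

# Hypothetical examinee who switches to random responding on the last 10
# of 40 items; item parameters are simulated, not from any real test.
rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 40), rng.normal(0.0, 1.0, 40)
theta = 1.0
x = rng.binomial(1, p_2pl(theta, a, b))
x[30:] = rng.integers(0, 2, size=10)
stat, split = change_point_stat(x, theta, a, b)
```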
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technological advancement. Artificial Intelligence (AI)-enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Cizek, Gregory J.; Kosh, Audra E.; Toutkoushian, Emily K. – Journal of Educational Measurement, 2018
Alignment is an essential piece of validity evidence for both educational (K-12) and credentialing (licensure and certification) assessments. In this article, a comprehensive review of commonly used contemporary alignment procedures is provided; some key weaknesses in current alignment approaches are identified; principles for evaluating alignment…
Descriptors: Test Validity, Evidence, Evaluation Methods, Alignment (Education)
Clauser, Brian E.; Baldwin, Peter; Margolis, Melissa J.; Mee, Janet; Winward, Marcia – Journal of Educational Measurement, 2017
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria--especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising…
Descriptors: Evaluation Methods, Standard Setting, Cutting Scores, Expertise
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
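Tendeiro and Meijer review a family of relatively simple nonparametric person-fit statistics. One of the simplest, the count of Guttman errors, is sketched below purely as an illustration of the kind of index reviewed, not as the recommended statistic. A Guttman error is a pair of items in which an examinee answers the harder item correctly but the easier item incorrectly, with difficulty proxied by the sample proportion correct. The screening rule at the end is hypothetical.

```python
import numpy as np

def guttman_errors(X):
    """Count Guttman errors per examinee for binary responses X (n x j)."""
    p = X.mean(axis=0)            # proportion correct = easiness proxy
    Xs = X[:, np.argsort(-p)]     # reorder columns from easiest to hardest
    n, j = Xs.shape
    errors = np.zeros(n, dtype=int)
    for r in range(n):
        missed_easier = 0
        for item in range(j):
            if Xs[r, item] == 1:
                # correct on a harder item after missing easier ones:
                # each (easier-missed, harder-correct) pair is one error
                errors[r] += missed_easier
            else:
                missed_easier += 1
    return errors

# Hypothetical screening rule: flag examinees in the top 5% of error counts.
rng = np.random.default_rng(0)
X = rng.binomial(1, np.linspace(0.9, 0.3, 25), size=(200, 25))
e = guttman_errors(X)
flagged = np.where(e > np.percentile(e, 95))[0]
```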
de la Torre, Jimmy – Journal of Educational Measurement, 2008
Most model fit analyses in cognitive diagnosis assume that a Q matrix is correct after it has been constructed, without verifying its appropriateness. Consequently, any model misfit attributable to the Q matrix cannot be addressed and remedied. To address this concern, this paper proposes an empirically based method of validating a Q matrix used…
Descriptors: Matrices, Validity, Models, Evaluation Methods
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
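Pommerich's entry concerns IRT-based estimates of domain scores. One common formulation, shown below only as a generic illustration rather than as any of the specific estimators compared in the article, is the expected proportion correct over the domain's item pool evaluated at an examinee's estimated ability; group-level domain scores average this quantity over examinees. The item parameters here are simulated and hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def domain_score(theta_hat, a, b, c):
    """Expected proportion correct over the domain's item pool at theta_hat."""
    return float(p_3pl(theta_hat, a, b, c).mean())

# Hypothetical 100-item domain pool and an examinee with estimated theta = 0.5.
rng = np.random.default_rng(0)
a = rng.uniform(0.7, 2.0, 100)
b = rng.normal(0.0, 1.0, 100)
c = rng.uniform(0.1, 0.25, 100)
print(domain_score(0.5, a, b, c))
```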

Frisbie, David A.; Cantor, Nancy K. – Journal of Educational Measurement, 1995
Studied the validity of alternative methods for assessing the spelling achievement of students in grades 2 through 7. Results from 760 third graders, 721 fifth graders, and 639 seventh graders indicate that no single objective format stood out above the others, although some demonstrated superiority to the dictation format on several dimensions.…
Descriptors: Dictation, Educational Assessment, Elementary Education, Elementary School Students

Stufflebeam, Daniel L. – Journal of Educational Measurement, 1971
Descriptors: Data Analysis, Educational Experiments, Evaluation Methods, Individual Differences