Showing 1 to 15 of 17 results
Peer reviewed
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
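The abstract above is truncated before the proposed method is named, so the following is a generic illustration only: a toy example of the rater-error problem the study addresses, in which a simple severity adjustment (subtracting each rater's mean deviation) is applied before any equating step. The raters, scores, and correction are all hypothetical, not the authors' method.

```python
import numpy as np

# Toy illustration of rater error in a rater-mediated assessment: two raters
# score the same examinees, and a simple severity adjustment (each rater's
# mean deviation from the grand mean) is removed before equating. This is a
# generic correction for illustration, not the authors' proposed method.

scores = {                              # examinee -> {rater: score}
    "A": {"r1": 4, "r2": 3},
    "B": {"r1": 5, "r2": 4},
    "C": {"r1": 3, "r2": 2},
}

grand_mean = np.mean([s for e in scores.values() for s in e.values()])
severity = {
    r: np.mean([scores[e][r] for e in scores]) - grand_mean
    for r in ["r1", "r2"]
}                                        # positive = lenient, negative = severe

adjusted = {
    e: {r: s - severity[r] for r, s in by_rater.items()}
    for e, by_rater in scores.items()
}
print(severity)   # r1 is ~0.5 points more lenient than r2 on these data
print(adjusted)
```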
Peer reviewed
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
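A minimal sketch of the change point idea behind CPA, assuming a Rasch model with known ability and item difficulties. The contrast of standardized residuals below is a simplification for illustration, not the exact CPA-WR statistic of Yu and Cheng (2019).

```python
import numpy as np

# Minimal sketch of change point analysis (CPA) for detecting back random
# responding (BRR). Assumes a Rasch model with known ability and item
# difficulties; the statistic is a simplified stand-in for CPA-WR.

def rasch_p(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def cpa_statistic(responses, theta, b):
    """For each candidate change point k, contrast standardized residuals
    before and after k; a large |value| suggests a behavior change."""
    p = rasch_p(theta, b)
    resid = (responses - p) / np.sqrt(p * (1 - p))   # standardized residuals
    n = len(responses)
    return np.array([resid[:k].mean() - resid[k:].mean() for k in range(1, n)])

rng = np.random.default_rng(0)
b = rng.normal(size=40)                              # item difficulties
theta = 1.0
x = (rng.random(40) < rasch_p(theta, b)).astype(float)
x[25:] = rng.integers(0, 2, size=15)                 # random responding at the end
stat = cpa_statistic(x, theta, b)
print("suspected change point:", np.argmax(np.abs(stat)) + 1)
```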
Peer reviewed
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
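A sketch of the embedding-plus-regressor pattern such a multilingual AES system can follow: encode essays with a language-agnostic sentence embedding model, then fit a scorer on human ratings. The LaBSE checkpoint name is a real published model; the Ridge regressor and toy essays are illustrative assumptions, not the authors' architecture.

```python
# Sketch of a multilingual AES pipeline: embed essays with LaBSE, then fit a
# regressor on human scores. Ridge regression and the toy data are
# illustrative choices, not the study's actual design.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # language-agnostic BERT

train_essays = ["Ein kurzer Aufsatz ...", "Un breve saggio ...", "Krátká esej ..."]
train_scores = [3.0, 4.0, 2.0]          # human ratings (toy data)

X = encoder.encode(train_essays)        # one fixed-length vector per essay
scorer = Ridge(alpha=1.0).fit(X, train_scores)

new = encoder.encode(["Další esej k ohodnocení ..."])
print(scorer.predict(new))              # predicted essay score
```

Because LaBSE maps essays from different languages into one shared vector space, a single downstream scorer can in principle grade German, Italian, and Czech essays without per-language retraining.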
Peer reviewed
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Peer reviewed
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
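A minimal sketch of a multilabel neural network scorer under assumed dimensions: one sigmoid output per trait, trained with binary cross-entropy. The hidden layer size and training data are illustrative, not the authors' design.

```python
# Minimal sketch of a multilabel neural network (MNN) scorer: one sigmoid
# output per trait, trained with binary cross-entropy. Dimensions, the single
# hidden layer, and the random data are illustrative assumptions.
import torch
import torch.nn as nn

n_items, n_traits = 60, 18              # high-dimensional: > 15 traits

model = nn.Sequential(
    nn.Linear(n_items, 64),
    nn.ReLU(),
    nn.Linear(64, n_traits),            # one logit per trait
)
loss_fn = nn.BCEWithLogitsLoss()        # multilabel objective
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randint(0, 2, (500, n_items)).float()   # toy item responses
Y = torch.randint(0, 2, (500, n_traits)).float()  # toy trait labels

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()

# Moving the 0.5 cutoff on the sigmoid outputs trades precision against
# recall, which is one way the same trained network can be tuned to favor
# different performance metrics.
probs = torch.sigmoid(model(X[:1]))
print((probs > 0.5).int())
```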
Peer reviewed
Cizek, Gregory J.; Kosh, Audra E.; Toutkoushian, Emily K. – Journal of Educational Measurement, 2018
Alignment is an essential piece of validity evidence for both educational (K-12) and credentialing (licensure and certification) assessments. In this article, a comprehensive review of commonly used contemporary alignment procedures is provided; some key weaknesses in current alignment approaches are identified; principles for evaluating alignment…
Descriptors: Test Validity, Evidence, Evaluation Methods, Alignment (Education)
Peer reviewed
Clauser, Brian E.; Baldwin, Peter; Margolis, Melissa J.; Mee, Janet; Winward, Marcia – Journal of Educational Measurement, 2017
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria--especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising…
Descriptors: Evaluation Methods, Standard Setting, Cutting Scores, Expertise
Peer reviewed
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
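A sketch of what "rescaling the standard" can look like in the simplest linear case: a common-item mean-sigma transformation applied to the cut score itself rather than re-running standard setting. The mean-sigma choice and the item statistics are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

# Sketch of "rescaling the standard": apply a common-item linear
# transformation to the cut score itself instead of resetting it. The
# mean-sigma transformation here is a generic illustrative choice.

def rescale_cut(old_cut, common_old, common_new):
    a = np.std(common_new, ddof=1) / np.std(common_old, ddof=1)   # slope
    b = np.mean(common_new) - a * np.mean(common_old)             # intercept
    return a * old_cut + b

common_old = np.array([0.55, 0.62, 0.48, 0.70, 0.58])  # common-item p-values, old form
common_new = np.array([0.50, 0.57, 0.45, 0.66, 0.52])  # same items, new form
print(rescale_cut(old_cut=0.60, common_old=common_old, common_new=common_new))
```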
Peer reviewed
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
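One of the classic nonparametric statistics covered by this literature is the Guttman error count G, the number of item pairs where an easier item is answered wrong while a harder item is answered right. A minimal sketch, using proportion-correct as the difficulty ordering:

```python
import numpy as np

# Simple nonparametric person-fit statistic: the number of Guttman errors G.
# Large G flags an atypical response pattern worth a validity check.

def guttman_errors(response, p_correct):
    """Count pairs (i, j) with item i easier than item j where the
    examinee missed i but answered j correctly."""
    order = np.argsort(-np.asarray(p_correct))   # easiest item first
    x = np.asarray(response)[order]
    g = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == 0 and x[j] == 1:          # easy wrong, hard right
                g += 1
    return g

p = [0.9, 0.8, 0.6, 0.4, 0.2]                    # item proportion-correct
print(guttman_errors([1, 1, 1, 0, 0], p))        # 0: perfectly Guttman-consistent
print(guttman_errors([0, 0, 1, 1, 1], p))        # 6: highly atypical pattern
```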
Peer reviewed
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
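A minimal sketch of an IRT-based domain score estimate of the kind the simulation studies evaluate: the expected proportion correct over all items in the domain, evaluated at an ability estimate. A 2PL model with made-up parameters is assumed.

```python
import numpy as np

# Sketch of an IRT-based domain score: expected proportion correct over the
# item domain at ability theta. The 2PL model and parameters are assumptions.

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def domain_score(theta, a, b):
    """Expected proportion correct on the item domain at ability theta."""
    return p_2pl(theta, a, b).mean()

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, size=100)      # discriminations, 100-item domain
b = rng.normal(size=100)                 # difficulties
print(domain_score(theta=0.5, a=a, b=b)) # individual-level estimate

# A group-level estimate averages over the examinees' ability estimates.
thetas = rng.normal(size=250)
print(np.mean([domain_score(t, a, b) for t in thetas]))
```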
Peer reviewed
Nevo, Baruch – Journal of Educational Measurement, 1985
A literature review and a proposed means of measuring face validity, a test's appearance of being valid, are presented. Empirical evidence from examinees' perceptions of a college entrance examination support the reliability of measuring face validity. (GDC)
Descriptors: College Entrance Examinations, Evaluation Methods, Evaluators, Foreign Countries
Peer reviewed
Schmidt, William H. – Journal of Educational Measurement, 1983
A conception of invalidity as bias is related to content validity for standardized achievement tests. A method of estimating content bias for each of three content domains (a priori, curricular, and instructional) based on the specification of a content taxonomy is also proposed. (Author/CM)
Descriptors: Achievement Tests, Content Analysis, Evaluation Methods, Instruction
Peer reviewed
Anderson, Ronald E.; And Others – Journal of Educational Measurement, 1982
Findings on alternative procedures for evaluating measures of achievement in individual data packages at the National Assessment of Educational Progress are presented with their methodological implications. The need for secondary analysts to be aware of the organization of the data, and positive and negative features are discussed. (Author/CM)
Descriptors: Achievement, Databases, Educational Assessment, Elementary Secondary Education
Peer reviewed
Emrick, John A. – Journal of Educational Measurement, 1971
Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, Item Analysis
Peer reviewed
Moss, Pamela A.; And Others – Journal of Educational Measurement, 1982
Scores on a multiple-choice language test involving recognition of language errors were related to those on writing samples, scored atomistically for the same language errors and holistically for communicative effectiveness and correctness. Results suggest the need for clear limits in generalizing from one assessment to others. (Author/GK)
Descriptors: Comparative Analysis, Elementary Secondary Education, Evaluation Methods, Grade 10