Publication Date
In 2025: 4
Since 2024: 4
Since 2021 (last 5 years): 7
Since 2016 (last 10 years): 10
Since 2006 (last 20 years): 14
Descriptor
Evaluation Methods: 26
Test Validity: 17
Educational Assessment: 6
Scores: 6
Test Reliability: 5
Validity: 5
Elementary Secondary Education: 4
Error of Measurement: 4
Item Response Theory: 4
Models: 4
Student Evaluation: 4
Source
Journal of Educational Measurement: 26
Author
Amery D. Wu: 1
Anderson, Ronald E.: 1
Baldwin, Peter: 1
Binici, Salih: 1
Cantor, Nancy K.: 1
Carl Westine: 1
Cizek, Gregory J.: 1
Clauser, Brian E.: 1
Cuhadar, Ismail: 1
Dorsey, David W.: 1
Dwyer, Andrew C.: 1
Publication Type
Journal Articles: 22
Reports - Research: 15
Reports - Evaluative: 4
Reports - Descriptive: 2
Information Analyses: 1
Opinion Papers: 1
Reports - General: 1
Education Level
Higher Education: 2
Postsecondary Education: 2
Location
Israel: 1
Assessments and Surveys
National Assessment of…: 2
Sequential Tests of…: 1
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
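The Binici and Cuhadar abstract names latent class analysis as the classification machinery but, being an abstract, gives no procedure. The sketch below is a generic two-class latent class model for binary item responses fit by EM, with each examinee assigned to the class with the larger posterior probability; it illustrates the general technique under stated assumptions and is not the authors' standard-setting method. Every name and parameter value in it is hypothetical.

```python
import numpy as np

def fit_lca(X, n_classes=2, n_iter=200, seed=0):
    """Fit a latent class model to binary responses X (n x j) via EM.

    Returns class proportions, per-class item success probabilities,
    and posterior class memberships for each examinee.
    """
    rng = np.random.default_rng(seed)
    n, j = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)        # class proportions
    p = rng.uniform(0.3, 0.7, size=(n_classes, j))  # P(correct | class)

    for _ in range(n_iter):
        # E-step: posterior probability of each class for each examinee
        log_post = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class proportions and item probabilities
        pi = post.mean(axis=0)
        p = np.clip((post.T @ X) / post.sum(axis=0)[:, None], 1e-4, 1 - 1e-4)

    return pi, p, post

# Hypothetical data: 500 examinees, 20 dichotomous items, two true groups.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=500)
X = rng.binomial(1, np.where(group[:, None] == 1, 0.8, 0.4), size=(500, 20))

pi, p, post = fit_lca(X)
labels = post.argmax(axis=1)   # candidate "meets standard" classification
```

In a standard-setting context, the class with the higher expected score would play the role of the proficient group, and the boundary between posterior memberships would inform where a cut score is placed.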
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
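The He (2023) entry builds on the change point analysis approach of Yu and Cheng (2019); the specific CPA-WR weighting is not reproduced here. As a rough illustration of the underlying idea only, the sketch below scans every candidate split point in a response string and reports the largest standardized gap between mean item residuals before and after the split under known 2PL parameters; a large value is consistent with a behavioral shift such as back random responding. All parameter values are simulated and hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def change_point_stat(x, theta, a, b):
    """Largest standardized gap between mean residuals before and after
    each candidate split point; returns the statistic and the split."""
    p = p_2pl(theta, a, b)
    resid = x - p            # item-level residuals
    var = p * (1 - p)        # Bernoulli variances
    n = len(x)
    best, best_k = 0.0, None
    for k in range(1, n):    # split after item k
        gap = resid[:k].mean() - resid[k:].mean()
        se = np.sqrt(var[:k].sum() / k**2 + var[k:].sum() / (n - k)**2)
        if abs(gap) / se > best:
            best, best_k = abs(gap) / se, k
    return best, best_k

# Hypothetical examinee who switches to random responding on the last 10
# of 40 items; item parameters are simulated, not from any real test.
rng = np.random.default_rng(0)
a, b = rng.uniform(0.8, 2.0, 40), rng.normal(0.0, 1.0, 40)
theta = 1.0
x = rng.binomial(1, p_2pl(theta, a, b))
x[30:] = rng.integers(0, 2, size=10)
stat, split = change_point_stat(x, theta, a, b)
```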
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Dorsey, David W.; Michaels, Hillary R. – Journal of Educational Measurement, 2022
We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technological advancement. Artificial Intelligence (AI)-enabled assessments represent one such area of advancement--one that has captured our collective interest and imagination. Scientists and practitioners within the domains…
Descriptors: Validity, Ethics, Artificial Intelligence, Evaluation Methods
Cizek, Gregory J.; Kosh, Audra E.; Toutkoushian, Emily K. – Journal of Educational Measurement, 2018
Alignment is an essential piece of validity evidence for both educational (K-12) and credentialing (licensure and certification) assessments. In this article, a comprehensive review of commonly used contemporary alignment procedures is provided; some key weaknesses in current alignment approaches are identified; principles for evaluating alignment…
Descriptors: Test Validity, Evidence, Evaluation Methods, Alignment (Education)
Clauser, Brian E.; Baldwin, Peter; Margolis, Melissa J.; Mee, Janet; Winward, Marcia – Journal of Educational Measurement, 2017
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria--especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising…
Descriptors: Evaluation Methods, Standard Setting, Cutting Scores, Expertise
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
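Tendeiro and Meijer review a family of relatively simple nonparametric person-fit statistics. One of the simplest, the count of Guttman errors, is sketched below purely as an illustration of the kind of index reviewed, not as the recommended statistic. A Guttman error is a pair of items in which an examinee answers the harder item correctly but the easier item incorrectly, with difficulty proxied by the sample proportion correct. The screening rule at the end is hypothetical.

```python
import numpy as np

def guttman_errors(X):
    """Count Guttman errors per examinee for binary responses X (n x j)."""
    p = X.mean(axis=0)            # proportion correct = easiness proxy
    Xs = X[:, np.argsort(-p)]     # reorder columns from easiest to hardest
    n, j = Xs.shape
    errors = np.zeros(n, dtype=int)
    for r in range(n):
        missed_easier = 0
        for item in range(j):
            if Xs[r, item] == 1:
                # correct on a harder item after missing easier ones:
                # each (easier-missed, harder-correct) pair is one error
                errors[r] += missed_easier
            else:
                missed_easier += 1
    return errors

# Hypothetical screening rule: flag examinees in the top 5% of error counts.
rng = np.random.default_rng(0)
X = rng.binomial(1, np.linspace(0.9, 0.3, 25), size=(200, 25))
e = guttman_errors(X)
flagged = np.where(e > np.percentile(e, 95))[0]
```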
de la Torre, Jimmy – Journal of Educational Measurement, 2008
Most model fit analyses in cognitive diagnosis assume that a Q matrix is correct after it has been constructed, without verifying its appropriateness. Consequently, any model misfit attributable to the Q matrix cannot be addressed and remedied. To address this concern, this paper proposes an empirically based method of validating a Q matrix used…
Descriptors: Matrices, Validity, Models, Evaluation Methods
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
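Pommerich's entry concerns IRT-based estimates of domain scores. One common formulation, shown below only as a generic illustration rather than as any of the specific estimators compared in the article, is the expected proportion correct over the domain's item pool evaluated at an examinee's estimated ability; group-level domain scores average this quantity over examinees. The item parameters here are simulated and hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def domain_score(theta_hat, a, b, c):
    """Expected proportion correct over the domain's item pool at theta_hat."""
    return float(p_3pl(theta_hat, a, b, c).mean())

# Hypothetical 100-item domain pool and an examinee with estimated theta = 0.5.
rng = np.random.default_rng(0)
a = rng.uniform(0.7, 2.0, 100)
b = rng.normal(0.0, 1.0, 100)
c = rng.uniform(0.1, 0.25, 100)
print(domain_score(0.5, a, b, c))
```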

Frisbie, David A.; Cantor, Nancy K. – Journal of Educational Measurement, 1995
Studied the validity of alternative methods for assessing the spelling achievement of students in grades 2 through 7. Results from 760 third graders, 721 fifth graders, and 639 seventh graders indicate that no single objective format stood out above the others, although some demonstrated superiority to the dictation format on several dimensions.…
Descriptors: Dictation, Educational Assessment, Elementary Education, Elementary School Students

Stufflebeam, Daniel L. – Journal of Educational Measurement, 1971
Descriptors: Data Analysis, Educational Experiments, Evaluation Methods, Individual Differences