Showing 1 to 15 of 165 results
Peer reviewed
Direct link
Guo, Jinxin; Xu, Xin; Xin, Tao – Journal of Educational Measurement, 2023
Missingness due to not-reached and omitted items has received much attention in the recent psychometric literature. If not handled properly, such missingness can lead to biased parameter estimation and inaccurate inferences about examinees, further eroding the validity of the test. This paper reviews some commonly used IRT-based…
Descriptors: Psychometrics, Bias, Error of Measurement, Test Validity
Peer reviewed
Direct link
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
Direct link
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) is one of the most commonly observed careless response behaviors. Accurately detecting BRR can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
Peer reviewed
Direct link
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Peer reviewed
Direct link
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although extensive research exists on subscores and their properties, little has examined categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Peer reviewed
Direct link
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed
Direct link
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes, yet validity research has focused mainly on summative assessment. This article aims to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Peer reviewed
Direct link
Cizek, Gregory J.; Kosh, Audra E.; Toutkoushian, Emily K. – Journal of Educational Measurement, 2018
Alignment is an essential piece of validity evidence for both educational (K-12) and credentialing (licensure and certification) assessments. In this article, a comprehensive review of commonly used contemporary alignment procedures is provided; some key weaknesses in current alignment approaches are identified; principles for evaluating alignment…
Descriptors: Test Validity, Evidence, Evaluation Methods, Alignment (Education)
Peer reviewed
Direct link
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021
Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…
Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes
Peer reviewed
Direct link
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory
Peer reviewed
Direct link
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Peer reviewed
Direct link
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Peer reviewed
Direct link
Clauser, Brian E.; Baldwin, Peter; Margolis, Melissa J.; Mee, Janet; Winward, Marcia – Journal of Educational Measurement, 2017
Validating performance standards is challenging and complex. Because of the difficulties associated with collecting evidence related to external criteria, validity arguments rely heavily on evidence related to internal criteria--especially evidence that expert judgments are internally consistent. Given its importance, it is somewhat surprising…
Descriptors: Evaluation Methods, Standard Setting, Cutting Scores, Expertise
Peer reviewed
Direct link
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)
Peer reviewed
Direct link
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction