Publication Date
  In 2025: 4
  Since 2024: 8
  Since 2021 (last 5 years): 19
  Since 2016 (last 10 years): 35
  Since 2006 (last 20 years): 57

Descriptor
  Test Validity: 165
  Test Reliability: 68
  Test Construction: 52
  Validity: 52
  Higher Education: 36
  Test Items: 35
  Predictive Validity: 33
  Scores: 33
  Item Analysis: 31
  Test Interpretation: 30
  Test Bias: 29
Source
  Journal of Educational Measurement: 252
Education Level
  Higher Education: 6
  Postsecondary Education: 6
  Secondary Education: 4
  Middle Schools: 3
  Elementary Education: 2
  Elementary Secondary Education: 2
  Junior High Schools: 2
  Grade 7: 1
  Grade 8: 1
  High Schools: 1
Audience
  Researchers: 7
  Practitioners: 2
Daria Gerasimova – Journal of Educational Measurement, 2024
I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document, which includes an up-to-date summary of the validity argument. As the validation process may span multiple studies, the living document…
Descriptors: Validity, Documentation, Methods, Research Reports
Guo, Jinxin; Xu, Xin; Xin, Tao – Journal of Educational Measurement, 2023
Missingness due to not-reached and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, can lead to biased parameter estimation and inaccurate inferences about examinees, further eroding the validity of the test. This paper reviews some commonly used IRT-based…
Descriptors: Psychometrics, Bias, Error of Measurement, Test Validity
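A minimal sketch of the issue the review addresses, assuming a 2PL model with hypothetical item parameters (illustrative only, not the paper's method): scoring not-reached items as incorrect versus dropping them from the likelihood gives visibly different ability estimates.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def mle_theta(resp, a, b):
    """Grid-search MLE of ability; NaN responses are skipped (ignorable missingness)."""
    grid = np.linspace(-4, 4, 801)
    mask = ~np.isnan(resp)
    r, aa, bb = resp[mask], a[mask], b[mask]
    ll = np.array([np.sum(r * np.log(p_2pl(t, aa, bb)) +
                          (1 - r) * np.log(1 - p_2pl(t, aa, bb))) for t in grid])
    return grid[np.argmax(ll)]

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 30)          # hypothetical discriminations
b = rng.normal(0, 1, 30)               # hypothetical difficulties
true_theta = 0.5
resp = (rng.random(30) < p_2pl(true_theta, a, b)).astype(float)
resp[-6:] = np.nan                      # last six items not reached

as_wrong = np.where(np.isnan(resp), 0.0, resp)   # treat missing as incorrect
print("ignored :", mle_theta(resp, a, b))        # missing skipped in likelihood
print("as wrong:", mle_theta(as_wrong, a, b))    # typically biased downward
```

Scoring not-reached items as wrong tends to bias the ability estimate downward, which is the kind of distortion the abstract warns about.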
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
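The study itself concerns Bayesian MIMIC models; as a loosely related, minimal illustration of uniform versus nonuniform DIF testing, the sketch below uses logistic-regression DIF (a different, simpler method) on simulated data where the focal group has a larger latent variance. All data and effect sizes are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)                      # 0 = reference, 1 = focal
theta = rng.normal(0, np.where(group, 1.5, 1.0))   # unequal latent variances
# Studied item with nonuniform DIF: discrimination differs by group
a_item = np.where(group, 0.6, 1.4)
y = (rng.random(n) < 1 / (1 + np.exp(-a_item * theta))).astype(int)

# item ~ ability + group + ability-by-group; the interaction flags nonuniform DIF
X = sm.add_constant(np.column_stack([theta, group, theta * group]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)       # coefficient on the interaction is the nonuniform-DIF signal
print(fit.pvalues[3])   # small p-value flags the ability-by-group interaction
```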
Ferrara, Steve; Qunbar, Saed – Journal of Educational Measurement, 2022
In this article, we argue that automated scoring engines should be transparent and construct-relevant, at least as much as is currently feasible. Many current automated scoring engines cannot achieve high degrees of scoring accuracy without admitting some features that may not be easily explained and understood and may not be obviously and…
Descriptors: Artificial Intelligence, Scoring, Essays, Automation
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
The validity of performance standards is a key element in the defensibility of standard-setting results, and validating performance standards requires collecting multiple pieces of evidence at every step of the standard-setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
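A rough sketch of the latent class idea (not the authors' procedure): a small EM routine fits a latent class model to binary item responses, and the recovered classes could serve as model-based performance levels. The data, class count, and success rates are hypothetical.

```python
import numpy as np

def lca_em(X, k=3, iters=200, seed=0):
    """Minimal EM for a latent class model with binary items.
    X: (n, j) 0/1 response matrix; returns class weights, item probs, posteriors."""
    rng = np.random.default_rng(seed)
    n, j = X.shape
    pi = np.full(k, 1.0 / k)                 # class proportions
    p = rng.uniform(0.3, 0.7, size=(k, j))   # item-success probs per class
    for _ in range(iters):
        # E-step: posterior class membership per examinee
        logl = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
        logl -= logl.max(axis=1, keepdims=True)
        post = np.exp(logl)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update proportions and item probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        p = np.clip((post.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, p, post

# Hypothetical data: three performance levels differing in success rates
rng = np.random.default_rng(2)
levels = rng.integers(0, 3, 1500)
true_p = np.array([0.25, 0.55, 0.85])
X = (rng.random((1500, 20)) < true_p[levels, None]).astype(float)
pi, p, post = lca_em(X, k=3)
print(np.round(pi, 2), np.round(p.mean(axis=1), 2))  # recovered class sizes/levels
```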
Strachan, Tyler; Cho, Uk Hyun; Kim, Kyung Yong; Willse, John T.; Chen, Shyh-Huei; Ip, Edward H.; Ackerman, Terry A.; Weeks, Jonathan P. – Journal of Educational Measurement, 2021
In vertical scaling, results of tests from several different grade levels are placed on a common scale. Most vertical scaling methodologies rely heavily on the assumption that the construct being measured is unidimensional. In many testing situations, however, such an assumption could be problematic. For instance, the construct measured at one…
Descriptors: Item Response Theory, Scaling, Tests, Construct Validity
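Vertical scaling places separate grade-level calibrations on one scale; a minimal mean/sigma linking sketch under the unidimensional assumption the abstract questions is shown below. The anchor-item difficulties are hypothetical.

```python
import numpy as np

def mean_sigma_link(b_lower, b_upper):
    """Mean/sigma linking constants placing the lower-grade calibration onto the
    upper-grade scale, using difficulties of common (anchor) items from both runs."""
    A = np.std(b_upper) / np.std(b_lower)
    B = np.mean(b_upper) - A * np.mean(b_lower)
    return A, B

# Hypothetical anchor-item difficulties from separate grade-level calibrations
b_grade4 = np.array([-0.8, -0.2, 0.1, 0.6, 1.0])
b_grade5 = 0.9 * b_grade4 + 0.5            # same items on the grade-5 scale
A, B = mean_sigma_link(b_grade4, b_grade5)
print(A, B)                                 # ~0.9 and ~0.5 by construction
theta_on_grade5 = A * 0.3 + B               # rescale a grade-4 ability estimate
```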
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
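For orientation only, here is a classical equipercentile equating sketch on simulated totals; the article's contribution is a rater-error-adjusted method, which this deliberately does not reproduce. The score distributions are hypothetical.

```python
import numpy as np

def equipercentile(x_scores, y_scores, x_points):
    """Map form-X scores to the form-Y scale by matching percentile ranks."""
    x_sorted, y_sorted = np.sort(x_scores), np.sort(y_scores)
    pr = np.searchsorted(x_sorted, x_points, side="right") / len(x_sorted)
    return np.quantile(y_sorted, np.clip(pr, 0, 1))

rng = np.random.default_rng(3)
form_x = rng.normal(30, 6, 5000).round()   # hypothetical rater-mediated totals
form_y = rng.normal(33, 5, 5000).round()   # second form, different raters
print(equipercentile(form_x, form_y, np.array([20, 30, 40])))
```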
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) is a commonly observed careless response behavior. Accurately detecting BRR can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on the weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
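A simplified change-point scan on unweighted item residuals, assuming a 2PL with known parameters (a toy version of the CPA idea, not CPA-WR): the statistic peaks near the item where an examinee switches to random responding. All parameters and data are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

def cpa_scan(resp, a, b, theta):
    """For each candidate change point, contrast mean residuals before vs. after.
    A large statistic suggests a mid-test behavior change such as BRR."""
    res = resp - p_2pl(theta, a, b)
    n = len(res)
    pts = np.arange(5, n - 5)
    stats = []
    for c in pts:
        s1, s2 = res[:c], res[c:]
        pooled = np.sqrt(res.var() / len(s1) + res.var() / len(s2))
        stats.append((s1.mean() - s2.mean()) / pooled)
    return pts, np.array(stats)

rng = np.random.default_rng(4)
a = rng.uniform(0.8, 1.8, 60)
b = rng.normal(0, 1, 60)
theta = 1.0
resp = (rng.random(60) < p_2pl(theta, a, b)).astype(float)
resp[40:] = rng.integers(0, 2, 20)       # back random responding on last 20 items
pts, stats = cpa_scan(resp, a, b, theta)
print(pts[np.argmax(np.abs(stats))])     # estimated change point, near item 40
```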
Shermis, Mark D. – Journal of Educational Measurement, 2022
A central challenge in discussing validity arguments for machine scoring of essays is the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
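The calibrate-against-human-ratings pipeline the abstract describes reduces, in its simplest form, to regression from essay features to human scores, evaluated with quadratic weighted kappa. The features, weights, and scores below are simulated placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 1000
# Hypothetical essay features: length, lexical diversity, error rate, cohesion
X = rng.normal(size=(n, 4))
human = np.clip(np.round(2 + X @ np.array([0.8, 0.5, -0.6, 0.4]) +
                         rng.normal(0, 0.7, n)), 0, 5)

X_tr, X_te, y_tr, y_te = train_test_split(X, human, random_state=0)
pred = np.clip(np.round(Ridge(alpha=1.0).fit(X_tr, y_tr).predict(X_te)), 0, 5)
# Quadratic weighted kappa: the standard human-machine agreement metric in AES
print(cohen_kappa_score(y_te, pred, weights="quadratic"))
```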
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
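Because LaBSE embeddings are language-agnostic, one regressor can in principle score essays from several languages; a minimal sketch using the public sentence-transformers checkpoint follows. The essays, ratings, and regressor are placeholders, not the study's system.

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

model = SentenceTransformer("sentence-transformers/LaBSE")
essays = ["Ein kurzer Aufsatz ...", "Un breve saggio ...", "Krátká esej ..."]
scores = [3.0, 4.0, 2.0]                     # hypothetical human ratings
emb = model.encode(essays)                   # one fixed-length vector per essay
reg = Ridge(alpha=1.0).fit(emb, scores)      # single cross-lingual scorer
print(reg.predict(model.encode(["Another essay ..."])))
```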
Frank Goldhammer; Ulf Kroehne; Carolin Hahnel; Johannes Naumann; Paul De Boeck – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between the two variables (the speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
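One simple operationalization of "efficiency as effective ability conditional on speed" is to residualize ability on speed, so efficiency reflects ability at a given pace. This is an illustrative reading, explicitly not the authors' model; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
speed = rng.normal(0, 1, n)                       # effective speed (e.g., -log RT)
ability = 0.6 * (-speed) + rng.normal(0, 0.8, n)  # SAT: faster goes with less accurate

# Residualize ability on speed; the residual is speed-adjusted "efficiency"
slope, intercept = np.polyfit(speed, ability, 1)
efficiency = ability - (slope * speed + intercept)
print(np.corrcoef(efficiency, speed)[0, 1])       # ~0 by construction
```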
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
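A quick way to probe the claim under investigation is to check how consistently examinees land in the same subscore category across parallel replications; unstable categories are weak grounds for remediation decisions. Cut points and reliability below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
true_sub = rng.normal(0, 1, n)                 # true subscore trait
form1 = true_sub + rng.normal(0, 0.8, n)       # two noisy parallel subscores
form2 = true_sub + rng.normal(0, 0.8, n)
cuts = [-0.5, 0.5]                             # hypothetical category cut points

cat1 = np.digitize(form1, cuts)                # 0 = low, 1 = mid, 2 = high
cat2 = np.digitize(form2, cuts)
print("consistency:", np.mean(cat1 == cat2))   # agreement across replications
```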
Kaiwen Man; Joni M. Lakin – Journal of Educational Measurement, 2024
Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This…
Descriptors: Problem Solving, Spatial Ability, Task Analysis, Network Analysis
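As a tiny illustration of the kind of process data involved, a fixation sequence over areas of interest (AOIs) can be summarized as a weighted transition network; the AOI names and sequence below are hypothetical.

```python
from collections import Counter
from itertools import pairwise   # Python 3.10+

# Hypothetical fixation sequence over AOIs on one spatial-reasoning item
fixations = ["stem", "figure", "stem", "optionA", "figure", "optionB",
             "optionB", "stem", "optionB"]

# Weighted transition network: edge weight = transition frequency between AOIs
transitions = Counter((a, b) for a, b in pairwise(fixations) if a != b)
for (src, dst), w in transitions.most_common():
    print(f"{src} -> {dst}: {w}")
```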
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be challenging. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
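A minimal multilabel network sketch in PyTorch, assuming one sigmoid output per trait trained with binary cross-entropy; per-trait decision thresholds can then be tuned to favor accuracy, recall, or precision, which is the tunable-metric idea the abstract mentions. The architecture, data, and thresholds are hypothetical, not the authors' MNN.

```python
import torch
import torch.nn as nn

n_items, n_traits = 40, 18
X = torch.rand(512, n_items)                     # hypothetical item responses
Y = (torch.rand(512, n_traits) > 0.5).float()    # hypothetical trait labels

# One sigmoid output per trait; BCE-with-logits handles the multilabel targets
net = nn.Sequential(nn.Linear(n_items, 64), nn.ReLU(), nn.Linear(64, n_traits))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(net(X), Y)
    loss.backward()
    opt.step()

probs = torch.sigmoid(net(X))
# Lowering the decision threshold trades precision for recall, per trait
recall_lo_thresh = ((probs > 0.3).float() * Y).sum() / Y.sum()
print(float(loss), float(recall_lo_thresh))
```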
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence