Showing 1 to 15 of 118 results

Peer reviewed
Sooyong Lee; Suhwa Han; Seung W. Choi – Journal of Educational Measurement, 2024
Research has shown that multiple-indicator multiple-cause (MIMIC) models can result in inflated Type I error rates in detecting differential item functioning (DIF) when the assumption of equal latent variance is violated. This study explains how the violation of the equal variance assumption adversely impacts the detection of nonuniform DIF and…
Descriptors: Factor Analysis, Bayesian Statistics, Test Bias, Item Response Theory
Peer reviewed
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
Peer reviewed
Strachan, Tyler; Cho, Uk Hyun; Kim, Kyung Yong; Willse, John T.; Chen, Shyh-Huei; Ip, Edward H.; Ackerman, Terry A.; Weeks, Jonathan P. – Journal of Educational Measurement, 2021
In vertical scaling, results of tests from several different grade levels are placed on a common scale. Most vertical scaling methodologies rely heavily on the assumption that the construct being measured is unidimensional. In many testing situations, however, such an assumption could be problematic. For instance, the construct measured at one…
Descriptors: Item Response Theory, Scaling, Tests, Construct Validity
Peer reviewed
Tong Wu; Stella Y. Kim; Carl Westine; Michelle Boyer – Journal of Educational Measurement, 2025
While significant attention has been given to test equating to ensure score comparability, limited research has explored equating methods for rater-mediated assessments, where human raters inherently introduce error. If not properly addressed, these errors can undermine score interchangeability and test validity. This study proposes an equating…
Descriptors: Item Response Theory, Evaluators, Error of Measurement, Test Validity
Peer reviewed
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) behavior is one of the commonly observed careless response behaviors. Accurately detecting BRR behavior can improve test validities. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residual (CPA-WR) performed well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
Peer reviewed
Tahereh Firoozi; Hamid Mohammadi; Mark J. Gierl – Journal of Educational Measurement, 2025
The purpose of this study is to describe and evaluate a multilingual automated essay scoring (AES) system for grading essays in three languages. Two different sentence embedding models were evaluated within the AES system, multilingual BERT (mBERT) and language-agnostic BERT sentence embedding (LaBSE). German, Italian, and Czech essays were…
Descriptors: College Students, Slavic Languages, German, Italian
Peer reviewed
Frank Goldhammer; Ulf Kroehne; Carolin Hahnel; Johannes Naumann; Paul De Boeck – Journal of Educational Measurement, 2024
The efficiency of cognitive component skills is typically assessed with speeded performance tests. Interpreting only effective ability or effective speed as efficiency may be challenging because of the within-person dependency between both variables (speed-ability tradeoff, SAT). The present study measures efficiency as effective ability…
Descriptors: Timed Tests, Efficiency, Scores, Test Interpretation
Peer reviewed
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Peer reviewed
Kaiwen Man; Joni M. Lakin – Journal of Educational Measurement, 2024
Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This…
Descriptors: Problem Solving, Spatial Ability, Task Analysis, Network Analysis
Peer reviewed
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Peer reviewed
A. Corinne Huggins-Manley; Brandon M. Booth; Sidney K. D'Mello – Journal of Educational Measurement, 2022
The field of educational measurement places validity and fairness as central concepts of assessment quality. Prior research has proposed embedding fairness arguments within argument-based validity processes, particularly when fairness is conceived as comparability in assessment properties across groups. However, we argue that a more flexible…
Descriptors: Educational Assessment, Persuasive Discourse, Validity, Artificial Intelligence
Peer reviewed
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes. However, validity research is especially focused on the validity of summative assessment. This article aimed to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Peer reviewed
van Laar, Saskia; Braeken, Johan – Journal of Educational Measurement, 2022
The low-stakes character of international large-scale educational assessments implies that a participating student might at times provide unrelated answers as if s/he was not even reading the items and choosing a response option randomly throughout. Depending on the severity of this invalid response behavior, interpretations of the assessment…
Descriptors: Achievement Tests, Elementary Secondary Education, International Assessment, Foreign Countries
Peer reviewed
Yaneva, Victoria; Clauser, Brian E.; Morales, Amy; Paniagua, Miguel – Journal of Educational Measurement, 2021
Eye-tracking technology can create a record of the location and duration of visual fixations as a test-taker reads test questions. Although the cognitive process the test-taker is using cannot be directly observed, eye-tracking data can support inferences about these unobserved cognitive processes. This type of information has the potential to…
Descriptors: Eye Movements, Test Validity, Multiple Choice Tests, Cognitive Processes
Peer reviewed
Wind, Stefanie A. – Journal of Educational Measurement, 2019
Numerous researchers have proposed methods for evaluating the quality of rater-mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many-facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On…
Descriptors: Nonparametric Statistics, Test Validity, Test Reliability, Item Response Theory