Showing 1 to 15 of 44 results
Peer reviewed
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
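The snippet above notes that the same trained model can be tuned to maximize different performance metrics such as accuracy or recall. As a hedged illustration of that idea only (this is not the article's MNN; it is a generic, hypothetical sketch of choosing a decision threshold on predicted trait probabilities per metric):

```python
def recall(y_true, y_pred):
    # true positives / (true positives + false negatives)
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def best_threshold(y_true, probs, metric):
    """Pick the decision threshold on predicted probabilities that
    maximizes the given metric -- the tuning step that lets one model's
    scores target accuracy or recall, as the abstract describes."""
    candidates = [i / 20 for i in range(1, 20)]
    return max(candidates,
               key=lambda t: metric(y_true, [p >= t for p in probs]))

# Hypothetical labels and predicted probabilities for one trait:
y_true = [1, 0, 1, 1, 0]
probs = [0.9, 0.4, 0.6, 0.3, 0.1]
t_recall = best_threshold(y_true, probs, recall)      # lower threshold
t_accuracy = best_threshold(y_true, probs, accuracy)  # higher threshold
```

A lower threshold flags more examinees as possessing a trait (raising recall at the cost of false positives); maximizing accuracy typically selects a higher threshold.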
Peer reviewed
Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025
Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…
Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment
Peer reviewed
Hong, Maxwell; Rebouças, Daniella A.; Cheng, Ying – Journal of Educational Measurement, 2021
Response time has come to play an increasingly important role in educational and psychological testing, prompting the proposal of many response time models in recent years. However, response time modeling can be adversely affected by aberrant response behavior. For example, test speededness can cause response times to certain items to deviate…
Descriptors: Reaction Time, Models, Computation, Robustness (Statistics)
Peer reviewed
Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2022
Detection methods for item preknowledge are often evaluated in simulation studies where models are used to generate the data. To ensure the reliability of such methods, it is crucial that these models are able to accurately represent situations that are encountered in practice. The purpose of this article is to provide a critical analysis of…
Descriptors: Prior Learning, Simulation, Models, Reaction Time
Peer reviewed
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes. However, validity research is especially focused on the validity of summative assessment. This article aimed to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Peer reviewed
Strachan, Tyler; Cho, Uk Hyun; Kim, Kyung Yong; Willse, John T.; Chen, Shyh-Huei; Ip, Edward H.; Ackerman, Terry A.; Weeks, Jonathan P. – Journal of Educational Measurement, 2021
In vertical scaling, results of tests from several different grade levels are placed on a common scale. Most vertical scaling methodologies rely heavily on the assumption that the construct being measured is unidimensional. In many testing situations, however, such an assumption could be problematic. For instance, the construct measured at one…
Descriptors: Item Response Theory, Scaling, Tests, Construct Validity
Peer reviewed
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests, or more specifically, simple-structure tests such as these, rely on multiple sections of multiple-choice and/or constructed-response items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Peer reviewed
Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2018
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Descriptors: Weighted Scores, Error of Measurement, Test Use, Decision Making
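As a hedged illustration of the kind of interval estimation described in this snippet (a generic Wilson score interval under a simple binomial error model, not the weighted procedure the article itself develops):

```python
import math

def wilson_interval(correct, n_items, z=1.96):
    """Approximate confidence interval for an examinee's true
    proportion-correct score, treating the number-correct score as
    binomial (a common simplification; real items are not i.i.d.)."""
    p = correct / n_items
    denom = 1 + z**2 / n_items
    center = (p + z**2 / (2 * n_items)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / n_items + z**2 / (4 * n_items**2))
    return center - half, center + half

# Hypothetical examinee: 32 of 40 items correct
lo, hi = wilson_interval(32, 40)
```

Reporting such an interval alongside the point score conveys its precision, which is the test-use rationale the abstract describes.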
Peer reviewed
Trierweiler, Tammy J.; Lewis, Charles; Smith, Robert L. – Journal of Educational Measurement, 2016
In this study, we describe what factors influence the observed score correlation between an (external) anchor test and a total test. We show that the anchor to full-test observed score correlation is based on two components: the true score correlation between the anchor and total test, and the reliability of the anchor test. Findings using an…
Descriptors: Scores, Correlation, Tests, Test Reliability
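The two-component decomposition named in this snippet can be sketched with the classical attenuation formula; this is a reconstruction under standard classical-test-theory assumptions, not necessarily the article's exact derivation:

```latex
% Observed-score correlation between an external anchor A and total test T,
% under classical test theory (attenuation formula):
\rho(X_A, X_T) \;=\; \rho(T_A, T_T)\,\sqrt{\rho_{AA'}\,\rho_{TT'}}
% Holding the total test fixed, the anchor-total observed correlation is
% driven by exactly the two components the abstract names: the true-score
% correlation \rho(T_A, T_T) and the anchor reliability \rho_{AA'}.
```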
Peer reviewed
Ranger, Jochen; Kuhn, Jörg-Tobias; Wolgast, Anett – Journal of Educational Measurement, 2021
Van der Linden's hierarchical model for responses and response times can be used to infer the ability and mental speed of test takers from their responses and response times in an educational test. A standard approach for this is maximum likelihood estimation. In real-world applications, the data of some test takers might be partly…
Descriptors: Models, Reaction Time, Item Response Theory, Tests
Peer reviewed
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Peer reviewed
Man, Kaiwen; Harring, Jeffrey R.; Sinharay, Sandip – Journal of Educational Measurement, 2019
Data mining methods have drawn considerable attention across diverse scientific fields. However, few applications can be found in psychological and educational measurement and, particularly pertinent to this article, in test security research. In this study, various data mining methods for detecting cheating behaviors on large-scale…
Descriptors: Information Retrieval, Data Analysis, Identification, Tests
Peer reviewed
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
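The equipercentile idea named in this abstract, matching percentiles of the two score distributions, can be sketched as follows. This is a generic discrete approximation on observed scores, not the article's polytomous-IRT-based procedure, and real applications smooth the distributions first:

```python
import bisect

def equipercentile_equate(scores_x, scores_y):
    """Map each observed score on form X to the form-Y score at the
    same percentile rank (crude discrete equipercentile equating)."""
    xs = sorted(scores_x)
    ys = sorted(scores_y)
    table = {}
    for x in sorted(set(xs)):
        # percentile rank of x in the X distribution
        p = bisect.bisect_right(xs, x) / len(xs)
        # form-Y score at (approximately) the same percentile
        idx = min(int(p * len(ys)), len(ys) - 1)
        table[x] = ys[idx]
    return table

# Hypothetical score samples from two forms:
table = equipercentile_equate([10, 12, 12, 15, 18], [20, 22, 25, 27, 30])
```

The resulting conversion table makes a form-X score comparable to a form-Y score occupying the same position in its distribution, which is the goal the abstract states.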
Peer reviewed
Sinharay, Sandip; Duong, Minh Q.; Wood, Scott W. – Journal of Educational Measurement, 2017
As noted by Fremer and Olson, analysis of answer changes is often used to investigate testing irregularities because the analysis is readily performed and has proven its value in practice. Researchers such as Belov, Sinharay and Johnson, van der Linden and Jeon, van der Linden and Lewis, and Wollack, Cohen, and Eckerly have suggested several…
Descriptors: Identification, Statistics, Change, Tests
Peer reviewed
Dorans, Neil J.; Middleton, Kyndra – Journal of Educational Measurement, 2012
The interpretability of score comparisons depends on the design and execution of a sound data collection plan and the establishment of linkings between these scores. When comparisons are made between scores from two or more assessments that are built to different specifications and are administered to different populations under different…
Descriptors: Tests, Equated Scores, Test Interpretation, Validity