Showing 1 to 15 of 22 results
Peer reviewed
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Peer reviewed
Li, Dongmei – Journal of Educational Measurement, 2022
Equating error is usually small relative to the magnitude of measurement error, but it could be one of the major sources of error contributing to mean scores of large groups in educational measurement, such as the year-to-year state mean score fluctuations. Though testing programs may routinely calculate the standard error of equating (SEE), the…
Descriptors: Error Patterns, Educational Testing, Group Testing, Statistical Analysis
Peer reviewed
Kim, Hyung Jin; Lee, Won-Chan – Journal of Educational Measurement, 2022
Orlando and Thissen (2000) introduced the "S-X²" item-fit index for testing goodness-of-fit with dichotomous item response theory (IRT) models. This study considers and evaluates an alternative approach for computing "S-X²" values and other factors associated with collapsing tables of observed…
Descriptors: Goodness of Fit, Test Items, Item Response Theory, Computation
Peer reviewed
Sun-Joo Cho; Amanda Goodwin; Matthew Naveiras; Paul De Boeck – Journal of Educational Measurement, 2024
Explanatory item response models (EIRMs) have been applied to investigate the effects of person covariates, item covariates, and their interactions in the fields of reading education and psycholinguistics. In practice, it is often assumed that the relationships between the covariates and the logit transformation of item response probability are…
Descriptors: Item Response Theory, Test Items, Models, Maximum Likelihood Statistics
Peer reviewed
Sun-Joo Cho; Amanda Goodwin; Matthew Naveiras; Jorge Salas – Journal of Educational Measurement, 2024
Despite the growing interest in incorporating response time data into item response models, there has been a lack of research investigating how the effect of speed on the probability of a correct response varies across different groups (e.g., experimental conditions) for various items (i.e., differential response time item analysis). Furthermore,…
Descriptors: Item Response Theory, Reaction Time, Models, Accuracy
Peer reviewed
Wolkowitz, Amanda A.; Wright, Keith D. – Journal of Educational Measurement, 2019
This article explores the amount of equating error at a passing score when equating scores from exams with small sample sizes. This article focuses on equating using classical test theory methods of Tucker linear, Levine linear, frequency estimation, and chained equipercentile equating. Both simulation and real data studies were used in the…
Descriptors: Error Patterns, Sample Size, Test Theory, Test Bias
Peer reviewed
Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019
The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…
Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation
Peer reviewed
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
Peer reviewed
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Peer reviewed
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K. – Journal of Educational Measurement, 2014
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Descriptors: Item Response Theory, Measurement Techniques, Nonparametric Statistics, Models
Peer reviewed
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study…
Descriptors: Test Bias, Models, Simulation, Error Patterns
Peer reviewed
Cudeck, Robert – Journal of Educational Measurement, 1980
Methods for evaluating the consistency of responses to test items were compared. When a researcher is unwilling to make the assumptions of classical test theory, has only a small number of items, or is in a tailored testing context, Cliff's dominance indices may be useful. (Author/CTM)
Descriptors: Error Patterns, Item Analysis, Test Items, Test Reliability
Peer reviewed
Clauser, Brian E.; Clyman, Stephen G.; Swanson, David B. – Journal of Educational Measurement, 1999
Two studies focused on aspects of the rating process in performance assessment. The first, which involved 15 raters and about 400 medical students, made the "committee" facet of raters working in groups explicit, and the second, which involved about 200 medical students and four raters, made the "rating-occasion" facet…
Descriptors: Error Patterns, Evaluation Methods, Evaluators, Higher Education
Peer reviewed
Tatsuoka, Kikumi K.; Tatsuoka, Maurice M. – Journal of Educational Measurement, 1983
This study introduces the individual consistency index (ICI), which measures the extent to which patterns of responses to parallel sets of items remain consistent over time. ICI is used as an error diagnostic tool to detect aberrant response patterns resulting from the consistent application of erroneous rules of operation. (Author/PN)
Descriptors: Achievement Tests, Algorithms, Error Patterns, Measurement Techniques
Peer reviewed
Hamilton, Lawrence C. – Journal of Educational Measurement, 1981
Errors in self-reports of three academic performance measures are analyzed. Empirical errors are shown to depart radically from both no-error and random-error assumptions. Self-reports by females depart farther from the no-error and random-error models for all three performance measures. (Author/BW)
Descriptors: Academic Achievement, Error Patterns, Grade Point Average, Models