Harold Doran; Tetsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
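The maximum-information rule that CAT item-selection objectives typically generalize can be sketched as follows. This is a generic illustration under the two-parameter logistic (2PL) model, not the authors' algorithm; the item bank and parameter values are invented:

```python
import math

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta, item_bank, administered):
    """Greedy maximum-information selection from the remaining items."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta, *item_bank[i]))

bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 2.0)]  # (a, b) per item
next_item = select_next_item(0.4, bank, administered={0})  # picks index 2
```

A generalized objective function would replace the single information criterion with a weighted combination of goals (exposure control, content balance, precision), which is what allows one selection engine to serve multiple testing conditions.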
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
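The recoding step behind tree-based (IRTree) frameworks can be illustrated with a minimal sketch, assuming a two-node tree (node 1: respond vs. omit; node 2: correct vs. incorrect given a response). This is a generic recoding, not the authors' exact parameterization:

```python
def irtree_recode(resp):
    """Map one observed item response onto two tree-node pseudo-items.

    Node 1: did the test taker respond (1) or omit the item (0)?
    Node 2: given a response, correct (1) or incorrect (0);
            undefined (None) when the item was omitted.
    """
    if resp is None:                 # skipped or not reached
        return (0, None)
    return (1, 1 if resp == 1 else 0)

# a response vector with one omission recodes to node-level data
pattern = [1, 0, None, 1]
nodes = [irtree_recode(r) for r in pattern]
# nodes == [(1, 1), (1, 0), (0, None), (1, 1)]
```

Each node's pseudo-items can then be fit with an ordinary IRT model, letting the omission propensity correlate with proficiency, which is what makes nonignorable missingness tractable.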
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limit), resulting in poor performances on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
Finkelman, Matthew; Kim, Wonsuk; Roussos, Louis A. – Journal of Educational Measurement, 2009
Much recent psychometric literature has focused on cognitive diagnosis models (CDMs), a promising class of instruments used to measure the strengths and weaknesses of examinees. This article introduces a genetic algorithm to perform automated test assembly alongside CDMs. The algorithm is flexible in that it can be applied whether the goal is to…
Descriptors: Identification, Genetics, Test Construction, Mathematics
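The general shape of a genetic algorithm for automated test assembly can be sketched as below: a population of candidate item subsets is evolved by elitist selection, crossover, and mutation toward higher total information. This is a toy illustration, not the article's algorithm; a real assembler would also encode CDM-specific constraints in the fitness function:

```python
import random

random.seed(0)

def assemble_test(info, test_len, pop_size=30, generations=50):
    """Toy genetic algorithm for test assembly: evolve subsets of
    exactly test_len items to maximize summed item information."""
    n = len(info)

    def fitness(sel):
        return sum(info[i] for i in sel)

    pop = [set(random.sample(range(n), test_len)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)  # crossover: merge two parents
            child = set(random.sample(list(p1 | p2), test_len))
            if random.random() < 0.3:             # mutation: swap one item
                child.remove(random.choice(list(child)))
                child.add(random.choice([i for i in range(n) if i not in child]))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

info = [0.2, 1.5, 0.9, 0.1, 1.1, 0.4]   # information of each bank item
best = assemble_test(info, test_len=3)
```

The flexibility the abstract mentions comes from the fitness function: swapping in a different objective (diagnostic discrimination, blueprint coverage) changes the goal without changing the search machinery.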
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego – Journal of Educational Measurement, 2007
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
Descriptors: Inferences, Models, Item Response Theory, Cognitive Measurement
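The elementary inference step such Bayesian network engines perform can be shown for the degenerate one-skill, one-item case. This is a plain Bayes-rule sketch, not the paper's model; the conditional probabilities are invented:

```python
def posterior_skill(prior, p_correct_mastery, p_correct_nonmastery, correct):
    """Posterior probability of skill mastery after one scored item (Bayes rule)."""
    if correct:
        num = prior * p_correct_mastery
        den = num + (1 - prior) * p_correct_nonmastery
    else:
        num = prior * (1 - p_correct_mastery)
        den = num + (1 - prior) * (1 - p_correct_nonmastery)
    return num / den

# a correct answer raises the mastery estimate from the 0.50 prior
p_after_correct = posterior_skill(0.5, 0.85, 0.20, correct=True)   # ~0.81
p_after_wrong = posterior_skill(0.5, 0.85, 0.20, correct=False)    # ~0.16
```

A full Bayesian network chains many such updates across a graph of skills and observations, which is what makes it suitable as an inference engine for fine-grained diagnostic student models.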

Patience, Wayne – Journal of Educational Measurement, 1990
The four main subsystems of the MicroCAT Testing System for developing, administering, scoring, and analyzing computerized tests using conventional or item response theory methods are described. Judgments of three users of the system are included in the evaluation of this software. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Computer Software, Computer Software Reviews

Bolt, Daniel M. – Journal of Educational Measurement, 2000
Reviewed aspects of the SIBTEST procedure through three studies. Study 1 examined the effects of item format using 40 mathematics items from the Scholastic Assessment Test. Study 2 considered the effects of a problem type factor and its interaction with item format for eight items, and study 3 evaluated the degree to which factors varied in the…
Descriptors: Computer Software, Hypothesis Testing, Item Bias, Mathematics

Reise, Steve P.; Yu, Jiayuan – Journal of Educational Measurement, 1990
Parameter recovery in the graded-response model was investigated using the MULTILOG computer program under default conditions. Results from 36 simulated data sets suggest that at least 500 examinees are needed to achieve adequate calibration under the graded model. Sample size had little influence on the true ability parameter's recovery. (SLD)
Descriptors: Computer Assisted Testing, Computer Simulation, Computer Software, Estimation (Mathematics)
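A parameter-recovery study of this kind starts by simulating graded responses. The sketch below draws data from Samejima's graded response model; it is generic simulation code, not MULTILOG, and the item parameters are invented:

```python
import math
import random

random.seed(1)

def grm_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.
    The boundary curves P*(X >= k) are 2PL-shaped; category probabilities
    are differences of adjacent boundaries."""
    pstar = ([1.0]
             + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
             + [0.0])
    return [pstar[k] - pstar[k + 1] for k in range(len(thresholds) + 1)]

def simulate_response(theta, a, thresholds):
    """Draw one graded response (0..K) for an examinee at ability theta."""
    u, cum = random.random(), 0.0
    for k, p in enumerate(grm_probs(theta, a, thresholds)):
        cum += p
        if u < cum:
            return k
    return len(thresholds)

# one 4-category item, 500 simulated examinees with N(0, 1) abilities
responses = [simulate_response(random.gauss(0, 1), a=1.3, thresholds=[-1.0, 0.0, 1.2])
             for _ in range(500)]
```

Repeating this for a full item set, calibrating the simulated data, and comparing estimates to the generating parameters is the design that yields sample-size conclusions like the 500-examinee minimum reported here.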