Publication Date
In 2025: 3
Since 2024: 3
Since 2021 (last 5 years): 4
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 15
Descriptor
Computer Software: 15
Item Response Theory: 9
Models: 8
Test Items: 6
Computation: 5
Simulation: 5
Computer Assisted Testing: 4
Accuracy: 3
Measurement Techniques: 3
Monte Carlo Methods: 3
Psychometrics: 3
Source
Journal of Educational Measurement: 15
Author
Wang, Wen-Chung: 3
Jin, Kuan-Yu: 2
Almond, Russell G.: 1
Briggs, Derek C.: 1
Clauser, Brian E.: 1
De Boeck, Paul: 1
Debeer, Dries: 1
DiBello, Louis V.: 1
Engelhard, George, Jr.: 1
Gonulates, Emre: 1
Mechaber, Alex J.: 1
Publication Type
Journal Articles: 15
Reports - Research: 11
Reports - Descriptive: 4
Education Level
Elementary Education: 1
High Schools: 1
Higher Education: 1
Middle Schools: 1
Postsecondary Education: 1
Secondary Education: 1
Audience
Teachers: 1
Location
China: 1
Assessments and Surveys
Program for International Student Assessment (PISA): 1
Harold Doran; Tetsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
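To make the setting concrete, below is a minimal sketch of the classic maximum-information item selection rule that a generalized objective function like the one above subsumes; the 2PL parameterization, function names, and item bank are illustrative assumptions, not taken from the article.

    # Hypothetical sketch of maximum-information item selection under a 2PL model.
    # Item parameters (a, b) and all names below are illustrative.
    import math

    def fisher_information(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def select_next_item(theta_hat, item_bank, administered):
        """Pick the unadministered item with maximum information at theta_hat."""
        candidates = [i for i in range(len(item_bank)) if i not in administered]
        return max(candidates,
                   key=lambda i: fisher_information(theta_hat, *item_bank[i]))

    bank = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.4), (1.0, 1.1)]  # (a, b) pairs
    print(select_next_item(0.3, bank, administered={0}))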
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often include a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
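As a toy illustration of a directional testlet effect, the sketch below lets a correct response to an earlier testlet item shift the logit of a later item by a carry-over parameter delta; the parameterization and values are assumptions, not the authors' model.

    # Minimal illustration of a directional testlet effect: a correct response to
    # an earlier item in the testlet shifts the logit of a later item by delta.
    import math

    def p_correct(theta, b, earlier_correct=0, delta=0.0):
        """Rasch-type probability with a directional carry-over term."""
        return 1.0 / (1.0 + math.exp(-(theta - b + delta * earlier_correct)))

    print(p_correct(0.0, 0.0))                                 # no carry-over
    print(p_correct(0.0, 0.0, earlier_correct=1, delta=0.5))   # positive DTE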
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays.…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
Zhang, Xue; Tao, Jian; Wang, Chun; Shi, Ning-Zhong – Journal of Educational Measurement, 2019
Model selection is important in any statistical analysis, and the primary goal is to find the preferred (or most parsimonious) model, based on certain criteria, from a set of candidate models given the data. Several recent publications have employed the deviance information criterion (DIC) to perform model selection among different forms of multilevel item…
Descriptors: Bayesian Statistics, Item Response Theory, Measurement, Models
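For readers unfamiliar with the criterion, the sketch below shows how DIC is computed from posterior draws: DIC = Dbar + pD, with pD = Dbar - D(theta_bar). A Bernoulli likelihood stands in for an IRT model; all names and values are illustrative.

    # Sketch of DIC from posterior draws; the Bernoulli model is a stand-in.
    import numpy as np

    def deviance(p, y):
        """-2 * log-likelihood of Bernoulli responses y given probabilities p."""
        return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=20)             # observed 0/1 responses
    draws = rng.uniform(0.3, 0.7, size=500)     # posterior draws of a success probability

    dbar = np.mean([deviance(np.full(20, p), y) for p in draws])  # posterior mean deviance
    d_at_mean = deviance(np.full(20, draws.mean()), y)            # deviance at posterior mean
    p_d = dbar - d_at_mean                                        # effective number of parameters
    dic = dbar + p_d
    print(round(dic, 2), round(p_d, 2))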
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
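A minimal sketch of the tree idea, under assumed names and parameters: one node governs whether the item is answered at all, a second governs correctness given an answer, and correlation between the two latent traits is what renders the missingness nonignorable.

    # IRTree-style decomposition for omissions; all names are illustrative.
    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def outcome_probs(theta_resp, theta_prof, b_resp, b_item):
        p_answer = logistic(theta_resp - b_resp)    # node 1: answer vs. omit
        p_correct = logistic(theta_prof - b_item)   # node 2: correct given answered
        return {"omitted": 1 - p_answer,
                "wrong":   p_answer * (1 - p_correct),
                "correct": p_answer * p_correct}

    print(outcome_probs(theta_resp=-0.5, theta_prof=0.8, b_resp=0.0, b_item=0.2))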
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limits), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
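A hedged sketch of the general phenomenon (not the authors' exact model): the response probability is a mixture of effortful 2PL responding and random guessing, with the guessing weight growing with item position toward the end of the test. All names and values are assumptions.

    # Toy mixture of full-effort responding and position-driven random guessing.
    import math

    def p_correct(theta, a, b, position, n_items, c_guess=0.25, rate=2.0):
        p_effort = 1.0 / (1.0 + math.exp(-a * (theta - b)))   # full-effort response
        w_guess = (position / n_items) ** rate                # rises near the test's end
        return (1 - w_guess) * p_effort + w_guess * c_guess

    print(p_correct(0.5, 1.0, 0.0, position=2, n_items=40))   # early item: mostly effort
    print(p_correct(0.5, 1.0, 0.0, position=39, n_items=40))  # late item: mostly guessing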
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei – Journal of Educational Measurement, 2012
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Descriptors: Item Response Theory, Test Items, Selection, Models
Wang, Wen-Chung; Wu, Shiu-Lien – Journal of Educational Measurement, 2011
Rating scale items have been widely used in educational and psychological tests. These items require respondents to make subjective judgments, which usually involve randomness. To account for this randomness, Wang, Wilson, and Shih proposed the random-effect rating scale model, in which the threshold parameters are treated as…
Descriptors: Rating Scales, Models, Statistical Analysis, Computation
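A minimal sketch of a rating scale model whose thresholds are shifted by a person-specific random effect, which is one way to capture the extra randomness in subjective judgments; the parameterization and names are illustrative, not necessarily the authors' exact specification.

    # Andrich-style category probabilities with a random threshold shift u.
    import numpy as np

    def category_probs(theta, b, taus, u=0.0):
        """Category probabilities; u perturbs all thresholds for one person."""
        steps = np.concatenate(([0.0], theta - b - (np.asarray(taus) + u)))
        logits = np.cumsum(steps)               # cumulative category-step logits
        expl = np.exp(logits - logits.max())    # numerically stable softmax
        return expl / expl.sum()

    rng = np.random.default_rng(1)
    u = rng.normal(0.0, 0.3)                    # random threshold shift for one person
    print(category_probs(theta=0.5, b=0.0, taus=[-1.0, 0.0, 1.0], u=u))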
Jiao, Hong; Wang, Shudong; He, Wei – Journal of Educational Measurement, 2013
This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
Descriptors: Computation, Item Response Theory, Models, Monte Carlo Methods
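For concreteness, the sketch below simulates data under a Rasch testlet model, where the logit is theta_p - b_i + gamma_{p,d(i)} and gamma is a person-by-testlet random effect; estimation itself (MCMC in WINBUGS or MMLE) is out of scope, and all names and values are illustrative.

    # Simulating dichotomous responses under a Rasch testlet model.
    import numpy as np

    rng = np.random.default_rng(42)
    n_persons, items_per_testlet, n_testlets = 200, 5, 4
    b = rng.normal(0, 1, n_testlets * items_per_testlet)    # item difficulties
    theta = rng.normal(0, 1, n_persons)                     # abilities
    gamma = rng.normal(0, 0.5, (n_persons, n_testlets))     # person-by-testlet effects

    testlet_of = np.repeat(np.arange(n_testlets), items_per_testlet)
    logits = theta[:, None] - b[None, :] + gamma[:, testlet_of]
    responses = rng.random(logits.shape) < 1 / (1 + np.exp(-logits))
    print(responses.mean())   # overall proportion correct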
Finkelman, Matthew; Kim, Wonsuk; Roussos, Louis A. – Journal of Educational Measurement, 2009
Much recent psychometric literature has focused on cognitive diagnosis models (CDMs), a promising class of models used to measure the strengths and weaknesses of examinees. This article introduces a genetic algorithm to perform automated test assembly alongside CDMs. The algorithm is flexible in that it can be applied whether the goal is to…
Descriptors: Identification, Genetics, Test Construction, Mathematics
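A hedged sketch of the genetic-algorithm idea: candidate tests are item subsets, and fitness is a stand-in objective (attribute coverage from a CDM-style Q-matrix); the operators, parameters, and objective are illustrative only, not the article's algorithm.

    # Toy genetic algorithm for automated test assembly.
    import random

    random.seed(0)
    N_ITEMS, TEST_LEN, N_ATTR = 30, 8, 5
    Q = [[random.random() < 0.4 for _ in range(N_ATTR)] for _ in range(N_ITEMS)]

    def fitness(test):
        """Count attribute measurements across the selected items (toy objective)."""
        return sum(Q[i][k] for i in test for k in range(N_ATTR))

    def crossover(p1, p2):
        pool = list(set(p1) | set(p2))
        return tuple(sorted(random.sample(pool, TEST_LEN)))

    def mutate(test):
        out = list(test)
        out[random.randrange(TEST_LEN)] = random.randrange(N_ITEMS)
        return tuple(sorted(set(out))) if len(set(out)) == TEST_LEN else test

    pop = [tuple(sorted(random.sample(range(N_ITEMS), TEST_LEN))) for _ in range(40)]
    for _ in range(50):                      # generations
        pop.sort(key=fitness, reverse=True)
        elite = pop[:10]                     # keep the best candidate tests
        pop = elite + [mutate(crossover(*random.sample(elite, 2))) for _ in range(30)]
    print(fitness(max(pop, key=fitness)))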
Randall, Jennifer; Engelhard, George, Jr. – Journal of Educational Measurement, 2009
In this study, we present an approach to questionnaire design within educational research based on Guttman's mapping sentences and Many-Facet Rasch Measurement Theory. We designed a 54-item questionnaire using Guttman's mapping sentences to examine the grading practices of teachers. Each item in the questionnaire represented a unique student…
Descriptors: Student Evaluation, Educational Research, Grades (Scholastic), Public School Teachers
Briggs, Derek C.; Wilson, Mark – Journal of Educational Measurement, 2007
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Descriptors: Markov Processes, Generalizability Theory, Item Response Theory, Computation
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego – Journal of Educational Measurement, 2007
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer-grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
Descriptors: Inferences, Models, Item Response Theory, Cognitive Measurement
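As a toy example of Bayesian-network-style diagnosis, the sketch below places priors on two binary skills, uses a DINA-like item that requires both, and computes the posterior over skill profiles given a correct response by enumeration; all probabilities and names are assumptions.

    # Tiny skills-diagnosis network: two skills, one DINA-like item.
    from itertools import product

    prior = {"s1": 0.6, "s2": 0.4}          # P(skill mastered)
    slip, guess = 0.1, 0.2                  # DINA slip/guess parameters

    def p_correct(s1, s2):
        return 1 - slip if (s1 and s2) else guess

    # Posterior P(s1, s2 | X = correct) via enumeration over the four profiles.
    joint = {}
    for s1, s2 in product([0, 1], repeat=2):
        p_profile = (prior["s1"] if s1 else 1 - prior["s1"]) * \
                    (prior["s2"] if s2 else 1 - prior["s2"])
        joint[(s1, s2)] = p_profile * p_correct(s1, s2)

    z = sum(joint.values())
    print({k: round(v / z, 3) for k, v in joint.items()})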
Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O. – Journal of Educational Measurement, 2006
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests (DFIT) framework. In this new method, a cutoff score for each item is determined by obtaining a (1 - alpha) percentile rank score…
Descriptors: Evaluation Methods, Statistical Distributions, Statistical Significance, Test Bias
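A hedged sketch of the replication idea: simulate pairs of item parameter estimates under a no-DIF null, compute an NCDIF-style index (mean squared difference between the groups' item response functions over the ability distribution) for each pair, and take the (1 - alpha) percentile as the cutoff. The error SDs and all names here are illustrative assumptions.

    # Monte Carlo cutoff for an NCDIF-style index under a no-DIF null.
    import numpy as np

    rng = np.random.default_rng(7)
    theta = rng.normal(0, 1, 2000)              # focal-group ability draws

    def icc(theta, a, b):
        return 1 / (1 + np.exp(-a * (theta - b)))

    def ncdif_index(a_r, b_r, a_f, b_f):
        return np.mean((icc(theta, a_f, b_f) - icc(theta, a_r, b_r)) ** 2)

    a_true, b_true, alpha = 1.0, 0.0, 0.05
    replicates = []
    for _ in range(1000):                       # replicated estimates, no true DIF
        a_r, b_r = a_true + rng.normal(0, 0.05), b_true + rng.normal(0, 0.1)
        a_f, b_f = a_true + rng.normal(0, 0.05), b_true + rng.normal(0, 0.1)
        replicates.append(ncdif_index(a_r, b_r, a_f, b_f))

    cutoff = np.percentile(replicates, 100 * (1 - alpha))
    print(round(cutoff, 5))                     # significance cutoff for the item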