Publication Date
In 2025: 3
Since 2024: 3
Since 2021 (last 5 years): 4
Since 2016 (last 10 years): 6
Since 2006 (last 20 years): 15
Descriptor
Computer Software: 15
Item Response Theory: 9
Models: 8
Test Items: 6
Computation: 5
Simulation: 5
Computer Assisted Testing: 4
Accuracy: 3
Measurement Techniques: 3
Monte Carlo Methods: 3
Psychometrics: 3
Source
Journal of Educational Measurement: 15
Author
Wang, Wen-Chung: 3
Jin, Kuan-Yu: 2
Almond, Russell G.: 1
Briggs, Derek C.: 1
Clauser, Brian E.: 1
De Boeck, Paul: 1
Debeer, Dries: 1
DiBello, Louis V.: 1
Engelhard, George, Jr.: 1
Gonulates, Emre: 1
Mechaber, Alex J.: 1
Publication Type
Journal Articles: 15
Reports - Research: 11
Reports - Descriptive: 4
Education Level
Elementary Education: 1
High Schools: 1
Higher Education: 1
Middle Schools: 1
Postsecondary Education: 1
Secondary Education: 1
Audience
Teachers: 1
Location
China: 1
Assessments and Surveys
Program for International Student Assessment (PISA): 1
Harold Doran; Tetsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
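To make the setting concrete, below is a minimal sketch of the classic maximum-information item selection rule that a generalized objective function like the one above subsumes; the 2PL parameterization, function names, and item bank are illustrative assumptions, not taken from the article.

    # Hypothetical sketch of maximum-information item selection under a 2PL model.
    # Item parameters (a, b) and all names below are illustrative.
    import math

    def fisher_information(theta, a, b):
        """Fisher information of a 2PL item at ability theta."""
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        return a * a * p * (1.0 - p)

    def select_next_item(theta_hat, item_bank, administered):
        """Pick the unadministered item with maximum information at theta_hat."""
        candidates = [i for i in range(len(item_bank)) if i not in administered]
        return max(candidates,
                   key=lambda i: fisher_information(theta_hat, *item_bank[i]))

    bank = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.4), (1.0, 1.1)]  # (a, b) pairs
    print(select_next_item(0.3, bank, administered={0}))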
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often include a cluster of items linked by a common stimulus (a "testlet"). In such a design, the dependencies induced among items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
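As a toy illustration of a directional testlet effect, the sketch below lets a correct response to an earlier testlet item shift the logit of a later item by a carry-over parameter delta; the parameterization and values are assumptions, not the authors' model.

    # Minimal illustration of a directional testlet effect: a correct response to
    # an earlier item in the testlet shifts the logit of a later item by delta.
    import math

    def p_correct(theta, b, earlier_correct=0, delta=0.0):
        """Rasch-type probability with a directional carry-over term."""
        return 1.0 / (1.0 + math.exp(-(theta - b + delta * earlier_correct)))

    print(p_correct(0.0, 0.0))                                 # no carry-over
    print(p_correct(0.0, 0.0, earlier_correct=1, delta=0.5))   # positive DTE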
Shermis, Mark D. – Journal of Educational Measurement, 2022
One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays.…
Descriptors: Scoring, Essays, Validity, Writing Evaluation
Zhang, Xue; Tao, Jian; Wang, Chun; Shi, Ning-Zhong – Journal of Educational Measurement, 2019
Model selection is important in any statistical analysis, and the primary goal is to find the preferred (or most parsimonious) model, based on certain criteria, from a set of candidate models given the data. Several recent publications have employed the deviance information criterion (DIC) to perform model selection among different forms of multilevel item…
Descriptors: Bayesian Statistics, Item Response Theory, Measurement, Models
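For readers unfamiliar with the criterion, the sketch below shows how DIC is computed from posterior draws: DIC = Dbar + pD, with pD = Dbar - D(theta_bar). A Bernoulli likelihood stands in for an IRT model; all names and values are illustrative.

    # Sketch of DIC from posterior draws; the Bernoulli model is a stand-in.
    import numpy as np

    def deviance(p, y):
        """-2 * log-likelihood of Bernoulli responses y given probabilities p."""
        return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=20)             # observed 0/1 responses
    draws = rng.uniform(0.3, 0.7, size=500)     # posterior draws of a success probability

    dbar = np.mean([deviance(np.full(20, p), y) for p in draws])  # posterior mean deviance
    d_at_mean = deviance(np.full(20, draws.mean()), y)            # deviance at posterior mean
    p_d = dbar - d_at_mean                                        # effective number of parameters
    dic = dbar + p_d
    print(round(dic, 2), round(p_d, 2))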
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
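A minimal sketch of the tree idea, under assumed names and parameters: one node governs whether the item is answered at all, a second governs correctness given an answer, and correlation between the two latent traits is what renders the missingness nonignorable.

    # IRTree-style decomposition for omissions; all names are illustrative.
    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def outcome_probs(theta_resp, theta_prof, b_resp, b_item):
        p_answer = logistic(theta_resp - b_resp)    # node 1: answer vs. omit
        p_correct = logistic(theta_prof - b_item)   # node 2: correct given answered
        return {"omitted": 1 - p_answer,
                "wrong":   p_answer * (1 - p_correct),
                "correct": p_answer * p_correct}

    print(outcome_probs(theta_resp=-0.5, theta_prof=0.8, b_resp=0.0, b_item=0.2))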
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2014
Sometimes, test-takers may not be able to attempt all items to the best of their ability (with full effort) due to personal factors (e.g., low motivation) or testing conditions (e.g., time limits), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models fail to…
Descriptors: Student Evaluation, Item Response Theory, Models, Simulation
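A hedged sketch of the general phenomenon (not the authors' exact model): the response probability is a mixture of effortful 2PL responding and random guessing, with the guessing weight growing with item position toward the end of the test. All names and values are assumptions.

    # Toy mixture of full-effort responding and position-driven random guessing.
    import math

    def p_correct(theta, a, b, position, n_items, c_guess=0.25, rate=2.0):
        p_effort = 1.0 / (1.0 + math.exp(-a * (theta - b)))   # full-effort response
        w_guess = (position / n_items) ** rate                # rises near the test's end
        return (1 - w_guess) * p_effort + w_guess * c_guess

    print(p_correct(0.5, 1.0, 0.0, position=2, n_items=40))   # early item: mostly effort
    print(p_correct(0.5, 1.0, 0.0, position=39, n_items=40))  # late item: mostly guessing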
Wang, Wen-Chung; Jin, Kuan-Yu; Qiu, Xue-Lan; Wang, Lei – Journal of Educational Measurement, 2012
In some tests, examinees are required to choose a fixed number of items from a set of given items to answer. This practice creates a challenge to standard item response models, because more capable examinees may have an advantage by making wiser choices. In this study, we developed a new class of item response models to account for the choice…
Descriptors: Item Response Theory, Test Items, Selection, Models
Wang, Wen-Chung; Wu, Shiu-Lien – Journal of Educational Measurement, 2011
Rating scale items have been widely used in educational and psychological tests. These items require respondents to make subjective judgments, which usually involve randomness. To account for this randomness, Wang, Wilson, and Shih proposed the random-effect rating scale model, in which the threshold parameters are treated as…
Descriptors: Rating Scales, Models, Statistical Analysis, Computation
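A minimal sketch of a rating scale model whose thresholds are shifted by a person-specific random effect, which is one way to capture the extra randomness in subjective judgments; the parameterization and names are illustrative, not necessarily the authors' exact specification.

    # Andrich-style category probabilities with a random threshold shift u.
    import numpy as np

    def category_probs(theta, b, taus, u=0.0):
        """Category probabilities; u perturbs all thresholds for one person."""
        steps = np.concatenate(([0.0], theta - b - (np.asarray(taus) + u)))
        logits = np.cumsum(steps)               # cumulative category-step logits
        expl = np.exp(logits - logits.max())    # numerically stable softmax
        return expl / expl.sum()

    rng = np.random.default_rng(1)
    u = rng.normal(0.0, 0.3)                    # random threshold shift for one person
    print(category_probs(theta=0.5, b=0.0, taus=[-1.0, 0.0, 1.0], u=u))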
Jiao, Hong; Wang, Shudong; He, Wei – Journal of Educational Measurement, 2013
This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
Descriptors: Computation, Item Response Theory, Models, Monte Carlo Methods
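For concreteness, the sketch below simulates data under a Rasch testlet model, where the logit is theta_p - b_i + gamma_{p,d(i)} and gamma is a person-by-testlet random effect; estimation itself (MCMC in WINBUGS or MMLE) is out of scope, and all names and values are illustrative.

    # Simulating dichotomous responses under a Rasch testlet model.
    import numpy as np

    rng = np.random.default_rng(42)
    n_persons, items_per_testlet, n_testlets = 200, 5, 4
    b = rng.normal(0, 1, n_testlets * items_per_testlet)    # item difficulties
    theta = rng.normal(0, 1, n_persons)                     # abilities
    gamma = rng.normal(0, 0.5, (n_persons, n_testlets))     # person-by-testlet effects

    testlet_of = np.repeat(np.arange(n_testlets), items_per_testlet)
    logits = theta[:, None] - b[None, :] + gamma[:, testlet_of]
    responses = rng.random(logits.shape) < 1 / (1 + np.exp(-logits))
    print(responses.mean())   # overall proportion correct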
Finkelman, Matthew; Kim, Wonsuk; Roussos, Louis A. – Journal of Educational Measurement, 2009
Much recent psychometric literature has focused on cognitive diagnosis models (CDMs), a promising class of models used to measure the strengths and weaknesses of examinees. This article introduces a genetic algorithm to perform automated test assembly alongside CDMs. The algorithm is flexible in that it can be applied whether the goal is to…
Descriptors: Identification, Genetics, Test Construction, Mathematics
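A hedged sketch of the genetic-algorithm idea: candidate tests are item subsets, and fitness is a stand-in objective (attribute coverage from a CDM-style Q-matrix); the operators, parameters, and objective are illustrative only, not the article's algorithm.

    # Toy genetic algorithm for automated test assembly.
    import random

    random.seed(0)
    N_ITEMS, TEST_LEN, N_ATTR = 30, 8, 5
    Q = [[random.random() < 0.4 for _ in range(N_ATTR)] for _ in range(N_ITEMS)]

    def fitness(test):
        """Count attribute measurements across the selected items (toy objective)."""
        return sum(Q[i][k] for i in test for k in range(N_ATTR))

    def crossover(p1, p2):
        pool = list(set(p1) | set(p2))
        return tuple(sorted(random.sample(pool, TEST_LEN)))

    def mutate(test):
        out = list(test)
        out[random.randrange(TEST_LEN)] = random.randrange(N_ITEMS)
        return tuple(sorted(set(out))) if len(set(out)) == TEST_LEN else test

    pop = [tuple(sorted(random.sample(range(N_ITEMS), TEST_LEN))) for _ in range(40)]
    for _ in range(50):                      # generations
        pop.sort(key=fitness, reverse=True)
        elite = pop[:10]                     # keep the best candidate tests
        pop = elite + [mutate(crossover(*random.sample(elite, 2))) for _ in range(30)]
    print(fitness(max(pop, key=fitness)))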
Randall, Jennifer; Engelhard, George, Jr. – Journal of Educational Measurement, 2009
In this study, we present an approach to questionnaire design within educational research based on Guttman's mapping sentences and Many-Facet Rasch Measurement Theory. We designed a 54-item questionnaire using Guttman's mapping sentences to examine the grading practices of teachers. Each item in the questionnaire represented a unique student…
Descriptors: Student Evaluation, Educational Research, Grades (Scholastic), Public School Teachers
Briggs, Derek C.; Wilson, Mark – Journal of Educational Measurement, 2007
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Descriptors: Markov Processes, Generalizability Theory, Item Response Theory, Computation
Almond, Russell G.; DiBello, Louis V.; Moulder, Brad; Zapata-Rivera, Juan-Diego – Journal of Educational Measurement, 2007
This paper defines Bayesian network models and examines their applications to IRT-based cognitive diagnostic modeling. These models are especially suited to building inference engines designed to be synchronous with the finer-grained student models that arise in skills diagnostic assessment. Aspects of the theory and use of Bayesian network models…
Descriptors: Inferences, Models, Item Response Theory, Cognitive Measurement
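As a toy example of Bayesian-network-style diagnosis, the sketch below places priors on two binary skills, uses a DINA-like item that requires both, and computes the posterior over skill profiles given a correct response by enumeration; all probabilities and names are assumptions.

    # Tiny skills-diagnosis network: two skills, one DINA-like item.
    from itertools import product

    prior = {"s1": 0.6, "s2": 0.4}          # P(skill mastered)
    slip, guess = 0.1, 0.2                  # DINA slip/guess parameters

    def p_correct(s1, s2):
        return 1 - slip if (s1 and s2) else guess

    # Posterior P(s1, s2 | X = correct) via enumeration over the four profiles.
    joint = {}
    for s1, s2 in product([0, 1], repeat=2):
        p_profile = (prior["s1"] if s1 else 1 - prior["s1"]) * \
                    (prior["s2"] if s2 else 1 - prior["s2"])
        joint[(s1, s2)] = p_profile * p_correct(s1, s2)

    z = sum(joint.values())
    print({k: round(v / z, 3) for k, v in joint.items()})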
Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O. – Journal of Educational Measurement, 2006
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests (DFIT) framework. In this new method, a cutoff score for each item is determined by obtaining a (1 - alpha) percentile rank score…
Descriptors: Evaluation Methods, Statistical Distributions, Statistical Significance, Test Bias
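A hedged sketch of the replication idea: simulate pairs of item parameter estimates under a no-DIF null, compute an NCDIF-style index (mean squared difference between the groups' item response functions over the ability distribution) for each pair, and take the (1 - alpha) percentile as the cutoff. The error SDs and all names here are illustrative assumptions.

    # Monte Carlo cutoff for an NCDIF-style index under a no-DIF null.
    import numpy as np

    rng = np.random.default_rng(7)
    theta = rng.normal(0, 1, 2000)              # focal-group ability draws

    def icc(theta, a, b):
        return 1 / (1 + np.exp(-a * (theta - b)))

    def ncdif_index(a_r, b_r, a_f, b_f):
        return np.mean((icc(theta, a_f, b_f) - icc(theta, a_r, b_r)) ** 2)

    a_true, b_true, alpha = 1.0, 0.0, 0.05
    replicates = []
    for _ in range(1000):                       # replicated estimates, no true DIF
        a_r, b_r = a_true + rng.normal(0, 0.05), b_true + rng.normal(0, 0.1)
        a_f, b_f = a_true + rng.normal(0, 0.05), b_true + rng.normal(0, 0.1)
        replicates.append(ncdif_index(a_r, b_r, a_f, b_f))

    cutoff = np.percentile(replicates, 100 * (1 - alpha))
    print(round(cutoff, 5))                     # significance cutoff for the item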