Showing all 14 results
Peer reviewed
PDF on ERIC
Owen Henkel; Hannah Horne-Robinson; Maria Dyshel; Greg Thompson; Ralph Abboud; Nabil Al Nahin Ch; Baptiste Moreau-Pernet; Kirk Vanacore – Journal of Learning Analytics, 2025
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging…
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students
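The grading experiments described in this abstract lend themselves to a short illustration. The sketch below shows one way an LLM could be prompted to grade an open-response answer against a reference answer; `call_llm`, the prompt wording, and the label set are hypothetical stand-ins for illustration, not the rubric or models used in the paper.

```python
# Hedged sketch: grading an open-response math answer with an LLM.
# `call_llm` is a hypothetical placeholder for a real chat-completion client.

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM client call."""
    raise NotImplementedError("wire up a chat-completion client here")

def grade_response(question: str, reference: str, student: str) -> str:
    """Ask the model for a single CORRECT / INCORRECT label."""
    prompt = (
        "You are grading a short math answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Student answer: {student}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    label = call_llm(prompt).strip().upper()
    return label if label in {"CORRECT", "INCORRECT"} else "INCORRECT"
```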
Peer reviewed
Direct link
Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
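Several of the indices named in this abstract have simple closed forms. As a point of reference, the sketch below computes four of them from a fitted model's maximized log-likelihood; the example values are toy numbers, and the study's remaining indices (e.g., Draper's) are omitted.

```python
import math

def fit_indices(loglik: float, p: int, n: int) -> dict:
    """Information criteria commonly used to pick the number of latent classes."""
    aic = -2 * loglik + 2 * p
    aicc = aic + (2 * p * (p + 1)) / (n - p - 1)   # small-sample correction
    bic = -2 * loglik + p * math.log(n)
    caic = -2 * loglik + p * (math.log(n) + 1)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

# Toy comparison: smaller values favor a model (numbers invented for illustration).
loglik = {1: -5120.3, 2: -5056.8, 3: -5049.1}   # maximized log-likelihood per class count
params = {1: 40, 2: 81, 3: 122}                  # free parameters per model
for k in loglik:
    values = fit_indices(loglik[k], params[k], n=1000)
    print(k, {name: round(v, 1) for name, v in values.items()})
```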
Peer reviewed
Direct link
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
Peer reviewed
Direct link
Feinberg, Richard A. – Educational Measurement: Issues and Practice, 2021
Unforeseen complications during the administration of large-scale testing programs are inevitable and can prevent examinees from accessing all test material. For classification tests in which the primary purpose is to yield a decision, such as a pass/fail result, the current study investigated a model-based standard error approach, Bayesian…
Descriptors: High Stakes Tests, Classification, Decision Making, Bayesian Statistics
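This is not the paper's exact procedure, but a generic illustration of the underlying idea: given a score estimate and its standard error, a normal approximation turns the distance from the cut score into a classification probability.

```python
from statistics import NormalDist

def pass_probability(theta_hat: float, se: float, cut: float) -> float:
    """Probability the examinee's true score exceeds the cut, assuming the
    estimate is approximately normal around the true score with the given SE."""
    return 1.0 - NormalDist(mu=theta_hat, sigma=se).cdf(cut)

# A score just above the cut with a large standard error still leaves a
# substantial chance that the pass/fail decision is wrong.
print(round(pass_probability(theta_hat=0.10, se=0.30, cut=0.0), 3))   # about 0.63
```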
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
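To show what adjusting CPT parameters to fit observations can look like in the simplest case, the sketch below performs one EM-style update for a toy network with a single binary latent skill and two items. The structure, starting values, and data are assumptions for illustration, not the estimation method studied in the dissertation.

```python
# One EM-style update of conditional probability tables (CPTs) for a toy
# Bayesian-network scoring model: one binary latent skill -> two binary items.

data = [
    {"item1": 1, "item2": 1},
    {"item1": 1, "item2": 0},
    {"item1": 0, "item2": 0},
    {"item1": 1, "item2": 1},
]
prior = 0.5                                                   # P(skill = 1)
cpt = {"item1": {1: 0.8, 0: 0.3}, "item2": {1: 0.7, 0: 0.2}}  # P(correct | skill)

def posterior(resp):
    """E-step: P(skill = 1 | responses) under the current parameters."""
    def joint(skill):
        p = prior if skill == 1 else 1 - prior
        for item, x in resp.items():
            q = cpt[item][skill]
            p *= q if x else 1 - q
        return p
    a, b = joint(1), joint(0)
    return a / (a + b)

# M-step: re-estimate each CPT entry as a posterior-weighted proportion correct.
gamma = [posterior(r) for r in data]          # responsibility of skill = 1 per respondent
for item in cpt:
    cpt[item][1] = sum(g * r[item] for g, r in zip(gamma, data)) / sum(gamma)
    cpt[item][0] = sum((1 - g) * r[item] for g, r in zip(gamma, data)) / sum(1 - g for g in gamma)
prior = sum(gamma) / len(gamma)               # updated P(skill = 1)

print(round(prior, 3), cpt)
```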
Peer reviewed
Direct link
Choi, In-Hee; Wilson, Mark – Educational and Psychological Measurement, 2015
An essential feature of the linear logistic test model (LLTM) is that item difficulties are explained using item design properties. By taking advantage of this explanatory aspect of the LLTM, in a mixture extension of the LLTM, the meaning of latent classes is specified by how item properties affect item difficulties within each class. To improve…
Descriptors: Classification, Test Items, Difficulty Level, Statistical Analysis
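The explanatory mechanism the abstract refers to is compact enough to state directly: under the LLTM, each item's difficulty is a weighted sum of its design properties. The design matrix and effect values below are assumptions for illustration; in the mixture extension, a separate set of property effects is estimated within each latent class.

```python
import math

# LLTM sketch: item difficulty = weighted sum of item design properties.
# Q[i][k] = 1 if item i involves property k; eta[k] is that property's difficulty effect.
Q = [
    [1, 0, 0],   # item 1: carrying only
    [1, 1, 0],   # item 2: carrying + borrowing
    [0, 1, 1],   # item 3: borrowing + multi-digit operands
]
eta = [0.4, 0.9, -0.2]   # assumed basic parameters (property effects)

beta = [sum(q * e for q, e in zip(row, eta)) for row in Q]   # reconstructed item difficulties

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print([round(b, 2) for b in beta])
print([round(p_correct(0.5, b), 3) for b in beta])
```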
Peer reviewed
Direct link
Koziol, Natalie A. – Applied Measurement in Education, 2016
Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…
Descriptors: Classification, Accuracy, Comparative Analysis, Models
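One common way to formalize the leftover dependence this abstract mentions is a testlet response model, which adds a person-specific testlet effect to a standard 2PL item response function. The sketch below is a generic illustration with assumed values, not the comparison models examined in the article.

```python
import math

def p_correct_testlet(theta: float, a: float, b: float, gamma: float) -> float:
    """2PL item response function with a person-specific testlet effect gamma;
    items in the same testlet share gamma, inducing extra within-testlet correlation."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

theta = 0.0
gamma_testlet = 0.6   # assumed testlet effect for this examinee on this testlet
print(round(p_correct_testlet(theta, a=1.2, b=0.0, gamma=gamma_testlet), 3))  # item in the testlet
print(round(p_correct_testlet(theta, a=0.9, b=0.4, gamma=gamma_testlet), 3))  # another item, same testlet
print(round(p_correct_testlet(theta, a=1.0, b=0.2, gamma=0.0), 3))            # stand-alone item
```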
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscore reliability, and to classify subscores. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
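To make the compensatory versus partially compensatory distinction concrete, the sketch below contrasts the two two-dimensional item response functions in their simplest forms; the parameter values are assumptions for illustration, not estimates from the dissertation.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def compensatory(theta, a, d):
    """Compensatory MIRT: strength on one dimension can offset weakness on another."""
    return logistic(sum(ai * ti for ai, ti in zip(a, theta)) + d)

def partially_compensatory(theta, a, b):
    """Partially compensatory (product) form: adequate ability is needed on every dimension."""
    p = 1.0
    for ai, ti, bi in zip(a, theta, b):
        p *= logistic(ai * (ti - bi))
    return p

theta = (1.5, -1.0)   # strong on dimension 1, weak on dimension 2
print(round(compensatory(theta, a=(1.0, 1.0), d=0.0), 3))
print(round(partially_compensatory(theta, a=(1.0, 1.0), b=(0.0, 0.0)), 3))
```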
Peer reviewed
Direct link
Lee, Jihyun; Corter, James E. – Applied Psychological Measurement, 2011
Diagnosis of misconceptions or "bugs" in procedural skills is difficult because of their unstable nature. This study addresses this problem by proposing and evaluating a probability-based approach to the diagnosis of bugs in children's multicolumn subtraction performance using Bayesian networks. This approach assumes a causal network relating…
Descriptors: Misconceptions, Probability, Children, Subtraction
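A causal network that relates candidate bugs to observed errors needs a conditional probability for each error given which bugs are present; a noisy-OR gate is one common, compact choice. Whether the article uses this particular parameterization is not stated in the snippet above, so the structure and values below are assumptions.

```python
def noisy_or(active_bug_strengths, leak=0.05):
    """P(error) when each present bug independently produces the error with its
    strength, plus a small leak probability for errors with no modeled cause."""
    p_no_error = 1 - leak
    for strength in active_bug_strengths:
        p_no_error *= 1 - strength
    return 1 - p_no_error

bug_strength = {"borrow_across_zero": 0.7, "smaller_from_larger": 0.5}

print(round(noisy_or([bug_strength["borrow_across_zero"],
                      bug_strength["smaller_from_larger"]]), 3))   # both bugs present
print(round(noisy_or([bug_strength["smaller_from_larger"]]), 3))   # one bug present
print(round(noisy_or([]), 3))                                      # leak only
```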
Peer reviewed
Direct link
Maguire, Angela M.; Humphreys, Michael S.; Dennis, Simon; Lee, Michael D. – Journal of Memory and Language, 2010
This paper addresses two Global Matching predictions in embedded-category designs: the within-category choice advantage in forced-choice recognition (superior discrimination for test choices comprising a same-category distractor); and the category length effect in forced-choice and old/new recognition (a loss in discriminability with increases in…
Descriptors: Bayesian Statistics, Models, Prediction, Classification
Kim, Hyun Seok John – ProQuest LLC, 2011
Cognitive diagnostic assessment (CDA) is a new theoretical framework for psychological and educational testing that is designed to provide detailed information about examinees' strengths and weaknesses in specific knowledge structures and processing skills. During the last three decades, more than a dozen psychometric models have been developed…
Descriptors: Cognitive Measurement, Diagnostic Tests, Bayesian Statistics, Statistical Inference
Peer reviewed
Direct link
DeCarlo, Lawrence T. – Applied Psychological Measurement, 2011
Cognitive diagnostic models (CDMs) attempt to uncover latent skills or attributes that examinees must possess in order to answer test items correctly. The DINA (deterministic input, noisy "and") model is a popular CDM that has been widely used. It is shown here that a logistic version of the model can easily be fit with standard software for…
Descriptors: Bayesian Statistics, Computation, Cognitive Tests, Diagnostic Tests
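The DINA model described in this abstract can be written in two equivalent ways, which is the point the paper exploits. The sketch below shows the classical slip/guess parameterization and a logistic form with an intercept plus an effect of the latent "prepared" indicator; the attribute profile, Q-matrix entries, and parameter values are illustrative.

```python
import math

def eta(alpha, q_row):
    """1 if the examinee has every attribute the item requires, else 0 (DINA 'and' gate)."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def p_correct_dina(alpha, q_row, slip, guess):
    """Classical DINA parameterization: slip if prepared, guess if not."""
    return (1 - slip) if eta(alpha, q_row) else guess

def p_correct_logistic(alpha, q_row, f_j, d_j):
    """Equivalent logistic form: an intercept plus an effect of eta, which is what
    lets standard logistic / latent-class software fit the model."""
    return 1.0 / (1.0 + math.exp(-(f_j + d_j * eta(alpha, q_row))))

alpha = (1, 0, 1)   # assumed attribute profile
q_row = (1, 0, 1)   # attributes the item requires
print(p_correct_dina(alpha, q_row, slip=0.1, guess=0.2))                  # 0.9
print(round(p_correct_logistic(alpha, q_row, f_j=-1.386, d_j=3.583), 3))  # ~0.9 with matching parameters
```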
Peer reviewed
Direct link
Rudner, Lawrence M. – Practical Assessment, Research & Evaluation, 2009
This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the…
Descriptors: Classification, Scoring, Item Response Theory, Measurement
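The framework this abstract describes is essentially Bayesian classification over mastery states. The sketch below combines assumed priors with per-item conditional probabilities of a correct response under each state and returns the maximum a posteriori classification; states, priors, and probabilities are illustrative values only.

```python
# Measurement decision theory sketch: classify an examinee into a mastery state.
states = ("nonmaster", "partial", "master")
prior = {"nonmaster": 0.3, "partial": 0.4, "master": 0.3}
p_correct = {                                   # P(correct | state) for each of three items
    "nonmaster": (0.20, 0.30, 0.25),
    "partial":   (0.60, 0.50, 0.55),
    "master":    (0.90, 0.85, 0.80),
}

def classify(responses):
    """Return the posterior over states and the MAP classification."""
    post = {}
    for s in states:
        p = prior[s]
        for x, pc in zip(responses, p_correct[s]):
            p *= pc if x else 1 - pc
        post[s] = p
    total = sum(post.values())
    post = {s: p / total for s, p in post.items()}
    return post, max(post, key=post.get)

posterior, decision = classify((1, 1, 0))
print({s: round(p, 3) for s, p in posterior.items()}, decision)
```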
Glas, Cees A. W.; Vos, Hans J. – 2000
This paper focuses on a version of sequential mastery testing (i.e., classifying students as a master/nonmaster or continuing testing and administering another item or testlet) in which response behavior is modeled by a multidimensional item response theory (IRT) model. First, a general theoretical framework is outlined that is based on a…
Descriptors: Adaptive Testing, Bayesian Statistics, Classification, Computer Assisted Testing
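The sequential aspect described in this abstract can be illustrated with a stripped-down version of the procedure: update the posterior probability of mastery after each administered item and either classify or continue testing. The thresholds, item probabilities, and responses below are assumptions, and this simple two-state threshold rule only stands in for the paper's IRT-based, expected-loss decision framework.

```python
items = [            # (P(correct | master), P(correct | nonmaster)) per administered item
    (0.85, 0.30),
    (0.80, 0.35),
    (0.90, 0.25),
    (0.75, 0.40),
]
responses = [1, 1, 0, 1]   # observed correct/incorrect, in administration order

p_master = 0.5             # prior probability of mastery
upper, lower = 0.95, 0.05  # classify once the posterior is decisive, else keep testing

decision = "continue testing / administer another item or testlet"
for (pm, pn), x in zip(items, responses):
    like_master = pm if x else 1 - pm
    like_nonmaster = pn if x else 1 - pn
    joint_master = p_master * like_master
    p_master = joint_master / (joint_master + (1 - p_master) * like_nonmaster)
    print(f"posterior P(master) = {p_master:.3f}")
    if p_master >= upper:
        decision = "master"
        break
    if p_master <= lower:
        decision = "nonmaster"
        break
print("decision:", decision)
```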