Showing all 14 results
Peer reviewed
PDF on ERIC
Owen Henkel; Hannah Horne-Robinson; Maria Dyshel; Greg Thompson; Ralph Abboud; Nabil Al Nahin Ch; Baptiste Moreau-Pernet; Kirk Vanacore – Journal of Learning Analytics, 2025
This paper introduces AMMORE, a new dataset of 53,000 math open-response question-answer pairs from Rori, a mathematics learning platform used by middle and high school students in several African countries. Using this dataset, we conducted two experiments to evaluate the use of large language models (LLMs) for grading particularly challenging…
Descriptors: Learning Analytics, Learning Management Systems, Mathematics Instruction, Middle School Students
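The grading experiments described in this abstract lend themselves to a short illustration. The sketch below shows one way an LLM could be prompted to grade an open-response answer against a reference answer; `call_llm`, the prompt wording, and the label set are hypothetical stand-ins for illustration, not the rubric or models used in the paper.

```python
# Hedged sketch: grading an open-response math answer with an LLM.
# `call_llm` is a hypothetical placeholder for a real chat-completion client.

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM client call."""
    raise NotImplementedError("wire up a chat-completion client here")

def grade_response(question: str, reference: str, student: str) -> str:
    """Ask the model for a single CORRECT / INCORRECT label."""
    prompt = (
        "You are grading a short math answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Student answer: {student}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )
    label = call_llm(prompt).strip().upper()
    return label if label in {"CORRECT", "INCORRECT"} else "INCORRECT"
```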
Peer reviewed
Direct link
Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
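Several of the indices named in this abstract have simple closed forms. As a point of reference, the sketch below computes four of them from a fitted model's maximized log-likelihood; the example values are toy numbers, and the study's remaining indices (e.g., Draper's) are omitted.

```python
import math

def fit_indices(loglik: float, p: int, n: int) -> dict:
    """Information criteria commonly used to pick the number of latent classes."""
    aic = -2 * loglik + 2 * p
    aicc = aic + (2 * p * (p + 1)) / (n - p - 1)   # small-sample correction
    bic = -2 * loglik + p * math.log(n)
    caic = -2 * loglik + p * (math.log(n) + 1)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

# Toy comparison: smaller values favor a model (numbers invented for illustration).
loglik = {1: -5120.3, 2: -5056.8, 3: -5049.1}   # maximized log-likelihood per class count
params = {1: 40, 2: 81, 3: 122}                  # free parameters per model
for k in loglik:
    values = fit_indices(loglik[k], params[k], n=1000)
    print(k, {name: round(v, 1) for name, v in values.items()})
```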
Peer reviewed
Direct link
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
Peer reviewed
Direct link
Feinberg, Richard A. – Educational Measurement: Issues and Practice, 2021
Unforeseen complications during the administration of large-scale testing programs are inevitable and can prevent examinees from accessing all test material. For classification tests in which the primary purpose is to yield a decision, such as a pass/fail result, the current study investigated a model-based standard error approach, Bayesian…
Descriptors: High Stakes Tests, Classification, Decision Making, Bayesian Statistics
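This is not the paper's exact procedure, but a generic illustration of the underlying idea: given a score estimate and its standard error, a normal approximation turns the distance from the cut score into a classification probability.

```python
from statistics import NormalDist

def pass_probability(theta_hat: float, se: float, cut: float) -> float:
    """Probability the examinee's true score exceeds the cut, assuming the
    estimate is approximately normal around the true score with the given SE."""
    return 1.0 - NormalDist(mu=theta_hat, sigma=se).cdf(cut)

# A score just above the cut with a large standard error still leaves a
# substantial chance that the pass/fail decision is wrong.
print(round(pass_probability(theta_hat=0.10, se=0.30, cut=0.0), 3))   # about 0.63
```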
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
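To show what adjusting CPT parameters to fit observations can look like in the simplest case, the sketch below performs one EM-style update for a toy network with a single binary latent skill and two items. The structure, starting values, and data are assumptions for illustration, not the estimation method studied in the dissertation.

```python
# One EM-style update of conditional probability tables (CPTs) for a toy
# Bayesian-network scoring model: one binary latent skill -> two binary items.

data = [
    {"item1": 1, "item2": 1},
    {"item1": 1, "item2": 0},
    {"item1": 0, "item2": 0},
    {"item1": 1, "item2": 1},
]
prior = 0.5                                                   # P(skill = 1)
cpt = {"item1": {1: 0.8, 0: 0.3}, "item2": {1: 0.7, 0: 0.2}}  # P(correct | skill)

def posterior(resp):
    """E-step: P(skill = 1 | responses) under the current parameters."""
    def joint(skill):
        p = prior if skill == 1 else 1 - prior
        for item, x in resp.items():
            q = cpt[item][skill]
            p *= q if x else 1 - q
        return p
    a, b = joint(1), joint(0)
    return a / (a + b)

# M-step: re-estimate each CPT entry as a posterior-weighted proportion correct.
gamma = [posterior(r) for r in data]          # responsibility of skill = 1 per respondent
for item in cpt:
    cpt[item][1] = sum(g * r[item] for g, r in zip(gamma, data)) / sum(gamma)
    cpt[item][0] = sum((1 - g) * r[item] for g, r in zip(gamma, data)) / sum(1 - g for g in gamma)
prior = sum(gamma) / len(gamma)               # updated P(skill = 1)

print(round(prior, 3), cpt)
```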
Peer reviewed
Direct link
Choi, In-Hee; Wilson, Mark – Educational and Psychological Measurement, 2015
An essential feature of the linear logistic test model (LLTM) is that item difficulties are explained using item design properties. By taking advantage of this explanatory aspect of the LLTM, in a mixture extension of the LLTM, the meaning of latent classes is specified by how item properties affect item difficulties within each class. To improve…
Descriptors: Classification, Test Items, Difficulty Level, Statistical Analysis
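The explanatory mechanism the abstract refers to is compact enough to state directly: under the LLTM, each item's difficulty is a weighted sum of its design properties. The design matrix and effect values below are assumptions for illustration; in the mixture extension, a separate set of property effects is estimated within each latent class.

```python
import math

# LLTM sketch: item difficulty = weighted sum of item design properties.
# Q[i][k] = 1 if item i involves property k; eta[k] is that property's difficulty effect.
Q = [
    [1, 0, 0],   # item 1: carrying only
    [1, 1, 0],   # item 2: carrying + borrowing
    [0, 1, 1],   # item 3: borrowing + multi-digit operands
]
eta = [0.4, 0.9, -0.2]   # assumed basic parameters (property effects)

beta = [sum(q * e for q, e in zip(row, eta)) for row in Q]   # reconstructed item difficulties

def p_correct(theta: float, b: float) -> float:
    """Rasch probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print([round(b, 2) for b in beta])
print([round(p_correct(0.5, b), 3) for b in beta])
```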
Peer reviewed
Direct link
Koziol, Natalie A. – Applied Measurement in Education, 2016
Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…
Descriptors: Classification, Accuracy, Comparative Analysis, Models
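One common way to formalize the leftover dependence this abstract mentions is a testlet response model, which adds a person-specific testlet effect to a standard 2PL item response function. The sketch below is a generic illustration with assumed values, not the comparison models examined in the article.

```python
import math

def p_correct_testlet(theta: float, a: float, b: float, gamma: float) -> float:
    """2PL item response function with a person-specific testlet effect gamma;
    items in the same testlet share gamma, inducing extra within-testlet correlation."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

theta = 0.0
gamma_testlet = 0.6   # assumed testlet effect for this examinee on this testlet
print(round(p_correct_testlet(theta, a=1.2, b=0.0, gamma=gamma_testlet), 3))  # item in the testlet
print(round(p_correct_testlet(theta, a=0.9, b=0.4, gamma=gamma_testlet), 3))  # another item, same testlet
print(round(p_correct_testlet(theta, a=1.0, b=0.2, gamma=0.0), 3))            # stand-alone item
```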
Md Desa, Zairul Nor Deana – ProQuest LLC, 2012
In recent years, there has been increasing interest in estimating and improving subscore reliability. In this study, multidimensional item response theory (MIRT) and the bi-factor model were combined to estimate subscores, to obtain subscore reliability, and to classify subscores. Both the compensatory and partially compensatory MIRT…
Descriptors: Item Response Theory, Computation, Reliability, Classification
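To make the compensatory versus partially compensatory distinction concrete, the sketch below contrasts the two two-dimensional item response functions in their simplest forms; the parameter values are assumptions for illustration, not estimates from the dissertation.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def compensatory(theta, a, d):
    """Compensatory MIRT: strength on one dimension can offset weakness on another."""
    return logistic(sum(ai * ti for ai, ti in zip(a, theta)) + d)

def partially_compensatory(theta, a, b):
    """Partially compensatory (product) form: adequate ability is needed on every dimension."""
    p = 1.0
    for ai, ti, bi in zip(a, theta, b):
        p *= logistic(ai * (ti - bi))
    return p

theta = (1.5, -1.0)   # strong on dimension 1, weak on dimension 2
print(round(compensatory(theta, a=(1.0, 1.0), d=0.0), 3))
print(round(partially_compensatory(theta, a=(1.0, 1.0), b=(0.0, 0.0)), 3))
```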
Peer reviewed
Direct link
Lee, Jihyun; Corter, James E. – Applied Psychological Measurement, 2011
Diagnosis of misconceptions or "bugs" in procedural skills is difficult because of their unstable nature. This study addresses this problem by proposing and evaluating a probability-based approach to the diagnosis of bugs in children's multicolumn subtraction performance using Bayesian networks. This approach assumes a causal network relating…
Descriptors: Misconceptions, Probability, Children, Subtraction
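A causal network that relates candidate bugs to observed errors needs a conditional probability for each error given which bugs are present; a noisy-OR gate is one common, compact choice. Whether the article uses this particular parameterization is not stated in the snippet above, so the structure and values below are assumptions.

```python
def noisy_or(active_bug_strengths, leak=0.05):
    """P(error) when each present bug independently produces the error with its
    strength, plus a small leak probability for errors with no modeled cause."""
    p_no_error = 1 - leak
    for strength in active_bug_strengths:
        p_no_error *= 1 - strength
    return 1 - p_no_error

bug_strength = {"borrow_across_zero": 0.7, "smaller_from_larger": 0.5}

print(round(noisy_or([bug_strength["borrow_across_zero"],
                      bug_strength["smaller_from_larger"]]), 3))   # both bugs present
print(round(noisy_or([bug_strength["smaller_from_larger"]]), 3))   # one bug present
print(round(noisy_or([]), 3))                                      # leak only
```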
Peer reviewed
Direct link
Maguire, Angela M.; Humphreys, Michael S.; Dennis, Simon; Lee, Michael D. – Journal of Memory and Language, 2010
This paper addresses two Global Matching predictions in embedded-category designs: the within-category choice advantage in forced-choice recognition (superior discrimination for test choices comprising a same-category distractor); and the category length effect in forced-choice and old/new recognition (a loss in discriminability with increases in…
Descriptors: Bayesian Statistics, Models, Prediction, Classification
Kim, Hyun Seok John – ProQuest LLC, 2011
Cognitive diagnostic assessment (CDA) is a new theoretical framework for psychological and educational testing that is designed to provide detailed information about examinees' strengths and weaknesses in specific knowledge structures and processing skills. During the last three decades, more than a dozen psychometric models have been developed…
Descriptors: Cognitive Measurement, Diagnostic Tests, Bayesian Statistics, Statistical Inference
Peer reviewed
Direct link
DeCarlo, Lawrence T. – Applied Psychological Measurement, 2011
Cognitive diagnostic models (CDMs) attempt to uncover latent skills or attributes that examinees must possess in order to answer test items correctly. The DINA (deterministic input, noisy "and") model is a popular CDM that has been widely used. It is shown here that a logistic version of the model can easily be fit with standard software for…
Descriptors: Bayesian Statistics, Computation, Cognitive Tests, Diagnostic Tests
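The DINA model described in this abstract can be written in two equivalent ways, which is the point the paper exploits. The sketch below shows the classical slip/guess parameterization and a logistic form with an intercept plus an effect of the latent "prepared" indicator; the attribute profile, Q-matrix entries, and parameter values are illustrative.

```python
import math

def eta(alpha, q_row):
    """1 if the examinee has every attribute the item requires, else 0 (DINA 'and' gate)."""
    return int(all(a >= q for a, q in zip(alpha, q_row)))

def p_correct_dina(alpha, q_row, slip, guess):
    """Classical DINA parameterization: slip if prepared, guess if not."""
    return (1 - slip) if eta(alpha, q_row) else guess

def p_correct_logistic(alpha, q_row, f_j, d_j):
    """Equivalent logistic form: an intercept plus an effect of eta, which is what
    lets standard logistic / latent-class software fit the model."""
    return 1.0 / (1.0 + math.exp(-(f_j + d_j * eta(alpha, q_row))))

alpha = (1, 0, 1)   # assumed attribute profile
q_row = (1, 0, 1)   # attributes the item requires
print(p_correct_dina(alpha, q_row, slip=0.1, guess=0.2))                  # 0.9
print(round(p_correct_logistic(alpha, q_row, f_j=-1.386, d_j=3.583), 3))  # ~0.9 with matching parameters
```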
Peer reviewed
Direct link
Rudner, Lawrence M. – Practical Assessment, Research & Evaluation, 2009
This paper describes and evaluates the use of measurement decision theory (MDT) to classify examinees based on their item response patterns. The model has a simple framework that starts with the conditional probabilities of examinees in each category or mastery state responding correctly to each item. The presented evaluation investigates: (1) the…
Descriptors: Classification, Scoring, Item Response Theory, Measurement
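The framework this abstract describes is essentially Bayesian classification over mastery states. The sketch below combines assumed priors with per-item conditional probabilities of a correct response under each state and returns the maximum a posteriori classification; states, priors, and probabilities are illustrative values only.

```python
# Measurement decision theory sketch: classify an examinee into a mastery state.
states = ("nonmaster", "partial", "master")
prior = {"nonmaster": 0.3, "partial": 0.4, "master": 0.3}
p_correct = {                                   # P(correct | state) for each of three items
    "nonmaster": (0.20, 0.30, 0.25),
    "partial":   (0.60, 0.50, 0.55),
    "master":    (0.90, 0.85, 0.80),
}

def classify(responses):
    """Return the posterior over states and the MAP classification."""
    post = {}
    for s in states:
        p = prior[s]
        for x, pc in zip(responses, p_correct[s]):
            p *= pc if x else 1 - pc
        post[s] = p
    total = sum(post.values())
    post = {s: p / total for s, p in post.items()}
    return post, max(post, key=post.get)

posterior, decision = classify((1, 1, 0))
print({s: round(p, 3) for s, p in posterior.items()}, decision)
```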
Glas, Cees A. W.; Vos, Hans J. – 2000
This paper focuses on a version of sequential mastery testing (i.e., classifying students as a master/nonmaster or continuing testing and administering another item or testlet) in which response behavior is modeled by a multidimensional item response theory (IRT) model. First, a general theoretical framework is outlined that is based on a…
Descriptors: Adaptive Testing, Bayesian Statistics, Classification, Computer Assisted Testing
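The sequential aspect described in this abstract can be illustrated with a stripped-down version of the procedure: update the posterior probability of mastery after each administered item and either classify or continue testing. The thresholds, item probabilities, and responses below are assumptions, and this simple two-state threshold rule only stands in for the paper's IRT-based, expected-loss decision framework.

```python
items = [            # (P(correct | master), P(correct | nonmaster)) per administered item
    (0.85, 0.30),
    (0.80, 0.35),
    (0.90, 0.25),
    (0.75, 0.40),
]
responses = [1, 1, 0, 1]   # observed correct/incorrect, in administration order

p_master = 0.5             # prior probability of mastery
upper, lower = 0.95, 0.05  # classify once the posterior is decisive, else keep testing

decision = "continue testing / administer another item or testlet"
for (pm, pn), x in zip(items, responses):
    like_master = pm if x else 1 - pm
    like_nonmaster = pn if x else 1 - pn
    joint_master = p_master * like_master
    p_master = joint_master / (joint_master + (1 - p_master) * like_nonmaster)
    print(f"posterior P(master) = {p_master:.3f}")
    if p_master >= upper:
        decision = "master"
        break
    if p_master <= lower:
        decision = "nonmaster"
        break
print("decision:", decision)
```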