Publication Date

| Period | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 9 |
| Since 2017 (last 10 years) | 26 |
| Since 2007 (last 20 years) | 73 |
Descriptor

| Descriptor | Count |
| --- | --- |
| Computation | 82 |
| Difficulty Level | 82 |
| Test Items | 82 |
| Item Response Theory | 49 |
| Models | 19 |
| Comparative Analysis | 18 |
| Statistical Analysis | 18 |
| Accuracy | 12 |
| Mathematics Tests | 12 |
| Sample Size | 12 |
| Simulation | 12 |
Author

| Author | Count |
| --- | --- |
| Finch, Holmes | 3 |
| Guo, Hongwen | 3 |
| He, Wei | 3 |
| Ketterlin-Geller, Leanne R. | 3 |
| Liu, Kimy | 3 |
| Tindal, Gerald | 3 |
| Jiao, Hong | 2 |
| Luke G. Eglington | 2 |
| Matlock, Ki Lynn | 2 |
| Michaelides, Michalis P. | 2 |
| Nelson, Gena | 2 |
Publication Type

| Publication Type | Count |
| --- | --- |
| Journal Articles | 63 |
| Reports - Research | 60 |
| Dissertations/Theses -… | 8 |
| Reports - Descriptive | 8 |
| Reports - Evaluative | 6 |
| Numerical/Quantitative Data | 3 |
| Speeches/Meeting Papers | 3 |
| Tests/Questionnaires | 3 |
Education Level

| Education Level | Count |
| --- | --- |
| Elementary Education | 10 |
| Middle Schools | 8 |
| Secondary Education | 8 |
| Higher Education | 7 |
| Postsecondary Education | 7 |
| Grade 3 | 6 |
| Grade 8 | 6 |
| Grade 4 | 5 |
| Grade 5 | 5 |
| Junior High Schools | 5 |
| Elementary Secondary Education | 4 |
Location

| Location | Count |
| --- | --- |
| Turkey | 3 |
| Indonesia | 2 |
| Belgium | 1 |
| Florida | 1 |
| Germany | 1 |
| India | 1 |
| Malaysia | 1 |
| New York | 1 |
| Oregon | 1 |
| Saudi Arabia | 1 |
| United Kingdom | 1 |
Assessments and Surveys

| Assessment/Survey | Count |
| --- | --- |
| Wide Range Achievement Test | 2 |
| Comprehensive Tests of Basic… | 1 |
| General Aptitude Test Battery | 1 |
| Graduate Record Examinations | 1 |
| Measures of Academic Progress | 1 |
| National Assessment of… | 1 |
Leonidas Zotos; Hedderik van Rijn; Malvina Nissim – International Educational Data Mining Society, 2025
In an educational setting, an estimate of the difficulty of Multiple-Choice Questions (MCQs), a commonly used strategy to assess learning progress, constitutes very useful information for both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation are…
Descriptors: Multiple Choice Tests, Test Items, Difficulty Level, Artificial Intelligence
Sarah Alahmadi; Christine E. DeMars – Journal of Educational Measurement, 2025
Inadequate test-taking effort poses a significant challenge, particularly when low-stakes test results inform high-stakes policy and psychometric decisions. We examined how rapid guessing (RG), a common form of low test-taking effort, biases item parameter estimates, particularly the discrimination and difficulty parameters. Previous research…
Descriptors: Guessing (Tests), Computation, Statistical Bias, Test Items
Aiman Mohammad Freihat; Omar Saleh Bani Yassin – Educational Process: International Journal, 2025
Background/purpose: This study aimed to reveal the accuracy of estimation of multiple-choice test items parameters following the models of the item-response theory in measurement. Materials/methods: The researchers depended on the measurement accuracy indicators, which express the absolute difference between the estimated and actual values of the…
Descriptors: Accuracy, Computation, Multiple Choice Tests, Test Items
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2023
Traditional estimators of reliability such as coefficients alpha, theta, omega, and rho (maximal reliability) are prone to give radical underestimates of reliability for the tests common when testing educational achievement. These tests are often structured by widely deviating item difficulties. This is a typical pattern where the traditional…
Descriptors: Test Reliability, Achievement Tests, Computation, Test Items
Ali Orhan; Inan Tekin; Sedat Sen – International Journal of Assessment Tools in Education, 2025
In this study, it was aimed to translate and adapt the Computational Thinking Multidimensional Test (CTMT) developed by Kang et al. (2023) into Turkish and to investigate its psychometric qualities with Turkish university students. Following the translation procedures of the CTMT with 12 multiple-choice questions developed based on real-life…
Descriptors: Cognitive Tests, Thinking Skills, Computation, Test Validity
Peer reviewed
Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2024
Assessing the difficulty of reading comprehension questions is crucial to educational methodologies and language understanding technologies. Traditional methods of assessing question difficulty rely frequently on human judgments or shallow metrics, often failing to accurately capture the intricate cognitive demands of answering a question. This…
Descriptors: Difficulty Level, Reading Tests, Test Items, Reading Comprehension
Sample Size and Item Parameter Estimation Precision When Utilizing the Masters' Partial Credit Model
Custer, Michael; Kim, Jongpil – Online Submission, 2023
This study utilizes an analysis of diminishing returns to examine the relationship between sample size and item parameter estimation precision when utilizing the Masters' Partial Credit Model for polytomous items. Item data from the standardization of the Batelle Developmental Inventory, 3rd Edition were used. Each item was scored with a…
Descriptors: Sample Size, Item Response Theory, Test Items, Computation
Tang, Xiaodan; Karabatsos, George; Chen, Haiqin – Applied Measurement in Education, 2020
In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses…
Descriptors: Item Response Theory, Test Items, Models, Computation
DeCarlo, Lawrence T. – Journal of Educational Measurement, 2023
A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a "true split," depending on whether or not examinees know an item,…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Test Wiseness
Derek Sauder – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021
In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability…
Descriptors: Scaling, Ability, Computation, Test Items
Munawarah; Thalhah, Siti Zuhaerah; Angriani, Andi Dian; Nur, Fitriani; Kusumayanti, Andi – Mathematics Teaching Research Journal, 2021
The increase in the need for critical and analytical thinking among students to boost their confidence in dealing with complex and difficult problems has led to the development of computational skills. Therefore, this study aims to develop an instrument test for computational thinking (CT) skills in the mathematics-based RME (Realistic Mathematics…
Descriptors: Test Construction, Mathematics Tests, Computation, Thinking Skills
Luke G. Eglington; Philip I. Pavlik – Grantee Submission, 2020
Decades of research has shown that spacing practice trials over time can improve later memory, but there are few concrete recommendations concerning how to optimally space practice. We show that existing recommendations are inherently suboptimal due to their insensitivity to time costs and individual- and item-level differences. We introduce an…
Descriptors: Scheduling, Drills (Practice), Memory, Testing
Luke G. Eglington; Philip I. Pavlik Jr. – npj Science of Learning, 2020
Decades of research has shown that spacing practice trials over time can improve later memory, but there are few concrete recommendations concerning how to optimally space practice. We show that existing recommendations are inherently suboptimal due to their insensitivity to time costs and individual- and item-level differences. We introduce an…
Descriptors: Scheduling, Drills (Practice), Memory, Testing

