ERIC - Search Results

Showing all 8 results

Leveraging LLM Respondents for Item Evaluation: A Psychometric Analysis

Peer reviewed

Yunting Liu; Shreya Bhandari; Zachary A. Pardos – British Journal of Educational Technology, 2025

Effective educational measurement relies heavily on the curation of well-designed item pools. However, item calibration is time consuming and costly, requiring a sufficient number of respondents to estimate the psychometric properties of items. In this study, we explore the potential of six different large language models (LLMs; GPT-3.5, GPT-4,…

Descriptors: Artificial Intelligence, Test Items, Psychometrics, Educational Assessment

Minimum Sample Size Requirements for Mokken Scale Analysis

Peer reviewed

Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas – Educational and Psychological Measurement, 2014

An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…

Descriptors: Sampling, Test Items, Effect Size, Scaling

An Investigation of Measurement Invariance of the Key Stage 2 National Curriculum Science Sampling Test in England

Peer reviewed

He, Qingping; Anwyll, Steve; Glanville, Matthew; Opposs, Dennis – Research Papers in Education, 2014

Since 2010, the whole national cohort Key Stage 2 (KS2) National Curriculum test in science in England has been replaced with a sampling test taken by pupils at the age of 11 from a nationally representative sample of schools annually. The study reported in this paper compares the performance of different subgroups of the samples (classified by…

Descriptors: National Curriculum, Sampling, Foreign Countries, Factor Analysis

Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

Peer reviewed

Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M. – Applied Psychological Measurement, 2011

Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…

Descriptors: Intervals, Item Response Theory, Models, Evaluation Methods

Improving IRT Parameter Estimates with Small Sample Sizes: Evaluating the Efficacy of a New Data Augmentation Technique

Foley, Brett Patrick – ProQuest LLC, 2010

The 3PL model is a flexible and widely used tool in assessment. However, it suffers from limitations due to its need for large sample sizes. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using…

Descriptors: Test Length, Sample Size, Simulation, Item Response Theory

Investigating Effect of Ignoring Hierarchical Data Structures on Accuracy of Vertical Scaling Using Mixed-Effects Rasch Model

Wang, Shudong; Jiao, Hong; Jin, Ying; Thum, Yeow Meng – Online Submission, 2010

The vertical scales of large-scale achievement tests created by using item response theory (IRT) models are mostly based on clustered (or correlated) educational data in which students usually are clustered in certain groups or settings (classrooms or schools). While such application directly violates the assumption of an independent sample of persons in…

Descriptors: Scaling, Achievement Tests, Data Analysis, Item Response Theory

A Graphical Approach to Item Analysis. Research Report. ETS RR-04-10

Peer reviewed

Livingston, Samuel A.; Dorans, Neil J. – ETS Research Report Series, 2004

This paper describes an approach to item analysis that is based on the estimation of a set of response curves for each item. The response curves show, at a glance, the difficulty and the discriminating power of the item and the popularity of each distractor, at any level of the criterion variable (e.g., total score). The curves are estimated by…

Descriptors: Item Analysis, Computation, Difficulty Level, Test Items

Direct Estimation of Correlation as a Measure of Association Strength Using Multidimensional Item Response Models

Peer reviewed

Wang, Wen-Chung – Educational and Psychological Measurement, 2004

The Pearson correlation is used to depict effect sizes in the context of item response theory. A multidimensional Rasch model is used to directly estimate the correlation between latent traits. Monte Carlo simulations were conducted to investigate whether the population correlation could be accurately estimated and whether the bootstrap method…

Descriptors: Test Length, Sampling, Effect Size, Correlation
