Showing 1 to 15 of 64 results
Peer reviewed
Jiang, Zhehan; Han, Yuting; Xu, Lingling; Shi, Dexin; Liu, Ren; Ouyang, Jinying; Cai, Fen – Educational and Psychological Measurement, 2023
The responses that are absent in the nonequivalent groups with anchor test (NEAT) design can be treated as a planned missing data scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven…
Descriptors: Test Items, Equated Scores, Sample Size, Artificial Intelligence
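Chaining random forests belongs to the family of MICE-style conditional imputation with tree learners. A minimal sketch of that general idea on a toy NEAT layout, using scikit-learn's IterativeImputer with a random-forest estimator (an approximation of the approach, not the authors' CRF implementation; the item layout is invented for illustration):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Toy NEAT layout: 10 items, items 4-7 form the anchor seen by both groups
scores = rng.integers(0, 2, size=(120, 10)).astype(float)
scores[:60, 7:] = np.nan    # group 1 was never administered items 8-10
scores[60:, :3] = np.nan    # group 2 was never administered items 1-3

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=3,
    random_state=0,
)
completed = imputer.fit_transform(scores)   # planned-missing cells filled in
```

Each item with missing values is modeled from the remaining items in turn, so the anchor items carry information across the two groups.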
Peer reviewed
Fellinghauer, Carolina; Debelak, Rudolf; Strobl, Carolin – Educational and Psychological Measurement, 2023
This simulation study investigated to what extent departures from construct similarity as well as differences in the difficulty and targeting of scales impact the score transformation when scales are equated by means of concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation…
Descriptors: True Scores, Equated Scores, Test Items, Sample Size
Peer reviewed
Edwards, Ashley A.; Joyner, Keanan J.; Schatschneider, Christopher – Educational and Psychological Measurement, 2021
The accuracy of certain internal consistency estimators has been questioned in recent years. The present study tests the accuracy of six reliability estimators (Cronbach's alpha, omega, omega hierarchical, Revelle's omega, and greatest lower bound) in 140 simulated conditions of unidimensional continuous data with uncorrelated errors with varying…
Descriptors: Reliability, Computation, Accuracy, Sample Size
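For reference, Cronbach's alpha, the most familiar of these estimators, can be computed directly from an item-score matrix; a minimal sketch:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

# Perfectly parallel items (identical columns) give alpha = 1
alpha = cronbach_alpha(np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
```

The omega and GLB estimators in the study instead require a fitted factor or covariance model, which is why their small-sample behavior differs.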
Peer reviewed
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ₂, λ₄, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
Peer reviewed
PDF on ERIC
Ayse Bilicioglu Gunes; Bayram Bicak – International Journal of Assessment Tools in Education, 2023
The main purpose of this study is to examine the Type I error and statistical power rates of Differential Item Functioning (DIF) techniques based on different theories under different conditions. For this purpose, a simulation study was conducted using the Mantel-Haenszel (MH), Logistic Regression (LR), Lord's χ², and Raju's Areas…
Descriptors: Test Items, Item Response Theory, Error of Measurement, Test Bias
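The Mantel-Haenszel procedure pools 2×2 (group × correct/incorrect) tables across matched score strata into a common odds ratio; an odds ratio far from 1 flags DIF. A minimal sketch of that statistic (variable names and the toy data are illustrative):

```python
import numpy as np

def mantel_haenszel_or(correct, group, strata):
    """MH common odds ratio for one studied item.
    correct: 0/1 responses; group: 0 = reference, 1 = focal;
    strata: matching variable (e.g., rest score)."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum(m & (group == 0) & (correct == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (correct == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (correct == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (correct == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den

# Single stratum: reference 20/30 correct, focal 10/30 correct
group = np.array([0] * 30 + [1] * 30)
correct = np.array([1] * 20 + [0] * 10 + [1] * 10 + [0] * 20)
strata = np.zeros(60, dtype=int)
or_hat = mantel_haenszel_or(correct, group, strata)  # (20*20/60)/(10*10/60) = 4
```

In practice the strata come from the matching criterion (total or rest score), and the odds ratio is often reported on the ETS delta scale, -2.35·ln(OR).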
Peer reviewed
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
Peer reviewed
Paek, Insu; Lin, Zhongtian; Chalmers, Robert Philip – Educational and Psychological Measurement, 2023
To reduce the chance of Heywood cases or nonconvergence when estimating the 2PL or 3PL model with marginal maximum likelihood via expectation-maximization (MML-EM), priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used, and the marginal maximum a posteriori…
Descriptors: Models, Item Response Theory, Test Items, Intervals
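The role of the prior is to penalize the likelihood so that extreme slope estimates become improbable. A simplified sketch for a single 2PL item with a lognormal prior on the slope, written as a penalized (MAP-style) objective rather than the full MML-EM machinery the abstract refers to (the prior hyperparameters are illustrative):

```python
import numpy as np

def logit2pl(theta, a, b):
    """2PL response probability: P(correct) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def neg_log_posterior(params, theta, y, mu=0.0, sigma=0.5):
    """Negative log-likelihood of one 2PL item plus a lognormal(mu, sigma)
    prior on the slope a, which pulls runaway slopes back toward exp(mu)."""
    a, b = params
    p = logit2pl(theta, a, b)
    eps = 1e-10                                   # guard against log(0)
    nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    log_prior = (-((np.log(a) - mu) ** 2) / (2 * sigma ** 2)
                 - np.log(a * sigma * np.sqrt(2 * np.pi)))
    return nll - log_prior
```

Minimizing this objective over (a, b) gives a MAP estimate; without the prior term it reduces to plain maximum likelihood, which is where Heywood-type extreme slopes can occur.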
Peer reviewed
Kim, Hyung Jin; Lee, Won-Chan – Journal of Educational Measurement, 2022
Orlando and Thissen (2000) introduced the "S-X²" item-fit index for testing goodness-of-fit with dichotomous item response theory (IRT) models. This study considers and evaluates an alternative approach for computing "S-X²" values and other factors associated with collapsing tables of observed…
Descriptors: Goodness of Fit, Test Items, Item Response Theory, Computation
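The S-X² family compares observed and model-implied proportions correct within total-score groups, collapsing sparse groups before forming a Pearson statistic; how the tables are collapsed is exactly the factor this study examines. A generic sketch of that collapse-and-compare logic (not Orlando and Thissen's exact algorithm):

```python
def item_fit_chi2(n_k, obs_k, exp_k, min_count=5.0):
    """Pearson-type item-fit statistic over total-score groups.
    n_k: examinees per score group; obs_k / exp_k: observed and
    model-implied proportions correct.  Adjacent groups are collapsed
    until each expected cell count reaches min_count."""
    groups, cur = [], [0.0, 0.0, 0.0]   # [n, observed correct, expected correct]
    for nk, ok, ek in zip(n_k, obs_k, exp_k):
        cur = [cur[0] + nk, cur[1] + ok * nk, cur[2] + ek * nk]
        if cur[2] >= min_count and cur[0] - cur[2] >= min_count:
            groups.append(cur)
            cur = [0.0, 0.0, 0.0]
    if cur[0] > 0 and groups:           # fold any leftover into the last group
        groups[-1] = [g + c for g, c in zip(groups[-1], cur)]
    chi2 = sum((o - e) ** 2 * (1 / e + 1 / (n - e)) for n, o, e in groups)
    return chi2, len(groups)

# Perfect fit (observed == expected) yields a statistic of 0
chi2, n_groups = item_fit_chi2(
    n_k=[20, 20, 20, 20, 20],
    obs_k=[0.20, 0.40, 0.50, 0.60, 0.80],
    exp_k=[0.20, 0.40, 0.50, 0.60, 0.80],
)
```

Different collapsing rules change both the statistic's value and its degrees of freedom, which is why alternative computations can lead to different fit decisions.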
Peer reviewed
PDF on ERIC
Basman, Munevver – International Journal of Assessment Tools in Education, 2023
Ensuring the validity of a test requires checking that all items yield similar results across different groups of individuals. However, differential item functioning (DIF) occurs when individuals with equal ability levels from different groups perform differently on the same test item. Based on Item Response Theory and Classic Test…
Descriptors: Test Bias, Test Items, Test Validity, Item Response Theory
Peer reviewed
Yu, Albert; Douglas, Jeffrey A. – Journal of Educational and Behavioral Statistics, 2023
We propose a new item response theory growth model with item-specific learning parameters, or ISLP, and two variations of this model. In the ISLP model, either items or blocks of items have their own learning parameters. This model may be used to improve the efficiency of learning in a formative assessment. We show ways that the ISLP model's…
Descriptors: Item Response Theory, Learning, Markov Processes, Monte Carlo Methods
Peer reviewed
Sedat Sen; Allan S. Cohen – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
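Several of the indices compared here are simple functions of the maximized log-likelihood, the number of free parameters, and the sample size; a sketch of the standard formulas for four of them:

```python
import numpy as np

def fit_indices(loglik: float, k: int, n: int) -> dict:
    """AIC, corrected AIC, BIC, and consistent AIC from a model's
    log-likelihood, parameter count k, and sample size n."""
    aic = -2 * loglik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)     # small-sample correction
    bic = -2 * loglik + k * np.log(n)
    caic = -2 * loglik + k * (np.log(n) + 1)       # BIC with an extra k penalty
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

ix = fit_indices(loglik=-100.0, k=5, n=100)
```

In mixture IRT model selection, the model (number of latent classes) minimizing the chosen index is retained; BIC-type indices penalize extra classes more heavily than AIC as n grows.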
Peer reviewed
Fu, Yanyan; Strachan, Tyler; Ip, Edward H.; Willse, John T.; Chen, Shyh-Huei; Ackerman, Terry – International Journal of Testing, 2020
This research examined correlation estimates between latent abilities when using the two-dimensional and three-dimensional compensatory and noncompensatory item response theory models. Simulation study results showed that the recovery of the latent correlation was best when the test contained 100% of simple structure items for all models and…
Descriptors: Item Response Theory, Models, Test Items, Simulation
Peer reviewed
Su, Shiyang; Wang, Chun; Weiss, David J. – Educational and Psychological Measurement, 2021
S-X² is a popular item-fit index that is available in commercial software packages such as flexMIRT. However, no research has systematically examined the performance of S-X² for detecting item misfit within the context of the multidimensional graded response model (MGRM). The primary goal of this study was…
Descriptors: Statistics, Goodness of Fit, Test Items, Models
Derek Sauder – ProQuest LLC, 2020
The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters.…
Descriptors: Item Response Theory, Sample Size, Computation, Test Length
Peer reviewed
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
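Characteristic curve linking chooses constants A and B so that, after rescaling, the two forms' test characteristic curves (TCCs) agree as closely as possible. A minimal Stocking-Lord-style sketch with uniform weights (the IWCC/TWCC methods proposed here would replace these with item- or test-information-based weights; the anchor parameters below are invented):

```python
import numpy as np
from scipy.optimize import minimize

def tcc(theta, a, b):
    """Test characteristic curve for a set of 2PL anchor items."""
    return (1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))).sum(axis=1)

def linking_loss(AB, a_new, b_new, a_old, b_old, theta, w):
    """Weighted squared distance between TCCs after putting the new-form
    parameters on the old scale via theta_old = A*theta_new + B."""
    A, B = AB
    return np.sum(w * (tcc(theta, a_old, b_old)
                       - tcc(theta, a_new / A, A * b_new + B)) ** 2)

# Recover a known transformation (A = 1.2, B = 0.5) from anchor parameters
a_old = np.array([1.0, 1.5, 0.8])
b_old = np.array([-1.0, 0.0, 1.0])
a_new, b_new = a_old * 1.2, (b_old - 0.5) / 1.2
theta = np.linspace(-4, 4, 41)
w = np.ones_like(theta)          # information weighting would go here
res = minimize(linking_loss, x0=[1.0, 0.0],
               args=(a_new, b_new, a_old, b_old, theta, w))
A_hat, B_hat = res.x
```

Down-weighting poorly estimated items or ability regions with little test information is what shields the linking constants from parameter estimation error.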