Ayse Bilicioglu Gunes; Bayram Bicak – International Journal of Assessment Tools in Education, 2023
The main purpose of this study is to examine the Type I error and statistical power ratios of Differential Item Functioning (DIF) techniques based on different theories under different conditions. For this purpose, a simulation study was conducted by using Mantel-Haenszel (MH), Logistic Regression (LR), Lord's chi-squared, and Raju's Areas…
Descriptors: Test Items, Item Response Theory, Error of Measurement, Test Bias
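The Mantel-Haenszel procedure named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the tuple layout (one 2×2 table per matched total-score level) is a hypothetical convention chosen here:

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio for one studied item.

    `strata` holds (ref_correct, ref_wrong, foc_correct, foc_wrong)
    counts, one tuple per matched total-score level.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den
    # ETS delta scale: negative values flag DIF against the focal group
    return alpha, -2.35 * math.log(alpha)

# Toy data: the reference group slightly outperforms the focal group
# at every matched score level, so alpha > 1 and delta < 0
strata = [(40, 10, 30, 20), (60, 15, 45, 30), (80, 10, 60, 25)]
alpha, mh_delta = mantel_haenszel_dif(strata)
```

LR and Lord's chi-squared operate on fitted models rather than raw 2×2 tables, which is why the simulation contrasts them under varying conditions.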
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
Paek, Insu; Lin, Zhongtian; Chalmers, Robert Philip – Educational and Psychological Measurement, 2023
To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori…
Descriptors: Models, Item Response Theory, Test Items, Intervals
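The role of a prior in the MML-EM item updates described above can be illustrated with a penalized likelihood for the 3PL pseudo-guessing parameter. The Beta(5, 17) hyperparameters (mode near 0.2) and this parameterization are illustrative assumptions, not the authors' exact setup:

```python
import math

def three_pl(theta, a, b, c):
    # 3PL item response function: guessing floor c plus a 2PL curve
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def penalized_nll(c, responses, thetas, a, b, alpha=5, beta=17):
    """Negative log-likelihood in c with a Beta(alpha, beta) log-prior
    added; maximizing the penalized likelihood gives a MAP-style update
    that keeps c away from boundary values that cause Heywood-like
    instability."""
    ll = sum(math.log(p) if y else math.log(1 - p)
             for y, th in zip(responses, thetas)
             for p in [three_pl(th, a, b, c)])
    log_prior = (alpha - 1) * math.log(c) + (beta - 1) * math.log(1 - c)
    return -(ll + log_prior)
```

With no data at all, the penalized objective is minimized at the prior mode (4/20 = 0.2), which is exactly the stabilizing pull the abstract refers to.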
Haimiao Yuan – ProQuest LLC, 2022
The application of diagnostic classification models (DCMs) in educational measurement has received growing attention in recent years. To draw valid inferences from a model, it is important to ensure that the model fits the data. The purpose of the present study was to investigate the performance of the limited information…
Descriptors: Goodness of Fit, Educational Assessment, Educational Diagnosis, Models
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
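The weighting idea behind the IWCC method can be sketched as a Haebara-type criterion in which squared ICC differences are down-weighted where an item was estimated imprecisely. The 2PL form, the information weight, and the grid below are simplifying assumptions, not the published estimator:

```python
import math

def p2(theta, a, b):
    # 2PL item characteristic curve
    return 1 / (1 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    # Fisher information of a 2PL item at theta
    p = p2(theta, a, b)
    return a * a * p * (1 - p)

def iwcc_loss(A, B, new_items, old_items, grid):
    """Item-information-weighted characteristic-curve loss (sketch).

    New-form parameters (a, b) are placed on the base scale via
    a/A and A*b + B, then compared with the base-form anchor curves;
    each squared difference is weighted by the new item's information.
    """
    loss = 0.0
    for (a_n, b_n), (a_o, b_o) in zip(new_items, old_items):
        for th in grid:
            w = info(th, a_n, b_n)
            d = p2(th, a_n / A, A * b_n + B) - p2(th, a_o, b_o)
            loss += w * d * d
    return loss
```

A grid search (or any 2-D optimizer) over the linking constants A and B would minimize this loss; when the two calibrations already agree, the minimum sits at A = 1, B = 0.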
Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…
Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
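The observed-score family mentioned above includes the standardized proportion-correct index, which is simple enough to sketch directly; the tuple layout per matched score level is a convention chosen here for illustration:

```python
def std_p_dif(strata):
    """Standardized proportion-correct difference (STD P-DIF, sketch).

    `strata` holds (focal_n, focal_p, ref_p) per matched score level:
    focal-minus-reference p-value differences, weighted by the focal
    group's count at each level.
    """
    num = sum(nf * (pf - pr) for nf, pf, pr in strata)
    den = sum(nf for nf, pf, pr in strata)
    return num / den

# Focal group scores 0.1 lower at both levels -> index of -0.1
effect = std_p_dif([(50, 0.6, 0.7), (100, 0.5, 0.6)])
```

Latent-ability statistics, by contrast, measure departure between fitted item response curves rather than between observed proportions.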
Precision of Single-Skill Math CBM Time-Series Data: The Effect of Probe Stratification and Set Size
Solomon, Benjamin G.; Payne, Lexy L.; Campana, Kayla V.; Marr, Erin A.; Battista, Carmela; Silva, Alex; Dawes, Jillian M. – Journal of Psychoeducational Assessment, 2020
Comparatively little research exists on single-skill math (SSM) curriculum-based measurements (CBMs) for the purpose of monitoring growth, as may be done in practice or when monitoring intervention effectiveness within group or single-case research. Therefore, we examined a common variant of SSM-CBM: 1 digit × 1 digit multiplication. Reflecting…
Descriptors: Curriculum Based Assessment, Mathematics Tests, Mathematics Skills, Multiplication
Patton, Jeffrey M.; Cheng, Ying; Hong, Maxwell; Diao, Qi – Journal of Educational and Behavioral Statistics, 2019
In psychological and survey research, the prevalence and serious consequences of careless responses from unmotivated participants are well known. In this study, we propose to iteratively detect careless responders and cleanse the data by removing their responses. The careless responders are detected using person-fit statistics. In two simulation…
Descriptors: Test Items, Response Style (Tests), Identification, Computation
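A standard person-fit statistic of the kind used for careless-responder detection is the standardized log-likelihood l_z; a minimal version, assuming model-implied correct-response probabilities are already available, looks like this:

```python
import math

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z.

    `probs` are model-implied probabilities of a correct response for
    one examinee; large negative values flag aberrant (e.g. careless)
    response patterns.
    """
    l0 = sum(math.log(p) if y else math.log(1 - p)
             for y, p in zip(responses, probs))
    mean = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    var = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - mean) / math.sqrt(var)
```

The iterative cleansing the authors propose would flag examinees below some l_z cutoff, drop them, re-estimate the model, and repeat until no new flags appear.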
Kilic, Abdullah Faruk; Dogan, Nuri – International Journal of Assessment Tools in Education, 2021
Weighted least squares (WLS), weighted least squares mean-and-variance-adjusted (WLSMV), unweighted least squares mean-and-variance-adjusted (ULSMV), maximum likelihood (ML), robust maximum likelihood (MLR) and Bayesian estimation methods were compared in mixed item response type data via Monte Carlo simulation. The percentage of polytomous items,…
Descriptors: Factor Analysis, Computation, Least Squares Statistics, Maximum Likelihood Statistics
Manna, Venessa F.; Gu, Lixiong – ETS Research Report Series, 2019
When using the Rasch model, equating with a nonequivalent groups anchor test design is commonly achieved by adjustment of new form item difficulty using an additive equating constant. Using simulated 5-year data, this report compares 4 approaches to calculating the equating constants and the subsequent impact on equating results. The 4 approaches…
Descriptors: Item Response Theory, Test Items, Test Construction, Sample Size
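The simplest of the additive-constant approaches the report compares is the mean-difference method, which can be sketched in full; the toy anchor values are hypothetical:

```python
def mean_difference_constant(old_anchor_b, new_anchor_b):
    """Additive Rasch equating constant via the mean-difference method:
    the average gap between base-scale and new-form anchor difficulties."""
    return (sum(old_anchor_b) - sum(new_anchor_b)) / len(old_anchor_b)

# Same anchor items, calibrated on the base form and on the new form
old_b = [-0.5, 0.0, 0.8, 1.2]
new_b = [-0.7, -0.2, 0.6, 1.0]
c = mean_difference_constant(old_b, new_b)
adjusted = [b + c for b in new_b]   # new-form difficulties on the base scale
```

Alternatives such as mean/sigma or characteristic-curve methods differ in how they pool the anchor information, which is what drives the differences in equating results over the simulated 5-year chain.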
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high a items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
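The stratification idea behind the STR strategy can be sketched as follows; the dict layout and helper names are hypothetical, and a real USTR design would additionally vary how many items each stratum contributes:

```python
def a_stratified_pool(items, n_strata):
    """Split an item pool into strata by ascending discrimination (a).

    Early CAT stages select only from low-a strata, conserving the
    high-a items for later stages when the ability estimate is stable.
    """
    ranked = sorted(items, key=lambda it: it["a"])
    size = len(ranked) // n_strata
    return [ranked[i * size:(i + 1) * size] for i in range(n_strata)]

def pick_item(stratum, theta, used):
    # Within the active stratum, choose the unused item whose
    # difficulty is closest to the current ability estimate.
    return min((it for it in stratum if it["id"] not in used),
               key=lambda it: abs(it["b"] - theta))

pool = [{"id": i, "a": a, "b": b} for i, (a, b) in
        enumerate([(0.5, -1.0), (0.8, 0.2), (1.1, 0.5),
                   (1.4, -0.3), (1.7, 1.0), (2.0, 0.0)])]
strata = a_stratified_pool(pool, 3)
first = pick_item(strata[0], 0.0, used=set())
```

The refinement studied here (USTR) relaxes the equal-exposure constraint across strata, which is what recovers the lost precision at later test stages.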
Wang, Keyin – ProQuest LLC, 2017
The comparison of item-level computerized adaptive testing (CAT) and multistage adaptive testing (MST) has been researched extensively (e.g., Kim & Plake, 1993; Luecht et al., 1996; Patsula, 1999; Jodoin, 2003; Hambleton & Xing, 2006; Keng, 2008; Zheng, 2012). Various CAT and MST designs have been investigated and compared under the same…
Descriptors: Comparative Analysis, Computer Assisted Testing, Adaptive Testing, Test Items
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores