ERIC - Search Results

Showing 1 to 15 of 26 results

Online Calibration in Multidimensional Computerized Adaptive Testing with Polytomously Scored Items

Peer reviewed

Yuan, Lu; Huang, Yingshi; Li, Shuhang; Chen, Ping – Journal of Educational Measurement, 2023

Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more…

Descriptors: Computer Assisted Testing, Adaptive Testing, Computation, Test Items

Standard Errors of IRT Parameter Scale Transformation Coefficients: Comparison of Bootstrap Method, Delta Method, and Multiple Imputation Method

Peer reviewed

Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019

The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…

Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation

Detecting Differential Item Discrimination (DID) and the Consequences of Ignoring DID in Multilevel Item Response Models

Peer reviewed

Lee, Woo-yeol; Cho, Sun-Joo – Journal of Educational Measurement, 2017

Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…

Descriptors: Test Items, Item Response Theory, Item Analysis, Simulation

Lord's Wald Test for Detecting DIF in Multidimensional IRT Models: A Comparison of Two Estimation Approaches

Peer reviewed

Lee, Soo; Suh, Youngsuk – Journal of Educational Measurement, 2018

Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…

Descriptors: Item Response Theory, Sample Size, Models, Error of Measurement

Asymptotic Standard Errors of Observed-Score Equating with Polytomous IRT Models

Peer reviewed

Andersson, Björn – Journal of Educational Measurement, 2016

In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…

Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests

Effect Size Measures for Differential Item Functioning in a Multidimensional IRT Model

Peer reviewed

Suh, Youngsuk – Journal of Educational Measurement, 2016

This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…

Descriptors: Effect Size, Goodness of Fit, Statistical Analysis, Statistical Significance

Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning

Peer reviewed

Li, Zhushan – Journal of Educational Measurement, 2014

Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…

Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)

An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models

Peer reviewed

Liang, Tie; Wells, Craig S.; Hambleton, Ronald K. – Journal of Educational Measurement, 2014

As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…

Descriptors: Item Response Theory, Measurement Techniques, Nonparametric Statistics, Models

A Comparison of Different Psychometric Approaches to Modeling Testlet Structures: An Example with C-Tests

Peer reviewed

Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan – Journal of Educational Measurement, 2014

C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…

Descriptors: Comparative Analysis, Psychometrics, Cloze Procedure, Language Tests

Evaluating the Wald Test for Item-Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

Peer reviewed

de la Torre, Jimmy; Lee, Young-Sun – Journal of Educational Measurement, 2013

This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…

Descriptors: Statistical Analysis, Test Items, Goodness of Fit, Error of Measurement

A Comparison of Item Calibration Procedures in the Presence of Test Speededness

Peer reviewed

Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012

In the presence of test speededness, the parameter estimates of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…

Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory

Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model

Peer reviewed

de la Torre, Jimmy; Hong, Yuan; Deng, Weiling – Journal of Educational Measurement, 2010

To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…

Descriptors: Classification, Computation, Models, Simulation

Evaluation of Two New Smoothing Methods in Equating: The Cubic B-Spline Presmoothing Method and the Direct Presmoothing Method

Peer reviewed

Cui, Zhongmin; Kolen, Michael J. – Journal of Educational Measurement, 2009

This article considers two new smoothing methods in equipercentile equating, the cubic B-spline presmoothing method and the direct presmoothing method. Using a simulation study, these two methods are compared with established methods, the beta-4 method, the polynomial loglinear method, and the cubic spline postsmoothing method, under three sample…

Descriptors: Equated Scores, Methods, Sample Size, Test Content

A Comparison of Item Fit Statistics for Mixed IRT Models

Peer reviewed

Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010

In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…

Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation

Using Log-Linear Smoothing to Improve Small-Sample DIF Estimation

Peer reviewed

Puhan, Gautam; Moses, Timothy P.; Yu, Lei; Dorans, Neil J. – Journal of Educational Measurement, 2009

This study examined the extent to which log-linear smoothing could improve the accuracy of differential item functioning (DIF) estimates in small samples of examinees. Examinee responses from a certification test were analyzed using White examinees in the reference group and African American examinees in the focal group. Using a simulation…

Descriptors: Test Items, Reference Groups, Testing Programs, Raw Scores
