Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 6 |
Since 2006 (last 20 years) | 18 |
Descriptor
Sample Size | 26 |
Simulation | 22 |
Item Response Theory | 13 |
Test Items | 12 |
Models | 9 |
Error of Measurement | 7 |
Comparative Analysis | 6 |
Correlation | 5 |
Evaluation Methods | 5 |
Item Analysis | 5 |
Statistical Analysis | 5 |
More ▼ |
Source
Journal of Educational… | 26 |
Author
Suh, Youngsuk | 3 |
Cho, Sun-Joo | 2 |
Hambleton, Ronald K. | 2 |
Roussos, Louis A. | 2 |
de la Torre, Jimmy | 2 |
Andersson, Björn | 1 |
Chen, Ping | 1 |
Chen, Shu-Ying | 1 |
Chon, Kyong Hee | 1 |
Clauser, Brian | 1 |
Clauser, Brian E. | 1 |
More ▼ |
Publication Type
Journal Articles | 26 |
Reports - Research | 17 |
Reports - Evaluative | 9 |
Speeches/Meeting Papers | 3 |
Education Level
Audience
Location
Laws, Policies, & Programs
Assessments and Surveys
Iowa Tests of Basic Skills | 1 |
What Works Clearinghouse Rating
Yuan, Lu; Huang, Yingshi; Li, Shuhang; Chen, Ping – Journal of Educational Measurement, 2023
Online calibration is a key technology for item calibration in computerized adaptive testing (CAT) and has been widely used in various forms of CAT, including unidimensional CAT, multidimensional CAT (MCAT), CAT with polytomously scored items, and cognitive diagnostic CAT. However, as multidimensional and polytomous assessment data become more…
Descriptors: Computer Assisted Testing, Adaptive Testing, Computation, Test Items
Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019
The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…
Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation
Lee, Woo-yeol; Cho, Sun-Joo – Journal of Educational Measurement, 2017
Cross-level invariance in a multilevel item response model can be investigated by testing whether the within-level item discriminations are equal to the between-level item discriminations. Testing the cross-level invariance assumption is important to understand constructs in multilevel data. However, in most multilevel item response model…
Descriptors: Test Items, Item Response Theory, Item Analysis, Simulation
Lee, Soo; Suh, Youngsuk – Journal of Educational Measurement, 2018
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Descriptors: Item Response Theory, Sample Size, Models, Error of Measurement
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
Suh, Youngsuk – Journal of Educational Measurement, 2016
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Descriptors: Effect Size, Goodness of Fit, Statistical Analysis, Statistical Significance
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K. – Journal of Educational Measurement, 2014
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Descriptors: Item Response Theory, Measurement Techniques, Nonparametric Statistics, Models
Schroeders, Ulrich; Robitzsch, Alexander; Schipolowski, Stefan – Journal of Educational Measurement, 2014
C-tests are a specific variant of cloze tests that are considered time-efficient, valid indicators of general language proficiency. They are commonly analyzed with models of item response theory assuming local item independence. In this article we estimated local interdependencies for 12 C-tests and compared the changes in item difficulties,…
Descriptors: Comparative Analysis, Psychometrics, Cloze Procedure, Language Tests
de la Torre, Jimmy; Lee, Young-Sun – Journal of Educational Measurement, 2013
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
Descriptors: Statistical Analysis, Test Items, Goodness of Fit, Error of Measurement
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameter estimates of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five-item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory
de la Torre, Jimmy; Hong, Yuan; Deng, Weiling – Journal of Educational Measurement, 2010
To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…
Descriptors: Classification, Computation, Models, Simulation
Cui, Zhongmin; Kolen, Michael J. – Journal of Educational Measurement, 2009
This article considers two new smoothing methods in equipercentile equating, the cubic B-spline presmoothing method and the direct presmoothing method. Using a simulation study, these two methods are compared with established methods, the beta-4 method, the polynomial loglinear method, and the cubic spline postsmoothing method, under three sample…
Descriptors: Equated Scores, Methods, Sample Size, Test Content
Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010
In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G[superscript 2], Orlando and Thissen's S-X[superscript 2] and S-G[superscript 2], and Stone's chi[superscript 2*] and G[superscript 2*]. To investigate the…
Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation
Puhan, Gautam; Moses, Timothy P.; Yu, Lei; Dorans, Neil J. – Journal of Educational Measurement, 2009
This study examined the extent to which log-linear smoothing could improve the accuracy of differential item functioning (DIF) estimates in small samples of examinees. Examinee responses from a certification test were analyzed using White examinees in the reference group and African American examinees in the focal group. Using a simulation…
Descriptors: Test Items, Reference Groups, Testing Programs, Raw Scores
Previous Page | Next Page »
Pages: 1 | 2