Peer reviewed
ERIC Number: ED600826
Record Type: Non-Journal
Publication Date: 2016-May-10
Pages: 47
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Limited-Information Goodness-of-Fit Testing of Diagnostic Classification Item Response Models
Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen
Grantee Submission
Despite the growing popularity of diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and realistic sample sizes, full-information test statistics such as Pearson's X^2 and the likelihood ratio statistic G^2 suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006) M_2 have been found to be quite useful in testing the overall goodness of fit of item response theory (IRT) models. In this study, we applied Maydeu-Olivares and Joe's (2006) M_2 statistic to diagnostic classification models. Through a series of simulation studies, we found that M_2 was well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodeled testlet effects. On the other hand, M_2 was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of overall model goodness of fit using M_2, we investigated the utility of the Chen and Thissen (1997) local dependence statistic X^2_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favor of overall statements. The X^2_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising from specific model misspecifications are illustrated. Finally, we used the M_2 and X^2_LD statistics to evaluate a diagnostic model fit to data from the Trends in International Mathematics and Science Study (TIMSS), drawing upon analyses previously conducted by Lee, Park, and Taylan (2011). [This paper was published in "British Journal of Mathematical and Statistical Psychology" v69 p225-252 2016.]
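For readers unfamiliar with the statistic named in the abstract, the general form of M_2 as defined by Maydeu-Olivares and Joe (2006) is sketched below; the notation is reconstructed from that literature rather than quoted from the present paper.

M_2 = N \, \hat{e}_2' \, \hat{C}_2 \, \hat{e}_2, \qquad \hat{C}_2 = \hat{\Xi}_2^{-1} - \hat{\Xi}_2^{-1} \hat{\Delta}_2 \left( \hat{\Delta}_2' \hat{\Xi}_2^{-1} \hat{\Delta}_2 \right)^{-1} \hat{\Delta}_2' \hat{\Xi}_2^{-1}

Here \hat{e}_2 = p_2 - \pi_2(\hat{\theta}) collects the residuals of the observed univariate and bivariate margins against their model-implied values, \hat{\Delta}_2 is the Jacobian of \pi_2 with respect to the free model parameters, and \hat{\Xi}_2 is the asymptotic covariance matrix of the sample margins. Under a correctly specified model, M_2 is asymptotically chi-square distributed with degrees of freedom equal to the number of first- and second-order margins minus the number of free parameters.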
Publication Type: Reports - Research
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: National Center for Education Research (ED); National Science Foundation (NSF)
Authoring Institution: N/A
Identifiers - Assessments and Surveys: Trends in International Mathematics and Science Study
IES Funded: Yes
Grant or Contract Numbers: R305D140046; SES1260746
Author Affiliations: N/A