Showing all 12 results
Peer reviewed
Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016
When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…
Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level
Peer reviewed
Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2016
This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…
Descriptors: Scoring, Equated Scores, Test Items, Measurement
Peer reviewed
Choi, In-Hee; Wilson, Mark – Educational and Psychological Measurement, 2015
An essential feature of the linear logistic test model (LLTM) is that item difficulties are explained using item design properties. By taking advantage of this explanatory aspect of the LLTM, in a mixture extension of the LLTM, the meaning of latent classes is specified by how item properties affect item difficulties within each class. To improve…
Descriptors: Classification, Test Items, Difficulty Level, Statistical Analysis
Peer reviewed
Frick, Hannah; Strobl, Carolin; Zeileis, Achim – Educational and Psychological Measurement, 2015
Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…
Descriptors: Item Response Theory, Test Bias, Comparative Analysis, Scores
Peer reviewed
Paek, Insu; Cai, Li – Educational and Psychological Measurement, 2014
The present study was motivated by the recognition that standard errors (SEs) of item response theory (IRT) model parameters are often of immediate interest to practitioners and that there is currently a lack of comparative research on different SE (or error variance-covariance matrix) estimation procedures. The present study investigated item…
Descriptors: Item Response Theory, Comparative Analysis, Error of Measurement, Computation
Peer reviewed
Finch, Holmes; Edwards, Julianne M. – Educational and Psychological Measurement, 2016
Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…
Descriptors: Item Response Theory, Computation, Nonparametric Statistics, Bayesian Statistics
Peer reviewed
Ye, Meng; Xin, Tao – Educational and Psychological Measurement, 2014
The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…
Descriptors: Scaling, Test Items, Equated Scores, Achievement Gains
Peer reviewed
He, Wei; Reckase, Mark D. – Educational and Psychological Measurement, 2014
For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…
Descriptors: Item Banks, Test Length, Computer Assisted Testing, Adaptive Testing
Peer reviewed
Seo, Dong Gi; Weiss, David J. – Educational and Psychological Measurement, 2013
The usefulness of the l[subscript z] person-fit index was investigated with achievement test data from 20 exams given to more than 3,200 college students. Results for three methods of estimating θ showed that the distributions of l[subscript z] were not consistent with its theoretical distribution, resulting in general overfit to the item response…
Descriptors: Achievement Tests, College Students, Goodness of Fit, Item Response Theory
Peer reviewed
Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012
The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…
Descriptors: Item Response Theory, Models, Methods, Computation
Peer reviewed
Miller, G. Edward; Fitzpatrick, Steven J. – Educational and Psychological Measurement, 2009
Incorrect handling of item parameter drift during the equating process can result in equating error. If the item parameter drift is due to construct-irrelevant factors, then inclusion of these items in the estimation of the equating constants can be expected to result in equating error. On the other hand, if the item parameter drift is related to…
Descriptors: Equated Scores, Computation, Item Response Theory, Test Items
Peer reviewed
Chafouleas, Sandra M.; Christ, Theodore J.; Riley-Tillman, T. Chris – Educational and Psychological Measurement, 2009
Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target behavior(s) following an observation occasion. Variance components associated with scale gradients are estimated using a random effects design for persons…
Descriptors: Generalizability Theory, Undergraduate Students, Scaling, Rating Scales