ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	3
Since 2006 (last 20 years)	12

Descriptor

Computation	12
Difficulty Level	12
Test Items	11
Item Response Theory	10
Comparative Analysis	5
Bayesian Statistics	4
Equated Scores	3
Maximum Likelihood Statistics	3
Monte Carlo Methods	3
Scaling	3
Test Bias	3
Error of Measurement	2
Foreign Countries	2
Models	2
Scores	2
Statistical Analysis	2
Statistical Bias	2
Statistical Distributions	2
Ability	1
Accuracy	1
Achievement Gains	1
Achievement Tests	1
Adaptive Testing	1
Aggression	1
Aptitude Tests	1

Source

Educational and Psychological…

Publication Type

Journal Articles	12
Reports - Research	11
Reports - Evaluative	1

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 3	1
Grade 4	1
Grade 5	1
Grade 6	1
Grade 7	1
Higher Education	1
Postsecondary Education	1
Secondary Education	1

Location

Germany	1
Saudi Arabia	1

Assessments and Surveys

General Aptitude Test Battery

Showing all 12 results

Unidimensional IRT Item Parameter Estimates across Equivalent Test Forms with Confounding Specifications within Dimensions

Peer reviewed

Matlock, Ki Lynn; Turner, Ronna – Educational and Psychological Measurement, 2016

When constructing multiple test forms, the number of items and the total test difficulty are often equivalent. Not all test developers match the number of items and/or average item difficulty within subcontent areas. In this simulation study, six test forms were constructed having an equal number of items and average item difficulty overall.…

Descriptors: Item Response Theory, Computation, Test Items, Difficulty Level

An Approach to Scoring and Equating Tests with Binary Items: Piloting With Large-Scale Assessments

Peer reviewed

Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2016

This article describes an approach to test scoring, referred to as "delta scoring" (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the…

Descriptors: Scoring, Equated Scores, Test Items, Measurement

Multidimensional Classification of Examinees Using the Mixture Random Weights Linear Logistic Test Model

Peer reviewed

Choi, In-Hee; Wilson, Mark – Educational and Psychological Measurement, 2015

An essential feature of the linear logistic test model (LLTM) is that item difficulties are explained using item design properties. By taking advantage of this explanatory aspect of the LLTM, in a mixture extension of the LLTM, the meaning of latent classes is specified by how item properties affect item difficulties within each class. To improve…

Descriptors: Classification, Test Items, Difficulty Level, Statistical Analysis

Rasch Mixture Models for DIF Detection: A Comparison of Old and New Score Specifications

Peer reviewed

Frick, Hannah; Strobl, Carolin; Zeileis, Achim – Educational and Psychological Measurement, 2015

Rasch mixture models can be a useful tool when checking the assumption of measurement invariance for a single Rasch model. They provide advantages compared to manifest differential item functioning (DIF) tests when the DIF groups are only weakly correlated with the manifest covariates available. Unlike in single Rasch models, estimation of Rasch…

Descriptors: Item Response Theory, Test Bias, Comparative Analysis, Scores

A Comparison of Item Parameter Standard Error Estimation Procedures for Unidimensional and Multidimensional Item Response Theory Modeling

Peer reviewed

Paek, Insu; Cai, Li – Educational and Psychological Measurement, 2014

The present study was motivated by the recognition that standard errors (SEs) of item response theory (IRT) model parameters are often of immediate interest to practitioners and that there is currently a lack of comparative research on different SE (or error variance-covariance matrix) estimation procedures. The present study investigated item…

Descriptors: Item Response Theory, Comparative Analysis, Error of Measurement, Computation

Rasch Model Parameter Estimation in the Presence of a Nonnormal Latent Trait Using a Nonparametric Bayesian Approach

Peer reviewed

Finch, Holmes; Edwards, Julianne M. – Educational and Psychological Measurement, 2016

Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…

Descriptors: Item Response Theory, Computation, Nonparametric Statistics, Bayesian Statistics

Effects of Item Parameter Drift on Vertical Scaling with the Nonequivalent Groups with Anchor Test (NEAT) Design

Peer reviewed

Ye, Meng; Xin, Tao – Educational and Psychological Measurement, 2014

The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…

Descriptors: Scaling, Test Items, Equated Scores, Achievement Gains

Item Pool Design for an Operational Variable-Length Computerized Adaptive Test

Peer reviewed

He, Wei; Reckase, Mark D. – Educational and Psychological Measurement, 2014

For computerized adaptive tests (CATs) to work well, they must have an item pool with sufficient numbers of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is the item pool size important but also the distribution of item parameters and practical considerations such as content distribution…

Descriptors: Item Banks, Test Length, Computer Assisted Testing, Adaptive Testing

l_z Person-Fit Index to Identify Misfit Students with Achievement Test Data

Peer reviewed

Seo, Dong Gi; Weiss, David J. – Educational and Psychological Measurement, 2013

The usefulness of the l_z person-fit index was investigated with achievement test data from 20 exams given to more than 3,200 college students. Results for three methods of estimating θ showed that the distributions of l_z were not consistent with its theoretical distribution, resulting in general overfit to the item response…

Descriptors: Achievement Tests, College Students, Goodness of Fit, Item Response Theory

An Application of Explanatory Item Response Modeling for Model-Based Proficiency Scaling

Peer reviewed

Hartig, Johannes; Frey, Andreas; Nold, Gunter; Klieme, Eckhard – Educational and Psychological Measurement, 2012

The article compares three different methods to estimate effects of task characteristics and to use these estimates for model-based proficiency scaling: prediction of item difficulties from the Rasch model, the linear logistic test model (LLTM), and an LLTM including random item effects (LLTM+e). The methods are applied to empirical data from a…

Descriptors: Item Response Theory, Models, Methods, Computation

Expected Equating Error Resulting from Incorrect Handling of Item Parameter Drift among the Common Items

Peer reviewed

Miller, G. Edward; Fitzpatrick, Steven J. – Educational and Psychological Measurement, 2009

Incorrect handling of item parameter drift during the equating process can result in equating error. If the item parameter drift is due to construct-irrelevant factors, then inclusion of these items in the estimation of the equating constants can be expected to result in equating error. On the other hand, if the item parameter drift is related to…

Descriptors: Equated Scores, Computation, Item Response Theory, Test Items

Generalizability of Scaling Gradients on Direct Behavior Ratings

Peer reviewed

Chafouleas, Sandra M.; Christ, Theodore J.; Riley-Tillman, T. Chris – Educational and Psychological Measurement, 2009

Generalizability theory is used to examine the impact of scaling gradients on a single-item Direct Behavior Rating (DBR). A DBR refers to a type of rating scale used to efficiently record target behavior(s) following an observation occasion. Variance components associated with scale gradients are estimated using a random effects design for persons…

Descriptors: Generalizability Theory, Undergraduate Students, Scaling, Rating Scales

Author

Cai, Li	1
Chafouleas, Sandra M.	1
Choi, In-Hee	1
Christ, Theodore J.	1
Dimitrov, Dimiter M.	1
Edwards, Julianne M.	1
Finch, Holmes	1
Fitzpatrick, Steven J.	1
Frey, Andreas	1
Frick, Hannah	1
Hartig, Johannes	1
He, Wei	1
Klieme, Eckhard	1
Matlock, Ki Lynn	1
Miller, G. Edward	1
Nold, Gunter	1
Paek, Insu	1
Reckase, Mark D.	1
Riley-Tillman, T. Chris	1
Seo, Dong Gi	1
Strobl, Carolin	1
Turner, Ronna	1
Weiss, David J.	1
Wilson, Mark	1
Xin, Tao	1