Showing 1 to 15 of 32 results
Peer reviewed
Harold Doran; Tetsuhiro Yamada; Ted Diaz; Emre Gonulates; Vanessa Culver – Journal of Educational Measurement, 2025
Computer adaptive testing (CAT) is an increasingly common mode of test administration offering improved test security, better measurement precision, and the potential for shorter testing experiences. This article presents a new item selection algorithm based on a generalized objective function to support multiple types of testing conditions and…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Algorithms
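The generalized objective function in Doran et al. is not spelled out in the abstract, so the sketch below shows only the classic baseline such methods generalize: selecting each next item to maximize Fisher information at the provisional ability estimate under a 2PL model. The item pool, its parameters, and the crude one-step ability update are illustrative assumptions, not the paper's algorithm.

```python
# Minimal CAT sketch: maximum-Fisher-information item selection (2PL).
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def select_next_item(theta_hat, a, b, administered):
    """Pick the unused item with maximum information at theta_hat."""
    info = fisher_info(theta_hat, a, b)
    info[list(administered)] = -np.inf   # mask items already given
    return int(np.argmax(info))

rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, size=200)      # hypothetical pool: discriminations
b = rng.normal(0.0, 1.0, size=200)       # hypothetical pool: difficulties
true_theta, theta_hat = 0.5, 0.0
administered = set()
for _ in range(10):
    j = select_next_item(theta_hat, a, b, administered)
    administered.add(j)
    x = float(rng.random() < p_2pl(true_theta, a[j], b[j]))  # simulated response
    # crude one-step score update standing in for a full MLE/EAP estimate
    theta_hat += (x - p_2pl(theta_hat, a[j], b[j])) / max(fisher_info(theta_hat, a[j], b[j]), 0.1)
print(f"administered {len(administered)} items, theta_hat = {theta_hat:.2f}")
```

A production selector would add the kinds of constraints the abstract alludes to (content balancing, exposure control); this sketch shows only the information-maximization core.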
Peer reviewed
Strachan, Tyler; Cho, Uk Hyun; Kim, Kyung Yong; Willse, John T.; Chen, Shyh-Huei; Ip, Edward H.; Ackerman, Terry A.; Weeks, Jonathan P. – Journal of Educational Measurement, 2021
In vertical scaling, results of tests from several different grade levels are placed on a common scale. Most vertical scaling methodologies rely heavily on the assumption that the construct being measured is unidimensional. In many testing situations, however, such an assumption could be problematic. For instance, the construct measured at one…
Descriptors: Item Response Theory, Scaling, Tests, Construct Validity
Peer reviewed
Bolt, Daniel M.; Deng, Sien; Lee, Sora – Journal of Educational Measurement, 2014
Functional form misfit is frequently a concern in item response theory (IRT), although the practical implications of misfit are often difficult to evaluate. In this article, we illustrate how seemingly negligible amounts of functional form misfit, when systematic, can be associated with significant distortions of the score metric in vertical…
Descriptors: Item Response Theory, Scaling, Goodness of Fit, Models
Peer reviewed
Shu, Lianghua; Schwarz, Richard D. – Journal of Educational Measurement, 2014
As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt-Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test-part similarity are discussed. A detailed computational example is presented for the targeted…
Descriptors: Item Response Theory, Reliability, Models, Computation
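As a rough companion to this abstract, the sketch below computes two of the four coefficients on simulated data: classical Cronbach's α from an item-score matrix, and a marginal reliability from ability estimates and their standard errors. The Rasch-type data generation and the constant SE of 0.45 are assumptions for illustration, not the paper's computational example.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (examinees x items) 0/1 score matrix."""
    k = X.shape[1]
    return (k / (k - 1)) * (1.0 - X.var(axis=0, ddof=1).sum()
                            / X.sum(axis=1).var(ddof=1))

def marginal_reliability(theta_hat, se):
    """IRT marginal reliability: 1 - mean error variance / score variance."""
    return 1.0 - np.mean(se ** 2) / theta_hat.var(ddof=1)

rng = np.random.default_rng(1)
theta = rng.normal(size=500)                                # true abilities
b = rng.normal(size=20)                                     # item difficulties
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))    # Rasch-type P(correct)
X = (rng.random(p.shape) < p).astype(float)                 # dichotomous responses
se = np.full(500, 0.45)                                     # assumed constant SE
theta_hat = theta + rng.normal(0.0, 0.45, size=500)         # noisy estimates
print(f"Cronbach's alpha     ~ {cronbach_alpha(X):.3f}")
print(f"marginal reliability ~ {marginal_reliability(theta_hat, se):.3f}")
```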
Peer reviewed
Keller, Lisa A.; Hambleton, Ronald K. – Journal of Educational Measurement, 2013
Because recent research on equating methodologies indicates that some methods may be more susceptible to the accumulation of equating error over multiple administrations, the sustainability of several item response theory methods of equating over time was investigated. In particular, the paper focuses on two equating methodologies: fixed common…
Descriptors: Item Response Theory, Scaling, Test Format, Equated Scores
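A toy simulation can show why accumulation matters even when each individual linking is sound: if every administration is linked to the previous one with a small random error, the errors compound. The error standard deviation of 0.05 on the theta scale is an arbitrary illustrative value, not a figure from the study.

```python
# Toy illustration of scale drift under chained linking.
import numpy as np

rng = np.random.default_rng(2)
n_admins, n_chains = 20, 1000
# each linking step adds an independent error to the scale constant
link_errors = rng.normal(0.0, 0.05, size=(n_chains, n_admins))
cumulative_shift = link_errors.cumsum(axis=1)
drift_sd = cumulative_shift.std(axis=0)
for t in (1, 5, 10, 20):
    print(f"after {t:2d} administrations, sd of scale drift ~ {drift_sd[t - 1]:.3f}")
# the sd grows roughly like 0.05 * sqrt(t): drift accumulates even
# though each individual linking is unbiased.
```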
Peer reviewed
Briggs, Derek C. – Journal of Educational Measurement, 2013
A vertical score scale is needed to measure growth across multiple tests in terms of absolute changes in magnitude. Since the warrant for subsequent growth interpretations depends upon the assumption that the scale has interval properties, the validation of a vertical scale would seem to require methods for distinguishing interval scales from…
Descriptors: Measurement, Scaling, Validity, Test Interpretation
Peer reviewed
Klockars, Alan J.; Yamagishi, Midori – Journal of Educational Measurement, 1988
The influence of a verbal label and of its position on the scale in defining the meaning of the labeled point on a rating scale was studied using three forms of the scale in which the labels FAIR and GOOD were systematically moved. When label and position differed in meaning, college students rated the labeled position as a compromise between the two. (SLD)
Descriptors: College Students, Rating Scales, Scaling
Peer reviewed
Burket, George R.; Yen, Wendy M. – Journal of Educational Measurement, 1997
Using simulated data modeled after real tests, a Thurstone method (L. Thurstone, 1925 and later) and three-parameter item response theory were compared for vertical scaling. Neither procedure produced artificial scale shrinkage, and both produced modest scale expansion for one simulated condition. (SLD)
Descriptors: Comparative Analysis, Item Response Theory, Scaling, Simulation
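For readers unfamiliar with the older method, the sketch below implements the core of Thurstone's absolute scaling on invented p-values for six common items: within-group difficulties become normal deviates, and the line relating the two grades' deviates recovers the lower grade's mean and standard deviation on the upper grade's scale. Burket and Yen's simulation design was far more elaborate than this.

```python
import numpy as np
from scipy.stats import norm

# proportion correct on six common items in grades 3 and 4 (invented)
p_g3 = np.array([0.35, 0.48, 0.55, 0.62, 0.71, 0.80])
p_g4 = np.array([0.50, 0.61, 0.66, 0.74, 0.81, 0.88])

# within-group item difficulty as a normal deviate (higher = harder)
z_g3 = norm.ppf(1.0 - p_g3)
z_g4 = norm.ppf(1.0 - p_g4)

# on a common scale, difficulty d satisfies d = mu_g + sigma_g * z_g in
# each group; taking grade 4 as the reference (mu = 0, sigma = 1) and
# fitting z_g4 = sigma_3 * z_g3 + mu_3 recovers grade 3's parameters
sigma_3, mu_3 = np.polyfit(z_g3, z_g4, 1)
print(f"grade-3 sd on the grade-4 scale   ~ {sigma_3:.2f}")
print(f"grade-3 mean on the grade-4 scale ~ {mu_3:.2f}")
```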
Peer reviewed
Quereshi, M. Y. – Journal of Educational Measurement, 1971
The study investigated the degree to which errors of scaling and selection depress the linear relationship and whether the reduction in the magnitude of r differs with the type of error. Results indicated that various scaling errors caused considerable discrepancy in the measurement of underlying relations, but the effect of non-normality was…
Descriptors: Correlation, Error Patterns, Factor Analysis, Scaling
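A small simulation in the same spirit: starting from a known population correlation, coarsening one variable's scale and selecting on it both attenuate the observed r. The population r of .70, the 3-point coarsening, and the selection rule are illustrative choices, not the study's design.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 100_000, 0.70
x = rng.normal(size=n)
y = r * x + np.sqrt(1 - r ** 2) * rng.normal(size=n)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

x_coarse = np.digitize(x, [-0.5, 0.5])   # crush x to a 3-point scale
keep = x > 0.0                           # select on x (range restriction)
print(f"full scale, full range : r ~ {corr(x, y):.3f}")
print(f"3-point scale          : r ~ {corr(x_coarse, y):.3f}")
print(f"selected (x > 0) only  : r ~ {corr(x[keep], y[keep]):.3f}")
```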
Peer reviewed
Spray, Judith; Huang, Chi-Yu – Journal of Educational Measurement, 2000
Presents a method for combining multiple scale responses from job or task surveys based on a hierarchical rating scheme. Provides the rationale for placing the resulting ordinal information on an interval scale of measurement using the Rasch model. Also suggests a method for linking two or more surveys using the Rasch model and the BIGSTEPS…
Descriptors: Item Response Theory, Job Analysis, Responses, Scaling
Peer reviewed
Shaw, Dale G.; And Others – Journal of Educational Measurement, 1987
Information loss occurs when continuous data are grouped in discrete intervals. After calculating the squared correlation coefficients between continuous data and corresponding grouped data for four population distributions, the effects of population distribution, number of intervals, and interval width on information loss and recovery were…
Descriptors: Intervals, Rating Scales, Sampling, Scaling
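The quantity at issue is easy to reproduce: the squared correlation between continuous scores and their interval-grouped versions, tracked as the number of equal-width intervals grows. The normal and uniform populations below are stand-ins for the four distributions studied, which the abstract does not name.

```python
import numpy as np

rng = np.random.default_rng(4)
for name, x in [("normal", rng.normal(size=50_000)),
                ("uniform", rng.uniform(-3, 3, size=50_000))]:
    for k in (3, 5, 9, 15):
        edges = np.linspace(x.min(), x.max(), k + 1)
        mids = (edges[:-1] + edges[1:]) / 2
        # replace each value by the midpoint of its interval
        grouped = mids[np.clip(np.digitize(x, edges) - 1, 0, k - 1)]
        r2 = np.corrcoef(x, grouped)[0, 1] ** 2
        print(f"{name:7s} {k:2d} intervals: r^2 ~ {r2:.4f}")
```

As expected, r² climbs toward 1 as the number of intervals grows, with the rate depending on the population shape.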
Peer reviewed
Kolen, Michael J. – Journal of Educational Measurement, 1988
Linear and nonlinear methods for incorporating score precision information when the score scale is established for educational tests are compared. Examples illustrate the methods, which discourage overinterpretation of small score differences and enhance score interpretability by equalizing error variance along the score scale. Measurement error…
Descriptors: Error of Measurement, Measures (Individuals), Scaling, Scoring
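One nonlinear method of this kind (assumed here to be in the spirit of Kolen's approach, since the abstract does not name it) is the arcsine transformation: under a binomial error model it makes measurement-error variance roughly constant along the score scale, so equal scale-score differences carry similar precision. The 40-item test length is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 40
for true_p in (0.3, 0.5, 0.7, 0.9):
    scores = rng.binomial(n_items, true_p, size=200_000)
    raw_sd = scores.std()
    arcsine_sd = np.arcsin(np.sqrt(scores / n_items)).std()
    print(f"true p = {true_p}: raw-score error sd = {raw_sd:5.2f}, "
          f"arcsine-scale error sd = {arcsine_sd:.3f}")
# raw-score error sd varies with p; the arcsine-scale sd stays near
# 1 / (2 * sqrt(n_items)) ~ 0.079 across the score range.
```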
Peer reviewed
Camilli, Gregory – Journal of Educational Measurement, 1999
Yen and Burket suggested that shrinkage in vertical equating cannot be understood apart from multidimensionality. Reviews research on reliability, multidimensionality, and scale shrinkage, and explores issues of practical importance to educators. (SLD)
Descriptors: Equated Scores, Error of Measurement, Item Response Theory, Reliability
Peer reviewed
Burket, George R. – Journal of Educational Measurement, 1987
This response to the Baglin paper (1986) points out the fallacy in inferring that inappropriate scaling procedures cause apparent discrepancies between medians and means and between means calculated using different units. (LMO)
Descriptors: Norm Referenced Tests, Scaling, Scoring, Statistical Distributions
Peer reviewed
Schulz, E. Matthew; Nicewander, W. Alan – Journal of Educational Measurement, 1997
The arbitrary nature of growth trends in cognitive variables is illustrated. Two metrics, grade equivalent and item-response theory representations, both of which preserve the order of performance levels in test data, produced different pictures of cognitive growth, and differences were seen to arise from differences in the scaling models. (SLD)
Descriptors: Cognitive Development, Comparative Analysis, Grade Equivalent Scores, Item Response Theory
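The arbitrariness is easy to demonstrate: any strictly increasing transformation preserves the order of performance levels, yet it can turn decelerating growth into steady growth. The theta means by grade and the grade-equivalent-like transform below are invented for illustration, not the article's data.

```python
import numpy as np

grades = np.arange(2, 9)
# invented theta means by grade: growth decelerates on the IRT metric
theta_mean = np.array([-1.2, -0.6, -0.15, 0.2, 0.45, 0.65, 0.8])
# a strictly increasing (order-preserving) transform to a GE-like metric
ge_like = 2.0 + 3.1 * (np.exp(theta_mean) - np.exp(theta_mean[0]))
print("grade   theta gain   GE-like gain")
for g, dt, dg in zip(grades[1:], np.diff(theta_mean), np.diff(ge_like)):
    print(f"{g:5d}   {dt:10.2f}   {dg:12.2f}")
# the theta metric shows decelerating growth, while the GE-like metric
# shows a steady gain of about one unit per grade: same data, same
# ordering, different growth story.
```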
Previous Page | Next Page »
Pages: 1  |  2  |  3