Showing 1 to 15 of 19 results
Peer reviewed
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2020
Smoothing is designed to yield smoother equating results that can reduce random equating error without introducing very much systematic error. The main objective of this study is to propose a new statistic and to compare its performance to the performance of the Akaike information criterion and likelihood ratio chi-square difference statistics in…
Descriptors: Equated Scores, Statistical Analysis, Error of Measurement, Criteria
Peer reviewed
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2018
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed-format pseudo tests under the…
Descriptors: Comparative Analysis, Accuracy, Models, Sample Size
Peer reviewed
Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012
Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement
Peer reviewed
Kolen, Michael J.; Lee, Won-Chan – Educational Measurement: Issues and Practice, 2011
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
Descriptors: Test Use, Test Format, Error of Measurement, Raw Scores
Peer reviewed
Tong, Ye; Kolen, Michael J. – Educational Measurement: Issues and Practice, 2010
"Scaling" is the process of constructing a score scale that associates numbers or other ordered indicators with the performance of examinees. Scaling typically is conducted to aid users in interpreting test results. This module describes different types of raw scores and scale scores, illustrates how to incorporate various sources of…
Descriptors: Test Results, Scaling, Measures (Individuals), Raw Scores
Peer reviewed
Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Descriptors: Test Length, Test Content, Simulation, Computation
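The nonparametric bootstrap idea in this entry can be sketched briefly: resample examinees with replacement, re-equate on each replicate, and take the standard deviation of the equated scores at each score point. The sketch below is an illustration under simplifying assumptions, not the article's procedure — the `equipercentile_equate` helper, the percentile-rank definition, the toy binomial data, and all parameter values are hypothetical, and no presmoothing is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

def equipercentile_equate(x_scores, y_scores, grid):
    """Map each form-X score in `grid` to the form-Y score with the same
    percentile rank (a simplified, interpolation-free definition)."""
    px = np.searchsorted(np.sort(x_scores), grid, side="right") / len(x_scores)
    return np.quantile(y_scores, np.clip(px, 0.0, 1.0))

def bootstrap_se(x, y, grid, n_boot=500):
    """Nonparametric bootstrap SE: resample examinees with replacement,
    re-equate, and take the SD of equated scores at each grid point."""
    eq = np.array([
        equipercentile_equate(rng.choice(x, len(x), replace=True),
                              rng.choice(y, len(y), replace=True),
                              grid)
        for _ in range(n_boot)
    ])
    return eq.std(axis=0, ddof=1)

# toy data: two 40-item forms, 1,000 examinees each (illustrative only)
x = rng.binomial(40, 0.60, size=1000)
y = rng.binomial(40, 0.55, size=1000)
grid = np.arange(0, 41)
se = bootstrap_se(x, y, grid)
```

A parametric bootstrap would differ only in the resampling step: draw replicate samples from a fitted score distribution rather than from the observed data.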
Peer reviewed
Kolen, Michael J. – Journal of Educational Measurement, 1988
Linear and nonlinear methods for incorporating score precision information when the score scale is established for educational tests are compared. Examples illustrate the methods, which discourage overinterpretation of small score differences and enhance score interpretability by equalizing error variance along the score scale. Measurement error…
Descriptors: Error of Measurement, Measures (Individuals), Scaling, Scoring
Peer reviewed
Harris, Deborah J.; Kolen, Michael J. – Educational and Psychological Measurement, 1988
Three methods of estimating point-biserial correlation coefficient standard errors were compared: (1) assuming normality; (2) not assuming normality; and (3) bootstrapping. Although errors estimated assuming normality were biased, such estimates were less variable and easier to compute, suggesting that this might be the method of choice in some…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
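Two of the three approaches compared in this entry can be sketched in a few lines: a normal-theory approximation for the standard error of a correlation, and a bootstrap estimate obtained by resampling (item, total) pairs. This is a hedged illustration, not the study's design — the toy data, sample size, and the particular normal-theory formula used here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def point_biserial(item, total):
    """Point-biserial: Pearson correlation of a 0/1 item score with the total."""
    return np.corrcoef(item, total)[0, 1]

# toy data: one dichotomous item related to a continuous total, n = 500
n = 500
item = rng.binomial(1, 0.6, n)
total = 3.0 * item + rng.normal(20, 5, n)

r = point_biserial(item, total)

# (1) normal-theory approximation: SE(r) ~ (1 - r^2) / sqrt(n - 1)
se_normal = (1 - r**2) / np.sqrt(n - 1)

# (2) bootstrap: resample pairs with replacement, take the SD of r
boot = np.array([
    point_biserial(item[idx], total[idx])
    for idx in (rng.integers(0, n, n) for _ in range(1000))
])
se_boot = boot.std(ddof=1)
```

The trade-off the abstract describes shows up directly: `se_normal` is a one-line closed form, while `se_boot` needs no normality assumption but requires many resamples.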
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – 2002
This paper reviews various procedures for constructing an interval for an individual's true score given the assumption that errors of measurement are distributed as binomial. This paper also presents two general interval estimation procedures (i.e., normal approximation and endpoints conversion methods) for an individual's true scale score;…
Descriptors: Bayesian Statistics, Error of Measurement, Estimation (Mathematics), Scaling
Peer reviewed
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational Measurement, 2000
Describes four procedures previously developed for estimating conditional standard errors of measurement for scale scores and compares them in a simulation study. All four procedures appear viable. Recommends that test users select a procedure based on various factors such as the type of scale score of concern, test characteristics, assumptions…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Scaling
Peer reviewed
Kolen, Michael J.; Zeng, Lingjia; Hanson, Bradley A. – Journal of Educational Measurement, 1996
Presents an Item Response Theory (IRT) method for estimating standard errors of measurement of scale scores for the situation in which scale scores are nonlinear transformations of number-correct scores. Also describes procedures for estimating the average conditional standard error of measurement for scale scores and the reliability of scale…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Reliability
Peer reviewed
Tsai, Tsung-Hsun; Hanson, Bradley A.; Kolen, Michael J.; Forsyth, Robert A. – Applied Measurement in Education, 2001
Compared bootstrap standard errors of five item response theory (IRT) equating methods for the common-item nonequivalent groups design using test results for 1,493 and 1,793 examinees taking a professional certification test. Results suggest that standard errors of equating less than 0.1 standard deviation units could be obtained with any of the…
Descriptors: Equated Scores, Error of Measurement, Item Response Theory, Licensing Examinations (Professions)
Peer reviewed
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational and Behavioral Statistics, 2006
Assuming errors of measurement are distributed binomially, this article reviews various procedures for constructing an interval for an individual's true number-correct score; presents two general interval estimation procedures for an individual's true scale score (i.e., normal approximation and endpoints conversion methods); compares various…
Descriptors: Probability, Intervals, Guidelines, Computer Simulation
Colton, Dean A.; Gao, Xiaohong; Harris, Deborah J.; Kolen, Michael J.; Martinovich-Barhite, Dara; Wang, Tianyou; Welch, Catherine J. – 1997
This collection consists of six papers, each dealing with some aspects of reliability and performance testing. Each paper has an abstract, and each contains its own references. Papers include: (1) "Using Reliabilities To Make Decisions" (Deborah J. Harris); (2) "Conditional Standard Errors, Reliability, and Decision Consistency…
Descriptors: Decision Making, Error of Measurement, Item Response Theory, Performance Based Assessment
Kolen, Michael J. – 1984
Large sample standard errors for the Tucker method of linear equating under the common item nonrandom groups design are derived under normality assumptions as well as under less restrictive assumptions. Standard errors of Tucker equating are estimated using the bootstrap method described by Efron. The results from different methods are compared…
Descriptors: Certification, Comparative Analysis, Equated Scores, Error of Measurement
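The Tucker method referenced here admits a compact sketch: regress each form on the common anchor within its own group, form synthetic-population moments, and equate linearly; bootstrap standard errors then come from resampling each group and repeating. The moment formulas below follow the standard Tucker derivation, but the synthetic weight, the simulated data, and the single score point examined are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def tucker_equate(x, v1, y, v2, w1=0.5):
    """Tucker linear equating for the common-item nonequivalent groups
    design: regression slopes of each form on anchor V, synthetic-population
    moments, then a linear X-to-Y conversion."""
    w2 = 1.0 - w1
    gx = np.cov(x, v1)[0, 1] / np.var(v1, ddof=1)   # slope of X on V (group 1)
    gy = np.cov(y, v2)[0, 1] / np.var(v2, ddof=1)   # slope of Y on V (group 2)
    # synthetic-population anchor mean and variance
    mv = w1 * v1.mean() + w2 * v2.mean()
    sv = (w1 * np.var(v1, ddof=1) + w2 * np.var(v2, ddof=1)
          + w1 * w2 * (v1.mean() - v2.mean()) ** 2)
    # synthetic means and variances of X and Y
    mx = x.mean() - gx * (v1.mean() - mv)
    my = y.mean() - gy * (v2.mean() - mv)
    s2x = np.var(x, ddof=1) - gx**2 * (np.var(v1, ddof=1) - sv)
    s2y = np.var(y, ddof=1) - gy**2 * (np.var(v2, ddof=1) - sv)
    a = np.sqrt(s2y / s2x)
    return lambda score: a * (score - mx) + my

# toy data: two groups, each with a form score and an anchor score
rng = np.random.default_rng(2)
x = rng.normal(25, 5, 800); v1 = 0.6 * x + rng.normal(0, 2, 800)
y = rng.normal(23, 5, 900); v2 = 0.6 * y + rng.normal(0, 2, 900)

eq = tucker_equate(x, v1, y, v2)

# bootstrap SE of the equated score at one score point (illustrative)
boot = []
for _ in range(200):
    i = rng.integers(0, len(x), len(x))
    j = rng.integers(0, len(y), len(y))
    boot.append(tucker_equate(x[i], v1[i], y[j], v2[j])(25.0))
se_at_25 = np.std(boot, ddof=1)
```

Resampling examinees within each group, as above, is the nonparametric version; a normal-theory alternative would derive the standard errors analytically from the same moment expressions.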