Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017
This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.
Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation
Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017
The growing use of scales in survey questionnaires makes it necessary to address how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent samples t-test on the observed total scale scores. A…
Descriptors: Test Items, Test Bias, Item Response Theory, Surveys
Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W. – Applied Psychological Measurement, 2012
Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…
Descriptors: Item Response Theory, Multiple Regression Analysis, Error of Measurement, Models
Lin, Johnny; Bentler, Peter M. – Multivariate Behavioral Research, 2012
Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra Bentler's…
Descriptors: Factor Analysis, Statistical Analysis, Scaling, Sample Size
Topczewski, Anna Marie – ProQuest LLC, 2013
Developmental score scales represent the performance of students along a continuum, where as students learn more they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method to create developmental score scales. Research has shown that UIRT vertical scaling methods can be…
Descriptors: Item Response Theory, Scaling, Scores, Student Development
Kuo, Bor-Chen; Daud, Muslem; Yang, Chih-Wei – EURASIA Journal of Mathematics, Science & Technology Education, 2015
This paper describes a curriculum-based multidimensional computerized adaptive test that was developed for Indonesian junior high school Biology. In adherence to the Indonesian curriculum of different Biology dimensions, 300 items were constructed and then administered to 2238 students. A multidimensional random coefficients multinomial logit model was…
Descriptors: Secondary School Science, Science Education, Science Tests, Computer Assisted Testing
Interval Estimation for True Scores under Various Scale Transformations. ACT Research Report Series.
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – 2002
This paper reviews various procedures for constructing an interval for an individual's true score given the assumption that errors of measurement are distributed as binomial. This paper also presents two general interval estimation procedures (i.e., normal approximation and endpoints conversion methods) for an individual's true scale score;…
Descriptors: Bayesian Statistics, Error of Measurement, Estimation (Mathematics), Scaling

Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational Measurement, 2000
Describes four procedures previously developed for estimating conditional standard errors of measurement for scale scores and compares them in a simulation study. All four procedures appear viable. Recommends that test users select a procedure based on various factors such as the type of scale score of concern, test characteristics, assumptions…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Scaling
Zwick, Rebecca; Thayer, Dorothy T. – 1994
Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…
Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias

Ban, Jae-Chun; Hanson, Bradley A.; Yi, Qing; Harris, Deborah J. – Journal of Educational Measurement, 2002
Compared three online pretest calibration scaling methods through simulation: (1) marginal maximum likelihood with one expectation maximization (EM) cycle (OEM) method; (2) marginal maximum likelihood with multiple EM cycles (MEM); and (3) M. Stocking's method B. MEM produced the smallest average total error in parameter estimation; OEM yielded…
Descriptors: Computer Assisted Testing, Error of Measurement, Maximum Likelihood Statistics, Online Systems
Ban, Jae-Chun; Hanson, Bradley A.; Yi, Qing; Harris, Deborah J. – 2002
The purpose of this study was to compare and evaluate three online pretest item calibration/scaling methods in terms of item parameter recovery when the item responses to the pretest items in the pool would be sparse. The three methods considered were the marginal maximum likelihood estimate with one EM cycle (OEM) method, the marginal maximum…
Descriptors: Adaptive Testing, Computer Assisted Testing, Data Analysis, Error of Measurement