Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 3 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 12 |
Descriptor
Error of Measurement | 15 |
Scores | 10 |
Item Response Theory | 7 |
Comparative Analysis | 5 |
Simulation | 5 |
Accuracy | 4 |
Scaling | 4 |
Estimation (Mathematics) | 3 |
Models | 3 |
Raw Scores | 3 |
Reliability | 3 |
Source
Journal of Educational… | 7 |
Applied Measurement in… | 2 |
Applied Psychological… | 1 |
Educational Measurement:… | 1 |
Educational and Psychological… | 1 |
International Journal of… | 1 |
Journal of Educational and… | 1 |
Author
Lee, Won-Chan | 15 |
Brennan, Robert L. | 5 |
Kolen, Michael J. | 5 |
Kim, Stella Y. | 2 |
Choi, Jiwon | 1 |
Chon, Kyong Hee | 1 |
Dunbar, Stephen B. | 1 |
Huang, Feifei | 1 |
Kang, Yujin | 1 |
Kim, Hyung Jin | 1 |
Kim, Kyung Yong | 1 |
Publication Type
Journal Articles | 14 |
Reports - Research | 9 |
Reports - Descriptive | 3 |
Reports - Evaluative | 3 |
Speeches/Meeting Papers | 2 |
Assessments and Surveys
Iowa Tests of Basic Skills | 2 |
ACT Assessment | 1 |
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
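For readers unfamiliar with the generalized partial credit model named in the abstract above, a minimal sketch of its category probabilities may help; the parameter names (discrimination a, step difficulties b) and the example values are generic textbook notation and illustrative assumptions, not material from the article.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """Category probabilities for a single GPCM item.

    theta : examinee proficiency
    a     : item discrimination
    b     : step difficulties; the item is scored 0..len(b)
    """
    b = np.asarray(b, dtype=float)
    # Cumulative sums of a*(theta - b_j); category 0 contributes an empty sum.
    z = np.concatenate(([0.0], np.cumsum(a * (theta - b))))
    z -= z.max()                      # numerical safeguard before exponentiating
    expz = np.exp(z)
    return expz / expz.sum()

# Example: a 4-category item scored 0-3
print(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]))
```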
Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023
This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…
Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2020
In equating, smoothing techniques are frequently used to diminish sampling error. There are typically two types of smoothing: presmoothing and postsmoothing. For polynomial log-linear presmoothing, an optimum smoothing degree can be determined statistically based on the Akaike information criterion or Chi-square difference criterion. For…
Descriptors: Equated Scores, Sampling, Error of Measurement, Statistical Analysis
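As a rough illustration of the presmoothing step described above, the sketch below fits polynomial log-linear (Poisson) models of increasing degree to a raw-score frequency distribution and keeps the degree with the smallest AIC. The statsmodels call and the example frequencies are illustrative assumptions, not material from the article.

```python
import numpy as np
import statsmodels.api as sm

def loglinear_presmooth(freq, max_degree=6):
    """Fit polynomial log-linear (Poisson) models of degree 1..max_degree to
    a raw-score frequency distribution and return the degree with the
    smallest AIC along with its fitted (smoothed) frequencies."""
    scores = np.arange(len(freq), dtype=float)
    s = (scores - scores.mean()) / scores.std()   # standardize for numerical stability
    best = None
    for d in range(1, max_degree + 1):
        X = np.column_stack([s ** k for k in range(d + 1)])  # intercept + powers
        fit = sm.GLM(freq, X, family=sm.families.Poisson()).fit()
        if best is None or fit.aic < best[1]:
            best = (d, fit.aic, fit.fittedvalues)
    return best[0], best[2]

# Hypothetical frequencies for a 20-item number-correct test (scores 0-20)
freq = np.array([1, 2, 4, 7, 12, 18, 25, 33, 40, 46, 48,
                 45, 40, 33, 25, 18, 12, 7, 4, 2, 1])
degree, smoothed = loglinear_presmooth(freq)
print(degree)
print(np.round(smoothed, 1))
```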
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
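The characteristic-curve linking framework that IWCC and TWCC build on can be sketched as minimizing a weighted discrepancy between item characteristic curves on two forms. The code below is a generic Haebara-style criterion for 2PL items with a simple item-information weight; the article's actual IWCC and TWCC weighting schemes are defined there and are not reproduced here, and the item parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def p2pl(theta, a, b):
    """2PL item response function; rows index theta points, columns index items."""
    return 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))

def item_info(theta, a, b):
    """Fisher information of each 2PL item at each theta point."""
    p = p2pl(theta, a, b)
    return (a ** 2) * p * (1.0 - p)

def weighted_haebara(a_old, b_old, a_new, b_new, weighted=True):
    """Estimate linking constants (A, B) placing the new-form parameters on
    the old-form scale by minimizing a (possibly information-weighted)
    characteristic-curve discrepancy."""
    theta = np.linspace(-4, 4, 41)
    w = item_info(theta, a_old, b_old) if weighted else np.ones((theta.size, a_old.size))

    def loss(x):
        A, B = x
        p_ref = p2pl(theta, a_old, b_old)
        p_tr  = p2pl(theta, a_new / A, A * b_new + B)   # transformed new-form curves
        return np.sum(w * (p_ref - p_tr) ** 2)

    return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x

# Hypothetical common-item parameters on two forms
a_old = np.array([1.0, 1.2, 0.8]); b_old = np.array([-0.5, 0.0, 0.7])
a_new = np.array([0.9, 1.1, 0.75]); b_new = np.array([-0.2, 0.3, 1.0])
print(weighted_haebara(a_old, b_old, a_new, b_new))
```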
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
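For intuition about the classification indices being estimated, the sketch below brute-forces classification accuracy at a cut score by simulating observed scores with binomial error around assumed true scores. It is a generic illustration, not the normal approximation, Livingston-Lewis, or compound multinomial procedure evaluated in the study, and the bimodal true-score distribution is made up.

```python
import numpy as np

def classification_accuracy(true_p, n_items, cut, n_rep=1000, seed=0):
    """Proportion of simulated (examinee, replication) pairs whose observed
    pass/fail decision matches the decision based on the true score.

    true_p  : array of true proportion-correct scores (one per examinee)
    n_items : test length
    cut     : cut score on the number-correct scale
    """
    rng = np.random.default_rng(seed)
    true_p = np.asarray(true_p)
    true_pass = (true_p * n_items) >= cut
    # Replicate each examinee n_rep times under the binomial error model.
    obs = rng.binomial(n_items, true_p[:, None], size=(true_p.size, n_rep))
    obs_pass = obs >= cut
    return np.mean(obs_pass == true_pass[:, None])

# Hypothetical bimodal true-score distribution on a 40-item test
rng = np.random.default_rng(1)
true_p = np.concatenate([rng.beta(8, 4, 500), rng.beta(3, 7, 500)])
print(classification_accuracy(true_p, n_items=40, cut=24))
```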
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2018
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Descriptors: Weighted Scores, Error of Measurement, Test Use, Decision Making
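As a baseline for the intervals discussed above, the sketch below computes a textbook normal-approximation (Wald) interval for the true proportion-correct score under the binomial error model. The article's procedures address weighted scores and relax the assumption that items are randomly sampled; none of that is reflected here.

```python
import math

def binomial_wald_interval(x, n, z=1.96):
    """Normal-approximation interval for the true proportion-correct score,
    given x correct responses out of n items under the binomial error model."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    lower = max(0.0, p_hat - z * se)
    upper = min(1.0, p_hat + z * se)
    return lower, upper

# Example: 32 correct on a 40-item test
lo, hi = binomial_wald_interval(32, 40)
print(f"95% interval for true proportion correct: ({lo:.3f}, {hi:.3f})")
```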
Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012
Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement
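A common simplification behind composite-score error analysis is that the composite is a weighted sum of component scores and that measurement errors are uncorrelated across components, so weighted error variances add. The sketch below computes a composite SEM and reliability under exactly those assumptions; the weights, covariance matrix, and reliabilities are hypothetical.

```python
import numpy as np

def composite_sem_reliability(weights, cov, reliabilities):
    """SEM and reliability of a weighted composite score.

    weights       : component weights w_i
    cov           : observed covariance matrix of component scores
    reliabilities : reliability coefficient of each component

    Assumes errors are uncorrelated across components, so the composite error
    variance is the weighted sum of component error variances.
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)

    comp_var = w @ cov @ w                                   # observed composite variance
    err_var = np.sum(w ** 2 * np.diag(cov) * (1.0 - rel))    # summed weighted error variances
    return np.sqrt(err_var), 1.0 - err_var / comp_var

# Hypothetical two-component battery (e.g., MC and FR sections)
cov = [[25.0, 12.0], [12.0, 16.0]]
sem, rel = composite_sem_reliability([0.6, 0.4], cov, [0.90, 0.82])
print(f"composite SEM = {sem:.2f}, composite reliability = {rel:.3f}")
```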
Kolen, Michael J.; Lee, Won-Chan – Educational Measurement: Issues and Practice, 2011
This paper illustrates that the psychometric properties of scores and scales that are used with mixed-format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is…
Descriptors: Test Use, Test Format, Error of Measurement, Raw Scores
Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010
In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…
Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation
Lee, Won-Chan – Applied Psychological Measurement, 2007
This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…
Descriptors: Simulation, Error of Measurement, Scoring, Test Items
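One way to read the multinomial error model described above: each polytomous item response is a draw from the examinee's own category probabilities, and the spread of the total score over hypothetical repeated testings is that examinee's conditional error. The simulation sketch below follows that reading with made-up category probabilities; it is not the article's analytic formulation.

```python
import numpy as np

def conditional_sem(item_cat_probs, item_cat_scores, n_rep=20000, seed=0):
    """Simulate one examinee's total score over repeated testings, treating
    each item response as a categorical draw from that examinee's category
    probabilities, and return the conditional SEM (SD of total scores).

    item_cat_probs  : list of per-item category probability vectors
    item_cat_scores : list of per-item category score vectors
    """
    rng = np.random.default_rng(seed)
    totals = np.zeros(n_rep)
    for probs, scores in zip(item_cat_probs, item_cat_scores):
        draws = rng.choice(len(probs), size=n_rep, p=probs)
        totals += np.asarray(scores)[draws]
    return totals.std(ddof=0)

# Hypothetical 3-item test with items scored 0-2
probs  = [[0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.05, 0.25, 0.7]]
scores = [[0, 1, 2]] * 3
print(f"conditional SEM for this examinee: {conditional_sem(probs, scores):.3f}")
```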

Brennan, Robert L.; Lee, Won-Chan – Educational and Psychological Measurement, 1999
Develops two procedures for estimating individual-level conditional standard errors of measurement for scale scores, assuming tests of dichotomously scored items. Compares the two procedures to a polynomial procedure and a procedure developed by L. Feldt and A. Qualls (1998) using data from the Iowa Tests of Basic Skills. Contains 22 references.…
Descriptors: Error of Measurement, Estimation (Mathematics), Scaling, Scores
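The general idea behind binomial-based conditional standard errors for scale scores can be sketched directly: condition on a true proportion-correct p, let the binomial model give the raw-score distribution, push each possible raw score through the raw-to-scale conversion, and take the standard deviation of the resulting scale scores. The conversion table below is hypothetical, and the code illustrates that idea rather than either procedure developed in the article.

```python
import numpy as np
from scipy.stats import binom

def scale_score_csem(p, n_items, raw_to_scale):
    """Conditional SEM of scale scores at true proportion-correct p, assuming
    raw scores are binomial(n_items, p) and raw_to_scale[x] is the scale
    score assigned to raw score x."""
    raw = np.arange(n_items + 1)
    w = binom.pmf(raw, n_items, p)            # probability of each raw score
    s = np.asarray(raw_to_scale, dtype=float)
    mean = np.sum(w * s)
    return np.sqrt(np.sum(w * (s - mean) ** 2))

# Hypothetical 10-item test with a nonlinear raw-to-scale conversion
conversion = [1, 3, 6, 10, 14, 18, 21, 24, 26, 28, 30]
for p in (0.3, 0.5, 0.8):
    print(p, round(scale_score_csem(p, 10, conversion), 2))
```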
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational and Behavioral Statistics, 2006
Assuming errors of measurement are distributed binomially, this article reviews various procedures for constructing an interval for an individual's true number-correct score; presents two general interval estimation procedures for an individual's true scale score (i.e., normal approximation and endpoints conversion methods); compares various…
Descriptors: Probability, Intervals, Guidelines, Computer Simulation
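The endpoints conversion method named in this entry (and in the related ACT report below) can be sketched as: form a normal-approximation interval for the true score on the proportion-correct metric, then carry each endpoint through the raw-to-scale conversion. The conversion table and the linear interpolation in the sketch below are illustrative assumptions, not the authors' tables.

```python
import math

def true_scale_score_interval(x, n, raw_to_scale, z=1.96):
    """Normal-approximation interval for an examinee's true scale score:
    build the interval on the proportion-correct metric, then convert each
    endpoint through the raw-to-scale conversion (endpoints conversion)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    lo_p = max(0.0, p_hat - z * se)
    hi_p = min(1.0, p_hat + z * se)

    def to_scale(p):
        # Linear interpolation over the raw-score conversion table.
        raw = p * n
        i = min(int(raw), n - 1)
        frac = raw - i
        return raw_to_scale[i] + frac * (raw_to_scale[i + 1] - raw_to_scale[i])

    return to_scale(lo_p), to_scale(hi_p)

# Hypothetical 10-item test and conversion table
conversion = [1, 3, 6, 10, 14, 18, 21, 24, 26, 28, 30]
print(true_scale_score_interval(x=7, n=10, raw_to_scale=conversion))
```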
Interval Estimation for True Scores under Various Scale Transformations. ACT Research Report Series.
Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – 2002
This paper reviews various procedures for constructing an interval for an individual's true score given the assumption that errors of measurement are distributed as binomial. This paper also presents two general interval estimation procedures (i.e., normal approximation and endpoints conversion methods) for an individual's true scale score;…
Descriptors: Bayesian Statistics, Error of Measurement, Estimation (Mathematics), Scaling

Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational Measurement, 2000
Describes four procedures previously developed for estimating conditional standard errors of measurement for scale scores and compares them in a simulation study. All four procedures appear viable. Recommends that test users select a procedure based on various factors such as the type of scale score of concern, test characteristics, assumptions…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Scaling