ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	3
Since 2006 (last 20 years)	7

Descriptor

Error of Measurement	12
Scaling	12
Simulation	12
Test Items	8
Item Response Theory	7
Scores	4
Computer Assisted Testing	3
Estimation (Mathematics)	3
Statistical Bias	3
Adaptive Testing	2
Bayesian Statistics	2
Effect Size	2
Goodness of Fit	2
Maximum Likelihood Statistics	2
Models	2
Online Systems	2
Sample Size	2
Scoring	2
Statistical Analysis	2
Test Bias	2
Test Interpretation	2
Test Reliability	2
Biology	1
Comparative Analysis	1
Correlation	1

Source

Journal of Educational…	2
Applied Measurement in…	1
Applied Psychological…	1
EURASIA Journal of…	1
Measurement and Evaluation in…	1
Multivariate Behavioral…	1
ProQuest LLC	1
Sociological Methods &…	1

Publication Type

Journal Articles	8
Reports - Research	7
Reports - Descriptive	2
Reports - Evaluative	2
Speeches/Meeting Papers	2
Dissertations/Theses -…	1

Education Level

Grade 9	1
High Schools	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Location

Indonesia

Showing all 12 results

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Polytomous Rasch Models in Counseling Assessment

Peer reviewed

Willse, John T. – Measurement and Evaluation in Counseling and Development, 2017

This article provides a brief introduction to the Rasch model. Motivation for using Rasch analyses is provided. Important Rasch model concepts and key aspects of result interpretation are introduced, with major points reinforced using a simulation demonstration. Concrete guidelines are provided regarding sample size and the evaluation of items.

Descriptors: Item Response Theory, Test Results, Test Interpretation, Simulation

How Does Polytomous Item Bias Affect Total-Group Survey Score Comparisons?

Peer reviewed

Hidalgo, Ma Dolores; Benítez, Isabel; Padilla, Jose-Luis; Gómez-Benito, Juana – Sociological Methods & Research, 2017

The growing use of scales in survey questionnaires makes it necessary to address how polytomous differential item functioning (DIF) affects observed scale score comparisons. The aim of this study is to investigate the impact of DIF on the Type I error and effect size of the independent-samples t-test on the observed total scale scores. A…

Descriptors: Test Items, Test Bias, Item Response Theory, Surveys

Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

Peer reviewed

Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W. – Applied Psychological Measurement, 2012

Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…

Descriptors: Item Response Theory, Multiple Regression Analysis, Error of Measurement, Models

A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

Peer reviewed

Lin, Johnny; Bentler, Peter M. – Multivariate Behavioral Research, 2012

Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's (1984) asymptotically distribution-free method and Satorra-Bentler's…

Descriptors: Factor Analysis, Statistical Analysis, Scaling, Sample Size

Effect of Violating Unidimensional Item Response Theory Vertical Scaling Assumptions on Developmental Score Scales

Topczewski, Anna Marie – ProQuest LLC, 2013

Developmental score scales represent the performance of students along a continuum: as students learn more, they move higher along that continuum. Unidimensional item response theory (UIRT) vertical scaling has become a commonly used method to create developmental score scales. Research has shown that UIRT vertical scaling methods can be…

Descriptors: Item Response Theory, Scaling, Scores, Student Development

Multidimensional Computerized Adaptive Testing for Indonesia Junior High School Biology

Peer reviewed

Kuo, Bor-Chen; Daud, Muslem; Yang, Chih-Wei – EURASIA Journal of Mathematics, Science & Technology Education, 2015

This paper describes a curriculum-based multidimensional computerized adaptive test that was developed for Indonesia junior high school Biology. In adherence to the Indonesian curriculum of different Biology dimensions, 300 items were constructed and then administered to 2238 students. A multidimensional random coefficients multinomial logit model was…

Descriptors: Secondary School Science, Science Education, Science Tests, Computer Assisted Testing

Interval Estimation for True Scores under Various Scale Transformations. ACT Research Report Series.

Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – 2002

This paper reviews various procedures for constructing an interval for an individual's true score given the assumption that errors of measurement are distributed as binomial. This paper also presents two general interval estimation procedures (i.e., normal approximation and endpoints conversion methods) for an individual's true scale score;…

Descriptors: Bayesian Statistics, Error of Measurement, Estimation (Mathematics), Scaling

Estimators of Conditional Scale-Score Standard Errors of Measurement: A Simulation Study.

Peer reviewed

Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Journal of Educational Measurement, 2000

Describes four procedures previously developed for estimating conditional standard errors of measurement for scale scores and compares them in a simulation study. All four procedures appear viable. Recommends that test users select a procedure based on various factors such as the type of scale score of concern, test characteristics, assumptions…

Descriptors: Error of Measurement, Estimation (Mathematics), Item Response Theory, Scaling

Evaluation of the Magnitude of Differential Item Functioning in Polytomous Items. Program Statistics Research Technical Report No. 94-2.

Zwick, Rebecca; Thayer, Dorothy T. – 1994

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…

Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias

Data Sparseness and On-Line Pretest Item Calibration-Scaling Methods in CAT.

Peer reviewed

Ban, Jae-Chun; Hanson, Bradley A.; Yi, Qing; Harris, Deborah J. – Journal of Educational Measurement, 2002

Compared three online pretest calibration scaling methods through simulation: (1) marginal maximum likelihood with one expectation maximization (EM) cycle (OEM) method; (2) marginal maximum likelihood with multiple EM cycles (MEM); and (3) M. Stocking's method B. MEM produced the smallest average total error in parameter estimation; OEM yielded…

Descriptors: Computer Assisted Testing, Error of Measurement, Maximum Likelihood Statistics, Online Systems

Data Sparseness and Online Pretest Item Calibration/Scaling Methods in CAT. ACT Research Report Series.

Ban, Jae-Chun; Hanson, Bradley A.; Yi, Qing; Harris, Deborah J. – 2002

The purpose of this study was to compare and evaluate three online pretest item calibration/scaling methods in terms of item parameter recovery when the item responses to the pretest items in the pool would be sparse. The three methods considered were the marginal maximum likelihood estimate with one EM cycle (OEM) method, the marginal maximum…

Descriptors: Adaptive Testing, Computer Assisted Testing, Data Analysis, Error of Measurement

Author

Lee, Won-Chan	3
Ban, Jae-Chun	2
Brennan, Robert L.	2
Hanson, Bradley A.	2
Harris, Deborah J.	2
Kolen, Michael J.	2
Yi, Qing	2
Bentler, Peter M.	1
Benítez, Isabel	1
Daud, Muslem	1
Griffeth, Rodger W.	1
Gómez-Benito, Juana	1
Hidalgo, Ma Dolores	1
Johanson, George A.	1
Kim, Stella Yun	1
Kuo, Bor-Chen	1
Lin, Johnny	1
Morse, Brendan J.	1
Padilla, Jose-Luis	1
Thayer, Dorothy T.	1
Topczewski, Anna Marie	1
Willse, John T.	1
Yang, Chih-Wei	1
Zwick, Rebecca	1