ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	7

Descriptor

Error of Measurement	10
Scoring	10
Simulation	10
Item Response Theory	5
Test Items	4
Accuracy	3
Comparative Analysis	3
Evaluation Methods	3
Models	3
Probability	3
Scores	3
Statistical Bias	3
Adaptive Testing	2
Scaling	2
Statistical Analysis	2
Test Reliability	2
Bayesian Statistics	1
Computer Assisted Testing	1
Estimation (Mathematics)	1
Evaluation Criteria	1
Factor Analysis	1
Guessing (Tests)	1
Hypothesis Testing	1
Inferences	1
Item Analysis	1

Source

ProQuest LLC	2
Applied Measurement in…	1
Applied Psychological…	1
ETS Research Report Series	1
Grantee Submission	1
Journal of Educational and…	1

Author

Lee, Won-Chan	2
Cai, Li	1
Falk, Carl F.	1
Greifer, Noah	1
Kim, Sooyeon	1
Kim, Stella Yun	1
Livingston, Samuel A.	1
Longford, Nicholas T.	1
Monroe, Scott	1
Thayer, Dorothy T.	1
Wainer, Howard	1
Wang, Keyin	1
Wright, Benjamin D.	1
Zwick, Rebecca	1

Publication Type

Reports - Research	5
Journal Articles	4
Dissertations/Theses -…	2
Reports - Evaluative	2
Speeches/Meeting Papers	2
Reports - Descriptive	1

Showing all 10 results

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Improving Methods for Propensity Score Analysis with Mismeasured Variables by Incorporating Background Variables with Moderated Nonlinear Factor Analysis

Greifer, Noah – ProQuest LLC, 2018

There has been some research in the use of propensity scores in the context of measurement error in the confounding variables; one recommended method is to generate estimates of the mis-measured covariate using a latent variable model, and to use those estimates (i.e., factor scores) in place of the covariate. I describe a simulation study…

Descriptors: Evaluation Methods, Probability, Scores, Statistical Analysis

Estimation of Expected Fisher Information for IRT Models

Peer reviewed

Monroe, Scott – Journal of Educational and Behavioral Statistics, 2019

In item response theory (IRT) modeling, the Fisher information matrix is used for numerous inferential procedures such as estimating parameter standard errors, constructing test statistics, and facilitating test scoring. In principle, these procedures may be carried out using either the expected information or the observed information. However, in…

Descriptors: Item Response Theory, Error of Measurement, Scoring, Inferences

Accuracy of a Classical Test Theory-Based Procedure for Estimating the Reliability of a Multistage Test. Research Report. ETS RR-17-02

Peer reviewed

Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2017

The purpose of this simulation study was to assess the accuracy of a classical test theory (CTT)-based procedure for estimating the alternate-forms reliability of scores on a multistage test (MST) having 3 stages. We generated item difficulty and discrimination parameters for 10 parallel, nonoverlapping forms of the complete 3-stage test and…

Descriptors: Accuracy, Test Theory, Test Reliability, Adaptive Testing

A Fair Comparison of the Performance of Computerized Adaptive Testing and Multistage Adaptive Testing

Wang, Keyin – ProQuest LLC, 2017

The comparison of item-level computerized adaptive testing (CAT) and multistage adaptive testing (MST) has been researched extensively (e.g., Kim & Plake, 1993; Luecht et al., 1996; Patsula, 1999; Jodoin, 2003; Hambleton & Xing, 2006; Keng, 2008; Zheng, 2012). Various CAT and MST designs have been investigated and compared under the same…

Descriptors: Comparative Analysis, Computer Assisted Testing, Adaptive Testing, Test Items

A Flexible Full-Information Approach to the Modeling of Response Styles

Peer reviewed

Falk, Carl F.; Cai, Li – Grantee Submission, 2015

In this paper, we present a flexible full-information approach to modeling multiple user-defined response styles across multiple constructs of interest. The model is based on a novel parameterization of the multidimensional nominal response model that separates estimation of overall item slopes from the scoring functions (indicating the order of…

Descriptors: Response Style (Tests), Item Response Theory, Outcome Measures, Models

Multinomial and Compound Multinomial Error Models for Tests with Complex Item Scoring

Peer reviewed

Lee, Won-Chan – Applied Psychological Measurement, 2007

This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…

Descriptors: Simulation, Error of Measurement, Scoring, Test Items

Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

Longford, Nicholas T. – 1992

Large-scale surveys usually employ a complex sampling design and, as a consequence, no standard methods are available for estimating the standard errors associated with the estimates of population means. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…

Descriptors: Error of Measurement, Estimation (Mathematics), Prediction, Research Design

Robust Estimation of Ability in the Rasch Model.

Wainer, Howard; Wright, Benjamin D. – 1980

The pure Rasch model was compared with four modifications of the model in a number of different simulations in order to ascertain the comparative efficiencies of the parameter estimations of these modifications. Because there is always noise in test score data, some individuals may have response patterns that do not fit the model and their…

Descriptors: Error of Measurement, Guessing (Tests), Item Analysis, Latent Trait Theory

Evaluation of the Magnitude of Differential Item Functioning in Polytomous Items. Program Statistics Research Technical Report No. 94-2.

Zwick, Rebecca; Thayer, Dorothy T. – 1994

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in test items that are scored on an ordinal scale. Mantel's extension of the Mantel-Haenszel test is a possible hypothesis-testing method for this purpose. The development of descriptive statistics for…

Descriptors: Error of Measurement, Evaluation Methods, Hypothesis Testing, Item Bias