Zhang, Zhonghua – Journal of Experimental Education, 2022
Reporting standard errors of equating has been advocated as a standard practice when conducting test equating. The two most widely applied procedures for standard errors of equating, the bootstrap method and the delta method, are either computationally intensive or confined to the derivations of complicated formulas. In the current study,…
Descriptors: Error of Measurement, Item Response Theory, True Scores, Equated Scores
Zhang, Zhonghua – Applied Measurement in Education, 2020
The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…
Descriptors: Error of Measurement, Computation, Equated Scores, True Scores
Dimitrov, Dimiter M. – Educational and Psychological Measurement, 2020
This study presents new models for item response functions (IRFs) in the framework of the D-scoring method (DSM), which is gaining attention in the field of educational and psychological measurement and large-scale assessments. In a previous work on DSM, the IRFs of binary items were estimated using a logistic regression model (LRM). However, the LRM…
Descriptors: Item Response Theory, Scoring, True Scores, Scaling
Phillips, Gary W.; Jiang, Tao – Practical Assessment, Research & Evaluation, 2016
Power analysis is a fundamental prerequisite for conducting scientific research. Without power analysis the researcher has no way of knowing whether the sample size is large enough to detect the effect he or she is looking for. This paper demonstrates how psychometric factors such as measurement error and equating error affect the power of…
Descriptors: Error of Measurement, Statistical Analysis, Equated Scores, Sample Size
Tao, Wei; Cao, Yi – Applied Measurement in Education, 2016
Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…
Descriptors: Item Response Theory, Equated Scores, Test Format, Models
Cher Wong, Cheow – Journal of Educational Measurement, 2015
Building on previous works by Lord and Ogasawara for dichotomous items, this article proposes an approach to derive the asymptotic standard errors of item response theory true score equating involving polytomous items, for equivalent and nonequivalent groups of examinees. This analytical approach could be used in place of empirical methods like…
Descriptors: Item Response Theory, Error of Measurement, True Scores, Equated Scores
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2016
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
Descriptors: Test Theory, Item Response Theory, Models, Correlation
Moses, Tim – Educational Measurement: Issues and Practice, 2014
This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…
Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis
Moses, Tim – Journal of Educational Measurement, 2012
The focus of this paper is assessing the impact of measurement errors on the prediction error of an observed-score regression. Measures are presented and described for decomposing the linear regression's prediction error variance into parts attributable to the true score variance and the error variances of the dependent variable and the predictor…
Descriptors: Error of Measurement, Prediction, Regression (Statistics), True Scores
Andrews, Benjamin James – ProQuest LLC, 2011
The equity properties can be used to assess the quality of an equating. The degree to which expected scores conditional on ability are similar between test forms is referred to as first-order equity. Second-order equity is the degree to which conditional standard errors of measurement are similar between test forms after equating. The purpose of…
Descriptors: Test Format, Advanced Placement, Simulation, True Scores
Leue, Anja; Lange, Sebastian – Assessment, 2011
The assessment of positive affect (PA) and negative affect (NA) by means of the Positive Affect and Negative Affect Schedule has gained remarkable popularity in the social sciences. Using a meta-analytic tool--namely, reliability generalization (RG)--population reliability scores of both scales have been investigated on the basis of a random…
Descriptors: Social Sciences, True Scores, Generalization, Affective Behavior
Stoolmiller, Michael; Biancarosa, Gina; Fien, Hank – Assessment for Effective Intervention, 2013
Lack of psychometric equivalence of oral reading fluency (ORF) passages used within a grade for screening and progress monitoring has recently become an issue with calls for the use of equating methods to ensure equivalence. To investigate the nature of the nonequivalence and to guide the choice of equating method to correct for nonequivalence,…
Descriptors: School Personnel, Reading Fluency, Emergent Literacy, Psychometrics
Erdodi, Laszlo A.; Richard, David C. S.; Hopwood, Christopher – Journal of Psychoeducational Assessment, 2009
Classical test theory assumes that ability level has no effect on measurement error. Newer test theories, however, argue that the precision of a measurement instrument changes as a function of the examinee's true score. Research has shown that administration errors are common in the Wechsler scales and that subtests requiring subjective scoring…
Descriptors: Scoring, Error of Measurement, True Scores, Intelligence Tests
Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert; Vangeneugden, Tony; Mallinckrodt, Craig H. – Applied Psychological Measurement, 2010
Longitudinal studies are permeating clinical trials in psychiatry. Therefore, it is of utmost importance to study the psychometric properties of rating scales, frequently used in these trials, within a longitudinal framework. However, intrasubject serial correlation and memory effects are problematic issues often encountered in longitudinal data.…
Descriptors: Psychiatry, Rating Scales, Memory, Psychometrics