Showing all 14 results
Peer reviewed
Sinharay, Sandip; Haberman, Shelby J.; Wainer, Howard – Educational and Psychological Measurement, 2011
There are several techniques that increase the precision of subscores by borrowing information from other parts of the test. These techniques have been criticized on validity grounds in several recent publications. In this note, the authors question the argument used in these publications and suggest both inherent limits to the validity…
Descriptors: Scores, Methods, Validity, Reliability
Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011
For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…
Descriptors: Scores, Reliability, Equated Scores, Test Construction
Peer reviewed
Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J. – Multivariate Behavioral Research, 2010
Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…
Descriptors: Educational Testing, Scores, Reports, Psychometrics
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2008
The reliability of a scaled score can be computed by use of item response theory. Estimated reliability can be obtained even if the item response model selected is not valid.
Descriptors: Reliability, Scores, Item Response Theory, Computation
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2008
Continuous exponential families are applied to linking forms via a single-group design. In this application, a distribution from the continuous bivariate exponential family is used that has selected moments that match those of the bivariate distribution of scores on the forms to be linked. The selected continuous bivariate distribution then yields…
Descriptors: Equated Scores, Probability, Statistical Distributions, Models
Peer reviewed
Sinharay, Sandip; Haberman, Shelby J. – Measurement: Interdisciplinary Research and Perspectives, 2009
In this commentary, the authors discuss some of the issues regarding the use of diagnostic classification models that practitioners should keep in mind. In the authors' experience, these issues are not as well known as they should be. The authors then provide recommendations on diagnostic scoring.
Descriptors: Scoring, Reliability, Validity, Classification
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2008
In educational testing, subscores may be provided based on a portion of the items from a larger test. One consideration in evaluation of such subscores is their ability to predict a criterion score. Two limitations on prediction exist. The first, which is well known, is that the coefficient of determination for linear prediction of the criterion…
Descriptors: Scores, Validity, Educational Testing, Correlation
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2005
Some probabilistic illustrations of the reliability coefficient are provided to assist in interpretation of this measure. All explanations are derived under the assumption that the joint distribution of examinee scores from two parallel tests is well approximated by a bivariate normal distribution.
Descriptors: Probability, Reliability, Intervals, Computation
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2007
In item-response theory, if a latent-structure model has an ability variable, then elementary information theory may be employed to provide a criterion for evaluation of the information the test provides concerning ability. This criterion may be considered even in cases in which the latent-structure model is not valid, although interpretation of…
Descriptors: Item Response Theory, Ability, Information Theory, Computation
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2006
Multinomial-response models are available that correspond implicitly to tests in which a total score is computed as the sum of polytomous item scores. For these models, joint and conditional estimation may be considered in much the same way as for the Rasch model for right-scored tests. As in the Rasch model, joint estimation is only attractive if…
Descriptors: Computation, Models, Test Items, Scores
Peer reviewed
Haberman, Shelby J.; Sinharay, Sandip; Puhan, Gautam – ETS Research Report Series, 2006
Recently, there has been an increasing level of interest in reporting subscores. This paper examines the issue of reporting subscores at an aggregate level, especially at the level of institutions that the examinees belong to. A series of statistical analyses is suggested to determine when subscores at the institutional level have any added value…
Descriptors: Scores, Statistical Analysis, Error of Measurement, Reliability
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2004
The usefulness of joint and conditional maximum-likelihood is considered for the Rasch model under realistic testing conditions in which the number of examinees is very large and the number of items is relatively large. Conditions for consistency and asymptotic normality are explored, effects of model error are investigated, measures of prediction…
Descriptors: Maximum Likelihood Statistics, Computation, Item Response Theory, Testing
Peer reviewed
Haberman, Shelby J. – ETS Research Report Series, 2005
In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean-squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…
Descriptors: Scores, Test Items, Error of Measurement, Computation