ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	11

Descriptor

Reliability	14
Computation	8
Scores	8
Item Response Theory	5
Models	5
Validity	5
Educational Testing	4
Error of Measurement	4
Prediction	4
Test Theory	4
Correlation	3
Probability	3
Test Items	3
Criterion Referenced Tests	2
Equated Scores	2
Licensing Examinations…	2
Maximum Likelihood Statistics	2
Measurement	2
Multivariate Analysis	2
Psychometrics	2
Regression (Statistics)	2
Scoring	2
Statistical Analysis	2
True Scores	2
Ability	1
More ▼

Source

ETS Research Report Series	9
Educational Testing Service	2
Educational and Psychological…	1
Measurement:…	1
Multivariate Behavioral…	1

Author

Haberman, Shelby J.	14
Sinharay, Sandip	4
Puhan, Gautam	2
Dorans, Neil J.	1
Sinharay, Sadip	1
Wainer, Howard	1

Publication Type

Journal Articles	12
Reports - Research	9
Reports - Descriptive	2
Reports - Evaluative	2
Opinion Papers	1

Education Level

Higher Education	2
Postsecondary Education	2
High Schools	1
Secondary Education	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Praxis Series	2
SAT (College Admission Test)	1

What Works Clearinghouse Rating

Showing all 14 results Save | Export

Do Adjusted Subscores Lack Validity? Don't Blame the Messenger

Peer reviewed

Direct link

Sinharay, Sandip; Haberman, Shelby J.; Wainer, Howard – Educational and Psychological Measurement, 2011

There are several techniques that increase the precision of subscores by borrowing information from other parts of the test. These techniques have been criticized on validity grounds in several of the recent publications. In this note, the authors question the argument used in these publications and suggest both inherent limits to the validity…

Descriptors: Scores, Methods, Validity, Reliability

Sources of Score Scale Inconsistency. Research Report. ETS RR-11-10

Download full text

Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011

For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, samples sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…

Descriptors: Scores, Reliability, Equated Scores, Test Construction

Reporting Diagnostic Scores in Educational Testing: Temptations, Pitfalls, and Some Solutions

Peer reviewed

Direct link

Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J. – Multivariate Behavioral Research, 2010

Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as are the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…

Descriptors: Educational Testing, Scores, Reports, Psychometrics

How Does the Knowledge of Subgroup Membership of Examinees Affect the Prediction of True Subscores? Research Report. ETS RR-11-43

Download full text

Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011

Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…

Descriptors: True Scores, Test Theory, Prediction, Group Membership

Reliability of Scaled Scores. Research Report. ETS RR-08-70

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2008

The reliability of a scaled score can be computed by use of item response theory. Estimated reliability can be obtained even if the item response model selected is not valid.

Descriptors: Reliability, Scores, Item Response Theory, Computation

Linking with Continuous Exponential Families: Single-Group Designs. Research Report. ETS RR-08-61

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2008

Continuous exponential families are applied to linking forms via a single-group design. In this application, a distribution from the continuous bivariate exponential family is used that has selected moments that match those of the bivariate distribution of scores on the forms to be linked. The selected continuous bivariate distribution then yields…

Descriptors: Equated Scores, Probability, Statistical Distributions, Models

How Much Can We Reliably Know about What Examinees Know?

Peer reviewed

Direct link

Sinharay, Sandip; Haberman, Shelby J. – Measurement: Interdisciplinary Research and Perspectives, 2009

In this commentary, the authors discuss some of the issues regarding the use of diagnostic classification models that practitioners should keep in mind. In the authors experience, these issues are not as well known as they should be. The authors then provide recommendations on diagnostic scoring.

Descriptors: Scoring, Reliability, Validity, Classification

Subscores and Validity. Research Report. ETS RR-08-64

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2008

In educational testing, subscores may be provided based on a portion of the items from a larger test. One consideration in evaluation of such subscores is their ability to predict a criterion score. Two limitations on prediction exist. The first, which is well known, is that the coefficient of determination for linear prediction of the criterion…

Descriptors: Scores, Validity, Educational Testing, Correlation

Interpretations of Reliability. Research Report. ETS RR-05-29

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2005

Some probabilistic illustrations of the reliability coefficient are provided to assist in interpretation of this measure. All explanations are derived under the assumption that the joint distribution of examinee scores from two parallel tests is well approximated by a bivariate normal distribution.

Descriptors: Probability, Reliability, Intervals, Computation

The Information a Test Provides on an Ability Parameter. Research Report. ETS RR-07-18

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2007

In item-response theory, if a latent-structure model has an ability variable, then elementary information theory may be employed to provide a criterion for evaluation of the information the test provides concerning ability. This criterion may be considered even in cases in which the latent-structure model is not valid, although interpretation of…

Descriptors: Item Response Theory, Ability, Information Theory, Computation

Joint and Conditional Estimation for Implicit Models for Tests with Polytomous Item Scores. Research Report. ETS RR-06-03

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2006

Multinomial-response models are available that correspond implicitly to tests in which a total score is computed as the sum of polytomous item scores. For these models, joint and conditional estimation may be considered in much the same way as for the Rasch model for right-scored tests. As in the Rasch model, joint estimation is only attractive if…

Descriptors: Computation, Models, Test Items, Scores

Subscores for Institutions. Research Report. ETS RR-06-13

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J.; Sinharay, Sadip; Puhan, Gautam – ETS Research Report Series, 2006

Recently, there has been an increasing level of interest in reporting subscores. This paper examines the issue of reporting subscores at an aggregate level, especially at the level of institutions that the examinees belong to. A series of statistical analyses is suggested to determine when subscores at the institutional level have any added value…

Descriptors: Scores, Statistical Analysis, Error of Measurement, Reliability

Joint and Conditional Maximum Likelihood Estimation for the Rasch Model for Binary Responses. Research Report. RR-04-20

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2004

The usefulness of joint and conditional maximum-likelihood is considered for the Rasch model under realistic testing conditions in which the number of examinees is very large and the number is items is relatively large. Conditions for consistency and asymptotic normality are explored, effects of model error are investigated, measures of prediction…

Descriptors: Maximum Likelihood Statistics, Computation, Item Response Theory, Testing

When Can Subscores Have Value? Research Report. ETS RR-05-08

Peer reviewed
PDF on ERIC

Download full text

Haberman, Shelby J. – ETS Research Report Series, 2005

In educational tests, subscores are often generated from a portion of the items in a larger test. Guidelines based on mean-squared error are proposed to indicate whether subscores are worth reporting. Alternatives considered are direct reports of subscores, estimates of subscores based on total score, combined estimates based on subscores and…

Descriptors: Scores, Test Items, Error of Measurement, Computation