ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	1
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	24

Source

ETS Research Report Series	11
Educational Testing Service	3
Journal of Educational…	3
International Journal of…	2
Journal of Educational and…	2
Educational Measurement:…	1
Educational and Psychological…	1
Measurement:…	1
Multivariate Behavioral…	1
Psychometrika	1

Author

Haberman, Shelby J.	26
Sinharay, Sandip	10
Lee, Yi-Hsuan	6
Dorans, Neil J.	3
Puhan, Gautam	3
Guo, Hongwen	1
Liu, Jinghua	1
Liu, Yang	1
Sinharay, Sadip	1
Wainer, Howard	1
Zwick, Rebecca	1

Publication Type

Journal Articles	23
Reports - Research	17
Reports - Evaluative	5
Reports - Descriptive	3
Opinion Papers	1

Education Level

Higher Education	5
Postsecondary Education	5
High Schools	2
Secondary Education	2

Location

China	1
France	1
Germany	1
South Korea	1

Assessments and Surveys

Praxis Series	3
SAT (College Admission Test)	3
Test of English as a Foreign…	2
Graduate Management Admission…	1
Graduate Record Examinations	1

Showing 1 to 15 of 26 results

Studying Score Stability with a Harmonic Regression Family: A Comparison of Three Approaches to Adjustment of Examinee-Specific Demographic Data

Peer reviewed

Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021

For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some of which are expected while others may be…

Descriptors: Scores, Regression (Statistics), Demography, Data

Use of Adjustment by Minimum Discriminant Information in Linking Constructed-Response Test Scores in the Absence of Common Items

Peer reviewed

Lee, Yi-Hsuan; Haberman, Shelby J.; Dorans, Neil J. – Journal of Educational Measurement, 2019

In many educational tests, both multiple-choice (MC) and constructed-response (CR) sections are used to measure different constructs. In many common cases, security concerns lead to the use of form-specific CR items that cannot be used for equating test scores, along with MC sections that can be linked to previous test forms via common items. In…

Descriptors: Scores, Multiple Choice Tests, Test Items, Responses

Application of Best Linear Prediction and Penalized Best Linear Prediction to ETS Tests. Research Report. ETS RR-20-08

Peer reviewed

Haberman, Shelby J. – ETS Research Report Series, 2020

Best linear prediction (BLP) and penalized best linear prediction (PBLP) are techniques for combining sources of information to produce task scores, section scores, and composite test scores. The report examines issues to consider in operational implementation of BLP and PBLP in testing programs administered by ETS [Educational Testing Service].

Descriptors: Prediction, Scores, Tests, Testing Programs

Distractor Analysis for Multiple-Choice Tests: An Empirical Study with International Language Assessment Data. Research Report. ETS RR-19-39

Peer reviewed

Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019

Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…

Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests

Investigating Test-Taking Behaviors Using Timing and Process Data

Peer reviewed

Lee, Yi-Hsuan; Haberman, Shelby J. – International Journal of Testing, 2016

The use of computer-based assessments makes possible the collection of detailed data that capture examinees' progress through the tests and the time spent on individual actions. This article presents a study using process and timing data to aid understanding of an international language assessment and the examinees. Issues regarding test-taking strategies,…

Descriptors: Computer Assisted Testing, Test Wiseness, Language Tests, International Assessment

Sources of Score Scale Inconsistency. Research Report. ETS RR-11-10

Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011

For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…

Descriptors: Scores, Reliability, Equated Scores, Test Construction

Do Adjusted Subscores Lack Validity? Don't Blame the Messenger

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J.; Wainer, Howard – Educational and Psychological Measurement, 2011

There are several techniques that increase the precision of subscores by borrowing information from other parts of the test. These techniques have been criticized on validity grounds in several recent publications. In this note, the authors question the argument used in these publications and suggest both inherent limits to the validity…

Descriptors: Scores, Methods, Validity, Reliability

An Empirical Investigation of Population Invariance in the Value of Subscores

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014

Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…

Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)

When Does Scale Anchoring Work? A Case Study

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J.; Lee, Yi-Hsuan – Journal of Educational Measurement, 2011

Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement. Scale anchoring, a technique which describes what students at different points on a score scale know and can do, is a tool to provide such information.…

Descriptors: Scores, Test Items, Statistical Analysis, Licensing Examinations (Professions)

An NCME Instructional Module on Subscores

Peer reviewed

Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J. – Educational Measurement: Issues and Practice, 2011

The purpose of this ITEMS module is to provide an introduction to subscores. First, examples of subscores from an operational test are provided. Then, a review of methods that can be used to examine if subscores have adequate psychometric quality is provided. It is demonstrated, using results from operational and simulated data, that subscores…

Descriptors: Scores, Psychometrics, Tests, Data

How Can Multivariate Item Response Theory Be Used in Reporting of Subscores? Research Report. ETS RR-10-09

Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2010

Recently, there has been increasing interest in reporting diagnostic scores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models. An MIRT model is fitted using a stabilized Newton-Raphson algorithm (Haberman, 1974, 1988) with adaptive Gauss-Hermite quadrature (Haberman, von Davier, & Lee, 2008).…

Descriptors: Item Response Theory, Scores, Multivariate Analysis

Statistical Procedures to Evaluate Quality of Scale Anchoring. Research Report. ETS RR-11-02

Haberman, Shelby J.; Sinharay, Sandip; Lee, Yi-Hsuan – Educational Testing Service, 2011

Providing information to test takers and test score users about the abilities of test takers at different score levels has been a persistent problem in educational and psychological measurement (Carroll, 1993). Scale anchoring (Beaton & Allen, 1992), a technique that describes what students at different points on a score scale know and can do,…

Descriptors: Statistical Analysis, Scores, Regression (Statistics), Item Response Theory

Reporting Diagnostic Scores in Educational Testing: Temptations, Pitfalls, and Some Solutions

Peer reviewed

Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J. – Multivariate Behavioral Research, 2010

Diagnostic scores are of increasing interest in educational testing due to their potential remedial and instructional benefit. Naturally, the number of educational tests that report diagnostic scores is on the rise, as is the number of research publications on such scores. This article provides a critical evaluation of diagnostic score reporting…

Descriptors: Educational Testing, Scores, Reports, Psychometrics

The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

Peer reviewed

Haberman, Shelby J.; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2010

Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

Descriptors: Scoring, Regression (Statistics), Essays, Computer Software

Issues with Self-Monitoring Assessments: Comments on Koretz and Beguin (2010)

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J.; Zwick, Rebecca – Measurement: Interdisciplinary Research and Perspectives, 2010

Several researchers (e.g., Klein, Hamilton, McCaffrey, & Stecher, 2000; Koretz & Barron, 1998; Linn, 2000) have asserted that test-based accountability, a crucial component of U.S. education policy, has resulted in score inflation. This inference has relied on comparisons with performance on other tests such as the National Assessment of…

Descriptors: Audits (Verification), Test Items, Scores, Measurement

Descriptor

Scores	26
Test Items	9
Reliability	8
Item Response Theory	7
Regression (Statistics)	7
Statistical Analysis	7
Error of Measurement	6
Models	6
Computation	5
Correlation	5
Educational Testing	5
Language Tests	5
Tests	5
College Entrance Examinations	4
Comparative Analysis	4
Licensing Examinations…	4
Prediction	4
Test Theory	4
English (Second Language)	3
Equated Scores	3
Goodness of Fit	3
International Assessment	3
Measurement	3
Psychometrics	3
Scoring	3