Showing 1 to 15 of 25 results
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Kim, YoungKoung; DeCarlo, Lawrence T. – College Board, 2016
Because of concerns about test security, different test forms are typically used across different testing occasions. As a result, equating is necessary in order to get scores from the different test forms that can be used interchangeably. In order to assure the quality of equating, multiple equating methods are often examined. Various equity…
Descriptors: Equated Scores, Evaluation Methods, Sampling, Statistical Inference
Peer reviewed
Li, Deping; Jiang, Yanlin; von Davier, Alina A. – Journal of Educational Measurement, 2012
This study investigates a sequence of item response theory (IRT) true score equatings based on various scale transformation approaches and evaluates equating accuracy and consistency over time. The results show that the biases and sample variances for the IRT true score equating (both direct and indirect) are quite small (except for the mean/sigma…
Descriptors: True Scores, Equated Scores, Item Response Theory, Accuracy
Peer reviewed
Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference
Topczewski, Anna; Cui, Zhongmin; Woodruff, David; Chen, Hanwei; Fang, Yu – ACT, Inc., 2013
This paper investigates four methods of linear equating under the common item nonequivalent groups design. Three of the methods are well known: Tucker, Angoff-Levine, and Congeneric-Levine. A fourth method is presented as a variant of the Congeneric-Levine method. Using simulation data generated from the three-parameter logistic IRT model we…
Descriptors: Comparative Analysis, Equated Scores, Methods, Simulation
Peer reviewed
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Peer reviewed
Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012
Test equating is a statistical procedure for adjusting for test form differences in difficulty in a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations from the target population (see…
Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation
Peer reviewed
Full text available on ERIC (PDF)
Haberman, Shelby J.; Lee, Yi-Hsuan; Qian, Jiahe – ETS Research Report Series, 2009
Grouped jackknifing may be used to evaluate the stability of equating procedures with respect to sampling error and with respect to changes in anchor selection. Properties of grouped jackknifing are reviewed for simple-random and stratified sampling, and its use is described for comparisons of anchor sets. Application is made to examples of item…
Descriptors: Equated Scores, Accuracy, Sampling, Statistical Analysis
Peer reviewed
Dorans, Neil J.; Liu, Jinghua; Hammond, Shelby – Applied Psychological Measurement, 2008
This exploratory study was built on research spanning three decades. Petersen, Marco, and Stewart (1982) conducted a major empirical investigation of the efficacy of different equating methods. The studies reported in Dorans (1990) examined how different equating methods performed across samples selected in different ways. Recent population…
Descriptors: Test Format, Equated Scores, Sampling, Evaluation Methods
Peer reviewed
Brennan, Robert L. – Applied Psychological Measurement, 2008
The discussion here covers five articles that are linked in the sense that they all treat population invariance. This discussion of population invariance is a somewhat broader treatment of the subject than simply a discussion of these five articles. In particular, occasional reference is made to publications other than those in this issue. The…
Descriptors: Advanced Placement, Law Schools, Science Achievement, Achievement Tests
Peer reviewed
Lawrence, Ida M.; Dorans, Neil J. – Applied Measurement in Education, 1990
The sample invariant properties of five anchor test equating methods are addressed. Equating results across two sampling conditions--representative sampling and new-form matched sampling--are compared for Tucker and Levine equally reliable linear equating, item response theory true-score equating, and two equipercentile methods. (SLD)
Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Analysis
Peer reviewed
Kolen, Michael J. – Applied Measurement in Education, 1990
Articles on equating test forms in this issue are reviewed and discussed. The results of these papers collectively indicate that matching on the anchor test does not result in more accurate equating. Implications for research are discussed. (SLD)
Descriptors: Equated Scores, Item Response Theory, Research Design, Sampling
Peer reviewed
Skaggs, Gary – Applied Measurement in Education, 1990
The articles in this issue that address the effect of matching samples on ability are reviewed. In spite of these examinations of equating methods and sampling plans, it is still hard to determine a definitive answer to the question of to match or not to match. Implications are discussed. (SLD)
Descriptors: Equated Scores, Item Response Theory, Research Methodology, Sampling
Peer reviewed
Petersen, Nancy S. – Applied Psychological Measurement, 2008
This article discusses the five studies included in this issue. Each article addressed the same topic, population invariance of equating. They all used data from major standardized testing programs, and they all used essentially the same statistics to evaluate their results, namely, the root mean square difference and root expected mean square…
Descriptors: Testing Programs, Standardized Tests, Equated Scores, Evaluation Methods