ERIC - Search Results

Showing 1 to 15 of 69 results

Equated Pooled Booklet Method in DIF Testing

Peer reviewed

Cheng, Ying; Chen, Peihua; Qian, Jiahe; Chang, Hua-Hua – Applied Psychological Measurement, 2013

Differential item functioning (DIF) analysis is an important step in the data analysis of large-scale testing programs. Nowadays, many such programs endorse matrix sampling designs to reduce the load on examinees, such as the balanced incomplete block (BIB) design. These designs pose challenges to the traditional DIF analysis methods. For example,…

Descriptors: Test Bias, Equated Scores, Test Items, Effect Size

Software Note: Using BILOG for Fixed-Anchor Item Calibration

Peer reviewed

DeMars, Christine E.; Jurich, Daniel P. – Applied Psychological Measurement, 2012

The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, are administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…

Descriptors: Computer Software, Item Response Theory, Scaling, Equated Scores

Observed Score and True Score Equating Procedures for Multidimensional Item Response Theory

Peer reviewed

Brossman, Bradley G.; Lee, Won-Chan – Applied Psychological Measurement, 2013

The purpose of this research was to develop observed score and true score equating procedures to be used in conjunction with the multidimensional item response theory (MIRT) framework. Three equating procedures--two observed score procedures and one true score procedure--were created and described in detail. One observed score procedure was…

Descriptors: Equated Scores, True Scores, Item Response Theory, Mathematics Tests

Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

Peer reviewed

He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013

Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores

Local Observed-Score Equating with Anchor-Test Designs

Peer reviewed

van der Linden, Wim J.; Wiberg, Marie – Applied Psychological Measurement, 2010

For traditional methods of observed-score equating with anchor-test designs, such as chain and poststratification equating, it is difficult to satisfy the criteria of equity and population invariance. Their equatings are therefore likely to be biased. The bias in these methods was evaluated against a simple local equating method in which the…

Descriptors: Methods, Equated Scores, Test Items, Bias

Investigating the Impact of Compromised Anchor Items on IRT Equating under the Nonequivalent Anchor Test Design

Peer reviewed

Jurich, Daniel P.; DeMars, Christine E.; Goodman, Joshua T. – Applied Psychological Measurement, 2012

The prevalence of high-stakes test scores as a basis for significant decisions necessitates the dissemination of accurate and fair scores. However, the magnitude of these decisions has created an environment in which examinees may be prone to resort to cheating. To reduce the risk of cheating, multiple test forms are commonly administered. When…

Descriptors: High Stakes Tests, Scores, Prevention, Cheating

A Graphical Approach to Evaluating Equating Using Test Characteristic Curves

Peer reviewed

Wyse, Adam E.; Reckase, Mark D. – Applied Psychological Measurement, 2011

An essential concern in the application of any equating procedure is determining whether tests can be considered equated after the tests have been placed onto a common scale. This article clarifies one equating criterion, the first-order equity property of equating, and develops a new method for evaluating equating that is linked to this…

Descriptors: Lawyers, Licensing Examinations (Professions), Testing Programs, Graphs

Two Approaches for Using Multiple Anchors in NEAT Equating: A Description and Demonstration

Peer reviewed

Moses, Tim; Deng, Weiling; Zhang, Yu-Li – Applied Psychological Measurement, 2011

Nonequivalent groups with anchor test (NEAT) equating functions that use a single anchor can have accuracy problems when the groups are extremely different and/or when the anchor weakly correlates with the tests being equated. Proposals have been made to address these issues by incorporating more than one anchor into NEAT equating functions. These…

Descriptors: Equated Scores, Tests, Comparative Analysis, Correlation

A Comparison of Statistical Significance Tests for Selecting Equating Functions

Peer reviewed

Moses, Tim – Applied Psychological Measurement, 2009

This study compared the accuracies of nine previously proposed statistical significance tests for selecting identity, linear, and equipercentile equating functions in an equivalent groups equating design. The strategies included likelihood ratio tests for the loglinear models of tests' frequency distributions, regression tests, Kolmogorov-Smirnov…

Descriptors: Statistical Significance, Equated Scores, Comparative Analysis, Tests

A Comparison of Anchor-Item Designs for the Concurrent Calibration of Large Banks of Likert-Type Items

Peer reviewed

Garcia-Perez, Miguel A.; Alcala-Quintana, Rocio; Garcia-Cueto, Eduardo – Applied Psychological Measurement, 2010

Current interest in measuring quality of life is generating interest in the construction of computerized adaptive tests (CATs) with Likert-type items. Calibration of an item bank for use in CAT requires collecting responses to a large number of candidate items. However, the number is usually too large to administer to each subject in the…

Descriptors: Comparative Analysis, Test Items, Equated Scores, Item Banks

Asymptotic and Sampling-Based Standard Errors for Two Population Invariance Measures in the Linear Equating Case

Peer reviewed

Rijmen, Frank; Manalo, Jonathan R.; von Davier, Alina A. – Applied Psychological Measurement, 2009

This article describes two methods for obtaining the standard errors of two commonly used population invariance measures of equating functions: the root mean square difference of the subpopulation equating functions from the overall equating function and the root expected mean square difference. The delta method relies on an analytical…

Descriptors: Error of Measurement, Sampling, Equated Scores, Statistical Analysis

IRTEQ: Windows Application that Implements Item Response Theory Scaling and Equating

Peer reviewed

Han, Kyung T. – Applied Psychological Measurement, 2009

This article provides a brief description of a Windows application called IRTEQ. IRTEQ employs an intuitive, user-friendly graphic user interface that can rescale one test form to another by using various item response theory (IRT) scaling methods. It supports various IRT models for test forms. It can also equate test scores on the scale of one…

Descriptors: Item Response Theory, Scaling, True Scores, Equated Scores

The Continuized Log-Linear Method: An Alternative to the Kernel Method of Continuization in Test Equating

Peer reviewed

Wang, Tianyou – Applied Psychological Measurement, 2008

Von Davier, Holland, and Thayer (2004) laid out a five-step framework of test equating that can be applied to various data collection designs and equating methods. In the continuization step, they presented an adjusted Gaussian kernel method that preserves the first two moments. This article proposes an alternative continuization method that…

Descriptors: Equated Scores, Models, Data Collection, Computation

A Comparison of the Frequency Estimation and Chained Equipercentile Methods under the Common-Item Nonequivalent Groups Design

Peer reviewed

Wang, Tianyou; Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Applied Psychological Measurement, 2008

This article uses simulation to compare two test equating methods under the common-item nonequivalent groups design: the frequency estimation method and the chained equipercentile method. An item response theory model is used to define the true equating criterion, simulate group differences, and generate response data. Three linear equating…

Descriptors: Equated Scores, Item Response Theory, Simulation, Comparative Analysis

Anchor Test Type and Population Invariance: An Exploration across Subpopulations and Test Administrations

Peer reviewed

Dorans, Neil J.; Liu, Jinghua; Hammond, Shelby – Applied Psychological Measurement, 2008

This exploratory study was built on research spanning three decades. Petersen, Marco, and Stewart (1982) conducted a major empirical investigation of the efficacy of different equating methods. The studies reported in Dorans (1990) examined how different equating methods performed across samples selected in different ways. Recent population…

Descriptors: Test Format, Equated Scores, Sampling, Evaluation Methods
