Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 5 |
Since 2006 (last 20 years) | 23 |
Descriptor
Equated Scores | 35 |
Simulation | 35 |
Test Items | 35 |
Item Response Theory | 20 |
Comparative Analysis | 14 |
Difficulty Level | 12 |
Error of Measurement | 12 |
Statistical Analysis | 8 |
Sample Size | 7 |
Test Construction | 6 |
Test Format | 6 |
More ▼ |
Source
Author
Antal, Judit | 2 |
Holland, Paul | 2 |
Sinharay, Sandip | 2 |
Andersson, Björn | 1 |
Ban, Jae-Chun | 1 |
Beretvas, S. Natasha | 1 |
Beyerlein, Michael M. | 1 |
Bogan, Evelyn Doody | 1 |
Boldt, R. F. | 1 |
Bolt, Daniel M. | 1 |
Boughton, Keith | 1 |
More ▼ |
Publication Type
Education Level
Higher Education | 2 |
Postsecondary Education | 2 |
Elementary Education | 1 |
Grade 4 | 1 |
Intermediate Grades | 1 |
Audience
Location
Florida | 1 |
Laws, Policies, & Programs
Assessments and Surveys
ACT Assessment | 2 |
Advanced Placement… | 1 |
Florida Comprehensive… | 1 |
Test of English as a Foreign… | 1 |
What Works Clearinghouse Rating
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, Postratification kernel equating, and Circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
Bramley, Tom – Research Matters, 2020
The aim of this study was to compare, by simulation, the accuracy of mapping a cut-score from one test to another by expert judgement (using the Angoff method) versus the accuracy with a small-sample equating method (chained linear equating). As expected, the standard-setting method resulted in more accurate equating when we assumed a higher level…
Descriptors: Cutting Scores, Standard Setting (Scoring), Equated Scores, Accuracy
Fitzpatrick, Joseph; Skorupski, William P. – Journal of Educational Measurement, 2016
The equating performance of two internal anchor test structures--miditests and minitests--is studied for four IRT equating methods using simulated data. Originally proposed by Sinharay and Holland, miditests are anchors that have the same mean difficulty as the overall test but less variance in item difficulties. Four popular IRT equating methods…
Descriptors: Difficulty Level, Test Items, Comparative Analysis, Test Construction
Guo, Rui; Zheng, Yi; Chang, Hua-Hua – Journal of Educational Measurement, 2015
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the…
Descriptors: Item Response Theory, Test Items, Evaluation Methods, Equated Scores
Andersson, Björn – Journal of Educational Measurement, 2016
In observed-score equipercentile equating, the goal is to make scores on two scales or tests measuring the same construct comparable by matching the percentiles of the respective score distributions. If the tests consist of different items with multiple categories for each item, a suitable model for the responses is a polytomous item response…
Descriptors: Equated Scores, Item Response Theory, Error of Measurement, Tests
Kopf, Julia; Zeileis, Achim; Strobl, Carolin – Educational and Psychological Measurement, 2015
Differential item functioning (DIF) indicates the violation of the invariance assumption, for instance, in models based on item response theory (IRT). For item-wise DIF analysis using IRT, a common metric for the item parameters of the groups that are to be compared (e.g., for the reference and the focal group) is necessary. In the Rasch model,…
Descriptors: Test Items, Equated Scores, Test Bias, Item Response Theory
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
Cao, Yi; Lu, Ru; Tao, Wei – ETS Research Report Series, 2014
The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2-parameter logistic [2PL] model), (b) combine the interdependent items to form a…
Descriptors: Item Response Theory, Equated Scores, Test Items, Simulation
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
He, Yong – ProQuest LLC, 2013
Common test items play an important role in equating multiple test forms under the common-item nonequivalent groups design. Inconsistent item parameter estimates among common items can lead to large bias in equated scores for IRT true score equating. Current methods extensively focus on detection and elimination of outlying common items, which…
Descriptors: Test Items, Regression (Statistics), Simulation, Comparative Analysis
Li, Yanmei – ETS Research Report Series, 2012
In a common-item (anchor) equating design, the common items should be evaluated for item parameter drift. Drifted items are often removed. For a test that contains mostly dichotomous items and only a small number of polytomous items, removing some drifted polytomous anchor items may result in anchor sets that no longer resemble mini-versions of…
Descriptors: Scores, Item Response Theory, Equated Scores, Simulation
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Wang, Wei – ProQuest LLC, 2013
Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…
Descriptors: Equated Scores, Test Format, Test Items, Test Length
Carvajal-Espinoza, Jorge E. – ProQuest LLC, 2011
The Non-Equivalent groups with Anchor Test equating (NEAT) design is a widely used equating design in large scale testing that involves two groups that do not have to be of equal ability. One group P gets form X and a group of items A and the other group Q gets form Y and the same group of items A. One of the most commonly used equating methods in…
Descriptors: Sample Size, Equated Scores, Psychometrics, Measurement
Sunnassee, Devdass – ProQuest LLC, 2011
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…
Descriptors: Test Length, Test Format, Sample Size, Simulation