Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 4 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 17 |
Descriptor
Source
Author
Angoff, William H. | 4 |
Andrulis, Richard S. | 2 |
Baird, Jo-Anne | 2 |
Cohen, Allan S. | 2 |
Cook, Linda L. | 2 |
Gilmer, Jerry S. | 2 |
Hoover, H. D. | 2 |
Jaeger, Richard M. | 2 |
Kim, Seock-Ho | 2 |
Kolen, Michael J. | 2 |
Lissitz, Robert W. | 2 |
More ▼ |
Publication Type
Education Level
Elementary Secondary Education | 6 |
Secondary Education | 2 |
Higher Education | 1 |
Postsecondary Education | 1 |
Audience
Researchers | 10 |
Location
United Kingdom (England) | 3 |
United States | 3 |
Australia | 2 |
Netherlands | 2 |
New York | 2 |
United Kingdom | 2 |
United Kingdom (Wales) | 2 |
Alaska | 1 |
California | 1 |
Hawaii | 1 |
Idaho | 1 |
More ▼ |
Laws, Policies, & Programs
Elementary and Secondary… | 4 |
Assessments and Surveys
What Works Clearinghouse Rating
Chang, Kuo-Feng – ProQuest LLC, 2022
This dissertation was designed to foster a deeper understanding of population invariance in the context of composite-score equating and provide practitioners with guidelines for addressing score equity concerns at the composite score level. The purpose of this dissertation was threefold. The first was to compare different composite equating…
Descriptors: Test Items, Equated Scores, Methods, Design
Kim, Sooyeon; Walker, Michael E. – Educational Measurement: Issues and Practice, 2022
Test equating requires collecting data to link the scores from different forms of a test. Problems arise when equating samples are not equivalent and the test forms to be linked share no common items by which to measure or adjust for the group nonequivalence. Using data from five operational test forms, we created five pairs of research forms for…
Descriptors: Ability, Tests, Equated Scores, Testing Problems
Benton, Tom; Williamson, Joanna – Research Matters, 2022
Equating methods are designed to adjust between alternate versions of assessments targeting the same content at the same level, with the aim that scores from the different versions can be used interchangeably. The statistical processes used in equating have, however, been extended to statistically "link" assessments that differ, such as…
Descriptors: Statistical Analysis, Equated Scores, Definitions, Alternative Assessment
Uysal, Ibrahim; Sahin-Kürsad, Merve; Kiliç, Abdullah Faruk – Participatory Educational Research, 2022
The aim of the study was to examine the common items in the mixed format (e.g., multiple-choices and essay items) contain parameter drifts in the test equating processes performed with the common item nonequivalent groups design. In this study, which was carried out using Monte Carlo simulation with a fully crossed design, the factors of test…
Descriptors: Test Items, Test Format, Item Response Theory, Equated Scores
McGill, Ryan J.; Ward, Thomas J.; Canivez, Gary L. – School Psychology International, 2020
The Wechsler Intelligence Scale for Children (WISC) is the most widely used intelligence test in the world. Now in its fifth edition, the WISC-V has been translated and adapted for use in nearly a dozen countries. Despite its popularity, numerous concerns have been raised about some of the procedures used to develop and validate translated and…
Descriptors: Children, Intelligence Tests, Translation, Test Validity
van der Linden, Wim J. – Journal of Educational and Behavioral Statistics, 2019
Lord's (1980) equity theorem claims observed-score equating to be possible only when two test forms are perfectly reliable or strictly parallel. An analysis of its proof reveals use of an incorrect statistical assumption. The assumption does not invalidate the theorem itself though, which can be shown to follow directly from the discrete nature of…
Descriptors: Equated Scores, Testing Problems, Item Response Theory, Evaluation Methods
Kettler, Ryan J. – School Psychology International, 2020
This article is a commentary on McGill et al.'s (2020) article "Use of Translated and Adapted Versions of the WISC-V: Caveat Emptor." McGill et al. use caveat emptor in their title to indicate that the buyer of an assessment must be careful about the product being purchased, presumably because the seller of the assessment is not being…
Descriptors: Children, Intelligence Tests, Translation, Test Reliability
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
El Masri, Yasmine H.; Baird, Jo-Anne; Graesser, Art – Assessment in Education: Principles, Policy & Practice, 2016
We investigate the extent to which language versions (English, French and Arabic) of the same science test are comparable in terms of item difficulty and demands. We argue that language is an inextricable part of the scientific literacy construct, be it intended or not by the examiner. This argument has considerable implications on methodologies…
Descriptors: International Assessment, Difficulty Level, Test Items, Language Variation
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
Haberman, Shelby J. – Educational Testing Service, 2010
Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable inking accuracy. To illustrate results, a variety of…
Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy
van Rijn, P. W.; Beguin, A. A.; Verstralen, H. H. F. M. – Assessment in Education: Principles, Policy & Practice, 2012
While measurement precision is relatively easy to establish for single tests and assessments, it is much more difficult to determine for decision making with multiple tests on different subjects. This latter is the situation in the system of final examinations for secondary education in the Netherlands and is used as an example in this paper. This…
Descriptors: Secondary Education, Tests, Foreign Countries, Decision Making
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques