Moses, Tim; Holland, Paul – ETS Research Report Series, 2008
This study addressed 2 issues of using loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. One issue was a comparative assessment of several statistical strategies that have been proposed for selecting 1 from several competing model parameterizations. Another…
Descriptors: Equated Scores, Selection, Models, Statistical Analysis
Moses, Tim – ETS Research Report Series, 2008
Nine statistical strategies for selecting equating functions in an equivalent groups design were evaluated. The strategies of interest were likelihood ratio chi-square tests, regression tests, Kolmogorov-Smirnov tests, and significance tests for equated score differences. The most accurate strategies in the study were the likelihood ratio tests…
Descriptors: Equated Scores, Statistical Analysis, Statistical Significance, Regression (Statistics)
Eignor, Daniel R. – Educational Measurement: Issues and Practice, 2008
This article discusses a particular type of concordance table and the potential for test score misuse that may result from employing such a table. The concordance that is discussed is typically created between scores on different, nonequatable versions of a test that share the same or close to the same test title. These concordance tables often…
Descriptors: Scores, Tables (Data), Comparative Analysis, Equated Scores
Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010
Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…
Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques
Effect of Repeaters on Score Equating in a Large-Scale Licensure Test. Research Report. ETS RR-09-27
Kim, Sooyeon; Walker, Michael E. – ETS Research Report Series, 2009
This study investigated the subgroup invariance of equating functions for a licensure test in the context of a nonequivalent groups with anchor test (NEAT) design. Examinees who had taken a new, to-be-equated form of the test were divided into three subgroups according to their previous testing experience: (a) repeaters who previously took the…
Descriptors: Equated Scores, Licensing Examinations (Professions), Test Construction, Repetition
Goldman, Robert N.; McKenzie, John D. Jr. – Teaching Statistics: An International Journal for Teachers, 2009
We explain how to simulate both univariate and bivariate raw data sets having specified values for common summary statistics. The first example illustrates how to "construct" a data set having prescribed values for the mean and the standard deviation--for a one-sample t test with a specified outcome. The second shows how to create a bivariate data…
Descriptors: Correlation, Equated Scores, Statistical Analysis, Weighted Scores
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – ETS Research Report Series, 2008
This study examined variations of the nonequivalent-groups equating design for mixed-format tests--tests containing both multiple-choice (MC) and constructed-response (CR) items--to determine which design was most effective in producing equivalent scores across the two tests to be equated. Four linking designs were examined: (a) an anchor with…
Descriptors: Equated Scores, Test Format, Multiple Choice Tests, Responses
von Davier, Alina A.; Manalo, Jonathan R.; Rijmen, Frank – ETS Research Report Series, 2008
The standard errors of the 2 most widely used population-invariance measures of equating functions, root mean square difference (RMSD) and root expected mean square difference (REMSD), are not derived for common equating methods such as linear equating. Consequently, it is unknown how much noise is contained in these estimates. This paper…
Descriptors: Equated Scores, Error of Measurement, Statistical Analysis, Sampling
Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet – Pearson, 2012
Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…
Descriptors: Equated Scores, Test Items, Test Format, Item Response Theory
van Rijn, P. W.; Beguin, A. A.; Verstralen, H. H. F. M. – Assessment in Education: Principles, Policy & Practice, 2012
While measurement precision is relatively easy to establish for single tests and assessments, it is much more difficult to determine for decision making with multiple tests on different subjects. This latter is the situation in the system of final examinations for secondary education in the Netherlands and is used as an example in this paper. This…
Descriptors: Secondary Education, Tests, Foreign Countries, Decision Making
Suh, Youngsuk; Mroch, Andrew A.; Kane, Michael T.; Ripkey, Douglas R. – Measurement: Interdisciplinary Research and Perspectives, 2009
In this study, a data base containing the responses of 40,000 candidates to 90 multiple-choice questions was used to mimic data sets for 50-item tests under the "nonequivalent groups with anchor test" (NEAT) design. Using these smaller data sets, we evaluated the performance of five linear equating methods for the NEAT design with five levels of…
Descriptors: Test Items, Equated Scores, Methods, Differences
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010
Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…
Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis
Wang, Tianyou; Lee, Won-Chan; Brennan, Robert L.; Kolen, Michael J. – Applied Psychological Measurement, 2008
This article uses simulation to compare two test equating methods under the common-item nonequivalent groups design: the frequency estimation method and the chained equipercentile method. An item response theory model is used to define the true equating criterion, simulate group differences, and generate response data. Three linear equating…
Descriptors: Equated Scores, Item Response Theory, Simulation, Comparative Analysis

