Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 30 |
Descriptor
Equated Scores | 30 |
Testing Programs | 30 |
Item Response Theory | 14 |
Error of Measurement | 10 |
Evaluation Methods | 8 |
Mathematics Tests | 8 |
Test Items | 8 |
Scaling | 7 |
Statistical Analysis | 7 |
Test Reliability | 7 |
Academic Achievement | 6 |
More ▼ |
Source
Author
von Davier, Alina A. | 3 |
Guo, Hongwen | 2 |
Haberman, Shelby | 2 |
Keller, Lisa A. | 2 |
Kim, Sooyeon | 2 |
Brennan, Robert L. | 1 |
Chen, Hanwei | 1 |
Colvin, Kimberly F. | 1 |
Cook, Robert J. | 1 |
Cui, Zhongmin | 1 |
Dorans, Neil | 1 |
More ▼ |
Publication Type
Education Level
Higher Education | 8 |
Secondary Education | 5 |
Elementary Education | 4 |
Adult Education | 3 |
Early Childhood Education | 3 |
Grade 3 | 3 |
Grade 4 | 3 |
Grade 5 | 3 |
Grade 6 | 3 |
Grade 7 | 3 |
Grade 8 | 3 |
More ▼ |
Audience
Location
New York | 3 |
Canada | 2 |
Minnesota | 1 |
North Carolina | 1 |
United States | 1 |
Laws, Policies, & Programs
Assessments and Surveys
General Educational… | 2 |
SAT (College Admission Test) | 1 |
What Works Clearinghouse Rating
Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F. – Applied Measurement in Education, 2016
The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…
Descriptors: Item Response Theory, Equated Scores, Test Format, Testing Programs
Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2015
An equating procedure for a testing program with evolving distribution of examinee profiles is developed. No anchor is available because the original scoring scheme was based on expert judgment of the item difficulties. Pairs of examinees from two administrations are formed by matching on coarsened propensity scores derived from a set of…
Descriptors: Equated Scores, Testing Programs, College Entrance Examinations, Scoring
LaFlair, Geoffrey T.; Isbell, Daniel; May, L. D. Nicolas; Gutierrez Arvizu, Maria Nelly; Jamieson, Joan – Language Testing, 2017
Language programs need multiple test forms for secure administrations and effective placement decisions, but can they have confidence that scores on alternate test forms have the same meaning? In large-scale testing programs, various equating methods are available to ensure the comparability of forms. The choice of equating method is informed by…
Descriptors: Language Tests, Equated Scores, Testing Programs, Comparative Analysis
Livingston, Samuel A. – ETS Research Report Series, 2014
In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…
Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping
GED Testing Service, 2014
This manual was written to provide technical information regarding the General Educational Development (GED®) test as evidence that the GED® test is technically sound. Throughout this manual, documentation is provided regarding the development of the GED® test and data collection activities, as well as evidence of reliability and validity. This…
Descriptors: High School Equivalency Programs, Equivalency Tests, Testing Programs, Test Validity
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Guo, Hongwen; Liu, Jinghua; Dorans, Neil; Feigenbaum, Miriam – ETS Research Report Series, 2011
Maintaining score stability is crucial for an ongoing testing program that administers several tests per year over many years. One way to stall the drift of the score scale is to use an equating design with multiple links. In this study, we use the operational and experimental SAT® data collected from 44 administrations to investigate the effect…
Descriptors: Equated Scores, College Entrance Examinations, Reliability, Testing Programs
Wyse, Adam E.; Reckase, Mark D. – Applied Psychological Measurement, 2011
An essential concern in the application of any equating procedure is determining whether tests can be considered equated after the tests have been placed onto a common scale. This article clarifies one equating criterion, the first-order equity property of equating, and develops a new method for evaluating equating that is linked to this…
Descriptors: Lawyers, Licensing Examinations (Professions), Testing Programs, Graphs
Keller, Lisa A.; Keller, Robert R. – Educational and Psychological Measurement, 2011
This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests…
Descriptors: Item Response Theory, Scaling, Sustainability, Classification
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Applied Measurement in Education, 2011
The synthetic function is a weighted average of the identity (the linking function for forms that are known to be completely parallel) and a traditional equating method. The purpose of the present study was to investigate the benefits of the synthetic function on small-sample equating using various real data sets gathered from different…
Descriptors: Testing Programs, Equated Scores, Investigations, Data Analysis
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
New York State Education Department, 2016
This technical report provides detailed information regarding the technical, statistical, and measurement attributes of the New York State Testing Program (NYSTP) for the Grades 3-8 Common Core English Language Arts (ELA) and Mathematics 2016 Operational Tests. This report includes information about test content and test development, item (i.e.,…
Descriptors: Testing Programs, English, Language Arts, Mathematics Tests
Northwest Evaluation Association, 2014
Recently, Northwest Evaluation Association (NWEA) completed a study to connect the scale of the Minnesota Comprehensive Assessments (MCA) Testing Program used for Minnesota's mathematics and reading assessments with NWEA's RIT (Rasch Unit) scale. Information from the state assessments was used in a study to establish performance-level scores on…
Descriptors: Alignment (Education), Testing Programs, State Programs, Mathematics Tests
Guo, Hongwen – Psychometrika, 2010
After many equatings have been conducted in a testing program, equating errors can accumulate to a degree that is not negligible compared to the standard error of measurement. In this paper, the author investigates the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and…
Descriptors: Testing Programs, Testing, Error of Measurement, Equated Scores
Northwest Evaluation Association, 2014
Recently, the Northwest Evaluation Association (NWEA) completed a study to connect the scale of the North Carolina State End of Grade (EOG) Testing Program used for North Carolina's mathematics and reading assessments with NWEA's Rausch Interval Unit (RIT) scale. Information from the state assessments was used in a study to establish…
Descriptors: Alignment (Education), Testing Programs, Equated Scores, Standard Setting
Previous Page | Next Page »
Pages: 1 | 2