Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 4 |
Descriptor
Comparative Analysis | 4 |
Testing Programs | 4 |
Equated Scores | 3 |
Sample Size | 3 |
Test Items | 3 |
Correlation | 2 |
Item Response Theory | 2 |
Regression (Statistics) | 2 |
Scores | 2 |
Simulation | 2 |
Statistical Analysis | 2 |
More ▼ |
Source
ETS Research Report Series | 4 |
Author
von Davier, Alina A. | 2 |
Breyer, F. Jay | 1 |
Haberman, Shelby | 1 |
Kim, Sooyeon | 1 |
Lee, Yi-Hsuan | 1 |
Lorenz, Florian | 1 |
Qian, Jiahe | 1 |
Wang, Lin | 1 |
Zhang, Mo | 1 |
Publication Type
Journal Articles | 4 |
Reports - Research | 4 |
Speeches/Meeting Papers | 1 |
Tests/Questionnaires | 1 |
Education Level
Audience
Location
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
von Davier, Alina A. – ETS Research Report Series, 2012
Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – ETS Research Report Series, 2006
This study addresses the sample error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be…
Descriptors: Equated Scores, Sample Size, Test Items, Statistical Bias