ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	4

Descriptor

Comparative Analysis	4
Testing Programs	4
Equated Scores	3
Sample Size	3
Test Items	3
Correlation	2
Item Response Theory	2
Regression (Statistics)	2
Scores	2
Simulation	2
Statistical Analysis	2
Charts	1
Computer Assisted Testing	1
Computer Software	1
Data Analysis	1
Educational Assessment	1
English (Second Language)	1
Error of Measurement	1
Essays	1
Evaluation Research	1
Evaluators	1
Foreign Countries	1
Generalization	1
Interrater Reliability	1
Item Analysis	1
More ▼

Source

ETS Research Report Series

Author

von Davier, Alina A.	2
Breyer, F. Jay	1
Haberman, Shelby	1
Kim, Sooyeon	1
Lee, Yi-Hsuan	1
Lorenz, Florian	1
Qian, Jiahe	1
Wang, Lin	1
Zhang, Mo	1

Publication Type

Journal Articles	4
Reports - Research	4
Speeches/Meeting Papers	1
Tests/Questionnaires	1

Education Level

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

What Works Clearinghouse Rating

Showing all 4 results Save | Export

Exploring Alternative Test Form Linking Designs with Modified Equating Sample Size and Anchor Test Length. Research Report. ETS RR-13-02

Peer reviewed
PDF on ERIC

Download full text

Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013

The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…

Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation

The Use of Quality Control and Data Mining Techniques for Monitoring Scaled Scores: An Overview. Research Report. ETS RR-12-20

Peer reviewed
PDF on ERIC

Download full text

von Davier, Alina A. – ETS Research Report Series, 2012

Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…

Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed
PDF on ERIC

Download full text

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

An Alternative to Equating with Small Samples in the Non-Equivalent Groups Anchor Test Design. Research Report. ETS RR-06-27

Peer reviewed
PDF on ERIC

Download full text

Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – ETS Research Report Series, 2006

This study addresses the sample error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be…

Descriptors: Equated Scores, Sample Size, Test Items, Statistical Bias