Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 0
Since 2016 (last 10 years): 3
Since 2006 (last 20 years): 13
Descriptor
Testing Programs: 13
Equated Scores: 5
Scores: 5
Test Items: 5
College Entrance Examinations: 4
Comparative Analysis: 4
Scoring: 4
Statistical Analysis: 4
Test Construction: 4
Correlation: 3
Educational Assessment: 3
Source
ETS Research Report Series: 13
Author
Dorans, Neil J.: 2
Liu, Jinghua: 2
von Davier, Alina A.: 2
Breyer, F. Jay: 1
Deane, Paul: 1
Deng, Weiling: 1
Dorans, Neil: 1
Feigenbaum, Miriam: 1
Guo, Hongwen: 1
Haberman, Shelby: 1
Haberman, Shelby J.: 1
Publication Type
Journal Articles: 13
Reports - Research: 11
Reports - Evaluative: 2
Tests/Questionnaires: 2
Speeches/Meeting Papers: 1
Education Level
Higher Education: 4
Postsecondary Education: 4
Elementary Secondary Education: 2
Early Childhood Education: 1
Preschool Education: 1
Assessments and Surveys
SAT (College Admission Test): 3
Praxis Series: 2
ACT Assessment: 1
Early Childhood Longitudinal…: 1
Graduate Record Examinations: 1
National Assessment of Educational Progress: 1
Test of English as a Foreign Language: 1
Dorans, Neil J. – ETS Research Report Series, 2020
This report, which is based on an invited presentation given at the 2015 meeting of the Association of Test Publishers, is a response to the continuing proliferation of scale linking studies that have occurred since the publication of "Uncommon Measures" in 1999. The report has four parts. First, I restate the conclusions made in…
Descriptors: State Programs, Testing Programs, National Competency Tests, College Entrance Examinations
Tannenbaum, Richard J.; Kane, Michael T. – ETS Research Report Series, 2019
Testing programs are often classified as high or low stakes to indicate how stringently they need to be evaluated. However, in practice, this classification falls short. A high-stakes label is taken to imply that all indicators of measurement quality must meet high standards, whereas a low-stakes label is taken to imply the opposite. This approach…
Descriptors: High Stakes Tests, Testing Programs, Measurement, Evaluation Criteria
Haberman, Shelby J. – ETS Research Report Series, 2020
Best linear prediction (BLP) and penalized best linear prediction (PBLP) are techniques for combining sources of information to produce task scores, section scores, and composite test scores. The report examines issues to consider in operational implementation of BLP and PBLP in testing programs administered by ETS [Educational Testing Service].
Descriptors: Prediction, Scores, Tests, Testing Programs
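The Haberman (2020) entry above treats best linear prediction (BLP) and penalized BLP (PBLP) as ways of combining sources of score information. As a rough illustration only, the sketch below computes ordinary least-squares combination weights and a ridge-penalized variant on simulated section scores; the data, variable names, and the choice of a ridge penalty are assumptions, not details taken from the report.

```python
# Illustrative sketch of best linear prediction (BLP) and a ridge-style
# penalized variant (PBLP-like) on simulated section scores.
# All data and names are hypothetical; this is not ETS's operational code.
import numpy as np

rng = np.random.default_rng(0)
n_examinees, n_sections = 500, 3

# Simulated ability and noisy section scores that partially reflect it.
ability = rng.normal(size=n_examinees)
sections = ability[:, None] + rng.normal(scale=0.8, size=(n_examinees, n_sections))
target = ability + rng.normal(scale=0.3, size=n_examinees)  # criterion score

X = np.column_stack([np.ones(n_examinees), sections])  # add intercept

# BLP: least-squares weights minimizing squared prediction error.
blp_weights = np.linalg.solve(X.T @ X, X.T @ target)

# Penalized BLP: add a ridge penalty to stabilize the weights, which
# matters when sections are few, short, or highly correlated.
lam = 10.0
penalty = lam * np.eye(X.shape[1])
penalty[0, 0] = 0.0  # leave the intercept unpenalized
pblp_weights = np.linalg.solve(X.T @ X + penalty, X.T @ target)

print("BLP weights: ", np.round(blp_weights, 3))
print("PBLP weights:", np.round(pblp_weights, 3))
```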
Livingston, Samuel A. – ETS Research Report Series, 2014
In this study, I investigated 2 procedures intended to create test-taker groups of equal ability by poststratifying on a composite variable created from demographic information. In one procedure, the stratifying variable was the composite variable that best predicted the test score. In the other procedure, the stratifying variable was the…
Descriptors: Demography, Equated Scores, Cluster Grouping, Ability Grouping
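The Livingston (2014) entry above concerns forming equal-ability groups by poststratifying on a demographic composite. The sketch below is a minimal version of that idea, assuming invented demographic codes, a linear-regression composite, and quintile strata; none of these choices is taken from the study itself.

```python
# Minimal poststratification sketch: build a composite that predicts the
# score from demographics, bin it into strata, and reweight each group so
# its stratum proportions match the combined sample. Data are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "group": rng.choice(["new_form", "old_form"], size=n),
    "educ": rng.integers(1, 5, size=n),    # coded education level
    "region": rng.integers(1, 4, size=n),  # coded region
})
df["score"] = 40 + 5 * df["educ"] + 2 * df["region"] + rng.normal(scale=8, size=n)

# Composite = linear prediction of the score from the demographic codes.
X = np.column_stack([np.ones(n), df["educ"], df["region"]])
beta = np.linalg.lstsq(X, df["score"].to_numpy(), rcond=None)[0]
df["composite"] = X @ beta

# Poststratify on quintiles of the composite.
df["stratum"] = pd.qcut(df["composite"], q=5, labels=False, duplicates="drop")
overall = df["stratum"].value_counts(normalize=True)
for g, sub in df.groupby("group"):
    within = sub["stratum"].value_counts(normalize=True)
    weights = sub["stratum"].map(overall / within)
    print(g, "weighted mean score:", round(np.average(sub["score"], weights=weights), 2))
```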
Sabatini, John; O'Reilly, Tenaha; Deane, Paul – ETS Research Report Series, 2013
This report describes the foundation and rationale for a framework designed to measure reading literacy. The aim of the effort is to build an assessment system that reflects current theoretical conceptions of reading and is developmentally sensitive across a prekindergarten to 12th grade student range. The assessment framework is intended to…
Descriptors: Reading Tests, Literacy, Models, Testing Programs
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
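The Wang, Qian, and Lee (2013) entry above studies IRT-based linking and equating through an anchor test. As a generic illustration of one linking step, the sketch below applies a mean/sigma transformation to hypothetical anchor-item difficulties; the study itself may have used different linking procedures, and every parameter value here is invented.

```python
# Mean/sigma IRT scale linking from common (anchor) item difficulties.
# The difficulty estimates below are invented for illustration.
import numpy as np

# The same anchor items, calibrated separately on two forms.
b_new = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_old = np.array([-1.0, -0.2, 0.3, 1.1, 1.7])

# Mean/sigma transformation: theta_old = A * theta_new + B
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

theta_new = np.array([-1.0, 0.0, 1.0])
theta_on_old_scale = A * theta_new + B
print("A =", round(A, 3), "B =", round(B, 3))
print("thetas on old scale:", np.round(theta_on_old_scale, 3))
```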
Guo, Hongwen; Liu, Jinghua; Dorans, Neil; Feigenbaum, Miriam – ETS Research Report Series, 2011
Maintaining score stability is crucial for an ongoing testing program that administers several tests per year over many years. One way to stall the drift of the score scale is to use an equating design with multiple links. In this study, we use the operational and experimental SAT® data collected from 44 administrations to investigate the effect…
Descriptors: Equated Scores, College Entrance Examinations, Reliability, Testing Programs
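Guo, Liu, Dorans, and Feigenbaum (2011), in the entry above, examine equating designs with multiple links as a way to slow scale drift. The sketch below shows one simple way several links could be combined, by averaging their conversions; the linear conversions are invented and the averaging rule is an assumption, not the design analyzed with the SAT data in the report.

```python
# Combine several equating links by averaging their raw-to-reference
# conversions, so no single noisy link dominates. Values are hypothetical.
import numpy as np

raw_scores = np.arange(0, 11)

# Conversions to the reference scale obtained through three anchor links
# (slope, intercept pairs are invented).
links = [(1.02, -0.5), (0.98, 0.3), (1.00, -0.1)]
conversions = np.array([a * raw_scores + b for a, b in links])

combined = conversions.mean(axis=0)
print(np.round(combined, 2))
```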
von Davier, Alina A. – ETS Research Report Series, 2012
Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling
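The von Davier (2012) entry above deals with quality control for programs that test almost continuously. As a toy example of the general idea rather than the techniques proposed in the report, the sketch below flags administrations whose mean scaled score falls more than three standard errors from a historical baseline; all numbers and the three-standard-error rule are assumptions.

```python
# Simple control-chart style check on administration means.
# Administration means, baseline, and sample sizes are invented.
import numpy as np

admin_means = np.array([500.1, 499.6, 500.4, 501.0, 499.8, 503.2, 500.2])
baseline, sd, n_per_admin = 500.0, 15.0, 2500
se = sd / np.sqrt(n_per_admin)  # standard error of an administration mean

z = (admin_means - baseline) / se
flags = np.where(np.abs(z) > 3)[0]
print("z-scores:", np.round(z, 2))
print("flagged administrations:", flags)
```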
Moses, Tim; Liu, Jinghua; Tan, Adele; Deng, Weiling; Dorans, Neil J. – ETS Research Report Series, 2013
In this study, differential item functioning (DIF) methods utilizing 14 different matching variables were applied to assess DIF in the constructed-response (CR) items from 6 forms of 3 mixed-format tests. Results suggested that the methods might produce distinct patterns of DIF results for different tests and testing programs, in that the DIF…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Item Analysis
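Moses, Liu, Tan, Deng, and Dorans (2013), in the entry above, compare DIF results for constructed-response items under different matching variables. The sketch below computes a standardized mean difference for a simulated polytomous item conditional on one matching variable; the data and the particular DIF statistic are illustrative assumptions rather than the study's own procedures.

```python
# Standardized-mean-difference DIF check for a 0-4 constructed-response
# item, conditioning on a matching variable. All data are simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 4000
df = pd.DataFrame({
    "group": rng.choice(["reference", "focal"], size=n),
    "matching_score": rng.integers(0, 41, size=n),  # e.g., total test score
})
p = df["matching_score"] / 40
df["item_score"] = rng.binomial(4, p.clip(0.05, 0.95))

# Compare group means level by level on the matching variable, weighting
# the differences by the focal group's frequency at each level.
smd_num, smd_den = 0.0, 0.0
for level, cell in df.groupby("matching_score"):
    ref = cell.loc[cell["group"] == "reference", "item_score"]
    foc = cell.loc[cell["group"] == "focal", "item_score"]
    if len(ref) and len(foc):
        smd_num += len(foc) * (foc.mean() - ref.mean())
        smd_den += len(foc)
print("standardized mean difference:", round(smd_num / smd_den, 3))
```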
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
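Zhang, Breyer, and Lorenz (2013), in the entry above, evaluate automated essay scoring partly through agreement with human scores. One widely used agreement index is quadratic weighted kappa; the sketch below computes it for two invented score vectors, without implying that this was the study's only or primary criterion.

```python
# Quadratic weighted kappa between human and machine essay scores.
# The score vectors and the 1-6 scale are invented for illustration.
import numpy as np

human = np.array([3, 4, 2, 5, 3, 4, 1, 3, 4, 5])
machine = np.array([3, 4, 3, 5, 2, 4, 2, 3, 4, 4])
k = 6  # number of score categories on a 1-6 scale

# Observed and chance-expected category-pair counts.
obs = np.zeros((k, k))
for h, m in zip(human, machine):
    obs[h - 1, m - 1] += 1
exp = np.outer(np.bincount(human - 1, minlength=k),
               np.bincount(machine - 1, minlength=k)) / len(human)

# Quadratic disagreement weights.
i, j = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
w = (i - j) ** 2 / (k - 1) ** 2

kappa = 1 - (w * obs).sum() / (w * exp).sum()
print("quadratic weighted kappa:", round(kappa, 3))
```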
Rock, Donald A. – ETS Research Report Series, 2012
This paper provides a history of ETS's role in developing assessment instruments and psychometric procedures for measuring change in large-scale national assessments funded by the Longitudinal Studies branch of the National Center for Education Statistics. It documents the innovations developed during more than 30 years of working with…
Descriptors: Models, Educational Change, Longitudinal Studies, Educational Development
Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David M. – ETS Research Report Series, 2008
This report presents the results of a research and development effort for SpeechRater℠ Version 1.0 (v1.0), an automated scoring system for the spontaneous speech of English language learners used operationally in the Test of English as a Foreign Language™ (TOEFL®) Practice Online assessment (TPO). The report includes a summary of the validity…
Descriptors: Speech, Scoring, Scoring Rubrics, Scoring Formulas
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – ETS Research Report Series, 2006
This study addresses the sample error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be…
Descriptors: Equated Scores, Sample Size, Test Items, Statistical Bias
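The Kim, von Davier, and Haberman (2006) entry above describes the synthetic function as a weighted average of the identity function and an estimated equating function. One way to write that weighted average, using illustrative notation (the weight w and the symbol for the estimated function are not taken from the report), is:

```latex
% Synthetic linking function: shrink the estimated equating function
% \hat{e}(x) toward the identity, with analyst-chosen weight w.
\[
  e_{\mathrm{syn}}(x) \;=\; w\,\hat{e}(x) \;+\; (1 - w)\,x,
  \qquad 0 \le w \le 1 .
\]
```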