Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 2 |
Since 2016 (last 10 years) | 3 |
Since 2006 (last 20 years) | 24 |
Descriptor
Scoring | 21 |
Language Tests | 8 |
Test Construction | 7 |
Educational Assessment | 6 |
English (Second Language) | 6 |
Writing Tests | 6 |
Educational Testing | 5 |
Evaluation Research | 5 |
Factor Analysis | 5 |
Test Validity | 5 |
Automation | 4 |
Source
Educational Testing Service | 24 |
Author
Livingston, Samuel A. | 3 |
Attali, Yigal | 2 |
Deane, Paul | 2 |
Dorans, Neil J. | 2 |
Puhan, Gautam | 2 |
Quinlan, Thomas | 2 |
Bennett, Randy Elliot | 1 |
Davis, Larry | 1 |
DeCarlo, Lawrence T. | 1 |
Flotts, Paulina | 1 |
Garcia Gomez, Pablo | 1 |
Publication Type
Reports - Research | 11 |
Reports - Descriptive | 8 |
Reports - Evaluative | 4 |
Numerical/Quantitative Data | 2 |
Tests/Questionnaires | 2 |
Guides - Classroom - Learner | 1 |
Speeches/Meeting Papers | 1 |
Education Level
Elementary Secondary Education | 10 |
Higher Education | 9 |
Postsecondary Education | 5 |
Grade 8 | 2 |
High Schools | 2 |
Elementary Education | 1 |
Grade 7 | 1 |
Junior High Schools | 1 |
Middle Schools | 1 |
Secondary Education | 1 |
Location
Chile | 1 |
China | 1 |
Europe | 1 |
India | 1 |
North America | 1 |
Assessments and Surveys
Test of English as a Foreign… | 6 |
Graduate Record Examinations | 3 |
Marlowe Crowne Social… | 1 |
SAT (College Admission Test) | 1 |
Test of English for… | 1 |
Livingston, Samuel A. – Educational Testing Service, 2020
This booklet is a conceptual introduction to item response theory (IRT), which many large-scale testing programs use for constructing and scoring their tests. Although IRT is essentially mathematical, the approach here is nonmathematical, in order to serve as an introduction on the topic for people who want to understand why IRT is used and what…
Descriptors: Item Response Theory, Scoring, Test Items, Scaling
Schmidgall, Jonathan – Educational Testing Service, 2021
The redesigned "TOEIC Bridge"® tests are designed to measure the reading, listening, speaking, and writing proficiency of beginning to low-intermediate English learners in the context of everyday adult life. This report describes the comprehensive and multifaceted process used to enhance the meaningfulness of TOEIC Bridge test score…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Language Proficiency
Papageorgiou, Spiros; Davis, Larry; Norris, John M.; Garcia Gomez, Pablo; Manna, Venessa F.; Monfils, Lora – Educational Testing Service, 2021
The "TOEFL® Essentials"™ test is a new English language proficiency test in the "TOEFL"® family of assessments. It measures foundational language skills and communication abilities in academic and general (daily life) contexts. The test covers the four language skills of reading, listening, writing, and speaking and is intended…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Language Proficiency
Deane, Paul; Quinlan, Thomas; Kostin, Irene – Educational Testing Service, 2011
ETS has recently instituted the Cognitively Based Assessments of, for, and as Learning (CBAL) research initiative to create a new generation of assessment designed from the ground up to enhance learning. It is intended as a general approach, covering multiple subject areas including reading, writing, and math. This paper is concerned with the…
Descriptors: Automation, Scoring, Educational Assessment, Writing Tests
Attali, Yigal – Educational Testing Service, 2011
The e-rater® automated essay scoring system is used operationally in the scoring of TOEFL iBT® independent essays. Previous research has found support for a 3-factor structure of the e-rater features. This 3-factor structure has an attractive hierarchical linguistic interpretation with a word choice factor, a grammatical convention within a…
Descriptors: Essay Tests, Language Tests, Test Scoring Machines, Automation
Livingston, Samuel A. – Educational Testing Service, 2014
This booklet grew out of a half-day class on equating that author Samuel Livingston teaches for new statistical staff at Educational Testing Service (ETS). The class is a nonmathematical introduction to the topic, emphasizing conceptual understanding and practical applications. The class consists of illustrated lectures, interspersed with…
Descriptors: Equated Scores, Scoring, Self Evaluation (Individuals), Scores
Attali, Yigal – Educational Testing Service, 2011
This paper proposes an alternative content measure for essay scoring, based on the "difference" in the relative frequency of a word in high-scored versus low-scored essays. The "differential word use" (DWU) measure is the average of these differences across all words in the essay. A positive value indicates the essay is using…
Descriptors: Scoring, Essay Tests, Word Frequency, Content Analysis
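The differential word use measure described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the whitespace tokenization, averaging over word tokens rather than distinct words, and treating unseen words as a difference of zero are all assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(essays):
    """Relative frequency of each word across a group of essays.

    Tokenization by lowercasing and splitting on whitespace is an assumption.
    """
    counts = Counter(word for essay in essays for word in essay.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def word_differences(high_essays, low_essays):
    """Per-word difference in relative frequency: high-scored minus low-scored."""
    high = relative_frequencies(high_essays)
    low = relative_frequencies(low_essays)
    return {w: high.get(w, 0.0) - low.get(w, 0.0) for w in set(high) | set(low)}


def differential_word_use(essay, differences):
    """Average of the per-word differences over the words of one essay.

    Assumptions: the average is taken over word tokens, and words absent
    from both reference groups contribute a difference of 0.
    """
    words = essay.lower().split()
    if not words:
        return 0.0
    return sum(differences.get(w, 0.0) for w in words) / len(words)


if __name__ == "__main__":
    # Hypothetical toy data, not from the report.
    high = ["clear argument evidence", "evidence supports argument"]
    low = ["stuff things stuff", "things happen"]
    diffs = word_differences(high, low)
    print(differential_word_use("argument evidence", diffs))
    print(differential_word_use("stuff things", diffs))
```

Under these assumptions, a positive value means the essay leans toward vocabulary that is relatively more frequent in the high-scored group, and a negative value means the opposite.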
Dorans, Neil J.; Liang, Longjuan; Puhan, Gautam – Educational Testing Service, 2010
Scores are the most visible and widely used products of a testing program. The choice of score scale has implications for test specifications, equating, and test reliability and validity, as well as for test interpretation. At the same time, the score scale should be viewed as infrastructure likely to require repair at some point. In this report…
Descriptors: Testing Programs, Standard Setting (Scoring), Test Interpretation, Certification
Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011
A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations® (GRE®) writing…
Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests
Haertel, Edward H. – Educational Testing Service, 2013
Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…
Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness
Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010
This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored--the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…
Descriptors: Equated Scores, Scoring, Responses, Test Items
DeCarlo, Lawrence T. – Educational Testing Service, 2010
A basic consideration in large-scale assessments that use constructed response (CR) items, such as essays, is how to allocate the essays to the raters that score them. Designs that are used in practice are incomplete, in that each essay is scored by only a subset of the raters, and also unbalanced, in that the number of essays scored by each rater…
Descriptors: Test Items, Responses, Essay Tests, Scoring
Bennett, Randy Elliot – Educational Testing Service, 2011
CBAL, an acronym for Cognitively Based Assessment of, for, and as Learning, is a research initiative intended to create a model for an innovative K-12 assessment system that provides summative information for policy makers, as well as formative information for classroom instructional purposes. This paper summarizes empirical results from 16 CBAL…
Descriptors: Educational Assessment, Elementary Secondary Education, Summative Evaluation, Formative Evaluation
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater® to score the TOEFL iBT® Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests
Santelices, Maria Veronica; Ugarte, Juan Jose; Flotts, Paulina; Radovic, Darinka; Kyllonen, Patrick – Educational Testing Service, 2011
This paper presents the development and initial validation of new measures of critical thinking and noncognitive attributes that were designed to supplement existing standardized tests used in the admissions system for higher education in Chile. The importance of various facets of this process, including the establishment of technical rigor and…
Descriptors: Foreign Countries, College Entrance Examinations, Test Construction, Test Validity