Weeks, Jonathan; Baron, Patricia – Educational Testing Service, 2021
The current project, Exploring Math Education Relations by Analyzing Large Data Sets (EMERALDS) II, is an attempt to identify specific Common Core State Standards procedural, conceptual, and problem-solving competencies in earlier grades that best predict success in algebraic areas in later grades. The data for this study include two cohorts of…
Descriptors: Mathematics Education, Common Core State Standards, Problem Solving, Mathematics Tests
Sinharay, Sandip; Haberman, Shelby J.; Jia, Helena – Educational Testing Service, 2011
Standard 3.9 of the "Standards for Educational and Psychological Testing" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) demands evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently…
Descriptors: Item Response Theory, Goodness of Fit, Statistical Analysis, Language Tests
Fife, James H.; Graf, Edith Aurora; Ohls, Sarah – Educational Testing Service, 2011
Six tasks, selected from assessments administered in 2007 as part of the Cognitively-Based Assessments of, for, and as Learning (CBAL) project, were revised in an effort to remove difficulties with the tasks that were unrelated to the construct being assessed. Because the revised tasks were piloted on a different population from the original…
Descriptors: Mathematics Tests, Responses, Test Construction, Construct Validity
Santelices, Maria Veronica; Ugarte, Juan Jose; Flotts, Paulina; Radovic, Darinka; Kyllonen, Patrick – Educational Testing Service, 2011
This paper presents the development and initial validation of new measures of critical thinking and noncognitive attributes that were designed to supplement existing standardized tests used in the admissions system for higher education in Chile. The importance of various facets of this process, including the establishment of technical rigor and…
Descriptors: Foreign Countries, College Entrance Examinations, Test Construction, Test Validity
Tan, Xuan; Xiang, Bihua; Dorans, Neil J.; Qu, Yanxuan – Educational Testing Service, 2010
The nature of the matching criterion (usually the total score) in the study of differential item functioning (DIF) has been shown to impact the accuracy of different DIF detection procedures. One of the topics related to the nature of the matching criterion is whether the studied item should be included. Although many studies exist that suggest…
Descriptors: Test Bias, Test Items, Item Response Theory
Sinharay, Sandip; Haberman, Shelby – Educational Testing Service, 2011
Recently, the literature has seen increasing interest in subscores for their potential diagnostic values; for example, one study suggested the report of weighted averages of a subscore and the total score, whereas others showed, for various operational and simulated data sets, that weighted averages, as compared to subscores, lead to more accurate…
Descriptors: Equated Scores, Weighted Scores, Tests, Statistical Analysis
Deane, Paul; Quinlan, Thomas; Kostin, Irene – Educational Testing Service, 2011
ETS has recently instituted the Cognitively Based Assessments of, for, and as Learning (CBAL) research initiative to create a new generation of assessments designed from the ground up to enhance learning. It is intended as a general approach, covering multiple subject areas including reading, writing, and math. This paper is concerned with the…
Descriptors: Automation, Scoring, Educational Assessment, Writing Tests
Steinberg, Jonathan; Cline, Frederick; Sawaki, Yasuyo – Educational Testing Service, 2011
This study examined the scores on a state standards-based Grade 5 Science assessment obtained by a group of students without learning disabilities who took the standard form of the test and by three groups of students with learning disabilities: one taking the standard form of the test without accommodations or modifications, a second taking the…
Descriptors: Learning Disabilities, State Standards, Educational Improvement, Science Tests
Moses, Tim; Deng, Weiling; Zhang, Yu-Li – Educational Testing Service, 2010
In the equating literature, a recurring concern is that equating functions that utilize a single anchor to account for examinee groups' nonequivalence are biased when the groups are extremely different and/or when the anchor only weakly measures what the tests measure. Several proposals have been made to address this equating bias by incorporating…
Descriptors: Equated Scores, Data Collection, Statistical Analysis, Differences
Coley, Richard J.; Sum, Andrew – Educational Testing Service, 2012
As the 21st century unfolds, the United States faces historic challenges, including a struggling economy, an aging infrastructure and global terrorism. Solutions will have to come from educated, skilled citizens who understand and believe in our democratic system and are civically engaged. This incisive new report examines these fault lines and…
Descriptors: Citizen Participation, Democracy, Citizenship Education, Civics
Bridgeman, Brent; McBride, Amanda; Monaghan, William – Educational Testing Service, 2004
Imposing time limits on tests can serve a range of important functions. Time limits are essential, for example, if speed of performance is an integral component of what is being measured, as would be the case when testing such skills as how quickly someone can type. Limiting testing time also helps contain expenses associated with test…
Descriptors: Computer Assisted Testing, Timed Tests, Test Results, Aptitude Tests
Educational Testing Service, 2008
The Test of English as a Foreign Language™, better known as TOEFL®, is designed to measure the English-language proficiency of people whose native language is not English. TOEFL scores are accepted by more than 6,000 colleges, universities, and licensing agencies in 130 countries. The test is also used by governments, and scholarship and…
Descriptors: English (Second Language), Language Proficiency, Language Tests, Computer Assisted Testing
Chodorow, Martin; Burstein, Jill – Educational Testing Service, 2004
This study examines the relation between essay length and holistic scores assigned to Test of English as a Foreign Language™ (TOEFL®) essays by e-rater®, the automated essay scoring system developed by ETS. Results show that an early version of the system, e-rater99, accounted for little variance in human reader scores beyond that which…
Descriptors: Essays, Test Scoring Machines, English (Second Language), Student Evaluation
Breland, Hunter; Lee, Yong-Won; Najarian, Michelle; Muraki, Eiji – Educational Testing Service, 2004
This investigation of the comparability of writing assessment prompts was conducted in two phases. In an exploratory Phase I, 47 writing prompts administered in the computer-based Test of English as a Foreign Language™ (TOEFL® CBT) from July through December 1998 were examined. Logistic regression procedures were used to estimate prompt…
Descriptors: Writing Evaluation, Quality Control, Gender Differences, Writing Tests