Ackerman, Debra J. – Educational Testing Service, 2014
Early education programs are increasingly being promoted by states and the federal government as an integral part of their efforts to ensure that all children enter school ready to learn. As these programs and their enrollments have grown in recent years, so too have efforts to monitor their quality and performance. A common focus is on…
Descriptors: Preschool Education, State Policy, Observation, Validity
Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011
A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations[R] (GRE[R]) writing…
Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests
Haertel, Edward H. – Educational Testing Service, 2013
Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…
Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness
Haberman, Shelby J. – Educational Testing Service, 2011
Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…
Descriptors: Writing Tests, Scoring, Essays, Language Tests
Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011
For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…
Descriptors: Scores, Reliability, Equated Scores, Test Construction
Sinharay, Sandip – Educational Testing Service, 2010
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008) suggested a method based on classical test theory to determine whether subscores have added value over total scores. This paper provides a literature review and reports when subscores were found to have added value for…
Descriptors: Scores, Correlation, Reliability, Item Response Theory
Sheehan, Kathleen M.; O'Reilly, Tenaha – Educational Testing Service, 2011
"No Child Left Behind" has highlighted the need for new types of assessments that not only provide high-quality evidence about what students know and can do, but also help to move learning forward. This paper describes a linked set of formative and summative reading assessments designed to address the tradeoffs inherent in these two…
Descriptors: Educational Assessment, Reading Tests, Formative Evaluation, Summative Evaluation
Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011
Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…
Descriptors: True Scores, Test Theory, Prediction, Group Membership
Middleton, Kyndra; Dorans, Neil J. – Educational Testing Service, 2011
Extreme linkings are performed in settings in which neither equivalent groups nor anchor material is available to link scores on two assessments. Examples of extreme linkages include links between scores on tests administered in different languages or between scores on tests administered across disability groups. The strength of interpretation…
Descriptors: Equated Scores, Testing, Difficulty Level, Test Reliability
Rose, Norman; von Davier, Matthias; Xu, Xueli – Educational Testing Service, 2010
Large-scale educational surveys are low-stakes assessments of educational outcomes conducted using nationally representative samples. In these surveys, students do not receive individual scores, and the outcome of the assessment is inconsequential for respondents. The low-stakes nature of these surveys, as well as variations in average performance…
Descriptors: Item Response Theory, Educational Assessment, Data Analysis, Case Studies
Li, Yanmei; Li, Shuhong; Wang, Lin – Educational Testing Service, 2010
Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…
Descriptors: English, Language Tests, Reading Tests, Item Response Theory
Santelices, Maria Veronica; Ugarte, Juan Jose; Flotts, Paulina; Radovic, Darinka; Kyllonen, Patrick – Educational Testing Service, 2011
This paper presents the development and initial validation of new measures of critical thinking and noncognitive attributes that were designed to supplement existing standardized tests used in the admissions system for higher education in Chile. The importance of various facets of this process, including the establishment of technical rigor and…
Descriptors: Foreign Countries, College Entrance Examinations, Test Construction, Test Validity
Kane, Michael – Educational Testing Service, 2010
The 12th annual William H. Angoff Memorial Lecture was presented by Dr. Michael T. Kane, ETS's (Educational Testing Service) Samuel J. Messick Chair in Test Validity and the former Director of Research at the National Conference of Bar Examiners. Dr. Kane argues that it is important for policymakers to recognize the impact of errors of measurement…
Descriptors: Error of Measurement, Scores, Public Policy, Test Theory
Young, John W.; Holtzman, Steven; Steinberg, Jonathan – Educational Testing Service, 2011
In this research investigation of score comparability for language minority students (English language learners [ELLs] and former English language learners), we examined 3 indicators of score comparability (reliability, internal test structure, and differential item functioning) for 4th and 8th grade students who took the NCLB-mandated content…
Descriptors: Language Minorities, Second Language Learning, Grade 8, Minority Group Students
Xi, Xiaoming; Mollaun, Pam – Educational Testing Service, 2009
This study investigated the scoring of the Test of English as a Foreign Language[TM] Internet-based Test (TOEFL iBT[TM]) Speaking section by bilingual or multilingual speakers of English and 1 or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the Speaking section for…
Descriptors: Foreign Countries, English (Second Language), Internet, Language Tests