ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	16

Source

Educational Testing Service

Publication Type

Reports - Research	7
Reports - Evaluative	5
Numerical/Quantitative Data	2
Reports - Descriptive	2
Speeches/Meeting Papers	2
Information Analyses	1
Opinion Papers	1

Education Level

Elementary Secondary Education	4
Grade 8	3
Higher Education	3
Elementary Education	2
Grade 4	2
Junior High Schools	2
Middle Schools	2
Postsecondary Education	2
Early Childhood Education	1
Grade 7	1
High Schools	1
Intermediate Grades	1
Preschool Education	1
Secondary Education	1

Audience

Practitioners	2
Administrators	1
Policymakers	1

Location

Chile	1
India	1
North America	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

Gates MacGinitie Reading Tests	2
Test of English as a Foreign…	2
Graduate Record Examinations	1
Marlowe Crowne Social…	1
Program for International…	1

Showing 1 to 15 of 16 results

State-Funded PreK Policies on External Classroom Observations: Issues and Status. Policy Information Report

Ackerman, Debra J. – Educational Testing Service, 2014

Early education programs are increasingly being promoted by states and the federal government as an integral part of their efforts to ensure that all children enter school ready to learn. As these programs and their enrollments have grown in recent years, so too have efforts to monitor their quality and performance. A common focus is on…

Descriptors: Preschool Education, State Policy, Observation, Validity

An Examination of the Link between Rater Calibration Performance and Subsequent Scoring Accuracy in Graduate Record Examinations[R] (GRE[R]) Writing. Research Report. ETS RR-11-03

Ricker-Pedley, Kathryn L. – Educational Testing Service, 2011

A pseudo-experimental study was conducted to examine the link between rater accuracy calibration performances and subsequent accuracy during operational scoring. The study asked 45 raters to score a 75-response calibration set and then a 100-response (operational) set of responses from a retired Graduate Record Examinations[R] (GRE[R]) writing…

Descriptors: Scoring, Accuracy, College Entrance Examinations, Writing Tests

Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series

Haertel, Edward H. – Educational Testing Service, 2013

Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…

Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness

Use of e-rater[R] in Scoring of the TOEFL iBT[R] Writing Test. Research Report. ETS RR-11-25

Haberman, Shelby J. – Educational Testing Service, 2011

Alternative approaches are discussed for use of e-rater[R] to score the TOEFL iBT[R] Writing test. These approaches involve alternate criteria. In the 1st approach, the predicted variable is the expected rater score of the examinee's 2 essays. In the 2nd approach, the predicted variable is the expected rater score of 2 essay responses by the…

Descriptors: Writing Tests, Scoring, Essays, Language Tests

Sources of Score Scale Inconsistency. Research Report. ETS RR-11-10

Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011

For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…

Descriptors: Scores, Reliability, Equated Scores, Test Construction

When Can Subscores Be Expected to Have Added Value? Results from Operational and Simulated Data. Research Report. ETS RR-10-16

Sinharay, Sandip – Educational Testing Service, 2010

Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008) suggested a method based on classical test theory to determine whether subscores have added value over total scores. This paper provides a literature review and reports when subscores were found to have added value for…

Descriptors: Scores, Correlation, Reliability, Item Response Theory

The CBAL Reading Assessment: An Approach for Balancing Measurement and Learning Goals. Research Report. ETS RR-11-21

Sheehan, Kathleen M.; O'Reilly, Tenaha – Educational Testing Service, 2011

"No Child Left Behind" has highlighted the need for new types of assessments that not only provide high-quality evidence about what students know and can do, but also help to move learning forward. This paper describes a linked set of formative and summative reading assessments designed to address the tradeoffs inherent in these two…

Descriptors: Educational Assessment, Reading Tests, Formative Evaluation, Summative Evaluation

How Does the Knowledge of Subgroup Membership of Examinees Affect the Prediction of True Subscores? Research Report. ETS RR-11-43

Haberman, Shelby J.; Sinharay, Sandip – Educational Testing Service, 2011

Subscores are reported for several operational assessments. Haberman (2008) suggested a method based on classical test theory to determine if the true subscore is predicted better by the corresponding subscore or the total score. Researchers are often interested in learning how different subgroups perform on subtests. Stricker (1993) and…

Descriptors: True Scores, Test Theory, Prediction, Group Membership

Assessing the Falsifiability of Extreme Linking. Research Report. ETS RR-11-04

Middleton, Kyndra; Dorans, Neil J. – Educational Testing Service, 2011

Extreme linkings are performed in settings in which neither equivalent groups nor anchor material is available to link scores on two assessments. Examples of extreme linkages include links between scores on tests administered in different languages or between scores on tests administered across disability groups. The strength of interpretation…

Descriptors: Equated Scores, Testing, Difficulty Level, Test Reliability

Modeling Nonignorable Missing Data with Item Response Theory (IRT). Research Report. ETS RR-10-11

Rose, Norman; von Davier, Matthias; Xu, Xueli – Educational Testing Service, 2010

Large-scale educational surveys are low-stakes assessments of educational outcomes conducted using nationally representative samples. In these surveys, students do not receive individual scores, and the outcome of the assessment is inconsequential for respondents. The low-stakes nature of these surveys, as well as variations in average performance…

Descriptors: Item Response Theory, Educational Assessment, Data Analysis, Case Studies

Application of a General Polytomous Testlet Model to the Reading Section of a Large-Scale English Language Assessment. Research Report. ETS RR-10-21

Li, Yanmei; Li, Shuhong; Wang, Lin – Educational Testing Service, 2010

Many standardized educational tests include groups of items based on a common stimulus, known as "testlets". Standard unidimensional item response theory (IRT) models are commonly used to model examinees' responses to testlet items. However, it is known that local dependence among testlet items can lead to biased item parameter estimates…

Descriptors: English, Language Tests, Reading Tests, Item Response Theory

Measurement of New Attributes for Chile's Admissions System to Higher Education. Research Report. ETS RR-11-18

Santelices, Maria Veronica; Ugarte, Juan Jose; Flotts, Paulina; Radovic, Darinka; Kyllonen, Patrick – Educational Testing Service, 2011

This paper presents the development and initial validation of new measures of critical thinking and noncognitive attributes that were designed to supplement existing standardized tests used in the admissions system for higher education in Chile. The importance of various facets of this process, including the establishment of technical rigor and…

Descriptors: Foreign Countries, College Entrance Examinations, Test Construction, Test Validity

Errors of Measurement, Theory, and Public Policy. William H. Angoff Memorial Lecture Series

Kane, Michael – Educational Testing Service, 2010

The 12th annual William H. Angoff Memorial Lecture was presented by Dr. Michael T. Kane, ETS's (Educational Testing Service) Samuel J. Messick Chair in Test Validity and the former Director of Research at the National Conference of Bar Examiners. Dr. Kane argues that it is important for policymakers to recognize the impact of errors of measurement…

Descriptors: Error of Measurement, Scores, Public Policy, Test Theory

Score Comparability for Language Minority Students on the Content Assessments Used by Two States. Research Report. ETS RR-11-27

Young, John W.; Holtzman, Steven; Steinberg, Jonathan – Educational Testing Service, 2011

In this research investigation of score comparability for language minority students (English language learners [ELLs] and former English language learners), we examined 3 indicators of score comparability (reliability, internal test structure, and differential item functioning) for 4th and 8th grade students who took the NCLB-mandated content…

Descriptors: Language Minorities, Second Language Learning, Grade 8, Minority Group Students

How Do Raters from India Perform in Scoring the TOEFL iBT[TM] Speaking Section and What Kind of Training Helps? TOEFL iBT[TM] Research Report. RR-09-31

Xi, Xiaoming; Mollaun, Pam – Educational Testing Service, 2009

This study investigated the scoring of the Test of English as a Foreign Language[TM] Internet-based Test (TOEFL iBT[TM]) Speaking section by bilingual or multilingual speakers of English and 1 or more Indian languages. We explored the extent to which raters from India, after being trained and certified, were able to score the Speaking section for…

Descriptors: Foreign Countries, English (Second Language), Internet, Language Tests

Descriptor

Reliability	9
Test Reliability	6
Scores	5
Scoring	5
Test Validity	5
Correlation	4
Language Tests	4
Test Theory	4
Accuracy	3
Educational Assessment	3
English (Second Language)	3
Error of Measurement	3
Item Response Theory	3
Reading Tests	3
College Entrance Examinations	2
Data Analysis	2
Educational Testing	2
Equated Scores	2
Factor Analysis	2
Foreign Countries	2
Goodness of Fit	2
Grade 4	2
Grade 8	2
High Stakes Tests	2
Inferences	2

Author

Haberman, Shelby J.	3
Dorans, Neil J.	2
Sinharay, Sandip	2
Ackerman, Debra J.	1
Alexiou, Jon J.	1
Dwyer, Carol A.	1
Flotts, Paulina	1
Haertel, Edward H.	1
Holtzman, Steven	1
Kane, Michael	1
Kyllonen, Patrick	1
Li, Shuhong	1
Li, Yanmei	1
Middleton, Kyndra	1
Millett, Catherine M.	1
Mollaun, Pam	1
O'Reilly, Tenaha	1
Payne, David G.	1
Radovic, Darinka	1
Ricker-Pedley, Kathryn L.	1
Rose, Norman	1
Santelices, Maria Veronica	1
Sheehan, Kathleen M.	1
Steinberg, Jonathan	1
Stickler, Leslie M.	1