ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	11

Descriptor

Error of Measurement	12
Equated Scores	7
Scores	5
Test Items	4
Comparative Analysis	3
Computation	3
Reliability	3
Sampling	3
Scoring	3
Statistical Bias	3
Accuracy	2
Data Analysis	2
High Stakes Tests	2
Item Response Theory	2
Sample Size	2
Statistical Analysis	2
Adaptive Testing	1
Bias	1
Case Studies	1
Certification	1
College Entrance Examinations	1
Computer Assisted Testing	1
Context Effect	1
Differences	1
Difficulty Level	1

Source

Educational Testing Service

Publication Type

Reports - Research	8
Speeches/Meeting Papers	3
Reports - Evaluative	2
Opinion Papers	1
Reports - Descriptive	1

Education Level

Elementary Education	1
Elementary Secondary Education	1
Grade 4	1

Location

United States

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

National Assessment of…	1
SAT (College Admission Test)	1

Showing all 12 results

Reliability and Validity of Inferences about Teachers Based on Student Scores. William H. Angoff Memorial Lecture Series

Haertel, Edward H. – Educational Testing Service, 2013

Policymakers and school administrators have embraced value-added models of teacher effectiveness as tools for educational improvement. Teacher value-added estimates may be viewed as complicated scores of a certain kind. This suggests using a test validation model to examine their reliability and validity. Validation begins with an interpretive…

Descriptors: Reliability, Validity, Inferences, Teacher Effectiveness

Smoothing and Equating Methods Applied to Different Types of Test Score Distributions and Evaluated with Respect to Multiple Equating Criteria. Research Report. ETS RR-11-20

Moses, Tim; Liu, Jinghua – Educational Testing Service, 2011

In equating research and practice, equating functions that are smooth are typically assumed to be more accurate than equating functions with irregularities. This assumption presumes that population test score distributions are relatively smooth. In this study, two examples were used to reconsider common beliefs about smoothing and equating. The…

Descriptors: Equated Scores, Data Analysis, Scores, Methods

Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

Guo, Hongwen; Sinharay, Sandip – Educational Testing Service, 2011

Nonparametric, or kernel, estimation of the item response curve (IRC) is of both theoretical and operational concern. This estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…

Descriptors: Error of Measurement, Nonparametric Statistics, Item Response Theory, Computation

Sources of Score Scale Inconsistency. Research Report. ETS RR-11-10

Haberman, Shelby J.; Dorans, Neil J. – Educational Testing Service, 2011

For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that hardly change over time, and very reliable alternate test forms are built with nearly identical psychometric…

Descriptors: Scores, Reliability, Equated Scores, Test Construction

Can Smoothing Help When Equating with Unrepresentative Small Samples? Research Report. ETS RR-11-09

Puhan, Gautam – Educational Testing Service, 2011

The study evaluated the effectiveness of log-linear presmoothing (Holland & Thayer, 1987) on the accuracy of small sample chained equipercentile equatings under two conditions (i.e., using small samples that differed randomly in ability from the target population versus using small samples that were distinctly different from the…

Descriptors: Equated Scores, Data Analysis, Accuracy, Sample Size

Chained versus Post-Stratification Equating in a Linear Context: An Evaluation Using Empirical Data. Research Report. ETS RR-10-06

Puhan, Gautam – Educational Testing Service, 2010

This study used real data to construct testing conditions for comparing results of chained linear, Tucker, and Levine-observed score equatings. The comparisons were made under conditions where the new- and old-form samples were similar in ability and when they differed in ability. The length of the anchor test was also varied to enable examination…

Descriptors: Equated Scores, Comparative Analysis, Statistical Analysis, Statistical Bias

Single- versus Double-Scoring of Trend Responses in Trend Score Equating with Constructed-Response Tests. Research Report. ETS RR-10-12

Tan, Xuan; Ricker, Kathryn L.; Puhan, Gautam – Educational Testing Service, 2010

This study examines the differences in equating outcomes between two trend score equating designs resulting from two different scoring strategies for trend scoring when operational constructed-response (CR) items are double-scored: the single group (SG) design, where each trend CR item is double-scored, and the nonequivalent groups with anchor…

Descriptors: Equated Scores, Scoring, Responses, Test Items

Limits on the Accuracy of Linking. Research Report. ETS RR-10-22

Haberman, Shelby J. – Educational Testing Service, 2010

Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are especially important in testing programs in which a very large number of forms are employed. Standard inequalities in mathematical statistics may be used to establish lower bounds on the achievable linking accuracy. To illustrate results, a variety of…

Descriptors: Testing Programs, Equated Scores, Sampling, Accuracy

Errors of Measurement, Theory, and Public Policy. William H. Angoff Memorial Lecture Series

Kane, Michael – Educational Testing Service, 2010

The 12th annual William H. Angoff Memorial Lecture was presented by Dr. Michael T. Kane, Samuel J. Messick Chair in Test Validity at Educational Testing Service (ETS) and former Director of Research at the National Conference of Bar Examiners. Dr. Kane argues that it is important for policymakers to recognize the impact of errors of measurement…

Descriptors: Error of Measurement, Scores, Public Policy, Test Theory

Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study. Research Report. ETS RR-10-10

Xu, Xueli; von Davier, Matthias – Educational Testing Service, 2010

One of the major objectives of large-scale educational surveys is reporting trends in academic achievement. For this purpose, a substantial number of items are carried from one assessment cycle to the next. The linking process that places academic abilities measured in different assessments on a common scale is usually based on a concurrent…

Descriptors: Case Studies, Trend Analysis, Computation, Educational Assessment

The Effects of Different Types of Anchor Tests on Observed Score Equating. Research Report. ETS RR-09-41

Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward – Educational Testing Service, 2009

This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed score equating was evaluated and compared with respect to systematic…

Descriptors: Equated Scores, Test Items, Difficulty Level, Error of Measurement

Tolerable Variation in Item Parameter Estimates for Linear and Adaptive Computer-Based Testing. Research Report No. 04-28

Rizavi, Saba; Way, Walter D.; Davey, Tim; Herbert, Erin – Educational Testing Service, 2004

Item parameter estimates vary for a variety of reasons, including estimation error, characteristics of the examinee samples, and context effects (e.g., item location effects, section location effects, etc.). Although we expect variation based on theory, there is reason to believe that observed variation in item parameter estimates exceeds what…

Descriptors: Adaptive Testing, Test Items, Computation, Context Effect

Author

Puhan, Gautam	3
Haberman, Shelby J.	2
Liu, Jinghua	2
Sinharay, Sandip	2
Curley, Edward	1
Davey, Tim	1
Dorans, Neil J.	1
Feigenbaum, Miriam	1
Guo, Hongwen	1
Haertel, Edward H.	1
Herbert, Erin	1
Holland, Paul W.	1
Kane, Michael	1
Moses, Tim	1
Ricker, Kathryn L.	1
Rizavi, Saba	1
Tan, Xuan	1
Way, Walter D.	1
Xu, Xueli	1
von Davier, Matthias	1