ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	7
Since 2006 (last 20 years)	24

Source

ETS Research Report Series

Publication Type

Journal Articles	24
Reports - Research	23
Collected Works - General	1
Information Analyses	1
Reports - Evaluative	1
Speeches/Meeting Papers	1

Education Level

Higher Education	7
Postsecondary Education	7
Secondary Education	4
Elementary Secondary Education	3
Elementary Education	2
Grade 8	2
High Schools	2
Intermediate Grades	2
Junior High Schools	2
Middle Schools	2
Grade 4	1
Grade 6	1
More ▼

Audience

Location

Maryland	1
New Jersey	1
Ohio (Cleveland)	1
Pennsylvania (Philadelphia)	1
Qatar	1
Vermont	1

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

SAT (College Admission Test)	6
ACT Assessment	2
Graduate Record Examinations	2
College Level Examination…	1
National Assessment of…	1
Test of English as a Foreign…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 24 results Save | Export

Practical Considerations in Item Calibration with Small Samples under Multistage Test Design: A Case Study. Research Report. ETS RR-24-03

Peer reviewed
PDF on ERIC

Download full text

Hongwen Guo; Matthew S. Johnson; Daniel F. McCaffrey; Lixong Gu – ETS Research Report Series, 2024

The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs that have small test-taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies…

Descriptors: Test Items, Test Construction, Sample Size, Scaling

Impact of Categorization and Scaling on Classification Agreement and Prediction Accuracy Statistics. Research Report. ETS RR-21-26

Peer reviewed
PDF on ERIC

Download full text

Wang, Wei; Dorans, Neil J. – ETS Research Report Series, 2021

Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are…

Descriptors: Classification, Scaling, Prediction, Accuracy

Providing a Context for Interpreting Predictions of Job Performance. Research Report. ETS RR-18-38

Peer reviewed
PDF on ERIC

Download full text

Dorans, Neil J. – ETS Research Report Series, 2018

A distinction is made between scores as measures of a construct and predictions of a criterion or outcome variable. The interpretation attached to predictions of criteria, such as job performance or college grade point average (GPA), differs from that attached to scores that are measures of a construct, such as reading proficiency or knowledge…

Descriptors: Job Performance, Scores, Data Interpretation, Statistical Distributions

Unidimensional Vertical Scaling in Multidimensional Space. Research Report. ETS RR-17-29

Peer reviewed
PDF on ERIC

Download full text

Carlson, James E. – ETS Research Report Series, 2017

In this paper, I consider a set of test items that are located in a multidimensional space, S[subscript M], but are located along a curved line in S[subscript M] and can be scaled unidimensionally. Furthermore, I am demonstrating a case in which the test items are administered across 6 levels, such as occurs in K-12 assessment across 6 grade…

Descriptors: Test Items, Item Response Theory, Difficulty Level, Scoring

Grouping Effects on Jackknifed Variance Estimation for Item Response Theory Scaling and Equating with Cluster-Based Assessment Data. Research Report. ETS RR-18-16

Peer reviewed
PDF on ERIC

Download full text

Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2018

Educational assessment data are often collected from a set of test centers across various geographic regions, and therefore the data samples contain clusters. Such cluster-based data may result in clustering effects in variance estimation. However, in many grouped jackknife variance estimation applications, jackknife groups are often formed by a…

Descriptors: Item Response Theory, Scaling, Equated Scores, Cluster Grouping

An Evaluation of the Single-Group Growth Model as an Alternative to Common-Item Equating. Research Report. ETS RR-16-01

Peer reviewed
PDF on ERIC

Download full text

Wei, Youhua; Morgan, Rick – ETS Research Report Series, 2016

As an alternative to common-item equating when common items do not function as expected, the single-group growth model (SGGM) scaling uses common examinees or repeaters to link test scores on different forms. The SGGM scaling assumes that, for repeaters taking adjacent administrations, the conditional distribution of scale scores in later…

Descriptors: Equated Scores, Growth Models, Scaling, Computation

Long-Term Impact of Valid Case Criterion on Capturing Population-Level Growth under Item Response Theory Equating. Research Report. ETS RR-17-17

Peer reviewed
PDF on ERIC

Download full text

Deng, Weiling; Monfils, Lora – ETS Research Report Series, 2017

Using simulated data, this study examined the impact of different levels of stringency of the valid case inclusion criterion on item response theory (IRT)-based true score equating over 5 years in the context of K-12 assessment when growth in student achievement is expected. Findings indicate that the use of the most stringent inclusion criterion…

Descriptors: Item Response Theory, Equated Scores, True Scores, Educational Assessment

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed
PDF on ERIC

Download full text

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for themost part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

A Criterion to Evaluate the Individual Raw-to-Scale Equating Conversions. Research Report. ETS RR-13-05

Peer reviewed
PDF on ERIC

Download full text

Guo, Hongwen; Puhan, Gautam; Walker, Michael – ETS Research Report Series, 2013

In this study we investigated when an equating conversion line is problematic in terms of gaps and clumps. We suggest using the conditional standard error of measurement (CSEM) to measure the scale scores that are inappropriate in the overall raw-to-scale transformation.

Descriptors: Equated Scores, Test Items, Evaluation Criteria, Error of Measurement

Statistical Methods for Assessments in Simulations and Serious Games. Research Report. ETS RR-14-12

Peer reviewed
PDF on ERIC

Download full text

Fu, Jianbin; Zapata, Diego; Mavronikolas, Elia – ETS Research Report Series, 2014

Simulation or game-based assessments produce outcome data and process data. In this article, some statistical models that can potentially be used to analyze data from simulation or game-based assessments are introduced. Specifically, cognitive diagnostic models that can be used to estimate latent skills from outcome data so as to scale these…

Descriptors: Simulation, Evaluation Methods, Games, Data Collection

What Can Repeated Cross-Sectional Studies Tell Us about Student Growth? Research Report. ETS RR-12-17

Peer reviewed
PDF on ERIC

Download full text

Almond, Russell G.; Sinharay, Sandip – ETS Research Report Series, 2012

To answer questions about how students' proficiencies are changing over time, educational researchers are looking for data sources that span many years. Clearly, for answering questions about student growth, a longitudinal study--in which a single sample is followed over many years--is preferable to repeated cross-sectional samples--in which a…

Descriptors: Educational Research, Case Studies, Research Methodology, Literature Reviews

The Use of Quality Control and Data Mining Techniques for Monitoring Scaled Scores: An Overview. Research Report. ETS RR-12-20

Peer reviewed
PDF on ERIC

Download full text

von Davier, Alina A. – ETS Research Report Series, 2012

Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…

Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling

The Stability of the Score Scales for the "SAT Reasoning Test"™ from 2005 to 2010. Research Report. ETS RR-12-15

Peer reviewed
PDF on ERIC

Download full text

Guo, Hongwen; Liu, Jinghua; Curley, Edward; Dorans, Neil – ETS Research Report Series, 2012

This study examines the stability of the "SAT Reasoning Test"™ score scales from 2005 to 2010. A 2005 old form (OF) was administered along with a 2010 new form (NF). A new conversion for OF was derived through direct equipercentile equating. A comparison of the newly derived and the original OF conversions showed that Critical Reading…

Descriptors: Aptitude Tests, Cognitive Tests, Thinking Skills, Equated Scores

Does Preequating Work? An Investigation into a Preequated Testlet-Based College Placement Exam Using Postadministration Data. Research Report. ETS RR-12-12

Peer reviewed
PDF on ERIC

Download full text

Gao, Rui; He, Wei; Ruan, Chunyi – ETS Research Report Series, 2012

In this study, we investigated whether preequating results agree with equating results that are based on observed operational data (postequating) for a college placement program. Specifically, we examined the degree to which item response theory (IRT) true score preequating results agreed with those from IRT true score postequating and from…

Descriptors: College Entrance Examinations, Student Placement, Item Response Theory, True Scores

A Scale Drift Study. Research Report. ETS RR-09-43

Peer reviewed
PDF on ERIC

Download full text

Liu, Jinghua; Curley, Edward; Low, Albert – ETS Research Report Series, 2009

This study examines the stability of the SAT® scale from 1994 to 2001. A 1994 form and a 2001 form were readministered in a 2005 SAT administration, and the 1994 form was equated to the 2001 form. The new conversion was compared to the old conversion. Both the verbal and math sections exhibit a similar degree of scale drift, but in opposite…

Descriptors: College Entrance Examinations, Scaling, Verbal Tests, Mathematics Tests

Previous Page | Next Page »

Pages: 1 | 2

Scaling	24
Equated Scores	12
Item Response Theory	9
Scores	9
Comparative Analysis	8
College Entrance Examinations	5
Correlation	5
Error of Measurement	5
Raw Scores	5
Test Items	5
Accuracy	4
Computer Assisted Testing	4
Educational Assessment	4
Statistical Analysis	4
Classification	3
Computation	3
Data Collection	3
Elementary Secondary Education	3
Evaluation Criteria	3
Growth Models	3
Language Tests	3
Longitudinal Studies	3
Scoring	3
True Scores	3
Academic Achievement	2
More ▼

Dorans, Neil J.	3
Guo, Hongwen	3
Liu, Jinghua	3
Attali, Yigal	2
Curley, Edward	2
Monfils, Lora	2
Puhan, Gautam	2
von Davier, Alina A.	2
Almond, Russell G.	1
Carlson, James E.	1
Carstensen, Claus H.	1
Daniel F. McCaffrey	1
Deng, Weiling	1
Dorans, Neil	1
Fu, Jianbin	1
Gao, Rui	1
Haberman, Shelby J.	1
He, Wei	1
Hongwen Guo	1
Jackson, Carol	1
Jamieson, Joan	1
Kim, Sooyeon	1
Lall, Venessa F.	1
Lee, Yi-Hsuan	1
Lixong Gu	1
More ▼