Showing 1 to 15 of 26 results
Peer reviewed
Moses, Tim – Educational Measurement: Issues and Practice, 2014
This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…
Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis
Puhan, Gautam; Liang, Longjuan – Educational Testing Service, 2011
Because the demand for subscores is ever increasing, this study examined two different approaches for equating subscores: (a) equating a subscore on the new form to the same subscore in the old form using internal common items as the anchor to conduct the equating, and (b) equating a subscore on the new form to the same subscore in the old form…
Descriptors: Equated Scores, Scaling, Raw Scores, Methods
Peer reviewed
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores
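As a rough illustration of the kind of common-item screening this study motivates, the sketch below (Python/NumPy, with hypothetical parameter estimates; a generic residual check, not the regression procedure developed by He, Cui, Fang, and Chen) flags anchor items whose difficulty estimates disagree across the old and new forms:

# Flag common (anchor) items whose IRT difficulty (b) estimates are
# inconsistent between forms, using standardized residuals from a
# least-squares line. All values are hypothetical.
import numpy as np

b_old = np.array([-1.2, -0.6, -0.1, 0.3, 0.8, 1.4, 2.0])  # old-form estimates
b_new = np.array([-1.1, -0.5, 0.0, 0.4, 2.1, 1.5, 2.1])   # new-form estimates

slope, intercept = np.polyfit(b_old, b_new, 1)             # fit b_new on b_old
residuals = b_new - (slope * b_old + intercept)
z = residuals / residuals.std(ddof=1)

# Items with |z| above a chosen cutoff (2.0 here) are candidates for
# removal from the common-item set before equating is carried out.
flagged = np.flatnonzero(np.abs(z) > 2.0)
print("flagged common items:", flagged)

Removing items this way changes the anchor set, so in practice the equating would be rerun and the impact on the equated scores examined.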
Peer reviewed
von Davier, Alina A. – ETS Research Report Series, 2012
Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling
Peer reviewed
Guo, Hongwen; Liu, Jinghua; Curley, Edward; Dorans, Neil – ETS Research Report Series, 2012
This study examines the stability of the "SAT Reasoning Test"™ score scales from 2005 to 2010. A 2005 old form (OF) was administered along with a 2010 new form (NF). A new conversion for OF was derived through direct equipercentile equating. A comparison of the newly derived and the original OF conversions showed that Critical Reading…
Descriptors: Aptitude Tests, Cognitive Tests, Thinking Skills, Equated Scores
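For readers unfamiliar with the mechanics, the sketch below shows the core idea behind direct equipercentile equating as used in studies of this kind: each new-form score is mapped to the old-form score holding the same percentile rank. The score frequencies and the simple linear interpolation are illustrative assumptions, not the operational SAT procedure (which involves continuization and smoothing).

# Minimal equipercentile-equating sketch on hypothetical raw-score
# frequency distributions for a new form (X) and an old form (Y).
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each score point, midpoint convention."""
    freqs = np.asarray(freqs, dtype=float)
    cum = np.cumsum(freqs)
    return 100.0 * (cum - 0.5 * freqs) / freqs.sum()

freq_x = np.array([2, 5, 9, 14, 10, 6, 4])   # new-form frequencies, scores 0-6
freq_y = np.array([1, 4, 8, 15, 12, 7, 3])   # old-form frequencies, scores 0-6

pr_x = percentile_ranks(freq_x)
pr_y = percentile_ranks(freq_y)

# Equate each new-form score to the old-form score with the same
# percentile rank, interpolating between adjacent old-form score points.
equated = np.interp(pr_x, pr_y, np.arange(len(freq_y)))
print(np.round(equated, 2))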
Peer reviewed
Liu, Jinghua; Curley, Edward; Low, Albert – ETS Research Report Series, 2009
This study examines the stability of the SAT® scale from 1994 to 2001. A 1994 form and a 2001 form were readministered in a 2005 SAT administration, and the 1994 form was equated to the 2001 form. The new conversion was compared to the old conversion. Both the verbal and math sections exhibit a similar degree of scale drift, but in opposite…
Descriptors: College Entrance Examinations, Scaling, Verbal Tests, Mathematics Tests
Powers, Sonya; Turhan, Ahmet; Binici, Salih – Pearson, 2012
The population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3-10 and a state mathematics test spanning grades 3-8. Subpopulations considered included males and females. The 3-parameter logistic model was used to calibrate math and reading items and a common item design was used to construct…
Descriptors: Scaling, Equated Scores, Standardized Tests, Reading Tests
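The 3-parameter logistic (3PL) model mentioned here is the standard IRT form; it is stated below for reference in conventional notation (not reproduced from the report):

P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-D a_i(\theta - b_i)]}

where a_i is the item discrimination, b_i the difficulty, c_i the lower asymptote (pseudo-guessing) parameter, and D a scaling constant (commonly 1.7 or 1).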
Peer reviewed
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
Peer reviewed
Haberman, Shelby J.; Guo, Hongwen; Liu, Jinghua; Dorans, Neil J. – ETS Research Report Series, 2008
This study uses historical data to explore the consistency of SAT® I: Reasoning Test score conversions and to examine trends in scaled score means. During the period from April 1995 to December 2003, both Verbal (V) and Math (M) means display substantial seasonality, and a slight increasing trend for both is observed. SAT Math means increase more…
Descriptors: College Entrance Examinations, Thinking Skills, Logical Thinking, Scaling
Peer reviewed
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
Feuer, Michael J., Ed.; Holland, Paul W., Ed.; Green, Bert F., Ed.; Bertenthal, Meryl W., Ed.; Hemphill, F. Cadelle, Ed. – 1999
A study was conducted of the feasibility of establishing an equivalency scale that would enable commercial and state tests to be linked to one another and to the National Assessment of Educational Progress (NAEP). In evaluating the feasibility of linkages, the study committee focused on the linkage of various fourth-grade reading tests and the linkage…
Descriptors: Achievement Tests, Comparative Analysis, Elementary Secondary Education, Equated Scores
Peer reviewed
Petersen, Nancy S.; And Others – Journal of Educational Statistics, 1983
Three methods of test equating (linear, equipercentile, and item response theory) were investigated with respect to the issue of scale drift. Results indicate that all three models work well in limited settings but that the item response theory approach provided the most stable results overall. (JKS)
Descriptors: College Entrance Examinations, Comparative Analysis, Equated Scores, Item Analysis
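For context, the linear model in comparisons of this kind equates scores by matching means and standard deviations across forms; in the standard textbook formulation (not reproduced from the article),

l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),

so a new-form score x is mapped to the old-form score lying the same number of standard deviations from its mean, whereas the equipercentile and IRT methods allow nonlinear conversions.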
Morrison, Carol A.; Fitzpatrick, Steven J. – 1992
An attempt was made to determine which item response theory (IRT) equating method results in the least amount of equating error or "scale drift" when equating scores across one or more test forms. An internal anchor test design was employed with five different test forms, each consisting of 30 items, 10 in common with the base test and 5…
Descriptors: Comparative Analysis, Computer Simulation, Equated Scores, Error of Measurement