Showing 1 to 15 of 26 results
Peer reviewed
Moses, Tim – Educational Measurement: Issues and Practice, 2014
This module describes and extends X-to-Y regression measures that have been proposed for use in the assessment of X-to-Y scaling and equating results. Measures are developed that are similar to those based on prediction error in regression analyses but that are directly suited to interests in scaling and equating evaluations. The regression and…
Descriptors: Scaling, Regression (Statistics), Equated Scores, Comparative Analysis
Puhan, Gautam; Liang, Longjuan – Educational Testing Service, 2011
Because the demand for subscores is ever increasing, this study examined two different approaches for equating subscores: (a) equating a subscore on the new form to the same subscore in the old form using internal common items as the anchor to conduct the equating, and (b) equating a subscore on the new form to the same subscore in the old form…
Descriptors: Equated Scores, Scaling, Raw Scores, Methods
Peer reviewed
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei – Applied Psychological Measurement, 2013
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Descriptors: Regression (Statistics), Item Response Theory, Test Items, Equated Scores
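As a rough illustration of the kind of common-item screening this study motivates, the sketch below (Python/NumPy, with hypothetical parameter estimates; a generic residual check, not the regression procedure developed by He, Cui, Fang, and Chen) flags anchor items whose difficulty estimates disagree across the old and new forms:

# Flag common (anchor) items whose IRT difficulty (b) estimates are
# inconsistent between forms, using standardized residuals from a
# least-squares line. All values are hypothetical.
import numpy as np

b_old = np.array([-1.2, -0.6, -0.1, 0.3, 0.8, 1.4, 2.0])  # old-form estimates
b_new = np.array([-1.1, -0.5, 0.0, 0.4, 2.1, 1.5, 2.1])   # new-form estimates

slope, intercept = np.polyfit(b_old, b_new, 1)             # fit b_new on b_old
residuals = b_new - (slope * b_old + intercept)
z = residuals / residuals.std(ddof=1)

# Items with |z| above a chosen cutoff (2.0 here) are candidates for
# removal from the common-item set before equating is carried out.
flagged = np.flatnonzero(np.abs(z) > 2.0)
print("flagged common items:", flagged)

Removing items this way changes the anchor set, so in practice the equating would be rerun and the impact on the equated scores examined.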
Peer reviewed
von Davier, Alina A. – ETS Research Report Series, 2012
Maintaining comparability of test scores is a major challenge faced by testing programs that have almost continuous administrations. Among the potential problems are scale drift and rapid accumulation of errors. Many standard quality control techniques for testing programs, which can effectively detect and address scale drift for small numbers of…
Descriptors: Quality Control, Data Analysis, Trend Analysis, Scaling
Peer reviewed
Guo, Hongwen; Liu, Jinghua; Curley, Edward; Dorans, Neil – ETS Research Report Series, 2012
This study examines the stability of the "SAT Reasoning Test"™ score scales from 2005 to 2010. A 2005 old form (OF) was administered along with a 2010 new form (NF). A new conversion for OF was derived through direct equipercentile equating. A comparison of the newly derived and the original OF conversions showed that Critical Reading…
Descriptors: Aptitude Tests, Cognitive Tests, Thinking Skills, Equated Scores
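For readers unfamiliar with the mechanics, the sketch below shows the core idea behind direct equipercentile equating as used in studies of this kind: each new-form score is mapped to the old-form score holding the same percentile rank. The score frequencies and the simple linear interpolation are illustrative assumptions, not the operational SAT procedure (which involves continuization and smoothing).

# Minimal equipercentile-equating sketch on hypothetical raw-score
# frequency distributions for a new form (X) and an old form (Y).
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each score point, midpoint convention."""
    freqs = np.asarray(freqs, dtype=float)
    cum = np.cumsum(freqs)
    return 100.0 * (cum - 0.5 * freqs) / freqs.sum()

freq_x = np.array([2, 5, 9, 14, 10, 6, 4])   # new-form frequencies, scores 0-6
freq_y = np.array([1, 4, 8, 15, 12, 7, 3])   # old-form frequencies, scores 0-6

pr_x = percentile_ranks(freq_x)
pr_y = percentile_ranks(freq_y)

# Equate each new-form score to the old-form score with the same
# percentile rank, interpolating between adjacent old-form score points.
equated = np.interp(pr_x, pr_y, np.arange(len(freq_y)))
print(np.round(equated, 2))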
Peer reviewed
Liu, Jinghua; Curley, Edward; Low, Albert – ETS Research Report Series, 2009
This study examines the stability of the SAT® scale from 1994 to 2001. A 1994 form and a 2001 form were readministered in a 2005 SAT administration, and the 1994 form was equated to the 2001 form. The new conversion was compared to the old conversion. Both the verbal and math sections exhibit a similar degree of scale drift, but in opposite…
Descriptors: College Entrance Examinations, Scaling, Verbal Tests, Mathematics Tests
Powers, Sonya; Turhan, Ahmet; Binici, Salih – Pearson, 2012
The population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3-10 and a state mathematics test spanning grades 3-8. Subpopulations considered included males and females. The 3-parameter logistic model was used to calibrate math and reading items and a common item design was used to construct…
Descriptors: Scaling, Equated Scores, Standardized Tests, Reading Tests
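The 3-parameter logistic (3PL) model mentioned here is the standard IRT form; it is stated below for reference in conventional notation (not reproduced from the report):

P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-D a_i(\theta - b_i)]}

where a_i is the item discrimination, b_i the difficulty, c_i the lower asymptote (pseudo-guessing) parameter, and D a scaling constant (commonly 1.7 or 1).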
Peer reviewed
Cresswell, Mike – Measurement: Interdisciplinary Research and Perspectives, 2010
Paul Newton (2010), with his characteristic concern about theory, has set out two different ways of thinking about the basis upon which equivalences of one sort or another are established between test score scales. His reason for doing this is a desire to establish "the defensibility of linkages lower on the continuum than concordance."…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed
Newton, Paul E. – Measurement: Interdisciplinary Research and Perspectives, 2010
This article presents the author's rejoinder to thinking about linking from issue 8(1). Particularly within the more embracing linking frameworks, e.g., Holland & Dorans (2006) and Holland (2007), there appears to be a major disjunction between (1) classification discourse: the supposed basis for classification, that is, the underlying theory…
Descriptors: Foreign Countries, Measurement Techniques, Psychometrics, Comparative Analysis
Peer reviewed
Walker, Michael E. – Measurement: Interdisciplinary Research and Perspectives, 2010
"Linking" is a term given to a general class of procedures by which one represents scores X on one test or measure in terms of scores Y on another test or measure. A recent taxonomy by Holland and Dorans (2006; Holland, 2007) organizes the various types of links into three broad categories: prediction, scale aligning, and equating. In…
Descriptors: Foreign Countries, Test Construction, Test Validity, Measurement Techniques
Peer reviewed
Haberman, Shelby J.; Guo, Hongwen; Liu, Jinghua; Dorans, Neil J. – ETS Research Report Series, 2008
This study uses historical data to explore the consistency of SAT® I: Reasoning Test score conversions and to examine trends in scaled score means. During the period from April 1995 to December 2003, both Verbal (V) and Math (M) means display substantial seasonality, and a slight increasing trend for both is observed. SAT Math means increase more…
Descriptors: College Entrance Examinations, Thinking Skills, Logical Thinking, Scaling
Peer reviewed
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
Feuer, Michael J., Ed.; Holland, Paul W., Ed.; Green, Bert F., Ed.; Bertenthal, Meryl W., Ed.; Hemphill, F. Cadelle, Ed. – 1999
A study was conducted of the feasibility of establishing an equivalency scale that would enable commercial and state tests to be linked to one another and to the National Assessment of Educational Progress (NAEP). In evaluating the feasibility of linkages, the study committee focused on the linkage of various fourth-grade reading tests and the linkage…
Descriptors: Achievement Tests, Comparative Analysis, Elementary Secondary Education, Equated Scores
Peer reviewed
Petersen, Nancy S.; And Others – Journal of Educational Statistics, 1983
Three methods of test equating (linear, equipercentile, and item response theory) were investigated with respect to the issue of scale drift. Results indicate that all three models work well in limited settings but that the item response theory approach provided the most stable results overall. (JKS)
Descriptors: College Entrance Examinations, Comparative Analysis, Equated Scores, Item Analysis
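For context, the linear model in comparisons of this kind equates scores by matching means and standard deviations across forms; in the standard textbook formulation (not reproduced from the article),

l_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),

so a new-form score x is mapped to the old-form score lying the same number of standard deviations from its mean, whereas the equipercentile and IRT methods allow nonlinear conversions.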
Morrison, Carol A.; Fitzpatrick, Steven J. – 1992
An attempt was made to determine which item response theory (IRT) equating method results in the least amount of equating error or "scale drift" when equating scores across one or more test forms. An internal anchor test design was employed with five different test forms, each consisting of 30 items, 10 in common with the base test and 5…
Descriptors: Comparative Analysis, Computer Simulation, Equated Scores, Error of Measurement