Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement
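The two-stage design described above (one routing module at Stage 1, three modules at Stage 2, three paths) can be illustrated with a minimal routing sketch. The cut scores and module labels below are hypothetical, not taken from the report:

```python
# Hypothetical sketch of routing in a two-stage multistage test (MST):
# one Stage 1 routing module, three Stage 2 modules, three paths.
# Cut scores (8, 14) and module names are illustrative only.

def route_stage2(stage1_correct: int, cut_low: int = 8, cut_high: int = 14) -> str:
    """Route an examinee to a Stage 2 module from the Stage 1 number-correct score."""
    if stage1_correct < cut_low:
        return "easy"
    if stage1_correct < cut_high:
        return "medium"
    return "hard"

# Each examinee follows one of three paths: routing module -> easy/medium/hard.
paths = {score: route_stage2(score) for score in (5, 10, 18)}
print(paths)  # {5: 'easy', 10: 'medium', 18: 'hard'}
```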
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Guo, Hongwen; Liu, Jinghua; Curley, Edward; Dorans, Neil – ETS Research Report Series, 2012
This study examines the stability of the "SAT Reasoning Test"™ score scales from 2005 to 2010. A 2005 old form (OF) was administered along with a 2010 new form (NF). A new conversion for OF was derived through direct equipercentile equating. A comparison of the newly derived and the original OF conversions showed that Critical Reading…
Descriptors: Aptitude Tests, Cognitive Tests, Thinking Skills, Equated Scores
Zu, Jiyun; Liu, Jinghua – ETS Research Report Series, 2009
Equating of tests composed of both discrete and passage-based items using the nonequivalent groups with anchor test (NEAT) design is popular in practice. This study investigated the impact of discrete anchor items and passage-based anchor items on observed score equating via simulation. Results suggested that an anchor with a larger proportion of…
Descriptors: Comparative Analysis, Equated Scores, Test Items, Simulation
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items
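Two of the chained methods compared above have closed forms that are easy to sketch. In a common-item (anchor) design, chained linear equating links form X to the anchor V in the group that took X, then the anchor to form Y in the group that took Y; chained mean equating is the special case that matches means only. A minimal sketch with made-up data (the report's operational forms and resampling scheme are not reproduced here):

```python
import statistics as st

def linear_link(scores_from, scores_to):
    """Linear equating function: map one scale to another by matching
    the mean and standard deviation of the two score distributions."""
    mu_f, sd_f = st.mean(scores_from), st.pstdev(scores_from)
    mu_t, sd_t = st.mean(scores_to), st.pstdev(scores_to)
    return lambda x: mu_t + (sd_t / sd_f) * (x - mu_f)

def chained_linear(x, x_pop1, v_pop1, v_pop2, y_pop2):
    """Chained linear equating through an anchor test V:
    X -> V estimated in population 1, then V -> Y in population 2."""
    x_to_v = linear_link(x_pop1, v_pop1)
    v_to_y = linear_link(v_pop2, y_pop2)
    return v_to_y(x_to_v(x))

# Illustrative data: X on a 0-10 scale, anchor V on 0-4, Y on 0-20.
equated = chained_linear(10, [0, 10], [0, 4], [0, 4], [0, 20])
print(equated)  # 20.0
```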
Li, Deping; Oranje, Andreas; Jiang, Yanlin – ETS Research Report Series, 2007
The hierarchical latent regression model (HLRM) is a flexible framework for estimating group-level proficiency while taking into account the complex sample designs often found in large-scale educational surveys. In a complex assessment design in which information is collected at different levels (such as student, school, and district), the model also…
Descriptors: Hierarchical Linear Modeling, Regression (Statistics), Computation, Comparative Analysis
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2008
The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…
Descriptors: Item Response Theory, Comparative Analysis, Computation, Ability
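The conditioning step described above, estimating ability while treating item parameters as known, can be sketched for the two-parameter logistic (2PL) model. The grid-search maximizer below is for clarity only (operational programs use Newton-type methods), and the items are hypothetical:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response probability: P(correct | theta) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, a_params, b_params, grid=None):
    """Maximum-likelihood ability estimate, conditioning on item parameters
    treated as fixed and known. Grid search over theta for transparency."""
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]  # theta in [-4, 4]
    def loglik(theta):
        ll = 0.0
        for u, a, b in zip(responses, a_params, b_params):
            p = p_2pl(theta, a, b)
            ll += u * math.log(p) + (1 - u) * math.log(1 - p)
        return ll
    return max(grid, key=loglik)
```

When the calibration-sample estimates of `a` and `b` are plugged in as if they were the true values, this estimator ignores item-parameter uncertainty, which is the issue the expected response function approach is meant to address.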
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – ETS Research Report Series, 2008
Will reporting subscores provide any information beyond the total score? Is there a method that can be used to provide more trustworthy subscores than observed subscores? These 2 questions are addressed in this study. To answer the 2nd question, 2 subscore estimation methods (i.e., subscore estimated from the observed total score or…
Descriptors: Comparative Analysis, Scores, Tests, Certification
Zhang, Jinming; Lu, Ting – ETS Research Report Series, 2007
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…
Descriptors: Item Response Theory, Ability, Error of Measurement, Maximum Likelihood Statistics
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
von Davier, Alina A.; Holland, Paul W.; Livingston, Samuel A.; Casabianca, Jodi; Grant, Mary C.; Martin, Kathleen – ETS Research Report Series, 2006
This study examines how closely the kernel equating (KE) method (von Davier, Holland, & Thayer, 2004a) approximates the results of other observed-score equating methods--equipercentile and linear equatings. The study used pseudotests constructed of item responses from a real test to simulate three equating designs: an equivalent groups (EG)…
Descriptors: Equated Scores, Statistical Analysis, Simulation, Tests
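The core idea of kernel equating, continuizing the discrete score distributions with a Gaussian kernel and then applying an equipercentile mapping, can be sketched as follows. This is a simplified illustration with a fixed bandwidth; the published KE method (von Davier, Holland, & Thayer, 2004) also selects the bandwidth and rescales the continuization to preserve the score mean and variance:

```python
import math

def gaussian_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_cdf(x, scores, probs, h=0.6):
    """Continuized CDF: each discrete score point is smoothed by a
    Gaussian kernel with bandwidth h (simplified relative to full KE)."""
    return sum(p * gaussian_cdf((x - s) / h) for s, p in zip(scores, probs))

def ke_equate(x, x_scores, x_probs, y_scores, y_probs, h=0.6):
    """Equipercentile mapping through the continuized CDFs:
    find y such that G(y) = F(x), by bisection on the monotone CDF."""
    target = kernel_cdf(x, x_scores, x_probs, h)
    lo, hi = min(y_scores) - 5.0, max(y_scores) + 5.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kernel_cdf(mid, y_scores, y_probs, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With identical score distributions on both forms, the mapping reduces to the identity, which is the sanity check typically run before comparing KE against equipercentile and linear equating results.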
Braun, Henry; Qian, Jiahe – ETS Research Report Series, 2008
This report describes the derivation and evaluation of a method for comparing the performance standards for public school students set by different states. It is based on an approach proposed by McLaughlin and associates, which constituted an innovative attempt to resolve the confusion and concern that arise when very different proportions of…
Descriptors: State Standards, Comparative Analysis, Public Schools, National Competency Tests
Liu, Jinghua; Low, Albert C. – ETS Research Report Series, 2007
This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous classical equating methods in both scenarios. The results indicate that KE results are…
Descriptors: College Entrance Examinations, Equated Scores, Comparative Analysis, Evaluation Methods
Li, Deping; Oranje, Andreas – ETS Research Report Series, 2007
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Descriptors: Error of Measurement, Regression (Statistics), Trend Analysis, National Competency Tests
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models