Kim, Sooyeon; Moses, Tim; Yoo, Hanwook Henry – ETS Research Report Series, 2015
The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths…
Descriptors: Item Response Theory, Computation, Statistical Bias, Error of Measurement
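The two-stage design described above (one routing module at Stage 1, three modules at Stage 2, three paths) can be illustrated with a minimal routing sketch. The cut scores and module labels below are hypothetical, not taken from the report:

```python
# Hypothetical sketch of routing in a two-stage multistage test (MST):
# one Stage 1 routing module, three Stage 2 modules, three paths.
# Cut scores (8, 14) and module names are illustrative only.

def route_stage2(stage1_correct: int, cut_low: int = 8, cut_high: int = 14) -> str:
    """Route an examinee to a Stage 2 module from the Stage 1 number-correct score."""
    if stage1_correct < cut_low:
        return "easy"
    if stage1_correct < cut_high:
        return "medium"
    return "hard"

# Each examinee follows one of three paths: routing module -> easy/medium/hard.
paths = {score: route_stage2(score) for score in (5, 10, 18)}
print(paths)  # {5: 'easy', 10: 'medium', 18: 'hard'}
```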
Wang, Lin; Qian, Jiahe; Lee, Yi-Hsuan – ETS Research Report Series, 2013
The purpose of this study was to evaluate the combined effects of reduced equating sample size and shortened anchor test length on item response theory (IRT)-based linking and equating results. Data from two independent operational forms of a large-scale testing program were used to establish the baseline results for evaluating the results from…
Descriptors: Test Construction, Item Response Theory, Testing Programs, Simulation
Guo, Hongwen; Liu, Jinghua; Curley, Edward; Dorans, Neil – ETS Research Report Series, 2012
This study examines the stability of the "SAT Reasoning Test"™ score scales from 2005 to 2010. A 2005 old form (OF) was administered along with a 2010 new form (NF). A new conversion for OF was derived through direct equipercentile equating. A comparison of the newly derived and the original OF conversions showed that Critical Reading…
Descriptors: Aptitude Tests, Cognitive Tests, Thinking Skills, Equated Scores
Zu, Jiyun; Liu, Jinghua – ETS Research Report Series, 2009
Equating of tests composed of both discrete and passage-based items using the nonequivalent groups with anchor test (NEAT) design is popular in practice. This study investigated the impact of discrete anchor items and passage-based anchor items on observed score equating via simulation. Results suggested that an anchor with a larger proportion of…
Descriptors: Comparative Analysis, Equated Scores, Test Items, Simulation
Kim, Sooyeon; Livingston, Samuel A. – ETS Research Report Series, 2009
A series of resampling studies was conducted to compare the accuracy of equating in a common item design using four different methods: chained equipercentile equating of smoothed distributions, chained linear equating, chained mean equating, and the circle-arc method. Four operational test forms, each containing more than 100 items, were used for…
Descriptors: Sampling, Sample Size, Accuracy, Test Items
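Two of the chained methods compared above have closed forms that are easy to sketch. In a common-item (anchor) design, chained linear equating links form X to the anchor V in the group that took X, then the anchor to form Y in the group that took Y; chained mean equating is the special case that matches means only. A minimal sketch with made-up data (the report's operational forms and resampling scheme are not reproduced here):

```python
import statistics as st

def linear_link(scores_from, scores_to):
    """Linear equating function: map one scale to another by matching
    the mean and standard deviation of the two score distributions."""
    mu_f, sd_f = st.mean(scores_from), st.pstdev(scores_from)
    mu_t, sd_t = st.mean(scores_to), st.pstdev(scores_to)
    return lambda x: mu_t + (sd_t / sd_f) * (x - mu_f)

def chained_linear(x, x_pop1, v_pop1, v_pop2, y_pop2):
    """Chained linear equating through an anchor test V:
    X -> V estimated in population 1, then V -> Y in population 2."""
    x_to_v = linear_link(x_pop1, v_pop1)
    v_to_y = linear_link(v_pop2, y_pop2)
    return v_to_y(x_to_v(x))

# Illustrative data: X on a 0-10 scale, anchor V on 0-4, Y on 0-20.
equated = chained_linear(10, [0, 10], [0, 4], [0, 4], [0, 20])
print(equated)  # 20.0
```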
Li, Deping; Oranje, Andreas; Jiang, Yanlin – ETS Research Report Series, 2007
The hierarchical latent regression model (HLRM) is a flexible framework for estimating group-level proficiency while taking into account the complex sample designs often found in large-scale educational surveys. In a complex assessment design in which information is collected at different levels (such as student, school, and district), the model also…
Descriptors: Hierarchical Linear Modeling, Regression (Statistics), Computation, Comparative Analysis
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2008
The method of maximum-likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…
Descriptors: Item Response Theory, Comparative Analysis, Computation, Ability
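The conditioning step described above, estimating ability while treating item parameters as known, can be sketched for the two-parameter logistic (2PL) model. The grid-search maximizer below is for clarity only (operational programs use Newton-type methods), and the items are hypothetical:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response probability: P(correct | theta) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, a_params, b_params, grid=None):
    """Maximum-likelihood ability estimate, conditioning on item parameters
    treated as fixed and known. Grid search over theta for transparency."""
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]  # theta in [-4, 4]
    def loglik(theta):
        ll = 0.0
        for u, a, b in zip(responses, a_params, b_params):
            p = p_2pl(theta, a, b)
            ll += u * math.log(p) + (1 - u) * math.log(1 - p)
        return ll
    return max(grid, key=loglik)
```

When the calibration-sample estimates of `a` and `b` are plugged in as if they were the true values, this estimator ignores item-parameter uncertainty, which is the issue the expected response function approach is meant to address.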
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – ETS Research Report Series, 2008
Will reporting subscores provide any information beyond the total score? Is there a method that can be used to provide more trustworthy subscores than observed subscores? These 2 questions are addressed in this study. To answer the 2nd question, 2 subscore estimation methods (i.e., subscore estimated from the observed total score or…
Descriptors: Comparative Analysis, Scores, Tests, Certification
Zhang, Jinming; Lu, Ting – ETS Research Report Series, 2007
In practical applications of item response theory (IRT), item parameters are usually estimated first from a calibration sample. After treating these estimates as fixed and known, ability parameters are then estimated. However, the statistical inferences based on the estimated abilities can be misleading if the uncertainty of the item parameter…
Descriptors: Item Response Theory, Ability, Error of Measurement, Maximum Likelihood Statistics
Moses, Tim; Kim, Sooyeon – ETS Research Report Series, 2007
This study evaluated the impact of unequal reliability on test equating methods in the nonequivalent groups with anchor test (NEAT) design. Classical true score-based models were compared in terms of their assumptions about how reliability impacts test scores. These models were related to treatment of population ability differences by different…
Descriptors: Reliability, Equated Scores, Test Items, Statistical Analysis
von Davier, Alina A.; Holland, Paul W.; Livingston, Samuel A.; Casabianca, Jodi; Grant, Mary C.; Martin, Kathleen – ETS Research Report Series, 2006
This study examines how closely the kernel equating (KE) method (von Davier, Holland, & Thayer, 2004a) approximates the results of other observed-score equating methods--equipercentile and linear equatings. The study used pseudotests constructed of item responses from a real test to simulate three equating designs: an equivalent groups (EG)…
Descriptors: Equated Scores, Statistical Analysis, Simulation, Tests
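The core idea of kernel equating, continuizing the discrete score distributions with a Gaussian kernel and then applying an equipercentile mapping, can be sketched as follows. This is a simplified illustration with a fixed bandwidth; the published KE method (von Davier, Holland, & Thayer, 2004) also selects the bandwidth and rescales the continuization to preserve the score mean and variance:

```python
import math

def gaussian_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_cdf(x, scores, probs, h=0.6):
    """Continuized CDF: each discrete score point is smoothed by a
    Gaussian kernel with bandwidth h (simplified relative to full KE)."""
    return sum(p * gaussian_cdf((x - s) / h) for s, p in zip(scores, probs))

def ke_equate(x, x_scores, x_probs, y_scores, y_probs, h=0.6):
    """Equipercentile mapping through the continuized CDFs:
    find y such that G(y) = F(x), by bisection on the monotone CDF."""
    target = kernel_cdf(x, x_scores, x_probs, h)
    lo, hi = min(y_scores) - 5.0, max(y_scores) + 5.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if kernel_cdf(mid, y_scores, y_probs, h) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With identical score distributions on both forms, the mapping reduces to the identity, which is the sanity check typically run before comparing KE against equipercentile and linear equating results.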
Braun, Henry; Qian, Jiahe – ETS Research Report Series, 2008
This report describes the derivation and evaluation of a method for comparing the performance standards for public school students set by different states. It is based on an approach proposed by McLaughlin and associates, which constituted an innovative attempt to resolve the confusion and concern that arise when very different proportions of…
Descriptors: State Standards, Comparative Analysis, Public Schools, National Competency Tests
Liu, Jinghua; Low, Albert C. – ETS Research Report Series, 2007
This study applied kernel equating (KE) in two scenarios: equating to a very similar population and equating to a very different population, referred to as a distant population, using SAT® data. The KE results were compared to the results obtained from analogous classical equating methods in both scenarios. The results indicate that KE results are…
Descriptors: College Entrance Examinations, Equated Scores, Comparative Analysis, Evaluation Methods
Li, Deping; Oranje, Andreas – ETS Research Report Series, 2007
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Descriptors: Error of Measurement, Regression (Statistics), Trend Analysis, National Competency Tests
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models