Search results: 9 reports, all from the ETS Research Report Series.
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) procedure and the standardized proportion-correct metric; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
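The MH procedure named in this abstract pools 2×2 tables (group × item correctness) across total-score strata into a common odds ratio, often reported on the ETS delta scale. A minimal sketch with invented counts; the function name and data are illustrative, not from the report:

```python
import numpy as np

def mantel_haenszel_dif(correct_ref, n_ref, correct_foc, n_foc):
    """Common odds ratio (alpha_MH) pooled over total-score strata.

    Each argument is an array indexed by stratum: counts of correct
    responses and group sizes for the reference and focal groups.
    """
    wrong_ref = n_ref - correct_ref
    wrong_foc = n_foc - correct_foc
    n_total = n_ref + n_foc
    # Mantel-Haenszel common odds ratio across strata
    numer = np.sum(correct_ref * wrong_foc / n_total)
    denom = np.sum(wrong_ref * correct_foc / n_total)
    alpha_mh = numer / denom
    # ETS delta metric: MH D-DIF = -2.35 * ln(alpha_MH)
    delta_mh = -2.35 * np.log(alpha_mh)
    return alpha_mh, delta_mh

# Toy example: 5 score strata, item slightly harder for the focal group
correct_ref = np.array([10, 25, 40, 55, 60])
n_ref       = np.array([30, 50, 60, 70, 65])
correct_foc = np.array([ 7, 20, 33, 48, 55])
n_foc       = np.array([30, 50, 60, 70, 65])
alpha, delta = mantel_haenszel_dif(correct_ref, n_ref, correct_foc, n_foc)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {delta:.3f}")
```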
Manna, Venessa F.; Gu, Lixiong – ETS Research Report Series, 2019
When using the Rasch model, equating with a nonequivalent groups anchor test design is commonly achieved by adjustment of new form item difficulty using an additive equating constant. Using simulated 5-year data, this report compares 4 approaches to calculating the equating constants and the subsequent impact on equating results. The 4 approaches…
Descriptors: Item Response Theory, Test Items, Test Construction, Sample Size
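One common way to obtain such an additive constant is the mean difference of anchor item difficulties between the two calibrations; the truncated abstract does not list the four approaches compared, so this is only a plausible variant. A hedged sketch with made-up difficulty values:

```python
import numpy as np

# Rasch difficulties of the common anchor items, as estimated in the
# reference (old) form calibration and the new form calibration.
# Values are illustrative, not from the report.
b_anchor_old = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_anchor_new = np.array([-1.0, -0.3, 0.3, 0.9, 1.6])

# Mean-difference equating constant: the shift that places the
# new-form calibration on the old form's scale.
constant = b_anchor_old.mean() - b_anchor_new.mean()

# Apply the additive constant to all new-form item difficulties.
b_new_form = np.array([-0.7, 0.0, 0.5, 1.1])
b_new_on_old_scale = b_new_form + constant
print(f"equating constant = {constant:.3f}")
print(b_new_on_old_scale)
```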
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high a items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
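The STR strategy referenced here sorts the bank by discrimination a, splits it into strata, and works through the strata from low to high a, each time selecting the unused item whose difficulty b is closest to the current ability estimate. A toy sketch; bank size, strata count, and stage length are invented, and the ability update is elided:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 2PL item bank: discrimination a and difficulty b.
n_items = 120
a = rng.uniform(0.5, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

# STR: sort items by ascending a, split into 4 equal strata,
# and administer the strata in order from low a to high a.
strata = np.array_split(np.argsort(a), 4)

def select_item(stratum, theta_hat, administered):
    """Within the active stratum, pick the unused item whose
    difficulty is closest to the current ability estimate."""
    pool = [i for i in stratum if i not in administered]
    return min(pool, key=lambda i: abs(b[i] - theta_hat))

theta_hat, administered = 0.0, set()
for stratum in strata:
    for _ in range(5):  # 5 items per stratum -> a 20-item CAT
        item = select_item(stratum, theta_hat, administered)
        administered.add(item)
        # (scoring the response and updating theta_hat is elided)
print(f"administered {len(administered)} items")
```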
Fu, Jianbin; Feng, Yuling – ETS Research Report Series, 2018
In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the…
Descriptors: Comparative Analysis, Scores, Correlation, Standardized Tests
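Of the six methods, M1 (the average of standardized test raw scores) is the simplest to illustrate: z-standardize each test's raw scores, then average across tests for each examinee. A sketch with invented scores and shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented raw scores: 200 examinees x 3 tests with different scales.
raw = rng.normal(loc=[50, 30, 70], scale=[10, 5, 12], size=(200, 3))

# M1: z-standardize each test column, then average across tests.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
m1 = z.mean(axis=1)
print(m1[:5])
```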
Li, Feifei – ETS Research Report Series, 2017
An information-correction method for testlet-based tests is introduced. This method takes advantage of both generalizability theory (GT) and item response theory (IRT). The measurement error for the examinee proficiency parameter is often underestimated when a unidimensional conditional-independence IRT model is specified for a testlet dataset. By…
Descriptors: Item Response Theory, Generalizability Theory, Tests, Error of Measurement
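For context on the underestimation described here: the conventional unidimensional IRT standard error is the inverse square root of the summed item information, and a testlet structure reduces the effective information, so that conventional value is too small. A 2PL sketch of the uncorrected baseline only; the GT-based correction itself is not shown, and the parameters are invented:

```python
import numpy as np

def se_2pl(theta, a, b):
    """Conventional 2PL standard error of theta. Item information is
    a^2 * P * (1 - P); with testlet dependence the effective
    information is lower, so this SE is an underestimate."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    info = np.sum(a**2 * p * (1.0 - p))
    return 1.0 / np.sqrt(info)

a = np.array([1.0, 1.2, 0.8, 1.5, 0.9])
b = np.array([-0.5, 0.0, 0.3, 0.8, -1.0])
print(f"SE at theta=0: {se_2pl(0.0, a, b):.3f}")
```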
Lu, Ying – ETS Research Report Series, 2017
For standards- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of summarizing performance, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…
Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment
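PAC itself is a simple quantity; the report's interest is in how it behaves across administrations. A minimal sketch with invented scores:

```python
import numpy as np

def pac(scores, cut):
    """Percentage of examinees at or above the cut score."""
    scores = np.asarray(scores)
    return 100.0 * np.mean(scores >= cut)

scores = np.array([12, 18, 22, 25, 9, 30, 27, 15])
print(f"PAC at cut=20: {pac(scores, 20):.1f}%")
```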
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Descriptors: Test Bias, Item Response Theory, Test Items, Scores
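A DIF condition like those manipulated in this simulation can be created by shifting item difficulty for the focal group only and comparing total scores between otherwise-equivalent groups. A toy Rasch sketch; the test length, DIF percentage, and shift size are invented, not the report's conditions:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_totals(theta, b):
    """Simulate Rasch responses and return each examinee's total score."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.uniform(size=p.shape) < p).sum(axis=1)

n_per_group = 2000
b = rng.normal(0.0, 1.0, 40)               # a 40-item test
theta = rng.normal(0.0, 1.0, n_per_group)  # groups of equal true ability

b_dif = b.copy()
b_dif[:4] += 0.5  # 10% of items made 0.5 logits harder for the focal group

ref_totals = simulate_totals(theta, b)      # reference group, no DIF
foc_totals = simulate_totals(theta, b_dif)  # focal group, 4 DIF items
print(f"mean total: reference {ref_totals.mean():.2f}, "
      f"focal {foc_totals.mean():.2f}")
```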
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2008
The method of maximum likelihood is typically applied to item response theory (IRT) models when the ability parameter is estimated while conditioning on the true item parameters. In practice, the item parameters are unknown and need to be estimated first from a calibration sample. Lewis (1985) and Zhang and Lu (2007) proposed the expected response…
Descriptors: Item Response Theory, Comparative Analysis, Computation, Ability
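The conventional procedure the abstract opens with, maximizing the likelihood over ability while treating the calibrated item parameters as exactly known, can be sketched with a grid search under the 2PL. Item parameters and responses here are invented:

```python
import numpy as np

def mle_theta(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Maximum-likelihood theta under the 2PL, conditioning on the
    item parameters as if they were known without error."""
    # Response probabilities for every grid point x item: (801, n_items)
    p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))
    loglik = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(loglik)]

a = np.array([1.1, 0.9, 1.4, 0.7, 1.0])
b = np.array([-1.0, -0.2, 0.1, 0.6, 1.3])
x = np.array([1, 1, 0, 1, 0])  # one examinee's item responses
print(f"theta_hat = {mle_theta(x, a, b):.2f}")
```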
Ricker, Kathryn L.; von Davier, Alina A. – ETS Research Report Series, 2007
This study explored the effects of external anchor test length on the final equating results of several equating methods, including frequency estimation equipercentile, chained equipercentile, kernel equating (KE) poststratification equating (PSE) with optimal bandwidths, and linear KE PSE (with large bandwidths), when using the nonequivalent groups anchor test…
Descriptors: Equated Scores, Test Items, Statistical Analysis, Test Length
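The equipercentile family of methods studied here maps a score on one form to the score with the same percentile rank on the other form. A bare-bones, unsmoothed sketch that ignores anchors and the poststratification the report examines; the score distributions are invented:

```python
import numpy as np

def equipercentile(x_scores, y_scores, x):
    """Map score x on form X to the form-Y score with the same
    percentile rank (unsmoothed, single-group version)."""
    pr = np.mean(np.asarray(x_scores) <= x)       # percentile rank of x on X
    return np.quantile(np.asarray(y_scores), pr)  # matching quantile on Y

rng = np.random.default_rng(3)
form_x = rng.binomial(40, 0.60, 5000)  # invented score distributions
form_y = rng.binomial(40, 0.55, 5000)
print(f"X score 24 equates to Y score {equipercentile(form_x, form_y, 24):.1f}")
```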