ERIC - Search Results

Showing all 10 results

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Wang, Shaojie; Lee, Won-Chan; Zhang, Minqiang; Yuan, Lixin – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Two IRT Characteristic Curve Linking Methods Weighted by Information

Peer reviewed

Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022

Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…

Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods

A Regression Discontinuity Design Framework for Controlling Selection Bias in Evaluations of Differential Item Functioning

Peer reviewed

Koziol, Natalie A.; Goodrich, J. Marc; Yoon, HyeonJin – Educational and Psychological Measurement, 2022

Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A…

Descriptors: Regression (Statistics), Item Analysis, Validity, Testing Accommodations

Robustness of Weighted Differential Item Functioning (DIF) Analysis: The Case of Mantel-Haenszel DIF Statistics. Research Report. ETS RR-21-12

Peer reviewed

Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021

Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…

Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis

A Modified "a"-Stratified Method for Computerized Adaptive Testing. Research Report. ETS RR-19-10

Peer reviewed

Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019

Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high-"a" items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items

Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

Peer reviewed

Yao, Lihua – Psychometrika, 2012

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…

Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing

Variable-Length Computerized Adaptive Testing: Adaptation of the A-Stratified Strategy in Item Selection with Content Balancing

Huo, Yan – ProQuest LLC, 2009

Variable-length computerized adaptive testing (CAT) can provide examinees with tailored test lengths. With the fixed standard error of measurement (SEM) termination rule, variable-length CAT can achieve predetermined measurement precision by using relatively shorter tests compared to fixed-length CAT. To explore the application of…

Descriptors: Test Length, Test Items, Adaptive Testing, Item Analysis

On the Consistency of Individual Classification Using Short Scales

Peer reviewed

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…

Descriptors: Psychiatry, Patients, Error of Measurement, Test Length

The MIMIC Model as a Method for Detecting DIF: Comparison With Mantel-Haenszel, SIBTEST, and the IRT Likelihood Ratio

Peer reviewed

Finch, Holmes – Applied Psychological Measurement, 2005

This study compares the ability of the multiple indicators, multiple causes (MIMIC) confirmatory factor analysis model to correctly identify cases of differential item functioning (DIF) with more established methods. Although the MIMIC model might have application in identifying DIF for multiple grouping variables, there has been little…

Descriptors: Identification, Factor Analysis, Test Bias, Models

An Information Comparison of Conventional and Adaptive Tests in the Measurement of Classroom Achievement. Research Report 77-7.

Bejar, Isaac I.; And Others – 1977

Information provided by typical and improved conventional classroom achievement tests was compared with information provided by an adaptive test covering the same subject matter. Both tests were administered to over 700 college students in a general biology course. Using the same scoring method, adaptive testing was found to yield substantially…

Descriptors: Academic Achievement, Achievement Tests, Adaptive Testing, Biology