ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	4
Since 2017 (last 10 years)	5
Since 2007 (last 20 years)	13

Descriptor

Evaluation Methods	15
Simulation	15
Test Length	15
Item Response Theory	12
Test Items	8
Sample Size	5
Computation	4
Correlation	4
Goodness of Fit	4
Item Analysis	4
Monte Carlo Methods	4
Bias	3
Cutting Scores	3
Error Patterns	3
Evaluation Research	3
Psychometrics	3
Accuracy	2
Classification	2
Computer Assisted Testing	2
Error of Measurement	2
Nonparametric Statistics	2
Psychological Studies	2
Research Methodology	2
Scoring	2
Test Bias	2
More ▼

Source

Educational and Psychological…	3
Applied Psychological…	2
Grantee Submission	2
Journal of Educational…	2
Applied Measurement in…	1
ETS Research Report Series	1
International Journal of…	1
Measurement:…	1
ProQuest LLC	1

Publication Type

Journal Articles	12
Reports - Research	10
Reports - Evaluative	3
Dissertations/Theses -…	1
Numerical/Quantitative Data	1
Reports - Descriptive	1

Education Level

Elementary Secondary Education

Audience

Location

Taiwan

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…

What Works Clearinghouse Rating

Showing all 15 results Save | Export

Assessing Dimensionality of IRT Models Using Traditional and Revised Parallel Analyses

Peer reviewed

Direct link

Guo, Wenjing; Choi, Youn-Jeng – Educational and Psychological Measurement, 2023

Determining the number of dimensions is extremely important in applying item response theory (IRT) models to data. Traditional and revised parallel analyses have been proposed within the factor analysis framework, and both have shown some promise in assessing dimensionality. However, their performance in the IRT framework has not been…

Descriptors: Item Response Theory, Evaluation Methods, Factor Analysis, Guidelines

There Are Many Greater Lower Bounds than Cronbach's [alpha]: A Monte Carlo Simulation Study

Peer reviewed

Direct link

Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023

A Monte Carlo simulation study was conducted to examine the performance of [alpha], [lambda]2, [lambda][subscript 4], [lambda][subscript 2], [omega][subscript T], GLB[subscript MRFA], and GLB[subscript Algebraic] coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…

Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation

A Note on Improving Variational Estimation for Multidimensional Item Response Theory

Peer reviewed

Direct link

Chenchen Ma; Jing Ouyang; Chun Wang; Gongjun Xu – Grantee Submission, 2024

Survey instruments and assessments are frequently used in many domains of social science. When the constructs that these assessments try to measure become multifaceted, multidimensional item response theory (MIRT) provides a unified framework and convenient statistical tool for item analysis, calibration, and scoring. However, the computational…

Descriptors: Algorithms, Item Response Theory, Scoring, Accuracy

Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data

Peer reviewed
PDF on ERIC

Download full text

Direct link

Xue Zhang; Chun Wang – Grantee Submission, 2022

Item-level fit analysis not only serves as a complementary check to global fit analysis, it is also essential in scale development because the fit results will guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing response data may likely happen due to various reasons. Chi-square-based item fit…

Descriptors: Goodness of Fit, Item Response Theory, Scores, Test Length

Variability in Percentage above Cut Scores Due to Discreteness in Score Scale. Research Report. ETS RR-17-32

Peer reviewed
PDF on ERIC

Download full text

Lu, Ying – ETS Research Report Series, 2017

For standard- or criterion-based assessments, the use of cut scores to indicate mastery, nonmastery, or different levels of skill mastery is very common. As part of performance summary, it is of interest to examine the percentage of examinees at or above the cut scores (PAC) and how PAC evolves across administrations. This paper shows that…

Descriptors: Cutting Scores, Evaluation Methods, Mastery Learning, Performance Based Assessment

A Nonparametric Approach to Estimate Classification Accuracy and Consistency

Peer reviewed

Direct link

Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014

When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…

Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics

Adjusting the Adjusted X[superscript 2]/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

Peer reviewed

Direct link

Tay, Louis; Drasgow, Fritz – Educational and Psychological Measurement, 2012

Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X[superscript 2]/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

Descriptors: Test Length, Monte Carlo Methods, Goodness of Fit, Item Response Theory

Polytomous Adaptive Classification Testing: Effects of Item Pool Size, Test Termination Criterion, and Number of Cutscores

Peer reviewed

Direct link

Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011

Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…

Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items

Computerized Adaptive Testing with the Zinnes and Griggs Pairwise Preference Ideal Point Model

Peer reviewed

Direct link

Stark, Stephen; Chernyshenko, Oleksandr S. – International Journal of Testing, 2011

This article delves into a relatively unexplored area of measurement by focusing on adaptive testing with unidimensional pairwise preference items. The use of such tests is becoming more common in applied non-cognitive assessment because research suggests that this format may help to reduce certain types of rater error and response sets commonly…

Descriptors: Test Length, Simulation, Adaptive Testing, Item Analysis

Improving IRT Parameter Estimates with Small Sample Sizes: Evaluating the Efficacy of a New Data Augmentation Technique

Direct link

Foley, Brett Patrick – ProQuest LLC, 2010

The 3PL model is a flexible and widely used tool in assessment. However, it suffers from limitations due to its need for large sample sizes. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using…

Descriptors: Test Length, Sample Size, Simulation, Item Response Theory

Ramsay-Curve Item Response Theory for the Three-Parameter Logistic Item Response Model

Peer reviewed

Direct link

Woods, Carol M. – Applied Psychological Measurement, 2008

In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…

Descriptors: Test Length, Computation, Item Response Theory, Maximum Likelihood Statistics

Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

Peer reviewed

Direct link

Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008

Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Direct link

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

Factors Influencing the Mantel and Generalized Mantel-Haenszel Methods for the Assessment of Differential Item Functioning in Polytomous Items

Peer reviewed

Direct link

Wang, Wen-Chung; Su, Ya-Hui – Applied Psychological Measurement, 2004

Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through…

Descriptors: Test Length, Test Bias, Simulation, Item Response Theory

Test Speededness under Number-Right Scoring: An Analysis of the Test of English as a Foreign Language.

Download full text

Bejar, Isaac I. – 1985

The Test of English as a Foreign Language (TOEFL) was used in this study, which attempted to develop a new methodology for assessing the speededness of right-scored tests. Traditional procedures of assessing speededness have assumed that the test is scored under formula-scoring instructions; this approach is not always appropriate. In this study,…

Descriptors: College Entrance Examinations, English (Second Language), Estimation (Mathematics), Evaluation Methods

Chun Wang	2
Batinic, Bernad	1
Bejar, Isaac I.	1
Bolt, Daniel M.	1
Chenchen Ma	1
Cheng, Ying	1
Chernyshenko, Oleksandr S.	1
Choi, Youn-Jeng	1
Cui, Ying	1
Drasgow, Fritz	1
Foley, Brett Patrick	1
Gnambs, Timo	1
Gongjun Xu	1
Guo, Wenjing	1
Jing Ouyang	1
Lathrop, Quinn N.	1
Leighton, Jacqueline P.	1
Lu, Ying	1
Novak, Josip	1
Rebernjak, Blaž	1
Stark, Stephen	1
Su, Ya-Hui	1
Tay, Louis	1
Wang, Wen-Chung	1
Wells, Craig S.	1
More ▼