ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	4
Since 2006 (last 20 years)	10

Descriptor

Scores	13
Simulation	13
Test Length	13
Test Items	9
Item Response Theory	8
Error of Measurement	7
Comparative Analysis	6
Computer Assisted Testing	4
Test Reliability	4
Goodness of Fit	3
Models	3
Test Format	3
Adaptive Testing	2
Bayesian Statistics	2
Difficulty Level	2
Language Tests	2
Nonparametric Statistics	2
Sample Size	2
Test Construction	2
Test Content	2
Ability Identification	1
Achievement Tests	1
Cheating	1
Computation	1
Correlation	1
More ▼

Source

Applied Psychological…	3
ETS Research Report Series	2
ProQuest LLC	2
Education and Information…	1
Grantee Submission	1
International Journal of…	1
Journal of Educational…	1

Publication Type

Journal Articles	9
Reports - Research	7
Reports - Evaluative	3
Dissertations/Theses -…	2
Information Analyses	1
Speeches/Meeting Papers	1

Education Level

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

Armed Forces Qualification…

What Works Clearinghouse Rating

Showing all 13 results Save | Export

Modified Item-Fit Indices for Dichotomous IRT Models with Missing Data

Peer reviewed
PDF on ERIC

Download full text

Direct link

Xue Zhang; Chun Wang – Grantee Submission, 2022

Item-level fit analysis not only serves as a complementary check to global fit analysis, it is also essential in scale development because the fit results will guide item revision and/or deletion (Liu & Maydeu-Olivares, 2014). During data collection, missing response data may likely happen due to various reasons. Chi-square-based item fit…

Descriptors: Goodness of Fit, Item Response Theory, Scores, Test Length

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Direct link

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

A Comparison of Score Aggregation Methods for Unidimensional Tests on Different Dimensions. Research Report. ETS RR-18-01

Peer reviewed
PDF on ERIC

Download full text

Fu, Jianbin; Feng, Yuling – ETS Research Report Series, 2018

In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the…

Descriptors: Comparative Analysis, Scores, Correlation, Standardized Tests

Identifying Aberrant Responding: Use of Multiple Measures

Direct link

Steinkamp, Susan Christa – ProQuest LLC, 2017

For test scores that rely on the accurate estimation of ability via an IRT model, their use and interpretation is dependent upon the assumption that the IRT model fits the data. Examinees who do not put forth full effort in answering test questions, have prior knowledge of test content, or do not approach a test with the intent of answering…

Descriptors: Test Items, Item Response Theory, Scores, Test Wiseness

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Direct link

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Direct link

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

Conditions Affecting the Accuracy of Classical Equating Methods for Small Samples under the NEAT Design: A Simulation Study

Direct link

Sunnassee, Devdass – ProQuest LLC, 2011

Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under various test characteristic and sampling conditions. The equating methods considered are typically…

Descriptors: Test Length, Test Format, Sample Size, Simulation

A Comparison of Item Fit Statistics for Mixed IRT Models

Peer reviewed

Direct link

Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010

In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G[superscript 2], Orlando and Thissen's S-X[superscript 2] and S-G[superscript 2], and Stone's chi[superscript 2*] and G[superscript 2*]. To investigate the…

Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation

Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

Peer reviewed

Direct link

Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…

Descriptors: Test Length, Test Content, Simulation, Computation

Comparison of Multistage Tests with Computerized Adaptive and Paper-and-Pencil Tests. Research Report. ETS RR-07-04

Peer reviewed
PDF on ERIC

Download full text

Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007

Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…

Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models

Influence of Test and Person Characteristics on Nonparametric Appropriateness Measurement.

Peer reviewed

Meijer, Rob R.; And Others – Applied Psychological Measurement, 1994

The power of the nonparametric person-fit statistic, U3, is investigated through simulations as a function of item characteristics, test characteristics, person characteristics, and the group to which examinees belong. Results suggest conditions under which relatively short tests can be used for person-fit analysis. (SLD)

Descriptors: Difficulty Level, Group Membership, Item Response Theory, Nonparametric Statistics

Bias and Information of Bayesian Adaptive Testing. Research Report 83-2.

Download full text

Weiss, David J.; McBride, James R. – 1983

Monte Carlo simulation was used to investigate score bias and information characteristics of Owen's Bayesian adaptive testing strategy, and to examine possible causes of score bias. Factors investigated in three related studies included effects of item discrimination, effects of fixed vs. variable test length, and effects of an accurate prior…

Descriptors: Ability Identification, Adaptive Testing, Bayesian Statistics, Computer Assisted Testing

Setting Standards on Performance Assessments: Promising New Methods and Technical Issues.

Download full text

Hambleton, Ronald K. – 1995

Performance assessments in education and credentialing are becoming popular. At the same time, there do not exist any well established and validated methods for setting standards on performance assessments. This paper describes several of the new standard-setting methods that are emerging for use with performance assessments and considers their…

Descriptors: Achievement Tests, Cutting Scores, Holistic Evaluation, Licensing Examinations (Professions)

Chon, Kyong Hee	1
Chun Wang	1
Cui, Zhongmin	1
Dunbar, Stephen B.	1
Emons, Wilco H. M.	1
Feng, Yuling	1
Fu, Jianbin	1
Gelbal, Selahattin	1
Hambleton, Ronald K.	1
Kolen, Michael J.	1
Kruyen, Peter M.	1
Lee, Won-Chan	1
McBride, James R.	1
Meijer, Rob R.	1
Ozdemir, Burhanettin	1
Patsula, Liane	1
Rizavi, Saba	1
Rotou, Ourania	1
Sijtsma, Klaas	1
Steffen, Manfred	1
Steinkamp, Susan Christa	1
Sunnassee, Devdass	1
Weiss, David J.	1
Xue Zhang	1
More ▼