Showing 1 to 15 of 22 results
Peer reviewed
PDF on ERIC: Download full text
Bashkov, Bozhidar M.; Clauser, Jerome C. – Practical Assessment, Research & Evaluation, 2019
Successful testing programs rely on high-quality test items to produce reliable scores and defensible exams. However, determining what statistical screening criteria are most appropriate to support these goals can be daunting. This study describes and demonstrates cost-benefit analysis as an empirical approach to determining appropriate screening…
Descriptors: Test Items, Test Reliability, Evaluation Criteria, Accuracy
Peer reviewed
Direct link
Maeda, Hotaka; Zhang, Bo – International Journal of Testing, 2017
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
Descriptors: Cheating, Test Items, Mathematics, Statistics
Peer reviewed
Direct link
Chalmers, R. Philip; Counsell, Alyssa; Flora, David B. – Educational and Psychological Measurement, 2016
Differential test functioning, or DTF, occurs when one or more items in a test demonstrate differential item functioning (DIF) and the aggregate of these effects is observed at the test level. In many applications, DTF can be more important than DIF when the overall effects of DIF at the test level can be quantified. However, optimal statistical…
Descriptors: Test Bias, Sampling, Test Items, Statistical Analysis
Peer reviewed
Direct link
Hecht, Martin; Weirich, Sebastian; Siegle, Thilo; Frey, Andreas – Educational and Psychological Measurement, 2015
Multiple matrix designs are commonly used in large-scale assessments to distribute test items to students. These designs comprise several booklets, each containing a subset of the complete item pool. Besides reducing the test burden of individual students, using various booklets allows aligning the difficulty of the presented items to the assumed…
Descriptors: Measurement, Item Sampling, Statistical Analysis, Models
Peer reviewed
Direct link
Hopfenbeck, Therese N.; Lenkeit, Jenny; El Masri, Yasmine; Cantrell, Kate; Ryan, Jeanne; Baird, Jo-Anne – Scandinavian Journal of Educational Research, 2018
International large-scale assessments are on the rise, with the Programme for International Student Assessment (PISA) seen by many as having strategic prominence in education policy debates. The present article reviews PISA-related English-language peer-reviewed articles from the programme's first cycle in 2000 to its most recent in 2015. Five…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
Peer reviewed
PDF on ERIC: Download full text
Qian, Jiahe; Jiang, Yanming; von Davier, Alina A. – ETS Research Report Series, 2013
Several factors could cause variability in item response theory (IRT) linking and equating procedures, such as the variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. Hence, the following question arises: Is it possible to select optimal…
Descriptors: Item Response Theory, Test Items, Sampling, True Scores
Peer reviewed
Direct link
Babcock, Ben – Applied Psychological Measurement, 2011
Relatively little research has been conducted with the noncompensatory class of multidimensional item response theory (MIRT) models. A Monte Carlo simulation study was conducted exploring the estimation of a two-parameter noncompensatory item response theory (IRT) model. The estimation method used was a Metropolis-Hastings within Gibbs algorithm…
Descriptors: Item Response Theory, Sampling, Computation, Statistical Analysis
Peer reviewed
Direct link
Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012
Test equating is a statistical procedure for adjusting for test form differences in difficulty in a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations from the target population (see…
Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis
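The entry above concerns adjusting for form-difficulty differences via test equating. As a hedged illustration of the general idea only (not the authors' method), a mean-sigma linear equating function maps form-X scores onto the form-Y scale using the two groups' means and standard deviations:

```python
import statistics

def mean_sigma_equate(x_scores, y_scores):
    """Return a function mapping form-X scores onto the form-Y scale.

    Mean-sigma linear equating: y(x) = (sd_Y / sd_X) * (x - mean_X) + mean_Y.
    Assumes x_scores and y_scores come from equivalent examinee groups.
    """
    mu_x, sd_x = statistics.mean(x_scores), statistics.pstdev(x_scores)
    mu_y, sd_y = statistics.mean(y_scores), statistics.pstdev(y_scores)
    return lambda x: sd_y / sd_x * (x - mu_x) + mu_y

# Toy example: form X ran easier (higher mean), so X scores adjust downward.
equate = mean_sigma_equate([10, 12, 14, 16, 18], [8, 10, 12, 14, 16])
print(equate(14))  # 12.0 -- the form-X mean maps to the form-Y mean
```

Whether such a linear function holds across subpopulations of the target population is exactly the population-invariance question the abstract raises.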
Peer reviewed
Direct link
Sueiro, Manuel J.; Abad, Francisco J. – Educational and Psychological Measurement, 2011
The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…
Descriptors: Goodness of Fit, Item Response Theory, Nonparametric Statistics, Probability
Peer reviewed
PDF on ERIC: Download full text
Haberman, Shelby J.; Lee, Yi-Hsuan; Qian, Jiahe – ETS Research Report Series, 2009
Grouped jackknifing may be used to evaluate the stability of equating procedures with respect to sampling error and with respect to changes in anchor selection. Properties of grouped jackknifing are reviewed for simple-random and stratified sampling, and its use is described for comparisons of anchor sets. Application is made to examples of item…
Descriptors: Equated Scores, Accuracy, Sampling, Statistical Analysis
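As a hedged sketch of the general technique named above (not the ETS report's procedure), a grouped jackknife deletes one group of observations at a time, recomputes the statistic, and converts the spread of the leave-one-group-out estimates into a standard error:

```python
import statistics

def grouped_jackknife_se(values, n_groups, estimator):
    """Grouped jackknife standard error of `estimator` over `values`.

    Splits the sample into n_groups strided groups, recomputes the estimate
    with each group deleted, then applies the jackknife variance formula
    SE^2 = (G - 1) / G * sum((theta_g - theta_bar)^2).
    """
    groups = [values[i::n_groups] for i in range(n_groups)]
    loo = []  # leave-one-group-out estimates
    for g in range(n_groups):
        kept = [v for i, grp in enumerate(groups) if i != g for v in grp]
        loo.append(estimator(kept))
    theta_bar = statistics.mean(loo)
    var = (n_groups - 1) / n_groups * sum((t - theta_bar) ** 2 for t in loo)
    return var ** 0.5

# Stability of the sample mean of 1..20 under deletion of 5 groups.
se = grouped_jackknife_se(list(range(1, 21)), 5, statistics.mean)
```

In an equating context the estimator would be the equating function at a given score point, so the jackknife gauges its sampling stability.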
Peer reviewed
Direct link
Zhang, Bo; Stone, Clement A. – Educational and Psychological Measurement, 2008
This research examines the utility of the S-X² statistic proposed by Orlando and Thissen (2000) in evaluating item fit for multidimensional item response models. Monte Carlo simulation was conducted to investigate both the Type I error and statistical power of this fit statistic in analyzing two kinds of multidimensional test…
Descriptors: Monte Carlo Methods, Sampling, Goodness of Fit, Evaluation Methods
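As a hedged, simplified sketch of the summed-score logic behind such fit statistics (not Orlando and Thissen's exact algorithm), one compares observed and model-expected proportions correct within summed-score groups and accumulates a Pearson-type chi-square:

```python
def summed_score_chi_square(observed, expected, counts):
    """Pearson-type item-fit statistic over summed-score groups.

    observed: observed proportion correct on the item in each score group
    expected: model-predicted proportion correct in each score group
    counts:   number of examinees in each score group
    X^2 = sum_k N_k * (O_k - E_k)^2 / (E_k * (1 - E_k))
    """
    chi2 = 0.0
    for o, e, n in zip(observed, expected, counts):
        chi2 += n * (o - e) ** 2 / (e * (1 - e))
    return chi2

# Toy example: model predictions track the observed proportions closely,
# so the statistic stays small.
x2 = summed_score_chi_square([0.20, 0.45, 0.80], [0.25, 0.50, 0.75], [40, 60, 40])
```

The simulation questions in the abstract (Type I error, power) amount to asking how this kind of statistic behaves when the fitted model is, or is not, the data-generating model.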
Peer reviewed
PDF on ERIC: Download full text
Yu, Lei; Moses, Tim; Puhan, Gautam; Dorans, Neil – ETS Research Report Series, 2008
All differential item functioning (DIF) methods require at least a moderate sample size for effective DIF detection. Samples of fewer than 200 examinees pose a challenge for DIF analysis. Smoothing can improve upon the estimation of the population distribution by preserving major features of an observed frequency distribution while eliminating the…
Descriptors: Test Bias, Item Response Theory, Sample Size, Evaluation Criteria
Peer reviewed
Lawrence, Ida M.; Dorans, Neil J. – Applied Measurement in Education, 1990
The sample invariant properties of five anchor test equating methods are addressed. Equating results across two sampling conditions--representative sampling and new-form matched sampling--are compared for Tucker and Levine equally reliable linear equating, item response theory true-score equating, and two equipercentile methods. (SLD)
Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Analysis
Peer reviewed
Kolen, Michael J. – Applied Measurement in Education, 1990
Articles on equating test forms in this issue are reviewed and discussed. The results of these papers collectively indicate that matching on the anchor test does not result in more accurate equating. Implications for research are discussed. (SLD)
Descriptors: Equated Scores, Item Response Theory, Research Design, Sampling
Peer reviewed
Skaggs, Gary – Applied Measurement in Education, 1990
The articles in this issue that address the effect of matching samples on ability are reviewed. Despite these examinations of equating methods and sampling plans, a definitive answer to the question of whether or not to match remains elusive. Implications are discussed. (SLD)
Descriptors: Equated Scores, Item Response Theory, Research Methodology, Sampling