Showing all 12 results
Peer reviewed
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Peer reviewed
Sinharay, Sandip – Applied Measurement in Education, 2017
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H^T" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…
Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis
Peer reviewed
Cui, Ying; Gierl, Mark; Guo, Qi – Educational Psychology, 2016
The purpose of the current investigation was to describe how the artificial neural networks (ANNs) can be used to interpret student performance on cognitive diagnostic assessments (CDAs) and evaluate the performances of ANNs using simulation results. CDAs are designed to measure student performance on problem-solving tasks and provide useful…
Descriptors: Cognitive Tests, Diagnostic Tests, Classification, Artificial Intelligence
Peer reviewed
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
Peer reviewed
Hooper, Jay; Cowell, Ryan – Educational Assessment, 2014
There has been much research and discussion on the principles of standards-based grading, and there is a growing consensus of best practice. Even so, the actual process of implementing standards-based grading at a school or district level can be a significant challenge. There are very practical questions that remain unclear, such as how the grades…
Descriptors: True Scores, Grading, Academic Standards, Computation
Hill, Jennifer Lynn; Su, Yu-Sung – Grantee Submission, 2013
Causal inference in observational studies typically requires making comparisons between groups that are dissimilar. For instance, researchers investigating the role of a prolonged duration of breastfeeding on child outcomes may be forced to make comparisons between women with substantially different characteristics on average. In the extreme there…
Descriptors: Nutrition, Comparative Analysis, Child Development, Cognitive Ability
Peer reviewed
Penfield, Randall D. – Applied Psychological Measurement, 2008
The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…
Descriptors: Simulation, Nonparametric Statistics, Item Response Theory, Computation
Peer reviewed
Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Descriptors: Test Length, Test Content, Simulation, Computation
Beasley, T. Mark – 1996
Robustness and power of parametric, semi-parametric, and nonparametric tests of between-group discordance were compared in this simulation study. The empirical Type I error rates and power of nine tests were compared. When data were sampled from the…
Descriptors: Comparative Analysis, Group Membership, Nonparametric Statistics, Robustness (Statistics)
Peer reviewed
Mroch, Andrew A.; Bolt, Daniel M. – Applied Measurement in Education, 2006
Recently, nonparametric methods have been proposed that provide a dimensionally based description of test structure for tests with dichotomous items. Because such methods are based on different notions of dimensionality than are assumed when using a psychometric model, it remains unclear whether these procedures might lead to a different…
Descriptors: Simulation, Comparative Analysis, Psychometrics, Methods Research
Peer reviewed
van Abswoude, Alexandra A. H.; van der Ark, L. Andries; Sijtsma, Klaas – Applied Psychological Measurement, 2004
In this article, an overview of nonparametric item response theory methods for determining the dimensionality of item response data is provided. Four methods were considered: MSP, DETECT, HCA/CCPROX, and DIMTEST. First, the methods were compared theoretically. Second, a simulation study was done to compare the effectiveness of MSP, DETECT, and…
Descriptors: Comparative Analysis, Computer Software, Simulation, Nonparametric Statistics
Mandeville, Garrett K.; And Others – 1975
A strategy for comparing two sets of results (one based upon early childhood recollections (ECR) and another upon video taped (VT) group behavior) from the Perceptual Characteristics Rating Scale was developed. The null distribution of the mean deviation was estimated by randomly matching an ECR response vector with a VT response vector. To…
Descriptors: Comparative Analysis, Correlation, Data Analysis, Goodness of Fit