Showing all 13 results
Peer reviewed
Wind, Stefanie A. – Educational and Psychological Measurement, 2017
Molenaar extended Mokken's original probabilistic-nonparametric scaling models for use with polytomous data. These polytomous extensions of Mokken's original scaling procedure have facilitated the use of Mokken scale analysis as an approach to exploring fundamental measurement properties across a variety of domains in which polytomous ratings are…
Descriptors: Nonparametric Statistics, Scaling, Models, Item Response Theory
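For readers new to Mokken scaling, the item-pair scalability coefficient H_ij at the heart of the approach can be computed directly from polytomous item scores. Below is a minimal Python sketch; the function names are illustrative, and the sorted (comonotone) coupling used for the maximum covariance is an assumption of equal-length score vectors, not code taken from the article.

```python
import numpy as np

def max_covariance(x, y):
    # Largest covariance attainable with the observed marginals: pair the
    # sorted scores (comonotone arrangement); sorting leaves the means unchanged.
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(xs * ys) - np.mean(xs) * np.mean(ys)

def scalability_H(x, y):
    # Item-pair scalability coefficient H_ij = cov(X_i, X_j) / max cov(X_i, X_j).
    cov = np.cov(x, y, bias=True)[0, 1]
    return cov / max_covariance(x, y)
```

Values near 1 indicate a strongly scalable item pair; values below the conventional 0.3 lower bound flag weakly scalable items.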
Peer reviewed
Zeng, Ji; Yin, Ping; Shedden, Kerby A. – Educational and Psychological Measurement, 2015
This article provides a brief overview and comparison of three matching approaches in forming comparable groups for a study comparing test administration modes (i.e., computer-based tests [CBT] and paper-and-pencil tests [PPT]): (a) a propensity score matching approach proposed in this article, (b) the propensity score matching approach used by…
Descriptors: Comparative Analysis, Computer Assisted Testing, Probability, Classification
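As a rough illustration of propensity score matching in CBT/PPT mode-comparison studies, the sketch below estimates propensity scores with a logistic regression and forms 1:1 matched pairs greedily. This is a generic recipe, assuming the PPT group is at least as large as the CBT group; it is not the specific approach proposed in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def greedy_propensity_match(covariates, is_cbt):
    # Propensity score: estimated P(taking the CBT mode | covariates).
    ps = LogisticRegression(max_iter=1000).fit(covariates, is_cbt).predict_proba(covariates)[:, 1]
    cbt = np.flatnonzero(is_cbt == 1)
    ppt = list(np.flatnonzero(is_cbt == 0))
    pairs = []
    for t in cbt:
        j = min(ppt, key=lambda c: abs(ps[t] - ps[c]))  # nearest unmatched PPT examinee
        pairs.append((t, j))
        ppt.remove(j)                                   # 1:1 matching without replacement
    return pairs, ps
```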
Peer reviewed
Luo, Yong; Jiao, Hong – Educational and Psychological Measurement, 2018
Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models, including the…
Descriptors: Bayesian Statistics, Item Response Theory, Probability, Computer Software
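The article's contribution is the Stan code itself; as a language-neutral stand-in, the Python sketch below writes out the two-parameter logistic response function and the unnormalized log posterior that an HMC sampler such as Stan's would explore. The standard-normal prior on ability and the treatment of item parameters as fixed are illustrative simplifications, not the article's models.

```python
import numpy as np

def irt_2pl_prob(theta, a, b):
    # Two-parameter logistic IRT model: P(correct) = 1 / (1 + exp(-a * (theta - b))).
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_posterior(theta, a, b, responses):
    # Unnormalized log posterior over person abilities: Bernoulli likelihood plus a
    # standard-normal prior on theta; responses is a persons-by-items 0/1 matrix.
    p = irt_2pl_prob(theta[:, None], a, b)
    loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return loglik - 0.5 * np.sum(theta ** 2)
```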
Peer reviewed
Jamil, Tahira; Marsman, Maarten; Ly, Alexander; Morey, Richard D.; Wagenmakers, Eric-Jan – Educational and Psychological Measurement, 2017
In 1881, Donald MacAlister posed a problem in the "Educational Times" that remains relevant today. The problem centers on the statistical evidence for the effectiveness of a treatment based on a comparison between two proportions. A brief historical sketch is followed by a discussion of two default Bayesian solutions, one based on a…
Descriptors: Bayesian Statistics, Evidence, Comparative Analysis, Problem Solving
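One default Bayesian treatment of the two-proportion problem compares marginal likelihoods under Beta(1, 1) priors; the sketch below computes that Bayes factor analytically. It is a minimal illustration of the general idea, not necessarily either of the two default solutions discussed in the article.

```python
from scipy.special import betaln

def log_bf01(y1, n1, y2, n2):
    # Log Bayes factor for H0: p1 = p2 versus H1: independent p1, p2,
    # with uniform Beta(1, 1) priors; the binomial coefficients cancel in the ratio.
    log_m0 = betaln(y1 + y2 + 1, n1 + n2 - y1 - y2 + 1) - betaln(1, 1)
    log_m1 = (betaln(y1 + 1, n1 - y1 + 1) - betaln(1, 1)
              + betaln(y2 + 1, n2 - y2 + 1) - betaln(1, 1))
    return log_m0 - log_m1
```

A positive value favors the null hypothesis of a common proportion; a negative value favors a difference between the two proportions.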
Peer reviewed
Long, Michael A.; Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 2009
An exact permutation test is provided for the tetrachoric correlation coefficient. Comparisons with the conventional test employing Student's t distribution demonstrate the necessity of using the permutation approach for small sample sizes and/or disproportionate marginal frequency totals. (Contains 4 tables.)
Descriptors: Statistical Analysis, Correlation, Sample Size, Comparative Analysis
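The exact test enumerates every 2x2 table consistent with the observed margins and accumulates the hypergeometric probabilities of tables at least as extreme as the one observed. The sketch below follows that structure but substitutes the classical cosine-pi approximation for the maximum-likelihood tetrachoric estimator used in the article, so treat it as a structural illustration only.

```python
import numpy as np
from scipy.stats import hypergeom

def tetrachoric_approx(a, b, c, d):
    # Cosine-pi approximation to the tetrachoric correlation for a 2x2 table.
    if b * c == 0:
        return 1.0
    if a * d == 0:
        return -1.0
    return np.cos(np.pi / (1.0 + np.sqrt((a * d) / (b * c))))

def exact_pvalue(a, b, c, d):
    # Enumerate all tables with the observed margins; sum the hypergeometric
    # probabilities of those whose |correlation| is at least the observed one.
    n, row1, col1 = a + b + c + d, a + b, a + c
    observed = abs(tetrachoric_approx(a, b, c, d))
    p_value = 0.0
    for a_star in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        b_star, c_star = row1 - a_star, col1 - a_star
        d_star = n - row1 - col1 + a_star
        if abs(tetrachoric_approx(a_star, b_star, c_star, d_star)) >= observed:
            p_value += hypergeom.pmf(a_star, n, col1, row1)
    return p_value
```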
Peer reviewed
Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts
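The Mantel-Haenszel benchmark used in the comparison reduces to a common odds ratio taken across matching-score strata. A minimal sketch follows; the array names are illustrative, and the Rasch DIF model itself requires a marginal maximum likelihood fit that is not reproduced here.

```python
import numpy as np

def mantel_haenszel_dif(ref_right, ref_wrong, foc_right, foc_wrong):
    # Per-stratum counts for the studied item, indexed by matching total score.
    # Returns the MH common odds ratio and the ETS delta-scale index -2.35 * ln(alpha).
    n_k = ref_right + ref_wrong + foc_right + foc_wrong
    alpha_mh = np.sum(ref_right * foc_wrong / n_k) / np.sum(ref_wrong * foc_right / n_k)
    return alpha_mh, -2.35 * np.log(alpha_mh)
```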
Peer reviewed
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has recently been developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
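Computerized classification testing of this kind typically wraps an item response model inside a sequential probability ratio test. The sketch below shows that decision loop with a plain Rasch model standing in for the 1PL-AG response function (which additionally lets guessing depend on ability); the cut score, indifference region, and error rates are illustrative defaults, not the authors' settings.

```python
import numpy as np

def rasch_prob(theta, b):
    # Rasch (1PL) probability of a correct response; a stand-in for the 1PL-AG curve.
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def sprt_classify(responses, difficulties, cut=0.0, delta=0.3, alpha=0.05, beta=0.05):
    # Sequential probability ratio test: after each item, compare the likelihood at
    # theta = cut + delta versus theta = cut - delta and stop when a boundary is crossed.
    upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for x, b in zip(responses, difficulties):
        p_hi, p_lo = rasch_prob(cut + delta, b), rasch_prob(cut - delta, b)
        llr += x * np.log(p_hi / p_lo) + (1 - x) * np.log((1 - p_hi) / (1 - p_lo))
        if llr >= upper:
            return "pass"
        if llr <= lower:
            return "fail"
    return "undecided"
```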
Peer reviewed
Wang, Wen-Chung; Liu, Chih-Yu – Educational and Psychological Measurement, 2007
In this study, the authors develop a generalized multilevel facets model, which is not only a multilevel and two-parameter generalization of the facets model, but also a multilevel and facet generalization of the generalized partial credit model. Because the new model is formulated within a framework of nonlinear mixed models, no efforts are…
Descriptors: Generalization, Item Response Theory, Models, Equipment
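The generalized partial credit model that the new model extends assigns each response category a probability built from cumulative step terms. A minimal single-item sketch follows; the multilevel and facet extensions would add rater or task parameters to the linear predictor, which is omitted here.

```python
import numpy as np

def gpcm_probs(theta, a, thresholds):
    # Category probabilities for one item under the generalized partial credit model.
    # thresholds holds step parameters b_1..b_m; category 0's cumulative term is zero.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(thresholds)))))
    expx = np.exp(steps - steps.max())  # subtract the max for numerical stability
    return expx / expx.sum()
```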
Peer reviewed
Keselman, H. J.; And Others – Educational and Psychological Measurement, 1976
Compares the harmonic mean and Kramer unequal-group forms of the Tukey test across varying (a) degrees of disparity in group sizes, (b) numbers of groups, and (c) nominal significance levels. (RC)
Descriptors: Comparative Analysis, Probability, Sampling, Statistical Significance
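The two unequal-n variants compared here differ only in how the group sizes enter the critical difference: the harmonic-mean form plugs a single averaged n into the equal-n formula, while the Kramer form averages 1/n_i and 1/n_j separately for each pair. A sketch (requires SciPy's studentized_range, available since SciPy 1.7):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_critical_diffs(group_sizes, mse, df_error, alpha=0.05):
    # Critical differences for all pairs of group means under both unequal-n variants.
    k = len(group_sizes)
    q = studentized_range.ppf(1 - alpha, k, df_error)        # studentized range quantile
    n_h = k / np.sum(1.0 / np.asarray(group_sizes, float))   # harmonic mean of the n_i
    harmonic = q * np.sqrt(mse / n_h)
    kramer = {(i, j): q * np.sqrt(mse / 2.0 * (1.0 / group_sizes[i] + 1.0 / group_sizes[j]))
              for i in range(k) for j in range(i + 1, k)}
    return harmonic, kramer
```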
Peer reviewed
Braver, Sanford L. – Educational and Psychological Measurement, 1975
The controversy regarding the admissibility of one-tailed tests of hypotheses was examined. Rather than taking a stand on whether the one- or the two-tailed test is the more seriously flawed, a procedure is developed which can capitalize on the advantages of each. (RC)
Descriptors: Comparative Analysis, Hypothesis Testing, Prediction, Probability
Peer reviewed
Hsu, Louis M. – Educational and Psychological Measurement, 1979
Though the Paired-Item-Score (Eakin and Long) (EJ 174 780) method of scoring true-false tests has certain advantages over the traditional scoring methods (percentage right and right minus wrong), these advantages are attained at the cost of a larger risk of misranking the examinees. (Author/BW)
Descriptors: Comparative Analysis, Guessing (Tests), Objective Tests, Probability
Peer reviewed
Brandenburg, Dale C.; Forsyth, Robert A. – Educational and Psychological Measurement, 1974
Descriptors: Comparative Analysis, Goodness of Fit, Item Sampling, Models
Peer reviewed
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
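As a rough analogue of what DIFFER computes, the sketch below uses Cohen's kappa as the agreement measure for each independent set of raters and a randomization test for the difference between the two coefficients. DIFFER's own agreement statistic and exact algorithm may differ, so this is an illustration of the idea only.

```python
import numpy as np

def cohens_kappa(r1, r2):
    # Chance-corrected agreement between two raters' categorical scores.
    po = np.mean(r1 == r2)
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (po - pe) / (1 - pe)

def kappa_difference_test(set_a, set_b, n_perm=10000, seed=0):
    # set_a, set_b: (n, 2) arrays of scores from two independent pairs of raters.
    # Shuffle which essays belong to which set and recompute the kappa difference.
    rng = np.random.default_rng(seed)
    observed = abs(cohens_kappa(*set_a.T) - cohens_kappa(*set_b.T))
    pooled, n_a, hits = np.vstack([set_a, set_b]), len(set_a), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n_a]], pooled[idx[n_a:]]
        hits += abs(cohens_kappa(*a.T) - cohens_kappa(*b.T)) >= observed
    return hits / n_perm
```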