Showing all 13 results
Peer reviewed
Wind, Stefanie A. – Educational and Psychological Measurement, 2017
Molenaar extended Mokken's original probabilistic-nonparametric scaling models for use with polytomous data. These polytomous extensions of Mokken's original scaling procedure have facilitated the use of Mokken scale analysis as an approach to exploring fundamental measurement properties across a variety of domains in which polytomous ratings are…
Descriptors: Nonparametric Statistics, Scaling, Models, Item Response Theory
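For readers new to Mokken scaling, the item-pair scalability coefficient H_ij at the heart of the approach can be computed directly from polytomous item scores. Below is a minimal Python sketch; the function names are illustrative, and the sorted (comonotone) coupling used for the maximum covariance is an assumption of equal-length score vectors, not code taken from the article.

```python
import numpy as np

def max_covariance(x, y):
    # Largest covariance attainable with the observed marginals: pair the
    # sorted scores (comonotone arrangement); sorting leaves the means unchanged.
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(xs * ys) - np.mean(xs) * np.mean(ys)

def scalability_H(x, y):
    # Item-pair scalability coefficient H_ij = cov(X_i, X_j) / max cov(X_i, X_j).
    cov = np.cov(x, y, bias=True)[0, 1]
    return cov / max_covariance(x, y)
```

Values near 1 indicate a strongly scalable item pair; values below the conventional 0.3 lower bound flag weakly scalable items.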
Peer reviewed
Zeng, Ji; Yin, Ping; Shedden, Kerby A. – Educational and Psychological Measurement, 2015
This article provides a brief overview and comparison of three matching approaches in forming comparable groups for a study comparing test administration modes (i.e., computer-based tests [CBT] and paper-and-pencil tests [PPT]): (a) a propensity score matching approach proposed in this article, (b) the propensity score matching approach used by…
Descriptors: Comparative Analysis, Computer Assisted Testing, Probability, Classification
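As a rough illustration of propensity score matching in CBT/PPT mode-comparison studies, the sketch below estimates propensity scores with a logistic regression and forms 1:1 matched pairs greedily. This is a generic recipe, assuming the PPT group is at least as large as the CBT group; it is not the specific approach proposed in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def greedy_propensity_match(covariates, is_cbt):
    # Propensity score: estimated P(taking the CBT mode | covariates).
    ps = LogisticRegression(max_iter=1000).fit(covariates, is_cbt).predict_proba(covariates)[:, 1]
    cbt = np.flatnonzero(is_cbt == 1)
    ppt = list(np.flatnonzero(is_cbt == 0))
    pairs = []
    for t in cbt:
        j = min(ppt, key=lambda c: abs(ps[t] - ps[c]))  # nearest unmatched PPT examinee
        pairs.append((t, j))
        ppt.remove(j)                                   # 1:1 matching without replacement
    return pairs, ps
```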
Peer reviewed
Luo, Yong; Jiao, Hong – Educational and Psychological Measurement, 2018
Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for various item response theory (IRT) models. This article provides Stan code for three representative IRT models, including the…
Descriptors: Bayesian Statistics, Item Response Theory, Probability, Computer Software
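The article's contribution is the Stan code itself; as a language-neutral stand-in, the Python sketch below writes out the two-parameter logistic response function and the unnormalized log posterior that an HMC sampler such as Stan's would explore. The standard-normal prior on ability and the treatment of item parameters as fixed are illustrative simplifications, not the article's models.

```python
import numpy as np

def irt_2pl_prob(theta, a, b):
    # Two-parameter logistic IRT model: P(correct) = 1 / (1 + exp(-a * (theta - b))).
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def log_posterior(theta, a, b, responses):
    # Unnormalized log posterior over person abilities: Bernoulli likelihood plus a
    # standard-normal prior on theta; responses is a persons-by-items 0/1 matrix.
    p = irt_2pl_prob(theta[:, None], a, b)
    loglik = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return loglik - 0.5 * np.sum(theta ** 2)
```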
Peer reviewed
Jamil, Tahira; Marsman, Maarten; Ly, Alexander; Morey, Richard D.; Wagenmakers, Eric-Jan – Educational and Psychological Measurement, 2017
In 1881, Donald MacAlister posed a problem in the "Educational Times" that remains relevant today. The problem centers on the statistical evidence for the effectiveness of a treatment based on a comparison between two proportions. A brief historical sketch is followed by a discussion of two default Bayesian solutions, one based on a…
Descriptors: Bayesian Statistics, Evidence, Comparative Analysis, Problem Solving
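One default Bayesian treatment of the two-proportion problem compares marginal likelihoods under Beta(1, 1) priors; the sketch below computes that Bayes factor analytically. It is a minimal illustration of the general idea, not necessarily either of the two default solutions discussed in the article.

```python
from scipy.special import betaln

def log_bf01(y1, n1, y2, n2):
    # Log Bayes factor for H0: p1 = p2 versus H1: independent p1, p2,
    # with uniform Beta(1, 1) priors; the binomial coefficients cancel in the ratio.
    log_m0 = betaln(y1 + y2 + 1, n1 + n2 - y1 - y2 + 1) - betaln(1, 1)
    log_m1 = (betaln(y1 + 1, n1 - y1 + 1) - betaln(1, 1)
              + betaln(y2 + 1, n2 - y2 + 1) - betaln(1, 1))
    return log_m0 - log_m1
```

A positive value favors the null hypothesis of a common proportion; a negative value favors a difference between the two proportions.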
Peer reviewed
Long, Michael A.; Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 2009
An exact permutation test is provided for the tetrachoric correlation coefficient. Comparisons with the conventional test employing Student's t distribution demonstrate the necessity of using the permutation approach for small sample sizes and/or disproportionate marginal frequency totals. (Contains 4 tables.)
Descriptors: Statistical Analysis, Correlation, Sample Size, Comparative Analysis
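The exact test enumerates every 2x2 table consistent with the observed margins and accumulates the hypergeometric probabilities of tables at least as extreme as the one observed. The sketch below follows that structure but substitutes the classical cosine-pi approximation for the maximum-likelihood tetrachoric estimator used in the article, so treat it as a structural illustration only.

```python
import numpy as np
from scipy.stats import hypergeom

def tetrachoric_approx(a, b, c, d):
    # Cosine-pi approximation to the tetrachoric correlation for a 2x2 table.
    if b * c == 0:
        return 1.0
    if a * d == 0:
        return -1.0
    return np.cos(np.pi / (1.0 + np.sqrt((a * d) / (b * c))))

def exact_pvalue(a, b, c, d):
    # Enumerate all tables with the observed margins; sum the hypergeometric
    # probabilities of those whose |correlation| is at least the observed one.
    n, row1, col1 = a + b + c + d, a + b, a + c
    observed = abs(tetrachoric_approx(a, b, c, d))
    p_value = 0.0
    for a_star in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        b_star, c_star = row1 - a_star, col1 - a_star
        d_star = n - row1 - col1 + a_star
        if abs(tetrachoric_approx(a_star, b_star, c_star, d_star)) >= observed:
            p_value += hypergeom.pmf(a_star, n, col1, row1)
    return p_value
```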
Peer reviewed
Paek, Insu; Wilson, Mark – Educational and Psychological Measurement, 2011
This study elaborates the Rasch differential item functioning (DIF) model formulation under the marginal maximum likelihood estimation context. Also, the Rasch DIF model performance was examined and compared with the Mantel-Haenszel (MH) procedure in small sample and short test length conditions through simulations. The theoretically known…
Descriptors: Test Bias, Test Length, Statistical Inference, Geometric Concepts
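The Mantel-Haenszel benchmark used in the comparison reduces to a common odds ratio taken across matching-score strata. A minimal sketch follows; the array names are illustrative, and the Rasch DIF model itself requires a marginal maximum likelihood fit that is not reproduced here.

```python
import numpy as np

def mantel_haenszel_dif(ref_right, ref_wrong, foc_right, foc_wrong):
    # Per-stratum counts for the studied item, indexed by matching total score.
    # Returns the MH common odds ratio and the ETS delta-scale index -2.35 * ln(alpha).
    n_k = ref_right + ref_wrong + foc_right + foc_wrong
    alpha_mh = np.sum(ref_right * foc_wrong / n_k) / np.sum(ref_wrong * foc_right / n_k)
    return alpha_mh, -2.35 * np.log(alpha_mh)
```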
Peer reviewed
Wang, Wen-Chung; Huang, Sheng-Yun – Educational and Psychological Measurement, 2011
The one-parameter logistic model with ability-based guessing (1PL-AG) has recently been developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their…
Descriptors: Computer Assisted Testing, Classification, Item Analysis, Probability
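Computerized classification testing of this kind typically wraps an item response model inside a sequential probability ratio test. The sketch below shows that decision loop with a plain Rasch model standing in for the 1PL-AG response function (which additionally lets guessing depend on ability); the cut score, indifference region, and error rates are illustrative defaults, not the authors' settings.

```python
import numpy as np

def rasch_prob(theta, b):
    # Rasch (1PL) probability of a correct response; a stand-in for the 1PL-AG curve.
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def sprt_classify(responses, difficulties, cut=0.0, delta=0.3, alpha=0.05, beta=0.05):
    # Sequential probability ratio test: after each item, compare the likelihood at
    # theta = cut + delta versus theta = cut - delta and stop when a boundary is crossed.
    upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for x, b in zip(responses, difficulties):
        p_hi, p_lo = rasch_prob(cut + delta, b), rasch_prob(cut - delta, b)
        llr += x * np.log(p_hi / p_lo) + (1 - x) * np.log((1 - p_hi) / (1 - p_lo))
        if llr >= upper:
            return "pass"
        if llr <= lower:
            return "fail"
    return "undecided"
```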
Peer reviewed
Wang, Wen-Chung; Liu, Chih-Yu – Educational and Psychological Measurement, 2007
In this study, the authors develop a generalized multilevel facets model, which is not only a multilevel and two-parameter generalization of the facets model, but also a multilevel and facet generalization of the generalized partial credit model. Because the new model is formulated within a framework of nonlinear mixed models, no efforts are…
Descriptors: Generalization, Item Response Theory, Models, Equipment
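The generalized partial credit model that the new model extends assigns each response category a probability built from cumulative step terms. A minimal single-item sketch follows; the multilevel and facet extensions would add rater or task parameters to the linear predictor, which is omitted here.

```python
import numpy as np

def gpcm_probs(theta, a, thresholds):
    # Category probabilities for one item under the generalized partial credit model.
    # thresholds holds step parameters b_1..b_m; category 0's cumulative term is zero.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(thresholds)))))
    expx = np.exp(steps - steps.max())  # subtract the max for numerical stability
    return expx / expx.sum()
```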
Peer reviewed
Keselman, H. J.; And Others – Educational and Psychological Measurement, 1976
Compares the harmonic mean and Kramer unequal-group forms of the Tukey test across varying (a) degrees of disparity in group sizes, (b) numbers of groups, and (c) nominal significance levels. (RC)
Descriptors: Comparative Analysis, Probability, Sampling, Statistical Significance
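The two unequal-n variants compared here differ only in how the group sizes enter the critical difference: the harmonic-mean form plugs a single averaged n into the equal-n formula, while the Kramer form averages 1/n_i and 1/n_j separately for each pair. A sketch (requires SciPy's studentized_range, available since SciPy 1.7):

```python
import numpy as np
from scipy.stats import studentized_range

def tukey_critical_diffs(group_sizes, mse, df_error, alpha=0.05):
    # Critical differences for all pairs of group means under both unequal-n variants.
    k = len(group_sizes)
    q = studentized_range.ppf(1 - alpha, k, df_error)        # studentized range quantile
    n_h = k / np.sum(1.0 / np.asarray(group_sizes, float))   # harmonic mean of the n_i
    harmonic = q * np.sqrt(mse / n_h)
    kramer = {(i, j): q * np.sqrt(mse / 2.0 * (1.0 / group_sizes[i] + 1.0 / group_sizes[j]))
              for i in range(k) for j in range(i + 1, k)}
    return harmonic, kramer
```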
Peer reviewed
Braver, Sanford L. – Educational and Psychological Measurement, 1975
The controversy regarding the admissibility of one-tailed tests of hypotheses was examined. Rather than taking a stand on whether the one- or the two-tailed test is the more seriously flawed, a procedure is developed which can capitalize on the advantages of each. (RC)
Descriptors: Comparative Analysis, Hypothesis Testing, Prediction, Probability
Peer reviewed
Hsu, Louis M. – Educational and Psychological Measurement, 1979
Though the Paired-Item-Score (Eakin and Long) (EJ 174 780) method of scoring true-false tests has certain advantages over the traditional scoring methods (percentage right and right minus wrong), these advantages are attained at the cost of a larger risk of misranking the examinees. (Author/BW)
Descriptors: Comparative Analysis, Guessing (Tests), Objective Tests, Probability
Peer reviewed
Brandenburg, Dale C.; Forsyth, Robert A. – Educational and Psychological Measurement, 1974
Descriptors: Comparative Analysis, Goodness of Fit, Item Sampling, Models
Peer reviewed
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
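As a rough analogue of what DIFFER computes, the sketch below uses Cohen's kappa as the agreement measure for each independent set of raters and a randomization test for the difference between the two coefficients. DIFFER's own agreement statistic and exact algorithm may differ, so this is an illustration of the idea only.

```python
import numpy as np

def cohens_kappa(r1, r2):
    # Chance-corrected agreement between two raters' categorical scores.
    po = np.mean(r1 == r2)
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (po - pe) / (1 - pe)

def kappa_difference_test(set_a, set_b, n_perm=10000, seed=0):
    # set_a, set_b: (n, 2) arrays of scores from two independent pairs of raters.
    # Shuffle which essays belong to which set and recompute the kappa difference.
    rng = np.random.default_rng(seed)
    observed = abs(cohens_kappa(*set_a.T) - cohens_kappa(*set_b.T))
    pooled, n_a, hits = np.vstack([set_a, set_b]), len(set_a), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        a, b = pooled[idx[:n_a]], pooled[idx[n_a:]]
        hits += abs(cohens_kappa(*a.T) - cohens_kappa(*b.T)) >= observed
    return hits / n_perm
```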