Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 7
Since 2006 (last 20 years): 32
Descriptor
Classification: 33
Item Response Theory: 33
Statistical Analysis: 33
Models: 15
Simulation: 10
Test Items: 10
Computation: 9
Sample Size: 8
Accuracy: 7
Comparative Analysis: 6
Evaluation Methods: 6
Author
Rupp, Andre A.: 3
von Davier, Matthias: 2
Alahmadi, Sarah: 1
Albano, Anthony D.: 1
Ames, Allison: 1
Andrich, David: 1
Babcock, Ben: 1
Barnes, Tiffany, Ed.: 1
Barry, Carol L.: 1
Bashkov, Bozhidar M.: 1
Cai, Yuyang: 1
Publication Type
Journal Articles: 27
Reports - Research: 22
Reports - Evaluative: 6
Collected Works - Proceedings: 2
Dissertations/Theses -…: 2
Non-Print Media: 1
Reference Materials - General: 1
Speeches/Meeting Papers: 1
Education Level
Higher Education: 8
Postsecondary Education: 7
Secondary Education: 3
Junior High Schools: 2
Middle Schools: 2
Elementary Education: 1
Grade 6: 1
High Schools: 1
Intermediate Grades: 1
Assessments and Surveys
ACT Assessment: 2
Program for International…: 1
SAT (College Admission Test): 1
Weese, James D.; Turner, Ronna C.; Liang, Xinya; Ames, Allison; Crawford, Brandon – Educational and Psychological Measurement, 2023
A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and…
Descriptors: Effect Size, Classification, Guidelines, Statistical Analysis
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
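The drift-detection comparison above concerns flags for item parameter drift among common items. As a hedged illustration of one well-established flag (not necessarily the article's exact implementation), the sketch below applies a robust z check to the difficulty shifts of anchor items between two administrations; the 2.7 cutoff and the item values are assumptions made for the example.
```python
# A sketch of a robust z drift check for Rasch common-item equating.
# Items whose |z| exceeds the cutoff would be flagged and dropped from the anchor set.
import numpy as np

def robust_z(b_old, b_new, cutoff=2.7):
    d = np.asarray(b_new) - np.asarray(b_old)      # difficulty shift per common item
    iqr = np.subtract(*np.percentile(d, [75, 25])) # interquartile range of the shifts
    z = (d - np.median(d)) / (0.74 * iqr)          # robust standardization
    return np.abs(z) > cutoff                      # True = flagged for drift

b_old = [-1.2, -0.4, 0.0, 0.5, 1.1, 1.8]
b_new = [-1.1, -0.5, 0.9, 0.4, 1.2, 1.7]           # third item has drifted upward
print(robust_z(b_old, b_new))                      # -> [False False  True False False False]
```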
Bashkov, Bozhidar M.; Clauser, Jerome C. – Practical Assessment, Research & Evaluation, 2019
Successful testing programs rely on high-quality test items to produce reliable scores and defensible exams. However, determining what statistical screening criteria are most appropriate to support these goals can be daunting. This study describes and demonstrates cost-benefit analysis as an empirical approach to determining appropriate screening…
Descriptors: Test Items, Test Reliability, Evaluation Criteria, Accuracy
von Davier, Matthias – ETS Research Report Series, 2016
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
Descriptors: Psychometrics, Mathematics, Models, Statistical Analysis
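The report above concerns parallelizing the E and M steps of an EM algorithm for latent variable models. The sketch below is a minimal, assumed illustration of the general idea rather than the report's code: the E step of a two-parameter IRT model is split over worker processes by blocks of respondents, and each block returns the sufficient statistics the M step would need.
```python
# Minimal sketch of a parallel E step for a 2PL IRT model (illustrative only).
import numpy as np
from concurrent.futures import ProcessPoolExecutor

QUAD = np.linspace(-4, 4, 21)                                # quadrature nodes for theta
WEIGHTS = np.exp(-0.5 * QUAD**2)
WEIGHTS /= WEIGHTS.sum()                                     # standard-normal prior weights

def e_step_chunk(args):
    """Posterior expected counts for one block of respondents."""
    responses, a, b = args                                   # responses: (n, items) 0/1 matrix
    p = 1.0 / (1.0 + np.exp(-a * (QUAD[:, None] - b)))       # (quad, items) response probabilities
    like = np.exp(responses @ np.log(p).T + (1 - responses) @ np.log(1 - p).T)
    post = like * WEIGHTS                                    # (n, quad) posterior over nodes
    post /= post.sum(axis=1, keepdims=True)
    expected_n = post.sum(axis=0)                            # expected respondents per node
    expected_r = post.T @ responses                          # expected correct per node/item
    return expected_n, expected_r

def parallel_e_step(responses, a, b, n_workers=4):
    chunks = np.array_split(responses, n_workers)            # split respondents across workers
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(e_step_chunk, [(c, a, b) for c in chunks]))
    return sum(p[0] for p in parts), sum(p[1] for p in parts)  # sufficient statistics for the M step

if __name__ == "__main__":                                   # guard required for process pools
    rng = np.random.default_rng(0)
    resp = rng.integers(0, 2, size=(500, 20)).astype(float)
    print(parallel_e_step(resp, a=np.ones(20), b=np.zeros(20))[0][:5])
```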
Sari, Halil Ibrahim; Huggins, Anne Corinne – Educational and Psychological Measurement, 2015
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…
Descriptors: Test Bias, Comparative Analysis, Statistical Analysis, College Entrance Examinations
Suh, Youngsuk – Journal of Educational Measurement, 2016
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Descriptors: Effect Size, Goodness of Fit, Statistical Analysis, Statistical Significance
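As a toy illustration of the two effect size measures named above, the snippet below computes a signed and an unsigned weighted P-difference for a single dichotomous item over a theta grid. The grid, the focal-group weighting, and the logistic item curves are assumptions for the example and may differ from the article's exact formulation.
```python
# Signed and unsigned weighted P-difference for one item (illustrative).
import numpy as np

def weighted_p_difference(p_ref, p_foc, focal_density):
    w = focal_density / focal_density.sum()         # weight by assumed focal-group density
    signed = np.sum(w * (p_ref - p_foc))            # offsetting DIF can cancel
    unsigned = np.sum(w * np.abs(p_ref - p_foc))    # magnitude regardless of direction
    return signed, unsigned

theta = np.linspace(-4, 4, 81)
p_ref = 1 / (1 + np.exp(-1.2 * (theta - 0.0)))      # reference-group item curve
p_foc = 1 / (1 + np.exp(-1.2 * (theta - 0.3)))      # focal-group curve with shifted difficulty
density = np.exp(-0.5 * theta**2)
print(weighted_p_difference(p_ref, p_foc, density))
```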
Eckes, Thomas – Language Testing, 2017
This paper presents an approach to standard setting that combines the prototype group method (PGM; Eckes, 2012) with a receiver operating characteristic (ROC) analysis. The combined PGM-ROC approach is applied to setting cut scores on a placement test of English as a foreign language (EFL). To implement the PGM, experts first named learners whom…
Descriptors: English (Second Language), Language Tests, Cutting Scores, Standard Setting (Scoring)
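The ROC portion of the combined PGM-ROC approach can be illustrated with a small, assumed example: expert nominations of prototypical "masters" serve as the reference classification, and the cut score is taken where Youden's J is maximized. The simulated data, the use of scikit-learn's roc_curve, and the J criterion are illustrative choices, not a claim about Eckes's exact procedure.
```python
# ROC-based cut score selection via Youden's J (illustrative sketch).
import numpy as np
from sklearn.metrics import roc_curve

def roc_cut_score(is_master, scores):
    fpr, tpr, thresholds = roc_curve(is_master, scores)
    j = tpr - fpr                               # Youden's J at each candidate threshold
    return thresholds[np.argmax(j)]             # score that best separates the two groups

rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(45, 8, 60), rng.normal(60, 8, 40)])
is_master = np.concatenate([np.zeros(60), np.ones(40)])   # expert prototype judgments
print(roc_cut_score(is_master, scores))
```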
Jones, W. Paul – Educational and Psychological Measurement, 2014
A study in a university clinic/laboratory investigated adaptive Bayesian scaling as a supplement to interpretation of scores on the Mini-IPIP. A "probability of belonging" in categories of low, medium, or high on each of the Big Five traits was calculated after each item response and continued until all items had been used or until a…
Descriptors: Personality Traits, Personality Measures, Bayesian Statistics, Clinics
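A hedged sketch of the sequential idea described above: after each item response the posterior probability of belonging to a low, medium, or high category is updated, and administration stops early once one category is sufficiently probable. The likelihood table, uniform prior, and 0.90 stopping threshold are invented for illustration and are not the Mini-IPIP calibration used in the study.
```python
# Sequential Bayesian "probability of belonging" classification (illustrative).
import numpy as np

CATEGORIES = ["low", "medium", "high"]
LIKELIHOOD = np.array([          # P(endorse item | category) for 4 made-up items
    [0.20, 0.50, 0.80],
    [0.30, 0.60, 0.90],
    [0.10, 0.40, 0.70],
    [0.25, 0.55, 0.85],
])

def classify(responses, threshold=0.90):
    posterior = np.ones(3) / 3                       # uniform prior over the three categories
    for item, endorsed in enumerate(responses):
        p = LIKELIHOOD[item] if endorsed else 1 - LIKELIHOOD[item]
        posterior = posterior * p                    # Bayes update after this response
        posterior /= posterior.sum()
        if posterior.max() >= threshold:             # stop early once confident enough
            break
    return CATEGORIES[int(np.argmax(posterior))], posterior

print(classify([1, 1, 0, 1]))
```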
Andrich, David – Educational and Psychological Measurement, 2013
Assessments in response formats with ordered categories are ubiquitous in the social and health sciences. Although the assumption that the ordering of the categories is working as intended is central to any interpretation that arises from such assessments, testing that this assumption is valid is not standard in psychometrics. This is surprising…
Descriptors: Item Response Theory, Classification, Statistical Analysis, Models
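One empirical symptom often discussed alongside this issue in the Rasch literature is disordered category thresholds. The short check below is an assumed illustration of that idea, not Andrich's proposed test, and the threshold values are made up.
```python
# Check whether estimated category thresholds are ordered as intended (illustrative).
def disordered_thresholds(taus):
    """Return indices where a threshold is not greater than its predecessor."""
    return [i for i in range(1, len(taus)) if taus[i] <= taus[i - 1]]

item_thresholds = {
    "item_1": [-1.5, -0.2, 1.1],    # ordered as intended
    "item_2": [-0.8, 0.9, 0.4],     # third threshold falls below the second
}
for item, taus in item_thresholds.items():
    print(item, disordered_thresholds(taus) or "ordered")
```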
Chiu, Chia-Yi – Applied Psychological Measurement, 2013
Most methods for fitting cognitive diagnosis models to educational test data and assigning examinees to proficiency classes require the Q-matrix that associates each item in a test with the cognitive skills (attributes) needed to answer it correctly. In most cases, the Q-matrix is not known but is constructed from the (fallible) judgments of…
Descriptors: Cognitive Tests, Diagnostic Tests, Models, Statistical Analysis
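The Q-matrix mentioned above is simply an items-by-attributes binary matrix. The small example below shows one invented Q-matrix and a DINA-style ideal response computed from an examinee's attribute profile; it illustrates how the matrix is used, not the article's Q-matrix estimation method.
```python
# A toy Q-matrix and DINA-style ideal responses (illustrative data).
import numpy as np

Q = np.array([        # 4 items x 3 attributes; 1 = item requires the attribute
    [1, 0, 0],        # item 1 requires attribute A only
    [1, 1, 0],        # item 2 requires A and B
    [0, 0, 1],        # item 3 requires C only
    [1, 0, 1],        # item 4 requires A and C
])

def ideal_responses(alpha, Q):
    """alpha: one examinee's binary attribute-mastery profile."""
    return (alpha >= Q).all(axis=1).astype(int)   # correct only if all required attributes mastered

print(ideal_responses(np.array([1, 0, 1]), Q))    # -> [1 0 1 1]
```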
Svetina, Dubravka – Educational and Psychological Measurement, 2013
The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…
Descriptors: Item Response Theory, Statistical Analysis, Computation, Test Length
Wang, Zijian Gerald – ProQuest LLC, 2012
A latent class signal detection (SDT) model was recently introduced as an alternative to traditional item response theory (IRT) methods in the analysis of constructed response data. This class of models can be represented as restricted latent class models and differs from the IRT approach in the way the latent construct is conceptualized. One…
Descriptors: Item Response Theory, Statistical Analysis, Models, Test Items
Babcock, Ben; Albano, Anthony D. – Applied Psychological Measurement, 2012
Testing programs often rely on common-item equating to maintain a single measurement scale across multiple test administrations and multiple years. Changes over time, in the item parameters and the latent trait underlying the scale, can lead to inaccurate score comparisons and misclassifications of examinees. This study examined how instability in…
Descriptors: Test Items, Measurement, Item Response Theory, Predictor Variables
Kaplan, David; Depaoli, Sarah – Structural Equation Modeling: A Multidisciplinary Journal, 2011
This article examines the problem of specification error in 2 models for categorical latent variables: the latent class model and the latent Markov model. Specification error in the latent class model focuses on the impact of incorrectly specifying the number of latent classes of the categorical latent variable on measures of model adequacy as…
Descriptors: Markov Processes, Longitudinal Studies, Probability, Item Response Theory
Nix, John-Michael L.; Tseng, Wen-Ta – International Journal of Listening, 2014
The present research aims to identify the underlying English listening belief structure of English-as-a-foreign-language (EFL) learners, thereby informing methodologies for subsequent analysis of beliefs with respect to listening achievement. Development of a measurement model of English listening learning beliefs entailed the creation of an…
Descriptors: Item Response Theory, English (Second Language), Second Language Learning, Listening Skills