Publication Date
In 2025 | 0
Since 2024 | 1
Since 2021 (last 5 years) | 4
Since 2016 (last 10 years) | 5
Since 2006 (last 20 years) | 6
Descriptor
Classification | 8
Evaluation Methods | 8
Accuracy | 3
Computation | 3
Cutting Scores | 3
Decision Making | 2
Item Response Theory | 2
Models | 2
Scores | 2
Simulation | 2
Student Evaluation | 2
Source
Journal of Educational Measurement | 8
Author
Lee, Won-Chan | 2
Berk, Ronald A. | 1
Binici, Salih | 1
Cheng, Ying | 1
Cuhadar, Ismail | 1
Jihong Zhang | 1
Jonathan Templin | 1
Kim, Kyung Yong | 1
Kim, Stella Y. | 1
Lathrop, Quinn N. | 1
Marco, Gary L. | 1
Publication Type
Journal Articles | 7
Reports - Research | 4
Guides - Non-Classroom | 1
Information Analyses | 1
Reports - Descriptive | 1
Reports - Evaluative | 1
Jihong Zhang; Jonathan Templin; Xinya Liang – Journal of Educational Measurement, 2024
Recently, Bayesian diagnostic classification modeling has become popular in health psychology, education, and sociology. Typically, information criteria are used for model selection when researchers want to choose the best model among alternative models. In Bayesian estimation, posterior predictive checking is a flexible Bayesian model…
Descriptors: Bayesian Statistics, Cognitive Measurement, Models, Classification
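
The posterior predictive checking mentioned in the abstract compares observed data with data replicated from posterior draws. Below is a minimal sketch of that general idea using entirely hypothetical data and posterior draws (the variable `p_draws` stands in for draws from some fitted Bayesian model); it is not the authors' diagnostic classification model.

```python
# Minimal posterior predictive check (PPC) sketch -- illustrative only.
# `p_draws` stands in for posterior draws of per-person, per-item success
# probabilities from a hypothetical fitted Bayesian model.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items, n_draws = 200, 20, 500

y_obs = rng.binomial(1, 0.6, size=(n_persons, n_items))        # fake observed data
p_draws = rng.beta(6, 4, size=(n_draws, n_persons, n_items))   # fake posterior draws

def discrepancy(y):
    """Test statistic: variance of total scores across persons."""
    return y.sum(axis=1).var()

t_obs = discrepancy(y_obs)
t_rep = np.empty(n_draws)
for d in range(n_draws):
    y_rep = rng.binomial(1, p_draws[d])    # one replicated data set per draw
    t_rep[d] = discrepancy(y_rep)

# Posterior predictive p-value: values near 0 or 1 flag misfit.
ppp = (t_rep >= t_obs).mean()
print(f"posterior predictive p-value: {ppp:.3f}")
```

A posterior predictive p-value near 0 or 1 indicates the model cannot reproduce that feature of the observed data.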
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
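
For intuition, classification consistency is usually defined as agreement between decisions on two parallel administrations, and accuracy as agreement with the classification implied by true scores. The toy simulation below uses a conjunctive (pass-both) rule over two measures with invented error levels; it is not the procedure the article introduces.

```python
# Sketch of classification consistency (CC) and accuracy (CA) under a
# conjunctive rule over two measures -- hypothetical values throughout.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_a = rng.normal(0, 1, n)        # true scores, domain A
true_b = rng.normal(0, 1, n)        # true scores, domain B
sem = 0.5                           # assumed measurement-error SD
cut = 0.0

def classify(a, b):
    return (a >= cut) & (b >= cut)  # conjunctive pass/fail rule

truth = classify(true_a, true_b)
form1 = classify(true_a + rng.normal(0, sem, n), true_b + rng.normal(0, sem, n))
form2 = classify(true_a + rng.normal(0, sem, n), true_b + rng.normal(0, sem, n))

print("consistency:", (form1 == form2).mean())  # agreement across forms
print("accuracy:   ", (form1 == truth).mean())  # agreement with true status
```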
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
The validity of performance standards is a key element in the defensibility of standard-setting results, and validating performance standards requires collecting multiple pieces of evidence at every step of the standard-setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
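
As a rough illustration of the latent class idea, a toy two-class latent class model for dichotomous items can be fit with EM and examinees assigned by posterior class probability. Everything below (data, parameter values) is hypothetical, and the sketch is far simpler than an operational standard-setting analysis.

```python
# Toy two-class latent class analysis via EM for Bernoulli items.
import numpy as np

rng = np.random.default_rng(2)
n, J = 1000, 10
true_class = rng.binomial(1, 0.4, n)   # hypothetical "master" indicator
y = rng.binomial(1, np.where(true_class[:, None] == 1, 0.8, 0.3), size=(n, J))

# EM for mixing weight pi and class-conditional item probabilities p[c, j].
pi = 0.5
p = rng.uniform(0.2, 0.8, size=(2, J))
for _ in range(200):
    # E-step: posterior probability that each person is in class 1.
    ll1 = np.log(pi) + (y * np.log(p[1]) + (1 - y) * np.log(1 - p[1])).sum(1)
    ll0 = np.log(1 - pi) + (y * np.log(p[0]) + (1 - y) * np.log(1 - p[0])).sum(1)
    w = 1.0 / (1.0 + np.exp(ll0 - ll1))
    # M-step: update parameters from the posterior weights.
    pi = w.mean()
    p[1] = (w[:, None] * y).sum(0) / w.sum()
    p[0] = ((1 - w)[:, None] * y).sum(0) / (1 - w).sum()
    p = p.clip(1e-6, 1 - 1e-6)         # numerical safety for the logs

# Modal assignment; class labels are arbitrary (label switching).
assign = (w >= 0.5).astype(int)
print(f"estimated class-1 proportion: {pi:.3f}")
```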
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
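
One simple way to see what decision consistency measures: simulate two parallel forms under a binomial error model and count how often the pass/fail decision agrees. This generic sketch is not the method the article proposes; the cut score and score distribution are invented.

```python
# Illustrative decision consistency (DC) via a binomial error model.
import numpy as np

rng = np.random.default_rng(3)
n_items, cut = 100, 70                  # hypothetical test length and cut score
true_p = rng.beta(8, 3, size=10000)     # hypothetical true proportion-correct

form1 = rng.binomial(n_items, true_p)   # parallel form 1
form2 = rng.binomial(n_items, true_p)   # parallel form 2
dc = ((form1 >= cut) == (form2 >= cut)).mean()
print(f"estimated decision consistency: {dc:.3f}")
```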
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
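
Of the three procedures named, the Livingston-Lewis approach is built around an "effective test length" computed from the score mean, variance, range, and reliability; the full procedure then fits a beta true-score model, which is omitted here. A sketch of just that first step, with invented summary statistics (formula per Livingston & Lewis, 1995):

```python
# Effective test length from observed-score summary statistics.
def effective_test_length(mean, var, rel, x_min, x_max):
    """n~ = [(mean - x_min)(x_max - mean) - rel * var] / [var * (1 - rel)]"""
    return ((mean - x_min) * (x_max - mean) - rel * var) / (var * (1 - rel))

# Hypothetical values: mean 32, variance 49, reliability .85 on a 0-50 test.
print(effective_test_length(mean=32.0, var=49.0, rel=0.85, x_min=0, x_max=50))
```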
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
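
A crude distribution-free analogue of the idea: split the items into random half-tests and check pass/fail agreement at a proportionally placed cut. This resampling sketch is not the estimator the article develops, and half-test agreement will understate full-test consistency; it only shows that such quantities can be gauged without parametric score assumptions.

```python
# Half-test pass/fail agreement as a distribution-free illustration.
import numpy as np

rng = np.random.default_rng(4)
n, J, cut_prop = 500, 40, 0.6   # hypothetical sample, test length, cut
y = rng.binomial(1, rng.beta(5, 3, n)[:, None], size=(n, J))  # fake 0/1 data

agree = []
for _ in range(200):
    perm = rng.permutation(J)
    h1, h2 = perm[: J // 2], perm[J // 2 :]
    pass1 = y[:, h1].mean(1) >= cut_prop   # decision on half-test 1
    pass2 = y[:, h2].mean(1) >= cut_prop   # decision on half-test 2
    agree.append((pass1 == pass2).mean())

print("mean half-test agreement:", np.mean(agree).round(3))
```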

Marco, Gary L.; And Others – Journal of Educational Measurement, 1976
Special emphasis is given to the kinds of control that can be exercised over initial status, including the use of proxy input data. A rationale for the classification scheme is developed, based on (1) six data types (three one-shot, one cross-sectional, and two longitudinal) and (2) two types of referencing: criterion referencing and norm referencing.…
Descriptors: Classification, Data Collection, Evaluation Methods, Methods

Berk, Ronald A. – Journal of Educational Measurement, 1980
A dozen different approaches that yield 13 reliability indices for criterion-referenced tests were identified and grouped into three categories: threshold loss function, squared-error loss function, and domain score estimation. Indices were evaluated within each category. (Author/RL)
Descriptors: Classification, Criterion Referenced Tests, Cutting Scores, Evaluation Methods
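
Two of the best-known threshold loss indices in Berk's first category are the raw agreement proportion p_o and Cohen's kappa, computed from pass/fail decisions on two administrations. A short example with simulated decisions:

```python
# Raw agreement (p_o) and Cohen's kappa for pass/fail decisions.
import numpy as np

rng = np.random.default_rng(5)
pass1 = rng.binomial(1, 0.7, 300)                            # administration 1
pass2 = np.where(rng.random(300) < 0.85, pass1, 1 - pass1)   # mostly consistent

p_o = (pass1 == pass2).mean()                # observed agreement
p1, p2 = pass1.mean(), pass2.mean()
p_e = p1 * p2 + (1 - p1) * (1 - p2)          # chance agreement
kappa = (p_o - p_e) / (1 - p_e)              # agreement corrected for chance
print(f"p_o = {p_o:.3f}, kappa = {kappa:.3f}")
```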