Publication Date
In 2025 | 1 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 10 |
Since 2016 (last 10 years) | 21 |
Since 2006 (last 20 years) | 31 |
Descriptor
Classification | 31 |
Accuracy | 16 |
Item Response Theory | 10 |
Simulation | 10 |
Models | 9 |
Computation | 8 |
Diagnostic Tests | 6 |
Evaluation Methods | 6 |
Psychometrics | 6 |
Reliability | 6 |
Scores | 6 |
More ▼ |
Source
Journal of Educational… | 31 |
Author
Lee, Won-Chan | 4 |
Chen, Ping | 2 |
Cheng, Ying | 2 |
Ding, Shuliang | 2 |
Jonathan Templin | 2 |
Kim, Stella Y. | 2 |
Sinharay, Sandip | 2 |
Song, Lihong | 2 |
Wang, Wenyi | 2 |
A. Corinne Huggins-Manley | 1 |
Alex J. Mechaber | 1 |
More ▼ |
Publication Type
Journal Articles | 31 |
Reports - Research | 25 |
Reports - Evaluative | 4 |
Reports - Descriptive | 2 |
Education Level
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Grade 4 | 1 |
High Schools | 1 |
Higher Education | 1 |
Intermediate Grades | 1 |
Postsecondary Education | 1 |
Secondary Education | 1 |
Audience
Location
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
Progress in International… | 1 |
What Works Clearinghouse Rating
Tae Yeon Kwon; A. Corinne Huggins-Manley; Jonathan Templin; Mingying Zheng – Journal of Educational Measurement, 2024
In classroom assessments, examinees can often answer test items multiple times, resulting in sequential multiple-attempt data. Sequential diagnostic classification models (DCMs) have been developed for such data. As student learning processes may be aligned with a hierarchy of measured traits, this study aimed to develop a sequential hierarchical…
Descriptors: Classification, Accuracy, Student Evaluation, Sequential Approach
Jihong Zhang; Jonathan Templin; Xinya Liang – Journal of Educational Measurement, 2024
Recently, Bayesian diagnostic classification modeling has been becoming popular in health psychology, education, and sociology. Typically information criteria are used for model selection when researchers want to choose the best model among alternative models. In Bayesian estimation, posterior predictive checking is a flexible Bayesian model…
Descriptors: Bayesian Statistics, Cognitive Measurement, Models, Classification
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Sijia Huang; Seungwon Chung; Carl F. Falk – Journal of Educational Measurement, 2024
In this study, we introduced a cross-classified multidimensional nominal response model (CC-MNRM) to account for various response styles (RS) in the presence of cross-classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of…
Descriptors: Response Style (Tests), Classification, Data, Models
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
Binici, Salih; Cuhadar, Ismail – Journal of Educational Measurement, 2022
Validity of performance standards is a key element for the defensibility of standard setting results, and validating performance standards requires collecting multiple pieces of evidence at every step during the standard setting process. This study employs a statistical procedure, latent class analysis, to set performance standards and compares…
Descriptors: Validity, Performance, Standards, Multivariate Analysis
Setzer, J. Carl; Cheng, Ying; Liu, Cheng – Journal of Educational Measurement, 2023
Test scores are often used to make decisions about examinees, such as in licensure and certification testing, as well as in many educational contexts. In some cases, these decisions are based upon compensatory scores, such as those from multiple sections or components of an exam. Classification accuracy and classification consistency are two…
Descriptors: Classification, Accuracy, Psychometrics, Scores
Lim, Hwanggyu; Davey, Tim; Wells, Craig S. – Journal of Educational Measurement, 2021
This study proposed a recursion-based analytical approach to assess measurement precision of ability estimation and classification accuracy in multistage adaptive tests (MSTs). A simulation study was conducted to compare the proposed recursion-based analytical method with an analytical method proposed by Park, Kim, Chung, and Dodd and with the…
Descriptors: Adaptive Testing, Measurement, Accuracy, Classification
Matthew J. Madison; Stefanie A. Wind; Lientje Maas; Kazuhiro Yamaguchi; Sergio Haab – Journal of Educational Measurement, 2024
Diagnostic classification models (DCMs) are psychometric models designed to classify examinees according to their proficiency or nonproficiency of specified latent characteristics. These models are well suited for providing diagnostic and actionable feedback to support intermediate and formative assessment efforts. Several DCMs have been developed…
Descriptors: Diagnostic Tests, Classification, Models, Psychometrics
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
Johnson, Matthew S.; Sinharay, Sandip – Journal of Educational Measurement, 2018
One of the proposed uses of cognitive diagnostic assessments is to classify the examinees as either masters or nonmasters on each of a number of skills being assessed. As with any test, it is important to report the quality of these binary classifications with measures of their reliability. Cui et al. and Wang et al. have suggested reliability…
Descriptors: Classification, Accuracy, Test Reliability, Diagnostic Tests
Chen, Yi-Hsin; Senk, Sharon L.; Thompson, Denisse R.; Voogt, Kevin – Journal of Educational Measurement, 2019
The van Hiele theory and van Hiele Geometry Test have been extensively used in mathematics assessments across countries. The purpose of this study is to use classical test theory (CTT) and cognitive diagnostic modeling (CDM) frameworks to examine psychometric properties of the van Hiele Geometry Test and to compare how various classification…
Descriptors: Geometry, Mathematics Tests, Test Theory, Psychometrics
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
Wang, Wenyi; Song, Lihong; Chen, Ping; Ding, Shuliang – Journal of Educational Measurement, 2019
Most of the existing classification accuracy indices of attribute patterns lose effectiveness when the response data is absent in diagnostic testing. To handle this issue, this article proposes new indices to predict the correct classification rate of a diagnostic test before administering the test under the deterministic noise input…
Descriptors: Cognitive Tests, Classification, Accuracy, Diagnostic Tests