Publication Date
In 2025 | 0
Since 2024 | 1
Since 2021 (last 5 years) | 4
Since 2016 (last 10 years) | 10
Since 2006 (last 20 years) | 25
Descriptor
Classification | 31
Comparative Analysis | 31
Statistical Analysis | 11
Test Items | 11
Item Response Theory | 9
Accuracy | 8
Models | 8
Simulation | 7
Scoring | 5
Bayesian Statistics | 4
Computer Assisted Testing | 4
Source
Educational and Psychological Measurement | 31
Author
Chung, Hyewon | 2
Dodd, Barbara G. | 2
Finch, W. Holmes | 2
Hong, Sehee | 2
Kim, Jiseon | 2
Park, Ryoungsun | 2
Wilson, Mark | 2
Agresti, Alan | 1
Baldwin, Peter | 1
Bradshaw, Laine | 1
Cohen, Allan S. | 1
Publication Type
Journal Articles | 29
Reports - Research | 25
Reports - Evaluative | 3
Reports - Descriptive | 1
Education Level
Higher Education | 2
Postsecondary Education | 2
Elementary Education | 1
Elementary Secondary Education | 1
Location
Switzerland (Geneva) | 1
Assessments and Surveys
ACT Assessment | 1
Trends in International Mathematics and Science Study | 1
United States Medical Licensing Examination | 1
Jang, Yoona; Hong, Sehee – Educational and Psychological Measurement, 2023
The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included in or excluded from the model. To accomplish this, Monte Carlo simulations were conducted comparing the results of models with and without a covariate. Based on these simulations,…
Descriptors: Classification, Models, Prediction, Sample Size
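Classification quality in latent class models is commonly summarized by the relative entropy of the posterior class probabilities. A minimal sketch of that standard index, assuming an (N x K) matrix of posteriors is already in hand (the toy values below are invented):

```python
import numpy as np

def relative_entropy(post):
    """Relative entropy index of classification quality for a latent class
    model, from an (N x K) matrix of posterior class probabilities;
    1 = perfectly separated classes, 0 = chance-level assignment."""
    post = np.clip(post, 1e-12, 1.0)     # guard against log(0)
    n, k = post.shape
    return 1.0 - (-np.sum(post * np.log(post))) / (n * np.log(k))

# toy posteriors for three respondents in a two-class model
post = np.array([[0.95, 0.05],
                 [0.10, 0.90],
                 [0.88, 0.12]])
print(round(relative_entropy(post), 3))
```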
Sen, Sedat; Cohen, Allan S. – Educational and Psychological Measurement, 2024
A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's…
Descriptors: Goodness of Fit, Item Response Theory, Sample Size, Classification
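All of the indices named above are simple functions of the maximized log-likelihood, the number of free parameters, and the sample size. A sketch of four of them, with invented log-likelihoods standing in for fitted 1-, 2-, and 3-class solutions:

```python
import numpy as np

def information_criteria(loglik, n_params, n_obs):
    """AIC, AICc, BIC, and CAIC from a model's maximized log-likelihood;
    lower values favor a candidate number of latent classes."""
    aic = -2 * loglik + 2 * n_params
    aicc = aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2 * loglik + n_params * np.log(n_obs)
    caic = -2 * loglik + n_params * (np.log(n_obs) + 1)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

# invented log-likelihoods and parameter counts for 1-3 class solutions
for g, ll, p in [(1, -5230.4, 20), (2, -5105.7, 41), (3, -5098.2, 62)]:
    ics = information_criteria(ll, p, n_obs=500)
    print(g, {k: round(v, 1) for k, v in ics.items()})
```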
Huang, Hung-Yu – Educational and Psychological Measurement, 2023
Forced-choice (FC) item formats used in noncognitive tests typically present a set of response options that measure different traits and instruct respondents to judge among these options according to their preference, in order to control the response biases commonly observed in normative tests. Diagnostic classification models (DCMs)…
Descriptors: Test Items, Classification, Bayesian Statistics, Decision Making
No, Unkyung; Hong, Sehee – Educational and Psychological Measurement, 2018
The purpose of the present study is to compare the performance of mixture modeling approaches (i.e., the one-step, three-step maximum-likelihood, three-step BCH, and LTB approaches) across diverse sample size conditions. To carry out this research, two simulation studies were conducted with two different models, a latent class…
Descriptors: Sample Size, Classification, Comparative Analysis, Statistical Analysis
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
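As a rough illustration of the two architecture families being compared (not the authors' actual networks), here is a minimal PyTorch sketch of a feed-forward classifier over flattened pixels next to a small convolutional classifier; the image size, channel counts, and number of score categories are assumptions:

```python
import torch
import torch.nn as nn

# Assumed setup: 1-channel 28x28 renderings of graphical responses scored
# into 3 categories; the real TIMSS item, image size, and networks differ.
N_CLASSES, H = 3, 28

feed_forward = nn.Sequential(
    nn.Flatten(),                      # treat the image as a flat vector
    nn.Linear(H * H, 128), nn.ReLU(),
    nn.Linear(128, N_CLASSES),
)

convolutional = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * (H // 4) * (H // 4), N_CLASSES),
)

x = torch.randn(8, 1, H, H)            # dummy batch of 8 images
print(feed_forward(x).shape, convolutional(x).shape)   # both: (8, 3)
```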
Wind, Stefanie A. – Educational and Psychological Measurement, 2017
Molenaar extended Mokken's original probabilistic-nonparametric scaling models for use with polytomous data. These polytomous extensions of Mokken's original scaling procedure have facilitated the use of Mokken scale analysis as an approach to exploring fundamental measurement properties across a variety of domains in which polytomous ratings are…
Descriptors: Nonparametric Statistics, Scaling, Models, Item Response Theory
Liu, Ren; Huggins-Manley, Anne Corinne; Bradshaw, Laine – Educational and Psychological Measurement, 2017
There is an increasing demand for assessments that can provide more fine-grained information about examinees. In response to the demand, diagnostic measurement provides students with feedback on their strengths and weaknesses on specific skills by classifying them into mastery or nonmastery attribute categories. These attributes often form a…
Descriptors: Matrices, Classification, Accuracy, Diagnostic Tests
Park, Ryoungsun; Kim, Jiseon; Chung, Hyewon; Dodd, Barbara G. – Educational and Psychological Measurement, 2017
The current study proposes a novel method to predict multistage testing (MST) performance without conducting simulations. This method, called MST test information, is based on the analytic derivation of standard errors of ability estimates across theta levels. We compared standard errors derived analytically to simulation results to demonstrate the…
Descriptors: Testing, Performance, Prediction, Error of Measurement
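The quantity at the heart of this approach is the analytic standard error of the ability estimate, SE(theta) = 1/sqrt(I(theta)), where I(theta) is the test information. A sketch under a plain 2PL model with hypothetical item parameters (the article's MST-specific derivation is more involved):

```python
import numpy as np

def se_theta(theta, a, b):
    """Analytic standard error of the ability estimate, SE = 1/sqrt(I),
    where I is the test information under a 2PL model."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))   # 2PL response probabilities
    info = np.sum(a ** 2 * p * (1.0 - p))        # item informations, summed
    return 1.0 / np.sqrt(info)

# Hypothetical five-item module; trace the SE across theta levels.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # discriminations
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # difficulties
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(se_theta(theta, a, b), 3))
```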
Zeng, Ji; Yin, Ping; Shedden, Kerby A. – Educational and Psychological Measurement, 2015
This article provides a brief overview and comparison of three matching approaches in forming comparable groups for a study comparing test administration modes (i.e., computer-based tests [CBT] and paper-and-pencil tests [PPT]): (a) a propensity score matching approach proposed in this article, (b) the propensity score matching approach used by…
Descriptors: Comparative Analysis, Computer Assisted Testing, Probability, Classification
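In outline, propensity score matching here means modeling the probability of taking the CBT from covariates and pairing each CBT examinee with the PPT examinee whose estimated propensity is closest. A greedy 1:1 sketch on synthetic data, using scikit-learn; the covariates and matching rule are illustrative assumptions, not the article's exact procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic covariates (e.g., prior scores, demographics) and test mode:
# 1 = computer-based test (CBT), 0 = paper-and-pencil test (PPT).
X = rng.normal(size=(200, 3))
mode = rng.integers(0, 2, 200)

# Step 1: estimate propensity scores P(CBT | covariates).
ps = LogisticRegression().fit(X, mode).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbor matching on the propensity score.
cbt = np.where(mode == 1)[0]
available = set(np.where(mode == 0)[0])
pairs = []
for i in cbt:
    j = min(available, key=lambda j: abs(ps[i] - ps[j]))
    pairs.append((i, j))
    available.remove(j)
    if not available:
        break
print(len(pairs), "matched CBT-PPT pairs")
```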
Lamprianou, Iasonas – Educational and Psychological Measurement, 2018
It is common practice for assessment programs to organize qualifying sessions during which the raters (often known as "markers" or "judges") demonstrate their consistency before operational rating commences. Because of the high-stakes nature of many rating activities, the research community tends to continuously explore new…
Descriptors: Social Networks, Network Analysis, Comparative Analysis, Innovation
Clauser, Jerome C.; Hambleton, Ronald K.; Baldwin, Peter – Educational and Psychological Measurement, 2017
The Angoff standard setting method relies on content experts to review exam items and make judgments about the performance of the minimally proficient examinee. Unfortunately, at times content experts may have gaps in their understanding of specific exam content. These gaps are particularly likely to occur when the content domain is broad and/or…
Descriptors: Scores, Item Analysis, Classification, Decision Making
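Mechanically, the Angoff method aggregates the judges' item-level probability estimates for the minimally proficient examinee into a cut score. A toy illustration with invented ratings:

```python
import numpy as np

# Invented Angoff ratings: each cell is a judge's estimate of the
# probability that a minimally proficient examinee answers the item
# correctly (rows = 3 judges, columns = 5 items).
ratings = np.array([
    [0.60, 0.75, 0.40, 0.55, 0.80],
    [0.65, 0.70, 0.45, 0.50, 0.85],
    [0.55, 0.80, 0.35, 0.60, 0.75],
])

item_means = ratings.mean(axis=0)   # consensus judgment per item
cut_score = item_means.sum()        # expected raw score at the standard
print(item_means.round(2), "-> cut score:", round(cut_score, 2))
```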
Sari, Halil Ibrahim; Huggins, Anne Corinne – Educational and Psychological Measurement, 2015
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…
Descriptors: Test Bias, Comparative Analysis, Statistical Analysis, College Entrance Examinations
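One common DIF statistic to which either group definition can be applied is the Mantel-Haenszel common odds ratio. The sketch below contrasts pairwise focal-versus-reference comparisons with a single composite focal group on synthetic data; the statistic is standard, but the data and grouping are illustrative:

```python
import numpy as np

def mh_odds_ratio(correct, focal, strata):
    """Mantel-Haenszel common odds ratio: the odds of a correct response
    for reference vs. focal examinees, pooled over matched score strata."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum(correct[m] & ~focal[m])   # reference, correct
        b = np.sum(~correct[m] & ~focal[m])  # reference, incorrect
        c = np.sum(correct[m] & focal[m])    # focal, correct
        d = np.sum(~correct[m] & focal[m])   # focal, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

rng = np.random.default_rng(1)
group = rng.integers(0, 3, 600)        # 0 = reference; 1, 2 = focal groups
strata = rng.integers(0, 4, 600)       # matched (rest-score) strata
correct = rng.random(600) < 0.6        # synthetic item responses

# (a) pairwise: each focal group against the reference separately
for f in (1, 2):
    m = (group == 0) | (group == f)
    print("pairwise", f,
          round(mh_odds_ratio(correct[m], group[m] == f, strata[m]), 2))

# (b) composite: all focal groups pooled into one comparison group
print("composite", round(mh_odds_ratio(correct, group != 0, strata), 2))
```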
Choi, In-Hee; Wilson, Mark – Educational and Psychological Measurement, 2015
An essential feature of the linear logistic test model (LLTM) is that item difficulties are explained using item design properties. By taking advantage of this explanatory aspect of the LLTM, in a mixture extension of the LLTM, the meaning of latent classes is specified by how item properties affect item difficulties within each class. To improve…
Descriptors: Classification, Test Items, Difficulty Level, Statistical Analysis
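The explanatory idea in the LLTM is that each item difficulty is a weighted sum of its design properties, b = Q eta. A toy sketch of that decomposition (least squares is used here only to show the structure; a real LLTM estimates eta inside the IRT likelihood, and the Q matrix is invented):

```python
import numpy as np

# Invented Q matrix: rows = items, columns = design properties
# (e.g., number of operations required, abstract wording present).
Q = np.array([[1, 0],
              [1, 1],
              [2, 0],
              [2, 1],
              [3, 1]], dtype=float)
eta = np.array([0.5, 0.8])   # basic parameters: effect of each property
b = Q @ eta                  # LLTM structure: difficulty = Q @ eta

# Recover the property effects from the difficulties by least squares,
# just to show the structure of the decomposition.
eta_hat, *_ = np.linalg.lstsq(Q, b, rcond=None)
print(eta_hat.round(2))      # ~[0.5, 0.8]
```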
Liu, Min; Hancock, Gregory R. – Educational and Psychological Measurement, 2014
Growth mixture modeling has gained much attention in applied and methodological social science research recently, but the selection of the number of latent classes for such models remains a challenging issue, especially when the assumption of proper model specification is violated. The current simulation study compared the performance of a linear…
Descriptors: Models, Classification, Simulation, Comparative Analysis
Kersting, Nicole B.; Sherin, Bruce L.; Stigler, James W. – Educational and Psychological Measurement, 2014
In this study, we explored the potential for machine scoring of short written responses to the Classroom-Video-Analysis (CVA) assessment, which is designed to measure teachers' usable mathematics teaching knowledge. We created naïve Bayes classifiers for CVA scales assessing three different topic areas and compared computer-generated scores to…
Descriptors: Scoring, Automation, Video Technology, Teacher Evaluation
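A minimal sketch of the pipeline the abstract describes (bag-of-words features feeding a naive Bayes classifier, with machine-human agreement checked by kappa), using scikit-learn; the responses and scores below are fabricated stand-ins for the CVA data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import cohen_kappa_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Fabricated stand-ins for short written CVA responses and human scores.
responses = [
    "the teacher connects the example to place value",
    "students compare strategies for the subtraction problem",
    "no evidence of mathematical reasoning here",
    "the example shows regrouping across tens",
    "response restates the prompt without analysis",
    "links the student error to a misconception about zero",
]
human_scores = [2, 2, 0, 2, 0, 1]

# Bag-of-words features feeding a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(responses, human_scores)
machine_scores = model.predict(responses)

# Agreement with human scores (on training data here; a real study
# would evaluate on held-out responses).
print(cohen_kappa_score(human_scores, machine_scores, weights="quadratic"))
```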