ERIC Number: ED658631
Record Type: Non-Journal
Publication Date: 2024
Pages: 446
Abstractor: As Provided
ISBN: 979-8-3832-0591-4
ISSN: N/A
EISSN: N/A
Available Date: N/A
An Experimental Study of Supervised Machine Learning Techniques for Minor Class Prediction Utilizing Kernel Density Estimation: Factors Impacting Model Performance
Abdullah Mana Alfarwan
ProQuest LLC, Ph.D. Dissertation, Western Michigan University
This dissertation examined differences in classification outcomes among four popular individual supervised machine learning (ISML) models (logistic regression, decision tree, support vector machine, and multilayer perceptron) when predicting minor class membership within imbalanced datasets. The study context and the theoretical population sampled focus on one aspect of the larger problem of student retention and dropout prediction in higher education (HE): identification. This study differs from the current literature by using an experimental design with simulated student data that closely mirror HE situational and student data. Specifically, it tested the predictive ability of the four ISML classification models (CLS) under experimentally manipulated conditions: total sample size (TS), minor class proportion (MCP), training-to-testing sample size ratio (TTSS), and the application of bagging techniques during model training (BAG). Using this mixed design with four between-subjects factors and one within-subjects factor, five outcome measures (precision, recall/sensitivity, specificity, F1-score, and AUC) were examined and analyzed individually. For each outcome measure, findings revealed multiple statistically significant interactions among classifier models and design variables. Simple effects analyses of these interactions showed how TS, MCP, TTSS, and BAG affect each of these performance measures differently. For instance, interactions involving MCP underscore the importance of deliberately modeling the class distribution to improve overall predictive performance. These insights into how the experimental variables can critically affect different measures of classification success advance our understanding of how these four ISML models might be optimized to predict student at-risk status within imbalanced datasets.
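Of the five outcome measures named above, precision, recall/sensitivity, specificity, and F1-score derive from a binary classifier's confusion matrix, while AUC depends only on how the model ranks its predicted scores. The following is a minimal standard-library Python sketch of how these measures can be computed, assuming the minor class (e.g., at-risk students) is coded 1 and each model outputs a score in [0, 1]. The helper names and the 0.5 decision threshold are illustrative assumptions, not details taken from the dissertation.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the minor (positive) class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed minor-class case
            fn += 1
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    # Return 0.0 when the denominator is zero, e.g. a model that never predicts the minor class.
    return num / den if den else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def outcome_measures(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> dict[str, float]:
    """Compute the five outcome measures for one model on one test set (hypothetical helper)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "auc": roc_auc(y_true, scores)}


if __name__ == "__main__":
    # Toy test set: 2 of 10 cases belong to the minor class (a 20% minor class proportion).
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.2, 0.1, 0.7, 0.4]
    print(outcome_measures(y_true, scores))
    # A "model" that scores every case 0.0, i.e. always predicts the majority class.
    print(outcome_measures(y_true, [0.0] * 10))
```

On the toy data, the scored model reaches precision, recall, and F1 of 0.5, specificity of 0.875, and AUC of 0.9375. The model that always predicts the majority class is 80% accurate, yet its recall and F1 are 0 and its AUC is 0.5. This illustrates why minor-class-sensitive measures such as these, rather than raw accuracy, are used to evaluate classifiers on imbalanced data.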
This dissertation provides a framework for using these or similar ISML models more effectively in HE. By empirically demonstrating the impact of one of the most challenging aspects of implementing machine learning in HE, namely maximizing the accurate identification of the minority class, it points toward predictive modeling methods that are more useful and perhaps more equitable. This work contributes to the use of machine learning in HE and, by providing strategies for improving the prediction of student dropout, will help inform that use in both smaller and larger educational research communities. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (800) 521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A