Peer reviewed
ERIC Number: ED677753
Record Type: Non-Journal
Publication Date: 2025-Oct-9
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: 0000-00-00
Navigating Complexity and Interpretability: Relaxed Lasso Regression for Predicting Transfer Student Success with High-Dimensional Data
David Wutchiett; Alexandra W. Logue; Martin Kurzweil; Colin Chellman
Society for Research on Educational Effectiveness
Background/Context: The application of statistical modeling techniques to high-dimensional educational data presents methodological challenges, particularly in the study of student transfer between higher education institutions. Over 13% of new college students each year are transfer students (National Student Clearinghouse Research Center, 2025) and the success rates of transfer students are low, impeding higher education equity (Gentsch et al., 2024). Particular majors and institutions, including in their combination, are important in assessing these success rates because institutions vary enormously in their graduation rates, and rules concerning the transfer of credits are generally specific to particular combinations of institutions and majors (Elliott & Lakin, 2020; Schudde & Jabbar, 2024). Yet, when particular majors and institutions in combination are taken into consideration within statistical models, hundreds if not thousands of additional predictor variables can be introduced. Traditional regression approaches often struggle with multicollinearity and model overspecification when evaluating complex institutional and major combinations in students' varied and frequently multi-stage transfer pathways (Hawkins, 2004; Kunina-Habenicht et al., 2012). Regularized regression techniques, such as lasso and relaxed lasso regression, provide a means to improve both predictive accuracy, which is important to model generalizability, and model parsimony through variable selection (Hastie et al., 2009; 2017; 2020). Objective: This study investigates the application of lasso and relaxed lasso regression within a high-dimensional higher education transfer environment. The study assesses these methodologies' effectiveness in optimizing prediction, while maintaining model interpretability and capacity to test for statistically significant relationships between predictors and outcomes through similar or equivalent generalized linear models.
The primary objective is to evaluate the methodological advantages of these approaches in constructing parsimonious models while preserving predictive performance and generalizability. By comparing lasso to ridge regression, the study explores the extent to which variable selection techniques enhance model efficiency and reduce overfitting in education research. The similarity between regularized and unregularized relaxed-fit models is further assessed to determine whether predictively optimal parsimonious versions of lasso regression models approximate unregularized models capable of assessing statistical relationships between predictors and outcomes. Setting: This study utilizes administrative data from The City University of New York (CUNY), a large public university system of 20 undergraduate colleges with approximately 20,000 new, diverse, transfer students per year. Participants: The dataset consists of 28,199 students who transferred from one of CUNY's seven community colleges to one of its 12 bachelor's-degree-granting institutions between 2012 and 2016, with predictor variables including demographic information, academic performance metrics, institutional affiliations, and financial aid status. Research Design: This study employs a quantitative research design focused on predictive modeling. Lasso and ridge regression, as well as their relaxed variant methodologies, are used to assess regularization's impact on prediction accuracy and the methods' effectiveness in selecting the most relevant predictors for student success while mitigating issues of multicollinearity and model complexity. Lasso regression is compared with ridge regression in terms of model fit, variable selection, and predictive accuracy.
Relaxed lasso and ridge regression are incorporated to evaluate unregularized versions of parsimonious regularized models to assess the methods' ability to refine variable selection and improve model parsimony and prediction, in comparison to fully regularized, nonrelaxed approaches. Ten-fold cross-validation is used to assess out-of-sample prediction performance and to validate models. Data Collection and Analysis: Data were extracted from CUNY institutional databases containing student demographic and academic characteristics spanning initial enrollment through graduation. Two dichotomous outcomes were examined: graduation with a bachelor's degree within six years of beginning an associate's program, and posttransfer GPA improvement. Key predictor variables included pretransfer GPA, cumulative credits earned, institutional affiliations, and major categories. Combinations of majors and institutions before and after transfer, producing over six hundred predictor variables, were included in models applying regularization and variable selection. Lasso and ridge regression models were trained and tested using cross-validation, with relaxed lasso and ridge regression applied to further identify more parsimonious and predictive models. Binomial deviance was used as the primary metric for assessing out-of-sample prediction accuracy. Findings/Results: Cross-validation results confirmed that regularization techniques enhance generalizability in high-dimensional education research contexts. Lasso regression consistently outperformed ridge regression in minimizing binomial deviance, demonstrating that, within the higher education transfer context, variable selection through the lasso method improved prediction.
Additional key findings include: (1) Relaxed lasso produced more parsimonious models than standard lasso while maintaining predictive performance; (2) Parsimonious predictive models selected by relaxed lasso methods were found to equate to or more closely approximate unregularized modeling approaches, enabling assessment of the statistical significance of relationships between predictors and outcomes; and (3) Lasso-based variable selection improved model interpretability by eliminating extraneous predictors without sacrificing accuracy. Table 1 presents detailed results comparing the models identified as minimizing out-of-sample binomial deviance with the parsimonious near-optimal models selected by relaxed lasso and ridge methods for each outcome variable. Tuning parameter values, model fit, and retained predictor counts are described. Conclusions: The study highlights the methodological advantages of lasso and relaxed lasso regression in high-dimensional educational research such as that concerning higher education student transfer. By demonstrating the effectiveness of these approaches in predictive modeling, the findings support their application in studies where model generalizability, parsimony, and interpretability are crucial. This research contributes to the growing body of literature on regularized regression techniques and underscores their value, and particularly the relaxed lasso's value, in analyzing and predicting outcomes within highly complex education environments, including those involving combinations of numerous institutions and majors. Such results can be used to further higher education equity by helping faculty and administrators identify transfer paths that are more or less successful and do or do not need modification, and by helping students choose more successful transfer paths.
Future research should further explore hybrid modeling approaches that improve prediction accuracy through variable selection, while also comparing results to models fit with traditional methods. This would help further identify methodologies that ensure selected models not only optimize prediction accuracy but also facilitate and allow for clear interpretations of the relationships between key higher education predictors and student outcomes.
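Illustrative sketch (not the authors' code): the workflow described in the abstract, an L1-penalized logistic fit followed by an unpenalized refit on the selected predictors and a comparison of out-of-sample binomial deviance, might look roughly like the following. Everything here is an assumption made for illustration: the synthetic data, the penalty value `lam=0.1`, the proximal-gradient solver, the function names, and the single holdout split, which stands in for the ten-fold cross-validation used in the study. The refit shown corresponds to a fully relaxed fit; intermediate degrees of relaxation are not shown.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(X, beta, intercept):
    return [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) for row in X]

def mean_binomial_deviance(y, p, eps=1e-12):
    # -2 * mean log-likelihood of 0/1 outcomes under predicted probabilities p.
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * total / len(y)

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty; sets small values exactly to zero.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_logistic(X, y, lam, active=None, step=0.5, iters=500):
    """Proximal-gradient logistic fit with L1 penalty `lam` (intercept unpenalized).
    With lam=0 and `active` given, this is an unpenalized fit on those columns."""
    n, d = len(X), len(X[0])
    active = range(d) if active is None else active
    beta, intercept = [0.0] * d, 0.0
    for _ in range(iters):
        resid = [pi - yi for pi, yi in zip(predict(X, beta, intercept), y)]
        intercept -= step * sum(resid) / n
        for j in active:
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam)
    return beta, intercept

# Hypothetical synthetic data: 6 predictors, only the first two matter.
rng = random.Random(0)
n, d = 300, 6
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, row))) else 0
     for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Step 1: lasso fit selects predictors (nonzero coefficients).
lasso_beta, lasso_b0 = fit_logistic(X_train, y_train, lam=0.1)
selected = [j for j, b in enumerate(lasso_beta) if b != 0.0]

# Step 2: fully relaxed fit = unpenalized refit on the selected predictors only.
relaxed_beta, relaxed_b0 = fit_logistic(X_train, y_train, lam=0.0, active=selected)

# Step 3: compare binomial deviance in-sample and on held-out data.
dev_train_lasso = mean_binomial_deviance(y_train, predict(X_train, lasso_beta, lasso_b0))
dev_train_relaxed = mean_binomial_deviance(y_train, predict(X_train, relaxed_beta, relaxed_b0))
dev_test_lasso = mean_binomial_deviance(y_test, predict(X_test, lasso_beta, lasso_b0))
dev_test_relaxed = mean_binomial_deviance(y_test, predict(X_test, relaxed_beta, relaxed_b0))

print("selected predictors:", selected)
print("held-out deviance (lasso, relaxed):", dev_test_lasso, dev_test_relaxed)
```

Because the relaxed refit is an ordinary logistic regression on the retained predictors, its coefficients can be examined with standard generalized linear model inference, which is the interpretability property the abstract emphasizes. A ridge fit would replace the soft-thresholding step with proportional shrinkage and would retain every predictor.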
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Identifiers - Location: New York (New York)
Grant or Contract Numbers: N/A
Author Affiliations: N/A