Classification models using Spark ML.
Financial institutions face the dual challenge of minimizing the risk of loan defaults while efficiently processing legitimate applications. The goal was to build a predictive model to automate the approval process with high accuracy.
Historical loan application data including applicant demographics, credit history, income levels, loan amount, and repayment status.
Handled missing values, one-hot encoded categorical variables, assembled the features into a single vector with VectorAssembler, and scaled the numerical features.
Trained Logistic Regression, Random Forest, and Gradient Boosted Tree models to compare performance.
Assessed models using AUC-ROC, accuracy, and confusion matrices, paying attention to both false positives and false negatives.
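To make the evaluation metrics concrete, here is a minimal plain-Python sketch of how a confusion matrix, accuracy, and AUC-ROC can be computed from labels and model scores. It is not the project's Spark code: the data, the 0.5 threshold, the choice of label 1 as the positive class (a default), and the function names are all assumptions made for illustration. In Spark ML itself, the same metrics are typically obtained from BinaryClassificationEvaluator (areaUnderROC) and MulticlassClassificationEvaluator (accuracy).

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels; 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def auc_roc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive example is scored
    above a random negative one; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = defaulted, 0 = repaid) and model scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.8]

# Assumed decision threshold of 0.5 for turning scores into predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"AUC-ROC={auc_roc(y_true, scores):.4f}")
```

Accuracy and the confusion matrix depend on the chosen threshold, while AUC-ROC is computed from the raw scores and summarizes ranking quality across all thresholds. That difference is one reason to report them together when false positives and false negatives carry different costs.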