
jgcorliss / lending-club

Licence: other
Applying machine learning to predict loan charge-offs on LendingClub.com

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to lending-club

Hummingbird
Hummingbird compiles trained ML models into tensor computation for faster inference.
Stars: ✭ 2,704 (+6833.33%)
Mutual labels:  scikit-learn
Kagglestruggle
Kaggle Struggle
Stars: ✭ 228 (+484.62%)
Mutual labels:  scikit-learn
Porndetector
Porn images detector with python, tensorflow, scikit-learn and opencv.
Stars: ✭ 248 (+535.9%)
Mutual labels:  scikit-learn
Stocksensation
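Web-based visualization of stock-market public-opinion sentiment classification, built on sentiment dictionaries and machine learning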
Stars: ✭ 215 (+451.28%)
Mutual labels:  scikit-learn
Deeplearning cv notes
📓 Deep learning and CV notes.
Stars: ✭ 223 (+471.79%)
Mutual labels:  scikit-learn
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+502.56%)
Mutual labels:  scikit-learn
Eli5
A library for debugging/inspecting machine learning classifiers and explaining their predictions
Stars: ✭ 2,477 (+6251.28%)
Mutual labels:  scikit-learn
Artificial Intelligence Deep Learning Machine Learning Tutorials
A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials - rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry specific areas such as Climate / Energy, Automotives, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.
Stars: ✭ 2,966 (+7505.13%)
Mutual labels:  scikit-learn
Svm mnist digit classification
MNIST digit classification with scikit-learn and Support Vector Machine (SVM) algorithm.
Stars: ✭ 226 (+479.49%)
Mutual labels:  scikit-learn
Tune Sklearn
A drop-in replacement for Scikit-Learn’s GridSearchCV / RandomizedSearchCV -- but with cutting edge hyperparameter tuning techniques.
Stars: ✭ 241 (+517.95%)
Mutual labels:  scikit-learn
Auto viml
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Stars: ✭ 216 (+453.85%)
Mutual labels:  scikit-learn
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+458.97%)
Mutual labels:  scikit-learn
Text Classification
Machine Learning and NLP: Text Classification using python, scikit-learn and NLTK
Stars: ✭ 239 (+512.82%)
Mutual labels:  scikit-learn
Hydro Serving
MLOps Platform
Stars: ✭ 213 (+446.15%)
Mutual labels:  scikit-learn
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+7982.05%)
Mutual labels:  scikit-learn
Sklearn Onnx
Convert scikit-learn models and pipelines to ONNX
Stars: ✭ 206 (+428.21%)
Mutual labels:  scikit-learn
Jetson Containers
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Stars: ✭ 223 (+471.79%)
Mutual labels:  scikit-learn
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (-7.69%)
Mutual labels:  scikit-learn
Datacamp Python Data Science Track
All the slides, accompanying code and exercises all stored in this repo. 🎈
Stars: ✭ 250 (+541.03%)
Mutual labels:  scikit-learn
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+7479.49%)
Mutual labels:  scikit-learn

Predicting Loan Defaults on LendingClub.com

LendingClub is a US peer-to-peer lending company and the world's largest peer-to-peer lending platform. In this project, I build machine learning models to predict the probability that a loan on LendingClub will charge off (default). These models could help LendingClub investors make better-informed investment decisions. I use a 1.8 GB LendingClub dataset with 1,646,801 loans and 150 variables for each loan.

In training the models, I only use features that are known to investors before they choose to invest in the loan. These features include, among others, the borrower's income, FICO score, and debt-to-income ratio, and the loan amount, purpose, grade, and interest rate.
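The restriction to investor-visible features can be sketched with pandas as follows. This is only an illustration: the column names (`loan_amnt`, `int_rate`, `fico_score`, `total_pymnt`, `recoveries`, `charged_off`) and the helper `keep_investor_features` are assumptions made for the example, not necessarily the names used in the notebook.

```python
import pandas as pd

# Hypothetical column names; the real dataset has about 150 variables.
KNOWN_BEFORE_INVESTING = [
    "loan_amnt", "term", "int_rate", "grade", "purpose",
    "annual_inc", "dti", "fico_score",
]
TARGET = "charged_off"


def keep_investor_features(loans: pd.DataFrame) -> pd.DataFrame:
    """Keep only the pre-investment features plus the target column."""
    wanted = KNOWN_BEFORE_INVESTING + [TARGET]
    present = [col for col in wanted if col in loans.columns]
    return loans[present].copy()


# Tiny made-up frame. 'total_pymnt' and 'recoveries' are only known after
# the loan has been issued, so keeping them would leak the outcome.
loans = pd.DataFrame({
    "loan_amnt": [10000, 5000],
    "int_rate": [13.5, 7.9],
    "fico_score": [680, 740],
    "total_pymnt": [3200.0, 5400.0],
    "recoveries": [150.0, 0.0],
    "charged_off": [1, 0],
})
features = keep_investor_features(loans)
print(list(features.columns))
```

Using an explicit allow-list rather than a drop-list means that any column not reviewed by hand is excluded by default, which is the safer choice when the goal is to avoid leaking post-investment information.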

The modeling process takes several steps: removing loan features that have significant missing data or that aren't known to investors; exploring, transforming, and visualizing the data; creating dummy variables for categorical features; and fitting three models (logistic regression, random forest, and k-nearest neighbors). I use machine learning pipelines to combine imputation, standardization, dimension reduction, and model fitting into one pipeline object. I optimize hyperparameters through a cross-validated grid search.
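The pipeline and grid-search steps described above might look roughly like the sketch below. It is not the notebook's code. It uses synthetic data, shows only the logistic regression model, and makes several assumptions: median imputation, PCA as the dimension-reduction step, the particular grid values, and five folds.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan features (all numeric after dummy coding).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of the values

# One object that chains imputation, standardization, dimension reduction,
# and the model, so every cross-validation fold refits all four steps.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # assumed strategy
    ("scaler", StandardScaler()),
    ("pca", PCA()),                                  # assumed reducer
    ("model", LogisticRegression(max_iter=1000)),
])

# Assumed grid; parameters are addressed as <step name>__<parameter>.
param_grid = {
    "pca__n_components": [3, 6],
    "model__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the preprocessing inside the pipeline keeps the imputer and scaler from seeing the validation fold during the grid search. Swapping the `"model"` step for a random forest or k-nearest neighbors classifier, with a matching grid, would cover the other two models.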

I found that the three models performed similarly well according to cross-validated AUROC scores on the training data. I chose logistic regression as the final model, which obtained an AUROC score of 0.689 on a test set consisting of the most recent 10% of the loans.
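A time-ordered hold-out and the AUROC metric can be sketched as follows. The column name `issue_d`, the helper `split_most_recent`, and all the values are made up for illustration. The AUROC call is shown on a separate toy example, because a one-row test set cannot be scored.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def split_most_recent(loans, date_col, test_frac=0.10):
    """Sort by date and hold out the most recent fraction as the test set."""
    ordered = loans.sort_values(date_col).reset_index(drop=True)
    n_test = max(1, int(round(test_frac * len(ordered))))
    return ordered.iloc[:-n_test], ordered.iloc[-n_test:]


# Made-up loans; 'issue_d' is an assumed name for the issue-date column.
loans = pd.DataFrame({
    "issue_d": pd.to_datetime([
        "2015-01-01", "2017-06-01", "2016-03-01", "2014-09-01", "2018-02-01",
        "2016-11-01", "2015-07-01", "2017-12-01", "2014-04-01", "2018-08-01",
    ]),
    "charged_off": [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
})
train, test = split_most_recent(loans, "issue_d")
print(len(train), len(test))

# AUROC is the probability that a randomly chosen charged-off loan gets a
# higher predicted score than a randomly chosen fully paid loan.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Splitting by date rather than at random mimics how the model would be used in practice: trained on past loans, then applied to loans issued later.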

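I also found that the variables most strongly correlated with charge-off, as measured by Pearson correlation, are the loan interest rate, the loan term, the borrower's FICO score, and the borrower's debt-to-income ratio. Pearson correlation captures only linear association, so this ranking is a rough guide to predictive usefulness rather than a definitive one.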
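Ranking features by their Pearson correlation with the charge-off indicator can be sketched with pandas as below. The data and column names are invented for the example and do not reproduce the project's results.

```python
import pandas as pd

# Made-up numeric features and a 0/1 charge-off indicator.
loans = pd.DataFrame({
    "int_rate": [7.9, 13.5, 18.2, 9.4, 22.1, 11.0],
    "fico_score": [760, 690, 665, 735, 660, 710],
    "dti": [8.0, 21.5, 27.0, 12.3, 30.1, 15.8],
    "charged_off": [0, 0, 1, 0, 1, 0],
})

# corrwith uses the Pearson coefficient by default; sort by absolute value
# so strong negative correlations rank alongside strong positive ones.
correlations = (
    loans.drop(columns="charged_off")
    .corrwith(loans["charged_off"])
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(correlations)
```

Sorting on the absolute value matters here because a feature such as the FICO score is expected to correlate negatively with charge-off while still being informative.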

All the analysis is done in a Python Jupyter Notebook, using the packages numpy, pandas, matplotlib, seaborn, and scikit-learn.
