WLOGSolutions / telco-customer-churn-in-r-and-h2o

Licence: Apache-2.0 license
Showcase for using H2O and R for churn prediction (inspired by ZhouFang928 examples)

Showcase: telco customer churn prediction with GNU R and H2O

In the blog post "Telco Customer Churn with R in SQL Server 2016", ZhouFang928 presented a great analysis of telco customer churn prediction. I found that the comparison was missing one of my favorite machine-learning libraries, H2O. This showcase demonstrates how easy it is to build high-quality predictive models with H2O.

Prerequisites

I have used:

Remark for Windows users

Installing the packages requires a version of Rtools compatible with your R version.

Usage instructions

Prepare project

Install dependencies for the project

rsuite proj depsinst

It will result in the following output

2017-09-23 20:39:18 INFO:rsuite:Detecting repositories (for R 3.3)...
2017-09-23 20:39:20 WARNING:rsuite:Project is configured to use non reliable repositories: S3. You should use only reliable repositories to be sure of project consistency over time.
2017-09-23 20:39:20 INFO:rsuite:Will look for dependencies in ...
2017-09-23 20:39:20 INFO:rsuite:.          MRAN#1 = http://mran.microsoft.com/snapshot/2017-09-23 (win.binary, source)
2017-09-23 20:39:20 INFO:rsuite:.            S3#2 = http://h2o-release.s3.amazonaws.com/h2o/master/4034/R (source)
2017-09-23 20:39:20 INFO:rsuite:Collecting project dependencies (for R 3.3)...
2017-09-23 20:39:20 INFO:rsuite:Resolving dependencies (for R 3.3)...
2017-09-23 20:39:44 INFO:rsuite:Detected 29 dependencies to install. Installing...
2017-09-23 20:43:47 INFO:rsuite:All dependencies successfully installed.

Build custom packages

rsuite proj build

You should get the following output

2017-09-23 20:48:46 INFO:rsuite:Installing externalpackages (for R 3.3) ...
2017-09-23 20:48:51 INFO:rsuite:Installing modelbuilder (for R 3.3) ...
2017-09-23 20:48:57 INFO:rsuite:Successfuly build 2 packages

Train and evaluate models

Run model training and evaluation

Rscript.exe R\build_telco_churn_model.R --nthreads=4 --max-mem="4g"

Please note that the script takes two parameters:

  • nthreads - number of threads to use; the default, -1, uses all available threads
  • max_mem - maximum memory size for H2O; the default is 4g
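As a rough illustration of how such parameters might be read, here is a minimal base-R sketch. The helper name `parse_cli_args` is hypothetical and not taken from the project; the actual script may well use a dedicated argument-parsing package. The defaults mirror the ones listed above, and the key is normalized so that both the `--max-mem` spelling from the command line and the `max_mem` spelling from the list are accepted.

```r
# Hypothetical helper: parse "--name=value" arguments, falling back to defaults.
parse_cli_args <- function(args,
                           defaults = list(nthreads = "-1", max_mem = "4g")) {
  opts <- defaults
  for (arg in args) {
    # Only handle arguments of the form --name=value.
    if (!grepl("^--[^=]+=", arg)) next
    key <- sub("^--([^=]+)=.*$", "\\1", arg)
    value <- sub("^--[^=]+=", "", arg)
    key <- gsub("-", "_", key, fixed = TRUE)  # --max-mem -> max_mem
    value <- gsub("^\"|\"$", "", value)       # drop surrounding quotes, if any
    opts[[key]] <- value
  }
  opts$nthreads <- as.integer(opts$nthreads)
  opts
}

opts <- parse_cli_args(c("--nthreads=4", "--max-mem=4g"))
# In a script, the arguments would come from commandArgs(trailingOnly = TRUE),
# and the values could then be handed to H2O, e.g.
# h2o::h2o.init(nthreads = opts$nthreads, max_mem_size = opts$max_mem)
```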

Check results

After a successful build, the model is saved (in H2O format) in the export folder. It can be loaded into H2O Flow for further inspection.

Approach

I decided to go with Gradient Boosting Machine (GBM) models. To select the best model I ran a grid search over the following parameters:

  • number of trees: 50, 100, 500
  • max tree depth: 4, 8, 16, 32

The best model was selected by AUC: 100 trees with a maximum depth of 16. After building the model, I optimized the classification threshold to maximize the minimum per-class accuracy.
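The grid described above could be set up with H2O's R API roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the function name `run_gbm_grid`, the grid id, the seed, and the `train`, `predictors` and `response` arguments are all hypothetical. Only the hyperparameter values and the use of AUC for ranking come from the text.

```r
# The two tuned hyperparameters and their candidate values.
hyper_params <- list(
  ntrees    = c(50, 100, 500),
  max_depth = c(4, 8, 16, 32)
)

# A full Cartesian grid has 3 * 4 = 12 combinations.
n_models <- nrow(expand.grid(hyper_params))

# Hypothetical wrapper; calling it requires the h2o package and a running
# H2O cluster (h2o::h2o.init()). `train` is assumed to be an H2OFrame whose
# `response` column is a factor, so that GBM performs classification.
run_gbm_grid <- function(train, predictors, response, nfolds = 5, seed = 1) {
  h2o::h2o.grid(
    algorithm      = "gbm",
    grid_id        = "gbm_churn_grid",
    x              = predictors,
    y              = response,
    training_frame = train,
    nfolds         = nfolds,
    seed           = seed,
    hyper_params   = hyper_params
  )
  # Rank the trained models by AUC, best first, and return the top one.
  ranked <- h2o::h2o.getGrid("gbm_churn_grid", sort_by = "auc", decreasing = TRUE)
  h2o::h2o.getModel(ranked@model_ids[[1]])
}
```

Defining the wrapper does not start H2O; the grid is only trained when `run_gbm_grid()` is called with real data.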

Obtained results

The best model (with the threshold selected to maximize the minimum per-class accuracy) gave the following results on the test dataset:

  • AUC = 0.949
  • Accuracy = 0.879
  • Precision = 0.420
  • Recall = 0.848
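To make the threshold selection and the reported metrics concrete, here is a small base-R sketch on made-up scores. The data and the function names `min_per_class_accuracy` and `best_threshold` are illustrative assumptions, not the project's code. It picks the threshold that maximizes the smaller of the two per-class accuracies, then derives accuracy, precision, and recall from the resulting confusion counts.

```r
# Smaller of the two per-class accuracies at a given threshold.
min_per_class_accuracy <- function(scores, labels, threshold) {
  predicted <- scores >= threshold
  tpr <- sum(predicted & labels) / sum(labels)     # accuracy on churners
  tnr <- sum(!predicted & !labels) / sum(!labels)  # accuracy on non-churners
  min(tpr, tnr)
}

# Try every observed score as a threshold and keep the best one.
best_threshold <- function(scores, labels) {
  candidates <- sort(unique(scores))
  values <- vapply(candidates,
                   function(thr) min_per_class_accuracy(scores, labels, thr),
                   numeric(1))
  candidates[which.max(values)]
}

# Made-up example: 4 churners (TRUE) among 10 customers.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05)
labels <- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

threshold <- best_threshold(scores, labels)
predicted <- scores >= threshold

# Confusion counts at the chosen threshold.
tp <- sum(predicted & labels)
fp <- sum(predicted & !labels)
fn <- sum(!predicted & labels)
tn <- sum(!predicted & !labels)

accuracy  <- (tp + tn) / length(labels)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
```

On this toy data the chosen threshold is 0.5, which gives accuracy 0.9, precision 0.8, and recall 1. Because the criterion treats both classes equally, on imbalanced data it tends to favor recall over precision, which is consistent with the gap between the two in the results above.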

Performance issues

The computation involved validating 12 GBM models with different parameters, each using 5-fold cross-validation. On my laptop (Intel i7, 8 GB RAM, Windows 10) it took around 25 minutes. On an Amazon EC2 c4.4xlarge instance the time dropped to around 14-15 minutes.

Good practices

  1. Always install packages for each project separately. R Suite does this for you.
  2. Select the best model with a parameter tuning procedure.
  3. Do not forget to optimize the threshold.
  4. Use logging instead of the print function.
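As a small illustration of the last point, here is a base-R stand-in for a logger. The helper `log_msg` is hypothetical; a real project would more likely use a dedicated logging package. It shows what logging adds over print: a timestamp, a level, and output on stderr.

```r
# Hypothetical minimal logger: timestamp, level, and message on one line.
log_msg <- function(level, fmt, ...) {
  line <- sprintf("%s %s:%s",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                  level,
                  sprintf(fmt, ...))
  message(line)  # written to stderr, so it does not mix with results on stdout
  invisible(line)
}

line <- log_msg("INFO", "Training %d GBM models", 12L)
```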

Project structure description

Project structure

Folders:

  • data - contains a CSV file with customer information. It is a copy of the data from ZhouFang928's example.
  • export - stores the results of the computation (currently the final model is saved there)
  • R - master scripts
  • packages
    • externalpackages - dummy package that maintains third-party package dependencies
    • modelbuilder - package that provides the function that builds GBM models