
worldbank / ML Classification Algorithms Poverty

A comparative assessment of machine learning classification algorithms applied to poverty prediction

Projects that are alternatives to, or similar to, ML Classification Algorithms Poverty

Ner Bert
BERT-NER (nert-bert) with google bert https://github.com/google-research.
Stars: ✭ 339 (+841.67%)
Mutual labels:  jupyter-notebook, classification
Breast cancer classifier
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Stars: ✭ 614 (+1605.56%)
Mutual labels:  jupyter-notebook, classification
Transformers Tutorials
GitHub repo with tutorials to fine-tune transformers for different NLP tasks
Stars: ✭ 384 (+966.67%)
Mutual labels:  jupyter-notebook, classification
Demo Chinese Text Binary Classification With Bert
Stars: ✭ 276 (+666.67%)
Mutual labels:  jupyter-notebook, classification
Servenet
Service Classification based on Service Description
Stars: ✭ 21 (-41.67%)
Mutual labels:  jupyter-notebook, classification
Pycaret
An open-source, low-code machine learning library in Python
Stars: ✭ 4,594 (+12661.11%)
Mutual labels:  jupyter-notebook, classification
Tensorflow Book
Accompanying source code for Machine Learning with TensorFlow. Refer to the book for step-by-step explanations.
Stars: ✭ 4,448 (+12255.56%)
Mutual labels:  jupyter-notebook, classification
Shufflenet V2 Tensorflow
A lightweight convolutional neural network
Stars: ✭ 145 (+302.78%)
Mutual labels:  jupyter-notebook, classification
Bayesian Neural Networks
Pytorch implementations of Bayes By Backprop, MC Dropout, SGLD, the Local Reparametrization Trick, KF-Laplace, SG-HMC and more
Stars: ✭ 900 (+2400%)
Mutual labels:  jupyter-notebook, classification
Skin Cancer Image Classification
Skin cancer classification using Inceptionv3
Stars: ✭ 16 (-55.56%)
Mutual labels:  jupyter-notebook, classification
Timeseries fastai
fastai V2 implementation of Timeseries classification papers.
Stars: ✭ 221 (+513.89%)
Mutual labels:  jupyter-notebook, classification
The Deep Learning With Keras Workshop
An Interactive Approach to Understanding Deep Learning with Keras
Stars: ✭ 34 (-5.56%)
Mutual labels:  jupyter-notebook, classification
Machine Learning With Python
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Stars: ✭ 2,197 (+6002.78%)
Mutual labels:  jupyter-notebook, classification
Tianchi Medical Lungtumordetect
Tianchi Medical AI Competition (Season 1): intelligent diagnosis of pulmonary nodules, using UNet/VGG/Inception/ResNet/DenseNet
Stars: ✭ 314 (+772.22%)
Mutual labels:  jupyter-notebook, classification
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+305.56%)
Mutual labels:  jupyter-notebook, classification
Food Recipe Cnn
food image to recipe with deep convolutional neural networks.
Stars: ✭ 448 (+1144.44%)
Mutual labels:  jupyter-notebook, classification
Benchmarks
Comparison tools
Stars: ✭ 139 (+286.11%)
Mutual labels:  jupyter-notebook, classification
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+5088.89%)
Mutual labels:  jupyter-notebook, classification
Tensorflow cookbook
Code for Tensorflow Machine Learning Cookbook
Stars: ✭ 5,984 (+16522.22%)
Mutual labels:  jupyter-notebook, classification
Prediciting Binary Options
Predicting forex binary options using time series data and machine learning
Stars: ✭ 33 (-8.33%)
Mutual labels:  jupyter-notebook, classification
[World Bank logo]

A comparative assessment of machine learning classification algorithms applied to poverty prediction

A project of the World Bank Knowledge for Change (KCP) Program

We provide here a series of notebooks developed as an empirical comparative assessment of machine learning classification algorithms applied to poverty prediction. The objectives of this project are to explore how well machine learning algorithms perform when given the task to identify the poor in a given population, and to provide a resource of machine learning techniques for researchers, data scientists, and statisticians around the world.

We use a selection of categorical variables from household survey data from Indonesia and Malawi to predict the poverty status of households, a binary class with labels “Poor” and “Non-poor”. Various “out-of-the-box” classification algorithms (no regression algorithms) are used: logistic regression, linear discriminant analysis, k-nearest neighbors, decision trees, random forests, naïve Bayes, support vector machines, extreme gradient boosting, multilayer perceptron, and deep learning. More complex solutions, including ensembling, deep factorization machines, and automated machine learning, are also implemented. Models are compared across seven metrics (accuracy, recall, precision, F1 score, cross-entropy, ROC AUC, and Cohen's kappa). An analysis of misclassified observations is also conducted.
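For readers who want a concrete picture of what such a comparison looks like, the following is a minimal sketch (not code from the project's notebooks) that fits a few out-of-the-box scikit-learn classifiers and reports the seven metrics. The synthetic data here stands in for the preprocessed survey features.

# Minimal sketch of the kind of comparison performed in the notebooks (not the
# project's actual code). X stands in for a preprocessed feature matrix and y
# for a binary Poor/Non-poor label encoded as 0/1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, log_loss, roc_auc_score, cohen_kappa_score)

# Stand-in data; in the project this comes from the household surveys.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "recall=%.3f" % recall_score(y_test, pred),
          "precision=%.3f" % precision_score(y_test, pred),
          "f1=%.3f" % f1_score(y_test, pred),
          "cross_entropy=%.3f" % log_loss(y_test, proba),
          "roc_auc=%.3f" % roc_auc_score(y_test, proba),
          "kappa=%.3f" % cohen_kappa_score(y_test, pred))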

The project report is provided in the report folder (ML_Classification_Poverty_Comparative_Assessment_v01.pdf).

As part of the project, a data science competition was also organized (on the DrivenData platform), challenging data scientists to build poverty prediction models for three countries. Participants were not told the origin of the three (obfuscated) survey datasets used in the competition; one of them was the Malawi Integrated Household Survey 2010. This repo provides an adapted version of the scripts produced by the four winners of the competition (the adaptation makes the scripts run on a de-obfuscated version of the dataset).

This project was funded by Grant TF 0A4534 of the World Bank Knowledge for Change Program.

Prerequisites:

The prerequisites for this project are:

  • Python 3.6
  • pip>=9.0.1 (to check your version, run pip --version; to upgrade run pip install --upgrade pip)

Recommended software:

Although it is not required, we recommend using Anaconda to manage your Python environment for this project. Other configurations, e.g., using virtualenv, have not been tested. Anaconda is free, open-source software distributed by Continuum Analytics. Download Anaconda for your operating system from the Anaconda website. The environment setup instructions in this README are given for Anaconda.

Setup:

  1. Create a worldbank-poverty environment. To do this, after installing Anaconda, run the command:
conda create --name worldbank-poverty python=3.6

After answering yes (y) to the prompt asking whether you would like to proceed, the environment will be created. To activate the environment, run the following command.

source activate worldbank-poverty

(On Windows, you can just run activate worldbank-poverty instead).

  2. Install requirements. First, activate the worldbank-poverty environment and navigate to the project root. If you are using Linux or macOS, run the command:
pip install -r requirements.txt

If you are using Windows, run the command:

pip install -r requirements-windows.txt
  3. Download the data. The data required for these notebooks must be downloaded separately. Currently, the Malawi dataset is publicly available through the World Bank Data Catalog. Add the raw data to the data/raw directory, extract the contents of all zipped files, and leave the extracted directories in data/raw. If you have all the data, the final data folder will look like this:
data/raw
├── IDN_2011
│   ├── IDN2011_Dictionary.xlsx
│   ├── IDN2011_expenditure.dta
│   ├── IDN2011_household.dta
│   └── IDN2011_individual.dta
├── IDN_2012
│   ├── IDN2012_Dictionary.xlsx
│   ├── IDN2012_expenditure.dta
│   ├── IDN2012_household.dta
│   └── IDN2012_individual.dta
├── IDN_2013
│   ├── IDN2013_Dictionary.xlsx
│   ├── IDN2013_expenditure.dta
│   ├── IDN2013_household.dta
│   └── IDN2013_individual.dta
├── IDN_2014
│   ├── IDN2014_Dictionary.xlsx
│   ├── IDN2014_expenditure.dta
│   ├── IDN2014_household.dta
│   └── IDN2014_individual.dta
├── KCP2017_MP
│   ├── KCP_ML_IDN
│   │   ├── IDN_2012_household.dta
│   │   ├── IDN_2012_individual.dta
│   │   ├── IDN_household.txt
│   │   └── IDN_individual.txt
│   └── KCP_ML_MWI
│       ├── MWI_2012_household.dta
│       ├── MWI_2012_individual.dta
│       ├── MWI_household.txt
│       └── MWI_individual.txt
└── competition-winners
    ├── 1st-rgama-ag100.csv
    ├── 2nd-sagol.csv
    ├── 3rd-lastrocky.csv
    ├── bonus-avsolatorio.csv
    ├── just_labels.csv
    └── just_pubidx.csv

At a minimum, most notebooks need the 2012 surveys contained in data/raw/KCP2017_MP; a quick check is sketched below.
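As an optional sanity check (not part of the repository), a few lines of Python can confirm that these minimum files are in place before starting the notebooks:

# Verify that the 2012 survey files used by most notebooks are present.
from pathlib import Path

expected = [
    "data/raw/KCP2017_MP/KCP_ML_IDN/IDN_2012_household.dta",
    "data/raw/KCP2017_MP/KCP_ML_IDN/IDN_2012_individual.dta",
    "data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_household.dta",
    "data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_individual.dta",
]

missing = [p for p in expected if not Path(p).is_file()]
print("All 2012 survey files found." if not missing else "Missing: %s" % missing)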

  4. Start Jupyter. The Jupyter notebooks are contained in the notebooks directory. Run jupyter notebook notebooks to access and run the algorithm notebooks in that folder. Jupyter should open a browser window with the notebooks listed. To interact with a notebook, select and launch (double-click) it from the browser window.


  5. Run the data preparation notebooks first. The first notebooks to run are 00.0-data-preparation.ipynb and 00.1-new-idn-data-preparation.ipynb. These read the raw data and output the training and test sets used in all subsequent notebooks (a rough illustration is sketched below).
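For orientation only, the sketch below gives a rough idea of what this step does: read a raw Stata survey file and write train/test splits to data/processed/. The actual notebooks do considerably more (cleaning, feature selection, encoding), and the output file names here are illustrative, not the notebooks' own.

# Rough illustration of the data preparation step (not the notebooks' actual logic).
import pandas as pd
from sklearn.model_selection import train_test_split

# Read one of the raw 2012 survey files (Stata format).
households = pd.read_stata("data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_household.dta")

# Split and persist; output names under data/processed/ are illustrative.
train, test = train_test_split(households, test_size=0.2, random_state=0)
train.to_csv("data/processed/mwi_train.csv", index=False)
test.to_csv("data/processed/mwi_test.csv", index=False)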

  6. Run the rest of the notebooks for the first time. After running the data preparation notebooks, run the Logistic Regression notebooks first: the notebooks for all other algorithms up to the 12+ series compare results to the logistic regression baseline model, so that baseline must be generated before running the other algorithm notebooks. The 12+ notebooks should be run in relative order as well. A sketch of the baseline step follows.
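The sketch below illustrates why the baseline comes first: a logistic regression model is fit and saved so that later notebooks can reload it for comparison. The file and column names used here (mwi_train.csv, "poor", the model file name) are hypothetical, not the notebooks' actual ones.

# Hedged sketch of fitting and persisting a logistic regression baseline.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical processed training set with a binary "poor" label column.
train = pd.read_csv("data/processed/mwi_train.csv")
X = pd.get_dummies(train.drop(columns=["poor"]))
y = train["poor"]

# Fit the baseline and save it so later notebooks can reload it for comparison.
baseline = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(baseline, "models/logistic_regression_baseline.joblib")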

  7. Explore. After all of the notebooks have been run once in the proper order, all necessary models and files will have been created and saved, so the notebooks can then be run in any order. Model files will exist under the models/ directory, and processed data under the data/processed/ directory.

Notes:

  • There will be some differences between these notebooks and the published results. Many notebooks have a long runtime when working with the full dataset, so the released versions include parameters that sample the data or reduce the exploration space. To reproduce the published results, these notebooks must be executed against the full dataset; each notebook notes where the parameters can be adjusted to do so. An illustration of this kind of parameter follows.
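As an illustration of the kind of switch involved (the actual parameter names, defaults, and file paths vary from notebook to notebook), a notebook cell might expose something like:

# Fraction of the data to use; set to 1.0 to run on the full dataset and
# replicate the published results. The real notebooks use their own parameter
# names and paths; this is illustrative only.
import pandas as pd

SAMPLE_FRACTION = 0.1

df = pd.read_csv("data/processed/mwi_train.csv")   # illustrative path
df = df.sample(frac=SAMPLE_FRACTION, random_state=0)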

These materials have been produced by a team at DrivenData.