
worldbank / ML Classification Algorithms Poverty

A comparative assessment of machine learning classification algorithms applied to poverty prediction

Projects that are alternatives to, or similar to, ML Classification Algorithms Poverty

Ner Bert
BERT-NER (nert-bert) with google bert https://github.com/google-research.
Stars: ✭ 339 (+841.67%)
Mutual labels:  jupyter-notebook, classification
Breast cancer classifier
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Stars: ✭ 614 (+1605.56%)
Mutual labels:  jupyter-notebook, classification
Transformers Tutorials
GitHub repo with tutorials to fine-tune transformers for different NLP tasks
Stars: ✭ 384 (+966.67%)
Mutual labels:  jupyter-notebook, classification
Demo Chinese Text Binary Classification With Bert
Stars: ✭ 276 (+666.67%)
Mutual labels:  jupyter-notebook, classification
Servenet
Service Classification based on Service Description
Stars: ✭ 21 (-41.67%)
Mutual labels:  jupyter-notebook, classification
Pycaret
An open-source, low-code machine learning library in Python
Stars: ✭ 4,594 (+12661.11%)
Mutual labels:  jupyter-notebook, classification
Tensorflow Book
Accompanying source code for Machine Learning with TensorFlow. Refer to the book for step-by-step explanations.
Stars: ✭ 4,448 (+12255.56%)
Mutual labels:  jupyter-notebook, classification
Shufflenet V2 Tensorflow
A lightweight convolutional neural network
Stars: ✭ 145 (+302.78%)
Mutual labels:  jupyter-notebook, classification
Bayesian Neural Networks
Pytorch implementations of Bayes By Backprop, MC Dropout, SGLD, the Local Reparametrization Trick, KF-Laplace, SG-HMC and more
Stars: ✭ 900 (+2400%)
Mutual labels:  jupyter-notebook, classification
Skin Cancer Image Classification
Skin cancer classification using Inceptionv3
Stars: ✭ 16 (-55.56%)
Mutual labels:  jupyter-notebook, classification
Timeseries fastai
fastai V2 implementation of Timeseries classification papers.
Stars: ✭ 221 (+513.89%)
Mutual labels:  jupyter-notebook, classification
The Deep Learning With Keras Workshop
An Interactive Approach to Understanding Deep Learning with Keras
Stars: ✭ 34 (-5.56%)
Mutual labels:  jupyter-notebook, classification
Machine Learning With Python
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Stars: ✭ 2,197 (+6002.78%)
Mutual labels:  jupyter-notebook, classification
Tianchi Medical Lungtumordetect
Tianchi Medical AI Competition (Season 1): intelligent diagnosis of pulmonary nodules, using UNet/VGG/Inception/ResNet/DenseNet
Stars: ✭ 314 (+772.22%)
Mutual labels:  jupyter-notebook, classification
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+305.56%)
Mutual labels:  jupyter-notebook, classification
Food Recipe Cnn
food image to recipe with deep convolutional neural networks.
Stars: ✭ 448 (+1144.44%)
Mutual labels:  jupyter-notebook, classification
Benchmarks
Comparison tools
Stars: ✭ 139 (+286.11%)
Mutual labels:  jupyter-notebook, classification
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+5088.89%)
Mutual labels:  jupyter-notebook, classification
Tensorflow cookbook
Code for Tensorflow Machine Learning Cookbook
Stars: ✭ 5,984 (+16522.22%)
Mutual labels:  jupyter-notebook, classification
Prediciting Binary Options
Predicting forex binary options using time series data and machine learning
Stars: ✭ 33 (-8.33%)
Mutual labels:  jupyter-notebook, classification
[World Bank logo]

A comparative assessment of machine learning classification algorithms applied to poverty prediction

A project of the World Bank Knowledge for Change (KCP) Program

We provide here a series of notebooks developed as an empirical comparative assessment of machine learning classification algorithms applied to poverty prediction. The objectives of this project are to explore how well machine learning algorithms perform when given the task to identify the poor in a given population, and to provide a resource of machine learning techniques for researchers, data scientists, and statisticians around the world.

We use a selection of categorical variables from household survey data from Indonesia and Malawi to predict the poverty status of households, a binary class with labels “Poor” and “Non-poor”. Various “out-of-the-box” classification algorithms (no regression algorithms) are used: logistic regression, linear discriminant analysis, k-nearest neighbors, decision trees, random forests, naïve Bayes, support vector machines, extreme gradient boosting, multilayer perceptron, and deep learning. More complex solutions, including ensembling, deep factorization machines, and automated machine learning, are also implemented. Models are compared across seven metrics (accuracy, recall, precision, F1 score, cross-entropy, ROC AUC, and Cohen's kappa). An analysis of misclassified observations is also conducted.
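For readers who want a concrete picture of what such a comparison looks like, the following is a minimal sketch (not code from the project's notebooks) that fits a few out-of-the-box scikit-learn classifiers and reports the seven metrics. The synthetic data here stands in for the preprocessed survey features.

# Minimal sketch of the kind of comparison performed in the notebooks (not the
# project's actual code). X stands in for a preprocessed feature matrix and y
# for a binary Poor/Non-poor label encoded as 0/1.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, log_loss, roc_auc_score, cohen_kappa_score)

# Stand-in data; in the project this comes from the household surveys.
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "naive_bayes": GaussianNB(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "recall=%.3f" % recall_score(y_test, pred),
          "precision=%.3f" % precision_score(y_test, pred),
          "f1=%.3f" % f1_score(y_test, pred),
          "cross_entropy=%.3f" % log_loss(y_test, proba),
          "roc_auc=%.3f" % roc_auc_score(y_test, proba),
          "kappa=%.3f" % cohen_kappa_score(y_test, pred))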

The project report is provided in the report folder (ML_Classification_Poverty_Comparative_Assessment_v01.pdf).

As part of the project, a data science competition was also organized (on the DrivenData platform), challenging data scientists to build poverty prediction models for three countries. Participants were not told the origin of the three (obfuscated) survey datasets used in the competition; one of them was the Malawi Integrated Household Survey 2010. This repo provides an adapted version of the scripts produced by the four winners of the competition (the adaptation makes the scripts run on a de-obfuscated version of the dataset).

This project was funded by Grant TF 0A4534 of the World Bank Knowledge for Change Program.

Prerequisites:

The prerequisites for this project are:

  • Python 3.6
  • pip>=9.0.1 (to check your version, run pip --version; to upgrade run pip install --upgrade pip)

Recommended software:

Although it is not required, we recommend using Anaconda to manage your Python environment for this project. Other configurations, e.g., using virtualenv, have not been tested. Anaconda is free, open-source software distributed by Continuum Analytics. Download Anaconda for your operating system from the Anaconda website. The environment setup instructions in this README are given for Anaconda.

Setup:

  1. Create a worldbank-poverty environment. To do this, after installing Anaconda, run the command:
conda create --name worldbank-poverty python=3.6

After answering yes (y) to the prompt asking whether you would like to proceed, the environment will be created. To activate the environment, run the following command.

source activate worldbank-poverty

(On Windows, you can just run activate worldbank-poverty instead).

  2. Install requirements. First, activate the worldbank-poverty environment and navigate to the project root. If you are using Linux or macOS, run the command:
pip install -r requirements.txt

If you are using Windows, run the command:

pip install -r requirements-windows.txt
  3. Download the data. The data required for these notebooks must be downloaded separately. Currently, the Malawi dataset is publicly available through the World Bank Data Catalog. Add the raw data to the data/raw directory, extract the contents of all zipped files, and leave the extracted directories in data/raw. If you have all the data, the final data folder will look like this:
data/raw
├── IDN_2011
│   ├── IDN2011_Dictionary.xlsx
│   ├── IDN2011_expenditure.dta
│   ├── IDN2011_household.dta
│   └── IDN2011_individual.dta
├── IDN_2012
│   ├── IDN2012_Dictionary.xlsx
│   ├── IDN2012_expenditure.dta
│   ├── IDN2012_household.dta
│   └── IDN2012_individual.dta
├── IDN_2013
│   ├── IDN2013_Dictionary.xlsx
│   ├── IDN2013_expenditure.dta
│   ├── IDN2013_household.dta
│   └── IDN2013_individual.dta
├── IDN_2014
│   ├── IDN2014_Dictionary.xlsx
│   ├── IDN2014_expenditure.dta
│   ├── IDN2014_household.dta
│   └── IDN2014_individual.dta
├── KCP2017_MP
│   ├── KCP_ML_IDN
│   │   ├── IDN_2012_household.dta
│   │   ├── IDN_2012_individual.dta
│   │   ├── IDN_household.txt
│   │   └── IDN_individual.txt
│   └── KCP_ML_MWI
│       ├── MWI_2012_household.dta
│       ├── MWI_2012_individual.dta
│       ├── MWI_household.txt
│       └── MWI_individual.txt
└── competition-winners
    ├── 1st-rgama-ag100.csv
    ├── 2nd-sagol.csv
    ├── 3rd-lastrocky.csv
    ├── bonus-avsolatorio.csv
    ├── just_labels.csv
    └── just_pubidx.csv

At a minimum, most notebooks need the 2012 surveys contained in data/raw/KCP2017_MP; a quick check is sketched below.
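As an optional sanity check (not part of the repository), a few lines of Python can confirm that these minimum files are in place before starting the notebooks:

# Verify that the 2012 survey files used by most notebooks are present.
from pathlib import Path

expected = [
    "data/raw/KCP2017_MP/KCP_ML_IDN/IDN_2012_household.dta",
    "data/raw/KCP2017_MP/KCP_ML_IDN/IDN_2012_individual.dta",
    "data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_household.dta",
    "data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_individual.dta",
]

missing = [p for p in expected if not Path(p).is_file()]
print("All 2012 survey files found." if not missing else "Missing: %s" % missing)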

  4. Start Jupyter. The Jupyter notebooks are contained in the notebooks directory. Run jupyter notebook notebooks to access and run the algorithm notebooks in that folder. Jupyter should open a browser window with the notebooks listed. To interact with a notebook, select and launch (double-click) it from the browser window.


  5. Run the data preparation notebooks first. The first notebooks to run are 00.0-data-preparation.ipynb and 00.1-new-idn-data-preparation.ipynb. These read the raw data and output the training and test sets used in all subsequent notebooks (a rough illustration is sketched below).
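For orientation only, the sketch below gives a rough idea of what this step does: read a raw Stata survey file and write train/test splits to data/processed/. The actual notebooks do considerably more (cleaning, feature selection, encoding), and the output file names here are illustrative, not the notebooks' own.

# Rough illustration of the data preparation step (not the notebooks' actual logic).
import pandas as pd
from sklearn.model_selection import train_test_split

# Read one of the raw 2012 survey files (Stata format).
households = pd.read_stata("data/raw/KCP2017_MP/KCP_ML_MWI/MWI_2012_household.dta")

# Split and persist; output names under data/processed/ are illustrative.
train, test = train_test_split(households, test_size=0.2, random_state=0)
train.to_csv("data/processed/mwi_train.csv", index=False)
test.to_csv("data/processed/mwi_test.csv", index=False)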

  6. Run the rest of the notebooks for the first time. After running the data preparation notebooks, run the Logistic Regression notebooks first: the notebooks for all other algorithms up to the 12+ series compare results to the logistic regression baseline model, so that baseline must be generated before running the other algorithm notebooks. The 12+ notebooks should be run in relative order as well. A sketch of the baseline step follows.
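The sketch below illustrates why the baseline comes first: a logistic regression model is fit and saved so that later notebooks can reload it for comparison. The file and column names used here (mwi_train.csv, "poor", the model file name) are hypothetical, not the notebooks' actual ones.

# Hedged sketch of fitting and persisting a logistic regression baseline.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical processed training set with a binary "poor" label column.
train = pd.read_csv("data/processed/mwi_train.csv")
X = pd.get_dummies(train.drop(columns=["poor"]))
y = train["poor"]

# Fit the baseline and save it so later notebooks can reload it for comparison.
baseline = LogisticRegression(max_iter=1000).fit(X, y)
joblib.dump(baseline, "models/logistic_regression_baseline.joblib")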

  7. Explore. After all of the notebooks have been run once in the proper order, all necessary models and files will have been created and saved, so the notebooks can then be run in any order. Model files will exist under the models/ directory, and processed data under the data/processed/ directory.

Notes:

  • There will be some differences between these notebooks and the published results. Many notebooks have a long runtime when working with the full dataset, so the released versions include parameters that sample the data or reduce the exploration space. To reproduce the published results, these notebooks must be executed against the full dataset; each notebook notes where the parameters can be adjusted to do so. An illustration of this kind of parameter follows.
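As an illustration of the kind of switch involved (the actual parameter names, defaults, and file paths vary from notebook to notebook), a notebook cell might expose something like:

# Fraction of the data to use; set to 1.0 to run on the full dataset and
# replicate the published results. The real notebooks use their own parameter
# names and paths; this is illustrative only.
import pandas as pd

SAMPLE_FRACTION = 0.1

df = pd.read_csv("data/processed/mwi_train.csv")   # illustrative path
df = df.sample(frac=SAMPLE_FRACTION, random_state=0)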

These materials have been produced by a team at DrivenData.