
mjbahmani / Machine Learning Workflow With Python

A comprehensive set of ML techniques in Python: Define the Problem, Specify Inputs & Outputs, Data Collection, Exploratory Data Analysis, Data Preprocessing, Model Design, Training, Evaluation

Projects that are alternatives to or similar to Machine Learning Workflow With Python

My Journey In The Data Science World
📒 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+648.41%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook, data-cleaning, feature-extraction, data-visualization
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template to solve other real problems.
Stars: ✭ 86 (-45.22%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook, feature-extraction, feature-engineering
Drugs Recommendation Using Reviews
Analyzing drug descriptions, conditions, and reviews, then recommending drugs for each health condition of a patient using deep learning models.
Stars: ✭ 35 (-77.71%)
Mutual labels:  jupyter-notebook, data-cleaning, feature-engineering, data-visualization
Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (+52.87%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering, data-visualization
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+38.85%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering, data-visualization
Bike-Sharing-Demand-Kaggle
Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand
Stars: ✭ 33 (-78.98%)
Mutual labels:  kaggle, feature-extraction, kaggle-competition, feature-engineering
Amazon Forest Computer Vision
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks
Stars: ✭ 346 (+120.38%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Articles
A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (+122.93%)
Mutual labels:  jupyter-notebook, machine-learning-algorithms, data-visualization
Kaggle Web Traffic Time Series Forecasting
Solution to Kaggle - Web Traffic Time Series Forecasting
Stars: ✭ 29 (-81.53%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Nlpython
This repository contains code related to Natural Language Processing using the Python scripting language. All the code is related to my book entitled "Python Natural Language Processing".
Stars: ✭ 265 (+68.79%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-64.33%)
Mutual labels:  jupyter-notebook, machine-learning-algorithms, data-visualization
Kaggle Notebooks
Sample notebooks for Kaggle competitions
Stars: ✭ 77 (-50.96%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
Stars: ✭ 344 (+119.11%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Machine learning basics
Plain python implementations of basic machine learning algorithms
Stars: ✭ 3,557 (+2165.61%)
Mutual labels:  jupyter-notebook, machine-learning-algorithms, kmeans
Feature Engineering And Feature Selection
A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.
Stars: ✭ 526 (+235.03%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Pytorch Kaggle Starter
Pytorch starter kit for Kaggle competitions
Stars: ✭ 268 (+70.7%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Nni
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+6714.01%)
Mutual labels:  machine-learning-algorithms, feature-extraction, feature-engineering
Deep Learning Boot Camp
A community-run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (+708.92%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+865.61%)
Mutual labels:  jupyter-notebook, data-cleaning, data-visualization
Apartment-Interest-Prediction
Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocalization, time data, free text and images.
Stars: ✭ 17 (-89.17%)
Mutual labels:  kaggle, kaggle-competition, gradient-boosting

📒 Machine Learning Workflow with Python

💻💾📓✒📊

1- Introduction

This is a comprehensive guide to ML techniques with Python that took me more than 6 months to complete.

I think it is a great opportunity for anyone who wants to learn the complete machine learning workflow with Python. I have covered most of the methods implemented for iris up to 2018, so you can start learning and reviewing your ML knowledge on a simple dataset and memorize the workflow for your journey in the data science world.

I am open to your feedback for improving this work.

2- Machine Learning Workflow

Field of study that gives computers the ability to learn without being explicitly programmed.

Arthur Samuel, 1959

If you have already read some machine learning books, you have noticed that there are different ways to stream data into machine learning.

Most of these books share the following steps (checklist):

  • Define the Problem (Look at the big picture)
  • Specify Inputs & Outputs
  • Data Collection
  • Exploratory data analysis
  • Data Preprocessing
  • Model Design, Training, and Offline Evaluation
  • Model Deployment, Online Evaluation, and Monitoring
  • Model Maintenance, Diagnosis, and Retraining

You can see my workflow in the image below:

You should feel free to adapt this checklist to your needs.

2-1 Real-World Application vs. Competitions


3- Problem Definition

I think one of the most important things when you start a new machine learning project is defining your problem. That means you should understand the business problem (Problem Formalization).

Problem Definition has four steps, illustrated in the picture below:

3-1 Problem Feature

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. Hence the nickname DieTanic. It is a disaster that no one in the world can forget.

It took about $7.5 million to build the Titanic, and it sank after the collision. The Titanic dataset is a very good dataset for beginners to start a journey in data science and participate in Kaggle competitions.

We will use the classic Titanic dataset. This dataset contains information about 11 different variables:

  1. Survival
  2. Pclass
  3. Name
  4. Sex
  5. Age
  6. SibSp
  7. Parch
  8. Ticket
  9. Fare
  10. Cabin
  11. Embarked

Note: You must answer the following question: How does your company expect to use and benefit from your model?

3-2 Aim

It is your job to predict if a passenger survived the sinking of the Titanic or not. For each PassengerId in the test set, you must predict a 0 or 1 value for the Survived variable.

3-3 Variables

  1. Age:

    1. Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
  2. SibSp:

    1. The dataset defines family relations in this way...

      a. Sibling = brother, sister, stepbrother, stepsister

      b. Spouse = husband, wife (mistresses and fiancés were ignored)

  3. Parch:

    1. The dataset defines family relations in this way...

      a. Parent = mother, father

      b. Child = daughter, son, stepdaughter, stepson

      c. Some children travelled only with a nanny, therefore parch=0 for them.

  4. Pclass:

    • A proxy for socio-economic status (SES)
      • 1st = Upper
      • 2nd = Middle
      • 3rd = Lower
  5. Embarked:

    • nominal datatype
  6. Name:

    • nominal datatype. It could be used in feature engineering to derive the gender from the title.
  7. Sex:

    • nominal datatype
  8. Ticket:

    • has no impact on the outcome variable. Thus, it will be excluded from analysis
  9. Cabin:

    • is a nominal datatype that can be used in feature engineering
  10. Fare:

    • indicates the fare the passenger paid
  11. PassengerID:

    • has no impact on the outcome variable. Thus, it will be excluded from analysis
  12. Survival:

    • binary outcome variable: 1 = survived, 0 = deceased
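Several of the notes above translate directly into features. The following is a minimal pandas sketch, assuming the Kaggle column names; the helper `add_basic_features` and the new columns `Title`, `FamilySize` and `AgeEstimated` are hypothetical choices for illustration, not the author's code.

```python
import pandas as pd


def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive a few features suggested by the variable notes (hypothetical helper)."""
    out = df.copy()
    # Title: the text between the comma and the first period in Name,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr"
    out["Title"] = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Family members aboard (SibSp + Parch) plus the passenger themself
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Estimated ages are stored as xx.5; ages below 1 are fractional for a
    # different reason, so only values >= 1 are flagged. Missing ages -> False.
    out["AgeEstimated"] = (out["Age"] >= 1) & (out["Age"] % 1 == 0.5)
    # Ticket and PassengerId are excluded from the analysis
    # (keep PassengerId aside elsewhere if you need it for the submission file)
    return out.drop(columns=["Ticket", "PassengerId"])
```

For example, `add_basic_features(train)["Title"].value_counts()` shows how many passengers carry each title, which is the starting point for deriving gender or age groups from `Name`.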

4- Inputs & Outputs


4-1 Inputs

What's our input for this problem:

  1. train.csv
  2. test.csv

4-2 Outputs

  1. Your score is the percentage of passengers you correctly predict. This is known simply as "accuracy".
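The metric is simple enough to compute by hand. Below is a minimal plain-Python sketch; the function name `accuracy` is hypothetical, and scikit-learn's `sklearn.metrics.accuracy_score` computes the same quantity.

```python
def accuracy(y_true, y_pred):
    """Fraction of passengers whose Survived value (0 or 1) was predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Example: 3 of 4 predictions match, so the score is 0.75
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```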

The Outputs should have exactly 2 columns:

  1. PassengerId (sorted in any order)
  2. Survived (contains your binary predictions: 1 for survived, 0 for deceased)
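Writing that two-column file with pandas could look like the sketch below. The helper names are hypothetical, and `gender_baseline` is an assumed trivial baseline (every female passenger survives, every male does not), used only to have predictions to write. It is not the model built in this workflow.

```python
import pandas as pd


def gender_baseline(test_df: pd.DataFrame) -> pd.Series:
    # Hypothetical baseline, not the author's model: predict that every
    # female passenger survived (1) and every male passenger did not (0).
    return (test_df["Sex"] == "female").astype(int)


def make_submission(test_df: pd.DataFrame, predictions, path_or_buf="submission.csv") -> pd.DataFrame:
    """Write the two required columns: PassengerId and Survived (1 = survived, 0 = deceased)."""
    submission = pd.DataFrame({
        "PassengerId": test_df["PassengerId"].to_numpy(),
        # pandas raises ValueError if the number of predictions differs from the test rows
        "Survived": pd.Series(predictions).astype(int).to_numpy(),
    })
    if not submission["Survived"].isin([0, 1]).all():
        raise ValueError("Survived must contain only 0 or 1")
    submission.to_csv(path_or_buf, index=False)
    return submission
```

With a test DataFrame `test` loaded from test.csv, `make_submission(test, gender_baseline(test))` writes `submission.csv`; swap in your own model's 0/1 predictions in place of the baseline.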

5- Loading Packages

In this kernel we are using the following packages:

6- Exploratory Data Analysis (EDA)

In this section, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data.

  • Which variables suggest interesting relationships?
  • Which observations are unusual?
  • Analysis of the features!

By the end of the section, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful. Then we will review analytical and statistical operations:

  • 6-1 Data Collection
  • 6-2 Visualization
  • 6-3 Data Preprocessing
  • 6-4 Data Cleaning

Note: You can change the order of the above steps.
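The questions above (unusual observations, relationships between variables) can be started numerically before any plotting. Here is a minimal pandas sketch, assuming `train` is the training DataFrame with the Kaggle column names (`Survived`, `Sex`, `Pclass`, ...); the helper names are hypothetical.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Survival rate (mean of the 0/1 Survived column) for each category of `column`."""
    return df.groupby(column)["Survived"].mean().sort_values(ascending=False)


def missing_value_report(df: pd.DataFrame) -> pd.Series:
    """Number of missing values per column, only for columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Usage, assuming `train` was loaded from train.csv:
# print(train.describe())               # summary statistics of numeric columns
# print(survival_rate_by(train, "Sex"))
# print(survival_rate_by(train, "Pclass"))
# print(missing_value_report(train))
```

Because `Survived` is coded 0/1, its group mean is exactly the fraction of survivors in that group, which makes `groupby(...).mean()` a quick way to spot relationships such as Sex or Pclass versus survival.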

6-1 Data Collection

Data collection is the process of gathering and measuring data, information or any variables of interest in a standardized and established manner that enables the collector to answer or test hypotheses and evaluate outcomes of the particular collection.[techopedia]
I start data collection by loading the training and testing datasets into Pandas DataFrames.
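A minimal sketch of that step, assuming `train.csv` and `test.csv` (the inputs listed in section 4-1) were downloaded from Kaggle into the notebook folder; `load_titanic` is a hypothetical helper name and the default paths are assumptions.

```python
import pandas as pd


def load_titanic(train_path: str = "train.csv", test_path: str = "test.csv"):
    """Read the Titanic training and test CSV files into two DataFrames."""
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    # The test set lacks the Survived column: that is the value to predict
    print(f"train: {train.shape}, test: {test.shape}")
    return train, test


# Usage (paths are assumptions; adjust to where you saved the files):
# train, test = load_titanic()
# train.head()
```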

6-2 Visualization

Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.[SAS]
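As a starting point for the plots, here is a minimal matplotlib sketch of one common Titanic chart: the survival rate per category. The helper name and output file name are hypothetical. The `matplotlib.use("Agg")` line only lets the sketch run without a display and can be dropped inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_survival_by(df: pd.DataFrame, column: str, path_or_buf="survival_by.png") -> pd.Series:
    """Save a bar chart of the survival rate per category of `column`."""
    rates = df.groupby(column)["Survived"].mean()
    fig, ax = plt.subplots(figsize=(5, 3))
    rates.plot(kind="bar", ax=ax, color="steelblue")
    ax.set_ylabel("Survival rate")
    ax.set_ylim(0, 1)
    ax.set_title(f"Survival rate by {column}")
    fig.tight_layout()
    fig.savefig(path_or_buf)
    plt.close(fig)  # free the figure's memory
    return rates


# Usage, assuming `train` holds train.csv:
# plot_survival_by(train, "Sex", "survival_by_sex.png")
# plot_survival_by(train, "Pclass", "survival_by_pclass.png")
```

seaborn's `barplot` draws the same per-category means (with confidence intervals by default) if you prefer its styling.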

In this section I show you 11 plots with matplotlib and seaborn, listed in the picture below:

Help

I hope you have enjoyed reading my python notebook.

If you have any problem running the notebook, please open an issue here on GitHub.

For most of my notebooks, you need a dataset as input.

To use the correct data, please download the dataset from the Kaggle site and put it in your notebook folder.

Mj Bahmani

[email protected]

Have Fun!

You can follow me on:

GitHub
LinkedIn
Kaggle

Please Fork the Repository to continue...
