mjbahmani / Machine Learning Workflow With Python

A comprehensive collection of ML techniques with Python: Define the Problem - Specify Inputs & Outputs - Data Collection - Exploratory Data Analysis - Data Preprocessing - Model Design - Training - Evaluation

Programming Languages

python

Labels

jupyter-notebook machine-learning data-visualization workflow machine-learning-algorithms kaggle feature-extraction feature-engineering courses kaggle-competition data-cleaning gradient-boosting kmeans

Projects that are alternatives of or similar to Machine Learning Workflow With Python

My Journey In The Data Science World

📢 Ready to learn or review your knowledge!

Stars: ✭ 1,175 (+648.41%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook, data-cleaning, feature-extraction, data-visualization

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.

Stars: ✭ 86 (-45.22%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook, feature-extraction, feature-engineering

Drugs Recommendation Using Reviews

Analyzing drug descriptions, conditions, and reviews, then recommending drugs for each health condition of a patient using deep learning models.

Stars: ✭ 35 (-77.71%)

Mutual labels: jupyter-notebook, data-cleaning, feature-engineering, data-visualization

Deep Learning Machine Learning Stock

Stock for Deep Learning and Machine Learning

Stars: ✭ 240 (+52.87%)

Mutual labels: jupyter-notebook, feature-extraction, feature-engineering, data-visualization

Amazing Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Stars: ✭ 218 (+38.85%)

Mutual labels: jupyter-notebook, feature-extraction, feature-engineering, data-visualization

Bike-Sharing-Demand-Kaggle

Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand

Stars: ✭ 33 (-78.98%)

Mutual labels: kaggle, feature-extraction, kaggle-competition, feature-engineering

Amazon Forest Computer Vision

Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Stars: ✭ 346 (+120.38%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook

Articles

A repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci

Stars: ✭ 350 (+122.93%)

Mutual labels: jupyter-notebook, machine-learning-algorithms, data-visualization

Kaggle Web Traffic Time Series Forecasting

Solution to Kaggle - Web Traffic Time Series Forecasting

Stars: ✭ 29 (-81.53%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook

Nlpython

This repository contains code related to Natural Language Processing using the Python scripting language. All the code relates to my book entitled "Python Natural Language Processing"

Stars: ✭ 265 (+68.79%)

Mutual labels: jupyter-notebook, feature-extraction, feature-engineering

Ds and ml projects

Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.

Stars: ✭ 56 (-64.33%)

Mutual labels: jupyter-notebook, machine-learning-algorithms, data-visualization

Kaggle Notebooks

Sample notebooks for Kaggle competitions

Stars: ✭ 77 (-50.96%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook

Deltapy

DeltaPy - Tabular Data Augmentation (by @firmai)

Stars: ✭ 344 (+119.11%)

Mutual labels: jupyter-notebook, feature-extraction, feature-engineering

Machine learning basics

Plain python implementations of basic machine learning algorithms

Stars: ✭ 3,557 (+2165.61%)

Mutual labels: jupyter-notebook, machine-learning-algorithms, kmeans

Feature Engineering And Feature Selection

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

Stars: ✭ 526 (+235.03%)

Mutual labels: jupyter-notebook, feature-extraction, feature-engineering

Pytorch Kaggle Starter

Pytorch starter kit for Kaggle competitions

Stars: ✭ 268 (+70.7%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook

Nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Stars: ✭ 10,698 (+6714.01%)

Mutual labels: machine-learning-algorithms, feature-extraction, feature-engineering

Deep Learning Boot Camp

A community run, 5-day PyTorch Deep Learning Bootcamp

Stars: ✭ 1,270 (+708.92%)

Mutual labels: kaggle-competition, kaggle, jupyter-notebook

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+865.61%)

Mutual labels: jupyter-notebook, data-cleaning, data-visualization

Apartment-Interest-Prediction

Predict people's interest in renting specific NYC apartments. The challenge combines structured data, geolocation, time data, free text and images.

Stars: ✭ 17 (-89.17%)

Mutual labels: kaggle, kaggle-competition, gradient-boosting

📢 Machine Learning Workflow with Python

💻💾📓✒📊

1- Introduction

This is a comprehensive collection of ML techniques with Python that I spent more than 6 months completing.

I think it is a great opportunity for anyone who wants to learn the machine learning workflow with Python from start to finish. I have covered most of the methods that were implemented for iris until 2018. You can start learning and reviewing your knowledge of ML with a simple dataset, and try to learn and memorize the workflow for your journey in the data science world.

I am open to your feedback for improving this.

2- Machine Learning Workflow

Field of study that gives computers the ability to learn without being explicitly programmed.

Arthur Samuel, 1959

If you have already read some machine learning books, you have noticed that there are different ways to stream data into machine learning.

Most of these books share the following steps (checklist):

Define the Problem(Look at the big picture)
Specify Inputs & Outputs
Data Collection
Exploratory data analysis
Data Preprocessing
Model Design, Training, and Offline Evaluation
Model Deployment, Online Evaluation, and Monitoring
Model Maintenance, Diagnosis, and Retraining

You can see my workflow in the image below:

You should feel free to adapt this checklist to your needs.

2-1 Real world Application Vs Competitions

3- Problem Definition

I think one of the most important things when you start a new machine learning project is defining your problem. That means you should understand the business problem (problem formalization).

Problem definition has four steps, which are illustrated in the picture below:

3-1 Problem Feature

The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. That's why the name DieTanic. It is a disaster that no one in the world can forget.

It took about $7.5 million to build the Titanic, and it sank after the collision. The Titanic dataset is a very good dataset for beginners to start a journey in data science and participate in Kaggle competitions.

We will use the classic Titanic dataset. This dataset contains information about 11 different variables:

Survival
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked

Note: You must answer the following question: how does your company expect to use and benefit from your model?

3-2 Aim

It is your job to predict if a passenger survived the sinking of the Titanic or not. For each PassengerId in the test set, you must predict a 0 or 1 value for the Survived variable.

3-3 Variables

Age :
1. Age is fractional if less than 1. If the age is estimated, it is in the form xx.5
SibSp :
1. The dataset defines family relations in this way...
  
  a. Sibling = brother, sister, stepbrother, stepsister
  
  b. Spouse = husband, wife (mistresses and fiancés were ignored)
Parch:
1. The dataset defines family relations in this way...
  
  a. Parent = mother, father
  
  b. Child = daughter, son, stepdaughter, stepson
  
  c. Some children travelled only with a nanny, therefore parch=0 for them.
Pclass :
- A proxy for socio-economic status (SES)
  - 1st = Upper
  - 2nd = Middle
  - 3rd = Lower
Embarked :
- nominal datatype
Name:
- nominal datatype. It could be used in feature engineering to derive the gender from the title
Sex:
- nominal datatype
Ticket:
- has no impact on the outcome variable, so it will be excluded from the analysis
Cabin:
- is a nominal datatype that can be used in feature engineering
Fare:
- Indicating the fare
PassengerID:
- has no impact on the outcome variable, so it will be excluded from the analysis
Survival:
- dependent variable, 0 or 1
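
The variable notes above can be turned into a small preprocessing sketch. This is not taken from the original notebook: the sample rows, the `extract_title` helper and the `Title` column name are assumptions made for illustration. The code pulls the title out of `Name` and drops the two columns that the notes exclude from the analysis.

```python
import re

import pandas as pd

# Illustrative rows in the Kaggle Titanic column layout (not the full dataset).
df = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Survived": [0, 1, 1],
        "Pclass": [3, 1, 3],
        "Name": [
            "Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            "Heikkinen, Miss. Laina",
        ],
        "Sex": ["male", "female", "female"],
        "Ticket": ["A/5 21171", "PC 17599", "STON/O2. 3101282"],
    }
)


def extract_title(name: str) -> str:
    """Return the text between the comma and the first period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


# Hypothetical engineered feature derived from the nominal Name column.
df["Title"] = df["Name"].apply(extract_title)

# Drop the columns that the notes above exclude from the analysis.
features = df.drop(columns=["PassengerId", "Ticket"])

print(features[["Name", "Title"]])
```

`PassengerId` is dropped only from the feature table here; it is still needed later to label the predictions.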

4- Inputs & Outputs

4-1 Inputs

What's our input for this problem:

1. train.csv
2. test.csv

4-2 Outputs

Your score is the percentage of passengers you correctly predict. This is known simply as "accuracy".
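
As a minimal sketch of that metric, accuracy is the number of correct predictions divided by the total number of predictions. The `accuracy` function name and the labels below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration: 1 = survived, 0 = deceased.
actual = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]

score = accuracy(actual, predicted)
print(f"accuracy = {score:.2f}")
```

Four of the five made-up predictions match, so the score here is 0.80.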

The Outputs should have exactly 2 columns:

1. PassengerId (sorted in any order)
2. Survived (contains your binary predictions: 1 for survived, 0 for deceased)
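
A two-column output like this can be assembled with pandas. The sketch below assumes you already have a list of test-set IDs and a list of 0/1 predictions; the values and the `submission` variable name are hypothetical.

```python
import io

import pandas as pd

# Hypothetical test-set IDs and model predictions, for illustration only.
passenger_ids = [892, 893, 894]
predictions = [0, 1, 0]

submission = pd.DataFrame(
    {"PassengerId": passenger_ids, "Survived": predictions}
)

# Survived must hold integer 0/1 values, one row per PassengerId.
submission["Survived"] = submission["Survived"].astype(int)

# Write to an in-memory buffer here; pass a path such as "submission.csv"
# to to_csv to produce a file instead.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` keeps the DataFrame index out of the file, so the output has exactly the two required columns.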

5- Loading Packages

In this kernel we are using the following packages:
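
The package list itself did not survive on this page. The sketch below imports only the libraries that the text mentions elsewhere (pandas for the DataFrames, matplotlib and seaborn for the plots); the original notebook may well use more than these.

```python
# Data handling
import pandas as pd

# Plotting
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Report the versions in use so a run can be reproduced later.
print("pandas", pd.__version__)
print("matplotlib", matplotlib.__version__)
print("seaborn", sns.__version__)
```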

6- Exploratory Data Analysis (EDA)

In this section, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data.

Which variables suggest interesting relationships?
Which observations are unusual?
Analysis of the features!

By the end of the section, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful. Then we will review analytical and statistical operations:

6-1 Data Collection
6-2 Visualization
6-3 Data Preprocessing
6-4 Data Cleaning

Note: You can change the order of the above steps.
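
The numerical side of the questions above can be sketched with a few pandas calls. The rows below are invented for illustration, and the 1.5 * IQR rule for flagging unusual fares is one common convention, not necessarily the one the original notebook uses.

```python
import pandas as pd

# Illustrative rows only; Age contains missing values on purpose.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 1, 0],
        "Sex": ["male", "female", "female", "male", "female", "male"],
        "Age": [22.0, 38.0, None, 35.0, 27.0, None],
        "Fare": [7.25, 71.28, 7.92, 8.05, 11.13, 8.46],
    }
)

# Numerical summary of the numeric columns.
print(df.describe())

# How many values are missing in each column?
missing = df.isnull().sum()
print(missing)

# Which variables suggest a relationship with the target?
survival_by_sex = df.groupby("Sex")["Survived"].mean()
print(survival_by_sex)

# Which observations are unusual? Flag fares outside 1.5 * IQR.
q1, q3 = df["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
unusual = df[(df["Fare"] < q1 - 1.5 * iqr) | (df["Fare"] > q3 + 1.5 * iqr)]
print(unusual)
```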

6-1 Data Collection

Data collection is the process of gathering and measuring data, information or any variables of interest in a standardized and established manner that enables the collector to answer or test hypotheses and evaluate outcomes of the particular collection. [techopedia]
I start data collection by loading the training and testing datasets into pandas DataFrames.
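
A minimal sketch of that loading step follows. To keep it runnable without the Kaggle download, it reads a few rows in the `train.csv` column layout from an in-memory string; the `TRAIN_CSV` name and the stand-in `test` frame are assumptions for illustration.

```python
import io

import pandas as pd

# A few rows in the train.csv column layout, held in memory so the example
# runs without the Kaggle download. With the real files in the notebook
# folder, use pd.read_csv("train.csv") and pd.read_csv("test.csv") instead.
TRAIN_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
"""

train = pd.read_csv(io.StringIO(TRAIN_CSV))

# Stand-in for test.csv: the same columns without the target, Survived.
test = train.drop(columns=["Survived"]).copy()

print(train.shape, test.shape)
print(train.dtypes)
```

Empty fields such as the missing `Cabin` values are read as `NaN`, which is what the later cleaning steps have to deal with.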

6-2 Visualization

Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.[SAS]

In this section I show you 11 plots with matplotlib and seaborn, which are listed in the picture below:
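
The list of plots is not recoverable from this page, so the following is only one example of the kind of chart such a section might contain: a bar chart of survival rate by sex. It uses matplotlib alone, and the rows are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows only, not the real dataset.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
        "Sex": ["male", "female", "female", "male",
                "female", "male", "male", "male"],
    }
)

# Survival rate per group, then one bar per group.
rates = df.groupby("Sex")["Survived"].mean()

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(rates.index, rates.values, color=["tab:orange", "tab:blue"])
ax.set_xlabel("Sex")
ax.set_ylabel("Survival rate")
ax.set_ylim(0, 1)
ax.set_title("Survival rate by sex (sample rows)")
fig.tight_layout()
```

In a notebook the figure is displayed inline; in a script, `fig.savefig(...)` writes it to a file.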

Help

I hope you have enjoyed reading my python notebook.

If you have any problem running the notebook, please open an issue here on GitHub.

Most of my notebooks need a dataset as input.

To use the correct data, please download the dataset from the Kaggle site and put it in your notebook folder.

Mj Bhamnai

[email protected]

Have Fun!

You can follow me on:

GitHub

LinkedIn

Kaggle

Please Fork the Repository to continue...

Stars: ✭ 157