ashishpatel26 / Amazing Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Projects that are alternatives of or similar to Amazing Feature Engineering

Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (+10.09%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, feature-extraction, feature-engineering, data-visualization
Data Science Resources
👨🏽‍🏫 You can learn about what data science is and why it's important in today's modern world. Are you interested in data science? 🔋
Stars: ✭ 171 (-21.56%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-mining, data-visualization
My Journey In The Data Science World
📒 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+438.99%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, feature-extraction, data-visualization
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+595.41%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, scikit-learn, data-visualization
Pydataroad
open source for wechat-official-account (ID: PyDataLab)
Stars: ✭ 302 (+38.53%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-mining, data-visualization
Datasist
A Python library for easy data analysis, visualization, exploration and modeling
Stars: ✭ 123 (-43.58%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, feature-engineering, data-visualization
Cookbook 2nd Code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (+148.17%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-mining, data-visualization
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (-89.91%)
Mutual labels:  data-science, data-analysis, data-mining, scikit-learn, data-visualization
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+222.94%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-mining, data-visualization
Drugs Recommendation Using Reviews
Analyzing the Drugs Descriptions, conditions, reviews and then recommending it using Deep Learning Models, for each Health Condition of a Patient.
Stars: ✭ 35 (-83.94%)
Mutual labels:  jupyter-notebook, data-analysis, data-mining, feature-engineering, data-visualization
Ml Workspace
🛠 All-in-one web-based IDE specialized for machine learning and data science.
Stars: ✭ 2,337 (+972.02%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, scikit-learn, data-visualization
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+452.29%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, scikit-learn
Dex
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Stars: ✭ 1,238 (+467.89%)
Mutual labels:  data-science, data-analysis, data-mining, data-visualization
Machine Learning Workflow With Python
This is a comprehensive ML techniques with python: Define the Problem- Specify Inputs & Outputs- Data Collection- Exploratory data analysis -Data Preprocessing- Model Design- Training- Evaluation
Stars: ✭ 157 (-27.98%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering, data-visualization
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (-48.17%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-visualization
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-60.55%)
Mutual labels:  jupyter-notebook, data-science, feature-extraction, feature-engineering
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (-47.71%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-visualization
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-74.31%)
Mutual labels:  jupyter-notebook, data-science, scikit-learn, data-visualization
Pbpython
Code, Notebooks and Examples from Practical Business Python
Stars: ✭ 1,724 (+690.83%)
Mutual labels:  jupyter-notebook, data-analysis, scikit-learn, data-visualization
Dtale
Visualizer for pandas data structures
Stars: ✭ 2,864 (+1213.76%)
Mutual labels:  jupyter-notebook, data-science, data-analysis, data-visualization

Feature Engineering & Feature Selection

A comprehensive guide [pdf] [markdown] for Feature Engineering and Feature Selection, with implementations and examples in Python.

Motivation

Feature Engineering & Selection is the most essential part of building a usable machine learning project, even though hundreds of cutting-edge machine learning algorithms, such as deep learning and transfer learning, keep appearing these days. As Prof. Domingos, the author of 'The Master Algorithm', puts it:

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

- Prof. Pedro Domingos

Data and features have the greatest impact on an ML project and set the limit of how well we can do, while models and algorithms merely approach that limit. However, few materials systematically introduce the art of feature engineering, and even fewer explain the rationale behind it. This repo collects my personal notes from learning ML and serves as a reference for Feature Engineering & Selection.

Download

Download the PDF here:

Same, but in markdown:

The PDF is much more readable, while the Markdown version has auto-generated anchor links, so it can be linked to from outside sources. GitHub sucks at displaying Markdown with complex syntax, so I suggest reading the PDF, or downloading the repo and reading the Markdown with Typora.

What You'll Learn

This is not only a collection of hands-on functions, but also an explanation of why, how and when to adopt which feature engineering techniques in data mining.

  • the nature and risks of the data problems we often encounter
  • explanations of the various feature engineering & selection techniques
  • the rationale for using them
  • pros & cons of each method
  • code & example

Getting Started

This repo is mainly intended as a reference for anyone doing feature engineering. Most of the modules are implemented with scikit-learn or packages from its community.

To run the demos or use the customized functions, download the ZIP file from the repo or just copy and paste any part of the code you find helpful. It should all be very easy to understand.

Required Dependencies:

  • Python 3.5, 3.6 or 3.7
  • numpy>=1.15
  • pandas>=0.23
  • scipy>=1.1.0
  • scikit_learn>=0.20.1
  • seaborn>=0.9.0
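
Before running the demos, it can help to confirm that the installed packages meet these minimums. The snippet below is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical and not part of the repo, and the simple numeric comparison is an assumption that ignores pre-release ordering.

```python
import importlib

# Minimum versions copied from the dependency list above.
# Keys are import names (scikit-learn is imported as "sklearn").
MINIMUM_VERSIONS = {
    "numpy": "1.15",
    "pandas": "0.23",
    "scipy": "1.1.0",
    "sklearn": "0.20.1",
    "seaborn": "0.9.0",
}


def parse_version(text):
    """Turn '1.15.4' into (1, 15, 4).

    Keeps the leading digits of each dot-separated part and stops at the
    first part that has none (for example 'dev0').
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {name: status}, where status is 'ok', 'too old' or 'missing'."""
    report = {}
    for name, required in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        installed = getattr(module, "__version__", "0")
        report[name] = "ok" if meets_minimum(installed, required) else "too old"
    return report
```

Calling `check_dependencies()` returns a dictionary with one status per package, so anything reported as `missing` or `too old` can be installed or upgraded first.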

Table of Contents and Code Examples

Below is a list of methods currently implemented in the repo.

1. Data Exploration

2. Feature Cleaning

3. Feature Engineering

4. Feature Selection
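
The four stages listed above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic data, not code taken from the repo. The data, the step names, and the choice of median imputation, standardization and `SelectKBest` are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 200 rows, 5 features,
# where only the first two features carry signal about the label.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Simulate a common data problem: roughly 5% of the values go missing.
X_missing = X.copy()
X_missing[rng.rand(*X.shape) < 0.05] = np.nan

# 1. Data exploration: fraction of missing values per feature.
missing_rate = np.isnan(X_missing).mean(axis=0)
print("missing rate per feature:", missing_rate.round(3))

pipeline = Pipeline([
    # 2. Feature cleaning: fill missing values with the column median.
    ("impute", SimpleImputer(strategy="median")),
    # 3. Feature engineering: standardize to zero mean and unit variance.
    ("scale", StandardScaler()),
    # 4. Feature selection: keep the 2 features with the highest ANOVA F-score.
    ("select", SelectKBest(f_classif, k=2)),
])

X_selected = pipeline.fit_transform(X_missing, y)
kept = pipeline.named_steps["select"].get_support(indices=True)

print("shape after selection:", X_selected.shape)
print("indices of kept features:", kept)
```

Wrapping the steps in a `Pipeline` means the imputation medians, scaling statistics and selected columns are all learned from the training data only, then reused unchanged when `pipeline.transform` is called on new data.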

Key Links and Resources

  • Udemy's Feature Engineering online course

https://www.udemy.com/feature-engineering-for-machine-learning/

  • Udemy's Feature Selection online course

https://www.udemy.com/feature-selection-for-machine-learning

  • JMLR Special Issue on Variable and Feature Selection

http://jmlr.org/papers/special/feature03.html

  • Data Analysis Using Regression and Multilevel/Hierarchical Models, Chapter 25: Missing data

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

  • Data mining and the impact of missing data

http://core.ecu.edu/omgt/krosj/IMDSDataMining2003.pdf

  • PyOD: A Python Toolkit for Scalable Outlier Detection

https://github.com/yzhao062/pyod

  • Weight of Evidence (WoE) Introductory Overview

http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview

  • About Feature Scaling and Normalization

http://sebastianraschka.com/Articles/2014_about_feature_scaling.html

  • Feature Generation with RF, GBDT and Xgboost

https://blog.csdn.net/anshuai_aw1/article/details/82983997

  • A review of feature selection methods with applications

https://ieeexplore.ieee.org/iel7/7153596/7160221/07160458.pdf
