
ashishpatel26 / Amazing Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.



Feature Engineering & Feature Selection

A comprehensive guide [pdf] [markdown] for Feature Engineering and Feature Selection, with implementations and examples in Python.

Motivation

Feature Engineering & Selection is the most essential part of building a usable machine learning project, even though hundreds of cutting-edge machine learning algorithms, such as deep learning and transfer learning, are emerging these days. Indeed, as Prof. Domingos, the author of 'The Master Algorithm', says:

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

— Prof. Pedro Domingos

Data and features have the most impact on an ML project and set the limit of how well we can do, while models and algorithms merely approach that limit. However, few materials systematically introduce the art of feature engineering, and even fewer explain the rationale behind it. This repo contains my personal notes from learning ML and serves as a reference for Feature Engineering & Selection.

Download

Download the PDF here:

PDF Download

Same, but in markdown:

Markdown Download

The PDF has a much more readable format, while the Markdown has auto-generated anchor links for navigating from outside sources. GitHub renders Markdown with complex syntax poorly, so I suggest reading the PDF, or downloading the repo and reading the Markdown with Typora.

What You'll Learn

Not only a collection of hands-on functions, but also an explanation of why, how and when to adopt which feature engineering techniques in data mining.

the nature and risks of the data problems we often encounter
explanations of the various feature engineering & selection techniques
the rationale for using each technique
pros & cons of each method
code & examples

Getting Started

This repo is mainly intended as a reference for anyone who is doing feature engineering. Most of the modules are implemented with scikit-learn or packages from its community.

To run the demos or use the customized functions, download the ZIP file from the repo or just copy and paste any part of the code you find helpful. They should all be very easy to understand.

Required Dependencies:

Python 3.5, 3.6 or 3.7
numpy>=1.15
pandas>=0.23
scipy>=1.1.0
scikit_learn>=0.20.1
seaborn>=0.9.0

Table of Contents and Code Examples

Below is a list of methods currently implemented in the repo.

1. Data Exploration

1.1 Variables
1.2 Variable Identification
- Check Data Types [guide] [demo]
1.3 Univariate Analysis
- Descriptive Analysis [guide] [demo]
- Discrete Variable Barplot [guide] [demo]
- Discrete Variable Countplot [guide] [demo]
- Discrete Variable Boxplot [guide] [demo]
- Continuous Variable Distplot [guide] [demo]
1.4 Bi-variate Analysis
- Scatter Plot [guide] [demo]
- Correlation Plot [guide] [demo]
- Heat Map [guide] [demo]
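
As a rough illustration of the exploration steps listed above, here is a minimal sketch using pandas only. The dataset and its column names (`age`, `income`, `city`) are made up for this example and do not come from the repo's demos.

```python
import pandas as pd

# Small synthetic dataset (hypothetical columns, for illustration only)
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63],
    "income": [28000.0, 52000.0, 91000.0, 60000.0, 39000.0, 88000.0],
    "city": ["A", "B", "A", "C", "B", "A"],
})

# 1.2 Variable identification: inspect the dtype of each column
dtypes = df.dtypes

# 1.3 Univariate analysis: summary statistics for the numeric columns
# and label counts for the discrete column
numeric_summary = df.describe()
city_counts = df["city"].value_counts()

# 1.4 Bi-variate analysis: Pearson correlation between numeric columns
corr = df[["age", "income"]].corr()

print(dtypes)
print(numeric_summary)
print(city_counts)
print(corr)
```

The correlation matrix is the same table a heat map would visualize; plotting it (for example with seaborn) is left out to keep the sketch dependency-light.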

2. Feature Cleaning

2.1 Missing Values
- Missing Value Check [guide] [demo]
- Listwise Deletion [guide] [demo]
- Mean/Median/Mode Imputation [guide] [demo]
- End of distribution Imputation [guide] [demo]
- Random Imputation [guide] [demo]
- Arbitrary Value Imputation [guide] [demo]
- Add a variable to denote NA [guide] [demo]
2.2 Outliers
- Detect by Arbitrary Boundary [guide] [demo]
- Detect by Mean & Standard Deviation [guide] [demo]
- Detect by IQR [guide] [demo]
- Detect by MAD [guide] [demo]
- Mean/Median/Mode Imputation [guide] [demo]
- Discretization [guide] [demo]
- Imputation with Arbitrary Value [guide] [demo]
- Winsorization [guide] [demo]
- Discard Outliers [guide] [demo]
2.3 Rare Values
- Mode Imputation [guide] [demo]
- Grouping into One New Category [guide] [demo]
2.4 High Cardinality
- Grouping Labels with Business Understanding [guide]
- Grouping Labels with Rare Occurrence into One Category [guide] [demo]
- Grouping Labels with Decision Tree [guide] [demo]
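
A few of the cleaning techniques above can be sketched with plain pandas. This is an illustrative example under assumed data, not the repo's own implementation: the columns, the 1.5 x IQR fence and the 20% rarity threshold are all arbitrary choices made here.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): one missing value, one extreme value,
# and two labels that occur only once
df = pd.DataFrame({
    "income": [30.0, 32.0, np.nan, 35.0, 31.0, 33.0, 400.0, 34.0],
    "city": ["A", "A", "B", "A", "B", "C", "A", "D"],
})

# 2.1 Missing values: add a variable to denote NA, then impute the median
df["income_is_na"] = df["income"].isna().astype(int)
median = df["income"].median()
df["income_imputed"] = df["income"].fillna(median)

# 2.2 Outliers: detect by IQR, i.e. flag values outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
q1 = df["income_imputed"].quantile(0.25)
q3 = df["income_imputed"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["income_imputed"] < lower) | (df["income_imputed"] > upper)

# Cap the flagged values at the IQR fences (a capping treatment in the
# spirit of the outlier handling listed above; fences chosen for this sketch)
df["income_capped"] = df["income_imputed"].clip(lower=lower, upper=upper)

# 2.3 / 2.4 Rare values: group labels seen in fewer than 20% of rows
# into a single new category
freq = df["city"].value_counts(normalize=True)
rare_labels = freq[freq < 0.2].index
df["city_grouped"] = df["city"].where(~df["city"].isin(rare_labels), "Rare")

print(df)
```

In a real project the median, the fences and the rare-label list would be learned on the training split only and then applied to the test split.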

3. Feature Engineering

3.1 Feature Scaling
- Normalization - Standardization [guide] [demo]
- Min-Max Scaling [guide] [demo]
- Robust Scaling [guide] [demo]
3.2 Discretize
- Equal Width Binning [guide] [demo]
- Equal Frequency Binning [guide] [demo]
- K-means Binning [guide] [demo]
- Discretization by Decision Trees [guide] [demo]
- ChiMerge [guide] [demo]
3.3 Feature Encoding
- One-hot Encoding [guide] [demo]
- Ordinal-Encoding [guide] [demo]
- Count/frequency Encoding [guide]
- Mean Encoding [guide] [demo]
- WOE Encoding [guide] [demo]
- Target Encoding [guide] [demo]
3.4 Feature Transformation
- Logarithmic Transformation [guide] [demo]
- Reciprocal Transformation [guide] [demo]
- Square Root Transformation [guide] [demo]
- Exponential Transformation [guide] [demo]
- Box-Cox Transformation [guide] [demo]
- Quantile Transformation [guide] [demo]
3.5 Feature Generation
- Missing Data Derived [guide] [demo]
- Simple Stats [guide]
- Crossing [guide]
- Ratio & Proportion [guide]
- Cross Product [guide]
- Polynomial [guide] [demo]
- Feature Learning by Tree [guide] [demo]
- Feature Learning by Deep Network [guide]
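
The sketch below touches one or two techniques from each of 3.1 to 3.4, using scikit-learn scalers and pandas helpers. The data, column names and bin counts are assumptions made for illustration; the repo's own demos may do this differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic data (hypothetical columns, for illustration only)
df = pd.DataFrame({
    "income": [20.0, 30.0, 40.0, 50.0, 60.0, 100.0],
    "city": ["A", "B", "A", "C", "B", "A"],
    "target": [0, 1, 0, 1, 1, 1],
})

# 3.1 Feature scaling: standardization (zero mean, unit variance)
# and min-max scaling (range [0, 1])
df["income_std"] = StandardScaler().fit_transform(df[["income"]]).ravel()
df["income_minmax"] = MinMaxScaler().fit_transform(df[["income"]]).ravel()

# 3.2 Discretization: equal-width bins vs. equal-frequency bins
df["income_equal_width"] = pd.cut(df["income"], bins=4, labels=False)
df["income_equal_freq"] = pd.qcut(df["income"], q=3, labels=False)

# 3.3 Feature encoding: one-hot encoding, and mean encoding
# (each label replaced by the mean of the target for that label)
one_hot = pd.get_dummies(df["city"], prefix="city")
city_mean = df.groupby("city")["target"].mean()
df["city_mean_enc"] = df["city"].map(city_mean)

# 3.4 Feature transformation: logarithmic transform, log(1 + x)
df["income_log"] = np.log1p(df["income"])

print(df)
print(one_hot)
```

Mean encoding uses the target, so the mapping should be computed on training data only; fitting it on the full dataset, as this toy example does, leaks label information.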

4. Feature Selection

4.1 Filter Method
- Variance [guide] [demo]
- Correlation [guide] [demo]
- Chi-Square [guide] [demo]
- Mutual Information Filter [guide] [demo]
- Information Value (IV) [guide]
4.2 Wrapper Method
- Forward Selection [guide] [demo]
- Backward Elimination [guide] [demo]
- Exhaustive Feature Selection [guide] [demo]
- Genetic Algorithm [guide]
4.3 Embedded Method
- Lasso (L1) [guide] [demo]
- Random Forest Importance [guide] [demo]
- Gradient Boosted Trees Importance [guide] [demo]
4.4 Feature Shuffling
- Random Shuffling [guide] [demo]
4.5 Hybrid Method
- Recursive Feature Selection [guide] [demo]
- Recursive Feature Addition [guide] [demo]
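
To make the filter and embedded methods above concrete, here is a small sketch on synthetic data: a variance filter, a pairwise-correlation filter, and a random forest importance ranking. The feature names, the 0.95 correlation cutoff and the forest settings are assumptions for this example, not values taken from the repo.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold

# Synthetic data (hypothetical): one informative feature, a near-duplicate
# of it, one pure-noise feature and one constant feature
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
X = pd.DataFrame({
    "signal": signal,
    "signal_copy": signal * 2.0 + rng.normal(scale=0.01, size=n),
    "noise": rng.normal(size=n),
    "constant": np.ones(n),
})
y = (signal > 0).astype(int)

# 4.1 Filter (variance): drop features whose variance is zero
vt = VarianceThreshold(threshold=0.0)
vt.fit(X)
kept_by_variance = list(X.columns[vt.get_support()])
X_var = X[kept_by_variance]

# 4.1 Filter (correlation): for each highly correlated pair,
# drop the later column and keep the earlier one
corr = X_var.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_filtered = X_var.drop(columns=to_drop)

# 4.3 Embedded: rank the remaining features by random forest importance
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_filtered, y)
importances = pd.Series(rf.feature_importances_, index=X_filtered.columns)

print(kept_by_variance)
print(to_drop)
print(importances.sort_values(ascending=False))
```

Filter methods look only at the features (and sometimes the target) without training a model, which makes them cheap; embedded methods reuse what a fitted model has learned, so their ranking depends on the chosen model.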

Key Links and Resources

Udemy's Feature Engineering online course

https://www.udemy.com/feature-engineering-for-machine-learning/

Udemy's Feature Selection online course

https://www.udemy.com/feature-selection-for-machine-learning

JMLR Special Issue on Variable and Feature Selection

http://jmlr.org/papers/special/feature03.html

Data Analysis Using Regression and Multilevel/Hierarchical Models, Chapter 25: Missing data

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

Data mining and the impact of missing data

http://core.ecu.edu/omgt/krosj/IMDSDataMining2003.pdf

PyOD: A Python Toolkit for Scalable Outlier Detection

https://github.com/yzhao062/pyod

Weight of Evidence (WoE) Introductory Overview

http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview

About Feature Scaling and Normalization

http://sebastianraschka.com/Articles/2014_about_feature_scaling.html

Feature Generation with RF, GBDT and Xgboost

https://blog.csdn.net/anshuai_aw1/article/details/82983997

A review of feature selection methods with applications

https://ieeexplore.ieee.org/iel7/7153596/7160221/07160458.pdf
