YC-Coder-Chen / Feature Engineering Handbook

A practical feature engineering handbook

Feature-Engineering-Handbook

Welcome! This repo provides an interactive and complete practical feature engineering tutorial in Jupyter Notebook. It contains three parts: Data Preprocessing, Feature Selection and Dimension Reduction. Each part is demonstrated in its own notebook. Because some feature selection algorithms, such as Simulated Annealing and Genetic Algorithm, lack a complete implementation in Python, we also provide corresponding Python scripts (Simulated Annealing, Genetic Algorithm) and cover them in the tutorial for your reference.
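
To give a feel for what a Simulated Annealing feature selector does, here is a minimal sketch in plain Python. It is not the script shipped with this repo: the function name `simulated_annealing_select`, its parameters, and the `toy_score` function are hypothetical and exist only for illustration. In practice the score would typically be a cross-validated model metric computed on the selected columns.

```python
import math
import random


def simulated_annealing_select(n_features, score_fn, n_iter=200,
                               t_start=1.0, cooling=0.97, seed=0):
    """Search for a feature mask that maximizes score_fn (higher is better)."""
    rng = random.Random(seed)
    # Start from a random non-empty subset, encoded as a list of booleans.
    current = [rng.random() < 0.5 for _ in range(n_features)]
    if not any(current):
        current[rng.randrange(n_features)] = True
    current_score = score_fn(current)
    best, best_score = current[:], current_score
    temperature = t_start

    for _ in range(n_iter):
        # Neighbor: flip one randomly chosen feature in or out.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        if any(candidate):  # empty subsets are never evaluated
            candidate_score = score_fn(candidate)
            delta = candidate_score - current_score
            # Always accept improvements; accept a worse subset with
            # probability exp(delta / T), which shrinks as T cools.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        temperature = max(temperature * cooling, 1e-12)
    return best, best_score


def toy_score(mask):
    # Hypothetical stand-in for a model score: features 0 and 2 are
    # "informative"; every other selected feature costs a small penalty.
    relevant = {0, 2}
    gain = sum(1.0 for i, on in enumerate(mask) if on and i in relevant)
    cost = sum(0.1 for i, on in enumerate(mask) if on and i not in relevant)
    return gain - cost


best_mask, best_score = simulated_annealing_select(6, toy_score)
selected = [i for i, on in enumerate(best_mask) if on]
print(selected, best_score)
```

The occasional acceptance of a worse subset early on is what distinguishes this from greedy search: it lets the search leave a local optimum while the temperature is still high.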

Brief Introduction

Table of Contents

  • 1  Data Preprocessing
    • 1.1  Static Continuous Variables
      • 1.1.1  Discretization
        • 1.1.1.1  Binarization
        • 1.1.1.2  Binning
      • 1.1.2  Scaling
        • 1.1.2.1  Standard Scaling (Z-score standardization)
        • 1.1.2.2  MinMaxScaler (Scale to range)
        • 1.1.2.3  RobustScaler (Anti-outliers scaling)
        • 1.1.2.4  Power Transform (Non-linear transformation)
      • 1.1.3  Normalization
      • 1.1.4  Imputation of missing values
        • 1.1.4.1  Univariate feature imputation
        • 1.1.4.2  Multivariate feature imputation
        • 1.1.4.3  Marking imputed values
      • 1.1.5  Feature Transformation
        • 1.1.5.1  Polynomial Transformation
        • 1.1.5.2  Custom Transformation
    • 1.2  Static Categorical Variables
      • 1.2.1  Ordinal Encoding
      • 1.2.2  One-hot Encoding
      • 1.2.3  Hashing Encoding
      • 1.2.4  Helmert Coding
      • 1.2.5  Sum (Deviation) Coding
      • 1.2.6  Target Encoding
      • 1.2.7  M-estimate Encoding
      • 1.2.8  James-Stein Encoder
      • 1.2.9  Weight of Evidence Encoder
      • 1.2.10  Leave One Out Encoder
      • 1.2.11  Catboost Encoder
    • 1.3  Time Series Variables
      • 1.3.1  Time Series Categorical Features
      • 1.3.2  Time Series Continuous Features
      • 1.3.3  Implementation
        • 1.3.3.1  Create EntitySet
        • 1.3.3.2  Set up cut-time
        • 1.3.3.3  Auto Feature Engineering
  • 2  Feature Selection
    • 2.1  Filter Methods
      • 2.1.1  Univariate Filter Methods
        • 2.1.1.1  Variance Threshold
        • 2.1.1.2  Pearson Correlation (regression problem)
        • 2.1.1.3  Distance Correlation (regression problem)
        • 2.1.1.4  F-Score (regression problem)
        • 2.1.1.5  Mutual Information (regression problem)
        • 2.1.1.6  Chi-squared Statistics (classification problem)
        • 2.1.1.7  F-Score (classification problem)
        • 2.1.1.8  Mutual Information (classification problem)
      • 2.1.2  Multivariate Filter Methods
        • 2.1.2.1  Max-Relevance Min-Redundancy (mRMR)
        • 2.1.2.2  Correlation-based Feature Selection (CFS)
        • 2.1.2.3  Fast Correlation-based Filter (FCBF)
        • 2.1.2.4  ReliefF
        • 2.1.2.5  Spectral Feature Selection (SPEC)
    • 2.2  Wrapper Methods
      • 2.2.1  Deterministic Algorithms
        • 2.2.1.1  Recursive Feature Elimination (SBS)
      • 2.2.2  Randomized Algorithms
        • 2.2.2.1  Simulated Annealing (SA)
        • 2.2.2.2  Genetic Algorithm (GA)
    • 2.3  Embedded Methods
      • 2.3.1  Regularization Based Methods
        • 2.3.1.1  Lasso Regression (Linear Regression with L1 Norm)
        • 2.3.1.2  Logistic Regression (with L1 Norm)
        • 2.3.1.3  LinearSVR/ LinearSVC
      • 2.3.2  Tree Based Methods
  • 3  Dimension Reduction
    • 3.1  Unsupervised Methods
      • 3.1.1  PCA (Principal Components Analysis)
    • 3.2  Supervised Methods
      • 3.2.1  LDA (Linear Discriminant Analysis)
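
As a small illustration of three entries from the outline above (1.1.2.1 Standard Scaling, 1.1.2.2 MinMaxScaler and 2.1.1.1 Variance Threshold), the sketch below implements them with the Python standard library only. The notebooks are not reproduced here; the function names are hypothetical, and the use of the population standard deviation and variance is an assumption of this sketch.

```python
from statistics import fmean, pstdev, pvariance


def standard_scale(values):
    """Z-score standardization: subtract the mean, divide by the std."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [low for _ in values]  # constant column maps to low
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def variance_threshold(columns, threshold=0.0):
    """Return indices of columns whose variance exceeds threshold."""
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
constant = [7.0, 7.0, 7.0, 7.0, 7.0]

z_scores = standard_scale(column)        # mean 0, unit population std
scaled = min_max_scale(column)           # [0.0, 0.25, 0.5, 0.75, 1.0]
kept = variance_threshold([column, constant])  # [0]
print(z_scores, scaled, kept)
```

The constant column is dropped by the variance filter because it carries no information that could separate one sample from another.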

Reference

References have been included in each Jupyter Notebook.

Authors

@Yingxiang Chen
@Zihan Yang

Contact

If there are any mistakes, please feel free to reach out and correct us!

Yingxiang Chen E-mail: [email protected]
Zihan Yang E-mail: [email protected]
