All Projects → Yimeng-Zhang → Feature Engineering And Feature Selection

Yimeng-Zhang / Feature Engineering And Feature Selection

A Guide for Feature Engineering and Feature Selection, with implementations and examples in Python.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Feature Engineering And Feature Selection

Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (-58.56%)
Mutual labels:  jupyter-notebook, data-mining, feature-extraction, feature-engineering
Drugs Recommendation Using Reviews
Analyzing the Drugs Descriptions, conditions, reviews and then recommending it using Deep Learning Models, for each Health Condition of a Patient.
Stars: ✭ 35 (-93.35%)
Mutual labels:  jupyter-notebook, data-mining, feature-engineering
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-83.65%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (-54.37%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
Stars: ✭ 344 (-34.6%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
The Building Data Genome Project
A collection of non-residential buildings for performance analysis and algorithm benchmarking
Stars: ✭ 117 (-77.76%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Machine Learning Workflow With Python
This is a comprehensive ML techniques with python: Define the Problem- Specify Inputs & Outputs- Data Collection- Exploratory data analysis -Data Preprocessing- Model Design- Training- Evaluation
Stars: ✭ 157 (-70.15%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Nlpython
This repository contains the code related to Natural Language Processing using python scripting language. All the codes are related to my book entitled "Python Natural Language Processing"
Stars: ✭ 265 (-49.62%)
Mutual labels:  jupyter-notebook, feature-extraction, feature-engineering
Bike-Sharing-Demand-Kaggle
Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand
Stars: ✭ 33 (-93.73%)
Mutual labels:  feature-extraction, feature-engineering
autoencoders tensorflow
Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Stars: ✭ 66 (-87.45%)
Mutual labels:  feature-extraction, feature-engineering
fastknn
Fast k-Nearest Neighbors Classifier for Large Datasets
Stars: ✭ 64 (-87.83%)
Mutual labels:  feature-extraction, feature-engineering
50-days-of-Statistics-for-Data-Science
This repository consist of a 50-day program. All the statistics required for the complete understanding of data science will be uploaded in this repository.
Stars: ✭ 19 (-96.39%)
Mutual labels:  feature-extraction, feature-engineering
tsflex
Flexible time series feature extraction & processing
Stars: ✭ 252 (-52.09%)
Mutual labels:  feature-extraction, feature-engineering
DataCon
🏆DataCon大数据安全分析大赛,2019年方向二(恶意代码检测)冠军源码、2020年方向五(恶意代码分析)季军源码
Stars: ✭ 69 (-86.88%)
Mutual labels:  data-mining, feature-engineering
mistql
A miniature lisp-like language for querying JSON-like structures. Tuned for clientside ML feature extraction.
Stars: ✭ 260 (-50.57%)
Mutual labels:  feature-extraction, feature-engineering
gan tensorflow
Automatic feature engineering using Generative Adversarial Networks using TensorFlow.
Stars: ✭ 48 (-90.87%)
Mutual labels:  feature-extraction, feature-engineering
Lasio
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Stars: ✭ 234 (-55.51%)
Mutual labels:  jupyter-notebook, data-mining
featurewiz
Use advanced feature engineering strategies and select best features from your data set with a single line of code.
Stars: ✭ 229 (-56.46%)
Mutual labels:  feature-extraction, feature-engineering
feature engine
Feature engineering package with sklearn like functionality
Stars: ✭ 758 (+44.11%)
Mutual labels:  feature-extraction, feature-engineering
Pydataroad
open source for wechat-official-account (ID: PyDataLab)
Stars: ✭ 302 (-42.59%)
Mutual labels:  jupyter-notebook, data-mining

Feature Engineering & Feature Selection

A comprehensive guide [pdf] [markdown] for Feature Engineering and Feature Selection, with implementations and examples in Python.

Motivation

Feature Engineering & Selection is the most essential part of building a useable machine learning project, even though hundreds of cutting-edge machine learning algorithms coming in these days like deep learning and transfer learning. Indeed, like what Prof Domingos, the author of  'The Master Algorithm' says:

“At the end of the day, some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.”

— Prof. Pedro Domingos

001

Data and feature has the most impact on a ML project and sets the limit of how well we can do, while models and algorithms are just approaching that limit. However, few materials could be found that systematically introduce the art of feature engineering, and even fewer could explain the rationale behind. This repo is my personal notes from learning ML and serves as a reference for Feature Engineering & Selection.

Download

Download the PDF here:

Same, but in markdown:

PDF has a much readable format, while Markdown has auto-generated anchor link to navigate from outer source. GitHub sucks at displaying markdown with complex grammar, so I would suggest read the PDF or download the repo and read markdown with Typora.

What You'll Learn

Not only a collection of hands-on functions, but also explanation on Why, How and When to adopt Which techniques of feature engineering in data mining.

  • the nature and risk of data problem we often encounter
  • explanation of the various feature engineering & selection techniques
  • rationale to use it
  • pros & cons of each method
  • code & example

Getting Started

This repo is mainly used as a reference for anyone who are doing feature engineering, and most of the modules are implemented through scikit-learn or its communities.

To run the demos or use the customized function, please download the ZIP file from the repo or just copy-paste any part of the code you find helpful. They should all be very easy to understand.

Required Dependencies:

  • Python 3.5, 3.6 or 3.7
  • numpy>=1.15
  • pandas>=0.23
  • scipy>=1.1.0
  • scikit_learn>=0.20.1
  • seaborn>=0.9.0

Table of Contents and Code Examples

Below is a list of methods currently implemented in the repo.

1. Data Exploration

2. Feature Cleaning

3. Feature Engineering

4. Feature Selection

Key Links and Resources

  • Udemy's Feature Engineering online course

https://www.udemy.com/feature-engineering-for-machine-learning/

  • Udemy's Feature Selection online course

https://www.udemy.com/feature-selection-for-machine-learning

  • JMLR Special Issue on Variable and Feature Selection

http://jmlr.org/papers/special/feature03.html

  • Data Analysis Using Regression and Multilevel/Hierarchical Models, Chapter 25: Missing data

http://www.stat.columbia.edu/~gelman/arm/missing.pdf

  • Data mining and the impact of missing data

http://core.ecu.edu/omgt/krosj/IMDSDataMining2003.pdf

  • PyOD: A Python Toolkit for Scalable Outlier Detection

https://github.com/yzhao062/pyod

  • Weight of Evidence (WoE) Introductory Overview

http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview

  • About Feature Scaling and Normalization

http://sebastianraschka.com/Articles/2014_about_feature_scaling.html

  • Feature Generation with RF, GBDT and Xgboost

https://blog.csdn.net/anshuai_aw1/article/details/82983997

  • A review of feature selection methods with applications

https://ieeexplore.ieee.org/iel7/7153596/7160221/07160458.pdf

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].