
abhayspawar / Featexp

License: MIT
Feature exploration for supervised learning

Projects that are alternatives of or similar to Featexp

Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-87.5%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (-65.12%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Datasist
A Python library for easy data analysis, visualization, exploration and modeling
Stars: ✭ 123 (-82.12%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
Stars: ✭ 344 (-50%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (-68.31%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Open source demos
A collection of demos showcasing automated feature engineering and machine learning in diverse use cases
Stars: ✭ 391 (-43.17%)
Mutual labels:  jupyter-notebook, data-science, feature-engineering
Data Science Portfolio
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Stars: ✭ 559 (-18.75%)
Mutual labels:  jupyter-notebook, data-science
Data Analysis And Machine Learning Projects
Repository of teaching materials, code, and data for my data analysis and machine learning projects.
Stars: ✭ 5,166 (+650.87%)
Mutual labels:  jupyter-notebook, data-science
Sigma coding youtube
This is a collection of all the code that can be found on my YouTube channel Sigma Coding.
Stars: ✭ 611 (-11.19%)
Mutual labels:  jupyter-notebook, data-science
Speech Emotion Analyzer
The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
Stars: ✭ 633 (-7.99%)
Mutual labels:  jupyter-notebook, data-science
Feature Selection
Features selector based on the self selected-algorithm, loss function and validation method
Stars: ✭ 534 (-22.38%)
Mutual labels:  data-science, feature-engineering
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+722.09%)
Mutual labels:  jupyter-notebook, data-science
Zero To Mastery Ml
All course materials for the Zero to Mastery Machine Learning and Data Science course.
Stars: ✭ 631 (-8.28%)
Mutual labels:  jupyter-notebook, data-science
Probabilistic Programming And Bayesian Methods For Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Stars: ✭ 23,912 (+3375.58%)
Mutual labels:  jupyter-notebook, data-science
Cookbook 2nd Code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (-21.37%)
Mutual labels:  jupyter-notebook, data-science
Datasets For Recommender Systems
This is a repository of a topic-centric public data sources in high quality for Recommender Systems (RS)
Stars: ✭ 564 (-18.02%)
Mutual labels:  jupyter-notebook, data-science
Intro To Python
An intro to Python & programming for wanna-be data scientists
Stars: ✭ 536 (-22.09%)
Mutual labels:  jupyter-notebook, data-science
Fastai2
Temporary home for fastai v2 while it's being developed
Stars: ✭ 630 (-8.43%)
Mutual labels:  jupyter-notebook, data-science
Tsfresh
Automatic extraction of relevant features from time series:
Stars: ✭ 6,077 (+783.28%)
Mutual labels:  jupyter-notebook, data-science
Featuretools
An open source python library for automated feature engineering
Stars: ✭ 5,891 (+756.25%)
Mutual labels:  data-science, feature-engineering

featexp

Feature exploration for supervised learning. Helps with feature understanding, identifying noisy features, feature debugging, leakage detection and model monitoring.

Installation

pip install featexp

Using featexp

Detailed Medium post on using featexp. Translations from web: Chinese, Russian

featexp draws plots similar to partial dependence plots (PDPs), but it computes them directly from the data rather than from a trained model, as current PDP implementations do. Because the plots come straight from the data, they help you understand each feature before modeling and build better ML models.

from featexp import get_univariate_plots
get_univariate_plots(data=data_train, target_col='target', data_test=data_test, features_list=['DAYS_EMPLOYED'])

# data_test and features_list are optional.
# Draws plots for all columns if features_list is not passed.
# Draws only train data plots if data_test is not passed.

[Image: Output1]

featexp bins a feature into equal-population bins and shows the mean value of the dependent variable (target) in each bin. Here's how to read these plots:

  1. The trend plot on the left helps you understand the relationship between the target and the feature.
  2. The population distribution helps you check that the feature is populated as expected.
  3. The plot also shows the number of trend direction changes and the correlation between the train and test trends, both of which can be used to identify noisy features. A high number of trend changes or a low trend correlation implies high noise.
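The binning behind these plots can be illustrated with a minimal, dependency-free sketch. This is not featexp's implementation: the function name `equal_population_bins`, its parameters, and the handling of ties and missing values are assumptions made for illustration only.

```python
def equal_population_bins(values, targets, n_bins=10):
    """Group (value, target) pairs into roughly equal-sized bins by value.

    Hypothetical helper, not part of featexp. Returns one dict per bin with
    the bin bounds, its population, and the mean target inside it.
    Missing values (None) are collected in a separate final bin.
    """
    pairs = sorted(
        ((v, t) for v, t in zip(values, targets) if v is not None),
        key=lambda p: p[0],
    )
    null_targets = [t for v, t in zip(values, targets) if v is None]

    bins = []
    n = len(pairs)
    for i in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        start, end = i * n // n_bins, (i + 1) * n // n_bins
        chunk = pairs[start:end]
        if not chunk:
            continue  # fewer rows than bins: skip empty slices
        bins.append({
            "low": chunk[0][0],
            "high": chunk[-1][0],
            "population": len(chunk),
            "mean_target": sum(t for _, t in chunk) / len(chunk),
        })

    if null_targets:
        bins.append({
            "low": None,
            "high": None,
            "population": len(null_targets),
            "mean_target": sum(null_targets) / len(null_targets),
        })
    return bins


# Toy data: the target switches from 0 to 1 halfway through the value range.
example_bins = equal_population_bins(list(range(1, 21)), [0] * 10 + [1] * 10, n_bins=4)
for b in example_bins:
    print(b)
```

Plotting `mean_target` against the bin order gives the trend plot, and `population` gives the distribution plot. Note that this sketch may split tied values across neighbouring bins, which a percentile-based cut would avoid.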

Example of a noisy feature, which has a low trend correlation:

[Image: Noisy feature]
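To make the two noise indicators concrete, here is a rough sketch of how trend changes and trend correlation could be computed from per-bin mean targets. The function names are hypothetical, and featexp's own calculation may differ, for example in how it treats very small fluctuations between bins.

```python
import math


def count_trend_changes(bin_means):
    """Count how often the bin-to-bin direction of the trend flips."""
    diffs = [b - a for a, b in zip(bin_means, bin_means[1:])]
    # Flat steps carry no direction, so they are ignored in this sketch.
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def trend_correlation(train_means, test_means):
    """Pearson correlation between train and test per-bin mean targets."""
    n = len(train_means)
    if n != len(test_means) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 bins")
    mean_x = sum(train_means) / n
    mean_y = sum(test_means) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_means, test_means))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in train_means))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in test_means))
    if std_x == 0 or std_y == 0:
        # A constant trend has no defined correlation; this sketch reports 0.
        return 0.0
    return cov / (std_x * std_y)


train = [0.10, 0.30, 0.20, 0.40]  # up, down, up: two direction flips
test = [0.12, 0.18, 0.25, 0.33]   # steadily increasing
print(count_trend_changes(train), count_trend_changes(test))
print(round(trend_correlation(train, test), 3))
```

A feature whose train trend flips direction often, or whose train and test trends disagree, would score poorly on these two measures.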

Getting binned feature stats

univariate_plotter returns the mean target and the population in each bin of a feature.

from featexp import univariate_plotter
binned_data_train, binned_data_test = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED', data_test=data_test)
# For only train data
binned_data_train = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED')

Getting stats for all features

Returns trend changes and trend correlation for all features in a dataframe

from featexp import get_trend_stats
stats = get_trend_stats(data=data_train, target_col='target', data_test=data_test)

# data_test is optional. If not passed, trend correlations aren't calculated.

The returned dataframe lists the trend changes and trend correlation of each feature, which can be used for dropping noisy features, etc.

[Image: Output1]
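One way to act on these stats is to flag features that exceed a noise threshold. The sketch below works on plain dicts so that it has no dependencies. The column names `Feature`, `Trend_changes`, and `Trend_correlation`, the thresholds, and the helper name are all assumptions for illustration, so check them against the dataframe you actually get back.

```python
def select_noisy_features(rows, max_trend_changes=3, min_trend_correlation=0.9):
    """Return names of features that look noisy under the given thresholds.

    Hypothetical helper, not part of featexp. `rows` is a list of dicts,
    one per feature, with assumed keys 'Feature', 'Trend_changes' and,
    when test data was supplied, 'Trend_correlation'.
    """
    noisy = []
    for row in rows:
        too_many_changes = row["Trend_changes"] > max_trend_changes
        corr = row.get("Trend_correlation")
        # A missing correlation (no test data) is not counted against a feature.
        unstable = corr is not None and corr < min_trend_correlation
        if too_many_changes or unstable:
            noisy.append(row["Feature"])
    return noisy


# Made-up rows purely to exercise the helper; these are not real results.
example_rows = [
    {"Feature": "feature_a", "Trend_changes": 1, "Trend_correlation": 0.98},
    {"Feature": "feature_b", "Trend_changes": 6, "Trend_correlation": 0.95},
    {"Feature": "feature_c", "Trend_changes": 2, "Trend_correlation": 0.40},
]
print(select_noisy_features(example_rows))
```

With a pandas dataframe, `stats.to_dict('records')` produces rows in this shape, and the resulting names can be passed to `data_train.drop(columns=...)`. Suitable thresholds depend on the dataset and the number of bins.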

Leakage detection

The plots also help identify why a feature is leaky, which makes debugging easier.

[Image: Leaky feature]

In this example the Nulls bin has a 0% mean target while every other bin has a 100% mean target. This implies the feature is populated only when target = 1.
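The same leakage pattern can be checked numerically by comparing the mean target of rows where the feature is missing against rows where it is populated. This is a minimal sketch with a hypothetical helper name and toy data, not a featexp function.

```python
def null_vs_populated_target(values, targets):
    """Compare the mean target for missing vs. populated feature values."""

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    null_targets = [t for v, t in zip(values, targets) if v is None]
    populated_targets = [t for v, t in zip(values, targets) if v is not None]
    return {
        "null_population": len(null_targets),
        "null_mean_target": mean(null_targets),
        "populated_population": len(populated_targets),
        "populated_mean_target": mean(populated_targets),
    }


# Toy data: the feature is filled in only for rows where target == 1.
feature = [None, None, 5, 7, None, 3]
target = [0, 0, 1, 1, 0, 1]
print(null_vs_populated_target(feature, target))
```

A null mean target near 0 combined with a populated mean target near 1 (or the reverse) suggests the feature is recorded only after the outcome is known.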

Citing featexp

