
abhayspawar / Featexp

Licence: MIT

Feature exploration for supervised learning

Labels

jupyter-notebook machine-learning data-science visualization feature-engineering

Projects that are alternatives of or similar to Featexp

Kaggle Competitions

There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.

Stars: ✭ 86 (-87.5%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Deep Learning Machine Learning Stock

Stock for Deep Learning and Machine Learning

Stars: ✭ 240 (-65.12%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Datasist

A Python library for easy data analysis, visualization, exploration and modeling

Stars: ✭ 123 (-82.12%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Deltapy

DeltaPy - Tabular Data Augmentation (by @firmai)

Stars: ✭ 344 (-50%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Amazing Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Stars: ✭ 218 (-68.31%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Open source demos

A collection of demos showcasing automated feature engineering and machine learning in diverse use cases

Stars: ✭ 391 (-43.17%)

Mutual labels: jupyter-notebook, data-science, feature-engineering

Data Science Portfolio

Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.

Stars: ✭ 559 (-18.75%)

Mutual labels: jupyter-notebook, data-science

Data Analysis And Machine Learning Projects

Repository of teaching materials, code, and data for my data analysis and machine learning projects.

Stars: ✭ 5,166 (+650.87%)

Mutual labels: jupyter-notebook, data-science

Sigma coding youtube

This is a collection of all the code that can be found on my YouTube channel Sigma Coding.

Stars: ✭ 611 (-11.19%)

Mutual labels: jupyter-notebook, data-science

Speech Emotion Analyzer

The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)

Stars: ✭ 633 (-7.99%)

Mutual labels: jupyter-notebook, data-science

Feature Selection

Features selector based on the self selected-algorithm, loss function and validation method

Stars: ✭ 534 (-22.38%)

Mutual labels: data-science, feature-engineering

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+722.09%)

Mutual labels: jupyter-notebook, data-science

Zero To Mastery Ml

All course materials for the Zero to Mastery Machine Learning and Data Science course.

Stars: ✭ 631 (-8.28%)

Mutual labels: jupyter-notebook, data-science

Probabilistic Programming And Bayesian Methods For Hackers

aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)

Stars: ✭ 23,912 (+3375.58%)

Mutual labels: jupyter-notebook, data-science

Cookbook 2nd Code

Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]

Stars: ✭ 541 (-21.37%)

Mutual labels: jupyter-notebook, data-science

Datasets For Recommender Systems

This is a repository of a topic-centric public data sources in high quality for Recommender Systems (RS)

Stars: ✭ 564 (-18.02%)

Mutual labels: jupyter-notebook, data-science

Intro To Python

An intro to Python & programming for wanna-be data scientists

Stars: ✭ 536 (-22.09%)

Mutual labels: jupyter-notebook, data-science

Fastai2

Temporary home for fastai v2 while it's being developed

Stars: ✭ 630 (-8.43%)

Mutual labels: jupyter-notebook, data-science

Tsfresh

Automatic extraction of relevant features from time series:

Stars: ✭ 6,077 (+783.28%)

Mutual labels: jupyter-notebook, data-science

Featuretools

An open source python library for automated feature engineering

Stars: ✭ 5,891 (+756.25%)

Mutual labels: data-science, feature-engineering


featexp

Feature exploration for supervised learning. Helps with feature understanding, identifying noisy features, feature debugging, leakage detection and model monitoring.

Installation

pip install featexp

Using featexp

Detailed Medium post on using featexp. Translations from web: Chinese, Russian

featexp draws plots similar to partial dependence plots (PDPs), but it draws them directly from the data instead of from a trained model, which is what current PDP implementations use. Because the plots come straight from the data, they help you understand the features and build better ML models.

from featexp import get_univariate_plots
get_univariate_plots(data=data_train, target_col='target', data_test=data_test, features_list=['DAYS_EMPLOYED'])

# data_test and features_list are optional.
# Draws plots for all columns if features_list is not passed.
# Draws only train data plots if data_test is not passed.

featexp bins a feature into equal-population bins and shows the mean value of the dependent variable (target) in each bin. Here's how to read these plots:

The trend plot on the left shows the relationship between the target and the feature.
The population distribution helps you verify that the feature is correct.
The plots also show the number of trend direction changes and the correlation between the train and test trends, which can be used to identify noisy features. A high number of trend changes or a low trend correlation implies high noise.

Example of a noisy feature: it has a low trend correlation.
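
The binning described above can be sketched with pandas alone. This is a minimal illustration and not featexp's own implementation: the helper name bin_feature, the use of pd.qcut, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def bin_feature(data, feature, target_col, bins=10):
    """Hypothetical helper: bin `feature` into roughly equal-population
    bins and report the row count and mean target in each bin."""
    # qcut picks quantile-based edges so each bin holds a similar number of rows;
    # duplicates="drop" merges repeated edges when the feature has many tied values.
    binned = pd.qcut(data[feature], q=bins, duplicates="drop")
    summary = (
        data.groupby(binned, observed=True)[target_col]
        .agg(samples="count", target_mean="mean")
        .reset_index()
    )
    return summary


# Synthetic data: the probability of target == 1 rises with x.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = (rng.random(1000) < 1 / (1 + np.exp(-2 * x))).astype(int)
df = pd.DataFrame({"x": x, "target": y})

summary = bin_feature(df, "x", "target", bins=10)
print(summary)
```

Plotting target_mean against the bins gives a trend plot of the kind described above, and plotting samples gives the population distribution.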

Getting binned feature stats

Returns mean target and population in each bin of a feature

from featexp import univariate_plotter

# With test data: returns binned stats for both train and test
binned_data_train, binned_data_test = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED', data_test=data_test)
# For only train data
binned_data_train = univariate_plotter(data=data_train, target_col='target', feature='DAYS_EMPLOYED')

Getting stats for all features

Returns trend changes and trend correlation for all features in a dataframe

from featexp import get_trend_stats
stats = get_trend_stats(data=data_train, target_col='target', data_test=data_test)

# data_test is optional. If not passed, trend correlations aren't calculated.

Returns a dataframe with trend changes and trend correlation which can be used for dropping the noisy features, etc.
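
To make the two statistics concrete, here is a simplified sketch of how trend changes and trend correlation could be computed from per-bin target means. It is an assumption-laden illustration, not featexp's actual algorithm: the function names are hypothetical, every sign flip is counted regardless of size, and train and test are assumed to share the same bins.

```python
import numpy as np


def count_trend_changes(bin_means):
    """Hypothetical helper: count how often the bin-to-bin direction
    of the mean target flips."""
    diffs = np.diff(np.asarray(bin_means, dtype=float))
    signs = np.sign(diffs[diffs != 0])  # ignore flat steps
    return int(np.sum(signs[1:] != signs[:-1]))


def trend_correlation(train_means, test_means):
    """Hypothetical helper: Pearson correlation between train and test
    bin means, assuming both were computed on the same bins."""
    return float(np.corrcoef(train_means, test_means)[0, 1])


# Made-up bin means for a smooth feature and a noisy one.
smooth_train = [0.10, 0.12, 0.15, 0.19, 0.24]
smooth_test = [0.11, 0.12, 0.16, 0.18, 0.25]
noisy_train = [0.10, 0.20, 0.12, 0.22, 0.11]
noisy_test = [0.18, 0.11, 0.21, 0.10, 0.19]

print(count_trend_changes(smooth_train), trend_correlation(smooth_train, smooth_test))
print(count_trend_changes(noisy_train), trend_correlation(noisy_train, noisy_test))
```

The smooth feature has no trend changes and a train/test correlation close to 1, while the noisy one flips direction at every step and has a negative correlation, which is the pattern that marks a feature as a candidate for dropping.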

Leakage detection

featexp helps identify why a feature is leaky, which makes debugging easier.

In the example plot, the null bin has a 0% mean target and every other bin has a 100% mean target. This implies the feature is populated only when target = 1.
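
The pattern described above (a 0% mean target for nulls and 100% everywhere else) can also be checked numerically. The sketch below uses plain pandas rather than featexp. The helper name null_vs_populated_target and the toy dataframe are hypothetical and exist only to illustrate the idea.

```python
import numpy as np
import pandas as pd


def null_vs_populated_target(data, feature, target_col):
    """Hypothetical helper: compare the mean target for rows where
    `feature` is null against rows where it is populated."""
    is_null = data[feature].isna()
    return {
        "null_mean": data.loc[is_null, target_col].mean(),
        "populated_mean": data.loc[~is_null, target_col].mean(),
        "null_share": is_null.mean(),
    }


# Toy leaky feature: it is only filled in when target == 1.
df = pd.DataFrame({"target": [0, 1, 0, 1, 1, 0]})
df["leaky"] = np.where(df["target"] == 1, 42.0, np.nan)

report = null_vs_populated_target(df, "leaky", "target")
print(report)
```

A null_mean of 0 next to a populated_mean of 1 is the numeric counterpart of the plot described above, and a strong hint that the feature was recorded after the outcome was known.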

Citing featexp


Stars: ✭ 688
