All Projects → feature-engine → feature_engine

feature-engine / feature_engine

Licence: BSD-3-Clause License
Feature engineering package with sklearn like functionality

Programming Languages

python
139335 projects - #7 most used programming language
TeX
3793 projects

Projects that are alternatives of or similar to feature engine

featurewiz
Use advanced feature engineering strategies and select best features from your data set with a single line of code.
Stars: ✭ 229 (-69.79%)
Mutual labels:  feature-selection, feature-extraction, feature-engineering
50-days-of-Statistics-for-Data-Science
This repository consist of a 50-day program. All the statistics required for the complete understanding of data science will be uploaded in this repository.
Stars: ✭ 19 (-97.49%)
Mutual labels:  feature-selection, feature-extraction, feature-engineering
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (-71.24%)
Mutual labels:  scikit-learn, feature-extraction, feature-engineering
skrobot
skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of scikit-learn framework.
Stars: ✭ 22 (-97.1%)
Mutual labels:  scikit-learn, feature-selection, feature-engineering
AutoTabular
Automatic machine learning for tabular data. ⚡🔥⚡
Stars: ✭ 51 (-93.27%)
Mutual labels:  scikit-learn, feature-engineering
exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-96.97%)
Mutual labels:  feature-selection, feature-engineering
dominance-analysis
This package can be used for dominance analysis or Shapley Value Regression for finding relative importance of predictors on given dataset. This library can be used for key driver analysis or marginal resource allocation models.
Stars: ✭ 111 (-85.36%)
Mutual labels:  feature-selection, feature-engineering
pyHSICLasso
Versatile Nonlinear Feature Selection Algorithm for High-dimensional Data
Stars: ✭ 125 (-83.51%)
Mutual labels:  feature-selection, feature-extraction
Bike-Sharing-Demand-Kaggle
Top 5th percentile solution to the Kaggle knowledge problem - Bike Sharing Demand
Stars: ✭ 33 (-95.65%)
Mutual labels:  feature-extraction, feature-engineering
autoencoders tensorflow
Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Stars: ✭ 66 (-91.29%)
Mutual labels:  feature-extraction, feature-engineering
sklearn-audio-classification
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP
Stars: ✭ 31 (-95.91%)
Mutual labels:  scikit-learn, feature-engineering
Market-Mix-Modeling
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
Stars: ✭ 31 (-95.91%)
Mutual labels:  feature-selection, feature-engineering
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (-89.45%)
Mutual labels:  feature-selection, feature-engineering
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Stars: ✭ 797 (+5.15%)
Mutual labels:  feature-selection, feature-engineering
FIFA-2019-Analysis
This is a project based on the FIFA World Cup 2019 and Analyzes the Performance and Efficiency of Teams, Players, Countries and other related things using Data Analysis and Data Visualizations
Stars: ✭ 28 (-96.31%)
Mutual labels:  feature-selection, feature-engineering
tsflex
Flexible time series feature extraction & processing
Stars: ✭ 252 (-66.75%)
Mutual labels:  feature-extraction, feature-engineering
Hyperactive
A hyperparameter optimization and data collection toolbox for convenient and fast prototyping of machine-learning models.
Stars: ✭ 182 (-75.99%)
Mutual labels:  scikit-learn, feature-engineering
adapt
Awesome Domain Adaptation Python Toolbox
Stars: ✭ 46 (-93.93%)
Mutual labels:  scikit-learn, feature-selection
fastknn
Fast k-Nearest Neighbors Classifier for Large Datasets
Stars: ✭ 64 (-91.56%)
Mutual labels:  feature-extraction, feature-engineering
gan tensorflow
Automatic feature engineering using Generative Adversarial Networks using TensorFlow.
Stars: ✭ 48 (-93.67%)
Mutual labels:  feature-extraction, feature-engineering

Feature Engine

PythonVersion License https://github.com/feature-engine/feature_engine/blob/master/LICENSE.md PyPI version Conda https://anaconda.org/conda-forge/feature_engine CircleCI https://app.circleci.com/pipelines/github/feature-engine/feature_engine?branch=1.1.X Documentation Status https://feature-engine.readthedocs.io/en/latest/index.html Join the chat at https://gitter.im/feature_engine/community Sponsorship https://www.trainindata.com/ Downloads Downloads DOI DOI

Feature-engine is a Python library with multiple transformers to engineer and select features for use in machine learning models. Feature-engine's transformers follow Scikit-learn's functionality with fit() and transform() methods to learn the transforming parameters from the data and then transform it.

Feature-engine features in the following resources

Blogs about Feature-engine

En Español

Documentation

Current Feature-engine's transformers include functionality for:

  • Missing Data Imputation
  • Categorical Encoding
  • Discretisation
  • Outlier Capping or Removal
  • Variable Transformation
  • Variable Creation
  • Variable Selection
  • Datetime Feature Extraction
  • Preprocessing
  • Scikit-learn Wrappers

Imputation Methods

  • MeanMedianImputer
  • RandomSampleImputer
  • EndTailImputer
  • AddMissingIndicator
  • CategoricalImputer
  • ArbitraryNumberImputer
  • DropMissingData

Encoding Methods

  • OneHotEncoder
  • OrdinalEncoder
  • CountFrequencyEncoder
  • MeanEncoder
  • WoEEncoder
  • PRatioEncoder
  • RareLabelEncoder
  • DecisionTreeEncoder

Discretisation methods

  • EqualFrequencyDiscretiser
  • EqualWidthDiscretiser
  • DecisionTreeDiscretiser
  • ArbitraryDiscreriser

Outlier Handling methods

  • Winsorizer
  • ArbitraryOutlierCapper
  • OutlierTrimmer

Variable Transformation methods

  • LogTransformer
  • LogCpTransformer
  • ReciprocalTransformer
  • PowerTransformer
  • BoxCoxTransformer
  • YeoJohnsonTransformer

Variable Creation:

  • MathematicalCombination
  • CombineWithReferenceFeature
  • CyclicalTransformer

Feature Selection:

  • DropFeatures
  • DropConstantFeatures
  • DropDuplicateFeatures
  • DropCorrelatedFeatures
  • SmartCorrelationSelection
  • ShuffleFeaturesSelector
  • SelectBySingleFeaturePerformance
  • SelectByTargetMeanPerformance
  • RecursiveFeatureElimination
  • RecursiveFeatureAddition
  • DropHighPSIFeatures

Preprocessing

  • MatchVariables

Wrappers:

  • SklearnTransformerWrapper

Installation

From PyPI using pip:

pip install feature_engine

From Anaconda:

conda install -c conda-forge feature_engine

Or simply clone it:

git clone https://github.com/feature-engine/feature_engine.git

Example Usage

>>> import pandas as pd
>>> from feature_engine.encoding import RareLabelEncoder

>>> data = {'var_A': ['A'] * 10 + ['B'] * 10 + ['C'] * 2 + ['D'] * 1}
>>> data = pd.DataFrame(data)
>>> data['var_A'].value_counts()
Out[1]:
A    10
B    10
C     2
D     1
Name: var_A, dtype: int64
>>> rare_encoder = RareLabelEncoder(tol=0.10, n_categories=3)
>>> data_encoded = rare_encoder.fit_transform(data)
>>> data_encoded['var_A'].value_counts()
Out[2]:
A       10
B       10
Rare     3
Name: var_A, dtype: int64

Find more examples in our Jupyter Notebook Gallery or in the documentation.

Contribute

Details about how to contribute can be found in the Contribute Page

Briefly:

  • Fork the repo
  • Clone your fork into your local computer: git clone https://github.com/<YOURUSERNAME>/feature_engine.git
  • navigate into the repo folder cd feature_engine
  • Install Feature-engine as a developer: pip install -e .
  • Optional: Create and activate a virtual environment with any tool of choice
  • Install Feature-engine dependencies: pip install -r requirements.txt and pip install -r test_requirements.txt
  • Create a feature branch with a meaningful name for your feature: git checkout -b myfeaturebranch
  • Develop your feature, tests and documentation
  • Make sure the tests pass
  • Make a PR

Thank you!!

Documentation

Feature-engine documentation is built using Sphinx and is hosted on Read the Docs.

To build the documentation make sure you have the dependencies installed: from the root directory: pip install -r docs/requirements.txt.

Now you can build the docs using: sphinx-build -b html docs build

License

BSD 3-Clause

Donate

Sponsor the maintainer to support her continue expanding Feature-engine.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].