smazzanti / mrmr

License: GPL-3.0
mRMR (minimum-Redundancy-Maximum-Relevance) for automatic feature selection at scale.

Programming Languages

Python

Projects that are alternatives to or similar to mrmr

Reinforcement-Learning-Feature-Selection
Feature selection for maximizing expected cumulative reward
Stars: ✭ 27 (-84.12%)
Mutual labels:  feature-selection
fuseml
FuseML aims to provide an MLOps framework as the medium for dynamically integrating the AI/ML tools of your choice. It's an extensible tool built through collaboration, where Data Engineers and DevOps Engineers can come together and contribute reusable integration code.
Stars: ✭ 73 (-57.06%)
Mutual labels:  mlops
GraphOfDocs
GraphOfDocs: Representing multiple documents as a single graph
Stars: ✭ 13 (-92.35%)
Mutual labels:  feature-selection
hub
Public reusable components for Polyaxon
Stars: ✭ 8 (-95.29%)
Mutual labels:  mlops
scene-recognition-pytorch1.x
Evaluate wandb, tensorboard, neptune, mlflow, etc
Stars: ✭ 37 (-78.24%)
Mutual labels:  mlops
monai-deploy
MONAI Deploy aims to become the de-facto standard for developing, packaging, testing, deploying and running medical AI applications in clinical production.
Stars: ✭ 56 (-67.06%)
Mutual labels:  mlops
domino-research
Projects developed by Domino's R&D team
Stars: ✭ 74 (-56.47%)
Mutual labels:  mlops
L0Learn
Efficient Algorithms for L0 Regularized Learning
Stars: ✭ 74 (-56.47%)
Mutual labels:  feature-selection
mltrace
Coarse-grained lineage and tracing for machine learning pipelines.
Stars: ✭ 449 (+164.12%)
Mutual labels:  mlops
serving-pytorch-models
Serving PyTorch models with TorchServe 🔥
Stars: ✭ 91 (-46.47%)
Mutual labels:  mlops
VickyBytes
Subscribe to this GitHub repo to access the latest tech talks, tech demos, learning materials & modules, and developer community updates!
Stars: ✭ 48 (-71.76%)
Mutual labels:  mlops
Boostcamp-AI-Tech-Product-Serving
[Machine Learning Engineer Basic Guide] Boostcamp AI Tech - Product Serving materials
Stars: ✭ 280 (+64.71%)
Mutual labels:  mlops
qaboard
Algorithm engineering is hard enough: don't spend your time with logistics. QA-Board organizes your runs and lets you visualize, compare and share results.
Stars: ✭ 48 (-71.76%)
Mutual labels:  mlops
aml-registermodel
GitHub Action that allows you to register models to your Azure Machine Learning Workspace.
Stars: ✭ 14 (-91.76%)
Mutual labels:  mlops
Ball
Statistical Inference and Sure Independence Screening via Ball Statistics
Stars: ✭ 22 (-87.06%)
Mutual labels:  feature-selection
FIFA-2019-Analysis
A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and other related aspects using data analysis and data visualizations
Stars: ✭ 28 (-83.53%)
Mutual labels:  feature-selection
mlops-case-study
MLOps Case Study
Stars: ✭ 23 (-86.47%)
Mutual labels:  mlops
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (-52.94%)
Mutual labels:  feature-selection
chitra
A multi-functional library for full-stack Deep Learning. Simplifies Model Building, API development, and Model Deployment.
Stars: ✭ 210 (+23.53%)
Mutual labels:  mlops
skrobot
skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of the scikit-learn framework.
Stars: ✭ 22 (-87.06%)
Mutual labels:  feature-selection

What is mRMR

mRMR, which stands for "minimum Redundancy - Maximum Relevance", is a feature selection algorithm.

Why is it unique

The peculiarity of mRMR is that it is a minimal-optimal feature selection algorithm.
This means it is designed to find the smallest relevant subset of features for a given Machine Learning task.

Selecting the minimum number of useful features is desirable for many reasons:

  • lower memory consumption,
  • less time required for training and inference,
  • comparable (or better) predictive performance,
  • easier explainability of results.

This is why a minimal-optimal method such as mRMR is often preferable.

By contrast, most other methods (for instance, Boruta or Positive-Feature-Importance) are classified as all-relevant, since they identify all the features that have some kind of relationship with the target variable.
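
To make the idea concrete, here is a minimal, illustrative sketch of the greedy mRMR scheme, assuming that relevance is measured with the F-statistic and redundancy with the mean absolute Pearson correlation to the already-selected features (one common formulation; see the references below). It is not this package's exact implementation, just the general recipe:

# illustrative sketch of greedy mRMR selection (not the package's exact implementation)
import pandas as pd
from sklearn.feature_selection import f_classif

def mrmr_sketch(X: pd.DataFrame, y: pd.Series, K: int):
    relevance = pd.Series(f_classif(X, y)[0], index=X.columns)  # F-statistic of each feature vs. y
    corr = X.corr().abs()                                       # pairwise |Pearson correlation|
    selected, candidates = [], list(X.columns)
    for _ in range(K):
        if not selected:
            scores = relevance[candidates]                      # first pick: pure relevance
        else:
            redundancy = corr.loc[candidates, selected].mean(axis=1).clip(lower=1e-6)
            scores = relevance[candidates] / redundancy         # relevance / redundancy quotient
        best = scores.idxmax()
        selected.append(best)
        candidates.remove(best)
    return selected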

When to use mRMR

Due to its efficiency, mRMR is ideal for practical ML applications, where it is necessary to perform feature selection frequently and automatically, in a relatively small amount of time.

For instance, in 2019, Uber engineers published a paper describing how they implemented mRMR in their marketing machine learning platform: Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform.

How to install this package

You can install this package in your environment via pip:

pip install mrmr_selection

And then import it in Python through:

import mrmr

How to use this package

This package is designed to do mRMR selection through different tools, depending on your needs and constraints.

Currently, the following tools are supported (others will be added):

  • Pandas (in-memory)
  • Spark
  • Google BigQuery

The package has a module for each supported tool. Each module has at least these 2 functions:

  • mrmr_classif, for feature selection when the target variable is categorical (binary or multiclass).
  • mrmr_regression, for feature selection when the target variable is numeric.

Let's see some examples.

1. Pandas example

You have a Pandas DataFrame (X) and a Series which is your target variable (y). You want to select the best K features to make predictions on y.

# create some pandas data
import pandas as pd
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=50, n_informative=10, n_redundant=40)
X = pd.DataFrame(X)
y = pd.Series(y)

# select top 10 features using mRMR
from mrmr import mrmr_classif
selected_features = mrmr_classif(X=X, y=y, K=10)

Note: the output of mrmr_classif is a list containing the K selected features. This list is a ranking, so if you want to make a further selection, just take its first elements, as shown below.
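
For example, to keep only the 5 highest-ranked of the 10 features selected above:

# keep only the top 5 of the K=10 selected features (the list is already ranked)
X_top5 = X[selected_features[:5]]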

2. Spark example

# create some spark data
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()
data = [
    (1.0, 1.0, 1.0, 7.0, 1.5, -2.3),
    (2.0, float('NaN'), 2.0, 7.0, 8.5, 6.7),
    (2.0, float('NaN'), 3.0, 7.0, -2.3, 4.4),
    (3.0, 4.0, 3.0, 7.0, 0.0, 0.0),
    (4.0, 5.0, 4.0, 7.0, 12.1, -5.2),
]
columns = ["target", "some_null", "feature", "constant", "other_feature", "another_feature"]
df_spark = session.createDataFrame(data=data, schema=columns)

# select top 2 features using mRMR
import mrmr
selected_features = mrmr.spark.mrmr_regression(df=df_spark, target_column="target", K=2)
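
Assuming the returned value is, as in the Pandas example, the ranked list of selected column names, you can then restrict the Spark DataFrame to the target plus those columns:

# keep the target plus the 2 selected feature columns
df_selected = df_spark.select("target", *selected_features)
df_selected.show()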

3. Google BigQuery example

# initialize BigQuery client
from google.cloud.bigquery import Client
bq_client = Client(credentials=your_credentials)

# select top 20 features using mRMR
import mrmr
selected_features = mrmr.bigquery.mrmr_regression(
    bq_client=bq_client,
    table_id='bigquery-public-data.covid19_open_data.covid19_open_data',
    target_column='new_deceased',
    K=20
)
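
As in the other examples, the returned value is the ranked list of selected column names. As an illustrative follow-up (the query below is a sketch for demonstration, not part of the package), you could pull only the target plus those columns out of BigQuery:

# illustrative follow-up: query only the target plus the selected columns
query = "SELECT new_deceased, {} FROM `bigquery-public-data.covid19_open_data.covid19_open_data`".format(
    ", ".join(selected_features)
)
df_selected = bq_client.query(query).to_dataframe()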

Reference

For an easy-going introduction to mRMR, read my article on Towards Data Science: “MRMR” Explained Exactly How You Wished Someone Explained to You.

Also, this article describes an example of mRMR used on the world famous MNIST dataset: Feature Selection: How To Throw Away 95% of Your Data and Get 95% Accuracy.

mRMR was born in 2003; this is the original paper: Minimum Redundancy Feature Selection From Microarray Gene Expression Data.

Since then, it has been used in many practical applications, due to its simplicity and effectiveness. For instance, in 2019, Uber engineers published a paper describing how they implemented mRMR in their marketing machine learning platform: Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].