
NVIDIA-Merlin / NVTabular

License: Apache-2.0
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
C++
36643 projects - #6 most used programming language
Shell
77523 projects

Projects that are alternatives of or similar to NVTabular

exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-97.11%)
Mutual labels:  feature-selection, feature-engineering
featurewiz
Use advanced feature engineering strategies and select best features from your data set with a single line of code.
Stars: ✭ 229 (-71.27%)
Mutual labels:  feature-selection, feature-engineering
dominance-analysis
This package can be used for dominance analysis or Shapley Value Regression to find the relative importance of predictors in a given dataset. It can be used for key driver analysis or marginal resource allocation models.
Stars: ✭ 111 (-86.07%)
Mutual labels:  feature-selection, feature-engineering
Graph Networks
A continually updated and refined list of interesting graph neural network (GNN) links, with a primary interest in recommendations and TensorFlow
Stars: ✭ 192 (-75.91%)
Mutual labels:  recommendation-system, recommender-system
skrobot
skrobot is a Python module for designing, running, and tracking machine learning experiments / tasks. It is built on top of the scikit-learn framework.
Stars: ✭ 22 (-97.24%)
Mutual labels:  feature-selection, feature-engineering
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (-74.65%)
Mutual labels:  recommendation-system, recommender-system
Market-Mix-Modeling
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
Stars: ✭ 31 (-96.11%)
Mutual labels:  feature-selection, feature-engineering
Movie Recommender System
Basic Movie Recommendation Web Application using user-item collaborative filtering.
Stars: ✭ 85 (-89.34%)
Mutual labels:  recommendation-system, recommender-system
AutoTS
Automated Time Series Forecasting
Stars: ✭ 665 (-16.56%)
Mutual labels:  preprocessing, feature-engineering
FIFA-2019-Analysis
This is a project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and other related aspects using data analysis and data visualization
Stars: ✭ 28 (-96.49%)
Mutual labels:  feature-selection, feature-engineering
Rival
RiVal recommender system evaluation toolkit
Stars: ✭ 145 (-81.81%)
Mutual labels:  recommendation-system, recommender-system
retailbox
🛍️RetailBox - eCommerce Recommender System using Machine Learning
Stars: ✭ 32 (-95.98%)
Mutual labels:  recommendation-system, recommender-system
Movielens Recommender System Javascript
🍃 Recommender System in JavaScript for the MovieLens Database
Stars: ✭ 105 (-86.83%)
Mutual labels:  recommendation-system, recommender-system
Recommendersystem Dataset
This repository contains some datasets that I have collected in Recommender Systems.
Stars: ✭ 249 (-68.76%)
Mutual labels:  recommendation-system, recommender-system
Recsys
Computational advertising / recommender systems / machine learning / click-through rate (CTR) and conversion rate (CVR) prediction
Stars: ✭ 1,350 (+69.39%)
Mutual labels:  recommendation-system, recommender-system
50-days-of-Statistics-for-Data-Science
This repository consists of a 50-day program. All the statistics required for a complete understanding of data science will be uploaded in this repository.
Stars: ✭ 19 (-97.62%)
Mutual labels:  feature-selection, feature-engineering
Drugs Recommendation Using Reviews
Analyzes drug descriptions, conditions, and reviews, and then recommends drugs for each health condition of a patient using deep learning models.
Stars: ✭ 35 (-95.61%)
Mutual labels:  recommendation-system, feature-engineering
Tensorrec
A TensorFlow recommendation algorithm and framework in Python.
Stars: ✭ 1,130 (+41.78%)
Mutual labels:  recommendation-system, recommender-system
feature engine
Feature engineering package with sklearn-like functionality
Stars: ✭ 758 (-4.89%)
Mutual labels:  feature-selection, feature-engineering
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (-89.96%)
Mutual labels:  feature-selection, feature-engineering

NVTabular


NVTabular is a feature engineering and preprocessing library for tabular data that is designed to easily manipulate terabyte-scale datasets and train deep learning (DL) based recommender systems. It provides a high-level abstraction that simplifies code and accelerates computation on the GPU using the RAPIDS Dask-cuDF library.
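
As a rough sketch of what that operation-level abstraction looks like in practice, a preprocessing workflow can be defined and applied along the following lines (the column names and file paths here are illustrative, not taken from this document):

import nvtabular as nvt
from nvtabular import ops

# Declare which columns to transform and how; >> chains column selections into operators.
cat_features = ["user_id", "item_id"] >> ops.Categorify()
cont_features = ["price", "age"] >> ops.Normalize()

# A Workflow is the computation graph built from those operator nodes.
workflow = nvt.Workflow(cat_features + cont_features)

# Datasets are read lazily in chunks, so they can exceed GPU and CPU memory.
train = nvt.Dataset("train/*.parquet")

# fit() computes the required statistics (category mappings, means/stds);
# transform() applies them and writes the processed data back out.
workflow.fit(train)
workflow.transform(train).to_parquet("train_processed/")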

NVTabular is a component of NVIDIA Merlin, an open source framework for building and deploying recommender systems. It works with the other Merlin components, including Merlin Models, HugeCTR, and Merlin Systems, to provide end-to-end acceleration of recommender systems on the GPU. Extending beyond model training, with NVIDIA’s Triton Inference Server the feature engineering and preprocessing steps performed on the data during training can be automatically applied to incoming data during inference.
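
Because a fitted workflow captures both the operator graph and the statistics learned from the training data, it can be serialized and re-applied to new data, which is roughly what the Triton export path builds on. A minimal sketch, with illustrative paths (the actual Triton ensemble export is handled by the Merlin components mentioned above):

import nvtabular as nvt

# Assumes a workflow was previously fitted and persisted with workflow.save("workflow/").
reloaded = nvt.Workflow.load("workflow/")
batch = nvt.Dataset("incoming_batch.parquet")
reloaded.transform(batch).to_parquet("transformed_batch/")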

Benefits

When training DL recommender systems, data scientists and machine learning (ML) engineers have been faced with the following challenges:

  • Huge Datasets: Commercial recommenders are trained on huge datasets that may be several terabytes in scale.
  • Complex Data Feature Engineering and Preprocessing Pipelines: Datasets need to be preprocessed and transformed so that they can be used with DL models and frameworks. In addition, feature engineering creates an extensive set of new features from existing ones, requiring multiple iterations to arrive at an optimal solution.
  • Input Bottleneck: Data loading, if not well optimized, can be the slowest part of the training process, leading to under-utilization of high-throughput computing devices such as GPUs.
  • Extensive Repeated Experimentation: The entire data engineering, training, and evaluation process can be repetitious and time consuming, requiring significant computational resources.

NVTabular alleviates these challenges and helps data scientists and ML engineers:

  • process datasets that exceed GPU and CPU memory without having to worry about scale (see the Dask sketch after this list).
  • focus on what to do with the data and not how to do it by using abstraction at the operation level.
  • prepare datasets quickly and easily for experimentation so that more models can be trained.
  • deploy models into production by providing faster dataset transformation.
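
The sketch below illustrates the first two points: partitioning a larger-than-memory dataset and letting Dask-cuDF distribute the work across the available GPUs. It assumes the dask-cuda package is installed; the paths, columns, and partition size are illustrative:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import nvtabular as nvt
from nvtabular import ops

# One Dask worker per visible GPU; creating the Client makes it the default scheduler.
# Some NVTabular versions take the client explicitly: nvt.Workflow(features, client=client).
cluster = LocalCUDACluster()
client = Client(cluster)

# Partition the input so that each chunk fits comfortably in GPU memory.
dataset = nvt.Dataset("data/*.parquet", part_size="256MB")

features = ["user_id", "item_id"] >> ops.Categorify()
workflow = nvt.Workflow(features)
workflow.fit(dataset)
workflow.transform(dataset).to_parquet("processed/")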

Learn more in the NVTabular core features documentation.

Performance

When running NVTabular on the Criteo 1TB Click Logs dataset using a single V100 32GB GPU, feature engineering and preprocessing completed in 13 minutes. On a DGX-1 cluster with eight V100 GPUs, the same feature engineering and preprocessing completed within three minutes. Combined with HugeCTR, the dataset can be processed and a full model can be trained in only six minutes.

The performance of the Criteo DLRM workflow also demonstrates the effectiveness of the NVTabular library. The original ETL script, written with NumPy, took over five days to complete; combined with CPU training, the total iteration time was over one week. By optimizing the ETL code in Spark and running it on a DGX-1 equivalent cluster, the time to complete feature engineering and preprocessing was reduced to three hours, while training completed in one hour.

Installation

NVTabular requires Python version 3.7+. Additionally, GPU support requires:

  • CUDA version 11.0+
  • NVIDIA Pascal GPU or later (Compute Capability >=6.0)
  • NVIDIA driver 450.80.02+
  • Linux or WSL

Installing NVTabular Using Conda

NVTabular can be installed with Anaconda from the nvidia channel by running the following command:

conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.2

Installing NVTabular Using Pip

NVTabular can be installed with pip by running the following command:

pip install nvtabular

Installing NVTabular with Pip causes NVTabular to run on the CPU only and might require installing additional dependencies manually. When you run NVTabular in one of our Docker containers, the dependencies are already installed.
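
To confirm that either installation succeeded (the CPU-only pip install is sufficient for this check), you can import the package and print its version, for example:

python -c "import nvtabular; print(nvtabular.__version__)"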

Installing NVTabular with Docker

NVTabular Docker containers are available in the NVIDIA Merlin container repository. The following table summarizes the key information about the containers:

Container Name    | Container Location                                                                    | Functionality
merlin-hugectr    | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr     | NVTabular, HugeCTR, and Triton Inference
merlin-tensorflow | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow  | NVTabular, TensorFlow, and Triton Inference
merlin-pytorch    | https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-pytorch     | NVTabular, PyTorch, and Triton Inference

To use these Docker containers, you'll first need to install the NVIDIA Container Toolkit to provide GPU support for Docker. You can use the NGC links referenced in the table above to obtain more information about how to launch and run these containers. To obtain more information about the software and model versions that NVTabular supports per container, see Support Matrix.
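
As an illustration, one of these containers can typically be started with GPU access along these lines (the image tag here is a placeholder; check the NGC pages above for current tags, and the -p mapping is only needed if you want to reach Jupyter inside the container):

docker run --gpus all --rm -it -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow:latest /bin/bash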

Notebook Examples and Tutorials

We provide a collection of examples, use cases, and tutorials as Jupyter notebooks covering:

  • Feature engineering and preprocessing with NVTabular
  • Advanced workflows with NVTabular
  • Scaling to multi-GPU and multi-node systems
  • Integrating NVTabular with HugeCTR
  • Deploying to inference with Triton

Feedback and Support

If you'd like to contribute to the library directly, see the Contributing.md. We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. To further advance our Merlin Roadmap, we encourage you to share all the details regarding your recommender system pipeline in this survey.

If you're interested in learning more about how NVTabular works, see our NVTabular documentation. We also have API documentation that outlines the specifics of the available calls within the library.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].