
IBM / TabFormer

License: Apache-2.0
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)

Programming Languages

python

Projects that are alternatives to or similar to TabFormer

NLP-paper
🎨 An NLP (natural language processing) tutorial 🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-89%)
Mutual labels:  transformer, gpt, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+651.67%)
Mutual labels:  transformer, gpt, bert
Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+2121.05%)
Mutual labels:  transformer, bert
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+4634.45%)
Mutual labels:  transformer, bert
saint
The official PyTorch implementation of recent paper - SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
Stars: ✭ 209 (+0%)
Mutual labels:  tabular-data, transformer
bert-as-a-service TFX
End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.
Stars: ✭ 32 (-84.69%)
Mutual labels:  transformer, bert
SIGIR2021 Conure
One Person, One Model, One World: Learning Continual User Representation without Forgetting
Stars: ✭ 23 (-89%)
Mutual labels:  transformer, bert
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+1547.37%)
Mutual labels:  transformer, bert
are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Stars: ✭ 128 (-38.76%)
Mutual labels:  transformer, bert
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-89.47%)
Mutual labels:  transformer, bert
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-86.12%)
Mutual labels:  bert, huggingface
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-91.39%)
Mutual labels:  transformer, huggingface
text-generation-transformer
text generation based on transformer
Stars: ✭ 36 (-82.78%)
Mutual labels:  transformer, bert
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-89.47%)
Mutual labels:  transformer, bert
bert in a flask
A dockerized flask API, serving ALBERT and BERT predictions using TensorFlow 2.0.
Stars: ✭ 32 (-84.69%)
Mutual labels:  transformer, bert
semantic-document-relations
Implementation, trained models and result data for the paper "Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles"
Stars: ✭ 21 (-89.95%)
Mutual labels:  transformer, bert
Transformers
πŸ€— Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+26570.81%)
Mutual labels:  transformer, bert
les-military-mrc-rank7
θŽ±ζ–―ζ―οΌšε…¨ε›½η¬¬δΊŒε±Šβ€œε†›δΊ‹ζ™Ίθƒ½ζœΊε™¨ι˜…θ―»β€ζŒ‘ζˆ˜θ΅› - Rank7 θ§£ε†³ζ–Ήζ‘ˆ
Stars: ✭ 37 (-82.3%)
Mutual labels:  transformer, bert
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (-81.34%)
Mutual labels:  transformer, bert
zero-administration-inference-with-aws-lambda-for-hugging-face
Zero administration inference with AWS Lambda for πŸ€—
Stars: ✭ 19 (-90.91%)
Mutual labels:  transformer, huggingface

Tabular Transformers for Modeling Multivariate Time Series

This repository provides the PyTorch source code and data for tabular transformers (TabFormer). Details are described in the paper Tabular Transformers for Modeling Multivariate Time Series, presented at ICASSP 2021.

Summary

  • Modules for hierarchical transformers for tabular data (a minimal sketch follows this list)
  • A synthetic credit card transaction dataset
  • Modified Adaptive Softmax for handling masking
  • Modified DataCollatorForLanguageModeling for tabular data
  • The modules are built on top of HuggingFace's transformers πŸ€—. (HuggingFace is ❀️)
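
For intuition, here is a minimal PyTorch sketch of the hierarchical idea: a field-level transformer turns each row's field tokens into a single row embedding, and a sequence-level transformer then contextualizes those rows across time. The module names, pooling choice, and sizes are illustrative assumptions, not the repo's actual implementation.

import torch
import torch.nn as nn

class FieldLevelEncoder(nn.Module):
    # Encodes the fields of one row into a single row embedding (sketch only).
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=4)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, field_ids):
        # field_ids: (batch, rows, fields) of quantized field tokens
        b, r, f = field_ids.shape
        x = self.embed(field_ids.view(b * r, f))  # (b*r, fields, hidden)
        x = self.encoder(x.transpose(0, 1))       # attend across a row's fields
        return x.mean(dim=0).view(b, r, -1)       # mean-pool to one vector per row

field_enc = FieldLevelEncoder(vocab_size=1000, hidden_size=64)
seq_enc = nn.TransformerEncoder(                  # stand-in for the BERT/GPT2 stage
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2)

tokens = torch.randint(0, 1000, (2, 10, 12))      # 2 samples, 10 rows, 12 fields
rows = field_enc(tokens)                          # (2, 10, 64)
out = seq_enc(rows.transpose(0, 1))               # (rows, batch, hidden)
print(out.shape)                                  # torch.Size([10, 2, 64])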

Requirements

  • Python (3.7)
  • PyTorch (1.6.0)
  • HuggingFace Transformers (3.2.0)
  • scikit-learn (0.23.2)
  • Pandas (1.1.2)

(X) denotes the version the code was tested with.

These can be installed from the provided YAML file by running:

conda env create -f setup.yml

Credit Card Transaction Dataset

The synthetic credit card transaction dataset is provided in ./data/credit_card. There are 24M records with 12 fields. You will need git-lfs to access the data. If you run into LFS bandwidth limits, you can use this direct link to access the data and skip the LFS files by prefixing GIT_LFS_SKIP_SMUDGE=1 to the git clone command.
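
For a quick look at the data once downloaded, here is a minimal pandas sketch (the file name below is an assumption; use whatever the archive in ./data/credit_card extracts to):

import pandas as pd

# "transactions.csv" is an assumed file name, not necessarily the one shipped.
# nrows keeps the peek cheap; the full file has on the order of 24M rows.
df = pd.read_csv("./data/credit_card/transactions.csv", nrows=100_000)
print(df.shape)
print(df.columns.tolist())  # the 12 transaction fields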



PRSA Dataset

For the PRSA dataset, download the PRSA dataset from Kaggle and place it in the ./data/card directory.
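
A quick sanity check that the Kaggle files ended up where the training script expects them (the *.csv glob is an assumption about how the Kaggle archive unpacks):

import glob
import os

files = sorted(glob.glob(os.path.join("data", "card", "*.csv")))
print(f"Found {len(files)} PRSA csv file(s) in data/card")
for path in files[:5]:
    print(" ", os.path.basename(path))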


Tabular BERT

To train a tabular BERT model on the credit card transaction or PRSA dataset, run:

$ python main.py --do_train --mlm --field_ce --lm_type bert \
                 --field_hs 64 --data_type [prsa/card] \
                 --output_dir [output_dir]

Tabular GPT2

To train a tabular GPT2 model on credit card transactions for a particular user id, run:


$ python main.py --do_train --lm_type gpt2 --field_ce --flatten --data_type card \
                 --data_root [path_to_data] --user_ids [user-id] \
                 --output_dir [output_dir]
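
The --flatten flag used above serializes each transaction's field tokens into one flat stream for the causal LM, instead of feeding hierarchical row embeddings. A toy Python sketch of that serialization (the token format and row separator are assumptions, not the repo's actual vocabulary):

def flatten_rows(rows):
    # Turn a list of row dicts into a single token stream (toy example).
    tokens = []
    for row in rows:
        for field, value in row.items():
            tokens.append(f"{field}_{value}")  # field-conditioned token (assumed format)
        tokens.append("[ROW_SEP]")             # row boundary marker (assumed)
    return tokens

example = [{"amount": "25-50", "mcc": "5411"},
           {"amount": "0-25", "mcc": "5812"}]
print(flatten_rows(example))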
    

Description of some options (more can be found in args.py):

  • --data_type choices are prsa and card for the Beijing PM2.5 dataset and the credit card transaction dataset, respectively.
  • --mlm for masked language modeling; the trainer option used for BERT (a toy masking sketch follows this list).
  • --field_hs hidden size for the field-level transformer.
  • --lm_type choices are bert and gpt2.
  • --user_ids option to pick only transactions from particular user ids.
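
To make the --mlm option concrete, here is a toy version of masked-language-model masking applied to a batch of field tokens. The repo's actual mechanism is its modified DataCollatorForLanguageModeling; the 15% probability and the -100 ignore index below are standard HuggingFace/PyTorch conventions, not values taken from this codebase.

import torch

def mask_fields(field_ids, mask_token_id, mlm_prob=0.15):
    # Toy MLM masking over field tokens (a sketch, not the repo's collator).
    labels = field_ids.clone()
    mask = torch.rand(field_ids.shape) < mlm_prob
    labels[~mask] = -100                 # positions ignored by the loss
    inputs = field_ids.clone()
    inputs[mask] = mask_token_id         # replace selected tokens with [MASK]
    return inputs, labels

ids = torch.randint(5, 100, (2, 10))     # toy batch: 2 sequences, 10 field tokens
inputs, labels = mask_fields(ids, mask_token_id=4)
print(inputs.shape, labels.shape)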

Citation

@inproceedings{padhi2021tabular,
  title={Tabular transformers for modeling multivariate time series},
  author={Padhi, Inkit and Schiff, Yair and Melnyk, Igor and Rigotti, Mattia and Mroueh, Youssef and Dognin, Pierre and Ross, Jerret and Nair, Ravi and Altman, Erik},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={3565--3569},
  year={2021},
  organization={IEEE},
  url={https://ieeexplore.ieee.org/document/9414142}
}