
ankit-ai / Bertqa Attention On Steroids

Licence: apache-2.0
BertQA - Attention on Steroids

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Bertqa Attention On Steroids

Jddc solution 4th
4th-place solution from the 2018 JDDC competition
Stars: ✭ 235 (+109.82%)
Mutual labels:  jupyter-notebook, attention, qa, transformer
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+2951.79%)
Mutual labels:  jupyter-notebook, attention, transformer
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+8734.82%)
Mutual labels:  jupyter-notebook, attention, transformer
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (+143.75%)
Mutual labels:  jupyter-notebook, dataset, nlp-machine-learning
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (+162.5%)
Mutual labels:  jupyter-notebook, nlp-machine-learning, transformer
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (-42.86%)
Mutual labels:  jupyter-notebook, attention, transformer
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+266.96%)
Mutual labels:  jupyter-notebook, attention, transformer
Scientificsummarizationdatasets
Datasets I have created for scientific summarization, and a trained BertSum model
Stars: ✭ 100 (-10.71%)
Mutual labels:  jupyter-notebook, dataset, transformer
Symbolic Musical Datasets
🎹 symbolic musical datasets
Stars: ✭ 79 (-29.46%)
Mutual labels:  jupyter-notebook, dataset
Attention Transfer
Improving Convolutional Networks via Attention Transfer (ICLR 2017)
Stars: ✭ 1,231 (+999.11%)
Mutual labels:  jupyter-notebook, attention
Smiles Transformer
Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.
Stars: ✭ 86 (-23.21%)
Mutual labels:  jupyter-notebook, transformer
Machine Learning
My Attempt(s) In The World Of ML/DL....
Stars: ✭ 78 (-30.36%)
Mutual labels:  jupyter-notebook, attention
Cubicasa5k
CubiCasa5k floor plan dataset
Stars: ✭ 98 (-12.5%)
Mutual labels:  jupyter-notebook, dataset
Openml R
R package to interface with OpenML
Stars: ✭ 81 (-27.68%)
Mutual labels:  jupyter-notebook, dataset
Raccoon dataset
The dataset is used to train my own raccoon detector and I blogged about it on Medium
Stars: ✭ 1,177 (+950.89%)
Mutual labels:  jupyter-notebook, dataset
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (-13.39%)
Mutual labels:  attention, transformer
Indonesian Language Models
Indonesian Language Models and its Usage
Stars: ✭ 64 (-42.86%)
Mutual labels:  jupyter-notebook, transformer
Body reconstruction references
Paper, dataset and code collection on human body reconstruction
Stars: ✭ 96 (-14.29%)
Mutual labels:  dataset, code
Objectron
Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes
Stars: ✭ 1,352 (+1107.14%)
Mutual labels:  jupyter-notebook, dataset
Fma
FMA: A Dataset For Music Analysis
Stars: ✭ 1,391 (+1141.96%)
Mutual labels:  jupyter-notebook, dataset

BertQA - Attention on Steroids

Developers: Ankit Chadha ([email protected]) and Rewa Sood ([email protected])


This repository is based on Hugging Face's PyTorch BERT implementation.

This work was done as a class project for CS224n: Natural Language Processing with Deep Learning (Stanford, Winter 2019). At the time of submission, we were #1 on the class's SQuAD leaderboard.



Abstract


In this work, we extend the Bidirectional Encoder Representations from Transformers (BERT) with an emphasis on directed coattention to obtain improved F1 performance on the SQuAD 2.0 dataset. The Transformer architecture on which BERT is based places hierarchical global attention on the concatenation of the context and query. Our additions to the BERT architecture augment this attention with more focused context-to-query and query-to-context attention via a set of modified Transformer encoder units. In addition, we explore adding convolution-based feature extraction within the coattention architecture to add localized information to self-attention. The base BERT architecture with no SQuAD 2.0-specific finetuning produces results with an F1 of 74. We found that coattention significantly improves the no-answer F1 by 4 points while reducing the has-answer F1 score by the same amount. After adding skip connections, the no-answer F1 improved further without an additional loss in has-answer F1. Adding localized feature extraction to attention produced the best results, with an overall dev F1 of 77.03 due to a marked improvement in the has-answer F1 score. We applied our findings to the large BERT model, which contains twice as many layers, and further used our own augmented version of the SQuAD 2.0 dataset created by back translation. Finally, we performed hyperparameter tuning and ensembled our best models for a final F1/EM of 82.148/79.239 (Attention on Steroids, PCE Test Leaderboard).

Neural Architecture


Here is an overview of our network architecture, BERTQA.
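
As a rough illustration of the directed coattention idea described in the abstract, here is a minimal PyTorch sketch of a block that applies context-to-query and query-to-context attention with skip connections and a small convolutional feature extractor. Every layer name, shape, and placement choice in this sketch is an assumption made for clarity; it is not the authors' exact implementation (see the paper and the model code in this repository for the real architecture).

# Illustrative sketch only: a directed coattention block in the spirit of the
# architecture described above. All layer names, shapes, and the placement of
# the convolutional feature extractor are assumptions, not the repo's code.
import torch.nn as nn

class DirectedCoattentionBlock(nn.Module):
    """Context-to-query and query-to-context attention with skip connections
    and a depthwise convolution that injects localized features."""

    def __init__(self, hidden_size=768, num_heads=8, conv_kernel=5):
        super().__init__()
        # Two directed cross-attention units (modified Transformer encoder units).
        self.c2q_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.q2c_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Depthwise 1D convolution over the token dimension for local features.
        self.local_conv = nn.Conv1d(hidden_size, hidden_size, conv_kernel,
                                    padding=conv_kernel // 2, groups=hidden_size)
        self.norm_c = nn.LayerNorm(hidden_size)
        self.norm_q = nn.LayerNorm(hidden_size)

    def forward(self, context, query):
        # context: (batch, ctx_len, hidden); query: (batch, qry_len, hidden)
        c2q, _ = self.c2q_attn(context, query, query)    # context attends to query
        q2c, _ = self.q2c_attn(query, context, context)  # query attends to context
        # Residual (skip) connections followed by layer normalization.
        context = self.norm_c(context + c2q)
        query = self.norm_q(query + q2c)
        # Localized feature extraction on the coattended context representation.
        local = self.local_conv(context.transpose(1, 2)).transpose(1, 2)
        return context + local, query

In the full model, blocks like this would sit on top of the BERT encoder outputs, with the context and query token representations split out of the otherwise concatenated input sequence.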

Dataset (SQuAD 2.Q)


We use an augmented version of the SQuAD 2.0 dataset based on the concept of Back Translation. You can download the dataset here.

To read more about the process of Back Translation, you can refer to this resource.
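
For illustration only, here is a minimal sketch of round-trip back translation using off-the-shelf MarianMT models from the Hugging Face transformers library. The pivot language, model checkpoints, and batching are assumptions made for this example; they are not necessarily the pipeline that was used to build SQuAD 2.Q.

# Illustrative back-translation sketch; not the authors' SQuAD 2.Q pipeline.
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts,
                   en_fr="Helsinki-NLP/opus-mt-en-fr",
                   fr_en="Helsinki-NLP/opus-mt-fr-en"):
    """Translate English -> French -> English to obtain paraphrases."""
    tok_fwd = MarianTokenizer.from_pretrained(en_fr)
    model_fwd = MarianMTModel.from_pretrained(en_fr)
    tok_bwd = MarianTokenizer.from_pretrained(fr_en)
    model_bwd = MarianMTModel.from_pretrained(fr_en)
    # English -> French (pivot language)
    pivot_ids = model_fwd.generate(
        **tok_fwd(texts, return_tensors="pt", padding=True, truncation=True))
    pivot = tok_fwd.batch_decode(pivot_ids, skip_special_tokens=True)
    # French -> English (paraphrased text)
    back_ids = model_bwd.generate(
        **tok_bwd(pivot, return_tensors="pt", padding=True, truncation=True))
    return tok_bwd.batch_decode(back_ids, skip_special_tokens=True)

# Example: paraphrase a SQuAD-style context paragraph.
paraphrases = back_translate(["BERT is pretrained on a large unlabeled text corpus."])

Note that for a SQuAD-style dataset, answer spans have to be re-aligned after back translation, since paraphrasing can move or reword the answer text; that alignment step is the tricky part of building an augmented set and is not shown in this sketch.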

Command Lines


This repository includes bash scripts preconfigured with the optimal hyperparameters our network was tuned with.

1. Sanity Check
#Launch a debug run on 1 example out of the SQuAD 2.0 training set - the Beyonce paragraph.
examples/rundbg.sh

2. Train on SQuAD 2.Q
#Fine-tunes the BERT layers on SQuAD 2.Q and trains the additional directed coattention layers.
run_bertqa_expt.sh

3. Train on SQuAD 2.0
#Fine-tunes the BERT embedding layers on SQuAD 2.0 and trains the additional directed coattention layers.
examples/run_bertqa.sh

BibTeX


@misc{Stanford-CS224n,
  author = {Chadha, Ankit and Sood, Rewa},
  title = {BertQA - Attention on Steroids},
  year = {2019},
  publisher = {Stanford-CS224n},
  howpublished = {\url{https://github.com/ankit-ai/BertQA-Attention-on-Steroids}}
}

Refer to the paper for more details on our chosen hyperparameters.
