
malllabiisc / DiPS

License: Apache-2.0
NAACL 2019: Submodular optimization-based diverse paraphrasing and its effectiveness in data augmentation

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives to or similar to DiPS

text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+218.64%)
Mutual labels:  natural-language-generation, data-augmentation
SGCP
TACL 2020: Syntax-Guided Controlled Generation of Paraphrases
Stars: ✭ 67 (+13.56%)
Mutual labels:  paper, natural-language-generation
Documents
Documentation for Phase 4 Ground
Stars: ✭ 31 (-47.46%)
Mutual labels:  paper
Transferring Gans
ECCV 2018
Stars: ✭ 54 (-8.47%)
Mutual labels:  diversity
Worldgeneration
Generating Interactive Fiction worlds from story plots
Stars: ✭ 43 (-27.12%)
Mutual labels:  natural-language-generation
Dlow
Official PyTorch Implementation of "DLow: Diversifying Latent Flows for Diverse Human Motion Prediction". ECCV 2020.
Stars: ✭ 32 (-45.76%)
Mutual labels:  diversity
Contributor covenant
Pledge your respect and appreciation for contributors of all kinds to your open source project.
Stars: ✭ 1,044 (+1669.49%)
Mutual labels:  diversity
Infogan
Code for reproducing key results in the paper "InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets"
Stars: ✭ 948 (+1506.78%)
Mutual labels:  paper
Pytorch Classification Uncertainty
This repo contains a PyTorch implementation of the paper: "Evidential Deep Learning to Quantify Classification Uncertainty"
Stars: ✭ 59 (+0%)
Mutual labels:  paper
Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+13489.83%)
Mutual labels:  natural-language-generation
Handwriting recogition using adversarial learning
[CVPR 2019] "Handwriting Recognition in Low-resource Scripts using Adversarial Learning", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019.
Stars: ✭ 52 (-11.86%)
Mutual labels:  data-augmentation
Describing a knowledge base
Code for Describing a Knowledge Base
Stars: ✭ 42 (-28.81%)
Mutual labels:  natural-language-generation
Pqg Pytorch
Paraphrase Generation model using pair-wise discriminator loss
Stars: ✭ 33 (-44.07%)
Mutual labels:  natural-language-generation
Convai Baseline
ConvAI baseline solution
Stars: ✭ 49 (-16.95%)
Mutual labels:  natural-language-generation
Essentials
The essential plugin suite for Minecraft servers.
Stars: ✭ 957 (+1522.03%)
Mutual labels:  paper
Multiagent Particle Envs
Code for a multi-agent particle environment used in the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
Stars: ✭ 1,086 (+1740.68%)
Mutual labels:  paper
Nlp xiaojiang
Natural language processing (NLP): Xiaojiang bot (retrieval-based chit-chat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNet sentence embeddings and similarity (text xlnet embedding), text classification (Text classification), entity extraction (NER: BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), synonymous-sentence and synonym generation, sentence-trunk extraction (mainpart), Chinese short-text similarity, text feature engineering, and a keras-http-service interface
Stars: ✭ 954 (+1516.95%)
Mutual labels:  data-augmentation
Neural Architecture Search With Rl
Minimal Tensorflow implementation of the paper "Neural Architecture Search With Reinforcement Learning" presented at ICLR 2017
Stars: ✭ 37 (-37.29%)
Mutual labels:  paper
Style Transfer In Text
Paper List for Style Transfer in Text
Stars: ✭ 1,030 (+1645.76%)
Mutual labels:  paper
Bert In Production
A collection of resources on using BERT (https://arxiv.org/abs/1810.04805) and related Language Models in production environments.
Stars: ✭ 58 (-1.69%)
Mutual labels:  paper

Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation

Source code for the NAACL 2019 paper: Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation

[Figure] Overview of DiPS during decoding to generate k paraphrases: at each time step, a set of N sequences V(t) is used to determine k < N sequences (X) via submodular maximization. The figure illustrates the motivation behind each submodular component; please see Section 4 of the paper for details.
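As a rough illustration of the selection step (not the authors' exact objective), a monotone submodular score can be greedily maximized with a (1 - 1/e) approximation guarantee: at each round, add the candidate with the largest marginal gain. The sketch below uses a hypothetical unigram-coverage score as a stand-in for the paper's actual submodular components:

# Illustrative sketch: greedy maximization of a monotone submodular score
# to select k diverse candidates out of N. `coverage` is a hypothetical
# stand-in for the paper's combined fidelity/diversity components.
def greedy_submodular_select(candidates, score, k):
    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        base = score(selected)
        # Pick the candidate with the largest marginal gain.
        best = max(remaining, key=lambda c: score(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

def coverage(subset):
    # Number of distinct unigrams covered -- monotone and submodular.
    return len({tok for sent in subset for tok in sent.split()})

beams = [
    "how do i learn python",
    "how can i learn python",
    "what is the best way to learn python",
]
print(greedy_submodular_select(beams, coverage, k=2))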

Dependencies

  • Compatible with Python 3.6
  • Dependencies can be installed using requirements.txt

Dataset

Download the following datasets:

Extract and place them in the data directory. Path: data/<dataset-folder-name>. A sample dataset folder might look like data/quora/<train/test/val>/<src.txt/tgt.txt>.
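For concreteness, the quora example above expands to the following (hypothetical) file tree:

data/
└── quora/
    ├── train/
    │   ├── src.txt
    │   └── tgt.txt
    ├── val/
    │   ├── src.txt
    │   └── tgt.txt
    └── test/
        ├── src.txt
        └── tgt.txt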

Setup

To get the project's source code, clone the GitHub repository:

$ git clone https://github.com/malllabiisc/DiPS

Install virtualenv using the following (optional):

$ [sudo] pip install virtualenv

Create and activate your virtual environment (optional):

$ virtualenv -p python3 venv
$ source venv/bin/activate

Install all the required packages:

$ pip install -r requirements.txt

Install the submodopt package by running the following commands from the root directory of the repository:

$ cd ./packages/submodopt
$ python setup.py install
$ cd ../../
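To confirm the installation succeeded in the active environment (assuming the installed module is importable as submodopt), a quick check:

$ python -c "import submodopt"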

Training the sequence-to-sequence model

python -m src.main -mode train -gpu 0 -use_attn -bidirectional -dataset quora -run_name <run_name>

Create the dictionary for submodular subset selection, used for the semantic similarity component (L2).

To use trained embeddings:

python -m src.create_dict -model trained -run_name <run_name> -gpu 0

To use pretrained word2vec embeddings:

python -m src.create_dict -model pretrained -run_name <run_name> -gpu 0

This will generate the word2vec.pickle file in data/embeddings.
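To sanity-check the generated file, you can load it directly. The snippet below is only for inspection and assumes the pickle holds a token-to-vector mapping; the exact structure may differ:

import pickle

with open('data/embeddings/word2vec.pickle', 'rb') as f:
    word2vec = pickle.load(f)

print(type(word2vec))
# If it is a plain dict of token -> vector, peek at one entry.
if isinstance(word2vec, dict):
    token = next(iter(word2vec))
    print(token, len(word2vec[token]))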

Decoding using submodularity

python -m src.main -mode decode -selec submod -run_name <run_name> -beam_width 10 -gpu 0

Citation

Please cite the following paper if you find this work relevant to your application:

@inproceedings{dips2019,
    title = "Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation",
    author = "Kumar, Ashutosh  and
      Bhattamishra, Satwik  and
      Bhandari, Manik  and
      Talukdar, Partha",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1363",
    pages = "3609--3619"
}

For any clarification, comments, or suggestions, please create an issue or contact [email protected] or Satwik Bhattamishra.
