All Projects → jiacheng-ye → kg_one2set

jiacheng-ye / kg_one2set

Licence: other
Code for our ACL 2021 paper "One2Set: Generating Diverse Keyphrases as a Set"

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to kg one2set

keyphrase-generation-rl
Code for the ACL 19 paper "Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards"
Stars: ✭ 95 (+63.79%)
Mutual labels:  keyphrase-generation, pytorch-implementation
OpenNMT-kpg-release
Keyphrase Generation
Stars: ✭ 162 (+179.31%)
Mutual labels:  keyphrase-generation, pytorch-implementation
ClusterTransformer
Topic clustering library built on Transformer embeddings and cosine similarity metrics.Compatible with all BERT base transformers from huggingface.
Stars: ✭ 36 (-37.93%)
Mutual labels:  pytorch-implementation
PyTorch
An open source deep learning platform that provides a seamless path from research prototyping to production deployment
Stars: ✭ 17 (-70.69%)
Mutual labels:  pytorch-implementation
loc2vec
Pytorch implementation of the Loc2Vec with some modifications for speed
Stars: ✭ 40 (-31.03%)
Mutual labels:  pytorch-implementation
nnDetection
nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Stars: ✭ 355 (+512.07%)
Mutual labels:  pytorch-implementation
Deep-MVLM
A tool for precisely placing 3D landmarks on 3D facial scans based on the paper "Multi-view Consensus CNN for 3D Facial Landmark Placement"
Stars: ✭ 71 (+22.41%)
Mutual labels:  pytorch-implementation
triplet-loss-pytorch
Highly efficient PyTorch version of the Semi-hard Triplet loss ⚡️
Stars: ✭ 79 (+36.21%)
Mutual labels:  pytorch-implementation
ViT-V-Net for 3D Image Registration Pytorch
Vision Transformer for 3D medical image registration (Pytorch).
Stars: ✭ 169 (+191.38%)
Mutual labels:  pytorch-implementation
ResNet-50-CBAM-PyTorch
Implementation of Resnet-50 with and without CBAM in PyTorch v1.8. Implementation tested on Intel Image Classification dataset from https://www.kaggle.com/puneet6060/intel-image-classification.
Stars: ✭ 31 (-46.55%)
Mutual labels:  pytorch-implementation
Magic-VNet
VNet for 3d volume segmentation
Stars: ✭ 45 (-22.41%)
Mutual labels:  pytorch-implementation
DCAN
[AAAI 2020] Code release for "Domain Conditioned Adaptation Network" https://arxiv.org/abs/2005.06717
Stars: ✭ 27 (-53.45%)
Mutual labels:  pytorch-implementation
TS3000 TheChatBOT
Its a social networking chat-bot trained on Reddit dataset . It supports open bounded queries developed on the concept of Neural Machine Translation. Beware of its being sarcastic just like its creator 😝 BDW it uses Pytorch framework and Python3.
Stars: ✭ 20 (-65.52%)
Mutual labels:  pytorch-implementation
nvae
An unofficial toy implementation for NVAE 《A Deep Hierarchical Variational Autoencoder》
Stars: ✭ 83 (+43.1%)
Mutual labels:  pytorch-implementation
skillful nowcasting
Implementation of DeepMind's Deep Generative Model of Radar (DGMR) https://arxiv.org/abs/2104.00954
Stars: ✭ 117 (+101.72%)
Mutual labels:  pytorch-implementation
DocTr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Stars: ✭ 202 (+248.28%)
Mutual labels:  pytorch-implementation
Dynamic Model Pruning with Feedback
Implement of Dynamic Model Pruning with Feedback with pytorch
Stars: ✭ 25 (-56.9%)
Mutual labels:  pytorch-implementation
tfvaegan
[ECCV 2020] Official Pytorch implementation for "Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification". SOTA results for ZSL and GZSL
Stars: ✭ 107 (+84.48%)
Mutual labels:  pytorch-implementation
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+86.21%)
Mutual labels:  pytorch-implementation
TailCalibX
Pytorch implementation of Feature Generation for Long-Tail Classification by Rahul Vigneswaran, Marc T Law, Vineeth N Balasubramaniam and Makarand Tapaswi
Stars: ✭ 32 (-44.83%)
Mutual labels:  pytorch-implementation

One2Set

This repository contains the code for our ACL 2021 paper “One2Set: Generating Diverse Keyphrases as a Set”.

Our implementation is built on the source code from keyphrase-generation-rl and fastNLP. Thanks for their work.

If you use this code, please cite our paper:

@inproceedings{ye2021one2set,
  title={One2Set: Generating Diverse Keyphrases as a Set},
  author={Ye, Jiacheng and Gui, Tao and Luo, Yichao and Xu, Yige and Zhang, Qi},
  booktitle={Proceedings of ACL},
  year={2021}
}

Dependency

  • python 3.5+
  • pytorch 1.0+

Dataset

The datasets can be downloaded from here, which are the tokenized version of the datasets provided by Ken Chen:

  • The testsets directory contains the five datasets for testing (i.e., inspec, krapivin, nus, and semeval and kp20k), where each of the datasets contains test_src.txt and test_trg.txt.
  • The kp20k_separated directory contains the training and validation files (i.e., train_src.txt, train_trg.txt, valid_src.txt and valid_trg.txt).
  • Each line of the *_src.txt file is the source document, which contains the tokenized words of title <eos> abstract .
  • Each line of the *_trg.txt file contains the target keyphrases separated by an ; character. The <peos> is used to mark the end of present ground-truth keyphrases and train a separate set loss for SetTrans model. For example, each line can be like present keyphrase one;present keyphrase two;<peos>;absent keyprhase one;absent keyphrase two.

Quick Start

The whole process includes the following steps:

  • Preprocessing: The preprocess.py script numericalizes the train_src.txt, train_trg.txt,valid_src.txt and valid_trg.txt files, and produces train.one2many.pt, valid.one2many.pt and vocab.pt.
  • Training: The train.py script loads the train.one2many.pt, valid.one2many.pt and vocab.pt file and performs training. We evaluate the model every 8000 batches on the valid set, and the model will be saved if the valid loss is lower than the previous one.
  • Decoding: The predict.py script loads the trained model and performs decoding on the five test datasets. The prediction file will be saved, which is like predicted keyphrase one;predicted keyphrase two;…. For SetTrans, we ignore the $\varnothing$ predictions that represent the meaning of “no corresponding keyphrase”.
  • Evaluation: The evaluate_prediction.py script loads the ground-truth and predicted keyphrases, and calculates the $F_1@5$ and $F_1@M$ metrics.

For the sake of simplicity, we provide an one-click script in the script directory. You can run the following command to run the whole process with SetTrans model under One2Set paradigm:

bash scripts/run_one2set.sh

You can also run the baseline Transformer model under One2Seq paradigm with the following command:

bash scripts/run_one2seq.sh

Note:

  • Please download and unzip the datasets in the ./data directory first.
  • To run all the bash files smoothly, you may need to specify the correct home_dir (i.e., the absolute path to kg_one2set dictionary) and the gpu id for CUDA_VISIBLE_DEVICES. We provide a small amount of data to quickly test whether your running environment is correct. You can test by running the following command:
bash scripts/run_small_one2set.sh

Resources

You can download our trained model here. We also provide raw predictions and corresponding evaluation results of three runs with different random seeds here, which contains the following files:

test
├── Full_One2set_Copy_Seed27_Dropout0.1_LR0.0001_BS12_MaxLen6_MaxNum20_LossScalePre0.2_LossScaleAb0.1_Step2_SetLoss
│   ├── inspec
│   │   ├── predictions.txt
│   │   └── results_log_5_M_5_M_5_M.txt
│   ├── kp20k
│   │   ├── predictions.txt
│   │   └── results_log_5_M_5_M_5_M.txt
│   ├── krapivin
│   │   ├── predictions.txt
│   │   └── results_log_5_M_5_M_5_M.txt
│   ├── nus
│   │   ├── predictions.txt
│   │   └── results_log_5_M_5_M_5_M.txt
│   └── semeval
│       ├── predictions.txt
│       └── results_log_5_M_5_M_5_M.txt
├── Full_One2set_Copy_Seed527_Dropout0.1_LR0.0001_BS12_MaxLen6_MaxNum20_LossScalePre0.2_LossScaleAb0.1_Step2_SetLoss
│   ├── ...
└── Full_One2set_Copy_Seed9527_Dropout0.1_LR0.0001_BS12_MaxLen6_MaxNum20_LossScalePre0.2_LossScaleAb0.1_Step2_SetLoss
    ├── ...
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].