
supercoderhawk / deep-keyphrase

License: other
Seq2seq-based keyphrase generation model set, including CopyRNN, CopyCNN and CopyTransformer

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to deep-keyphrase

ake-datasets
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Stars: ✭ 125 (+145.1%)
Mutual labels:  keyword-extraction, keyphrase-extraction, keyphrase-generation
Shakespearizing-Modern-English
Code for "Jhamtani H.*, Gangal V.*, Hovy E. and Nyberg E. Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models" Workshop on Stylistic Variation, EMNLP 2017
Stars: ✭ 64 (+25.49%)
Mutual labels:  seq2seq, copynet
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (+17.65%)
Mutual labels:  keyword-extraction, keyphrase-extraction
dynmt-py
Neural machine translation implementation using dynet's python bindings
Stars: ✭ 17 (-66.67%)
Mutual labels:  seq2seq
kg one2set
Code for our ACL 2021 paper "One2Set: Generating Diverse Keyphrases as a Set"
Stars: ✭ 58 (+13.73%)
Mutual labels:  keyphrase-generation
chatbot
kbqa task-oriented qa seq2seq ir neo4j jena seq2seq tf chatbot chat
Stars: ✭ 32 (-37.25%)
Mutual labels:  seq2seq
DeepLearning-Lab
Code lab for deep learning. Including RNN, seq2seq, word2vec, cross entropy, bidirectional RNN, convolution operation, pooling operation, InceptionV3, transfer learning.
Stars: ✭ 83 (+62.75%)
Mutual labels:  seq2seq
seq3
Source code for the NAACL 2019 paper "SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression"
Stars: ✭ 121 (+137.25%)
Mutual labels:  seq2seq
YodaSpeak
Translating English to Yoda English using Sequence-to-Sequence with Tensorflow.
Stars: ✭ 25 (-50.98%)
Mutual labels:  seq2seq
keras-chatbot-web-api
Simple Keras chatbot using a seq2seq model, served on the web with Flask
Stars: ✭ 51 (+0%)
Mutual labels:  seq2seq
minimal-nmt
A minimal NMT example to serve as a seq2seq+attention reference.
Stars: ✭ 36 (-29.41%)
Mutual labels:  seq2seq
sentence2vec
Deep sentence embedding using Sequence to Sequence learning
Stars: ✭ 23 (-54.9%)
Mutual labels:  seq2seq
CVAE Dial
CVAE_XGate model in paper "Xu, Dusek, Konstas, Rieser. Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity"
Stars: ✭ 16 (-68.63%)
Mutual labels:  seq2seq
adversarial-code-generation
Source code for the ICLR 2021 work "Generating Adversarial Computer Programs using Optimized Obfuscations"
Stars: ✭ 16 (-68.63%)
Mutual labels:  seq2seq
Adversarial-Learning-for-Generative-Conversational-Agents
This repository contains a new adversarial training method for Generative Conversational Agents
Stars: ✭ 71 (+39.22%)
Mutual labels:  seq2seq
position-rank
PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents
Stars: ✭ 89 (+74.51%)
Mutual labels:  keyphrase-extraction
classifier multi label seq2seq attention
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification, seq2seq, attention, beam search
Stars: ✭ 26 (-49.02%)
Mutual labels:  seq2seq
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-45.1%)
Mutual labels:  seq2seq
OpnEco
OpnEco is a Python3 project developed to aid content writers throughout the content writing process. By content writers, for content writers.
Stars: ✭ 18 (-64.71%)
Mutual labels:  keyword-extraction
SIFRank
The code of our paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model"
Stars: ✭ 96 (+88.24%)
Mutual labels:  keyphrase-extraction

deep-keyphrase

Implements several keyphrase generation algorithms

Description

Implemented Paper

CopyRNN

Deep Keyphrase Generation (Meng et al., 2017)
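CopyRNN augments an attentional seq2seq model with a copy mechanism, so the decoder can emit source words directly instead of drawing only from a fixed vocabulary; this is what lets it produce keyphrases containing out-of-vocabulary tokens. Below is a minimal, hypothetical sketch of mixing the generate and copy distributions in the pointer-generator style (a close relative of the CopyNet formulation used in the paper); none of these names come from this repository:

import torch

def mix_generate_and_copy(vocab_logits, attn_scores, src_ids, p_copy,
                          extended_vocab_size):
    """Combine a vocabulary distribution with an attention-based copy
    distribution over source positions (pointer-style sketch).

    vocab_logits: (batch, vocab_size) decoder output logits
    attn_scores:  (batch, src_len) unnormalized attention scores
    src_ids:      (batch, src_len) int64 source token ids in the extended vocab
    p_copy:       (batch, 1) probability of copying, e.g. from a sigmoid gate
    """
    batch = vocab_logits.size(0)
    gen_dist = torch.softmax(vocab_logits, dim=-1)
    copy_dist = torch.softmax(attn_scores, dim=-1)

    # an extended vocabulary holds per-example OOV source words after
    # the in-vocabulary entries
    mixed = torch.zeros(batch, extended_vocab_size)
    mixed[:, : gen_dist.size(1)] = (1.0 - p_copy) * gen_dist
    # scatter-add copy probabilities onto the source token ids
    mixed.scatter_add_(1, src_ids, p_copy * copy_dist)
    return mixed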

ToDo List

CopyCNN

CopyTransformer

Usage

Required files (4 in total):

  1. vocab_file: one word per line (do not include indices!)

    this
    paper
    proposes
    
  2. training, validation, and test files

Data format for training, validation, and test files

JSON Lines format: every line is a JSON object with tokens and keyphrases fields:

{"tokens": ["this", "paper", "proposes", "using", "virtual", "reality", "to", "enhance", "the", "perception", "of", "actions", "by", "distant", "users", "on", "a", "shared", "application", ".", "here", ",", "distance", "may", "refer", "either", "to", "space", "(", "e.g.", "in", "a", "remote", "synchronous", "collaboration", ")", "or", "time", "(", "e.g.", "during", "playback", "of", "recorded", "actions", ")", ".", "our", "approach", "consists", "in", "immersing", "the", "application", "in", "a", "virtual", "inhabited", "3d", "space", "and", "mimicking", "user", "actions", "by", "animating", "avatars", ".", "we", "illustrate", "this", "approach", "with", "two", "applications", ",", "the", "one", "for", "remote", "collaboration", "on", "a", "shared", "application", "and", "the", "other", "to", "playback", "recorded", "sequences", "of", "user", "actions", ".", "we", "suggest", "this", "could", "be", "a", "low", "cost", "enhancement", "for", "telepresence", "."],
 "keyphrases": [["telepresence"], ["animation"], ["avatars"], ["application", "sharing"], ["collaborative", "virtual", "environments"]]}
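
For reference, here is a minimal sketch (not part of this repository; file names are illustrative) that writes records in this format and derives a vocab file; ordering words by frequency is one reasonable convention, assumed here rather than documented:

import json
from collections import Counter

records = [
    {
        "tokens": ["this", "paper", "proposes", "..."],
        "keyphrases": [["keyphrase", "generation"], ["seq2seq"]],
    },
]

counter = Counter()
with open("train.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        # one JSON object per line (JSON Lines)
        f.write(json.dumps(record) + "\n")
        counter.update(record["tokens"])
        for phrase in record["keyphrases"]:
            counter.update(phrase)

# vocab_file: one word per line, most frequent first, no indices
with open("vocab", "w", encoding="utf-8") as f:
    for word, _ in counter.most_common():
        f.write(word + "\n")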

Training

Download the KP20k dataset

mkdir -p data/raw/kp20k_new
# !! unzip the kp20k data and move the files into the above folder manually
python -m nltk.downloader punkt
bash scripts/prepare_kp20k.sh
bash scripts/train_copyrnn_kp20k.sh

# start tensorboard
# enter the experiment result dir; the directory suffix is the experiment start time
cd data/kp20k/copyrnn_kp20k_basic-20191212-080000
# start the tensorboard service
tensorboard --bind_all --logdir logs --port 6006

Notes

  1. Compared with the original seq2seq-keyphrase-pytorch, this project:
    1. fixes implementation errors:
      1. in the copy mechanism
      2. in the mismatch between training and inference (training lacked input feeding while inference used it); see the sketch after this list
    2. simplifies data preparation
    3. adds tensorboard support
    4. provides faster beam search (6x faster on CPU and more than 10x faster on GPU)
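
For context on item 1.2: "input feeding" (Luong et al., 2015) concatenates the previous step's attentional hidden state with the next token embedding, and the fix is about applying it consistently in both training and inference. A minimal, illustrative decoder step is sketched below; the names are assumptions, not this repository's code:

import torch
import torch.nn as nn

class InputFeedingDecoderStep(nn.Module):
    def __init__(self, embed_dim, hidden_dim):
        super().__init__()
        # the LSTM consumes the token embedding concatenated with the
        # previous attentional hidden state (the "input feed")
        self.cell = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.attn_out = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_emb, input_feed, state, encoder_outputs):
        # token_emb: (batch, embed_dim), input_feed: (batch, hidden_dim)
        # state: (h, c) tuple, encoder_outputs: (batch, src_len, hidden_dim)
        rnn_input = torch.cat([token_emb, input_feed], dim=-1)
        h, c = self.cell(rnn_input, state)
        # dot-product attention over encoder outputs
        scores = torch.bmm(encoder_outputs, h.unsqueeze(-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        attn_h = torch.tanh(self.attn_out(torch.cat([h, context], dim=-1)))
        # attn_h becomes the next step's input_feed; using it in both
        # training and inference keeps the two code paths consistent
        return attn_h, (h, c)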