rohithreddy024 / Text Summarizer Pytorch

PyTorch implementation of the paper "A Deep Reinforced Model for Abstractive Summarization" combined with the pointer-generator network

Text-Summarizer-Pytorch

This project combines "A Deep Reinforced Model for Abstractive Summarization" with "Get To The Point: Summarization with Pointer-Generator Networks".

Model Description

  • LSTM-based sequence-to-sequence model for abstractive summarization
  • Pointer mechanism for handling out-of-vocabulary (OOV) words, as in See et al. (2017)
  • Intra-temporal and intra-decoder attention for reducing repeated words, as in Paulus et al. (2018)
  • Self-critical policy gradient training alongside MLE training, as in Paulus et al. (2018)
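
As a rough illustration of the pointer mechanism listed above, the sketch below mixes a generation distribution with a copy distribution over an extended vocabulary, following the idea in See et al. (2017). It is plain Python rather than this repository's code; the function name `final_distribution` and all of its parameters are assumptions made for the example.

```python
def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size):
    """Mix generation and copy probabilities over an extended vocabulary.

    p_gen         -- probability of generating from the fixed vocabulary
    vocab_dist    -- softmax over the fixed vocabulary (sums to 1)
    attention     -- attention weights over source positions (sums to 1)
    source_ids    -- extended-vocabulary id of each source token; OOV words
                     get temporary ids >= len(vocab_dist)
    extended_size -- fixed vocabulary size plus the number of source OOV words
    """
    final = [0.0] * extended_size
    # Generation part: scale the fixed-vocabulary distribution by p_gen.
    for idx, prob in enumerate(vocab_dist):
        final[idx] = p_gen * prob
    # Copy part: add (1 - p_gen) * attention onto the id of each source token.
    for weight, token_id in zip(attention, source_ids):
        final[token_id] += (1.0 - p_gen) * weight
    return final


# Toy example: a 3-word vocabulary and a 2-token source whose first token is OOV (id 3).
dist = final_distribution(
    p_gen=0.6,
    vocab_dist=[0.5, 0.3, 0.2],
    attention=[0.7, 0.3],
    source_ids=[3, 1],
    extended_size=4,
)
```

Because the OOV source word receives probability mass only through the copy term, the decoder can emit it even though it has no entry in the fixed vocabulary.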

Prerequisites

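  • PyTorch
  • TensorFlow (needed to create the .bin data files)
  • Python 2 (for creating the data files) and Python 3 (for training and evaluation)
  • rouge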

Data

  • Download the train and validation pairs (article, title) of the Gigaword dataset provided by OpenNMT from here
  • Copy the files train.article.txt, train.title.txt, valid.article.filter.txt and valid.title.filter.txt to the data/unfinished folder
  • The files are already preprocessed
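
Before converting the data, it can help to confirm that each article file and its title file are aligned. The sketch below is a hypothetical helper, not part of this repository, and it assumes one example per line in both files. `count_pairs` accepts any two iterables of lines, so open file objects work as well as lists.

```python
from itertools import zip_longest


def count_pairs(articles, titles):
    """Count (article, title) pairs and fail if the two inputs differ in length."""
    n = 0
    for article, title in zip_longest(articles, titles):
        if article is None or title is None:
            raise ValueError("article and title inputs differ in length after %d pairs" % n)
        n += 1
    return n


# Toy example with in-memory lines; for real data, pass two open file objects instead.
n_pairs = count_pairs(
    ["first article text", "second article text"],
    ["first title", "second title"],
)
```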

Creating .bin files and vocab file

  • The model accepts data in the form of .bin files.
  • To convert the .txt files into .bin files and split them into chunks, run (requires Python 2 and TensorFlow):
python make_data_files.py
  • You will find the data in the data/chunked folder and the vocab file in the data folder
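
The two ideas behind this step, counting words to build a vocabulary and splitting examples into fixed-size chunks, can be sketched in a few lines. This is an illustration only: it does not reproduce the .bin serialization that make_data_files.py performs with TensorFlow, and the names `build_vocab` and `chunk`, as well as the sizes used, are assumptions.

```python
from collections import Counter


def build_vocab(lines, max_size):
    """Return the max_size most frequent whitespace-separated tokens with their counts."""
    counts = Counter(token for line in lines for token in line.split())
    return counts.most_common(max_size)


def chunk(examples, chunk_size):
    """Split a list of examples into consecutive chunks of at most chunk_size items."""
    return [examples[i:i + chunk_size] for i in range(0, len(examples), chunk_size)]


vocab = build_vocab(["a b a", "a c b"], max_size=2)
chunks = chunk(list(range(5)), chunk_size=2)
```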

Training

  • As suggested in Paulus et al. (2018), first pretrain the seq-to-seq model using MLE (with Python 3):
python train.py --train_mle=yes --train_rl=no --mle_weight=1.0
  • Next, find the best saved model on validation data by running (with Python 3):
python eval.py --task=validate --start_from=0005000.tar
  • After finding the best model (say, 0100000.tar), i.e. the one with the highest ROUGE-L F-score on the validation data, load it and continue with either of the following (with Python 3).
    For MLE + RL training:
python train.py --train_mle=yes --train_rl=yes --mle_weight=0.25 --load_model=0100000.tar --new_lr=0.0001
    For RL-only training:
python train.py --train_mle=no --train_rl=yes --mle_weight=0.0 --load_model=0100000.tar --new_lr=0.0001
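
To make the role of --mle_weight more concrete, here is a minimal sketch of a self-critical loss and a weighted mix with the MLE loss, in the spirit of Paulus et al. (2018). It is not taken from train.py: the function names are hypothetical, and the weighting `mle_weight * mle + (1 - mle_weight) * rl` is an assumption that is merely consistent with the flag values above (1.0 for pure MLE, 0.0 for pure RL).

```python
def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Policy-gradient loss with the greedy output's reward as the baseline.

    sample_log_probs -- log-probabilities of the tokens in the sampled summary
    sample_reward    -- reward (e.g. ROUGE-L F) of the sampled summary
    greedy_reward    -- reward of the greedily decoded summary (baseline)
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the likelihood of samples that beat the baseline.
    return -advantage * sum(sample_log_probs)


def mixed_loss(mle_loss, rl_loss, mle_weight):
    """Weighted combination of the MLE and RL losses (assumed weighting scheme)."""
    return mle_weight * mle_loss + (1.0 - mle_weight) * rl_loss


rl = self_critical_loss([-0.5, -1.0], sample_reward=0.6, greedy_reward=0.4)
total = mixed_loss(mle_loss=2.0, rl_loss=rl, mle_weight=0.25)
```

When the sampled summary scores below the greedy one, the advantage is negative and the same loss pushes the sampled tokens' probabilities down instead.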

Validation

  • To perform validation of RL training, run (with Python 3):
python eval.py --task=validate --start_from=0100000.tar

Testing

  • After finding the best model from RL training (say, 0200000.tar), evaluate it on the test data and get all ROUGE metrics by running (with Python 3):
python eval.py --task=test --load_model=0200000.tar
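
For readers unfamiliar with the metric, the sketch below computes a simple ROUGE-L precision, recall and F-score from the longest common subsequence (LCS) of whitespace tokens. It is a simplified illustration, not the rouge package used by this project, so its numbers may differ from what eval.py reports; the function names are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(hypothesis, reference):
    """Return LCS-based precision, recall and F1 for two whitespace-tokenized strings."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return {"f": 0.0, "p": 0.0, "r": 0.0}
    p = lcs / len(hyp)
    r = lcs / len(ref)
    return {"f": 2 * p * r / (p + r), "p": p, "r": r}


# The first pair from the Examples section of this README.
scores = rouge_l(
    "russian parliament to debate on banning radio liberty",
    "russia 's lower house of parliament mulls challenge to radio liberty",
)
```

For this pair the LCS is "parliament to radio liberty" (4 tokens), giving a precision of 4/8 and a recall of 4/11.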

Results

  • ROUGE scores obtained on the test set with the best MLE-trained model:
    scores: {
    'rouge-1': {'f': 0.4412018559893622, 'p': 0.4814799494024485, 'r': 0.4232331027817015},
    'rouge-2': {'f': 0.23238981595683728, 'p': 0.2531296070596062, 'r': 0.22407861554997008},
    'rouge-l': {'f': 0.40477682528278364, 'p': 0.4584684491434479, 'r': 0.40351107200202596}
    }

  • ROUGE scores obtained on the test set with the best MLE + RL trained model:
    scores: {
    'rouge-1': {'f': 0.4499047033247696, 'p': 0.4853756369556345, 'r': 0.43544461386607497},
    'rouge-2': {'f': 0.24037014314625643, 'p': 0.25903387205387235, 'r': 0.23362662645146298},
    'rouge-l': {'f': 0.41320241732946406, 'p': 0.4616655167980162, 'r': 0.4144419466382236}
    }

  • The training log file is included in the repository

Examples

article: russia 's lower house of parliament was scheduled friday to debate an appeal to the prime minister that challenged the right of u.s.-funded radio liberty to operate in russia following its introduction of broadcasts targeting chechnya .
ref: russia 's lower house of parliament mulls challenge to radio liberty
dec: russian parliament to debate on banning radio liberty

article: continued dialogue with the democratic people 's republic of korea is important although australia 's plan to open its embassy in pyongyang has been shelved because of the crisis over the dprk 's nuclear weapons program , australian foreign minister alexander downer said on friday .
ref: dialogue with dprk important says australian foreign minister
dec: australian fm says dialogue with dprk important

article: water levels in the zambezi river are rising due to heavy rains in its catchment area , prompting zimbabwe 's civil protection unit -lrb- cpu -rrb- to issue a flood alert for people living in the zambezi valley , the herald reported on friday .
ref: floods loom in zambezi valley
dec: water levels rising in zambezi river

article: tens of thousands of people have fled samarra , about ## miles north of baghdad , in recent weeks , expecting a showdown between u.s. troops and heavily armed groups within the city , according to u.s. and iraqi sources .
ref: thousands flee samarra fearing battle
dec: tens of thousands flee samarra expecting showdown with u.s. troops

article: the #### tung blossom festival will kick off saturday with a fun-filled ceremony at the west lake resort in the northern taiwan county of miaoli , a hakka stronghold , the council of hakka affairs -lrb- cha -rrb- announced tuesday .
ref: #### tung blossom festival to kick off saturday
dec: #### tung blossom festival to kick off in miaoli
