dongjun-Lee / Text Summarization Tensorflow

License: MIT
Tensorflow seq2seq Implementation of Text Summarization.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Text Summarization Tensorflow

Text summurization abstractive methods
Multiple implementations of abstractive text summarization, using Google Colab
Stars: ✭ 359 (-31.88%)
Mutual labels:  seq2seq, text-summarization, encoder-decoder
Text summarization with tensorflow
Implementation of a seq2seq model for summarization of textual data. Demonstrated on Amazon reviews, GitHub issues, and news articles.
Stars: ✭ 226 (-57.12%)
Mutual labels:  seq2seq, text-summarization, encoder-decoder
Screenshot To Code
A neural network that transforms a design mock-up into a static website.
Stars: ✭ 13,561 (+2473.24%)
Mutual labels:  seq2seq, encoder-decoder
Speech recognition with tensorflow
Implementation of a seq2seq model for Speech Recognition using the latest version of TensorFlow. Architecture similar to Listen, Attend and Spell.
Stars: ✭ 253 (-51.99%)
Mutual labels:  seq2seq, encoder-decoder
TextSumma
Reimplementation of Neural Summarization by Extracting Sentences and Words
Stars: ✭ 16 (-96.96%)
Mutual labels:  text-summarization, seq2seq
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+87.86%)
Mutual labels:  seq2seq, encoder-decoder
Deep News Summarization
News summarization using sequence to sequence model with attention in TensorFlow.
Stars: ✭ 167 (-68.31%)
Mutual labels:  seq2seq, encoder-decoder
ai-visual-storytelling-seq2seq
Implementation of a seq2seq model for the Visual Storytelling Challenge (VIST): http://visionandlanguage.net/VIST/index.html
Stars: ✭ 50 (-90.51%)
Mutual labels:  seq2seq, encoder-decoder
Embedding
Embedding model code and a summary of study notes
Stars: ✭ 25 (-95.26%)
Mutual labels:  seq2seq, encoder-decoder
NeuralCodeTranslator
Neural Code Translator provides instructions, datasets, and a deep learning infrastructure (based on seq2seq) that aims at learning code transformations
Stars: ✭ 32 (-93.93%)
Mutual labels:  seq2seq, encoder-decoder
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+548.58%)
Mutual labels:  seq2seq, encoder-decoder
Encoder decoder
Four styles of encoder-decoder model in Python, Theano, Keras, and Seq2Seq
Stars: ✭ 269 (-48.96%)
Mutual labels:  seq2seq, encoder-decoder
Transformer Temporal Tagger
Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging
Stars: ✭ 55 (-89.56%)
Mutual labels:  seq2seq, encoder-decoder
Keras Text Summarization
Text summarization using seq2seq in Keras
Stars: ✭ 260 (-50.66%)
Mutual labels:  seq2seq, text-summarization
Tf Seq2seq
Sequence to sequence learning using TensorFlow.
Stars: ✭ 387 (-26.57%)
Mutual labels:  seq2seq, encoder-decoder
Multiwoz
Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
Stars: ✭ 384 (-27.13%)
Mutual labels:  seq2seq
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (-20.3%)
Mutual labels:  seq2seq
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (-27.89%)
Mutual labels:  encoder-decoder
Sumeval
Well-tested, multi-language evaluation framework for text summarization.
Stars: ✭ 483 (-8.35%)
Mutual labels:  text-summarization
Tf seq2seq chatbot
[unmaintained]
Stars: ✭ 420 (-20.3%)
Mutual labels:  seq2seq

tensorflow-text-summarization

Simple TensorFlow implementation of text summarization using the seq2seq library.

Model

Encoder-decoder model with an attention mechanism.

Word Embedding

Used pre-trained GloVe vectors to initialize the word embedding.
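
A minimal sketch of that initialization, assuming a plain-text GloVe file and a word2idx vocabulary mapping (the file path, mapping, and function name are illustrative, not the repository's exact code):

import numpy as np

# Sketch: build an initial embedding matrix from GloVe vectors.
# glove_path and word2idx are assumed inputs, not the repo's exact code.
def load_glove_embeddings(glove_path, word2idx, embedding_size=300):
    # Small random values, so words missing from GloVe still train from scratch.
    embeddings = np.random.uniform(
        -0.1, 0.1, (len(word2idx), embedding_size)).astype(np.float32)
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vector = parts[0], parts[1:]
            if word in word2idx and len(vector) == embedding_size:
                embeddings[word2idx[word]] = np.asarray(vector, dtype=np.float32)
    return embeddings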

Encoder

Used LSTM cells with tf.contrib.rnn.stack_bidirectional_dynamic_rnn.
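
A minimal sketch of such an encoder with the TensorFlow 1.x contrib API (function and variable names are illustrative assumptions):

import tensorflow as tf

# Sketch: stacked bidirectional LSTM encoder.
# encoder_inputs: [batch, time, embedding_size]; source_lengths: [batch].
def build_encoder(encoder_inputs, source_lengths, num_layers=2, num_hidden=150):
    cells_fw = [tf.nn.rnn_cell.LSTMCell(num_hidden) for _ in range(num_layers)]
    cells_bw = [tf.nn.rnn_cell.LSTMCell(num_hidden) for _ in range(num_layers)]
    # Outputs concatenate the forward and backward directions at each layer.
    outputs, state_fw, state_bw = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw, cells_bw, encoder_inputs,
        sequence_length=source_lengths, dtype=tf.float32)
    return outputs, state_fw, state_bw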

Decoder

Used an LSTM BasicDecoder for training and a BeamSearchDecoder for inference.
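
In the TF 1.x contrib.seq2seq API, that pairing might look like the sketch below; all argument names are assumptions, and with an attention-wrapped cell the state tiling is slightly more involved than shown:

import tensorflow as tf

# Sketch: BasicDecoder (teacher forcing) for training,
# BeamSearchDecoder for inference. Weights are shared by reusing the
# same cell and projection layer in both graphs.
def build_decoders(decoder_cell, initial_state, decoder_inputs, target_lengths,
                   embedding_matrix, start_tokens, end_token, beam_width,
                   vocab_size, max_iter=50):
    projection = tf.layers.Dense(vocab_size, use_bias=False)

    # Training: TrainingHelper feeds the ground-truth previous word at each step.
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_inputs, target_lengths)
    train_decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper, initial_state, output_layer=projection)
    train_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(train_decoder)

    # Inference: keep the beam_width most likely partial summaries at each step.
    # The initial state must be tiled beam_width times per batch element.
    tiled_state = tf.contrib.seq2seq.tile_batch(initial_state, multiplier=beam_width)
    infer_decoder = tf.contrib.seq2seq.BeamSearchDecoder(
        cell=decoder_cell, embedding=embedding_matrix,
        start_tokens=start_tokens, end_token=end_token,
        initial_state=tiled_state, beam_width=beam_width,
        output_layer=projection)
    infer_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
        infer_decoder, maximum_iterations=max_iter)
    return train_outputs, infer_outputs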

Attention Mechanism

Used BahdanauAttention with weight normalization.
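
Weight normalization corresponds to the normalize=True flag of tf.contrib.seq2seq.BahdanauAttention. A sketch of wiring it into the decoder cell (names are illustrative):

import tensorflow as tf

# Sketch: Bahdanau attention with weight normalization,
# attached to the decoder cell via AttentionWrapper.
def attach_attention(decoder_cell, encoder_outputs, source_lengths, num_hidden=150):
    attention = tf.contrib.seq2seq.BahdanauAttention(
        num_units=num_hidden,
        memory=encoder_outputs,
        memory_sequence_length=source_lengths,
        normalize=True)  # weight normalization (Salimans & Kingma, 2016)
    return tf.contrib.seq2seq.AttentionWrapper(
        decoder_cell, attention, attention_layer_size=num_hidden)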

Requirements

  • Python 3
  • TensorFlow (>=1.8.0)
  • pip install -r requirements.txt

Usage

Prepare data

The dataset is available at harvardnlp/sent-summary. Place the summary.tar.gz file in the project root directory, then run:

$ python prep_data.py

To use the pre-trained GloVe embeddings, download them via:

$ python prep_data.py --glove

Train

We used sumdata/train/train.article.txt and sumdata/train/train.title.txt as training data. To train the model, run:

$ python train.py

To use pre-trained GloVe vectors as the initial embeddings, run:

$ python train.py --glove

Additional Hyperparameters

$ python train.py -h
usage: train.py [-h] [--num_hidden NUM_HIDDEN] [--num_layers NUM_LAYERS]
                [--beam_width BEAM_WIDTH] [--glove]
                [--embedding_size EMBEDDING_SIZE]
                [--learning_rate LEARNING_RATE] [--batch_size BATCH_SIZE]
                [--num_epochs NUM_EPOCHS] [--keep_prob KEEP_PROB] [--toy]

optional arguments:
  -h, --help            show this help message and exit
  --num_hidden NUM_HIDDEN
                        Network size.
  --num_layers NUM_LAYERS
                        Network depth.
  --beam_width BEAM_WIDTH
                        Beam width for beam search decoder.
  --glove               Use glove as initial word embedding.
  --embedding_size EMBEDDING_SIZE
                        Word embedding size.
  --learning_rate LEARNING_RATE
                        Learning rate.
  --batch_size BATCH_SIZE
                        Batch size.
  --num_epochs NUM_EPOCHS
                        Number of epochs.
  --keep_prob KEEP_PROB
                        Dropout keep prob.
  --toy                 Use only 5K samples of data
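
For example, a deeper network with GloVe initialization on the 5K-sample toy subset (the flag values here are illustrative, not recommended settings):

$ python train.py --num_layers 3 --num_hidden 300 --glove --toy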

Test

Generate a summary for each article in sumdata/train/valid.article.filter.txt by running:

$ python test.py

This generates the summary file result.txt. Compute ROUGE metrics between result.txt and sumdata/train/valid.title.filter.txt using pltrdy/files2rouge.
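
With files2rouge installed, the comparison is a one-liner (assuming its usual summary-then-reference argument order):

$ files2rouge result.txt sumdata/train/valid.title.filter.txt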

Sample Summary Output

"general motors corp. said wednesday its us sales fell ##.# percent in december and four percent in #### with the biggest losses coming from passenger car sales ."
> Model output: gm us sales down # percent in december
> Actual title: gm december sales fall # percent

"japanese share prices rose #.## percent thursday to <unk> highest closing high for more than five years as fresh gains on wall street fanned upbeat investor sentiment , dealers said ."
> Model output: tokyo shares close # percent higher
> Actual title: tokyo shares close up # percent

"hong kong share prices opened #.## percent higher thursday on follow-through interest in properties after wednesday 's sharp gains on abating interest rate worries , dealers said ."
> Model output: hong kong shares open higher
> Actual title: hong kong shares open higher as rate worries ease

"the dollar regained some lost ground in asian trade thursday in what was seen as a largely technical rebound after weakness prompted by expectations of a shift in us interest rate policy , dealers said ."
> Model output: dollar stable in asian trade
> Actual title: dollar regains ground in asian trade

"the final results of iraq 's december general elections are due within the next four days , a member of the iraqi electoral commission said on thursday ."
> Model output: iraqi election results due in next four days
> Actual title: iraqi election final results out within four days

"microsoft chairman bill gates late wednesday unveiled his vision of the digital lifestyle , outlining the latest version of his windows operating system to be launched later this year ."
> Model output: bill gates unveils new technology vision
> Actual title: gates unveils microsoft 's vision of digital lifestyle

Pre-trained Model

To test with the pre-trained model, download pre_trained.zip and place it in the project root directory. Then run:

$ unzip pre_trained.zip
$ python test.py