luopeixiang / awesome-text-summarization

Licence: other
Text summarization starting from scratch.

awesome-text-summarization

This repository will keep updating...

Basic Concept

Definition

Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.

Types of summarization

Extractive summaries (extracts) are produced by concatenating several sentences taken exactly as they appear in the materials being summarized.

Abstractive summaries (abstracts) are written to convey the main information in the input. They may reuse phrases or clauses from it, but overall they are expressed in the words of the summary author.
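To make the extractive definition concrete, here is a minimal sketch that picks whole sentences from the input and returns them unchanged. The scoring rule (average word frequency), the naive sentence splitter, and the function names `split_sentences` and `extractive_summary` are illustrative assumptions, not a method from any of the papers listed in this repository.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    word_freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(word_freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore the original sentence order
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    doc = "The cat sat on the mat. The dog chased the cat. It rained all day."
    print(extractive_summary(doc, k=2))
```

Every sentence in the output appears exactly as it does in the input, which is the property that separates an extract from an abstract. An abstractive system would instead generate new wording.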

Summary Informativeness evaluation

  • ROUGE-N: measures the N-gram units common between a particular summary and a collection of reference summaries, where N determines the N-gram’s length. E.g., ROUGE-1 for unigrams and ROUGE-2 for bigrams.
  • ROUGE-L: scores a summary by the Longest Common Subsequence (LCS) it shares with a reference summary.
  • BLEU : calculated on the n-gram co-occurrence between the generated summary and the gold summary. Unlike ROUGE-N, you don't need to specify a single "n", because the score combines precisions over several n-gram lengths.
  • METEOR : based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.
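The ROUGE-N and ROUGE-L definitions above can be sketched in a few lines. This is a simplified illustration that assumes whitespace tokenization, a single reference, and recall only. The official ROUGE package also supports stemming, multiple references, and precision/F-measure variants. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    ref = reference.split()
    return lcs_length(candidate.split(), ref) / len(ref) if ref else 0.0


if __name__ == "__main__":
    cand = "the cat was found under the bed"
    ref = "the cat was under the bed"
    print(rouge_n_recall(cand, ref, 1))  # ROUGE-1 recall
    print(rouge_n_recall(cand, ref, 2))  # ROUGE-2 recall
    print(rouge_l_recall(cand, ref))     # ROUGE-L recall
```

In this example the candidate contains every reference unigram, so ROUGE-1 recall is 1.0, while one of the five reference bigrams ("was under") is missing, so ROUGE-2 recall is 0.8. ROUGE-L recall is also 1.0 because the whole reference is a subsequence of the candidate.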

DataSet

  • Annotated English Gigaword

    • for sentence summarization
  • CNN/Daily Mail dataset

    • for document summarization
  • DUC 2004

  • CORNELL NEWSROOM

    • is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
  • Google Dataset

    • Large corpus of uncompressed and compressed sentences from news articles.

Papers

Survey

Recent automatic text summarization techniques: a survey

Automatic summarization

Abstractive Document summarization

1.words-lvt2k-temp-att (Nallapati et al., 2016) : Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

2.Graph-Based Attn : Abstractive Document Summarization with a Graph-Based Attentional Neural Model

3.Pointer-generator + coverage (See et al., 2017) : Get To The Point: Summarization with Pointer-Generator Networks

4.KIGN+Prediction-guide : Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network

5.Explicit Info Selection Modeling(Li et al., 2018a) : Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling

6.Structural Regularization(Li et al., 2018b) : Improving Neural Abstractive Document Summarization with Structural Regularization

7.end2end w/ inconsistency loss (Hsu et al., 2018): A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

8.Pointer + Coverage + EntailmentGen + QuestionGen (Guo et al., 2018) : Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation


Based on Reinforcement Learning:

1.ML+RL ROUGE+Novel, with LM (Kryscinski et al., 2018) : Improving Abstraction in Text Summarization

2.RL + pg + cbdec (Jiang and Bansal, 2018): Closed-Book Training to Improve Summarization Encoder Memory

3.rnn-ext + abs + RL + rerank (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

4.ML+RL, with intra-attention : A Deep Reinforced Model for Abstractive Summarization

5.GAN : Generative Adversarial Network for Abstractive Text Summarization

6.DCA (Celikyilmaz et al., 2018) : Summarization

7.ROUGESal+Ent RL (Pasunuru and Bansal, 2018): Multi-Reward Reinforced Summarization with Saliency and Entailment


Extractive Document summarization

1.TEXTRANK (graph based): TextRank: Bringing Order into Texts

2.SWAP-NET : Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks

3.NN-SE : Neural Summarization by Extracting Sentences and Words

4.HSSAS : A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS)

5.NeuSUM (Zhou et al., 2018) : Neural Document Summarization by Jointly Learning to Score and Select Sentences

6.Latent (Zhang et al., 2018) : Neural Latent Extractive Document Summarization

Based on Reinforcement Learning:

1.rnn-ext + RL (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

2.Bottom-Up Summarization (Gehrmann et al., 2018): Bottom-Up Abstractive Summarization

3.BANDITSUM : BANDITSUM: Extractive Summarization as a Contextual Bandit

4.SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents

5.Refresh : Ranking Sentences for Extractive Summarization with Reinforcement Learning

6.DQN : Deep Reinforcement Learning for Extractive Document Summarization

7.RNES w/o coherence : Learning to Extract Coherent Summary via Deep Reinforcement Learning

Sentence Summarization

1.Re^3 Sum (Cao et al., 2018) : Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization

2.FTSum_g (Cao et al., 2018) : Faithful to the Original: Fact Aware Neural Abstractive Summarization

3.Seq2seq + E2T_cnn (Amplayo et al., 2018) : Abstractive Sentence Summarization with Attentive Recurrent Neural Networks

4.EndDec+WFE (Suzuki and Nagata, 2017) : Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization

5.DRGD (Li et al., 2017) : Deep Recurrent Generative Decoder for Abstractive Text Summarization

6.BiRNN + LM Evaluator (Zhao et al. 2018) : A Language Model based Evaluator for Sentence Compression

Unsupervised Abstractive Summarization

1.MeanSum : MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization

2.Semantic Abstractive Sum based AMR(2018 Dohare): Unsupervised Semantic Abstractive Summarization

3.Paraphrastic Sentence Fusion Model(2018 Nayeem): Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

Multi Document Summarization

1.(Z Cao 2017) : Improving Multi-Document Summarization via Text Classification

2.Based AMR : Abstract Meaning Representation for Multi-Document Summarization.

3.Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion

4.Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization

5.Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization

6.Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization

7.Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

Evaluation Metrics

1.ROUGE(2004) : Rouge: A package for automatic evaluation of summaries

2.BLEU(2002) : BLEU: a Method for Automatic Evaluation of Machine Translation

3.BE(2006) : Automated Summarization Evaluation with Basic Elements

4.Pyramid Method(2007) : Evaluating Content Selection in Summarization: The Pyramid Method

5.(2018 Shaflei) : Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings

6.(2018 Honda) : Pruning Basic Elements for Better Automatic Evaluation of Summaries

Other Resources

awesome-text-summarization :

SOTA in summarization : The current state-of-the-art
