HHousen / TransformerSum

License: GPL-3.0

Projects that are alternatives of or similar to Transformersum

gazeta
Gazeta: a dataset for automatic summarization of Russian news.
Stars: ✭ 25 (-76.64%)
Mutual labels:  text-summarization, summarization
Pythonrouge
Python wrapper for evaluating summarization quality by ROUGE package
Stars: ✭ 155 (+44.86%)
Mutual labels:  text-summarization, summarization
Textrank
TextRank implementation for Python 3.
Stars: ✭ 1,008 (+842.06%)
Mutual labels:  text-summarization, summarization
nlp-akash
Natural Language Processing notes and implementations.
Stars: ✭ 66 (-38.32%)
Mutual labels:  text-summarization, summarization
PlanSum
[AAAI2021] Unsupervised Opinion Summarization with Content Planning
Stars: ✭ 25 (-76.64%)
Mutual labels:  text-summarization, summarization
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (-45.79%)
Mutual labels:  text-summarization, summarization
Text summarization with tensorflow
Implementation of a seq2seq model for summarization of textual data, demonstrated on Amazon reviews, GitHub issues, and news articles.
Stars: ✭ 226 (+111.21%)
Mutual labels:  text-summarization, summarization
Entity2Topic
[NAACL2018] Entity Commonsense Representation for Neural Abstractive Summarization
Stars: ✭ 20 (-81.31%)
Mutual labels:  text-summarization, summarization
TextRank-node
No description or website provided.
Stars: ✭ 21 (-80.37%)
Mutual labels:  text-summarization, summarization
Senpai
💨Making communication📞easier and faster🚅for all 👦 + 👧 + 👴 + 👶 + 🐮 + 🐦 + 🐱
Stars: ✭ 43 (-59.81%)
Mutual labels:  text-summarization
Skip Thought Tf
An implementation of skip-thought vectors in Tensorflow
Stars: ✭ 77 (-28.04%)
Mutual labels:  text-summarization
Awesome Text Summarization
A guide to tackling text summarization.
Stars: ✭ 990 (+825.23%)
Mutual labels:  text-summarization
Lexrankr
LexRank for Korean.
Stars: ✭ 50 (-53.27%)
Mutual labels:  summarization
Crd3
The repo containing the Critical Role Dungeons and Dragons Dataset.
Stars: ✭ 83 (-22.43%)
Mutual labels:  summarization
Text Summarizer
Python Framework for Extractive Text Summarization
Stars: ✭ 96 (-10.28%)
Mutual labels:  text-summarization
Harvesttext
A text mining and preprocessing toolkit (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods.
Stars: ✭ 956 (+793.46%)
Mutual labels:  text-summarization
Summary loop
Codebase for the Summary Loop paper at ACL2020
Stars: ✭ 26 (-75.7%)
Mutual labels:  summarization
What I Have Read
Paper Lists, Notes and Slides, Focus on NLP. For summarization, please refer to https://github.com/xcfcode/Summarization-Papers
Stars: ✭ 110 (+2.8%)
Mutual labels:  summarization
Lecture Summarizer
Lecture summarization with BERT
Stars: ✭ 94 (-12.15%)
Mutual labels:  summarization
Potara
Multi-document summarization tool relying on ILP and sentence fusion
Stars: ✭ 72 (-32.71%)
Mutual labels:  summarization

TransformerSum Logo

TransformerSum

Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.

[Badges: GitHub license · GitHub commits · made-with-python · Documentation Status · GitHub issues · GitHub pull-requests · DeepSource]

TransformerSum is a library that aims to make it easy to train, evaluate, and use machine learning transformer models that perform automatic summarization. It features tight integration with huggingface/transformers which enables the easy usage of a wide variety of architectures and pre-trained models. There is a heavy emphasis on code readability and interpretability so that both beginners and experts can build new components. Both the extractive and abstractive model classes are written using pytorch_lightning, which handles the PyTorch training loop logic, enabling easy usage of advanced features such as 16-bit precision, multi-GPU training, and much more. TransformerSum supports both the extractive and abstractive summarization of long sequences (4,096 to 16,384 tokens) using the longformer (extractive) and LongformerEncoderDecoder (abstractive), which is a combination of BART (paper) and the longformer. TransformerSum also contains models that can run on resource-limited devices while still maintaining high levels of accuracy. Models are automatically evaluated with the ROUGE metric but human tests can be conducted by the user.

Check out the documentation for usage details.
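As a minimal usage sketch, loading one of the pre-trained extractive checkpoints and summarizing a passage follows the usual pytorch_lightning pattern. The checkpoint path below is a hypothetical placeholder, and the exact class and method names should be verified against the documentation:

```python
# Minimal sketch, assuming a downloaded extractive checkpoint.
# See the TransformerSum documentation for the authoritative usage.
from extractive import ExtractiveSummarizer

# Load a trained extractive model (a pytorch_lightning checkpoint).
model = ExtractiveSummarizer.load_from_checkpoint("path/to/checkpoint.ckpt")

text = (
    "TransformerSum trains transformer models for automatic summarization. "
    "It supports both extractive and abstractive approaches. "
    "Long documents can be handled with the longformer."
)

# predict() returns the extractive summary (the top-scoring sentences).
print(model.predict(text))
```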

Features

  • For extractive summarization, compatible with every huggingface/transformers transformer encoder model.

  • For abstractive summarization, compatible with every huggingface/transformers EncoderDecoder and Seq2Seq model.

  • Currently, 10+ pre-trained extractive models are available to summarize text, trained on 3 datasets (CNN-DM, WikiHow, and arXiv-PubMed).

  • Contains pre-trained models that excel at summarization on resource-limited devices: On CNN-DM, mobilebert-uncased-ext-sum achieves about 97% of the performance of BertSum while containing 4.45 times fewer parameters. It achieves about 94% of the performance of MatchSum (Zhong et al., 2020), the current extractive state-of-the-art.

  • Contains code to train models that excel at summarizing long sequences: The longformer (extractive) and LongformerEncoderDecoder (abstractive) can summarize sequences of lengths up to 4,096 tokens by default, but can be trained to summarize sequences of more than 16k tokens.

  • Integration with huggingface/nlp means any summarization dataset in the nlp library can be used for both abstractive and extractive training.

  • "Smart batching" (extractive) and trimming (abstractive) support avoids unnecessary computation and speeds up training (a minimal trimming sketch appears after this list).

  • Use of pytorch_lightning for code readability.

  • Extensive documentation.

  • Three pooling modes to convert word vectors into sentence embeddings: the mean or max of the word embeddings, in addition to the CLS token (see the pooling sketch after this list).
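The last two points can be illustrated with short, generic PyTorch sketches. The tensors and values below are made-up examples, not code from the library, and the actual implementations in the repository may differ.

```python
# Illustrative sketch of trimming (abstractive): cut away padding columns
# that every example in the batch shares, so no time is spent on positions
# that are padding for the whole batch. Values are made up for illustration.
import torch

input_ids = torch.tensor([[5, 6, 7, 0, 0, 0],
                          [8, 9, 0, 0, 0, 0]])   # 0 = padding token id
attention_mask = (input_ids != 0).long()

# The longest real sequence in this batch is 3 tokens, so keep 3 columns.
max_len = int(attention_mask.sum(dim=1).max())
input_ids = input_ids[:, :max_len]
attention_mask = attention_mask[:, :max_len]
```

The three pooling modes reduce a sentence's per-token word embeddings to a single sentence embedding:

```python
# Illustrative sketch of the three pooling modes (mean, max, CLS token).
# Shapes and values are made up; this is not the library's exact code.
import torch

token_embeddings = torch.randn(8, 768)                    # 8 tokens, hidden size 768
attention_mask = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0])   # last 2 tokens are padding
mask = attention_mask.unsqueeze(-1).bool()                 # (8, 1) for broadcasting

# 1) Mean of the non-padding word embeddings.
mean_pooled = token_embeddings.masked_fill(~mask, 0.0).sum(dim=0) / mask.sum()

# 2) Element-wise max over the non-padding word embeddings.
max_pooled = token_embeddings.masked_fill(~mask, float("-inf")).max(dim=0).values

# 3) The embedding of the first (CLS) token.
cls_pooled = token_embeddings[0]
```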

Pre-trained Models

All pre-trained models (including larger models and other architectures) are located in the documentation. The tables below show a fraction of the available models.

Extractive

| Name | Dataset | Comments | R1/R2/RL/RL-Sum | Model Download | Data Download |
|------|---------|----------|-----------------|----------------|---------------|
| mobilebert-uncased-ext-sum | CNN/DM | None | 42.01/19.31/26.89/38.53 | Model | CNN/DM Bert Uncased |
| distilroberta-base-ext-sum | CNN/DM | None | 42.87/20.02/27.46/39.31 | Model | CNN/DM Roberta |
| roberta-base-ext-sum | CNN/DM | None | 43.24/20.36/27.64/39.65 | Model | CNN/DM Roberta |
| mobilebert-uncased-ext-sum | WikiHow | None | 30.72/8.78/19.18/28.59 | Model | WikiHow Bert Uncased |
| distilroberta-base-ext-sum | WikiHow | None | 31.07/8.96/19.34/28.95 | Model | WikiHow Roberta |
| roberta-base-ext-sum | WikiHow | None | 31.26/9.09/19.47/29.14 | Model | WikiHow Roberta |
| mobilebert-uncased-ext-sum | arXiv-PubMed | None | 33.97/11.74/19.63/30.19 | Model | arXiv-PubMed Bert Uncased |
| distilroberta-base-ext-sum | arXiv-PubMed | None | 34.70/12.16/19.52/30.82 | Model | arXiv-PubMed Roberta |
| roberta-base-ext-sum | arXiv-PubMed | None | 34.81/12.26/19.65/30.91 | Model | arXiv-PubMed Roberta |
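The R1/R2/RL/RL-Sum columns are ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum scores. As a hedged illustration of how such scores can be computed, the snippet below uses the third-party rouge-score package; this choice is an assumption made for demonstration and may not match the exact evaluation pipeline behind the numbers above:

```python
# Illustration only: computing ROUGE-1/2/L/L-Sum with the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=True
)

# rougeLsum expects sentences separated by newlines.
reference = "The cat sat on the mat.\nIt was a sunny day."
candidate = "A cat sat on a mat.\nThe day was sunny."

scores = scorer.score(reference, candidate)
for name, result in scores.items():
    print(f"{name}: F1 = {result.fmeasure:.4f}")
```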

Abstractive

| Name | Dataset | Comments | Model Download |
|------|---------|----------|----------------|
| longformer-encdec-8192-bart-large-abs-sum | arXiv-PubMed | None | Not yet... |

Install

Installation is made easy by conda environments. Simply run conda env create --file environment.yml from the root project directory, and conda will create an environment called transformersum with all the required packages from environment.yml. The spacy en_core_web_sm model is required for the convert_to_extractive.py script to detect sentence boundaries.

Step-by-Step Instructions

  1. Clone this repository: git clone https://github.com/HHousen/transformersum.git.
  2. Change to project directory: cd transformersum.
  3. Run installation command: conda env create --file environment.yml.
  4. (Optional) If using the convert_to_extractive.py script then download the en_core_web_sm spacy model: python -m spacy download en_core_web_sm.
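Step 4 is needed only because convert_to_extractive.py uses spacy to detect sentence boundaries when converting an abstractive dataset to the extractive task. A quick sketch of what the en_core_web_sm model is used for (illustrative, not the script itself):

```python
# Illustrative sketch: en_core_web_sm splits documents into sentences,
# which the conversion to the extractive task relies on.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(
    "TransformerSum can convert abstractive summarization datasets to the "
    "extractive task. Each document must first be split into sentences."
)

sentences = [sent.text for sent in doc.sents]
print(sentences)
```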

Meta


Hayden Housen – haydenhousen.com

Distributed under the GNU General Public License v3.0. See the LICENSE for more information.

https://github.com/HHousen

Attributions

Contributing

All pull requests are greatly appreciated.

Questions? Comments? Issues? Don't hesitate to open an issue and briefly describe what you are experiencing (include any error logs if necessary). Thanks.

  1. Fork it (https://github.com/HHousen/TransformerSum/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request