
aj-naik / Text-Summarization

License: MIT
Abstractive and extractive text summarization using Transformers.

Programming Languages

Jupyter Notebook

Projects that are alternatives to, or similar to, Text-Summarization

Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-26.32%)
Mutual labels:  bart, bert, roberta, gpt2
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (+52.63%)
Mutual labels:  transformers, bart, text-summarization, abstractive-summarization
Albert zh
A Lite BERT for Self-Supervised Learning of Language Representations; large-scale Chinese pre-trained ALBERT models
Stars: ✭ 3,500 (+9110.53%)
Mutual labels:  bert, roberta, xlnet
Roberta zh
Chinese pre-trained RoBERTa models: RoBERTa for Chinese
Stars: ✭ 1,953 (+5039.47%)
Mutual labels:  bert, roberta, gpt2
question generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (+394.74%)
Mutual labels:  transformers, bert, t5
CLUE pytorch
CLUE baselines in PyTorch (the PyTorch version of the CLUE baselines)
Stars: ✭ 72 (+89.47%)
Mutual labels:  bert, roberta, xlnet
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+8960.53%)
Mutual labels:  bert, roberta, gpt2
awesome-text-summarization
Text summarization starting from scratch.
Stars: ✭ 86 (+126.32%)
Mutual labels:  text-summarization, extractive-summarization, abstractive-summarization
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-36.84%)
Mutual labels:  transformers, bert, roberta
OpenDialog
An open-source package for a Chinese open-domain conversational chatbot (Chinese casual-chat dialogue system; one-click deployment of a WeChat chatbot)
Stars: ✭ 94 (+147.37%)
Mutual labels:  transformers, bert, gpt2
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+6281.58%)
Mutual labels:  transformers, bert, roberta
erc
Emotion recognition in conversation
Stars: ✭ 34 (-10.53%)
Mutual labels:  transformers, bert, roberta
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+6526.32%)
Mutual labels:  transformers, bert, xlnet
AiSpace
AiSpace: better practices for deep learning model development and deployment, for TensorFlow 2.0
Stars: ✭ 28 (-26.32%)
Mutual labels:  bert, xlnet
streamlit-light-leaflet
Streamlit quick & dirty Leaflet component that sends back coordinates on map click
Stars: ✭ 22 (-42.11%)
Mutual labels:  prototype, streamlit
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-42.11%)
Mutual labels:  bert, roberta
NLP-paper
🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-39.47%)
Mutual labels:  bert, xlnet
Nlp Architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Stars: ✭ 2,768 (+7184.21%)
Mutual labels:  transformers, bert
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+468.42%)
Mutual labels:  transformers, bert
les-military-mrc-rank7
LES Cup: Rank 7 solution for the 2nd National "Military Intelligent Machine Reading" Challenge
Stars: ✭ 37 (-2.63%)
Mutual labels:  bert, roberta

Text-Summarization

Abstractive and extractive text summarization Transformer models and an API.

Project History

I wanted to create an abstractive text summarization app as a tool to help with university studies. I researched and tried various models for text summarization, including LSTMs and RNNs. Their output was acceptable from a project standpoint but not good enough for actual use, so I decided to go with Transformers, which produce summaries good enough for real-world use. I used T5, Pegasus, Longformer2RoBERTa, BART and LED. In my tests, surprisingly, Pegasus produced better output than the other models. Longformer2RoBERTa should have been the best model, since it is designed for summarizing long documents, but the output it produced wasn't up to the mark. BART and LED also gave decent outputs. Overall, Pegasus provided a good abstractive summary.
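
For illustration, here is a minimal sketch of running one of these abstractive models through the Hugging Face Transformers pipeline. The "google/pegasus-xsum" checkpoint is a public Pegasus model and is an assumption; the notebooks in 'src/abstractive' may use a different checkpoint or a lower-level API.

```python
# Minimal sketch: abstractive summarization via the transformers pipeline.
# "google/pegasus-xsum" is an assumed public Pegasus checkpoint, not
# necessarily the exact one used in this repo's notebooks.
from transformers import pipeline

summarizer = pipeline("summarization", model="google/pegasus-xsum")

text = (
    "Transformers have become the dominant architecture for text "
    "summarization. Models such as T5, Pegasus, BART and LED are "
    "pre-trained on large corpora and fine-tuned to generate short, "
    "fluent summaries of long input documents."
)
result = summarizer(text, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```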

I also tried a few extractive transformer-based models, such as BERT, GPT2 and XLNet. Their output was almost indistinguishable from a human-written summary.
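
As a hedged sketch of the extractive approach, the third-party bert-extractive-summarizer package (pip install bert-extractive-summarizer) exposes BERT-, GPT2- and XLNet-based sentence selection behind one interface; this is an assumption about tooling, not necessarily the exact code in 'src/extractive'.

```python
# Sketch of extractive summarization with bert-extractive-summarizer;
# an assumed package choice, not necessarily this repo's implementation.
from summarizer import Summarizer, TransformerSummarizer

text = (
    "The new transit plan was approved by the city council on Monday. "
    "It adds two light-rail lines and expands bus service to the suburbs. "
    "Officials expect construction to begin next spring. "
    "Critics argue the projected costs are optimistic."
)

# BERT-based extractive summarizer: selects the most representative sentences.
bert_model = Summarizer()
print(bert_model(text, num_sentences=2))

# The same interface works with XLNet (or GPT2) embeddings.
xlnet_model = TransformerSummarizer(
    transformer_type="XLNet", transformer_model_key="xlnet-base-cased"
)
print(xlnet_model(text, num_sentences=2))
```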

Project

  1. 'src' directory contains two sub-directories:
    • 'abstractive', which contains notebooks for the T5, Pegasus, Longformer2RoBERTa, BART and LED abstractive summarization models.
    • 'extractive', which contains the BERT, GPT2 and XLNet extractive summarization models.
  2. 'prototype' directory contains a web app prototype created with the Streamlit framework (using T5) for testing purposes. To run it locally:
    • Clone the repo with Git
    • Go to the 'prototype' directory, open a command prompt there and run 'streamlit run app.py'
  3. 'app' directory contains an API for both abstractive and extractive summaries (Pegasus and XLNet); a sketch of such an endpoint follows this list. To test the API locally:
    • Run 'pip install -r requirements.txt' to install all dependencies
    • Open a terminal in the project directory and run 'uvicorn app.main:app --reload'
    • After application startup completes, go to localhost:8000/docs to try it out
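
For orientation, a minimal FastAPI endpoint of the kind 'app' exposes might look like the following; the route name, request schema and model checkpoint here are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of a FastAPI summarization endpoint
# (run with 'uvicorn main:app --reload'); route and schema are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
summarizer = pipeline("summarization", model="google/pegasus-xsum")  # assumed checkpoint

class SummaryRequest(BaseModel):
    text: str

@app.post("/abstractive")  # hypothetical route name
def abstractive_summary(req: SummaryRequest):
    result = summarizer(req.text, max_length=60, min_length=10)
    return {"summary": result[0]["summary_text"]}
```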

Note:-

  • The API will soon be deployed to the cloud for inference and then integrated into the Flask application, since using the Transformer directly leads to timeouts.
  • Don't copy-paste two paragraphs directly while testing. Remove all newline characters so the text becomes one continuous paragraph; otherwise the request fails with an HTTP 422 error. A quick way to do this is sketched below.
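
A small client-side sketch of that clean-up, flattening the text into one continuous paragraph before calling the API; the '/abstractive' route is the hypothetical one from the endpoint sketch above.

```python
# Collapse newlines and repeated whitespace before calling the API,
# avoiding the 422 error; the endpoint path is a placeholder assumption.
import requests

raw_text = open("article.txt", encoding="utf-8").read()
flat_text = " ".join(raw_text.split())  # one continuous paragraph

resp = requests.post(
    "http://localhost:8000/abstractive",  # hypothetical route
    json={"text": flat_text},
)
print(resp.json())
```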

Tech Used

These are the libraries and technologies that are used, or planned to be used, in the project.

  1. PyTorch
  2. Transformers Library
  3. Streamlit
  4. Flask (Work in Progress)
  5. FastAPI

To Do

  1. Create a web app using Flask and host it on cloud platforms for easy usage. (Done)
  2. Build a Chrome extension for use on websites (more portable and faster than a web app). (WIP)