
aj-naik / Text-Summarization

License: MIT
Abstractive and extractive text summarization using Transformers.

Programming Languages

Jupyter Notebook

Projects that are alternatives to, or similar to, Text-Summarization

Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-26.32%)
Mutual labels:  bart, bert, roberta, gpt2
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (+52.63%)
Mutual labels:  transformers, bart, text-summarization, abstractive-summarization
Albert zh
A Lite BERT for Self-Supervised Learning of Language Representations; large-scale Chinese pre-trained ALBERT models
Stars: ✭ 3,500 (+9110.53%)
Mutual labels:  bert, roberta, xlnet
Roberta zh
Chinese pre-trained RoBERTa models: RoBERTa for Chinese
Stars: ✭ 1,953 (+5039.47%)
Mutual labels:  bert, roberta, gpt2
question generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (+394.74%)
Mutual labels:  transformers, bert, t5
CLUE pytorch
CLUE baselines in PyTorch (the PyTorch version of the CLUE baselines)
Stars: ✭ 72 (+89.47%)
Mutual labels:  bert, roberta, xlnet
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+8960.53%)
Mutual labels:  bert, roberta, gpt2
awesome-text-summarization
Text summarization starting from scratch.
Stars: ✭ 86 (+126.32%)
Mutual labels:  text-summarization, extractive-summarization, abstractive-summarization
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-36.84%)
Mutual labels:  transformers, bert, roberta
OpenDialog
An open-source package for a Chinese open-domain conversational chatbot (Chinese casual-chat dialogue system; one-click deployment of a WeChat chatbot)
Stars: ✭ 94 (+147.37%)
Mutual labels:  transformers, bert, gpt2
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+6281.58%)
Mutual labels:  transformers, bert, roberta
erc
Emotion recognition in conversation
Stars: ✭ 34 (-10.53%)
Mutual labels:  transformers, bert, roberta
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+6526.32%)
Mutual labels:  transformers, bert, xlnet
AiSpace
AiSpace: better practices for deep learning model development and deployment, for TensorFlow 2.0
Stars: ✭ 28 (-26.32%)
Mutual labels:  bert, xlnet
streamlit-light-leaflet
Streamlit quick & dirty Leaflet component that sends back coordinates on map click
Stars: ✭ 22 (-42.11%)
Mutual labels:  prototype, streamlit
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-42.11%)
Mutual labels:  bert, roberta
NLP-paper
🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-39.47%)
Mutual labels:  bert, xlnet
Nlp Architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Stars: ✭ 2,768 (+7184.21%)
Mutual labels:  transformers, bert
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+468.42%)
Mutual labels:  transformers, bert
les-military-mrc-rank7
LES Cup: Rank 7 solution for the 2nd National "Military Intelligent Machine Reading" Challenge
Stars: ✭ 37 (-2.63%)
Mutual labels:  bert, roberta

Text-Summarization

Abstractive and extractive text summarization Transformer models and an API.

Project History

I wanted to create an abstractive text summarization app as a tool to help with university studies. I researched and tried various models for text summarization, including LSTMs and RNNs. Their output was acceptable from a project standpoint but not good enough for actual use, so I decided to go with Transformers, which produce summaries good enough for real-world use. I used T5, Pegasus, Longformer2RoBERTa, BART and LED. In my tests, surprisingly, Pegasus produced better output than the other models. Longformer2RoBERTa should have been the best model, since it is designed for summarizing long documents, but the output it produced wasn't up to the mark. BART and LED also gave decent outputs. Overall, Pegasus provided a good abstractive summary.
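
For illustration, here is a minimal sketch of running one of these abstractive models through the Hugging Face Transformers pipeline. The "google/pegasus-xsum" checkpoint is a public Pegasus model and is an assumption; the notebooks in 'src/abstractive' may use a different checkpoint or a lower-level API.

```python
# Minimal sketch: abstractive summarization via the transformers pipeline.
# "google/pegasus-xsum" is an assumed public Pegasus checkpoint, not
# necessarily the exact one used in this repo's notebooks.
from transformers import pipeline

summarizer = pipeline("summarization", model="google/pegasus-xsum")

text = (
    "Transformers have become the dominant architecture for text "
    "summarization. Models such as T5, Pegasus, BART and LED are "
    "pre-trained on large corpora and fine-tuned to generate short, "
    "fluent summaries of long input documents."
)
result = summarizer(text, max_length=60, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```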

I also tried a few extractive transformer-based models, such as BERT, GPT2 and XLNet. Their output was almost indistinguishable from a human-written summary.
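
As a hedged sketch of the extractive approach, the third-party bert-extractive-summarizer package (pip install bert-extractive-summarizer) exposes BERT-, GPT2- and XLNet-based sentence selection behind one interface; this is an assumption about tooling, not necessarily the exact code in 'src/extractive'.

```python
# Sketch of extractive summarization with bert-extractive-summarizer;
# an assumed package choice, not necessarily this repo's implementation.
from summarizer import Summarizer, TransformerSummarizer

text = (
    "The new transit plan was approved by the city council on Monday. "
    "It adds two light-rail lines and expands bus service to the suburbs. "
    "Officials expect construction to begin next spring. "
    "Critics argue the projected costs are optimistic."
)

# BERT-based extractive summarizer: selects the most representative sentences.
bert_model = Summarizer()
print(bert_model(text, num_sentences=2))

# The same interface works with XLNet (or GPT2) embeddings.
xlnet_model = TransformerSummarizer(
    transformer_type="XLNet", transformer_model_key="xlnet-base-cased"
)
print(xlnet_model(text, num_sentences=2))
```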

Project

  1. 'src' directory contains two sub-directories:
    • 'abstractive', which contains notebooks for the T5, Pegasus, Longformer2RoBERTa, BART and LED abstractive summarization models.
    • 'extractive', which contains the BERT, GPT2 and XLNet extractive summarization models.
  2. 'prototype' directory contains a web app prototype created with the Streamlit framework (using T5) for testing purposes. To run it locally:
    • Clone the repo with Git
    • Go to the 'prototype' directory, open a command prompt there and run 'streamlit run app.py'
  3. 'app' directory contains an API for both abstractive and extractive summaries (Pegasus and XLNet); a sketch of such an endpoint follows this list. To test the API locally:
    • Run 'pip install -r requirements.txt' to install all dependencies
    • Open a terminal in the project directory and run 'uvicorn app.main:app --reload'
    • After application startup completes, go to localhost:8000/docs to try it out
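
For orientation, a minimal FastAPI endpoint of the kind 'app' exposes might look like the following; the route name, request schema and model checkpoint here are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of a FastAPI summarization endpoint
# (run with 'uvicorn main:app --reload'); route and schema are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
summarizer = pipeline("summarization", model="google/pegasus-xsum")  # assumed checkpoint

class SummaryRequest(BaseModel):
    text: str

@app.post("/abstractive")  # hypothetical route name
def abstractive_summary(req: SummaryRequest):
    result = summarizer(req.text, max_length=60, min_length=10)
    return {"summary": result[0]["summary_text"]}
```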

Note:-

  • The API will soon be deployed to the cloud for inference and then integrated into the Flask application, since using the Transformer directly leads to timeouts.
  • Don't copy-paste two paragraphs directly while testing. Remove all newline characters so the text becomes one continuous paragraph; otherwise the request fails with an HTTP 422 error. A quick way to do this is sketched below.
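
A small client-side sketch of that clean-up, flattening the text into one continuous paragraph before calling the API; the '/abstractive' route is the hypothetical one from the endpoint sketch above.

```python
# Collapse newlines and repeated whitespace before calling the API,
# avoiding the 422 error; the endpoint path is a placeholder assumption.
import requests

raw_text = open("article.txt", encoding="utf-8").read()
flat_text = " ".join(raw_text.split())  # one continuous paragraph

resp = requests.post(
    "http://localhost:8000/abstractive",  # hypothetical route
    json={"text": flat_text},
)
print(resp.json())
```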

Tech Used

These are the libraries and technologies that are used, or planned to be used, in the project.

  1. PyTorch
  2. Transformers Library
  3. Streamlit
  4. Flask (Work in Progress)
  5. FastAPI

To Do

  1. Create a web app using Flask and host it on cloud platforms for easy usage. (Done)
  2. Build a Chrome extension for use on websites (more portable and faster than a web app). (WIP)