
sildar / Potara

Licence: mit
Multi-document summarization tool relying on ILP and sentence fusion

Programming Languages

python
python3

Projects that are alternatives to or similar to Potara

query-focused-sum
Official code repository for "Exploring Neural Models for Query-Focused Summarization".
Stars: ✭ 17 (-76.39%)
Mutual labels:  summarization
summary-explorer
Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.
Stars: ✭ 34 (-52.78%)
Mutual labels:  summarization
Pointer summarizer
pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
Stars: ✭ 629 (+773.61%)
Mutual labels:  summarization
technical-articles
Technical Pieces collected in practices
Stars: ✭ 35 (-51.39%)
Mutual labels:  summarization
Copycat-abstractive-opinion-summarizer
ACL 2020 Unsupervised Opinion Summarization as Copycat-Review Generation
Stars: ✭ 76 (+5.56%)
Mutual labels:  summarization
Seq2seq Summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
Stars: ✭ 306 (+325%)
Mutual labels:  summarization
video-summarizer
Summarizes videos into much shorter videos. Ideal for long lecture videos.
Stars: ✭ 92 (+27.78%)
Mutual labels:  summarization
Lexrankr
LexRank for Korean.
Stars: ✭ 50 (-30.56%)
Mutual labels:  summarization
summarize-radiology-findings
Code and pretrained model for paper "Learning to Summarize Radiology Findings"
Stars: ✭ 63 (-12.5%)
Mutual labels:  summarization
Headlines
Automatically generate headlines to short articles
Stars: ✭ 516 (+616.67%)
Mutual labels:  summarization
article-summary-deep-learning
📖 Using deep learning and scraping to analyze/summarize articles! Just drop in any URL!
Stars: ✭ 18 (-75%)
Mutual labels:  summarization
FYP-AutoTextSum
Automatic Text Summarization with Machine Learning
Stars: ✭ 16 (-77.78%)
Mutual labels:  summarization
Statsbase.jl
Basic statistics for Julia
Stars: ✭ 326 (+352.78%)
Mutual labels:  summarization
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+161.11%)
Mutual labels:  summarization
Summary loop
Codebase for the Summary Loop paper at ACL2020
Stars: ✭ 26 (-63.89%)
Mutual labels:  summarization
textdigester
TextDigester: document summarization java library
Stars: ✭ 23 (-68.06%)
Mutual labels:  summarization
TextRank-node
No description or website provided.
Stars: ✭ 21 (-70.83%)
Mutual labels:  summarization
Awesome machine learning solutions
A curated list of repositories for my book Machine Learning Solutions.
Stars: ✭ 65 (-9.72%)
Mutual labels:  summarization
Textrank
TextRank implementation for Python 3.
Stars: ✭ 1,008 (+1300%)
Mutual labels:  summarization
Abstractive Summarization With Transfer Learning
Abstractive summarisation using Bert as encoder and Transformer Decoder
Stars: ✭ 358 (+397.22%)
Mutual labels:  summarization


What is this?

Potara is a multi-document summarization system that relies on Integer Linear Programming (ILP) and sentence fusion.

Its goal is to summarize a set of related documents in a few sentences. It proceeds by fusing similar sentences in order to create sentences that are either shorter or more informative than those found in the documents. It then uses ILP in order to choose the best set of sentences, fused or not, that will compose the resulting summary.
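Sentence fusion in the style of Filippova builds a word graph from a cluster of similar sentences and reads a fused sentence off a light path through it. The sketch below is a heavily simplified stand-in, not Potara's code: every lowercased token becomes a single node (Filippova's mapping is more careful and also uses part-of-speech information), edges are weighted by the inverse of how often a word transition occurs (not Filippova's exact weighting), and the original method's minimum-length constraint is left out. `build_word_graph` and `fuse` are hypothetical names.

```python
import heapq
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def build_word_graph(sentences):
    """Map each lowercased token to one node and count every adjacent transition."""
    edge_counts = Counter()
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for a, b in zip(tokens, tokens[1:]):
            edge_counts[(a, b)] += 1
    graph = defaultdict(list)
    for (a, b), count in edge_counts.items():
        # Simplified weighting (an assumption): frequent transitions are cheaper.
        graph[a].append((b, 1.0 / count))
    return graph


def fuse(sentences):
    """Return the lightest START -> END path (Dijkstra) as a fused sentence, plus its cost."""
    graph = build_word_graph(sentences)
    dist = {START: 0.0}
    prev = {}
    heap = [(0.0, START)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == END:
            break  # the first time END is popped, its distance is final
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, weight in graph[node]:
            candidate = d + weight
            if candidate < dist.get(nxt, float("inf")):
                dist[nxt] = candidate
                prev[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    # Walk the predecessor links back from END to rebuild the word sequence.
    words, node = [], END
    while node != START:
        node = prev[node]
        if node != START:
            words.append(node)
    return " ".join(reversed(words)), dist[END]
```

On the three toy sentences used below, the lightest path is "police arrested in paris". Shared words pull the path through the common parts of the sentences, and the slightly ungrammatical result shows why real fusion systems add linguistic and length constraints.

```python
fused, cost = fuse([
    "police arrested a man in paris on monday",
    "a man was arrested in paris",
    "police arrested a suspect on monday",
])
```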

It relies on approaches that were state of the art as of 2014: the one introduced by Gillick and Favre for the ILP strategy, and the one introduced by Filippova for sentence fusion.
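For reference, the general concept-based ILP of Gillick and Favre can be written as follows. Here $c_i$ is 1 if concept $i$ (typically a word bigram) with weight $w_i$ is covered, $s_j$ is 1 if sentence $j$ of length $\ell_j$ is selected, $L$ is the length budget, and $\mathrm{Occ}_{ij}$ is 1 if concept $i$ occurs in sentence $j$. This is the published general formulation, not a transcription of Potara's code; Potara's exact weights and constraints (for instance around fused sentences) may differ.

```latex
\begin{aligned}
\max_{c,\,s} \quad & \sum_i w_i c_i \\
\text{s.t.} \quad  & \sum_j \ell_j s_j \le L \\
                   & s_j \, \mathrm{Occ}_{ij} \le c_i \qquad \forall i, j \\
                   & \sum_j s_j \, \mathrm{Occ}_{ij} \ge c_i \qquad \forall i \\
                   & c_i \in \{0, 1\} \ \forall i, \qquad s_j \in \{0, 1\} \ \forall j
\end{aligned}
```

The two coverage constraints force $c_i = 1$ exactly when at least one selected sentence contains concept $i$, so each concept is rewarded once no matter how many selected sentences repeat it.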

It is compatible with, and tested on, Python 3.5 and 3.6.

Install

The easy way

You should be able to install Potara and its dependencies with pip:

pip install potara

You can also clone this repo and install the dependencies from the requirements.txt file (pip install -r requirements.txt).

Further requirements

You will also need GLPK, the solver used to obtain an optimal summary. For example, on Debian-based distros:
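GLPK returns the exact optimum of the summarization ILP. For a handful of sentences, the same kind of optimum can be found by brute force, which makes concrete what "optimal" means here. The sketch below is only an illustration, not how Potara works: it assumes concepts are lowercased word bigrams with caller-supplied weights and that lengths are counted in words, and `best_summary` is a hypothetical name. Enumeration grows exponentially with the number of sentences, which is why a real solver is needed in practice.

```python
from itertools import combinations


def best_summary(sentences, weights, budget):
    """Exhaustively pick the sentence subset that maximizes the total weight of
    covered bigram concepts while staying within a length budget (in words)."""

    def bigrams(sentence):
        tokens = sentence.lower().split()
        return set(zip(tokens, tokens[1:]))

    concepts = [bigrams(s) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    best, best_score = (), 0.0
    for k in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), k):
            if sum(lengths[j] for j in subset) > budget:
                continue  # violates the length constraint
            covered = set().union(*(concepts[j] for j in subset))
            # Each concept counts once, however many selected sentences contain it.
            score = sum(weights.get(c, 0.0) for c in covered)
            if score > best_score:
                best, best_score = subset, score
    return [sentences[j] for j in best], best_score
```

For example, with two 6-word sentences about the same arrest and one 5-word sentence about its date, an 11-word budget cannot hold both 6-word sentences, so the search pairs the more informative one with the short sentence:

```python
sentences = [
    "police arrested a man in paris",
    "a man was arrested in paris",
    "the arrest happened on monday",
]
weights = {("police", "arrested"): 2, ("a", "man"): 3, ("in", "paris"): 3, ("on", "monday"): 2}
summary, score = best_summary(sentences, weights, budget=11)
```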

$ sudo apt-get install glpk

For Ubuntu-based distros you can use:

$ sudo apt-get install libglpk40

You can check that the installation ran successfully by cloning the repo and running:

$ python setup.py test

If you run into installation issues, check the repo's .travis.yml file, which corresponds to a working build.

How To

Basic usage looks like this:

from potara.summarizer import Summarizer
from potara.document import Document

s = Summarizer()

# Add the documents, preprocess them and compute the information the summarizer needs
s.setDocuments([Document('data/' + str(i) + '.txt')
                for i in range(1, 10)])

# Summarizing, where the actual work is done
s.summarize()

# You can then print the summary
print(s.summary)

There is some preprocessing involved, as well as a sentence fusion step, but I made both easily tunable. Preprocessing may take a few minutes, since a lot is going on under the hood. The default parameters are currently set for summarizing about 10 documents. To summarize fewer documents, adjust the "minbigramcount" parameter of the summarizer:
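The text above does not say exactly how "minbigramcount" is applied. The sketch below assumes it is a minimum document frequency: a bigram must appear in at least that many documents to count as a concept for the ILP, as in Gillick and Favre's setup. Under that assumption, it illustrates why a smaller document set calls for a lower value. `extract_concepts` is a hypothetical helper, not Potara's API, and real preprocessing (stemming, stopword handling) is omitted.

```python
from collections import Counter


def extract_concepts(documents, minbigramcount):
    """Keep bigrams found in at least `minbigramcount` documents,
    weighted by that document frequency (hypothetical reimplementation)."""
    doc_freq = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        # Count each bigram at most once per document.
        doc_freq.update(set(zip(tokens, tokens[1:])))
    return {bigram: df for bigram, df in doc_freq.items() if df >= minbigramcount}
```

With three toy documents, a threshold of 3 keeps only ("storm", "hit"), while a threshold of 2 also keeps ("the", "storm"), ("hit", "the") and ("the", "coast"). The fewer documents there are, the fewer bigrams can reach a high threshold, leaving the ILP little to optimize.

```python
docs = ["the storm hit the coast", "the storm hit florida", "a storm hit the coast on monday"]
strict = extract_concepts(docs, minbigramcount=3)
relaxed = extract_concepts(docs, minbigramcount=2)
```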

s = Summarizer(minbigramcount=2)

Summarizing fewer than 4 documents will probably yield a poor summary.

Similarity models

Potara relies on similarity scores between sentences. These scores can be either shallow, using cosine similarity, or "deep", using gensim's Word2Vec semantic representations of words. For the latter, you will need to train your own model or use a pretrained one. Contact me if you want to use Potara that way; I may write a tutorial on the matter for the occasion.
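As an illustration of the shallow option, here is a minimal bag-of-words cosine similarity between two sentences. It assumes plain lowercased whitespace tokens and raw term counts; Potara's own tokenization and weighting may differ, and `cosine_similarity` is a hypothetical helper rather than part of its API.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the word-count vectors of two sentences."""
    a = Counter(sentence_a.lower().split())
    b = Counter(sentence_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    # Guard against empty sentences, whose vectors have zero length.
    return dot / norm if norm else 0.0
```

For instance, "the storm hit the coast" and "the storm hit florida" share three word types and score 2/√7 ≈ 0.76, while sentences with no words in common score 0. A "deep" variant would compare Word2Vec-based representations instead, which also rewards related words that are not identical.

```python
score = cosine_similarity("the storm hit the coast", "the storm hit florida")
```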
