vishnu45 / NLP-Extractive-NEWS-summarization-using-MMR

Licence: other
A simple Python implementation of the Maximal Marginal Relevance (MMR) baseline system for text summarization.

Comparison of MMR and LexRank Automatic Text Summarization approaches

Automatic summarization techniques condense one or more documents into a shorter summary that conveys the most important information from the original text. Multi-document summarization in particular extracts a single summary from multiple documents written about the same topic. Here we have tried to implement and compare two very commonly used techniques for multi-document automatic text summarization: Maximal Marginal Relevance (MMR) and LexRank.
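As a rough illustration of MMR, the sketch below greedily picks sentences that are relevant to a query while penalizing similarity to sentences already chosen. It is a minimal sketch and not the code in mmr_summarizer.py: the names `tf_vector`, `cosine` and `mmr_select`, the whitespace tokenizer, the plain term-frequency cosine similarity and the default `lam=0.5` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.5):
    """Greedy MMR: trade off relevance to the query against redundancy."""
    vecs = [tf_vector(s) for s in sentences]
    q = tf_vector(query)
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to any sentence already picked.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sentences = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stocks fell sharply on monday",
]
summary = mmr_select(sentences, "cat mat stocks", k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate second sentence is skipped in favour of the third one; with `lam=1.0` the redundancy term vanishes and the two most query-relevant sentences are returned regardless of overlap.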

Implementation details

We have implemented both the MMR and LexRank algorithms in Python. For evaluation we used the DUC2004 data corpus, which contains two sets of documents.

  • Documents/clusters: contains 50 different topics, each containing 10 news articles on average.

  • Manual summaries: manually created summary for each of the 50 topics.
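For the LexRank side of the comparison, the idea can be sketched as a PageRank-style power iteration over a sentence-similarity graph built from one document cluster. This is only an illustrative sketch, not the code in LexRank.py: the function names, the plain term-frequency cosine (the published LexRank uses an idf-weighted cosine), the threshold of 0.1, the damping of 0.85 and the fixed iteration count are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Score sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(s.lower().split()) for s in sentences]
    # Unweighted edge between two different sentences when similarity exceeds the threshold.
    adj = [
        [
            1.0 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on an equal share of its current score.
            rank = sum(
                adj[j][i] * scores[j] / degree[j] for j in range(n) if degree[j]
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


cluster = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "a cat sat on a mat",
    "stocks fell sharply",
]
scores = lexrank(cluster)
```

Sentences that share vocabulary with many others in the cluster end up with higher scores than an isolated sentence such as the last one, and a summary would then be assembled from the top-scoring sentences.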

The generated summaries are evaluated against the human summaries using the ROUGE toolkit. The ROUGE scores let us compare the quality of the individual summarization systems. In addition, we measured how similar the summaries generated by the two systems are to each other by computing the Jaccard coefficient for sentence-level and word-level overlap.
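The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below shows one way the word-level and sentence-level overlap between two system summaries could be computed; the function names, the whitespace tokenization and the exact-match comparison of sentences are assumptions for illustration, not necessarily what jaccardScore.py does.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_level_overlap(summary_a, summary_b):
    """Overlap of the lowercased word sets of two summary strings."""
    return jaccard(summary_a.lower().split(), summary_b.lower().split())


def sentence_level_overlap(sentences_a, sentences_b):
    """Overlap of two summaries given as lists of sentences (exact match)."""
    return jaccard(
        [s.strip() for s in sentences_a], [s.strip() for s in sentences_b]
    )


word_score = word_level_overlap("the cat sat", "the cat ran")
sentence_score = sentence_level_overlap(
    ["A storm hit.", "Roads closed."], ["Roads closed.", "Schools shut."]
)
```

A score of 1.0 means the two systems selected identical content and 0.0 means they share nothing, so the measure complements ROUGE, which compares each system against the human summaries rather than against the other system.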

System/software requirements

We implemented both MMR and LexRank and ran the evaluations (both ROUGE and Jaccard) on Ubuntu 14.04. The following packages were installed on Ubuntu for this purpose:

Files and Folders

File/Folder          Description
root                 root folder of the project, containing all required files and folders
Documents            news articles relating to the 50 topics (each topic containing 10 articles)
Humman_Summaries     human summaries used to evaluate the quality of the system-generated summaries
Lexrank_results      folder which holds the system-generated summaries of LexRank
MMR_results          folder which holds the system-generated summaries of MMR
LexRank.py           LexRank summarizer implementation
mmr_summarizer.py    MMR summarizer implementation
sentence.py          sentence class for modelling sentences in the document cluster
jaccardScore.py      generates the Jaccard coefficient at word and sentence level
test_pyrouge.py      generates the ROUGE scores for the system summaries

How to run:

  • To generate the MMR system summaries, run mmr_summarizer.py. The results will be written to the MMR_results folder.

  • To generate the LexRank system summaries, run LexRank.py. The results will be written to the Lexrank_results folder.

  • To generate the ROUGE scores, run test_pyrouge.py. The results will be displayed in the terminal.

  • To generate the Jaccard coefficient scores, run jaccardScore.py. Both word-level and sentence-level scores will be displayed on the screen.

NOTE:

The DUC2004 documents are not included in this repository; they must be obtained separately.

This work was done as part of the CAP6640: Natural Language Processing course at UCF in Spring 2016 along with Amar Nair and Syed Ahmed.
