vishnu45 / NLP-Extractive-NEWS-summarization-using-MMR

Licence: other

A simple python implementation of the Maximal Marginal Relevance (MMR) baseline system for text summarization.

Labels

machine-learning natural-language-processing text-summarization mmr lexrank rouge-evaluation multi-document-summarization jaccard-coefficient-scores

Comparison of MMR and LexRank Automatic Text Summarization approaches

Automatic summarization techniques condense one or more documents into a shorter summary that conveys the most important information from the original text. Multi-document summarization in particular extracts a single summary from multiple documents written about the same topic. Here we implement and compare two commonly used techniques for multi-document automatic text summarization:

Maximal Marginal Relevance (MMR)
LexRank
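As a rough illustration of the first technique, the sketch below shows greedy MMR selection: at each step it picks the sentence that maximizes `lam * relevance - (1 - lam) * redundancy`, where redundancy is the highest similarity to any sentence already selected. This is not the code in mmr_summarizer.py. The function names, the plain term-frequency cosine similarity, the `query` string standing in for the topic, and the default `lam` value are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_summarize(sentences, query, k=2, lam=0.5):
    """Greedily select k sentences, trading off relevance against redundancy.

    `query` is a hypothetical topic description; `lam` near 1 favours
    relevance, `lam` near 0 favours diversity.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    query_vec = Counter(tokenize(query))
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = cosine(vectors[i], query_vec)
        # Highest similarity to anything already in the summary (0 if empty).
        redundancy = max(
            (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With `lam=1.0` the redundancy term vanishes and the selection reduces to ranking by relevance alone, so near-duplicate sentences can both be chosen. Lowering `lam` penalizes a candidate that repeats what the summary already contains.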

Implementation details

We implemented both the MMR and LexRank algorithms in Python. For evaluation we used the DUC2004 data corpus, which contains two sets of documents:

Documents/clusters: 50 different topics, each containing 10 news articles on average.
Manual summaries: a manually created summary for each of the 50 topics.

The generated summaries are evaluated against the human summaries using the ROUGE toolkit. The ROUGE scores make it possible to compare the quality of the individual summarization systems. We also analyzed how similar the summaries generated by the two systems are to each other by calculating the Jaccard coefficient for sentence-level and word-level overlap.
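The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below applies it at word level and at sentence level to a pair of summaries. It is a minimal illustration, not the code in jaccardScore.py. The function names, the tokenization, and the choice to return 0.0 for two empty inputs are assumptions.

```python
import re


def jaccard(items_a, items_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(items_a), set(items_b)
    union = a | b
    # Assumed convention: two empty sets have no measurable overlap.
    return len(a & b) / len(union) if union else 0.0


def word_level_jaccard(summary_a, summary_b):
    """Overlap between the sets of distinct words in two summary strings."""
    words = lambda text: re.findall(r"[a-z0-9]+", text.lower())
    return jaccard(words(summary_a), words(summary_b))


def sentence_level_jaccard(sentences_a, sentences_b):
    """Overlap between two lists of sentences, compared as whole strings."""
    normalize = lambda s: " ".join(s.lower().split())
    return jaccard(map(normalize, sentences_a), map(normalize, sentences_b))
```

For example, two summaries that share one of their two sentences have three distinct sentences between them, so the sentence-level score is 1/3.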

System/software requirements

We implemented both MMR and LexRank and ran the evaluations (both ROUGE and Jaccard) on Ubuntu 14.04. The following packages were installed on Ubuntu as part of this:

Files and Folders

Folder	Description
root	root folder of project containing all required files and folders
Documents	news articles relating to the 50 topics (each topic containing 10 articles)
Humman_Summaries	human summaries used to evaluate the quality of the system-generated summaries
Lexrank_results	folder which holds the system generated summaries of LexRank
MMR_results	folder which holds the system generated summaries of MMR
LexRank.py	LexRank summarizer implementation
mmr_summarizer.py	MMR summarizer implementation
sentence.py	sentence class for modelling sentences in the document cluster
jaccardScore.py	for generating jaccard coefficient at word and sentence level
test_pyrouge.py	for generating the ROUGE scores for the system summaries
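To give a sense of what a LexRank summarizer computes, the sketch below builds a graph whose nodes are sentences, connects pairs whose similarity exceeds a threshold, and ranks sentences with a damped power iteration over that graph. It is a simplified illustration and not the code in LexRank.py: it uses plain term-frequency cosine similarity instead of the idf-modified cosine from the LexRank paper, it omits self-loops, and the function names, `threshold`, `damping` and `iterations` values are assumptions.

```python
import math
import re
from collections import Counter


def tf_vector(sentence):
    """Term-frequency vector of a sentence as a Counter."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Return one centrality score per sentence (higher means more central)."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [tf_vector(s) for s in sentences]
    # Unweighted adjacency: an edge wherever similarity exceeds the threshold.
    adj = [
        [
            1.0 if i != j and cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives an equal share of every neighbour's score.
        # Isolated sentences pass nothing on, so scores need not sum to 1.
        scores = [
            (1 - damping) / n
            + damping * sum(
                adj[j][i] * scores[j] / degree[j] for j in range(n) if degree[j]
            )
            for i in range(n)
        ]
    return scores
```

A summary can then be formed by taking the highest-scoring sentences. A sentence that shares no vocabulary with the rest of the cluster has no edges and ends up with the minimum score.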

How to run:

To generate the MMR system summaries, run mmr_summarizer.py. The results are written to the MMR_results folder.
To generate the LexRank system summaries, run LexRank.py. The results are written to the Lexrank_results folder.
To generate the ROUGE scores, run test_pyrouge.py. The results are displayed in the terminal.
To generate the Jaccard coefficient scores, run jaccardScore.py. Both word-level and sentence-level scores are displayed in the terminal.
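The ROUGE scores in the third step come from the ROUGE toolkit. As a rough guide to reading them, the sketch below computes a simplified ROUGE-N recall: the fraction of n-grams in a reference summary that also appear in the system summary, with counts clipped so a repeated n-gram is not credited more often than it occurs. It does not use or reproduce the toolkit or pyrouge. It handles a single reference, applies no stemming or stopword removal, and the function names are assumptions.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count the word n-grams in a text (lowercased, alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(system_summary, reference_summary, n=1):
    """Simplified ROUGE-N recall of a system summary against one reference."""
    sys_counts = ngram_counts(system_summary, n)
    ref_counts = ngram_counts(reference_summary, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clip each match at the number of times the n-gram occurs in the system text.
    overlap = sum(min(count, sys_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total
```

For the system summary "the storm hit the coast" and the reference "a storm hit the northern coast", four of the six reference unigrams are matched, so the unigram recall is 4/6.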

NOTE:

The DUC2004 documents are not included in this repository. They can be obtained from here.

This work was done as part of the CAP6640: Natural Language Processing course at UCF in Spring 2016 along with Amar Nair and Syed Ahmed.
