
rktamplayo / PlanSum

License: MIT
[AAAI2021] Unsupervised Opinion Summarization with Content Planning

Programming Languages

Python

Projects that are alternatives to or similar to PlanSum

Copycat-abstractive-opinion-summarizer
ACL 2020 Unsupervised Opinion Summarization as Copycat-Review Generation
Stars: ✭ 76 (+204%)
Mutual labels:  amazon, reviews, yelp, summarization, natural-language-generation, abstractive-text-summarization, abstractive-summarization, opinion-summarization
Entity2Topic
[NAACL2018] Entity Commonsense Representation for Neural Abstractive Summarization
Stars: ✭ 20 (-20%)
Mutual labels:  text-generation, text-summarization, summarization, natural-language-generation, abstractive-summarization
gazeta
Gazeta: Dataset for automatic summarization of Russian news
Stars: ✭ 25 (+0%)
Mutual labels:  text-summarization, summarization, abstractive-text-summarization, abstractive-summarization
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (+132%)
Mutual labels:  text-summarization, summarization, abstractive-text-summarization, abstractive-summarization
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (+472%)
Mutual labels:  sentiment-analysis, text-generation, summarization
SelSum
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Stars: ✭ 36 (+44%)
Mutual labels:  amazon, summarization, opinion-mining
xl-sum
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Stars: ✭ 160 (+540%)
Mutual labels:  text-summarization, abstractive-text-summarization, abstractive-summarization
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+8348%)
Mutual labels:  sentiment-analysis, text-summarization
Paribhasha
paribhasha.herokuapp.com/
Stars: ✭ 21 (-16%)
Mutual labels:  sentiment-analysis, summarization
data-summ-cnn dailymail
non-anonymized cnn/dailymail dataset for text summarization
Stars: ✭ 12 (-52%)
Mutual labels:  summarization, abstractive-text-summarization
opinionMining
Opinion Mining/Sentiment Analysis Classifier using Genetic Programming
Stars: ✭ 13 (-48%)
Mutual labels:  sentiment-analysis, opinion-mining
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (+344%)
Mutual labels:  sentiment-analysis, summarization
Sa Papers
📄 A survey and analysis of deep learning papers on sentiment analysis 😀😡☹️😭🙄🤢
Stars: ✭ 111 (+344%)
Mutual labels:  sentiment-analysis, summarization
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+4428%)
Mutual labels:  sentiment-analysis, text-summarization
Harvesttext
Text mining and preprocessing tools (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods
Stars: ✭ 956 (+3724%)
Mutual labels:  sentiment-analysis, text-summarization
factsumm
FactSumm: Factual Consistency Scorer for Abstractive Summarization
Stars: ✭ 83 (+232%)
Mutual labels:  summarization, abstractive-summarization
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (+52%)
Mutual labels:  text-summarization, abstractive-summarization
nlp-akash
Natural Language Processing notes and implementations.
Stars: ✭ 66 (+164%)
Mutual labels:  text-summarization, summarization
FewSum
Few-shot learning framework for opinion summarization published at EMNLP 2020.
Stars: ✭ 29 (+16%)
Mutual labels:  summarization, opinion-summarization
amazon-reviews
Sentiment Analysis & Topic Modeling with Amazon Reviews
Stars: ✭ 26 (+4%)
Mutual labels:  sentiment-analysis, amazon

PlanSum

[AAAI2021] Unsupervised Opinion Summarization with Content Planning

This PyTorch code was used in the experiments of the research paper:

Reinald Kim Amplayo, Stefanos Angelidis, and Mirella Lapata. Unsupervised Opinion Summarization with Content Planning. AAAI, 2021.

The code was cleaned up after acceptance and may contain errors. Although a quick check showed the code running fine, please create an issue if you encounter errors and I will try to fix them as soon as possible.

Data

We used three different datasets from three different papers: Amazon (Brazinskas et al., 2020), Rotten Tomatoes (Wang and Ling, 2016), and Yelp (Chu and Liu, 2019). For convenience, we provide the train/dev/test datasets here, preprocessed accordingly and saved in three separate JSON files. Each file contains a list of instances, where one instance is formatted as follows:

{
    "reviews": [
       ["this is the first review.", 5],
       ["this is the second review.", 3],
       "..."
    ],
    "summary": "this is the first summary.",
    "..."
}

In the example above, reviews is a list of review-rating tuples. For the Amazon dev/test files, summary is instead a list of reference summaries. The files may contain other information that is not used by the code (e.g., category and prod_id in the Amazon datasets). When using the datasets, please also cite the corresponding papers (listed below).
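For illustration, here is a minimal loading sketch; the path data/yelp/train.json is an assumption, so adjust it to the actual split file names:

import json

# Minimal loading sketch. The file name data/yelp/train.json is an
# assumption; adjust it to match the downloaded split files.
with open("data/yelp/train.json", encoding="utf-8") as f:
    data = json.load(f)

instance = data[0]
texts = [text for text, rating in instance["reviews"]]

# summary is a string, except for Amazon dev/test where it is a list
# of reference summaries; normalize to a list either way.
summary = instance["summary"]
references = summary if isinstance(summary, list) else [summary]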

Running the code

PlanSum follows the Condense-Abstract Framework (Amplayo and Lapata, 2019), where we first condense the reviews into encodings and then use the encodings as input to a summarization model. In PlanSum, the content plan induction model is the Condense model, while the opinion summarization model is the Abstract model. Below is a step-by-step procedure for running PlanSum and generating summaries on the Yelp dataset.
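Conceptually, the two stages compose as in the sketch below. The function names and bodies are hypothetical placeholders, not the repository's API; the real models live in src/train_condense.py and src/train_abstract.py.

# Hypothetical sketch of the Condense-Abstract pipeline; names and
# bodies are placeholders, not the repository's actual API.
def condense(reviews):
    # Content plan induction: map raw reviews to plan-aware encodings.
    return [f"encoding({r})" for r in reviews]  # placeholder

def abstract(encodings):
    # Opinion summarization: decode a summary from the encodings.
    return " | ".join(encodings)  # placeholder

print(abstract(condense(["great food!", "friendly staff."])))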

Step 0: Download the datasets

Download the preprocessed datasets here. You can also skip steps by downloading the model files in the model/ directory and the train.plan.json files in the data/ directory.

Step 1: Train the content plan induction model

This can be done by running src/train_condense.py as follows:

python src/train_condense.py -mode=train -data_type=yelp

This will create a model/ directory and a model file named condense.model. There are multiple arguments that can be set, but the default settings are fine for Yelp. The settings used for Amazon and Rotten Tomatoes are given as comments in the code.

Step 2: Create the synthetic training dataset

PlanSum uses a synthetic data creation method where we sample reviews from the corpus and transform them into review-summary pairs. To do this, we use the same code with -mode=create, i.e.

python src/train_condense.py -mode=create -data_type=yelp

This will create a new json file named train.plan.json in the data/yelp/ directory. This is the synthetic training dataset used to train the summarization model.
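As a rough illustration of the sampling idea only, one could hold out a sampled review as the pseudo-summary, as in the sketch below. This is not the paper's exact procedure, which also conditions on the induced content plan; src/train_condense.py with -mode=create is the real implementation, and the paths here are assumptions.

import json
import random

# Rough illustration of review sampling for synthetic supervision.
# NOT the paper's exact procedure, which also uses the induced
# content plan; see src/train_condense.py (-mode=create).
with open("data/yelp/train.json", encoding="utf-8") as f:  # path assumed
    data = json.load(f)

random.seed(0)
synthetic = []
for instance in data:
    texts = [text for text, rating in instance["reviews"]]
    if len(texts) < 2:
        continue
    idx = random.randrange(len(texts))       # hold one review out
    pseudo_summary = texts[idx]
    inputs = texts[:idx] + texts[idx + 1:]   # the rest become the input set
    synthetic.append({"reviews": inputs, "summary": pseudo_summary})

with open("data/yelp/train.sketch.json", "w", encoding="utf-8") as f:  # name illustrative
    json.dump(synthetic, f)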

Step 3: Train the summarization model

This is done by simply running src/train_abstract.py:

python src/train_abstract.py -mode=train -data_type=yelp

This will create a model file named abstract.model in the model/ directory. Again, there are arguments that can be set, but the default settings are fine for Yelp; settings used for the other datasets are given as comments in the code.

Step 4: Generate the summaries

Generating the summaries can be done by running:

python src/train_abstract.py -mode=eval -data_type=yelp

This will create an output/ directory and a file containing the summaries named predictions.txt.
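A quick way to inspect the output, assuming predictions.txt stores one summary per line (the format is an assumption):

# Inspect the generated summaries.
# Assumption: output/predictions.txt stores one summary per line.
with open("output/predictions.txt", encoding="utf-8") as f:
    predictions = [line.strip() for line in f if line.strip()]

print(f"Loaded {len(predictions)} summaries.")
print("First summary:", predictions[0])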

I just want your summaries!

This repo also includes an output/ directory containing the generated summaries from five different systems:

  • gold.sol contains the gold-standard summaries
  • plansum.sol contains summaries produced by PlanSum (this paper)
  • denoisesum.sol contains summaries produced by DenoiseSum (Amplayo and Lapata, 2020)
  • copycat.sol contains summaries produced by CopyCat (Brazinskas et al., 2020)
  • bertcent.sol contains summaries produced by BertCent (this paper)

Please cite the corresponding papers when using these outputs (e.g., when comparing them with your model's).
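If you want to score these outputs yourself, here is a hedged sketch using the rouge-score package (pip install rouge-score); it assumes each .sol file stores one summary per line, with lines aligned across files:

from rouge_score import rouge_scorer

# Assumption: one summary per line, lines aligned across the .sol files.
with open("output/gold.sol", encoding="utf-8") as f:
    gold = [line.strip() for line in f]
with open("output/plansum.sol", encoding="utf-8") as f:
    plansum = [line.strip() for line in f]

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = [scorer.score(ref, hyp) for ref, hyp in zip(gold, plansum)]

for key in ["rouge1", "rouge2", "rougeL"]:
    avg_f = sum(s[key].fmeasure for s in scores) / len(scores)
    print(f"{key} F1: {avg_f:.4f}")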

Cite the necessary papers

To cite the paper/code/data splits, please use this BibTeX:

@inproceedings{amplayo2021unsupervised,
	Author = {Amplayo, Reinald Kim and Angelidis, Stefanos and Lapata, Mirella},
	Booktitle = {AAAI},
	Year = {2021},
	Title = {Unsupervised Opinion Summarization with Content Planning},
}

If using the datasets, please also cite the original authors of the datasets:

@inproceedings{bravzinskas2020unsupervised,
	Author = {Bra{\v{z}}inskas, Arthur and Lapata, Mirella and Titov, Ivan},
	Booktitle = {ACL},
	Year = {2020},
	Title = {Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation},
}
@inproceedings{wang2016neural,
	Author = {Wang, Lu and Ling, Wang},
	Booktitle = {NAACL},
	Year = {2016},
	Title = {Neural Network-Based Abstract Generation for Opinions and Arguments},
}
@inproceedings{chu2019meansum,
	Author = {Chu, Eric and Liu, Peter},
	Booktitle = {ICML},
	Year = {2019},
	Title = {{M}ean{S}um: A Neural Model for Unsupervised Multi-Document Abstractive Summarization},
}

If you have any questions, please send me an email: reinald.kim at ed dot ac dot uk
