
rktamplayo / PlanSum

License: MIT
[AAAI2021] Unsupervised Opinion Summarization with Content Planning

Programming Languages

Python

Projects that are alternatives to or similar to PlanSum

Copycat-abstractive-opinion-summarizer
ACL 2020 Unsupervised Opinion Summarization as Copycat-Review Generation
Stars: ✭ 76 (+204%)
Mutual labels:  amazon, reviews, yelp, summarization, natural-language-generation, abstractive-text-summarization, abstractive-summarization, opinion-summarization
Entity2Topic
[NAACL2018] Entity Commonsense Representation for Neural Abstractive Summarization
Stars: ✭ 20 (-20%)
Mutual labels:  text-generation, text-summarization, summarization, natural-language-generation, abstractive-summarization
gazeta
Gazeta: Dataset for automatic summarization of Russian news
Stars: ✭ 25 (+0%)
Mutual labels:  text-summarization, summarization, abstractive-text-summarization, abstractive-summarization
DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
Stars: ✭ 58 (+132%)
Mutual labels:  text-summarization, summarization, abstractive-text-summarization, abstractive-summarization
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (+472%)
Mutual labels:  sentiment-analysis, text-generation, summarization
SelSum
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
Stars: ✭ 36 (+44%)
Mutual labels:  amazon, summarization, opinion-mining
xl-sum
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Stars: ✭ 160 (+540%)
Mutual labels:  text-summarization, abstractive-text-summarization, abstractive-summarization
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+8348%)
Mutual labels:  sentiment-analysis, text-summarization
Paribhasha
paribhasha.herokuapp.com/
Stars: ✭ 21 (-16%)
Mutual labels:  sentiment-analysis, summarization
data-summ-cnn dailymail
non-anonymized cnn/dailymail dataset for text summarization
Stars: ✭ 12 (-52%)
Mutual labels:  summarization, abstractive-text-summarization
opinionMining
Opinion Mining/Sentiment Analysis Classifier using Genetic Programming
Stars: ✭ 13 (-48%)
Mutual labels:  sentiment-analysis, opinion-mining
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (+344%)
Mutual labels:  sentiment-analysis, summarization
Sa Papers
📄 A survey and analysis of deep learning papers on sentiment analysis 😀😡☹️😭🙄🤢
Stars: ✭ 111 (+344%)
Mutual labels:  sentiment-analysis, summarization
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+4428%)
Mutual labels:  sentiment-analysis, text-summarization
Harvesttext
Text mining and preprocessing tools (text cleaning, new-word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods
Stars: ✭ 956 (+3724%)
Mutual labels:  sentiment-analysis, text-summarization
factsumm
FactSumm: Factual Consistency Scorer for Abstractive Summarization
Stars: ✭ 83 (+232%)
Mutual labels:  summarization, abstractive-summarization
Text-Summarization
Abstractive and Extractive Text summarization using Transformers.
Stars: ✭ 38 (+52%)
Mutual labels:  text-summarization, abstractive-summarization
nlp-akash
Natural Language Processing notes and implementations.
Stars: ✭ 66 (+164%)
Mutual labels:  text-summarization, summarization
FewSum
Few-shot learning framework for opinion summarization published at EMNLP 2020.
Stars: ✭ 29 (+16%)
Mutual labels:  summarization, opinion-summarization
amazon-reviews
Sentiment Analysis & Topic Modeling with Amazon Reviews
Stars: ✭ 26 (+4%)
Mutual labels:  sentiment-analysis, amazon

PlanSum

[AAAI2021] Unsupervised Opinion Summarization with Content Planning

This PyTorch code was used in the experiments of the research paper:

Reinald Kim Amplayo, Stefanos Angelidis, and Mirella Lapata. Unsupervised Opinion Summarization with Content Planning. AAAI, 2021.

The code was cleaned up after acceptance and may contain errors. Although a quick check showed the code running fine, please create an issue if you encounter errors and I will try to fix them as soon as possible.

Data

We used three different datasets from three different papers: Amazon (Brazinskas et al., 2020), Rotten Tomatoes (Wang and Ling, 2016), and Yelp (Chu and Liu, 2019). For convenience, we provide the train/dev/test datasets here, preprocessed accordingly and saved in three separate JSON files. Each file contains a list of instances, where one instance is formatted as follows:

{
    "reviews": [
       ["this is the first review.", 5],
       ["this is the second review.", 3],
       "..."
    ],
    "summary": "this is the first summary.",
    "..."
}

In the example above, reviews is a list of review-rating tuples. For the Amazon dev/test files, summary is instead a list of reference summaries. The files may contain other information that is not used by the code (e.g., category and prod_id in the Amazon datasets). When using the datasets, please also cite the corresponding papers (listed below).
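For illustration, here is a minimal loading sketch; the path data/yelp/train.json is an assumption, so adjust it to the actual split file names:

import json

# Minimal loading sketch. The file name data/yelp/train.json is an
# assumption; adjust it to match the downloaded split files.
with open("data/yelp/train.json", encoding="utf-8") as f:
    data = json.load(f)

instance = data[0]
texts = [text for text, rating in instance["reviews"]]

# summary is a string, except for Amazon dev/test where it is a list
# of reference summaries; normalize to a list either way.
summary = instance["summary"]
references = summary if isinstance(summary, list) else [summary]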

Running the code

PlanSum follows the Condense-Abstract Framework (Amplayo and Lapata, 2019), where we first condense the reviews into encodings and then use the encodings as input to a summarization model. In PlanSum, the content plan induction model is the Condense model, while the opinion summarization model is the Abstract model. Below is a step-by-step procedure for running PlanSum and generating summaries on the Yelp dataset.
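Conceptually, the two stages compose as in the sketch below. The function names and bodies are hypothetical placeholders, not the repository's API; the real models live in src/train_condense.py and src/train_abstract.py.

# Hypothetical sketch of the Condense-Abstract pipeline; names and
# bodies are placeholders, not the repository's actual API.
def condense(reviews):
    # Content plan induction: map raw reviews to plan-aware encodings.
    return [f"encoding({r})" for r in reviews]  # placeholder

def abstract(encodings):
    # Opinion summarization: decode a summary from the encodings.
    return " | ".join(encodings)  # placeholder

print(abstract(condense(["great food!", "friendly staff."])))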

Step 0: Download the datasets

Download the preprocessed datasets here. You can also skip steps by downloading the model files in the model/ directory and the train.plan.json files in the data/ directory.

Step 1: Train the content plan induction model

This can be done by running src/train_condense.py as follows:

python src/train_condense.py -mode=train -data_type=yelp

This will create a model/ directory and a model file named condense.model. There are multiple arguments that can be set, but the default settings are fine for Yelp. The settings used for Amazon and Rotten Tomatoes are given as comments in the code.

Step 2: Create the synthetic training dataset

PlanSum uses a synthetic data creation method where we sample reviews from the corpus and transform them into review-summary pairs. To do this, we use the same code with -mode=create, i.e.

python src/train_condense.py -mode=create -data_type=yelp

This will create a new json file named train.plan.json in the data/yelp/ directory. This is the synthetic training dataset used to train the summarization model.
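As a rough illustration of the sampling idea only, one could hold out a sampled review as the pseudo-summary, as in the sketch below. This is not the paper's exact procedure, which also conditions on the induced content plan; src/train_condense.py with -mode=create is the real implementation, and the paths here are assumptions.

import json
import random

# Rough illustration of review sampling for synthetic supervision.
# NOT the paper's exact procedure, which also uses the induced
# content plan; see src/train_condense.py (-mode=create).
with open("data/yelp/train.json", encoding="utf-8") as f:  # path assumed
    data = json.load(f)

random.seed(0)
synthetic = []
for instance in data:
    texts = [text for text, rating in instance["reviews"]]
    if len(texts) < 2:
        continue
    idx = random.randrange(len(texts))       # hold one review out
    pseudo_summary = texts[idx]
    inputs = texts[:idx] + texts[idx + 1:]   # the rest become the input set
    synthetic.append({"reviews": inputs, "summary": pseudo_summary})

with open("data/yelp/train.sketch.json", "w", encoding="utf-8") as f:  # name illustrative
    json.dump(synthetic, f)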

Step 3: Train the summarization model

This is done by simply running src/train_abstract.py:

python src/train_abstract.py -mode=train -data_type=yelp

This will create a model file named abstract.model in the model/ directory. Again, there are arguments that can be set, but the default settings are fine for Yelp; settings used for the other datasets are given as comments in the code.

Step 4: Generate the summaries

Generating the summaries can be done by running:

python src/train_abstract.py -mode=eval -data_type=yelp

This will create an output/ directory and a file containing the summaries named predictions.txt.
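A quick way to inspect the output, assuming predictions.txt stores one summary per line (the format is an assumption):

# Inspect the generated summaries.
# Assumption: output/predictions.txt stores one summary per line.
with open("output/predictions.txt", encoding="utf-8") as f:
    predictions = [line.strip() for line in f if line.strip()]

print(f"Loaded {len(predictions)} summaries.")
print("First summary:", predictions[0])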

I just want your summaries!

This repo also includes an output/ directory containing the generated summaries from five different systems:

  • gold.sol contains the gold-standard summaries
  • plansum.sol contains summaries produced by PlanSum (this paper)
  • denoisesum.sol contains summaries produced by DenoiseSum (Amplayo and Lapata, 2020)
  • copycat.sol contains summaries produced by CopyCat (Brazinskas et al., 2020)
  • bertcent.sol contains summaries produced by BertCent (this paper)

Please cite the corresponding papers when using these outputs (e.g., when comparing them with your model's).
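If you want to score these outputs yourself, here is a hedged sketch using the rouge-score package (pip install rouge-score); it assumes each .sol file stores one summary per line, with lines aligned across files:

from rouge_score import rouge_scorer

# Assumption: one summary per line, lines aligned across the .sol files.
with open("output/gold.sol", encoding="utf-8") as f:
    gold = [line.strip() for line in f]
with open("output/plansum.sol", encoding="utf-8") as f:
    plansum = [line.strip() for line in f]

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = [scorer.score(ref, hyp) for ref, hyp in zip(gold, plansum)]

for key in ["rouge1", "rouge2", "rougeL"]:
    avg_f = sum(s[key].fmeasure for s in scores) / len(scores)
    print(f"{key} F1: {avg_f:.4f}")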

Cite the necessary papers

To cite the paper/code/data splits, please use this BibTeX:

@inproceedings{amplayo2021unsupervised,
	Author = {Amplayo, Reinald Kim and Angelidis, Stefanos and Lapata, Mirella},
	Booktitle = {AAAI},
	Year = {2021},
	Title = {Unsupervised Opinion Summarization with Content Planning},
}

If using the datasets, please also cite the original authors of the datasets:

@inproceedings{bravzinskas2020unsupervised,
	Author = {Bra{\v{z}}inskas, Arthur and Lapata, Mirella and Titov, Ivan},
	Booktitle = {ACL},
	Year = {2020},
	Title = {Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation},
}
@inproceedings{wang2016neural,
	Author = {Wang, Lu and Ling, Wang},
	Booktitle = {NAACL},
	Year = {2016},
	Title = {Neural Network-Based Abstract Generation for Opinions and Arguments},
}
@inproceedings{chu2019meansum,
	Author = {Chu, Eric and Liu, Peter},
	Booktitle = {ICML},
	Year = {2019},
	Title = {{M}ean{S}um: A Neural Model for Unsupervised Multi-Document Abstractive Summarization},
}

If you have any questions, please send me an email: reinald.kim at ed dot ac dot uk
