
ruidan / Unsupervised Aspect Extraction

License: Apache-2.0
Code for the ACL 2017 paper "An Unsupervised Neural Attention Model for Aspect Extraction"


Projects that are alternatives of or similar to Unsupervised Aspect Extraction

TopicNet
Interface for easier topic modelling.
Stars: ✭ 127 (-54.15%)
Mutual labels:  topic-modeling
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-89.17%)
Mutual labels:  topic-modeling
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-88.09%)
Mutual labels:  topic-modeling
twic
Topic Words in Context (TWiC) is a highly-interactive, browser-based visualization for MALLET topic models
Stars: ✭ 51 (-81.59%)
Mutual labels:  topic-modeling
learning-stm
Learning structural topic modeling using the stm R package.
Stars: ✭ 103 (-62.82%)
Mutual labels:  topic-modeling
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-85.92%)
Mutual labels:  topic-modeling
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (-80.14%)
Mutual labels:  topic-modeling
latent-semantic-analysis
Pipeline for training LSA models using Scikit-Learn.
Stars: ✭ 20 (-92.78%)
Mutual labels:  topic-modeling
tassal
Tree-based Autofolding Software Summarization Algorithm
Stars: ✭ 38 (-86.28%)
Mutual labels:  topic-modeling
topic models
implemented : lsa, plsa, lda
Stars: ✭ 80 (-71.12%)
Mutual labels:  topic-modeling
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-90.25%)
Mutual labels:  topic-modeling
gensimr
📝 Topic Modeling for Humans
Stars: ✭ 35 (-87.36%)
Mutual labels:  topic-modeling
TAKG
The official implementation of ACL 2019 paper "Topic-Aware Neural Keyphrase Generation for Social Media Language"
Stars: ✭ 127 (-54.15%)
Mutual labels:  topic-modeling
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (-46.93%)
Mutual labels:  topic-modeling
pydataberlin-2017
Repo for my talk at the PyData Berlin 2017 conference
Stars: ✭ 63 (-77.26%)
Mutual labels:  topic-modeling
ctpfrec
Python implementation of "Content-based recommendations with poisson factorization", with some extensions
Stars: ✭ 31 (-88.81%)
Mutual labels:  topic-modeling
Twitter-Trends
Twitter Trends is a web-based application that automatically detects and analyzes emerging topics in real time through hashtags and user mentions in tweets. Twitter being the major microblogging service is a reliable source for trends detection. The project involved extracting live streaming tweets, processing them to find top hashtags and user …
Stars: ✭ 82 (-70.4%)
Mutual labels:  topic-modeling
Lda
LDA topic modeling for node.js
Stars: ✭ 262 (-5.42%)
Mutual labels:  topic-modeling
topicApp
A simple Shiny App for Topic Modeling in R
Stars: ✭ 40 (-85.56%)
Mutual labels:  topic-modeling
abae-pytorch
PyTorch implementation of 'An Unsupervised Neural Attention Model for Aspect Extraction' by He et al. ACL2017'
Stars: ✭ 52 (-81.23%)
Mutual labels:  topic-modeling

Unsupervised Aspect Extraction

Code and datasets for the ACL 2017 paper "An Unsupervised Neural Attention Model for Aspect Extraction". (pdf)

Data

You can find the pre-processed datasets and the pre-trained word embeddings in [Download]. The zip file should be decompressed and put in the main folder.

You can also download the original datasets of the Restaurant and Beer domains from [Download]. For preprocessing, put the decompressed zip file in the main folder and run

python preprocess.py
python word2vec.py

respectively under code/. The preprocessed files and trained word embeddings for each domain will be saved in the folder preprocessed_data/.
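For reference, the embedding step simply trains word vectors on the in-domain corpus. Below is a minimal sketch of that step using gensim; it is not the repo's word2vec.py, and the corpus filename train.txt and the hyper-parameter values are assumptions (only the output path matches the --emb argument used for training below).

# Sketch only -- not the repo's word2vec.py. Assumes one tokenized sentence
# per line in a hypothetical preprocessed_data/$domain/train.txt file.
import gensim

class Sentences(object):
    """Stream tokenized sentences from a one-sentence-per-line file."""
    def __init__(self, path):
        self.path = path
    def __iter__(self):
        for line in open(self.path):
            yield line.strip().split()

domain = 'restaurant'
corpus = Sentences('../preprocessed_data/%s/train.txt' % domain)
# size/window/min_count below are assumed values, not the paper's settings
model = gensim.models.Word2Vec(corpus, size=200, window=5, min_count=10, workers=4)
model.save('../preprocessed_data/%s/w2v_embedding' % domain)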

Train

Under code/, run the following command for training:

THEANO_FLAGS="device=gpu0,floatX=float32" python train.py \
--emb ../preprocessed_data/$domain/w2v_embedding \
--domain $domain \
-o output_dir

where $domain in ['restaurant', 'beer'] is the target domain, --emb is the path to the pre-trained word embeddings, and -o is the output directory. Additional arguments/hyper-parameters are defined in train.py; their default values are the ones used in our experiments.

After training, two output files will be saved in code/output_dir/$domain/: 1) aspect.log contains the extracted aspects with the top 100 words for each; 2) model_param contains the saved model weights.
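For a quick look at what was learned, something like the snippet below can print a few of the top words per inferred aspect. This is a sketch under an assumed aspect.log layout (aspect header lines followed by lines of top words); check your own aspect.log for the exact format.

# Sketch: print the first 10 of the top words for each inferred aspect.
# Assumes aspect.log alternates "Aspect N" header lines and word lines;
# verify against your own aspect.log before relying on this.
with open('output_dir/restaurant/aspect.log') as f:
    for line in f:
        line = line.strip()
        if line.lower().startswith('aspect'):
            print line                              # aspect header
        elif line:
            print '  ' + ' '.join(line.split()[:10])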

Evaluation

Under code/, run the following command:

THEANO_FLAGS="device=gpu0,floatX=float32" python evaluation.py \
--domain $domain \
-o output_dir

Note that the argument values for evaluation should be kept the same as those used for training (except --emb, which you do not need to specify), as the network architecture must first be rebuilt before the saved model weights can be loaded.
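This is the standard rebuild-then-load pattern in Keras 1.x: load_weights restores parameter values only, so an identical architecture must be constructed first. A toy illustration (not the ABAE network; the layer shapes are invented):

# Toy example of the rebuild-then-load pattern (keras 1.2.1 API).
from keras.models import Sequential
from keras.layers import Dense

def build():
    model = Sequential()
    model.add(Dense(64, input_dim=200, activation='relu'))
    model.add(Dense(14))   # e.g. 14 inferred aspects
    return model

model = build()
model.save_weights('model_param', overwrite=True)   # done at training time

same_model = build()                    # evaluation: rebuild the same architecture...
same_model.load_weights('model_param')  # ...then load the saved weights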

This will output a file att_weights in code/output_dir/$domain/ that contains the attention weights for all test sentences.

To assign each test sentence a gold aspect label, first manually map each inferred aspect to a gold aspect label according to its top words, then uncomment the bottom part of evaluation.py (lines 136-144) for evaluation using F scores.
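Concretely, the mapping is just a dictionary from inferred aspect ids to gold labels, read off from the top words. A hedged illustration follows: the ids and labels below are invented (the real mapping comes from inspecting your own aspect.log), paired with a plain per-label F1 computation.

# Hypothetical mapping from inferred aspect ids to gold labels -- build
# yours by reading the top words of each aspect in aspect.log.
cluster_map = {0: 'Food', 1: 'Staff', 2: 'Ambience', 3: 'Food'}

def f1(true_labels, pred_clusters, label):
    """Per-label F1 given gold labels and predicted cluster ids."""
    pred = [cluster_map.get(c, 'Other') for c in pred_clusters]
    tp = sum(1 for t, p in zip(true_labels, pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(true_labels, pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(true_labels, pred) if t == label and p != label)
    prec = tp / float(tp + fp) if tp + fp else 0.0
    rec = tp / float(tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0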

An example of a trained model for the restaurant domain is provided in pre_trained_model/restaurant/, and the corresponding aspect mapping is given in evaluation.py (lines 136-139). You can uncomment line 28 of evaluation.py and run the above command to evaluate the trained model.

Dependencies

python 2

  • keras 1.2.1
  • theano 0.9.0
  • numpy 1.13.3

See also requirements.txt. You can install the requirements using the following command.

pip install -r requirements.txt

Cite

If you use the code, please cite the following paper:

@InProceedings{he-EtAl:2017:Long2,
  author    = {He, Ruidan  and  Lee, Wee Sun  and  Ng, Hwee Tou  and  Dahlmeier, Daniel},
  title     = {An Unsupervised Neural Attention Model for Aspect Extraction},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month     = {July},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics}
}