morningmoni / HiLAP

Licence: other
Code for paper "Hierarchical Text Classification with Reinforced Label Assignment" EMNLP 2019

This repo provides the code for the paper "Hierarchical Text Classification with Reinforced Label Assignment" (EMNLP 2019).

[Figure: prediction animation]

[Figure: HiLAP architecture]

Abstract

While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference. To solve the mismatch between training and inference as well as modeling label dependencies in a more principled way, we formulate HTC as a Markov decision process and propose to learn a Label Assignment Policy via deep reinforcement learning to determine where to place an object and when to stop the assignment process. The proposed method, HiLAP, explores the hierarchy during both training and inference time in a consistent manner and makes inter-dependent decisions. As a general framework, HiLAP can incorporate different neural encoders as base models for end-to-end training. Experiments on five public datasets and four base models show that HiLAP yields an average improvement of 33.4% in Macro-F1 over flat classifiers and outperforms state-of-the-art HTC methods by a large margin.
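The label assignment process described in the abstract can be illustrated with a small sketch. This is not the code in model.py: it only shows, under stated assumptions, what "where to place an object and when to stop" might look like at inference time. The toy hierarchy, the `STOP` token, the fixed score table, and the function names are all hypothetical, and the learned policy is replaced by a plain scoring function with greedy selection. Training with reinforcement learning is omitted.

```python
from typing import Callable, Dict, List

STOP = "<stop>"  # hypothetical name for the "stop assigning" action


def candidate_actions(hierarchy: Dict[str, List[str]], assigned: List[str]) -> List[str]:
    """Children of the root and of every assigned label, plus STOP.

    Assumption: the object starts at the root and may move to any child
    of a label it has already been placed at.
    """
    actions: List[str] = []
    for parent in ["root"] + assigned:
        for child in hierarchy.get(parent, []):
            if child not in assigned and child not in actions:
                actions.append(child)
    return actions + [STOP]


def assign_labels(
    hierarchy: Dict[str, List[str]],
    score: Callable[[List[str], str], float],
    max_steps: int = 10,
) -> List[str]:
    """Greedily pick the highest-scoring action until STOP is chosen."""
    assigned: List[str] = []
    for _ in range(max_steps):
        actions = candidate_actions(hierarchy, assigned)
        best = max(actions, key=lambda a: score(assigned, a))
        if best == STOP:
            break
        assigned.append(best)
    return assigned


# Toy hierarchy and a fixed score table standing in for a learned policy.
hierarchy = {
    "root": ["sports", "politics"],
    "sports": ["soccer", "tennis"],
    "politics": ["elections"],
}
toy_scores = {
    "sports": 0.9, "soccer": 0.8, STOP: 0.5,
    "politics": 0.2, "tennis": 0.1, "elections": 0.1,
}


def toy_score(assigned: List[str], action: str) -> float:
    # A real policy would condition on the document and the current state.
    return toy_scores[action]


result = assign_labels(hierarchy, toy_score)
print(result)  # ['sports', 'soccer']
```

Because each step sees only the children of labels already assigned, a label cannot be predicted without its parent, and the stop action lets the number of labels vary per document.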

Model

model.py: The main model of HiLAP.

TextCNN.py: Our implementation of "Convolutional Neural Networks for Sentence Classification" EMNLP 2014.

OHCNN(_fast).py: Our implementation of "Effective Use of Word Order for Text Categorization with Convolutional Neural Networks" NAACL 2015.

HAN.py: Our implementation of "Hierarchical Attention Networks for Document Classification" NAACL 2016.

HMCN.py: Our implementation of "Hierarchical Multi-Label Classification Networks" ICML 2018.

Requirements

Python 3

PyTorch 0.3

Data

Due to copyright issues, we can't directly release the datasets used in our experiments. Instead, we provide the links to the five data sources (the first two may require a license):

  • RCV1 original release, text data (update: download the text data and convert it to docs.txt, one document per line in the format "docid content")
  • NYT
  • Yelp (update: the latest release differs from the version we used; please send an email if you need our version)
  • FunGO

Please check readData_*.py to see how to use our scripts to process and generate the datasets from the original data.
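As a rough illustration of the "docid content" format mentioned for RCV1, the sketch below writes and reads one document per line. It does not reproduce the readData_*.py scripts. The single-space separator, the whitespace flattening, and the helper names `to_docs_line` and `parse_docs_line` are assumptions; check the scripts for the exact format they expect.

```python
from typing import Tuple


def to_docs_line(docid: str, content: str) -> str:
    """Format one document as 'docid content' on a single line.

    Assumption: the id and the text are separated by one space, and
    line breaks inside the text are collapsed to spaces.
    """
    flat = " ".join(content.split())
    return f"{docid} {flat}"


def parse_docs_line(line: str) -> Tuple[str, str]:
    """Split a 'docid content' line back into its two parts."""
    docid, _, content = line.rstrip("\n").partition(" ")
    return docid, content


# Hypothetical document id and text, for illustration only.
line = to_docs_line("2286", "Stocks rose\n  on Monday.")
print(line)  # 2286 Stocks rose on Monday.
print(parse_docs_line(line))  # ('2286', 'Stocks rose on Monday.')
```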

Run

All parameters in conf.py have default values. To train or test under a different setting, change the parameters mode, base_model, and dataset, then run main.py. To test a model, set load_model=model_file and is_Train=False in conf.py and run main.py.
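The train and test settings described above can be pictured with the following sketch. It is not the real conf.py: only the parameter names (mode, base_model, dataset, load_model, is_Train) come from this README, while the default values, the `make_conf` helper, and the consistency check are hypothetical placeholders.

```python
from types import SimpleNamespace


def make_conf(**overrides) -> SimpleNamespace:
    """Build a settings object with defaults, then apply overrides."""
    conf = SimpleNamespace(
        mode="hilap",          # placeholder value, see conf.py for real options
        base_model="textcnn",  # placeholder value
        dataset="rcv1",        # placeholder value
        load_model=None,       # path to a saved model file, if any
        is_Train=True,         # True to train, False to test
    )
    for key, value in overrides.items():
        if not hasattr(conf, key):
            raise KeyError(f"unknown parameter: {key}")
        setattr(conf, key, value)
    # Testing needs a saved model to load (a check added for this sketch).
    if not conf.is_Train and conf.load_model is None:
        raise ValueError("set load_model when is_Train is False")
    return conf


train_conf = make_conf(dataset="rcv1")
test_conf = make_conf(load_model="model_file", is_Train=False)
print(train_conf.is_Train, test_conf.load_model)  # True model_file
```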

Cite

@inproceedings{mao-etal-2019-hierarchical,
    title = "Hierarchical Text Classification with Reinforced Label Assignment",
    author = "Mao, Yuning  and
      Tian, Jingjing  and
      Han, Jiawei  and
      Ren, Xiang",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1042",
    doi = "10.18653/v1/D19-1042",
    pages = "445--455",
}