
guokr / Caver

License: GPL-3.0
Caver: a toolkit for multilabel text classification.

Programming Languages

python

Projects that are alternatives of or similar to Caver

extremeText
Library for fast text representation and extreme classification.
Stars: ✭ 141 (+271.05%)
Mutual labels:  text-classification, multi-label-classification
classifier multi label
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification
Stars: ✭ 127 (+234.21%)
Mutual labels:  text-classification, multi-label-classification
Text Classification Pytorch
Text classification using deep learning models in Pytorch
Stars: ✭ 683 (+1697.37%)
Mutual labels:  text-classification, attention-model
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+13.16%)
Mutual labels:  text-classification
kaggle-human-protein-atlas-image-classification
Kaggle 2018 @ Human Protein Atlas Image Classification
Stars: ✭ 34 (-10.53%)
Mutual labels:  multi-label-classification
WSDM-Cup-2019
[ACM-WSDM] 3rd place solution at WSDM Cup 2019, Fake News Classification on Kaggle.
Stars: ✭ 62 (+63.16%)
Mutual labels:  text-classification
Reuters-21578-Classification
Text classification with Reuters-21578 datasets using Gensim Word2Vec and Keras LSTM
Stars: ✭ 44 (+15.79%)
Mutual labels:  text-classification
classification
Vietnamese Text Classification
Stars: ✭ 39 (+2.63%)
Mutual labels:  text-classification
small-text
Active Learning for Text Classification in Python
Stars: ✭ 241 (+534.21%)
Mutual labels:  text-classification
HiGRUs
Implementation of the paper "Hierarchical GRU for Utterance-level Emotion Recognition" in NAACL-2019.
Stars: ✭ 60 (+57.89%)
Mutual labels:  text-classification
attention-mechanism-keras
attention mechanism in keras, like Dense and RNN...
Stars: ✭ 19 (-50%)
Mutual labels:  attention-model
napkinXC
Extremely simple and fast extreme multi-class and multi-label classifiers.
Stars: ✭ 38 (+0%)
Mutual labels:  multi-label-classification
10kGNAD
Ten Thousand German News Articles Dataset for Topic Classification
Stars: ✭ 63 (+65.79%)
Mutual labels:  text-classification
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-21.05%)
Mutual labels:  text-classification
Naive-Bayes-Text-Classifier-in-Java
Naive Bayes Classification used to classify movie reviews as positive or negative
Stars: ✭ 18 (-52.63%)
Mutual labels:  text-classification
MetaLifelongLanguage
Repository containing code for the paper "Meta-Learning with Sparse Experience Replay for Lifelong Language Learning".
Stars: ✭ 21 (-44.74%)
Mutual labels:  text-classification
nsmc-zeppelin-notebook
Movie review dataset Word2Vec & sentiment classification Zeppelin notebook
Stars: ✭ 26 (-31.58%)
Mutual labels:  text-classification
nlp classification
Implementing nlp papers relevant to classification with PyTorch, gluonnlp
Stars: ✭ 224 (+489.47%)
Mutual labels:  text-classification
MetaCat
Minimally Supervised Categorization of Text with Metadata (SIGIR'20)
Stars: ✭ 52 (+36.84%)
Mutual labels:  text-classification
textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Stars: ✭ 33 (-13.16%)
Mutual labels:  text-classification

Caver

Raising a torch in the cave to see the words on the wall: tag your short text in 3 lines. Caver uses Facebook's PyTorch project to make the implementation easier.


Demo | Requirements | Install | Pre-trained models | Train | Examples | Document

Quick Demo

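from caver import CaverModel

# Load a trained model from a checkpoint directory.
model = CaverModel("./checkpoint_path")

# Inputs are Chinese short texts, already split into space-separated tokens.
sentence = ["看 美 剧 学 英 语 靠 谱 吗",
            "科 比 携 手 姚 明 出 任 2019 篮 球 世 界 杯 全 球 大 使",
            "如 何 在 《 权 力 的 游 戏 》 中 苟 到 最 后",
            "英 雄 联 盟 LPL 夏 季 赛 RNG 能 否 击 败 TOP 战 队"]

# "Is learning English by watching American TV shows reliable?"
model.predict([sentence[0]], top_k=3)
# >>> ['美剧', '英语', '英语学习']

# "Kobe Bryant and Yao Ming named global ambassadors for the 2019 Basketball World Cup"
model.predict([sentence[1]], top_k=5)
# >>> ['篮球', 'NBA', '体育', 'NBA 球员', '运动']

# "How to survive to the end in Game of Thrones"
model.predict([sentence[2]], top_k=7)
# >>> ['权力的游戏(美剧)', '美剧', '影视评论', '电视剧', '电影', '文学', '小说']

# "Can RNG beat team TOP in the League of Legends LPL Summer Split?"
model.predict([sentence[3]], top_k=6)
# >>> ['英雄联盟(LoL)', '电子竞技', '英雄联盟职业联赛(LPL)', '游戏', '网络游戏', '多人联机在线竞技游戏 (MOBA)']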
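The demo calls predict once per sentence. The following is a minimal sketch of wrapping that pattern in a loop. It assumes, as the demo output suggests, that predict takes a one-element list and returns a flat list of labels. The names tag_all, Tagger, and StubModel are hypothetical helpers for illustration and are not part of Caver.

```python
from typing import Dict, List, Protocol


class Tagger(Protocol):
    """Anything exposing predict(texts, top_k=...) the way the demo uses CaverModel."""

    def predict(self, texts: List[str], top_k: int = 3) -> List[str]: ...


def tag_all(model: Tagger, sentences: List[str], top_k: int = 3) -> Dict[str, List[str]]:
    """Call model.predict once per sentence, as in the demo, and collect the labels."""
    return {s: model.predict([s], top_k=top_k) for s in sentences}


class StubModel:
    """Hypothetical stand-in so the sketch runs without a checkpoint."""

    def predict(self, texts, top_k=3):
        # Pretend every text gets the same ranked labels; keep only the top_k.
        return ["label_a", "label_b", "label_c", "label_d"][:top_k]


tags = tag_all(StubModel(), ["first text", "second text"], top_k=2)
print(tags)
```

With a real checkpoint, StubModel() would be replaced by CaverModel("./checkpoint_path") from the demo.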

Requirements

  • PyTorch
  • tqdm
  • torchtext
  • numpy
  • Python3
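
Before installing, it can help to see which of the packages above are already available. This is a small sketch using only the standard library. The function name missing_modules is hypothetical, and the import names are the usual ones for these packages (PyTorch is imported as torch).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ("torch", "tqdm", "torchtext", "numpy")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules())
```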

Install

$ pip install caver --user

Do you have pre-trained models?

Yes, we have released two pre-trained models trained on the Zhihu NLPCC2018 open dataset.

If you want to use the pre-trained model for performing text tagging, you can download it (along with other important inference material) from the Caver releases page. Alternatively, you can run the following command to download and unzip the files in your current directory:

$ wget -O - https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_cnn.tar.gz | tar zxvf -
$ wget -O - https://github.com/guokr/Caver/releases/download/0.1/checkpoints_char_lstm.tar.gz | tar zxvf -
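
Where wget or tar is not available, the same download-and-unpack step can be sketched in Python with the standard library. The URLs are the ones from the commands above. The function names download_and_extract and fetch_pretrained are hypothetical, and this is one possible approach rather than a tool shipped with Caver.

```python
import tarfile
import urllib.request
from pathlib import Path

# Release location and archive names taken from the wget commands above.
RELEASE_BASE = "https://github.com/guokr/Caver/releases/download/0.1"
ARCHIVES = ("checkpoints_char_cnn.tar.gz", "checkpoints_char_lstm.tar.gz")


def download_and_extract(url: str, dest: str = ".") -> Path:
    """Stream a .tar.gz from url and unpack it into dest without saving the archive."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        # "r|gz" reads the gzip stream sequentially, like `wget -O - | tar zxf -`.
        with tarfile.open(fileobj=response, mode="r|gz") as archive:
            archive.extractall(target)
    return target


def fetch_pretrained(dest: str = ".") -> None:
    """Download both released checkpoints (needs network access)."""
    for name in ARCHIVES:
        download_and_extract(f"{RELEASE_BASE}/{name}", dest)
```

Calling fetch_pretrained() unpacks both archives into the current directory. Only extract archives from sources you trust.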

How to train on your own dataset

$ python3 train.py --input_data_dir {path to your original dataset} \
                   --output_data_dir {path to store the preprocessed dataset} \
                   --train_filename train.tsv \
                   --valid_filename valid.tsv \
                   --checkpoint_dir {path to save the checkpoints} \
                   --model {fastText/CNN/LSTM} \
                   --batch_size {16, adjust to fit your hardware} \
                   --epoch {10}
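
When launching training from a script, the command above can be assembled as an argument list. This is a sketch: the flags and default values are copied from the command shown, while build_train_command and the example paths are hypothetical, and the exact spellings accepted by --model are assumed to match the placeholder (fastText, CNN, LSTM).

```python
import shlex
from typing import List

# Model choices as written in the command above; exact accepted spellings are an assumption.
MODELS = ("fastText", "CNN", "LSTM")


def build_train_command(
    input_dir: str,
    output_dir: str,
    checkpoint_dir: str,
    model: str = "CNN",
    batch_size: int = 16,
    epoch: int = 10,
    train_filename: str = "train.tsv",
    valid_filename: str = "valid.tsv",
) -> List[str]:
    """Build the train.py invocation as a list suitable for subprocess.run."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {MODELS}, got {model!r}")
    return [
        "python3", "train.py",
        "--input_data_dir", input_dir,
        "--output_data_dir", output_dir,
        "--train_filename", train_filename,
        "--valid_filename", valid_filename,
        "--checkpoint_dir", checkpoint_dir,
        "--model", model,
        "--batch_size", str(batch_size),
        "--epoch", str(epoch),
    ]


# Hypothetical paths, for illustration only.
cmd = build_train_command("data/raw", "data/processed", "checkpoints", model="LSTM")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the directory that contains train.py.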

More Examples

This section is still being updated; in the meantime, see the examples.
