yumeng5 / WeSTClass

License: Apache-2.0
[CIKM 2018] Weakly-Supervised Neural Text Classification


WeSTClass

The source code used for Weakly-Supervised Neural Text Classification, published in CIKM 2018.

Requirements

Before running, you first need to install the required packages with the following command:

$ pip3 install -r requirements.txt

Python 3.6 is strongly recommended; older Python versions may lead to package incompatibility issues.

Quick Start

python main.py --dataset ${dataset} --sup_source ${sup_source} --model ${model}

where ${dataset} is the dataset name, ${sup_source} is the weak supervision type (one of ['labels', 'keywords', 'docs']), and ${model} is the neural model to use (one of ['cnn', 'rnn']).

An example run is provided in test.sh, which can be executed by

./test.sh

More advanced settings on training and hyperparameters are commented in main.py.

Inputs

The weak supervision sources ${sup_source} can come from any of the following:

  • Label surface names (labels); you need to provide class names for each class in ./${dataset}/classes.txt, where each line begins with the class id (starting from 0), followed by a colon, and then the class label surface name.
  • Class-related keywords (keywords); you need to provide class-related keywords for each class in ./${dataset}/keywords.txt, where each line begins with the class id (starting from 0), followed by a colon, and then the class-related keywords separated by commas.
  • Labeled documents (docs); you need to provide labeled document ids for each class in ./${dataset}/doc_id.txt, where each line begins with the class id (starting from 0), followed by a colon, and then document ids in the corpus (starting from 0) of the corresponding class separated by commas.

Examples are given under ./agnews/ and ./yelp/.
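As an illustration, the three input files might look like the fragments below (the class names, keywords, and document ids here are hypothetical; see ./agnews/ and ./yelp/ for real examples):

```
# ./${dataset}/classes.txt  -- label surface names
0:politics
1:sports

# ./${dataset}/keywords.txt  -- class-related keywords
0:government,election,president
1:basketball,football,game

# ./${dataset}/doc_id.txt  -- labeled document ids per class
0:1,4,7
1:0,2,9
```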

Outputs

The final results (document labels) will be written in ./${dataset}/out.txt, where each line is the class label id for the corresponding document.

Intermediate results (e.g. trained network weights, self-training logs) will be saved under ./results/${dataset}/${model}/.
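Since out.txt contains one class label id per line, it is straightforward to load the predictions for downstream analysis. A minimal sketch (the sample file contents and the hypothetical ground-truth labels below are for demonstration only):

```python
from collections import Counter

def read_predictions(path):
    """Read WeSTClass output: one integer class label id per line."""
    with open(path) as f:
        return [int(line.strip()) for line in f if line.strip()]

# Demo with a synthetic out.txt; in a real run it lives at ./${dataset}/out.txt
with open("out.txt", "w") as f:
    f.write("0\n1\n1\n0\n2\n")

preds = read_predictions("out.txt")
print(Counter(preds))  # predicted class distribution

# If ground-truth labels are available, accuracy is a simple comparison
truth = [0, 1, 0, 0, 2]  # hypothetical labels for the demo
accuracy = sum(p == t for p, t in zip(preds, truth)) / len(truth)
print(f"accuracy = {accuracy:.2f}")
```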

Running on a New Dataset

To execute the code on a new dataset, you need to

  1. Create a directory named ${dataset}.
  2. Put raw corpus (with or without true labels) under ./${dataset}.
  3. Modify the function read_file in load_data.py so that it returns a list of documents in variable data and the corresponding true labels in variable y (if ground-truth labels are not available, simply return y = None).
  4. Modify main.py to accept the new dataset; you need to add ${dataset} to argparse, and then specify parameter settings (e.g. update_interval, pretrain_epochs) for the new dataset.

You can always refer to the example datasets when adapting the code for a new dataset.
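A minimal read_file sketch for step 3, assuming the new corpus is a plain-text file with one document per line and an optional label file with one integer per line (the file names corpus.txt and labels.txt are hypothetical; adapt them to your corpus layout):

```python
import os

def read_file(dataset, with_evaluation=True):
    """Hypothetical read_file: documents in ./<dataset>/corpus.txt (one per
    line), optional labels in ./<dataset>/labels.txt (one integer per line)."""
    with open(os.path.join(dataset, "corpus.txt"), encoding="utf-8") as f:
        data = [line.strip() for line in f if line.strip()]
    y = None
    label_path = os.path.join(dataset, "labels.txt")
    if with_evaluation and os.path.exists(label_path):
        with open(label_path) as f:
            y = [int(line.strip()) for line in f if line.strip()]
        assert len(y) == len(data), "labels and documents must align"
    return data, y

# Tiny end-to-end demo of the sketch
os.makedirs("demo", exist_ok=True)
with open("demo/corpus.txt", "w", encoding="utf-8") as f:
    f.write("first document\nsecond document\n")
with open("demo/labels.txt", "w") as f:
    f.write("0\n1\n")

docs, labels = read_file("demo")
print(len(docs), labels)  # 2 [0, 1]
```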

Citations

Please cite the following paper if you find the code helpful for your research.

@inproceedings{meng2018weakly,
  title={Weakly-Supervised Neural Text Classification},
  author={Meng, Yu and Shen, Jiaming and Zhang, Chao and Han, Jiawei},
  booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
  pages={983--992},
  year={2018},
  organization={ACM}
}