artitw / text2class

License: Apache-2.0
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT

Text2Class

Build multi-class text classifiers using state-of-the-art pre-trained contextualized language models, e.g. BERT. Only a few hundred samples per class are necessary to get started.

Background

This project is based on our study: Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models.

Citation

To cite this work, use the following BibTeX citation.

@article{transfer2019multiclass,
  title={Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models},
  author={Liu, Xinyi and Wangperawong, Artit},
  journal={arXiv preprint arXiv:1909.03564},
  year={2019}
}

Installation

pip install text2class

Example usage

Create a dataframe with two columns, such as 'text' and 'label'. No text pre-processing is necessary.
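As a minimal sketch of the expected input shape, the frame below is built in memory rather than read from a file. The sample texts, the string label values, and the MIN_PER_CLASS threshold are made up for illustration; whether text2class expects string or integer labels is not stated here, so treat the label encoding as an assumption.

```python
import pandas as pd

# Hypothetical toy data: one free-text column and one label column.
df = pd.DataFrame(
    {
        "text": [
            "The battery died after two days.",
            "Great screen and very fast shipping!",
            "It works as described.",
            "Stopped charging within a week.",
        ],
        "label": ["negative", "positive", "neutral", "negative"],
    }
)

# Count samples per class to spot labels with too little data.
class_counts = df["label"].value_counts()

# Illustrative threshold only; it is not a value defined by text2class.
MIN_PER_CLASS = 2
sparse_labels = sorted(class_counts[class_counts < MIN_PER_CLASS].index)

# Number of distinct classes; presumably the value to pass as num_labels.
num_labels = df["label"].nunique()
```

A check like this is a cheap way to catch under-represented classes before spending time on fine-tuning.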

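import pandas as pd
from text2class.text_classifier import TextClassifier

# Load a CSV file that has a "text" column and a "label" column.
df = pd.read_csv("data.csv")

# Hold out 10% of the rows for testing; random_state makes the split reproducible.
train = df.sample(frac=0.9, random_state=200)
test = df.drop(train.index)

# Configure the classifier with the number of classes and the column names.
cls = TextClassifier(
    num_labels=3,
    data_column="text",
    label_column="label",
    max_seq_length=128
)

# Fine-tune the pre-trained model on the training rows.
cls.fit(train)

# Predict labels for the held-out texts.
predictions = cls.predict(test["text"])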
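The example ends once predictions are produced. One way to score them against the held-out labels is sketched below using only the standard library. The helpers accuracy and per_class_recall are hypothetical and not part of text2class, and the sketch assumes that predict returns one label per input row, in the same order and encoding as the label column.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction and truth agree."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty test set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return, for each true label, the share of its rows predicted correctly."""
    y_true, y_pred = list(y_true), list(y_pred)
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Stand-in values. With the example above these would be
# list(test["label"]) and list(predictions), under the stated assumption.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 2]

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
```

Per-class recall is worth reporting alongside overall accuracy in a multi-class setting, since a single weak class can hide behind a good average.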

Advanced usage

Model type

The default model is an uncased Bidirectional Encoder Representations from Transformers (BERT) model with 12 transformer layers, 12 self-attention heads per layer, and a hidden size of 768. To use a different model, pass one of the TensorFlow Hub URLs below as hub_module_handle. In each URL, L is the number of layers, H is the hidden size, and A is the number of attention heads. These are all the models currently supported; we expect to add more in the future. For more information, see BERT's GitHub repository.

https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_uncased_L-24_H-1024_A-16/1
https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_cased_L-24_H-1024_A-16/1
https://tfhub.dev/google/bert_chinese_L-12_H-768_A-12/1
https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1

# Select the model explicitly by passing its TensorFlow Hub URL.
cls = TextClassifier(
    num_labels=3,
    data_column="text",
    label_column="label",
    max_seq_length=128,
    hub_module_handle="https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1"
)
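Because only the handles listed above are supported, a typo in the URL is easy to make and hard to spot. The sketch below builds a handle from its parts and checks it against that list. The helper resolve_handle and the constant SUPPORTED_HANDLES are hypothetical names introduced for illustration; they are not part of text2class.

```python
# Handles copied from the supported-model list above.
SUPPORTED_HANDLES = (
    "https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1",
    "https://tfhub.dev/google/bert_uncased_L-24_H-1024_A-16/1",
    "https://tfhub.dev/google/bert_cased_L-12_H-768_A-12/1",
    "https://tfhub.dev/google/bert_cased_L-24_H-1024_A-16/1",
    "https://tfhub.dev/google/bert_chinese_L-12_H-768_A-12/1",
    "https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1",
)


def resolve_handle(variant="uncased", layers=12, hidden=768, heads=12):
    """Build a handle from its parts and reject unsupported combinations."""
    handle = (
        f"https://tfhub.dev/google/bert_{variant}"
        f"_L-{layers}_H-{hidden}_A-{heads}/1"
    )
    if handle not in SUPPORTED_HANDLES:
        raise ValueError(f"unsupported model handle: {handle}")
    return handle


# The defaults mirror the default model described above (uncased, 12/768/12).
default_handle = resolve_handle()

# The larger cased model from the list.
large_cased_handle = resolve_handle("cased", layers=24, hidden=1024, heads=16)
```

The returned string can then be passed as hub_module_handle, so an unsupported combination fails early with a clear message instead of at model load time.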

Contributing

Text2Class is an open-source project founded and maintained to better serve the machine learning and data science community. Please feel free to submit pull requests to contribute to the project. By participating, you are expected to adhere to Text2Class's code of conduct.

Questions?

For questions or help using Text2Class, please submit a GitHub issue.
