
jfilter / Text Classification Keras

License: MIT
📚 Text classification library with Keras

Programming Languages

Python

Projects that are alternatives to or similar to Text Classification Keras

Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+247.17%)
Mutual labels:  lstm, sentiment-analysis, attention, nlp-machine-learning
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (-60.38%)
Mutual labels:  sentiment-analysis, text-classification, lstm
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (+224.53%)
Mutual labels:  lstm, sentiment-analysis, attention
ntua-slp-semeval2018
Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.
Stars: ✭ 79 (+49.06%)
Mutual labels:  sentiment-analysis, lstm, attention
Rnn Text Classification Tf
Tensorflow Implementation of Recurrent Neural Network (Vanilla, LSTM, GRU) for Text Classification
Stars: ✭ 114 (+115.09%)
Mutual labels:  lstm, text-classification, sentiment-analysis
Context
ConText v4: Neural networks for text categorization
Stars: ✭ 120 (+126.42%)
Mutual labels:  lstm, text-classification, sentiment-analysis
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (-18.87%)
Mutual labels:  text-classification, lstm, attention
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-62.26%)
Mutual labels:  lstm, attention, nlp-machine-learning
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-39.62%)
Mutual labels:  sentiment-analysis, lstm, nlp-machine-learning
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+575.47%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Text Classification
Implementation of papers for text classification task on DBpedia
Stars: ✭ 682 (+1186.79%)
Mutual labels:  lstm, text-classification, attention
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (+169.81%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
Stars: ✭ 124 (+133.96%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Doc2vec
📓 Long(er) text representation and classification using Doc2Vec embeddings
Stars: ✭ 92 (+73.58%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Tensorflow Sentiment Analysis On Amazon Reviews Data
Implementing different RNN models (LSTM,GRU) & Convolution models (Conv1D, Conv2D) on a subset of Amazon Reviews data with TensorFlow on Python 3. A sentiment analysis project.
Stars: ✭ 34 (-35.85%)
Mutual labels:  lstm, text-classification, sentiment-analysis
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-64.15%)
Mutual labels:  sentiment-analysis, attention, nlp-machine-learning
Tf Rnn Attention
Tensorflow implementation of attention mechanism for text classification tasks.
Stars: ✭ 735 (+1286.79%)
Mutual labels:  text-classification, sentiment-analysis, attention
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-28.3%)
Mutual labels:  library, text-classification, sentiment-analysis
Chatbot cn
A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front end is integrated via Django, and RESTful interfaces for the NLP and KG modules are already provided.
Stars: ✭ 791 (+1392.45%)
Mutual labels:  text-classification, sentiment-analysis
Slim.js
Fast & Robust Front-End Micro-framework based on modern standards
Stars: ✭ 789 (+1388.68%)
Mutual labels:  framework, library

Text Classification Keras

A high-level text classification library implementing various well-established models, with a clean and extendable interface for implementing custom architectures.

Quick start

Install

pip install text-classification-keras[full]

The [full] extra additionally installs TensorFlow, spaCy, and Deep Plots. Choose it if you want to get started right away.

Usage

from texcla import experiment, data
from texcla.models import TokenModelFactory, YoonKimCNN
from texcla.preprocessing import FastTextWikiTokenizer

# input text
X = ['some random text', 'another random text lala', 'peter', ...]

# input labels
y = ['a', 'b', 'a', ...]

# use the special tokenizer used for constructing the embeddings
tokenizer = FastTextWikiTokenizer()

# preprocess data (once)
experiment.setup_data(X, y, tokenizer, 'data.bin', max_len=100)

# load data
ds = data.Dataset.load('data.bin')

# construct base
factory = TokenModelFactory(
    ds.num_classes, ds.tokenizer.token_index, max_tokens=100,
    embedding_type='fasttext.wiki.simple', embedding_dims=300)

# choose a model
word_encoder_model = YoonKimCNN()

# build a model
model = factory.build_model(
    token_encoder_model=word_encoder_model, trainable_embeddings=False)

# use experiment.train as wrapper for Keras.fit()
experiment.train(x=ds.X, y=ds.y, validation_split=0.1, model=model,
    word_encoder_model=word_encoder_model)

Check out more examples.

API Documentation

https://jfilter.github.io/text-classification-keras/

Advanced

Embeddings

Choose a pre-trained word embedding by setting the embedding_type and the corresponding embedding dimensions. Set embedding_type=None to initialize the word embeddings randomly (but make sure to set trainable_embeddings=True so you actually train the embeddings).

factory = TokenModelFactory(embedding_type='fasttext.wiki.simple', embedding_dims=300)
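
For randomly initialized, trainable embeddings, the corresponding calls would look like this (a minimal sketch that reuses the data.bin dataset from the quick start):

from texcla import data
from texcla.models import TokenModelFactory, YoonKimCNN

ds = data.Dataset.load('data.bin')

# embedding_type=None initializes the embeddings randomly; they must be
# trainable, otherwise they would never move away from their random values.
factory = TokenModelFactory(
    ds.num_classes, ds.tokenizer.token_index, max_tokens=100,
    embedding_type=None, embedding_dims=300)
model = factory.build_model(
    token_encoder_model=YoonKimCNN(), trainable_embeddings=True)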

FastText

Several pre-trained FastText embeddings are included. For now, only the word embeddings are available, not the n-gram features. All embeddings have 300 dimensions.

GloVe

The GloVe embeddings are a predecessor of FastText; in general, prefer the FastText embeddings over GloVe. The dimensions of the pre-trained GloVe embeddings vary.
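
For example, the glove.6B.100d vectors used in the model examples below are 100-dimensional, so embedding_dims has to match (a sketch reusing the quick-start dataset):

from texcla import data
from texcla.models import TokenModelFactory

ds = data.Dataset.load('data.bin')

# The glove.6B family ships in 50, 100, 200, and 300 dimensions;
# embedding_dims must match the chosen variant.
factory = TokenModelFactory(
    ds.num_classes, ds.tokenizer.token_index, max_tokens=100,
    embedding_type='glove.6B.100d', embedding_dims=100)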

Tokenization

  • To work on the token (word) level, use a TokenTokenizer, e.g. TwokenizeTokenizer or SpacyTokenizer.
  • To work on the token and sentence level, use SpacySentenceTokenizer.
  • To create a custom Tokenizer, extend Tokenizer and implement the token_generator method, as sketched below.
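
A minimal sketch of such a custom tokenizer; the import path and the (text_index, token) yield format are assumptions modeled on the built-in tokenizers, so verify them against the Tokenizer base class:

from texcla.preprocessing import Tokenizer  # assumed location of the base class

class WhitespaceTokenizer(Tokenizer):
    # Hypothetical example: split on whitespace and lowercase each token.
    def token_generator(self, texts, **kwargs):
        # Assumed contract: yield a (text_index, token) tuple per token.
        for index, text in enumerate(texts):
            for token in text.split():
                yield index, token.lower()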

spaCy

You may use spaCy for tokenization. See the spaCy instructions on how to download a model for your target language, e.g. for English:

python -m spacy download en
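
Once the model is downloaded, the spaCy-backed tokenizer plugs into the same setup_data flow as in the quick start (a sketch; the texcla.preprocessing import path is an assumption, mirroring FastTextWikiTokenizer):

from texcla import experiment
from texcla.preprocessing import SpacyTokenizer  # assumed module, like FastTextWikiTokenizer

X = ['some random text', 'another random text lala']
y = ['a', 'b']

tokenizer = SpacyTokenizer()
experiment.setup_data(X, y, tokenizer, 'data_spacy.bin', max_len=100)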

Models

Token-based Models

When working on token level, use TokenModelFactory.

from texcla.models import TokenModelFactory, YoonKimCNN

factory = TokenModelFactory(tokenizer.num_classes, tokenizer.token_index,
    max_tokens=100, embedding_type='glove.6B.100d')
word_encoder_model = YoonKimCNN()
model = factory.build_model(token_encoder_model=word_encoder_model)

Currently supported models include the YoonKimCNN used above as well as the AttentionRNN and AveragingEncoder encoders that appear below; see the API documentation for the complete, up-to-date list.

TokenModelFactory.build_model takes the provided word encoder and adds a Dense layer on top for classification.
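
Because the encoders are interchangeable, trying a different architecture is a one-line change; e.g. the AttentionRNN used in the sentence-level examples below also works as a token encoder (a sketch reusing the factory from above):

from texcla.models import AttentionRNN  # assumed to sit next to YoonKimCNN

# Same factory as above; only the token encoder changes.
word_encoder_model = AttentionRNN()
model = factory.build_model(token_encoder_model=word_encoder_model)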

Sentence-based Models

When working on sentence level, use SentenceModelFactory.

from texcla.models import SentenceModelFactory, AttentionRNN

# Pad to a maximum of 500 sentences per document and 200 words per sentence.
# `max_sents=None` also works and allows a variable max_sents per mini-batch.
factory = SentenceModelFactory(10, tokenizer.token_index, max_sents=500,
    max_tokens=200, embedding_type='glove.6B.100d')
word_encoder_model = AttentionRNN()
sentence_encoder_model = AttentionRNN()

# Compose an arbitrary word encoder with a sentence encoder.
model = factory.build_model(word_encoder_model, sentence_encoder_model)

  • Hierarchical attention networks (HANs) can be built by composing two attention-based RNN models. This is useful when a document is very large.
  • For smaller documents, a reasonable way to encode a sentence is to average the words within it. This can be done by passing token_encoder_model=AveragingEncoder(), as sketched below.
  • Mix and match encoders as you see fit for your problem.
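
For the averaging variant from the list above, the composition looks like this (a sketch; the import path of AveragingEncoder is an assumption, mirroring the other encoders):

from texcla.models import AveragingEncoder  # assumed module

# Reuse the factory from above: average the word vectors within each
# sentence, then encode the sentence sequence with an attention RNN.
model = factory.build_model(AveragingEncoder(), AttentionRNN())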

SentenceModelFactory.build_model creates a tiered model: the words within each sentence are first encoded using word_encoder_model, and the resulting sentence encodings are then encoded using sentence_encoder_model.

Related

Contributing

If you have a question, have found a bug, or want to propose a new feature, take a look at the issues page.

Pull requests are especially welcome when they fix bugs or improve code quality.

Acknowledgements

Built upon the work by Raghavendra Kotikalapudi: keras-text.

Citation

If you find Text Classification Keras useful for an academic publication, then please use the following BibTeX to cite it:

@misc{raghakotfiltertexclakeras,
    title={Text Classification Keras},
    author={Raghavendra Kotikalapudi and Johannes Filter and contributors},
    year={2018},
    publisher={GitHub},
    howpublished={\url{https://github.com/jfilter/text-classification-keras}},
}

License

MIT.
