chiragjn / deep-char-cnn-lstm

Licence: MIT license
Deep Character CNN LSTM Encoder with Classification and Similarity Models

Implemented in Keras.

Overall Idea:

  • Convolve over character embeddings with several kernel sizes
  • Concatenate the convolution outputs to get a character-level word embedding
  • Pass that embedding through a Dense layer with a residual connection
  • Optionally concatenate it with a separate word embedding
  • Pass the resulting sequence of word embeddings through an LSTM encoder
  • Train with a contrastive loss function (see References)
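
The steps above can be sketched in Keras roughly as follows. This is a minimal illustration, not the repository's actual model code: all sizes and the function names `build_char_word_encoder` and `build_sentence_encoder` are assumptions. Max-pooling over characters is also an assumption, borrowed from the Kim et al. model listed in the References. The optional word-embedding concatenation and padding masks are left out to keep the sketch short.

```python
from tensorflow import keras
from tensorflow.keras import layers

# All sizes are illustrative assumptions, not the repository's defaults.
CHARSET_SIZE = 100        # number of distinct character ids
MAX_WORD_LEN = 16         # characters per word after padding
MAX_SEQ_LEN = 20          # words per sentence after padding
CHAR_EMBED_DIM = 16
KERNEL_SIZES = (2, 3, 4)
FILTERS_PER_KERNEL = 32
ENCODER_UNITS = 128


def build_char_word_encoder():
    """Map one word (a sequence of character ids) to a fixed-size vector."""
    chars = keras.Input(shape=(MAX_WORD_LEN,), dtype="int32")
    x = layers.Embedding(CHARSET_SIZE, CHAR_EMBED_DIM)(chars)
    # One convolution per kernel size, each max-pooled over the character axis.
    pooled = []
    for k in KERNEL_SIZES:
        conv = layers.Conv1D(FILTERS_PER_KERNEL, k, padding="same",
                             activation="relu")(x)
        pooled.append(layers.GlobalMaxPooling1D()(conv))
    word_vec = layers.Concatenate()(pooled)
    # Dense layer with a residual connection; widths must match to be added.
    width = FILTERS_PER_KERNEL * len(KERNEL_SIZES)
    dense = layers.Dense(width, activation="relu")(word_vec)
    word_vec = layers.Add()([word_vec, dense])
    return keras.Model(chars, word_vec, name="char_word_encoder")


def build_sentence_encoder():
    """Encode a sentence given as a (words, characters) grid of character ids."""
    sentence = keras.Input(shape=(MAX_SEQ_LEN, MAX_WORD_LEN), dtype="int32")
    # Apply the same character-level encoder to every word position.
    words = layers.TimeDistributed(build_char_word_encoder())(sentence)
    encoded = layers.LSTM(ENCODER_UNITS)(words)
    return keras.Model(sentence, encoded, name="sentence_encoder")


encoder = build_sentence_encoder()
encoder.summary()
```

Wrapping the word-level encoder in `TimeDistributed` shares one set of convolution weights across all word positions, so the LSTM sees one vector per word.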

Work in Progress

  • TODO: Add loading utils
  • TODO: Add preprocessing and padding utils
  • TODO: Add batching utils
  • TODO: Add model training code
  • TODO: Add model continue-training code
  • TODO: Test Similarity implementation on Quora similar pair dataset
  • TODO: Test Classification implementation on Kaggle Toxic internet comments dataset
  • TODO: Tune Hyperparameters and try different modifications to architectures
  • TODO: Take Hyperparameters using argparse
  • TODO: Add tensorboard and tfdbg support

Example Usage:

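from model import ClassifierModel, SimilarityModel

# Multi-label classifier over 5 classes, with a single character kernel
# of size 3 and a unidirectional encoder with 128 hidden units.
classifier = ClassifierModel(vocab_size=10000,
                             charset_size=100,
                             num_classes=5,
                             mode=ClassifierModel.MULTILABEL,
                             char_kernel_sizes=(3,),
                             encoder_hidden_units=128,
                             bidirectional=False)
classifier.compile_model()

# Similarity model; num_negative_samples presumably sets how many
# negative examples accompany each positive pair.
similarity_model = SimilarityModel(vocab_size=10000,
                                   charset_size=100,
                                   num_negative_samples=1)
similarity_model.compile_model()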
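
The preprocessing and padding utilities are still on the TODO list, so the exact input format the models expect is not documented here. As a rough sketch only, one plausible way to turn a sentence into a padded grid of character ids is shown below. The helper names, the reserved ids `PAD_ID` and `UNK_ID`, and the whitespace tokenization are all assumptions made for illustration.

```python
PAD_ID = 0  # assumed id used for padding
UNK_ID = 1  # assumed id for characters outside the charset


def build_char_index(charset):
    """Assign ids 2, 3, ... to characters; 0 and 1 stay reserved."""
    return {ch: i for i, ch in enumerate(charset, start=2)}


def encode_sentence(sentence, char_index, max_words, max_chars):
    """Turn a sentence into a max_words x max_chars grid of character ids."""
    grid = []
    for word in sentence.split()[:max_words]:
        # Truncate long words, map unknown characters to UNK_ID.
        ids = [char_index.get(ch, UNK_ID) for ch in word[:max_chars]]
        ids += [PAD_ID] * (max_chars - len(ids))
        grid.append(ids)
    # Fill missing word positions with all-padding rows.
    grid += [[PAD_ID] * max_chars for _ in range(max_words - len(grid))]
    return grid


char_index = build_char_index("abcdefghijklmnopqrstuvwxyz")
for row in encode_sentence("deep cnn!", char_index, max_words=3, max_chars=5):
    print(row)
```

Every sentence comes out with the same shape, so a batch of them can be stacked into a single integer array.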

References:

Overall Idea

  1. Siamese Recurrent Architectures for Learning Sentence Similarity (2016)

Encoder architecture heavily inspired by

  1. Character-Aware Neural Language Models (2015), Kim et al.
  2. dpressel/baseline

Loss function taken from

  1. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval (2014)
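
The loss in that paper is the negative log of a softmax over scaled cosine similarities between a query and its positive and negative candidates. The sketch below shows the idea in plain Python for a single example. It is not the repository's implementation: the function names are illustrative, and the scaling factor `gamma=10.0` is an arbitrary assumed value (the paper treats it as a tuned smoothing factor).

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax_contrastive_loss(query, positive, negatives, gamma=10.0):
    """Negative log-probability of the positive among all candidates."""
    logits = [gamma * cosine(query, positive)]
    logits += [gamma * cosine(query, neg) for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[0]


query = [1.0, 0.0]
close = [0.9, 0.1]
far = [0.0, 1.0]
print(softmax_contrastive_loss(query, close, [far]))  # small: positive is closer
print(softmax_contrastive_loss(query, far, [close]))  # large: positive is farther
```

With `num_negative_samples=1`, as in the usage example, `negatives` would hold a single vector per positive pair.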

Other Contrastive Loss functions to try

  1. StarSpace: Embed All The Things! (2017), Wu et al.
  2. Comparison of loss functions for deep embedding