
monikkinom / Ner Lstm

Named Entity Recognition using multilayered bidirectional LSTM


Projects that are alternatives of or similar to Ner Lstm

Sangita
A Natural Language Toolkit for Indian Languages
Stars: ✭ 43 (-91.92%)
Mutual labels:  natural-language-processing, deep-neural-networks, lstm, recurrent-neural-networks
Keras Attention
Visualizing RNNs using the attention mechanism
Stars: ✭ 697 (+31.02%)
Mutual labels:  natural-language-processing, deep-neural-networks, recurrent-neural-networks
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+294.17%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Chicksexer
A Python package for gender classification.
Stars: ✭ 64 (-87.97%)
Mutual labels:  natural-language-processing, lstm, recurrent-neural-networks
Gdax Orderbook Ml
Application of machine learning to the Coinbase (GDAX) orderbook
Stars: ✭ 60 (-88.72%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Bitcoin Price Prediction Using Lstm
Bitcoin price Prediction ( Time Series ) using LSTM Recurrent neural network
Stars: ✭ 67 (-87.41%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Named Entity Recognition
name entity recognition with recurrent neural network(RNN) in tensorflow
Stars: ✭ 20 (-96.24%)
Mutual labels:  natural-language-processing, recurrent-neural-networks, named-entity-recognition
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+232.14%)
Mutual labels:  natural-language-processing, lstm, named-entity-recognition
Neuronlp2
Deep neural models for core NLP tasks (Pytorch version)
Stars: ✭ 397 (-25.38%)
Mutual labels:  natural-language-processing, deep-neural-networks, named-entity-recognition
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-81.95%)
Mutual labels:  natural-language-processing, lstm, recurrent-neural-networks
Deepseqslam
The Official Deep Learning Framework for Route-based Place Recognition
Stars: ✭ 49 (-90.79%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (-13.53%)
Mutual labels:  natural-language-processing, named-entity-recognition, embeddings
Deep Learning Time Series
List of papers, code and experiments using deep learning for time series forecasting
Stars: ✭ 796 (+49.62%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-81.77%)
Mutual labels:  deep-neural-networks, lstm, recurrent-neural-networks
Multitask sentiment analysis
Multitask Deep Learning for Sentiment Analysis using Character-Level Language Model, Bi-LSTMs for POS Tag, Chunking and Unsupervised Dependency Parsing. Inspired by this great article https://arxiv.org/abs/1611.01587
Stars: ✭ 93 (-82.52%)
Mutual labels:  natural-language-processing, lstm, recurrent-neural-networks
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+503.2%)
Mutual labels:  natural-language-processing, lstm, recurrent-neural-networks
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-96.24%)
Mutual labels:  recurrent-neural-networks, embeddings, lstm
Ai Deadlines
⏰ AI conference deadline countdowns
Stars: ✭ 3,852 (+624.06%)
Mutual labels:  natural-language-processing, deep-neural-networks
Bytenet Tensorflow
ByteNet for character-level language modelling
Stars: ✭ 319 (-40.04%)
Mutual labels:  natural-language-processing, deep-neural-networks
Keras Anomaly Detection
Anomaly detection implemented in Keras
Stars: ✭ 335 (-37.03%)
Mutual labels:  lstm, recurrent-neural-networks

This repository contains the code implementing the approach described in the following arXiv preprint: https://arxiv.org/abs/1610.09756, published at the ICON-16 conference (http://aclweb.org/anthology/W/W16/W16-63.pdf).

Contributors

  • shreenibhar (Shreenivas Bharadwaj)
  • monikkinom
  • vinayakathavale

ner-lstm

Named Entity Recognition using multi-layered bidirectional LSTMs and task-adapted word embeddings

Named Entity Recognition is the classification problem of identifying the names of people, organisations, etc. (different classes) in a text corpus. Previous approaches to the problem have relied on hand-crafted, language-specific features, CRF- and HMM-based models, gazetteers, and so on. Growing interest in deep learning has led to the application of deep neural networks to existing problems such as NER. We implemented a 2-layer bidirectional LSTM network in TensorFlow to classify the named entities of the CoNLL-2003 NER Shared Task. Classification on the Hindi NER dataset from the ICON-2013 proceedings was also done. The process and code usage are described below. All scripts use argparse for intuitive usage.

Generating the embedding model

Sentences are the inputs to the recurrent neural network, and the words in a sentence are represented as embeddings. The features fed to the network are therefore sentences, i.e. sequences of words, i.e. sequences of embeddings. Each unique word is mapped to a fixed number of features, called an embedding or word vector; these vectors are the input features of the neural architecture.
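
As a purely illustrative sketch of this mapping (the vocabulary, embedding dimensionality and random vectors below are made up, not taken from the repository):

    import numpy as np

    EMBEDDING_DIM = 5  # illustrative; the models trained below use larger vectors
    vocab = ["EU", "rejects", "German", "call"]

    # Toy embedding table: each unique word maps to a fixed-length vector.
    rng = np.random.default_rng(0)
    embeddings = {word: rng.standard_normal(EMBEDDING_DIM) for word in vocab}

    def sentence_to_features(sentence):
        """Convert a tokenized sentence into a sequence of embedding vectors."""
        unk = np.zeros(EMBEDDING_DIM)  # unknown words fall back to a zero vector
        return np.stack([embeddings.get(word, unk) for word in sentence])

    features = sentence_to_features(["EU", "rejects", "German", "call"])
    print(features.shape)  # (sentence_length, EMBEDDING_DIM)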

English

If you are training a model you need a corpus of text separated only by spaces. First train a model, or load an existing one, using the scripts in the embeddings folder:

  • wordvec_model.py - builds a word2vec model; either supply a corpus to train on or restore from a gensim word2vec .bin file.
  • glove_model.py - builds a GloVe model; either supply a corpus to train on or restore from a GloVe vector .txt file. Copy the corpus into the Glove-1.2 directory, run the script from the embeddings folder, and pass the corpus name as a parameter.
  • rnnvec_model.py - builds an LSTM-based embedding model; the only option is to supply a corpus to train on.
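
For a rough idea of what the word2vec path looks like, a gensim model can be trained on such a corpus and saved or restored as follows. This is a hedged sketch, not the repository's wordvec_model.py: the file names and parameter values are assumptions, and the keyword arguments follow gensim 4.x, which may differ from the version the repository targets.

    from gensim.models import Word2Vec

    # Treat each line of the space-separated corpus as one sentence.
    with open("corpus.txt", encoding="utf-8") as f:
        sentences = [line.split() for line in f]

    # Train 100-dimensional word vectors; together with the 11 extra features
    # described below, this would match the 111 dimensions used in the comparison.
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
    model.save("word2vec.bin")

    # Alternatively, restore a previously trained model instead of retraining.
    model = Word2Vec.load("word2vec.bin")
    print(model.wv["the"].shape)  # (100,) -- assumes "the" occurs in the corpus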

Hindi

Follow the same steps as for English, but first convert the corpus to an English-like representation using hindi_util.py.

Comparison

We compared the 111-dimensional embedding models by training all of them on a small 100 MB corpus and evaluating on the CoNLL NER dataset.

Model Test_a Test_b
Word2Vec 88.33 83.40
Glove 89.62 83.10
RnnVec 81.07 75.20

Preparing the inputs

Now that we have an embedding model, we use it to convert our sentences of words into sentences of embeddings. First use resize_input.py to resize your dataset to a maximum sentence length. Then use the trained embedding model with get_conll_embeddings.py or get_icon_embeddings.py (for CoNLL and ICON respectively) to produce the pickled input data that is fed to the recurrent neural network for training. Note that 11 extra features are appended to each embedding here: the POS, chunk, and capitalisation features of the word.
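
A minimal sketch of this preparation step is below. The per-word feature layout (here assumed to be a 5-way POS one-hot, a 5-way chunk one-hot and a single capitalisation flag, i.e. 11 extra values) and the maximum sentence length are illustrative assumptions; the actual layout lives in resize_input.py and get_conll_embeddings.py / get_icon_embeddings.py.

    import numpy as np

    MAX_SENTENCE_LENGTH = 30   # set via resize_input.py's argument in the real pipeline
    EMBEDDING_DIM = 100        # word-vector size before the 11 extra features
    EXTRA_FEATURES = 11        # POS + chunk + capitalisation

    def word_features(word, word_vectors, pos_one_hot, chunk_one_hot):
        # Embedding lookup plus the hand-crafted features appended to it.
        vec = word_vectors.get(word, np.zeros(EMBEDDING_DIM))
        capital = [1.0 if word[:1].isupper() else 0.0]
        return np.concatenate([vec, pos_one_hot, chunk_one_hot, capital])

    def prepare_sentence(words, word_vectors, pos_tags, chunk_tags):
        # Build one feature row per word, then pad/truncate to the fixed length.
        rows = [word_features(w, word_vectors, p, c)
                for w, p, c in zip(words, pos_tags, chunk_tags)][:MAX_SENTENCE_LENGTH]
        pad = [np.zeros(EMBEDDING_DIM + EXTRA_FEATURES)] * (MAX_SENTENCE_LENGTH - len(rows))
        return np.stack(rows + pad)

    # Hypothetical usage with placeholder one-hot POS/chunk encodings (5 + 5 + 1 = 11).
    sent = prepare_sentence(["EU", "rejects", "German", "call"],
                            word_vectors={},            # e.g. vectors from the embedding model
                            pos_tags=[np.eye(5)[1]] * 4,
                            chunk_tags=[np.eye(5)[0]] * 4)
    print(sent.shape)  # (MAX_SENTENCE_LENGTH, 111)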

Deep Neural Network

We used Google's TensorFlow to implement a multi-layered bidirectional RNN with LSTM cells. The hyperparameters are defined at the top of main.py; tweaking them can yield a variety of results worth noting. A softmax layer is used as the last layer of the network to produce the final classification outputs. We experimented with different optimizers and found that AdamOptimizer produced the best results. The functions to calculate the F1 score, prediction accuracy, and recall are also included in model.py. We also include the ability to save/restore an existing model using TensorFlow's saver functions. Set the path of the generated pickle file from the previous step in input.py, then run model.py to train the deep neural network, which will start optimizing the F1 score.
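
The repository drives TensorFlow's RNN cells directly (see main.py and model.py). As a rough equivalent rather than the actual implementation, a two-layer bidirectional LSTM tagger with a per-token softmax and the Adam optimizer can be sketched with tf.keras; the hidden size, class count, and loss below are assumptions.

    import tensorflow as tf

    MAX_SENTENCE_LENGTH = 30   # must match the length used when preparing the inputs
    FEATURE_DIM = 111          # embedding size plus the 11 extra features
    NUM_CLASSES = 5            # e.g. PER, LOC, ORG, MISC and O for CoNLL-2003

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_SENTENCE_LENGTH, FEATURE_DIM)),
        # Two stacked bidirectional LSTM layers returning outputs at every time step.
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(256, return_sequences=True)),
        # Per-token softmax over the entity classes.
        tf.keras.layers.TimeDistributed(
            tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="categorical_crossentropy")  # assumes one-hot label tensors
    model.summary()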

Final Results

Dataset Model Embedding size Test_a Test_b
CONLL Glove 311 93.99 90.32
CONLL Word2Vec 311 93.5 89.4
ICON Glove 311 78.6 77.48

CONLL samples

A sample result produced by the CoNLL evaluation script is shown below.

Word2Vec 311 dimensions
  • Test_a

processed 49644 tokens with 8211 phrases; found: 8080 phrases; correct: 7619. Accuracy = 98.54%

Class Precision Recall FB1 Numbers
NER 94.29 92.79 93.54 8080
LOC 94.86 94.25 94.56 2023
MISC 91.99 85.09 88.40 1123
ORG 91.98 89.76 90.86 2020
PER 96.40 97.16 96.78 2914
  • Test_b

processed 45151 tokens with 7719 phrases; found: 7740 phrases; correct: 6911. Accuracy = 97.49%

Class Precision Recall FB1 Numbers
NER 89.29 89.53 89.41 7740
LOC 89.67 90.87 90.27 1898
MISC 75.80 75.80 75.80 905
ORG 88.03 87.27 87.65 2415
PER 95.04 95.69 95.37 2522

Papers on WordVectors

  • https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
  • http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf
  • Good tutorial on word vectors: https://www.tensorflow.org/versions/r0.9/tutorials/word2vec/index.html
