
yuvalpinter / Mimick

License: GPL-3.0
Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017)

Programming Languages

python

Projects that are alternatives of or similar to Mimick

Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-60.53%)
Mutual labels:  word-embeddings, part-of-speech-tagger
Jptdp
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)
Stars: ✭ 146 (-3.95%)
Mutual labels:  lstm, part-of-speech-tagger
Sign Language
Sign Language Recognition for Deaf People
Stars: ✭ 65 (-57.24%)
Mutual labels:  convolutional-neural-networks, lstm
Deepseqslam
The Official Deep Learning Framework for Route-based Place Recognition
Stars: ✭ 49 (-67.76%)
Mutual labels:  convolutional-neural-networks, lstm
Pytorch convlstm
Convolutional LSTM implementation in PyTorch
Stars: ✭ 126 (-17.11%)
Mutual labels:  convolutional-neural-networks, lstm
Image Captioning
Image Captioning: Implementing the Neural Image Caption Generator with python
Stars: ✭ 52 (-65.79%)
Mutual labels:  convolutional-neural-networks, lstm
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (-36.18%)
Mutual labels:  convolutional-neural-networks, lstm
Personality Detection
Implementation of a hierarchical CNN based model to detect Big Five personality traits
Stars: ✭ 338 (+122.37%)
Mutual labels:  convolutional-neural-networks, lstm
Context
ConText v4: Neural networks for text categorization
Stars: ✭ 120 (-21.05%)
Mutual labels:  convolutional-neural-networks, lstm
Exermote
Using Machine Learning to predict the type of exercise from movement data
Stars: ✭ 108 (-28.95%)
Mutual labels:  convolutional-neural-networks, lstm
Cnn lstm ctc ocr
TensorFlow-based CNN+LSTM trained with CTC loss for OCR
Stars: ✭ 464 (+205.26%)
Mutual labels:  convolutional-neural-networks, lstm
Ncrfpp
NCRF++, a neural sequence labeling toolkit, easy to use for any sequence labeling task (e.g. NER, POS tagging, segmentation). It includes character LSTM/CNN, word LSTM/CNN, and softmax/CRF components.
Stars: ✭ 1,767 (+1062.5%)
Mutual labels:  lstm, part-of-speech-tagger
Easy Deep Learning With Keras
Keras tutorial for beginners (using TF backend)
Stars: ✭ 367 (+141.45%)
Mutual labels:  convolutional-neural-networks, lstm
Lstm Context Embeddings
Augmenting word embeddings with their surrounding context using bidirectional RNN
Stars: ✭ 57 (-62.5%)
Mutual labels:  lstm, word-embeddings
Thesemicolon
This repository contains IPython notebooks and datasets for the data analytics YouTube tutorials on The Semicolon.
Stars: ✭ 345 (+126.97%)
Mutual labels:  convolutional-neural-networks, lstm
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-36.84%)
Mutual labels:  lstm, part-of-speech-tagger
Bcdu Net
BCDU-Net: Medical Image Segmentation
Stars: ✭ 314 (+106.58%)
Mutual labels:  convolutional-neural-networks, lstm
Keras Anomaly Detection
Anomaly detection implemented in Keras
Stars: ✭ 335 (+120.39%)
Mutual labels:  convolutional-neural-networks, lstm
Keras Video Classifier
Keras implementation of video classifier
Stars: ✭ 100 (-34.21%)
Mutual labels:  convolutional-neural-networks, lstm
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-17.11%)
Mutual labels:  convolutional-neural-networks, lstm

Mimick

Code for Mimicking Word Embeddings using Subword RNNs (EMNLP 2017) and subsequent experiments.

tl;dr

Given a word embedding dictionary (with vectors from, e.g., FastText, Polyglot, or GloVe), Mimick trains a character-level neural network that learns to approximate the embeddings. The trained model can then be applied to infer embeddings in the same space for words that were not in the original vocabulary (i.e., out-of-vocabulary words, or OOVs).
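
For intuition, the sketch below shows the core idea in DyNet (the project's framework). It is an illustrative re-implementation, not the repository's code: a character-level bidirectional LSTM reads a word's spelling and is trained to minimize the squared distance between its output and the word's pre-trained vector. The dimension sizes, the c2i character index, the pretrained dictionary, and the choice of Adam are placeholder assumptions.

# Illustrative sketch of the Mimick objective (not the repository's code).
import dynet as dy
import numpy as np

CHAR_DIM, HIDDEN_DIM, EMB_DIM = 20, 50, 64               # placeholder sizes
c2i = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
pretrained = {"paper": np.random.randn(EMB_DIM)}          # stand-in for real vectors

pc = dy.ParameterCollection()
char_lookup = pc.add_lookup_parameters((len(c2i), CHAR_DIM))
fwd = dy.LSTMBuilder(1, CHAR_DIM, HIDDEN_DIM, pc)
bwd = dy.LSTMBuilder(1, CHAR_DIM, HIDDEN_DIM, pc)
W = pc.add_parameters((EMB_DIM, 2 * HIDDEN_DIM))
b = pc.add_parameters((EMB_DIM,))
trainer = dy.AdamTrainer(pc)

def predict(word):
    # Map a word's character sequence to a vector in the embedding space.
    chars = [char_lookup[c2i[c]] for c in word]
    f = fwd.initial_state().transduce(chars)[-1]                   # last forward state
    r = bwd.initial_state().transduce(list(reversed(chars)))[-1]   # last backward state
    return dy.parameter(W) * dy.concatenate([f, r]) + dy.parameter(b)

for epoch in range(5):
    for word, vec in pretrained.items():
        dy.renew_cg()
        loss = dy.squared_distance(predict(word), dy.inputTensor(vec))
        loss.value()       # run the forward pass
        loss.backward()    # backpropagate
        trainer.update()

# After training, the model can produce a vector for a word outside `pretrained`:
dy.renew_cg()
oov_vector = predict("embeddings").npvalue()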

Citation

Please cite our paper if you use this code.

@inproceedings{pinter2017mimicking,
  title={Mimicking Word Embeddings using Subword RNNs},
  author={Pinter, Yuval and Guthrie, Robert and Eisenstein, Jacob},
  booktitle={Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing},
  pages={102--112},
  year={2017}
}

Dependencies

The main dependency for this project is DyNet. Get it here.

  • As of November 22, 2017, the code is compatible with DyNet 2.0. The DyNet 1.0 version of the code is still accessible via the commit log.

Create Mimick models

Pre-trained models for the tagging model are available (see the note below). If you train new models, please add them here via a pull request!

  • December 12, 2017 note: pre-trained models are now in DyNet 2.0 format (and employ early stopping). The 1.0-compatible models are still available in a subdirectory.

CNN Version (November 2017)

As of the November 22 PR, there is a CNN version of Mimick available for training. It is currently a single-layer convolutional net (conv -> ReLU -> max-k-pool -> fully-connected -> tanh -> fully-connected) that performs the same function as the LSTM version.
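
A rough DyNet sketch of that layer order is shown below, again illustrative rather than the repository's implementation; the window width, number of filters, k, and hidden sizes are placeholder assumptions.

# Illustrative sketch of the CNN variant's forward pass (not the repository's code):
# conv -> ReLU -> max-k-pool -> fully-connected -> tanh -> fully-connected.
import dynet as dy

CHAR_DIM, NUM_FILTERS, WINDOW, K, HIDDEN_DIM, EMB_DIM = 20, 60, 3, 2, 50, 64
c2i = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

pc = dy.ParameterCollection()
char_lookup = pc.add_lookup_parameters((len(c2i), CHAR_DIM))
conv_W = pc.add_parameters((CHAR_DIM, WINDOW, 1, NUM_FILTERS))  # rows x cols x in_ch x out_ch
conv_b = pc.add_parameters((NUM_FILTERS,))
H = pc.add_parameters((HIDDEN_DIM, NUM_FILTERS * K))
O = pc.add_parameters((EMB_DIM, HIDDEN_DIM))

def predict(word):
    # Convolve over the word's character embeddings and project into the embedding space.
    dy.renew_cg()
    mat = dy.concatenate_cols([char_lookup[c2i[c]] for c in word])   # CHAR_DIM x len(word)
    x = dy.reshape(mat, (CHAR_DIM, len(word), 1))                    # add a channel dimension
    conv = dy.conv2d_bias(x, dy.parameter(conv_W), dy.parameter(conv_b),
                          stride=[1, 1], is_valid=True)              # 1 x (len-WINDOW+1) x NUM_FILTERS
    pooled = dy.kmax_pooling(dy.rectify(conv), K, d=1)               # keep the K largest per filter
    hidden = dy.tanh(dy.parameter(H) * dy.reshape(pooled, (NUM_FILTERS * K,)))
    return dy.parameter(O) * hidden                                  # predicted embedding

vec = predict("mimicking").npvalue()   # a length-EMB_DIM vector for an unseen word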

Tag parts-of-speech and morphosyntactic attributes using trained models

The root directory of this repository contains the code required to perform extrinsic analysis on Universal Dependencies data. Vocabulary files are supplied in the vocabs directory.

The entry point is model.py, which consumes tagging datasets created with the make_dataset.py script. Note that model.py accepts pre-trained word embedding models as text files with no header. For Mimick models, this exact format is written to the path given by the mimick/model.py script's --output argument. For Word2Vec, FastText, or Polyglot models, such a file can be created with the scripts/output_word_vectors.py script, which accepts a model (.pkl or .bin) and the desired output vocabulary (.txt).
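
For reference, a headerless embedding file of this kind is assumed here to use the common whitespace-separated layout: one word per line followed by its vector components. The converter sketch below is illustrative only (the helper name and layout are assumptions; scripts/output_word_vectors.py remains the supported route).

# Minimal sketch: dump a {word: numpy vector} dictionary in a headerless
# "word v1 v2 ... vn" text format. Illustrative only, not a repository script.
import numpy as np

def write_headerless_embeddings(vectors, path):
    with open(path, "w", encoding="utf-8") as out:
        for word, vec in vectors.items():
            out.write(word + " " + " ".join("%.6f" % v for v in vec) + "\n")

if __name__ == "__main__":
    toy = {"paper": np.random.randn(64), "mimick": np.random.randn(64)}
    write_headerless_embeddings(toy, "toy_vectors.txt")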
