
tatsuokun / context2vec

License: BSD-3-Clause
PyTorch implementation of context2vec from Melamud et al., CoNLL 2016

Programming Languages

python

Projects that are alternatives of or similar to context2vec

word2vec-on-wikipedia
A pipeline for training word embeddings using word2vec on wikipedia corpus.
Stars: ✭ 68 (+277.78%)
Mutual labels:  word-embeddings
Active-Explainable-Classification
A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification
Stars: ✭ 28 (+55.56%)
Mutual labels:  word-embeddings
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+227.78%)
Mutual labels:  word-embeddings
contextualLSTM
Contextual LSTM for NLP tasks like word prediction and word embedding creation for Deep Learning
Stars: ✭ 28 (+55.56%)
Mutual labels:  word-embeddings
MorphologicalPriorsForWordEmbeddings
Code for EMNLP 2016 paper: Morphological Priors for Probabilistic Word Embeddings
Stars: ✭ 53 (+194.44%)
Mutual labels:  word-embeddings
robot-mind-meld
A little game powered by word vectors
Stars: ✭ 31 (+72.22%)
Mutual labels:  word-embeddings
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (+166.67%)
Mutual labels:  word-embeddings
lda2vec
Mixing Dirichlet topic models and word embeddings to make lda2vec, from the paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+50%)
Mutual labels:  word-embeddings
compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: ✭ 124 (+588.89%)
Mutual labels:  word-embeddings
SiameseCBOW
Implementation of Siamese CBOW using Keras with a TensorFlow backend.
Stars: ✭ 14 (-22.22%)
Mutual labels:  word-embeddings
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (+150%)
Mutual labels:  word-embeddings
pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Stars: ✭ 62 (+244.44%)
Mutual labels:  word-embeddings
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (+11.11%)
Mutual labels:  word-embeddings
dasem
Danish Semantic analysis
Stars: ✭ 17 (-5.56%)
Mutual labels:  word-embeddings
QuestionClustering
Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY
Stars: ✭ 15 (-16.67%)
Mutual labels:  word-embeddings
S-WMD
Code for Supervised Word Mover's Distance (SWMD)
Stars: ✭ 90 (+400%)
Mutual labels:  word-embeddings
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+205.56%)
Mutual labels:  word-embeddings
wikidata-corpus
Train Wikidata with word2vec for word embedding tasks
Stars: ✭ 109 (+505.56%)
Mutual labels:  word-embeddings
materials-synthesis-generative-models
Public release of data and code for materials synthesis generation
Stars: ✭ 47 (+161.11%)
Mutual labels:  word-embeddings
SIFRank
The code of our paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model"
Stars: ✭ 96 (+433.33%)
Mutual labels:  word-embeddings

context2vec: Learning Generic Context Embedding with Bidirectional LSTM, Melamud et al., CoNLL 2016

This is a PyTorch implementation of context2vec, which learns context vectors using a bidirectional LSTM.
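As a rough illustration of the architecture (a minimal sketch, not this repository's actual code; the class name, layer sizes, and the ReLU in the MLP are assumptions), the model reads the words to the left of a target position with one LSTM, the words to its right with a second LSTM running in the opposite direction, and merges the two final hidden states with a multi-layer perceptron into a single context vector:

import torch
import torch.nn as nn

class Context2VecSketch(nn.Module):
    """Minimal sketch of a context2vec-style encoder (sizes illustrative)."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # One LSTM reads the left context left-to-right,
        # the other reads the right context right-to-left.
        self.l2r = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.r2l = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # An MLP merges both directions into one context vector.
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, left_ids, right_ids):
        # left_ids:  (batch, left_len), words preceding the target
        # right_ids: (batch, right_len), words following the target,
        #            pre-reversed so the LSTM consumes them backwards
        _, (h_l, _) = self.l2r(self.embed(left_ids))
        _, (h_r, _) = self.r2l(self.embed(right_ids))
        # The final hidden state of each LSTM summarizes one side.
        return self.mlp(torch.cat([h_l[-1], h_r[-1]], dim=1))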

Requirements

Framework

  • python (<= 3.6)
  • pytorch (<= 0.4.1)

Packages

  • torchtext
  • nltk

Quick Run

Train

python -m src --train

This runs on the CPU and learns context vectors from a small sample of the Penn Treebank included in the repository. The trained model and the embedding file are stored at models/model.param and models/embedding.vec, respectively. (Note that you must pass the --train flag to train the model; otherwise the program starts in inference mode.)
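Under the hood, training uses a word2vec-style negative-sampling objective: the context vector at each position is pulled toward the embedding of the word that actually fills that position and pushed away from sampled noise words. A hedged sketch of such a loss (the function and tensor names are illustrative, not this repository's exact code):

import torch
import torch.nn.functional as F

def negative_sampling_loss(context_vec, target_embed, target_ids, noise_ids):
    # context_vec: (batch, dim), output of the bidirectional-LSTM encoder
    # target_embed: nn.Embedding holding the target-word vectors
    # target_ids: (batch,), the true word at each position
    # noise_ids: (batch, k), sampled negative words
    pos = target_embed(target_ids)                   # (batch, dim)
    neg = target_embed(noise_ids)                    # (batch, k, dim)
    pos_score = (context_vec * pos).sum(dim=1)       # similarity to true word
    neg_score = torch.bmm(neg, context_vec.unsqueeze(2)).squeeze(2)
    # Maximize similarity with the target, minimize it with the noise.
    return -(F.logsigmoid(pos_score)
             + F.logsigmoid(-neg_score).sum(dim=1)).mean()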

Inference

python -m src
>> I am a [] .

(Note that you may not get good results with a model trained on the Penn Treebank sample (dataset/sample.txt), because it does not contain enough data to learn reliable context vectors. The sample is included only so you can quickly check that the program works.)
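Conceptually, filling in the [] slot amounts to encoding the surrounding words into a context vector and ranking every target-word embedding by cosine similarity to it. A minimal sketch of this lookup (names such as fill_blank and itos are assumptions for illustration, not this repository's API):

import torch

def fill_blank(model, target_embed, left_ids, right_ids, itos, topk=10):
    # Rank candidate words for the [] slot by cosine similarity
    # between the context vector and each target-word embedding.
    with torch.no_grad():
        ctx = model(left_ids, right_ids)              # (1, dim)
        ctx = ctx / ctx.norm(dim=1, keepdim=True)
        emb = target_embed.weight                     # (vocab, dim)
        emb = emb / emb.norm(dim=1, keepdim=True)
        scores = emb @ ctx.squeeze(0)                 # (vocab,)
        best = scores.topk(topk).indices
    return [itos[i] for i in best.tolist()]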

Running with GPU and other settings

Train

Run on GPU 0, reading the training corpus from INPUT_FILE and writing the embedding file to OUTPUT_EMBEDDING_FILE and the model parameters to MODEL_FILE. (Other detailed settings are configured in config.toml.)

python -m src -g 0 -i INPUT_FILE -w OUTPUT_EMBEDDING_FILE -m MODEL_FILE --train

Inference

python -m src -w WORD_EMBEDDING_FILE -m MODEL_FILE

Performance

Training Speed

Training is approximately 3x faster than the original (Chainer) implementation.

MSR Sentence Completion

After setting your question/answer file in config.toml, run

python -m src --task mscc -w WORD_EMBEDDING_FILE -m MODEL_FILE
       Reported score   This implementation
TEST   64.0             65.9
ALL    65.1             65.8
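The sentence-completion task follows the same recipe per question: encode the context around the blank once, score each candidate answer by the similarity between its target embedding and the context vector, and pick the argmax. A hedged sketch (names are illustrative, not this repository's exact code):

import torch

def answer_mscc_question(model, target_embed, left_ids, right_ids, candidate_ids):
    # Pick the candidate whose target embedding best matches the context.
    with torch.no_grad():
        ctx = model(left_ids, right_ids).squeeze(0)   # (dim,)
        ctx = ctx / ctx.norm()
        cand = target_embed(candidate_ids)            # (n_candidates, dim)
        cand = cand / cand.norm(dim=1, keepdim=True)
        scores = cand @ ctx                           # cosine per candidate
    return int(scores.argmax())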

Reference

  • The original implementation (written in Chainer) by the author is available at https://github.com/orenmel/context2vec.
@InProceedings{K16-1006,
  author    = "Melamud, Oren and Goldberger, Jacob and Dagan, Ido",
  title     = "context2vec: Learning Generic Context Embedding with Bidirectional LSTM",
  booktitle = "Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning",
  year      = "2016",
  publisher = "Association for Computational Linguistics",
  pages     = "51--61",
  location  = "Berlin, Germany",
  doi       = "10.18653/v1/K16-1006",
  url       = "http://www.aclweb.org/anthology/K16-1006"
}