
cod3licious / conec

License: MIT
Context Encoders (ConEc) as a simple but powerful extension of the word2vec model for learning word embeddings

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to conec

compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: ✭ 124 (+520%)
Mutual labels:  word-embeddings
QuestionClustering
Question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY
Stars: ✭ 15 (-25%)
Mutual labels:  word-embeddings
NTUA-slp-nlp
💻 Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-5%)
Mutual labels:  word-embeddings
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+175%)
Mutual labels:  word-embeddings
SiameseCBOW
Implementation of Siamese CBOW using Keras with a TensorFlow backend.
Stars: ✭ 14 (-30%)
Mutual labels:  word-embeddings
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec, from the paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+35%)
Mutual labels:  word-embeddings
pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Stars: ✭ 62 (+210%)
Mutual labels:  word-embeddings
Naive-Resume-Matching
Text similarity applied to resumes: compares resumes with job descriptions and creates a score to rank them, similar to an ATS (applicant tracking system).
Stars: ✭ 27 (+35%)
Mutual labels:  word-embeddings
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+195%)
Mutual labels:  word-embeddings
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+155%)
Mutual labels:  word-embeddings
robot-mind-meld
A little game powered by word vectors
Stars: ✭ 31 (+55%)
Mutual labels:  word-embeddings
SIFRank
The code of our paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model"
Stars: ✭ 96 (+380%)
Mutual labels:  word-embeddings
wikidata-corpus
Train word2vec on Wikidata for word embedding tasks
Stars: ✭ 109 (+445%)
Mutual labels:  word-embeddings
Active-Explainable-Classification
A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification
Stars: ✭ 28 (+40%)
Mutual labels:  word-embeddings
Word-recognition-EmbedNet-CAB
Code implementation for our ICPR 2020 paper titled "Improving Word Recognition using Multiple Hypotheses and Deep Embeddings"
Stars: ✭ 19 (-5%)
Mutual labels:  word-embeddings
MorphologicalPriorsForWordEmbeddings
Code for EMNLP 2016 paper: Morphological Priors for Probabilistic Word Embeddings
Stars: ✭ 53 (+165%)
Mutual labels:  word-embeddings
materials-synthesis-generative-models
Public release of data and code for materials synthesis generation
Stars: ✭ 47 (+135%)
Mutual labels:  word-embeddings
codenames
Codenames AI using Word Vectors
Stars: ✭ 41 (+105%)
Mutual labels:  word-embeddings
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (+60%)
Mutual labels:  word-embeddings
context2vec
PyTorch implementation of context2vec from Melamud et al., CoNLL 2016
Stars: ✭ 18 (-10%)
Mutual labels:  word-embeddings

Context Encoders (ConEc)

With this code you can train and evaluate Context Encoders (ConEc), an extension of word2vec. ConEcs learn word embeddings from large corpora, can create out-of-vocabulary embeddings on the spot, and can distinguish between multiple meanings of a word based on its local context. For further details on the model and experiments, please refer to the paper. And of course, if any of this code was helpful for your research, please consider citing it:

    @inproceedings{horn2017conecRepL4NLP,
      author       = {Horn, Franziska},
      title        = {Context encoders as a simple but powerful extension of word2vec},
      booktitle    = {Proceedings of the 2nd Workshop on Representation Learning for NLP},
      year         = {2017},
      organization = {Association for Computational Linguistics},
      pages        = {10--14}
    }
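
The ability to embed out-of-vocabulary words follows directly from how ConEc embeddings are built: a word's embedding is a combination of the trained word2vec embeddings of the words it occurs with, so the same combination can be formed for an unseen word from its surrounding context. Below is a minimal sketch of this idea; the oov_embedding helper and the gensim-style .wv.vocab lookup are illustrative assumptions, not part of the library API:

    import numpy as np

    def oov_embedding(context_words, w2v_model):
        """Hypothetical helper: embed an unseen word by averaging the
        (length normalized) word2vec embeddings of its context words."""
        # look up the indices of all known context words (gensim-style vocab assumed)
        idx = [w2v_model.wv.vocab[w].index for w in context_words if w in w2v_model.wv.vocab]
        # average their normalized embeddings and renormalize to unit length
        emb = w2v_model.wv.vectors_norm[idx].mean(axis=0)
        return emb / np.linalg.norm(emb)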

The code is intended for research purposes. It should run with both Python 2.7 and Python 3, but there are no guarantees; if you find a bug, please open an issue!

installation

Either download the code from here and add the conec folder to your $PYTHONPATH, or install the library components only via pip:

$ pip install conec

conec library components

dependencies: numpy, scipy

  • word2vec.py: code to train a standard word2vec model, adapted from the corresponding gensim implementation.
  • context2vec.py: code to build a sparse context matrix from a large collection of texts; this context matrix can then be multiplied with the corresponding word2vec embeddings to give the context encoder embeddings:
    import numpy as np
    from conec import context2vec, word2vec
    # Text8Corpus is assumed to ship with conec's word2vec module (adapted from gensim)
    from conec.word2vec import Text8Corpus

    # get the text for training
    sentences = Text8Corpus('data/text8')
    # train the word2vec model
    w2v_model = word2vec.Word2Vec(sentences, mtype='cbow', hs=0, neg=13, vector_size=200, seed=3)
    # build the global (sparse) context matrix for the text
    context_model = context2vec.ContextModel(sentences, min_count=w2v_model.min_count,
                                             window=w2v_model.window, wordlist=w2v_model.wv.index2word)
    context_mat = context_model.get_context_matrix(fill_diag=False, norm='max')
    # multiply the context matrix with the (length normalized) word2vec embeddings
    # to get the context encoder (ConEc) embeddings
    conec_emb = context_mat.dot(w2v_model.wv.vectors_norm)
    # renormalize so the word embeddings have unit length again
    conec_emb = conec_emb / np.array([np.linalg.norm(conec_emb, axis=1)]).T
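
The resulting matrix can be used like any other set of word embeddings. Since all vectors have unit length, cosine similarity reduces to a dot product; a small illustrative nearest-neighbor query (the word 'king' is just an example, any in-vocabulary word works) might look like this:

    # illustrative usage: nearest neighbors of a word under the ConEc embeddings
    word_idx = w2v_model.wv.index2word.index('king')  # any in-vocabulary word
    sims = conec_emb.dot(conec_emb[word_idx])         # cosine similarities (unit vectors)
    neighbors = np.argsort(sims)[::-1][1:6]           # top 5, skipping the word itself
    print([w2v_model.wv.index2word[i] for i in neighbors])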

examples

additional dependencies: sklearn

test_analogy.py and test_ner.py contain the code to replicate the analogy and named entity recognition (NER) experiments discussed in the aforementioned paper.

To run the analogy experiment, it is assumed that the text8 corpus or the 1-billion corpus, as well as the analogy questions, are in the data directory.

To run the named entity recognition experiment, it is assumed that the corresponding training and test files are located in the data/conll2003 directory.
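
With the data in place, the experiments can then presumably be run directly from the repository root (an assumption based on the scripts being standalone, not documented in the original README):

$ python test_analogy.py
$ python test_ner.py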

If you have any questions, please don't hesitate to send me an email. And of course, if you find any bugs or want to contribute improvements, pull requests are very welcome!
