ashokc / Word-Embeddings-and-Document-Vectors

Licence: other
An evaluation of word-embeddings for classification

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to Word-Embeddings-and-Document-Vectors

Persian-Sentiment-Analyzer
Persian sentiment analysis (emotion and sentiment analysis for Persian text)
Stars: ✭ 30 (-6.25%)
Mutual labels:  word2vec, fasttext-embeddings
Bayes
Naive Bayes Classifier in Swift for Mac and iOS
Stars: ✭ 30 (-6.25%)
Mutual labels:  naive-bayes-classifier
word-embeddings-from-scratch
Creating word embeddings from scratch and visualizing them in TensorBoard. Using the trained embeddings in Keras.
Stars: ✭ 22 (-31.25%)
Mutual labels:  word2vec
Trajectory-Analysis-and-Classification-in-Python-Pandas-and-Scikit-Learn
Formed trajectories of sets of points. Experimented on finding similarities between trajectories based on the DTW (Dynamic Time Warping) and LCSS (Longest Common SubSequence) algorithms. Modeled trajectories as strings based on a grid representation. Benchmarked KNN, Random Forest, and Logistic Regression classification algorithms to classify efficiently t…
Stars: ✭ 41 (+28.13%)
Mutual labels:  scikitlearn-machine-learning
christmAIs
Text to abstract art generation for the holidays!
Stars: ✭ 90 (+181.25%)
Mutual labels:  fasttext-embeddings
name2gender
Extrapolate gender from first names using Naïve-Bayes and PyTorch Char-RNN
Stars: ✭ 24 (-25%)
Mutual labels:  naive-bayes-classifier
Simple-Sentence-Similarity
Exploring simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (+209.38%)
Mutual labels:  word2vec
lapis-bayes
Naive Bayes classifier for use in Lua
Stars: ✭ 26 (-18.75%)
Mutual labels:  naive-bayes-classifier
hyperstar
Hyperstar: Negative Sampling Improves Hypernymy Extraction Based on Projection Learning.
Stars: ✭ 24 (-25%)
Mutual labels:  word2vec
fake-fews
Candidate solution for Facebook's fake news problem using machine learning and crowd-sourcing. Takes the form of a Chrome extension. Developed in under 24 hours at the 2017 Crimson Code hackathon at Washington State University.
Stars: ✭ 13 (-59.37%)
Mutual labels:  naive-bayes-classifier
Recommendation-based-on-sequence-
Recommendation based on sequence
Stars: ✭ 23 (-28.12%)
Mutual labels:  word2vec
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (+40.63%)
Mutual labels:  word2vec
skip-gram-Chinese
skip-gram for Chinese word2vec, based on TensorFlow
Stars: ✭ 20 (-37.5%)
Mutual labels:  word2vec
Vaaku2Vec
Language Modeling and Text Classification in Malayalam Language using ULMFiT
Stars: ✭ 68 (+112.5%)
Mutual labels:  word2vec
word2vec-movies
Bag of Words Meets Bags of Popcorn in Python 3 (tutorial in Chinese)
Stars: ✭ 54 (+68.75%)
Mutual labels:  word2vec
grad-cam-text
Implementation of Grad-CAM for text.
Stars: ✭ 37 (+15.63%)
Mutual labels:  word2vec
Word2Vec-iOS
Word2Vec iOS port
Stars: ✭ 23 (-28.12%)
Mutual labels:  word2vec
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: ✭ 24 (-25%)
Mutual labels:  word2vec
doc2vec-api
document embedding and machine learning script for beginners
Stars: ✭ 92 (+187.5%)
Mutual labels:  word2vec
asm2vec
An unofficial implementation of asm2vec as a standalone python package
Stars: ✭ 127 (+296.88%)
Mutual labels:  word2vec

Word Embeddings and Document Vectors

This is the source code to go along with the series of blog articles.

The code employs:

  • Elasticsearch (localhost:9200) as the repository

    1. to save tokens to, and get them as needed.
    2. to save word vectors (pre-trained or custom) to, and get them as needed (see the sketch after this list).
  • See the Pipfile for Python dependencies
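
The repository's role is simple key/value storage. A minimal sketch of that idea, assuming the Elasticsearch 8.x Python client and hypothetical index and field names (the repo's actual schema may differ):

    # Hypothetical sketch: Elasticsearch at localhost:9200 as the token/vector store.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Save the tokens of one document.
    es.index(index="twenty-news-tokens", id="doc-0",
             document={"tokens": ["nasa", "launch", "orbit"]})

    # Save one word vector (pre-trained or custom).
    es.index(index="word2vec-vectors", id="nasa",
             document={"vector": [0.12, -0.34, 0.56]})

    # Fetch them back when the pipeline needs them.
    tokens = es.get(index="twenty-news-tokens", id="doc-0")["_source"]["tokens"]
    vector = es.get(index="word2vec-vectors", id="nasa")["_source"]["vector"]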

Usage

  1. Generate tokens for the 20-news corpus & the movie review dataset and save them to Elasticsearch.

    • The 20-news dataset is downloaded as part of the script, but you need to download the movie review dataset separately.
    • The shell scripts & Python code are in the folders text-data/twenty-news & text-data/acl-imdb; a sketch of this step is shown below.
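
    A minimal sketch of this step for the 20-news corpus, assuming scikit-learn's fetch_20newsgroups and a crude regex tokenizer (index and field names here are assumptions, not the repo's schema):

      import re
      from sklearn.datasets import fetch_20newsgroups
      from elasticsearch import Elasticsearch

      es = Elasticsearch("http://localhost:9200")
      news = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))

      for i, text in enumerate(news.data):
          tokens = re.findall(r"[a-z]+", text.lower())   # crude word tokenizer
          es.index(index="twenty-news-tokens", id=str(i),
                   document={"tokens": tokens, "label": int(news.target[i])})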
  2. Generate custom word vectors for the two text corpora from step 1 above and save them to Elasticsearch. The text-data/twenty-news/vectors & text-data/acl-imdb/vectors directories have the scripts; a gensim-based sketch follows.
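
    A hedged sketch of custom vector training with gensim 4.x; the parameters and toy sentences are illustrative, not the values used in the articles:

      from gensim.models import FastText, Word2Vec

      # In the real pipeline, the token lists come from Elasticsearch.
      sentences = [["nasa", "launch", "orbit"], ["movie", "plot", "acting"]]

      w2v = Word2Vec(sentences, vector_size=300, window=5, min_count=1, workers=4)
      ft = FastText(sentences, vector_size=300, window=5, min_count=1, workers=4)

      vec = w2v.wv["nasa"]   # the trained 300-d vector for one token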

  3. Process pre-trained vectors and save them to Elasticsearch. Look into pre-trained-vectors/ for the code. You need to download the actual published vectors from their sources. We have used Word2Vec, GloVe, and FastText in these articles; a sketch of the GloVe case follows.
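
    A minimal sketch of indexing one published vector file, assuming the GloVe text format (one word followed by its floats per line) and a hypothetical index name:

      from elasticsearch import Elasticsearch

      es = Elasticsearch("http://localhost:9200")
      with open("glove.6B.300d.txt", encoding="utf-8") as f:
          for line in f:
              parts = line.rstrip().split(" ")
              word, vec = parts[0], [float(x) for x in parts[1:]]
              es.index(index="glove-vectors", id=word, document={"vector": vec})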

  4. The script run.sh can be configured to run any combination of the pipeline steps.

  5. The logs contain the F-scores and timing results. Create a logs directory before running the run.sh script:

    mkdir logs
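
To make the pipeline concrete, here is a self-contained sketch (with toy stand-in data and a generic scikit-learn classifier, not the repo's actual code) of the idea the articles evaluate: a document vector is the average of its tokens' word vectors, and those document vectors feed a classifier whose F-score is the kind of number the logs report.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    def doc_vector(tokens, vectors, dim=3):
        # Average the vectors of the tokens that have an embedding.
        hits = [vectors[t] for t in tokens if t in vectors]
        return np.mean(hits, axis=0) if hits else np.zeros(dim)

    # Toy stand-ins; the real pipeline pulls tokens and vectors from Elasticsearch.
    vectors = {"good": np.array([1.0, 0.2, 0.0]),
               "bad": np.array([-1.0, 0.1, 0.0]),
               "plot": np.array([0.0, 0.5, 0.3])}
    docs = [["good", "plot"], ["bad", "plot"], ["good"], ["bad"]]
    labels = [1, 0, 1, 0]

    X = np.array([doc_vector(toks, vectors) for toks in docs])
    clf = LogisticRegression().fit(X, labels)
    print(f1_score(labels, clf.predict(X)))   # the kind of F-score the logs record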
