dselivanov / Text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP).

The goals we set for text2vec:

  • Concise - expose as few functions as possible
  • Consistent - expose unified interfaces, so there is no need to learn a new interface for each task
  • Flexible - make complex tasks easy to solve
  • Fast - maximize efficiency per single thread and scale transparently to multiple threads on multicore machines
  • Memory efficient - use streams and iterators, and avoid keeping data in RAM where possible

See API section for details.

Performance

[Image: htop]

This package is efficient because it is carefully written in C++, which also means that text2vec is memory friendly. Some parts are fully parallelized using OpenMP.

Other embarrassingly parallel tasks (such as vectorization) can use any fork-based parallel backend on UNIX-like machines. They can achieve near-linear scalability with the number of available cores.

Finally, a streaming API means that users do not have to load all the data into RAM.
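
As a rough illustration of the iterator-based workflow, the sketch below builds a document-term matrix from a tiny in-memory corpus. The corpus, the `make_iterator` helper, and the `n_chunks` value are assumptions made up for this example; with real data the iterator would typically be built over files or another source that is read piece by piece rather than over a character vector.

```r
library(text2vec)

# Hypothetical toy corpus, for illustration only.
docs <- c(
  d1 = "The quick brown fox",
  d2 = "The lazy dog",
  d3 = "Quick quick dog"
)

# Hypothetical helper: itoken() returns an iterator that preprocesses and
# tokenizes the input in chunks instead of all at once.
make_iterator <- function() {
  itoken(docs,
         preprocessor = tolower,
         tokenizer = word_tokenizer,
         ids = names(docs),
         n_chunks = 2,
         progressbar = FALSE)
}

# First pass over the data: collect the vocabulary.
vocab <- create_vocabulary(make_iterator())

# Second pass: map each document to a sparse row of term counts.
vectorizer <- vocab_vectorizer(vocab)
dtm <- create_dtm(make_iterator(), vectorizer)

dim(dtm)  # documents x vocabulary terms
```

The result is a sparse matrix with one row per document, so memory use grows with the number of non-zero counts rather than with the full documents-by-terms grid.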

Contributing

The package has an issue tracker on GitHub, where I file feature requests and notes for future work. Any ideas are appreciated.

Contributors are welcome.

License

GPL (>= 2)