ankane / tomoto-ruby

License: MIT
High performance topic modeling for Ruby

tomoto.rb

🍅 tomoto - high performance topic modeling - for Ruby

Installation

Add this line to your application’s Gemfile:

gem "tomoto"

ARM is not currently supported

Getting Started

Train a model

model = Tomoto::LDA.new(k: 2)
model.add_doc("text from document one")
model.add_doc("text from document two")
model.add_doc("text from document three")
model.train(100) # iterations
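
Training can also be split into smaller steps so you can watch the log likelihood per word change over time. The sketch below assumes that calling train repeatedly continues training the same model, as it does in tomotopy. `train_in_steps` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of tomoto): trains in chunks and records
# the log likelihood per word after each chunk.
# Assumes `model` responds to `train(iterations)` and `ll_per_word`,
# and that repeated `train` calls continue from the current state.
def train_in_steps(model, total: 100, step: 10)
  raise ArgumentError, "step must be positive" unless step.positive?

  history = []
  done = 0
  while done < total
    chunk = [step, total - done].min
    model.train(chunk)
    done += chunk
    history << [done, model.ll_per_word]
  end
  history
end

# Usage with a real model (requires the tomoto gem):
#   model = Tomoto::LDA.new(k: 2)
#   model.add_doc("text from document one")
#   train_in_steps(model, total: 100, step: 20).each do |iterations, ll|
#     puts "#{iterations} iterations: #{ll}"
#   end
```

The helper only relies on duck typing, so it can be tried with any object that responds to train and ll_per_word.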

Get the summary

model.summary

Get topic words

model.topic_words

Save the model to a file

model.save("model.bin")

Load the model from a file

model = Tomoto::LDA.load("model.bin")

Get topic probabilities for a document

doc = model.docs[0]
doc.topics
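
A common next step is to pick the most probable topic for a document. The exact return shape of doc.topics is not stated here, so this sketch accepts either an array of probabilities (index is the topic id) or a hash of topic id to probability. `dominant_topic` is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of tomoto): returns [topic_id, probability]
# for the most probable topic, or nil if there are no topics.
# Accepts an Array of probabilities or a Hash of topic id => probability,
# because the shape of doc.topics is an assumption in this sketch.
def dominant_topic(topics)
  pairs =
    if topics.is_a?(Hash)
      topics.to_a
    else
      topics.each_with_index.map { |prob, id| [id, prob] }
    end
  pairs.max_by { |_id, prob| prob }
end

# Usage with a real model (requires the tomoto gem):
#   topic_id, prob = dominant_topic(model.docs[0].topics)
```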

Get the number of words for each topic

model.count_by_topics

Get the vocab

model.vocabs

Get the log likelihood per word

model.ll_per_word
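
Log likelihood per word is often reported as perplexity, which is the exponential of its negation (lower is better). A minimal sketch, assuming the value is a natural-log likelihood averaged over words. `perplexity` here is a hypothetical helper, not a method this README documents.

```ruby
# Hypothetical helper: converts a per-word log likelihood (natural log)
# into perplexity. Lower perplexity means a better fit.
def perplexity(ll_per_word)
  Math.exp(-ll_per_word)
end

# Usage with a real model (requires the tomoto gem):
#   perplexity(model.ll_per_word)
```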

Perform inference for unseen documents

doc = model.make_doc("unseen doc")
topic_dist, ll = model.infer(doc)

Models

Supports:

  • Latent Dirichlet Allocation (LDA)
  • Labeled LDA (LLDA)
  • Partially Labeled LDA (PLDA)
  • Supervised LDA (SLDA)
  • Dirichlet Multinomial Regression (DMR)
  • Generalized Dirichlet Multinomial Regression (GDMR)
  • Hierarchical Dirichlet Process (HDP)
  • Hierarchical LDA (HLDA)
  • Multi Grain LDA (MGLDA)
  • Pachinko Allocation (PA)
  • Hierarchical PA (HPA)
  • Correlated Topic Model (CT)
  • Dynamic Topic Model (DT)

API

This library follows the tomotopy API. There are a few changes to make it more Ruby-like:

  • The get_ prefix has been removed from methods (topic_words instead of get_topic_words)
  • Methods that return booleans use ? instead of is_ (live_topic? instead of is_live_topic)

If a method or option you need isn’t supported, feel free to open an issue.
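
When porting tomotopy code, the two renaming rules above can be applied mechanically. The sketch below is only an illustration of the naming convention. `ruby_method_name` is a hypothetical helper, and a translated name is not guaranteed to exist in this library.

```ruby
# Hypothetical helper: applies the two renaming rules to a tomotopy
# method name. It does not check that the resulting method exists.
def ruby_method_name(python_name)
  name = python_name.to_s
  if name.start_with?("get_")
    name.delete_prefix("get_")
  elsif name.start_with?("is_")
    "#{name.delete_prefix("is_")}?"
  else
    name
  end
end

puts ruby_method_name("get_topic_words")
puts ruby_method_name("is_live_topic")
```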

Examples

Tokenization

Documents are tokenized by whitespace by default. To use your own tokenization, pass an array of tokens instead of a string:

model.add_doc(["tokens", "from", "document", "one"])
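
A custom tokenizer can be as simple as lowercasing, splitting on non-alphanumeric characters, and dropping stopwords. The sketch below is one possible approach. `tokenize` and its tiny stopword list are hypothetical and should be replaced with whatever suits your corpus.

```ruby
# Hypothetical tokenizer (not part of tomoto): lowercases the text, keeps
# runs of letters and digits, and removes a small illustrative stopword list.
def tokenize(text, stopwords: %w[a an and of the to])
  text.downcase.scan(/[[:alnum:]]+/).reject { |token| stopwords.include?(token) }
end

# Usage with a real model (requires the tomoto gem):
#   model.add_doc(tokenize("The history of the Roman Empire!"))
```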

Performance

tomoto uses AVX2, AVX, or SSE2 instructions to increase performance on machines that support them. Check which instruction set architecture it’s using with:

Tomoto.isa

Parallelism

Choose a parallelism algorithm with:

model.train(parallel: :partition)

Supported values are :default, :none, :copy_merge, and :partition.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone --recursive https://github.com/ankane/tomoto-ruby.git
cd tomoto-ruby
bundle install
bundle exec rake compile
bundle exec rake test