blei-lab / Lda C

Licence: lgpl-2.1

Latent Dirichlet allocation

This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combine to form its documents. For an example, see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).

This code contains:

  • an implementation of variational inference for the per-document topic proportions and per-word topic assignments
  • a variational EM procedure for estimating the topics and exchangeable Dirichlet hyperparameter

Readme

View the readme.txt and fork or clone the repository.

Sample data

2246 documents from the Associated Press (download).

Top 20 words from 100 topics estimated from the AP corpus (pdf).

Bug fixes and updates

To learn about bug fixes and updates, and to discuss LDA and related techniques, please join the topic-models mailing list, topic-models [at] lists.cs.princeton.edu.

To join, click here.

Other implementations on the web

There are several other implementations of LDA on the web: