primaryobjects / Lda

Licence: apache-2.0
LDA topic modeling for node.js


Projects that are alternatives of or similar to Lda

Olivia
💁‍♀️Your new best friend powered by an artificial neural network
Stars: ✭ 3,114 (+1088.55%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Riceteacatpanda
repo with challenge material for riceteacatpanda (2020)
Stars: ✭ 18 (-93.13%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+8288.55%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Mycroft Core
Mycroft Core, the Mycroft Artificial Intelligence platform.
Stars: ✭ 5,489 (+1995.04%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Dynamics
A Compositional Object-Based Approach to Learning Physical Dynamics
Stars: ✭ 159 (-39.31%)
Mutual labels:  artificial-intelligence, ai, node-js
Strips
AI Automated Planning with STRIPS and PDDL in Node.js
Stars: ✭ 272 (+3.82%)
Mutual labels:  artificial-intelligence, ai, node-js
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (+217.18%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (-65.27%)
Mutual labels:  natural-language-processing, topic-modeling, lda
Nlpaug
Data augmentation for NLP
Stars: ✭ 2,761 (+953.82%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Cocoaai
🤖 The Cocoa Artificial Intelligence Lab
Stars: ✭ 134 (-48.85%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+384.73%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Thinc
🔮 A refreshing functional take on deep learning, compatible with your favorite libraries
Stars: ✭ 2,422 (+824.43%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Fixy
Our goal is to build an open-source writing assistant/checker that tackles many different problems in the Turkish NLP literature together, proposes novel approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep learning approach and also performs semantic analysis of the texts, so that errors arising in that context can be detected and corrected as well.
Stars: ✭ 165 (-37.02%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
Catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the box support for training word and document embeddings, and flexible entity recognition models.
Stars: ✭ 224 (-14.5%)
Mutual labels:  artificial-intelligence, ai, natural-language-processing
tomoto-ruby
High performance topic modeling for Ruby
Stars: ✭ 49 (-81.3%)
Mutual labels:  topic-modeling, lda
TopicsExplorer
Explore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (-79.77%)
Mutual labels:  topic-modeling, lda
Deeplearningnotes
Hand-derived notes on the "Deep Learning" textbook (the "flower book")
Stars: ✭ 257 (-1.91%)
Mutual labels:  artificial-intelligence, ai
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-2.67%)
Mutual labels:  artificial-intelligence, natural-language-processing
Topic-Modeling-Workshop-with-R
A workshop on analyzing topic modeling (LDA, CTM, STM) using R
Stars: ✭ 51 (-80.53%)
Mutual labels:  topic-modeling, lda
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: ✭ 259 (-1.15%)
Mutual labels:  artificial-intelligence, ai

LDA

Latent Dirichlet allocation (LDA) topic modeling in JavaScript for Node.js. LDA is a machine learning algorithm that extracts topics and their related keywords from a collection of documents.

In LDA, a document may contain several different topics, each with its own related terms. The algorithm uses a probabilistic model to detect the specified number of topics and extract their related keywords. For example, a document may contain topics that could be classified as beach-related and weather-related. The beach topic may contain related words such as sand, ocean, and water. Similarly, the weather topic may contain related words such as sun, temperature, and clouds.

See http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

$ npm install lda

Usage

var lda = require('lda');

// Example document.
var text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';

// Extract sentences.
var documents = text.match( /[^\.!\?]+[\.!\?]+/g );

// Run LDA to get terms for 2 topics (5 terms each).
var result = lda(documents, 2, 5);

The above example produces a result like the following, with two topics (topic 1 is "cat-related", topic 2 is "dog-related"). The values are probabilities between 0 and 1, printed here with a trailing percent sign:

Topic 1
cats (0.21%)
dogs (0.19%)
small (0.1%)
mice (0.1%)
chase (0.1%)

Topic 2
dogs (0.21%)
cats (0.19%)
big (0.11%)
eat (0.1%)
bones (0.1%)
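
The sentence-splitting step from the usage example can be wrapped in a small helper. The sketch below reuses the same regular expression; the name splitSentences is hypothetical and not part of the lda package. It also covers two details of String.prototype.match that are easy to miss: every match after the first keeps the space that followed the previous sentence, and the call returns null when the text contains no sentence terminator.

```javascript
// Hypothetical helper (not part of the lda package): split text into
// sentences using the same regular expression as the usage example.
function splitSentences(text) {
  // match() returns null when nothing matches, so fall back to [].
  var matches = text.match(/[^\.!\?]+[\.!\?]+/g) || [];

  // Each match after the first starts with the space that followed the
  // previous sentence, so trim the whitespace off.
  return matches.map(function (sentence) {
    return sentence.trim();
  });
}

var text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';
var documents = splitSentences(text);
console.log(documents);
```

Any trailing text that does not end in ".", "!" or "?" is not matched by this expression, so it is dropped from the returned array.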

Output

LDA returns an array of topics, each containing an array of terms. The result has the following format (the exact terms and probabilities can differ between runs unless a random seed is set):

[ [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'small', probability: 0.1 },
    { term: 'mice', probability: 0.1 },
    { term: 'chase', probability: 0.1 } ],
  [ { term: 'dogs', probability: 0.2 },
    { term: 'cats', probability: 0.2 },
    { term: 'bones', probability: 0.11 },
    { term: 'eat', probability: 0.1 },
    { term: 'big', probability: 0.099 } ] ]

The result can be traversed as follows:

var result = lda(documents, 2, 5);

// For each topic.
result.forEach(function (topic, i) {
	console.log('Topic ' + (i + 1));
	
	// For each term. Note that probability is a fraction between 0 and 1.
	topic.forEach(function (term) {
		console.log(term.term + ' (' + term.probability + '%)');
	});
	
	console.log('');
});
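
Because each probability is a fraction between 0 and 1, it needs to be multiplied by 100 to be shown as a true percentage. The following is a minimal sketch of that conversion, using hard-coded sample data in the documented result format so it runs without the package installed. The helper name formatTopics and the sample values are assumptions made for illustration.

```javascript
// Sample data in the documented result format (values are illustrative).
var sampleResult = [
  [ { term: 'cats', probability: 0.21 },
    { term: 'dogs', probability: 0.19 } ],
  [ { term: 'dogs', probability: 0.21 },
    { term: 'big', probability: 0.099 } ]
];

// Hypothetical helper: turn a result array into printable lines,
// converting each probability (0 to 1) into a percentage.
function formatTopics(result) {
  var lines = [];
  result.forEach(function (topic, i) {
    lines.push('Topic ' + (i + 1));
    topic.forEach(function (entry) {
      var percent = (entry.probability * 100).toFixed(1);
      lines.push(entry.term + ' (' + percent + '%)');
    });
  });
  return lines;
}

var lines = formatTopics(sampleResult);
console.log(lines.join('\n'));
```

Returning an array of lines rather than printing directly keeps the formatting step easy to reuse, for example when writing the topics to a file.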

Additional Languages

LDA uses stop-words to ignore common terms in the text (for example: this, that, it, we). By default, the English stop-words list is used. To use other languages, specify an array of language ids as follows:

// Use English (this is the default).
result = lda(documents, 2, 5, ['en']);

// Use German.
result = lda(documents, 2, 5, ['de']);

// Use English + German.
result = lda(documents, 2, 5, ['en', 'de']);

To add a new language-specific stop-words list, create a file /lda/lib/stopwords_XX.js, where XX is the id for the language. For example, a French stop-words list would be named "stopwords_fr.js". The contents of the file should follow the format of an existing stop-words list, which is as follows:

exports.stop_words = [
    'cette',
    'que',
    'une',
    'il'
];
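
To illustrate what stop-word filtering with several language ids amounts to, here is a minimal sketch. It is not the package's internal implementation: the STOP_WORDS table and the functions buildStopWordSet and removeStopWords are hypothetical names, and the word lists are shortened to the examples given above.

```javascript
// Hypothetical stop-word lists keyed by language id. The real lists live
// in the package's stopwords_XX.js files and are much longer.
var STOP_WORDS = {
  en: ['this', 'that', 'it', 'we'],
  fr: ['cette', 'que', 'une', 'il']
};

// Build one lookup set from the lists for the requested language ids.
// Unknown ids contribute no words.
function buildStopWordSet(languages) {
  var set = new Set();
  languages.forEach(function (id) {
    (STOP_WORDS[id] || []).forEach(function (word) {
      set.add(word);
    });
  });
  return set;
}

// Keep only the words that are not in the combined stop-word set.
function removeStopWords(words, languages) {
  var stopWords = buildStopWordSet(languages);
  return words.filter(function (word) {
    return !stopWords.has(word.toLowerCase());
  });
}

var keptBoth = removeStopWords(['We', 'like', 'cette', 'plage'], ['en', 'fr']);
var keptEnglishOnly = removeStopWords(['We', 'like', 'cette', 'plage'], ['en']);
console.log(keptBoth, keptEnglishOnly);
```

With only the English list active, the French word "cette" is kept, which is why mixed-language text benefits from passing every relevant language id.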

Setting a Random Seed

A specific random seed can be used to reproduce the same terms and probabilities on subsequent runs. You can specify the random seed as follows:

// Use the random seed 123.
result = lda(documents, 2, 5, null, null, null, 123);
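
The reason a seed makes runs repeatable can be shown with any seeded pseudo-random number generator. The sketch below uses a small 32-bit generator commonly known as mulberry32; this is an assumption chosen for illustration and is not necessarily the generator the lda package uses internally. The helper take is likewise hypothetical.

```javascript
// A small seeded pseudo-random generator (mulberry32). Illustration
// only; the lda package may use a different generator internally.
function mulberry32(seed) {
  var state = seed >>> 0;
  return function () {
    state = (state + 0x6D2B79F5) | 0;
    var t = Math.imul(state ^ (state >>> 15), 1 | state);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Scale the unsigned 32-bit result into the range [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: draw `count` numbers from a generator.
function take(generator, count) {
  var values = [];
  for (var i = 0; i < count; i++) {
    values.push(generator());
  }
  return values;
}

// Two generators created with the same seed yield the same sequence.
var firstRun = take(mulberry32(123), 5);
var secondRun = take(mulberry32(123), 5);
console.log(firstRun);
console.log(secondRun);
```

Any algorithm that draws all of its random numbers from such a generator behaves identically on every run with the same seed and the same input.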

Author

Kory Becker http://www.primaryobjects.com

Based on the original JavaScript implementation: https://github.com/awaisathar/lda.js
