Alternatives to and detailed information about latent-semantic-analysis

Cloud Pipelines Editor is a web app that lets users build and run machine learning pipelines without having to set up a development environment.

Stars: ✭ 22 (+10%)

Mutual labels: pipeline

jenkins-pipeline-gitflow-maven

Sample Maven project with a Jenkinsfile doing git-flow based release management

Stars: ✭ 47 (+135%)

Mutual labels: pipeline

topic models

Implemented: LSA, PLSA, LDA

Stars: ✭ 80 (+300%)

Mutual labels: topic-modeling

DNAscan

DNAscan is a fast and efficient bioinformatics pipeline that allows for the analysis of DNA Next Generation sequencing data, requiring very little computational effort and memory usage.

Stars: ✭ 36 (+80%)

Mutual labels: pipeline

JT1078Gateway

JT1078Gateway implemented on top of Pipeline. It supports TCP/UDP and currently supports only three stream-pulling methods: http-flv, ws-flv, and hls.

Stars: ✭ 50 (+150%)

Mutual labels: pipeline

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (+25%)

Mutual labels: pipeline

pydataberlin-2017

Repo for my talk at the PyData Berlin 2017 conference

Stars: ✭ 63 (+215%)

Mutual labels: topic-modeling

create-mithril-app

Sets up a mithril.js project with webpack

Stars: ✭ 20 (+0%)

Mutual labels: pipeline

kwx

BERT, LDA, and TFIDF based keyword extraction in Python

Stars: ✭ 33 (+65%)

Mutual labels: topic-modeling

pipecolor

A terminal filter to colorize output

Stars: ✭ 17 (-15%)

Mutual labels: pipeline

topicApp

A simple Shiny App for Topic Modeling in R

Stars: ✭ 40 (+100%)

Mutual labels: topic-modeling

gitlab-merger-bot

GitLab Merger Bot

Stars: ✭ 23 (+15%)

Mutual labels: pipeline

re-mote

Re-mote operations using SSH and Re-gent

Stars: ✭ 61 (+205%)

Mutual labels: pipeline

dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data

Stars: ✭ 71 (+255%)

Mutual labels: pipeline

RNASeq

RNASeq pipeline

Stars: ✭ 30 (+50%)

Mutual labels: pipeline

HAR

Recognize one of six human activities such as standing, sitting, and walking using a Softmax Classifier trained on mobile phone sensor data.

Stars: ✭ 18 (-10%)

Mutual labels: pipeline

godot-exporter

Godot Engine Automation Pipeline Android – iOS – Linux – MacOS – Windows – HTML5 – Itch.io.

Stars: ✭ 54 (+170%)

Mutual labels: pipeline


Latent Semantic Analysis

Pipeline for training LSA models using Scikit-Learn.

Usage

Instead of writing custom code for latent semantic analysis, you only need to:

Install the pipeline:

pip install latent-semantic-analysis

Run the pipeline:

either from the terminal:

lsa-train --path_to_config config.yaml

or from Python:

import latent_semantic_analysis

latent_semantic_analysis.train(path_to_config="config.yaml")

NOTE: the config file is described in the Config section.

No data preparation is needed. The only input is a CSV file with a column of raw text; the column can have any name.
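As an illustration of the expected input, here is a minimal sketch that writes raw texts to a single-column CSV file using only the standard library. The helper name write_corpus and the sample texts are hypothetical and not part of the package; the column name text and the comma separator follow the default config.yaml shown in the Config section.

```python
import csv
import tempfile
from pathlib import Path


def write_corpus(path, texts, text_column="text"):
    """Write raw texts to a single-column CSV file with a header row.

    Hypothetical helper for illustration; it is not part of the package.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # default delimiter "," matches sep: ','
        writer.writerow([text_column])
        writer.writerows([text] for text in texts)
    return path


texts = [
    "Latent semantic analysis maps documents to topics.",
    "Commas, quotes and other punctuation are escaped by the csv module.",
]

# A temporary directory keeps the example free of side effects; with the
# default config the file would live at data/data.csv instead.
csv_path = write_corpus(Path(tempfile.mkdtemp()) / "data" / "data.csv", texts)
```

The csv module quotes fields that contain the separator, so raw text with commas stays in a single column.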

Config

The user interface consists of a single file:

config.yaml - general configuration with sklearn TF-IDF and SVD parameters

Edit config.yaml to create the desired configuration, then train the LSA model with the following command:

terminal:

lsa-train --path_to_config config.yaml

python:

import latent_semantic_analysis

latent_semantic_analysis.train(path_to_config="config.yaml")

Default config.yaml:

seed: 42
path_to_save_folder: models

# data
data:
  data_path: data/data.csv
  sep: ','
  text_column: text

# tf-idf
tf-idf:
  lowercase: true
  ngram_range: (1, 1)
  max_df: 1.0
  min_df: 1

# svd
svd:
  n_components: 10
  algorithm: arpack

NOTE: the tf-idf and svd sections hold the parameters of sklearn's TfidfVectorizer and TruncatedSVD respectively, so you can parameterize instances of these classes however you want.
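To make the mapping concrete, here is a minimal sketch of how such parameters could be passed to the two sklearn classes. It is an illustration under stated assumptions, not the package's actual implementation: the dictionaries tfidf_params and svd_params, the step names, and the toy corpus are hypothetical, and n_components is lowered from the default 10 to 2 because the arpack solver requires fewer components than the smaller dimension of the TF-IDF matrix.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Values mirror the default config.yaml, except n_components (assumption:
# reduced to 2 so that arpack works on this tiny corpus).
tfidf_params = {"lowercase": True, "ngram_range": (1, 1), "max_df": 1.0, "min_df": 1}
svd_params = {"n_components": 2, "algorithm": "arpack"}

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets fell",
    "the dog chased the cat",
    "shares and stocks are investments",
]

# LSA is TF-IDF followed by a truncated SVD of the document-term matrix.
lsa = Pipeline(
    [
        ("tfidf", TfidfVectorizer(**tfidf_params)),
        ("svd", TruncatedSVD(random_state=42, **svd_params)),
    ]
)

# One row per document, one column per SVD component.
doc_embeddings = lsa.fit_transform(docs)

# One row per vocabulary term, one column per SVD component.
term_embeddings = lsa.named_steps["svd"].components_.T
```

Any other keyword accepted by TfidfVectorizer or TruncatedSVD could be added to the corresponding dictionary in the same way.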

Output

After training the model, the pipeline produces the following files:

model.joblib - sklearn pipeline with LSA (TF-IDF and SVD steps)
config.yaml - config that was used to train the model
logging.txt - logging file
doc2topic.json - document embeddings
term2topic.json - term embeddings
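Since model.joblib is described as an sklearn pipeline, it can presumably be loaded with joblib and applied to new text. The sketch below is self-contained, so it first fits and saves a small stand-in pipeline; in practice the file would come from training, and its location on disk depends on your configuration. The toy corpus and variable names are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Stand-in for a trained model (assumption: the real model.joblib is a
# fitted pipeline with TF-IDF and SVD steps, as the file list states).
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold shares as markets fell",
]
model = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=42))
model.fit(docs)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.joblib"
    joblib.dump(model, model_path)    # what a training run would leave on disk
    loaded = joblib.load(model_path)  # what a consumer of the model would do

# Embed unseen text with the loaded pipeline: shape (1, n_components).
new_embedding = loaded.transform(["cats and dogs sat on the mat"])
```

Only load joblib files from sources you trust, because joblib relies on pickle and unpickling can execute arbitrary code.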

Requirements

Python >= 3.6

Citation

If you use latent-semantic-analysis in a scientific publication, we would appreciate a citation using the following BibTeX entry:

@misc{dayyass2021lsa,
    author       = {El-Ayyass, Dani},
    title        = {Pipeline for training LSA models},
    howpublished = {\url{https://github.com/dayyass/latent-semantic-analysis}},
    year         = {2021}
}

