martingerlach / Hsbm_topicmodel

Licence: gpl-3.0
Using stochastic block models for topic modeling

hSBM_Topicmodel

A tutorial for topic modeling with hierarchical stochastic block models using graph-tool.

Based on the works in:

Setup

Get the code via: git clone https://github.com/martingerlach/hSBM_Topicmodel.git

Installing graph-tool

We use the graph-tool package for finding topical structure in the word-document networks.

  • see the installation instructions, which list packages for Linux and other platforms
  • for Linux, one relatively straightforward way is to install via conda:
conda create --name graph-tool python=3.7
conda activate graph-tool
conda install -c conda-forge gtk3 pygobject matplotlib graph-tool

The packages gtk3, pygobject, and matplotlib are needed to enable plotting functionality.
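Once the environment is active, it can help to check that the packages are visible to Python before opening the notebooks. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and it assumes the usual import names (`graph_tool` for graph-tool, `gi` for pygobject).

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: graph-tool -> graph_tool, pygobject -> gi
    missing = missing_modules(["graph_tool", "gi", "matplotlib"])
    print("missing:", missing or "none")
```

If anything is reported as missing, the conda environment is probably not activated or the install step did not complete.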

Additional packages

We need some additional packages to run the code (for example, jupyter to run the tutorial-notebooks).

The required packages are listed in requirements.txt.

SBM for topic modeling of text

This method uses stochastic block models for topic modeling of text.

Code

Code-base: sbmtm.py

Tutorial-notebook: TopSBM-tutorial.ipynb guides you through the steps of topic modeling with stochastic block models:

  • How to construct the word-document network from a corpus of text
  • How to fit the stochastic block model to the word-document network
  • How to extract the topics from the fitted model, e.g.
    • the most important words for each topic
    • the clustering of documents
    • the topic mixtures for each document
  • How to visualize the topical structure, in particular the hierarchy of topics
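To make the first and third steps above more concrete, here is a small plain-Python sketch of the underlying idea. It is an illustration only and not the code in sbmtm.py: the function names `word_document_edges` and `topic_mixture` are hypothetical, and the word-to-topic assignment is supplied by hand, whereas in the tutorial it would come from the fitted model. The word-document network is bipartite, with one edge per occurrence of a word in a document; a document's topic mixture is then the fraction of its word tokens that fall into each word group.

```python
from collections import Counter


def word_document_edges(texts):
    """Build the weighted edges of a bipartite word-document network.

    texts: list of documents, each a list of word tokens.
    Returns a Counter mapping (doc_index, word) -> number of occurrences.
    """
    edges = Counter()
    for doc_index, tokens in enumerate(texts):
        for word in tokens:
            edges[(doc_index, word)] += 1
    return edges


def topic_mixture(tokens, word_to_topic):
    """Fraction of a document's tokens that belong to each word group (topic).

    word_to_topic is a hand-made assignment here (an assumption of this sketch).
    """
    counts = Counter(word_to_topic[w] for w in tokens if w in word_to_topic)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


if __name__ == "__main__":
    texts = [["network", "topic", "network"], ["topic", "model"]]
    print(word_document_edges(texts))
    print(topic_mixture(texts[0], {"network": 0, "topic": 1, "model": 1}))
```

Fitting the block model itself, which groups the word and document nodes, is handled by graph-tool and is not reproduced in this sketch.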

Data

The example-corpus is saved in corpus.txt

  • each line is a separate document with words separated by whitespace
  • optionally, we can provide a file with titles for the documents in titles.txt
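The file format described above can be read with a few lines of Python. This is a hedged sketch rather than the loading code used in the tutorial notebook; the function name `parse_corpus` is hypothetical, and using the line index as a fallback title is an assumption of the sketch.

```python
def parse_corpus(corpus_text, titles_text=None):
    """Parse a corpus with one document per line and whitespace-separated words.

    corpus_text: full contents of the corpus file as a string.
    titles_text: optional contents of the titles file, one title per line.
    Returns (texts, titles), where texts is a list of token lists.
    """
    # Every line is a document; an empty line yields an empty token list
    # so that documents stay aligned with their titles.
    texts = [line.split() for line in corpus_text.splitlines()]
    if titles_text is None:
        # Assumption of this sketch: fall back to the line index as title.
        titles = [str(i) for i in range(len(texts))]
    else:
        titles = [line.strip() for line in titles_text.splitlines()]
        if len(titles) != len(texts):
            raise ValueError("number of titles does not match number of documents")
    return texts, titles


if __name__ == "__main__":
    texts, titles = parse_corpus("a b c\nd e\n", "first\nsecond\n")
    print(titles, texts)
```

With the files from the repository, the strings would be obtained with something like `open("corpus.txt", encoding="utf-8").read()`; the exact location of the files within the repository is not assumed here.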

Multilayer SBM for topic modeling beyond text

This method provides a multilayer extension of the stochastic block model approach to topic modeling.

Code

The multilayer extension is implemented for a 2-layer SBM consisting of a hyperlink layer and a text layer. A metadata layer can be added by following the same process used to add the hyperlink layer.

Code-base: sbmmultilayer.py

Tutorial-notebook: Multilayer_SBM_Tutorial.ipynb

The tutorial notebook details how to

  • Construct a multilayer SBM with the hyperlink and text layer
  • Fit a multilayer SBM using simulated annealing for improved inference
  • Extract the consensus partitions from multiple runs of the fitting procedure
  • Extract the topics and topic proportions associated with blocks of documents
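Because block labels are arbitrary, partitions from different runs cannot be compared label by label. One simple way to see what several runs agree on is a co-assignment frequency: for every pair of documents, the fraction of runs that place them in the same block. The sketch below illustrates that idea only; it is not the consensus procedure in sbmmultilayer.py or the tutorial notebook, which may work differently, and the function name `coassignment` is hypothetical.

```python
from itertools import combinations


def coassignment(partitions):
    """Fraction of runs in which each pair of documents shares a block.

    partitions: list of runs; each run is a list giving the block label of
    every document (all runs must cover the same documents in the same order).
    Returns a dict mapping (i, j) with i < j to a value in [0, 1].
    """
    n_runs = len(partitions)
    n_docs = len(partitions[0])
    freq = {}
    for i, j in combinations(range(n_docs), 2):
        same = sum(1 for run in partitions if run[i] == run[j])
        freq[(i, j)] = same / n_runs
    return freq


if __name__ == "__main__":
    # The first two runs describe the same grouping with swapped labels.
    runs = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
    print(coassignment(runs))
```

Pairs with a frequency close to 1 are grouped together consistently across runs, while values near 0.5 indicate documents whose block membership is unstable.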

Data

The associated Wikipedia dataset is saved in a zip file in data/dataset-four.zip.

  • Contains three types of data.
  • Hyperlink: Each Wikipedia article has a hyperlink to another Wikipedia article.
  • Text: Each Wikipedia article contains text associated with it.
  • Metadata: Each Wikipedia article has a category assigned to it by Wikipedia users.
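The layout of the files inside the archive is not described here, so a reasonable first step is to list its contents before loading anything. This is a minimal sketch using Python's standard zipfile module; the helper name `list_archive_members` is hypothetical, and no file names inside the archive are assumed.

```python
import zipfile


def list_archive_members(path_or_file):
    """Return the names of all files stored in a zip archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Path taken from the text above, relative to the repository root.
    for name in list_archive_members("data/dataset-four.zip"):
        print(name)
```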