
derekgreene / Topic Model Tutorial

Tutorial on topic models in Python with scikit-learn

Projects that are alternatives of or similar to Topic Model Tutorial

Senato.py
A scraper for the data made available by the Italian Senate, and a cluster analysis to detect similar amendments.
Stars: ✭ 118 (-0.84%)
Mutual labels:  jupyter-notebook
Automunge
Artificial Learning, Intelligent Machines
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Voice activity detector
A statistical model-based voice activity detector
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Ds salary proj
Repo for the data science salary prediction from the "Data Science Project From Scratch" video on my YouTube channel
Stars: ✭ 116 (-2.52%)
Mutual labels:  jupyter-notebook
Texture Synthesis Nonparametric Sampling
Implementation of "Texture Synthesis with Non-Parametric Sampling" paper by Alexei A. Efros and Thomas K. Leung
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Kaggle challenge
This is the code for "Kaggle Challenge LIVE" by Siraj Raval on YouTube
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Ysda deeplearning17
Yandex SDA classes on deep learning, 2017 edition
Stars: ✭ 118 (-0.84%)
Mutual labels:  jupyter-notebook
Mnet deepcdr
Code for TMI 2018 "Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation"
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Pydatadc 2018 Tidy
PyData 2018 tutorial for tidying data
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Deeplearning With Tensorflow Notes
Study notes and code for Long Quliang's book "TensorFlow Deep Learning", using TensorFlow 2.0.0
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Chromagan
Official Implementation of ChromaGAN: An Adversarial Approach for Picture Colorization
Stars: ✭ 117 (-1.68%)
Mutual labels:  jupyter-notebook
Bayes By Backprop
PyTorch implementation of "Weight Uncertainty in Neural Networks"
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Midi Dataset
Code for creating a dataset of MIDI ground truth
Stars: ✭ 118 (-0.84%)
Mutual labels:  jupyter-notebook
Trimap generator
Automatic trimap generation through pixel dilation and strongly connected component algorithms
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Tensorflow shiny
A R/Shiny app for interactive RNN tensorflow models
Stars: ✭ 118 (-0.84%)
Mutual labels:  jupyter-notebook
Abstractive Text Summarization
PyTorch implementation of, and experiments on, the paper "Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond".
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Defaultcreds Cheat Sheet
One place for all the default credentials, to assist Blue/Red team activities in finding devices with default passwords 🛡️
Stars: ✭ 1,949 (+1537.82%)
Mutual labels:  jupyter-notebook
2018 19 Classes
https://cc-mnnit.github.io/2018-19-Classes/ - 🎒 💻 Material for Computer Club Classes
Stars: ✭ 119 (+0%)
Mutual labels:  jupyter-notebook
Adversarial examples
Adversarial examples
Stars: ✭ 118 (-0.84%)
Mutual labels:  jupyter-notebook

topic-model-tutorial

This repository contains notebooks, slides, and data for the short tutorial "Topic modelling with Scikit-learn", presented at PyData Dublin in September 2017.

Contents

The tutorial is summarised in these slides. There are three associated IPython notebooks:

  1. Text Preprocessing: Provides a basic introduction to preprocessing documents with scikit-learn.
  2. NMF Topic Models: Covers the application and interpretation of topic models via the NMF implementation provided by scikit-learn.
  3. Parameter Selection for NMF: More advanced material on selecting the number of topics for NMF, using topic coherence.

To demonstrate the topic modelling techniques, a sample dataset is provided here. This consists of 4,551 news articles from 2016, stored in a single text file (25MB), one article per line.
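The dataset stores one article per line, so it can be read into a list of document strings in a single pass. This is a minimal sketch. The helper `load_articles` and the file name `articles.txt` are hypothetical, and UTF-8 encoding is assumed. An in-memory `StringIO` stands in for the real file so that the example runs on its own.

```python
import io

def load_articles(handle):
    """Return one document per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in handle if line.strip()]

# With the real data this would be (hypothetical file name):
#   with open("articles.txt", encoding="utf-8") as handle:
#       articles = load_articles(handle)
sample = io.StringIO("First article text.\nSecond article text.\n\nThird article.\n")
articles = load_articles(sample)
print(len(articles))  # 3
```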

Dependencies

This code has been tested with Python 3.6. The core package requirements are:

  • scikit-learn (tested with v0.19.0)
  • numpy
  • matplotlib

The model selection code also relies on the gensim package to build a Word2Vec model. A pre-built Word2Vec model for the sample dataset is available for download here (71MB).
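The third notebook selects the number of topics using topic coherence computed from a Word2Vec model. The sketch below assumes a measure in the spirit of O'Callaghan et al. (2015): a topic's coherence is the mean pairwise cosine similarity of its top terms' word vectors, and a model's coherence is the mean over its topics. The k with the highest score is preferred. The 3-d vectors and candidate topic lists are hypothetical, and the helpers are illustrative rather than the notebook's code. With gensim, the vectors would instead come from the trained model (for example `model.wv[word]` in gensim 4.x).

```python
from itertools import combinations

import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def topic_coherence(top_terms, vectors):
    """Mean pairwise cosine similarity between one topic's top terms."""
    pairs = list(combinations(top_terms, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

def model_coherence(topics, vectors):
    """Average topic coherence over all topics of one model."""
    return sum(topic_coherence(t, vectors) for t in topics) / len(topics)

# Hypothetical word vectors (a real Word2Vec model would supply these).
vectors = {
    "goal": np.array([1.0, 0.1, 0.0]),
    "match": np.array([0.9, 0.2, 0.0]),
    "league": np.array([0.95, 0.0, 0.1]),
    "inflation": np.array([0.0, 1.0, 0.1]),
    "rates": np.array([0.1, 0.9, 0.0]),
    "bank": np.array([0.0, 0.95, 0.2]),
}

# Hypothetical top terms of NMF models fitted with different k.
candidates = {
    2: [["goal", "match", "league"], ["inflation", "rates", "bank"]],
    3: [["goal", "inflation"], ["match", "rates"], ["league", "bank"]],
}

scores = {k: model_coherence(topics, vectors) for k, topics in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Here the k = 2 model groups related terms, so it scores higher than the k = 3 model, whose topics mix unrelated words.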

Links and References

  • Scikit-learn home
  • NMF documentation for scikit-learn
  • Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature. [PDF]
  • Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4). [Link]
  • O’Callaghan, D., Greene, D., Carthy, J., & Cunningham, P. (2015). An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications. [PDF]