brandomr / Document_cluster

A guide to document clustering in Python

Labels

jupyter-notebook

Projects that are alternatives to or similar to Document_cluster

PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)

Stars: ✭ 460 (-1.92%)

Mutual labels: jupyter-notebook

Graph Neural Networks

Stars: ✭ 464 (-1.07%)

Mutual labels: jupyter-notebook

Stars: ✭ 467 (-0.43%)

Mutual labels: jupyter-notebook

Instcolorization

Stars: ✭ 461 (-1.71%)

Mutual labels: jupyter-notebook

Udacity Deep Learning

Udacity Deep Learning MOOC assignments

Stars: ✭ 463 (-1.28%)

Mutual labels: jupyter-notebook

Python3 In One Pic

Learn python3 in one picture.

Stars: ✭ 4,514 (+862.47%)

Mutual labels: jupyter-notebook

Efficient Learning of Augmentation Policy Schedules

Stars: ✭ 461 (-1.71%)

Mutual labels: jupyter-notebook

Lowresource Nlp Bootcamp 2020

The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020

Stars: ✭ 469 (+0%)

Mutual labels: jupyter-notebook

Source codes for the book "Reinforcement Learning: Theory and Python Implementation"

Stars: ✭ 464 (-1.07%)

Mutual labels: jupyter-notebook

Machine Learning A Probabilistic Perspective Solutions

My solutions to Kevin Murphy's Machine Learning book

Stars: ✭ 467 (-0.43%)

Mutual labels: jupyter-notebook

Timeseries seq2seq

This repo aims to be a useful collection of notebooks/code for understanding and implementing seq2seq neural networks for time series forecasting. Networks are constructed with keras/tensorflow.

Stars: ✭ 462 (-1.49%)

Mutual labels: jupyter-notebook

Scene Graph Benchmark.pytorch

A new codebase for popular Scene Graph Generation methods (2020). Visualization and scene graph extraction on custom images/datasets are provided. It's also a PyTorch implementation of the paper “Unbiased Scene Graph Generation from Biased Training” (CVPR 2020).

Stars: ✭ 462 (-1.49%)

Mutual labels: jupyter-notebook

Interview Questions

Machine learning, deep learning, Python, and Go (Golang) interview and written-test questions

Stars: ✭ 462 (-1.49%)

Mutual labels: jupyter-notebook

Generative Adversarial Network Tutorial

Tutorial on creating your own GAN in TensorFlow

Stars: ✭ 461 (-1.71%)

Mutual labels: jupyter-notebook

Clickbait Detector

Detects clickbait headlines using deep learning.

Stars: ✭ 468 (-0.21%)

Mutual labels: jupyter-notebook

Artificial Intelligence For Trading

Content for Udacity's AI in Trading NanoDegree.

Stars: ✭ 459 (-2.13%)

Mutual labels: jupyter-notebook

Additive Margin Softmax

This is the implementation of the paper "Additive Margin Softmax for Face Verification"

Stars: ✭ 464 (-1.07%)

Mutual labels: jupyter-notebook

Masters Desarrollo Udemy

Stars: ✭ 468 (-0.21%)

Mutual labels: jupyter-notebook

Roadmap for learning Python

Stars: ✭ 467 (-0.43%)

Mutual labels: jupyter-notebook

How to make a text summarizer

This is the code for "How to Make a Text Summarizer - Intro to Deep Learning #10" by Siraj Raval on YouTube

Stars: ✭ 467 (-0.43%)

Mutual labels: jupyter-notebook

Document Clustering with Python

In this guide, I will explain how to cluster a set of documents using Python. My motivating example is to identify the latent structures within the synopses of the top 100 films of all time (per an IMDB list). See the original post for a more detailed discussion of the example. This guide covers:

tokenizing and stemming each synopsis
transforming the corpus into vector space using tf-idf
calculating cosine distance between each document as a measure of similarity
clustering the documents using the k-means algorithm
using multidimensional scaling to reduce dimensionality within the corpus
plotting the clustering output using matplotlib and mpld3
conducting a hierarchical clustering on the corpus using Ward clustering
plotting a Ward dendrogram
topic modeling using Latent Dirichlet Allocation (LDA)
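
The first five steps above (tokenizing and stemming, tf-idf, cosine distance, k-means, and multidimensional scaling) can be sketched with NLTK and scikit-learn. This is a minimal sketch under stated assumptions, not the notebook's own code: the four synopses are hypothetical toy data, k=2 is chosen to suit them, the regex tokenizer and scikit-learn's built-in English stop-word list are assumptions, and the matplotlib/mpld3 plotting step is left out, so only the 2-D coordinates are computed.

```python
import re

import numpy as np
from nltk.stem.snowball import SnowballStemmer
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_similarity

stemmer = SnowballStemmer("english")


def tokenize_and_stem(text):
    """Split text into lowercase alphabetic tokens, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS]


# Hypothetical toy synopses (assumption) standing in for the IMDB film synopses.
synopses = [
    "A young wizard attends a school of magic and fights a dark wizard.",
    "Wizards and witches battle dark magic at a hidden school.",
    "A detective investigates a murder in a rainy city.",
    "Police detectives hunt a killer after a series of murders.",
]

# tf-idf: one row per document, one column per stemmed term; rows are L2-normalized.
vectorizer = TfidfVectorizer(tokenizer=tokenize_and_stem, token_pattern=None)
tfidf = vectorizer.fit_transform(synopses)

# Cosine distance = 1 - cosine similarity; clip tiny negative rounding errors.
dist = np.clip(1.0 - cosine_similarity(tfidf), 0.0, None)

# k-means on the tf-idf vectors (k=2 is an assumption for this toy corpus).
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(tfidf)

# MDS turns the precomputed distance matrix into 2-D coordinates for plotting.
mds = MDS(n_components=2, dissimilarity="precomputed", n_init=4, random_state=1)
coords = mds.fit_transform(dist)

for i, (label, (x, y)) in enumerate(zip(labels, coords)):
    print(f"doc {i}: cluster {label}, position ({x:.2f}, {y:.2f})")
```

Because the toy "wizard" and "detective" synopses share no stemmed terms, their cosine distance is 1 and k-means separates them; on real synopses the clusters overlap far more and k has to be chosen with care.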

The 'cluster_analysis' notebook is fully functional; the 'cluster_analysis_web' notebook was trimmed down to create this walkthrough. Feel free to download the repo and use 'cluster_analysis' to step through the guide yourself.

How the repo is set up

Once you've pulled down the repo, all you need to do is run 'cluster_analysis.ipynb'; it will find the lists of synopses and titles in the repo. 'Film_Scrape.ipynb' contains the code I used to scrape the synopses, in case you are interested. The other items in the repo are mostly incidentals for setting up the webpage walkthrough. There is also one pickled model.
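
This section mentions a pickled model without saying which model it is or how it was saved. Assuming it is a fitted clustering model, persisting and reloading one with Python's standard pickle module looks roughly like this; the toy corpus and the filename doc_cluster.pkl are hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus (assumption); in the repo a model would be fit on the film synopses.
synopses = [
    "A young wizard attends a school of magic.",
    "Wizards battle dark magic at a hidden school.",
    "A detective investigates a murder in a rainy city.",
    "Police detectives hunt a killer after a series of murders.",
]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(synopses)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tfidf)

# Save the fitted model so later sessions can skip refitting.
# "doc_cluster.pkl" is a hypothetical filename, written to a temporary directory here.
model_path = os.path.join(tempfile.mkdtemp(), "doc_cluster.pkl")
with open(model_path, "wb") as f:
    pickle.dump(km, f)

# Reload it; the fitted cluster assignments and centers come back unchanged.
with open(model_path, "rb") as f:
    km_loaded = pickle.load(f)

print(km_loaded.labels_.tolist())
```

scikit-learn's documentation recommends loading a pickled model with the same scikit-learn version that saved it, and pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.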

At some point in the future I'll write up how I executed the web scraping in case it's of interest.

Stars: ✭ 469