Lilykos / clusterix

Licence: other
Visual exploration of clustered data.

Programming Languages

JavaScript, HTML, Python, CSS

Projects that are alternatives of or similar to clusterix

watchman
Watchman: An open-source social-media event-detection system
Stars: ✭ 18 (-59.09%)
Mutual labels:  clustering, tf-idf
2018 Machinelearning Lectures Esa
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Stars: ✭ 280 (+536.36%)
Mutual labels:  clustering, tf-idf
Stringlifier
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
Stars: ✭ 85 (+93.18%)
Mutual labels:  clustering, tf-idf
IntroduceToEclicpseVert.x
This repository contains the code of Vert.x examples contained in my articles published on platforms such as kodcu.com, medium, dzone. How to run each example is described in its readme file.
Stars: ✭ 27 (-38.64%)
Mutual labels:  clustering
kohonen-maps
Implementation of SOM and GSOM
Stars: ✭ 62 (+40.91%)
Mutual labels:  clustering
snATAC
<<------ Use SnapATAC!!
Stars: ✭ 23 (-47.73%)
Mutual labels:  clustering
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (+84.09%)
Mutual labels:  clustering
DBSCAN
c++ implementation of clustering by DBSCAN
Stars: ✭ 89 (+102.27%)
Mutual labels:  clustering
multigraph
multigraph: Plot and Manipulate Multigraphs in R
Stars: ✭ 18 (-59.09%)
Mutual labels:  plot
swanager
A high-level Docker Services management tool built on top of Swarm
Stars: ✭ 12 (-72.73%)
Mutual labels:  clustering
consul role
Ansible role to install Consul (cluster of) server/agent
Stars: ✭ 14 (-68.18%)
Mutual labels:  clustering
KeywordExtraction
Implementation of keyword extraction algorithms, including TextRank, TF-IDF, and the combination of both
Stars: ✭ 95 (+115.91%)
Mutual labels:  tf-idf
sarviewer
Generate graphs with gnuplot or matplotlib (Python) from sar data
Stars: ✭ 60 (+36.36%)
Mutual labels:  plot
pygrams
Extracts key terminology (n-grams) from any large collection of documents (>1000) and forecasts emergence
Stars: ✭ 52 (+18.18%)
Mutual labels:  tf-idf
ssdc
ssdeep cluster analysis for malware files
Stars: ✭ 24 (-45.45%)
Mutual labels:  clustering
text clustering
Text clustering (K-means, DBSCAN, LDA, Single-pass)
Stars: ✭ 230 (+422.73%)
Mutual labels:  clustering
ResumeRise
An NLP tool which classifies and summarizes resumes
Stars: ✭ 29 (-34.09%)
Mutual labels:  tf-idf
scSeqR
This package has migrated to https://github.com/rezakj/iCellR please use iCellR instead of scSeqR for more functionalities and updates.
Stars: ✭ 16 (-63.64%)
Mutual labels:  clustering
EgoSplitting
A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).
Stars: ✭ 78 (+77.27%)
Mutual labels:  clustering
bns-short-text-similarity
📖 Use Bi-normal Separation to find document vectors, which are used to compute similarity for shorter sentences.
Stars: ✭ 24 (-45.45%)
Mutual labels:  tf-idf

Clusterix: A visual analytics approach to data clustering

Clusterix is a web-based visual analytics tool that aspires to support clustering tasks by users, while having analysts at the center of the workflow. Clusterix provides the facilities to:

  • load and preview CSV data files;
  • create a 2D projection of the dataset;
  • select any combination of fields to be used for projection/clustering;
  • select and run one or more clustering algorithms (K-Means, Agglomerative Clustering, Mean Shift) with varying parameters;
  • view and interact with the results in a browser environment;
  • save time by working iteratively;
  • modify the parameters or input data to correct the clustering output.

Such an iterative, visual analytics approach allows users to quickly determine the best clustering algorithm and parameters for their data, and to correct any errors in the clustering output. Clusterix has been applied to the clustering of heterogeneous data sets.

Usage

First you need to install the requirements:

pip install -r requirements.txt

To run the project:

python manage.py runserver

This command runs Clusterix on http://127.0.0.1:5000, where you can use the interface to upload data files and select the algorithms and options you want.

Features

File input (CSV only currently)

  • Data Preview
  • Field selection
  • Text Features (Vectorizers, stemming, stopwords, etc.)
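
As a rough illustration of the data preview and field selection steps, the sketch below reads CSV text with Python's standard csv module, returns the first few rows, and keeps only the chosen columns. The function names preview_csv and select_fields and the sample data are hypothetical; this is not taken from the Clusterix code.

```python
import csv
import io


def preview_csv(text, n_rows=5):
    """Return the header and the first n_rows data rows of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for i, row in enumerate(reader):
        if i >= n_rows:
            break
        rows.append(row)
    return reader.fieldnames, rows


def select_fields(rows, fields):
    """Keep only the chosen columns of each row, in the given order."""
    return [[row[f] for f in fields] for row in rows]


# Hypothetical sample data, not taken from the Clusterix repository.
SAMPLE = "name,alcohol,color\nA,13.2,red\nB,12.4,white\nC,14.1,red\n"

header, rows = preview_csv(SAMPLE, n_rows=2)
subset = select_fields(rows, ["alcohol", "color"])
```

Values come back as strings, so numeric fields would still need converting before projection or clustering.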

Vectorizers

  • Count Vectorizer
  • Tf-Idf Vectorizer
  • Hashing Vectorizer
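
To make the difference between the first two vectorizers concrete, here is a minimal pure-Python sketch of term counts and a textbook TF-IDF weighting (tf × log(N / df)). It is an illustration only: the function names and documents are made up, the hashing vectorizer is not covered, and library implementations typically use smoothed or normalised variants of this formula.

```python
import math
from collections import Counter


def count_vectors(docs):
    """Bag-of-words counts: one Counter of term frequencies per document."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_vectors(docs):
    """Weight each count by log(N / df); terms found in every document get 0."""
    counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


# Hypothetical documents, for illustration only.
docs = ["red wine red", "white wine", "red grape"]
counts = count_vectors(docs)
weights = tfidf_vectors(docs)
```

In the second document, "white" ends up with a higher weight than "wine" because "wine" also appears in another document.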

Decomposition

  • PCA
  • SVD
  • MDS
  • t-SNE

Algorithms

  • K-Means
  • Agglomerative Clustering
  • Mean Shift
  • DBSCAN
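
For readers unfamiliar with the first algorithm in this list, the following is a small sketch of K-Means (Lloyd's algorithm) on 2D points in plain Python. It assumes a naive initialisation from the first k points and a fixed number of iterations, and it is not the implementation Clusterix uses; the points are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm on 2D points; returns (labels, centroids)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical 2D coordinates forming two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

The result depends on the initial centroids, which is one reason trying several parameter settings and inspecting the plot can be useful.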

Plot Features

  • Scatterplot visualizations
  • Full text/column search for the nodes
  • Brushing and zoom for targeted inspection
  • Various clustering metrics (TF-IDF, etc.)

Instructions

Clusterix works iteratively, so there are certain steps that need to be followed:

  • Upload a data file. The necessary preprocessing runs and the available options are then shown.
  • First you need to get a projection of the data, so use all the text and field options to tune your decomposition.
  • The decomposition model and the coordinates are saved, so that you can iterate through clustering models really fast.
  • In case you need to try a new decomposition, create a new projection.
  • Use brushing to get TF-IDF (if applicable) and a zoomed area for browsing.
  • The Search function uses SQLite syntax, so every time you write a query, imagine that it starts like this: SELECT * FROM dataframe WHERE...
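
The search convention in the last step can be sketched with Python's built-in sqlite3 module: the text typed by the user is treated as the condition that follows SELECT * FROM dataframe WHERE. The table contents and the search helper below are hypothetical, and how Clusterix wires this up internally is not shown here.

```python
import sqlite3


def search(conn, where_clause):
    """Run a user-typed condition as the WHERE part of a fixed query."""
    # Interpolating raw text into SQL is only acceptable for a local,
    # single-user tool; it is not safe against untrusted input.
    query = "SELECT * FROM dataframe WHERE " + where_clause
    return conn.execute(query).fetchall()


# Hypothetical in-memory table standing in for the uploaded data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataframe (name TEXT, alcohol REAL, color TEXT)")
conn.executemany(
    "INSERT INTO dataframe VALUES (?, ?, ?)",
    [("A", 13.2, "red"), ("B", 12.4, "white"), ("C", 14.1, "red")],
)

hits = search(conn, "color = 'red' AND alcohol > 13.5")
like_hits = search(conn, "name LIKE 'b%'")
```

So a search box entry such as color = 'red' AND alcohol > 13.5 returns only the matching rows, and LIKE patterns give case-insensitive text matching for ASCII characters.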

Screenshots

Wine Data
