
sandipanpaul21 / Clustering-in-Python

Licence: other
Clustering methods in Machine Learning, covering both the theory and the Python code for each algorithm. Algorithms include K-Means, K-Modes, Hierarchical, DBSCAN, and the Gaussian Mixture Model (GMM). Interview questions on clustering are added at the end.

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to Clustering-in-Python

dbscan-python
[New Version] Theoretically Efficient and Practical Parallel DBSCAN
Stars: ✭ 18 (-33.33%)
Mutual labels:  clustering, dbscan, dbscan-clustering
genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (+25.93%)
Mutual labels:  clustering, clustering-algorithm, hierarchical-clustering
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+366.67%)
Mutual labels:  clustering, clustering-algorithm, clustering-evaluation
ClusterAnalysis.jl
Cluster Algorithms from Scratch with Julia Lang. (K-Means and DBSCAN)
Stars: ✭ 22 (-18.52%)
Mutual labels:  clustering, dbscan, dbscan-clustering
Hdbscan
A high performance implementation of HDBSCAN clustering.
Stars: ✭ 2,032 (+7425.93%)
Mutual labels:  clustering, clustering-algorithm, clustering-evaluation
point-cloud-clusters
A catkin workspace in ROS which uses DBSCAN to identify which points in a point cloud belong to the same object.
Stars: ✭ 43 (+59.26%)
Mutual labels:  clustering, dbscan, dbscan-clustering
tsp-essay
A fun study of some heuristics for the Travelling Salesman Problem.
Stars: ✭ 15 (-44.44%)
Mutual labels:  clustering, kmeans-clustering, kmeans-algorithm
cyoptics-clustering
Fast OPTICS clustering in Cython + gradient cluster extraction
Stars: ✭ 23 (-14.81%)
Mutual labels:  clustering-algorithm, dbscan-clustering, clustering-methods
Clustering-Python
Python Clustering Algorithms
Stars: ✭ 23 (-14.81%)
Mutual labels:  clustering-algorithm, dbscan-clustering
k-means-quantization-js
🎨 Apply color quantization to images using k-means clustering.
Stars: ✭ 27 (+0%)
Mutual labels:  clustering, kmeans-clustering
kmeans-clustering-cpp
A C++ implementation of simple k-means clustering algorithm.
Stars: ✭ 39 (+44.44%)
Mutual labels:  kmeans-clustering, kmeans-algorithm
kmeans-dbscan-tutorial
A clustering tutorial with scikit-learn for beginners.
Stars: ✭ 20 (-25.93%)
Mutual labels:  clustering-algorithm, dbscan
clusters
Cluster analysis library for Golang
Stars: ✭ 68 (+151.85%)
Mutual labels:  clustering, clustering-algorithm
clope
Elixir implementation of CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data
Stars: ✭ 18 (-33.33%)
Mutual labels:  clustering, clustering-algorithm
dbscan
DBSCAN Clustering Algorithm C# Implementation
Stars: ✭ 38 (+40.74%)
Mutual labels:  clustering, dbscan
kmpp
k-means clustering algorithm with k-means++ initialization.
Stars: ✭ 28 (+3.7%)
Mutual labels:  clustering-algorithm, kmeans-algorithm
text-cluster
🍡 Text clustering: the k-means algorithm in practice
Stars: ✭ 40 (+48.15%)
Mutual labels:  kmeans-clustering, kmeans-algorithm
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-37.04%)
Mutual labels:  clustering, kmeans-clustering
hpdbscan
Highly parallel DBSCAN (HPDBSCAN)
Stars: ✭ 19 (-29.63%)
Mutual labels:  clustering, dbscan
Study-of-David-Mackay-s-book-
Review of David MacKay's book, with problem solutions, own Python code, and Mathematica files
Stars: ✭ 46 (+70.37%)
Mutual labels:  clustering-algorithm, kmeans-clustering

Welcome to Clustering (Theory & Code)

01 Unsupervised Learning (Theory)

  • What is Unsupervised Learning & Goals of Unsupervised Learning
  • Types of Unsupervised Learning: 1.Clustering, 2.Association Rules & 3.Dimensionality Reduction

02 Clustering (Theory)

  • Definition and Application of Clustering
  • 4 methods: 1.K Means 2.Hierarchical 3.DBScan & 4.Gaussian Mixture

03 Euclidean & Manhattan Distance (Theory)

  • If two points are near each other, chances are they are similar
  • Distance measures between two points (see the sketch after this list)
    1. Euclidean Distance: square root of the sum of the squared differences between the two points
    2. Manhattan Distance: sum of the absolute differences between the two points
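
A minimal sketch of both distance measures with NumPy; the two example points are arbitrary.

```python
import numpy as np

# Two arbitrary points in 2-D space (illustrative values only)
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((p - q) ** 2))   # 5.0

# Manhattan distance: sum of absolute differences
manhattan = np.sum(np.abs(p - q))           # 7.0

print(euclidean, manhattan)
```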

04 K-Means Clustering (Theory)

  • How the Algorithm works (step-wise calculation; a from-scratch sketch follows this list)
  • Pre-processing required for K Means
  • Determining optimal number of K: 1.Profiling Approach & 2.Elbow Method
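
A compact from-scratch sketch of the two alternating K-Means steps (assignment and centroid update) on toy NumPy data; the data, k, and iteration count are arbitrary, and empty clusters are not handled.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                         # toy data: 100 points in 2-D
k = 3
centroids = X[rng.choice(len(X), k, replace=False)]   # random initial centroids

for _ in range(10):                                   # fixed number of iterations for brevity
    # Assignment step: attach each point to its nearest centroid (Euclidean distance)
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centroids)
```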

05 Elbow Method (Theory)

  • Working of Elbow Method with Example
  • 3 concepts: 1.Total Error, 2.Variance/Total Squared Error & 3.Within-Cluster Sum of Squares (WCSS); a minimal WCSS sketch follows
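
For reference, WCSS is just the total squared distance from each point to the centroid of the cluster it is assigned to. A minimal sketch; the arrays below are illustrative.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-Cluster Sum of Squares: total squared distance of each
    point to the centroid of its assigned cluster."""
    return sum(
        np.sum((X[labels == j] - c) ** 2)
        for j, c in enumerate(centroids)
    )

# Tiny illustrative example: two clusters in 1-D
X = np.array([[1.0], [2.0], [9.0], [10.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.5], [9.5]])
print(wcss(X, labels, centroids))   # 0.25 * 4 = 1.0
```

scikit-learn's KMeans exposes the same quantity as its inertia_ attribute, which is what the elbow loop in the next section plots.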

06 K Means Clustering (Python Code)

  • Define number of clusters, take centroids and measure distance
  • Euclidean Distance : Measure distance between points
  • Number of Clusters defined by the Elbow Method
  • Elbow Method : WCSS vs Number of Clusters
  • Silhouette Score : Goodness of Clustering (a scikit-learn sketch follows this list)
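
A sketch of the workflow this section describes, using scikit-learn on synthetic blob data; the repository's own notebooks may differ, and the parameter values here are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)        # scaling: typical pre-processing for K-Means

# Elbow Method: plot WCSS (inertia_) against the number of clusters
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)
plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("WCSS")
plt.show()

# Fit the chosen k and check the goodness of clustering with the silhouette score
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print("Silhouette:", silhouette_score(X, km.labels_))
```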

07 Hierarchical Clustering (Theory)

  • Two Approaches: 1.Agglomerative (Bottom-Up) & 2.Divisive (Top-Down)
  • Types of Linkages:
    1. Single Linkage - Nearest Neighbour (Minimal intercluster dissimilarity)
    2. Complete Linkage - Farthest Neighbour (Maximal intercluster dissimilarity)
    3. Average Linkage - Average Distance (Mean intercluster dissimilarity)
  • Steps in Agglomerative Hierarchical Clustering with Single Linkage
  • Determining optimal number of Clusters: Dendrogram

08 Dendrogram (Theory)

  • Hierarchical relationship between objects
  • Optimal number of Clusters for Hierarchical Clustering

09 Hierarchical Clustering (Python Code)

  • Type of HC
    1. Agglomerative : Bottom Up approach
    2. Divisive : Top Down approach
  • Number of Clusters defined by the Dendrogram
  • Dendrogram : joining data points based on distance & creating clusters (see the sketch after this list)
  • Linkage : how the distance between two clusters is calculated
    1. Single linkage : Minimum Distance between two clusters
    2. Complete linkage : Maximum Distance between two clusters
    3. Average linkage : Average Distance between two clusters
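
A sketch of agglomerative clustering with a dendrogram, using SciPy and scikit-learn on synthetic data; the linkage choice and number of clusters are illustrative.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# Dendrogram: join data points based on distance and read off a sensible cut
Z = linkage(X, method="ward")   # could also be "single", "complete" or "average"
dendrogram(Z)
plt.ylabel("Distance")
plt.show()

# Cut into the number of clusters suggested by the dendrogram
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])
```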

10 DBScan Clustering (Theory)

  • Density Based Clustering
  • K-Means & Hierarchical work well for compact & well-separated data
  • Both are sensitive to Outliers & Noise
  • DBScan overcomes these issues & works well with Outliers
  • 2 important parameters -
    1. eps: if the distance between 2 points is less than or equal to eps, they are neighbours
    2. MinPts: minimum number of neighbours/data points within the eps radius

11 DBScan Clustering (Python Code)

  • No need to pre-define the number of clusters
  • Distance metric is Euclidean Distance
  • Need to give 2 parameters (see the sketch after this list)
    1. eps : radius of the circle around a point
    2. min_samples : minimum number of data points required to form a cluster
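
A sketch of DBSCAN with scikit-learn; eps and min_samples are illustrative values and would normally be tuned to the data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.08, random_state=42)
X = StandardScaler().fit_transform(X)

# eps = radius of the circle, min_samples = density threshold inside that radius
db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)

# Points labelled -1 are treated as noise/outliers
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters, "noise points:", int(np.sum(labels == -1)))
```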

12 GMM Clustering (Theory)

  • Weakness of K Means
  • Expectation-Maximization (EM) method

13 Gaussian Mixture Model Clustering (Python Code)

  • Probabilistic Model
  • Uses Expectation-Maximization (EM) steps (see the sketch after this list):
    1. E Step : probability of each data point belonging to each cluster
    2. M Step : for each cluster, revise the parameters based on those probabilities
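
A sketch of GMM clustering with scikit-learn; predict_proba exposes the per-cluster membership probabilities that the E step computes. Data and n_components are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, random_state=42).fit(X)
labels = gmm.predict(X)            # hard cluster assignment
probs = gmm.predict_proba(X)       # soft assignment: probability per cluster
print(probs[:3].round(3))
```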

14 Cluster Adjustment (Theory)

  • 2 steps we normally do for Cluster Adjustment (see the sketch after this list)
    1. Quality of Clustering (Cardinality & Magnitude)
    2. Performance of the Similarity Measure (Euclidean Distance)
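
A minimal sketch of the quality check, assuming the usual definitions: cardinality is the number of points per cluster, and magnitude is the sum of point-to-centroid distances per cluster. Data and k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

for j, centroid in enumerate(km.cluster_centers_):
    members = X[km.labels_ == j]
    cardinality = len(members)                                   # points in cluster j
    magnitude = np.linalg.norm(members - centroid, axis=1).sum()  # total distance to centroid
    print(f"cluster {j}: cardinality={cardinality}, magnitude={magnitude:.2f}")
```

Clusters whose magnitude is far out of proportion to their cardinality are the usual candidates for adjustment.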

15 Silhouette Coefficient - Cluster Validation (Theory)

  • It is a metric used to evaluate the goodness of a clustering technique (see the sketch after this list)
  • The closer the silhouette score is to 1, the better separated the clusters are
  • Its value ranges from -1 to 1.
    1. 1: Means clusters are well apart from each other and clearly distinguished
    2. 0: Means clusters are indifferent, or distance between clusters is not significant
    3. -1: Means clusters are assigned in the wrong way
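
A sketch of silhouette-based validation with scikit-learn; silhouette_samples gives the per-point values behind the overall average, so each cluster can be inspected separately. Data and k are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

print("overall score:", silhouette_score(X, labels))   # closer to 1 is better
per_point = silhouette_samples(X, labels)              # one value per data point
for j in range(4):
    print(f"cluster {j} mean silhouette: {per_point[labels == j].mean():.3f}")
```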

16 Disadvantage & Choosing Right Clustering Method (Theory)

  • Disadvantage of each clustering techniques respectively
  • Based on the data, which is the right clustering method

17 Clustering Revision (Theory)

  • Short description of each Clustering Algorithm
  • Advantages & Disadvantages
  • When to use what

18 Interview Questions on Clustering (Theory)

  • Commonly asked question on Clustering

19 K Modes (Theory)

  • For clustering Categorical variables, use K-Modes
  • It uses the dissimilarities (total mismatches) between data points
  • The fewer the dissimilarities, the closer the data points are (a small sketch follows this list)
  • It uses the Mode (the most frequent value in each column) instead of the mean
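
A tiny sketch of the matching dissimilarity behind K-Modes: count the attribute mismatches between two categorical records. The example rows are made up.

```python
# Dissimilarity in K-Modes = number of mismatched categorical attributes
def matching_dissimilarity(a, b):
    return sum(x != y for x, y in zip(a, b))

row1 = ("red", "small", "metal")   # illustrative categorical records
row2 = ("red", "large", "metal")
print(matching_dissimilarity(row1, row2))   # 1 mismatch -> the points are close
```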

20 K Modes (Python Code)

  • K-Modes code in Python (a sketch with the kmodes package follows)
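
A sketch using the third-party kmodes package (assumed installed via `pip install kmodes`); the toy categorical data and parameter values are illustrative.

```python
import pandas as pd
from kmodes.kmodes import KModes

# Small made-up categorical dataset
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "size":   ["S",   "M",    "S",   "L",     "M",    "S"],
    "shape":  ["box", "ball", "box", "ball",  "ball", "box"],
})

km = KModes(n_clusters=2, init="Huang", n_init=5, random_state=42)
clusters = km.fit_predict(df)

print(clusters)               # cluster label per row
print(km.cluster_centroids_)  # the mode (most frequent value) per column, per cluster
```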