dbscan - Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package

This R package provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:

Clustering

  • DBSCAN: Density-based spatial clustering of applications with noise.
  • HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction.
  • OPTICS/OPTICSXi: Ordering points to identify the clustering structure, with cluster extraction using the Xi method.
  • FOSC: Framework for Optimal Selection of Clusters for unsupervised and semisupervised clustering of a hierarchical cluster tree.
  • Jarvis-Patrick clustering
  • SNN Clustering: Shared Nearest Neighbor Clustering.

Outlier Detection

  • LOF: Local outlier factor algorithm.
  • GLOSH: Global-Local Outlier Score from Hierarchies algorithm.

Fast Nearest-Neighbor Search (using kd-trees)

  • kNN search
  • Fixed-radius NN search

The implementations use the kd-tree data structure (from the ANN library) for faster k-nearest neighbor search. They are typically faster than the native R implementations (e.g., dbscan in package fpc) and the implementations in WEKA, ELKI and Python's scikit-learn.
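The nearest-neighbor searches listed above can also be called directly. The following is a minimal sketch, assuming the package's `kNN()` and `frNN()` functions and the iris data used in the Usage section; the values `k = 5` and `eps = 0.4` are arbitrary choices for illustration.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# k-nearest-neighbor search: the 5 closest other points for every point
nn <- kNN(x, k = 5)
dim(nn$id)     # neighbor indices, one row per point
dim(nn$dist)   # matching distances
nn$id[1, ]     # neighbors of the first flower

# Fixed-radius search: all neighbors within distance eps of every point
fr <- frNN(x, eps = 0.4)
n_neighbors <- lengths(fr$id)   # neighborhood size per point
summary(n_neighbors)
```

Points with small fixed-radius neighborhoods lie in sparse regions, which is the notion of density that DBSCAN builds on.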

Installation

Stable CRAN version: install from within R with

install.packages("dbscan")

Current development version: download the package from AppVeyor or install it from GitHub (requires devtools).

library("devtools")
install_github("mhahsler/dbscan")

Usage

Load the package and use the numeric variables in the iris dataset

library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

Run DBSCAN

db <- dbscan(x, eps = .4, minPts = 4)
db
## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.4, minPts = 4
## The clustering contains 4 cluster(s) and 25 noise points.
## 
##  0  1  2  3  4 
## 25 47 38 36  4 
## 
## Available fields: cluster, eps, minPts
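The README does not say how eps = .4 was chosen. One common heuristic, sketched here as an assumption rather than as the author's method, is to plot the sorted distance of every point to its k-th nearest neighbor with `kNNdistplot()` and look for a "knee". Setting `k = minPts - 1` is a convention, not a requirement.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Sorted distance of every point to its 3rd nearest neighbor (k = minPts - 1)
kNNdistplot(x, k = 3)

# Mark the eps value used for dbscan() to see where it falls on the curve
abline(h = 0.4, lty = 2)
```

Points whose k-nearest-neighbor distance lies above the dashed line are candidates for noise at that eps.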

Visualize results (noise is shown in black)

pairs(x, col = db$cluster + 1L)

Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)

lof <- lof(x, k = 4)
pairs(x, cex = lof)

Run OPTICS

opt <- optics(x, eps = 1, minPts = 4)
opt
## OPTICS clustering for 150 objects.
## Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
## Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi

Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)

opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
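To check how closely the extracted clustering matches a direct DBSCAN run with the same eps and minPts, the two label vectors can be cross-tabulated. This is a small sketch, assuming that `extractDBSCAN()` stores its labels in a `cluster` field and labels noise as 0, as `dbscan()` does.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

db  <- dbscan(x, eps = 0.4, minPts = 4)
opt <- extractDBSCAN(optics(x, eps = 1, minPts = 4), eps_cl = 0.4)

# Rows: direct DBSCAN labels; columns: labels extracted from OPTICS
table(dbscan = db$cluster, optics = opt$cluster)
```

Cluster numbers need not line up between the two runs, and individual border points may be assigned differently, so the table is best read by looking for one dominant cell per row.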

Extract a hierarchical clustering using the Xi method (captures clusters of varying density)

opt <- extractXi(opt, xi = .05)
opt
plot(opt)

Run HDBSCAN (captures stable clusters)

hdb <- hdbscan(x, minPts = 4)
hdb
## HDBSCAN clustering for 150 objects.
## Parameters: minPts = 4
## The clustering contains 2 cluster(s) and 0 noise points.
## 
##   1   2 
## 100  50 
## 
## Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
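The fields listed in the output can be inspected directly. The sketch below summarizes the membership probabilities and lists the points with the highest outlier scores; it assumes `outlier_scores` holds the GLOSH scores mentioned under Outlier Detection, and the cutoff of five points is arbitrary.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

hdb <- hdbscan(x, minPts = 4)

# Cluster sizes (a label of 0, if present, would mark noise)
table(hdb$cluster)

# Strength of each point's membership in its assigned cluster
summary(hdb$membership_prob)

# The five points with the highest outlier scores
top <- head(order(hdb$outlier_scores, decreasing = TRUE), 5)
data.frame(row = top,
           score = hdb$outlier_scores[top],
           cluster = hdb$cluster[top])
```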

Visualize the results as a simplified tree

plot(hdb, show_flat = TRUE)

Visualize how strongly each point belongs to its assigned cluster (more opaque points have a higher membership probability)

colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)

License

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.
