Genie (R Package)

This project has been superseded by genieclust, a faster and more feature-rich implementation of Genie that is available for both R and Python.

A Fast and Robust Hierarchical Clustering Algorithm

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying structure of the analysed data unless the clusters are well separated. To overcome these limitations, we proposed a hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that the Gini measure of inequality of the cluster sizes does not exceed a given threshold. On benchmark data, this method most often outperforms the Ward and average linkages in terms of clustering quality. At the same time, Genie retains the high speed of the single linkage approach, so it is also suitable for analysing larger data sets. The algorithm is easily parallelizable and may therefore be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix in order to obtain a desired clustering.
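The linkage rule described above can be illustrated with a small, deliberately naive Python sketch. It is not the package's implementation, which is far faster and does not recompute distances in this way. It assumes the normalised Gini index of the cluster sizes and the merge rule as we read them from the paper cited in this README: while the index stays at or below the threshold, the two closest clusters are merged as in single linkage; once it exceeds the threshold, only merges that involve a cluster of the smallest size are considered. The names `gini_index` and `genie_naive` and the default threshold of 0.3 are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def gini_index(sizes):
    """Normalised Gini index of cluster sizes (0 means all sizes are equal)."""
    k = len(sizes)
    if k < 2:
        return 0.0
    total_diff = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return total_diff / ((k - 1) * sum(sizes))


def genie_naive(points, n_clusters, gini_threshold=0.3):
    """Naive Genie-style agglomeration; returns one cluster label per point."""
    clusters = [[i] for i in range(len(points))]

    def single_link(a, b):
        # Single linkage distance: the closest pair of points across two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        sizes = [len(c) for c in clusters]
        pairs = combinations(range(len(clusters)), 2)
        if gini_index(sizes) > gini_threshold:
            # Sizes are too unequal: only allow merges that involve
            # a cluster of the smallest size.
            smallest = min(sizes)
            pairs = [(i, j) for i, j in pairs if smallest in (sizes[i], sizes[j])]
        i, j = min(pairs, key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Two groups on a line plus one distant outlier at 20.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (5.0,), (6.0,), (7.0,), (8.0,), (20.0,)]
print(genie_naive(points, 2, gini_threshold=0.3))  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(genie_naive(points, 2, gini_threshold=1.0))  # [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With the threshold set to 1.0 the size constraint never applies, so the sketch reduces to plain single linkage and isolates the outlier as its own cluster. With the threshold at 0.3, the outlier is absorbed into the nearer group and the two groups are recovered.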

A detailed description of the algorithm can be found in:

Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences, 2016, doi:10.1016/j.ins.2016.05.003.

Authors: Marek Gagolewski, Maciej Bartoszuk, and Anna Cena

CRAN entry: http://cran.r-project.org/web/packages/genie/

See also: http://genieclust.gagolewski.com/
