Can we predict accurately on skewed data? Which sampling techniques can be used? Which models or techniques suit this scenario? Find the answers in this code pattern!

Stars: ✭ 59 (+180.95%)

Mutual labels: data-mining, machine-learning-algorithms, datascience

Clustering Algorithms from Scratch

Implementing Clustering Algorithms from scratch in MATLAB and Python

Stars: ✭ 170 (+709.52%)

Mutual labels: cluster, machine-learning-algorithms, cluster-analysis

taller SparkR

SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)

Stars: ✭ 12 (-42.86%)

Mutual labels: data-mining, machine-learning-algorithms, data-analysis

Data-Scientist-In-Python

This repository contains notes and projects from the Dataquest Data Scientist track coursework.

Stars: ✭ 23 (+9.52%)

Mutual labels: machine-learning-algorithms, datascience

heidi

heidi : tidy data in Haskell

Stars: ✭ 24 (+14.29%)

Mutual labels: data-mining, data-analysis

Data-Science-Resources

A guide to getting started with Data Science and ML.

Stars: ✭ 17 (-19.05%)

Mutual labels: datascience, data-analysis

pyclustertend

A Python package to assess cluster tendency

Stars: ✭ 38 (+80.95%)

Mutual labels: clustering, cluster-analysis

AgePredictor

Age classification from text using PAN16, blogs, Fisher Callhome, and Cancer Forum

Stars: ✭ 13 (-38.1%)

Mutual labels: machine-learning-algorithms, datascience

python-notebooks

A collection of Jupyter Notebooks used in conferences or just to have some snippets.

Stars: ✭ 14 (-33.33%)

Mutual labels: data-mining, data-analysis

react-map-gl-cluster

Urbica React Cluster Component for Mapbox GL JS

Stars: ✭ 27 (+28.57%)

Mutual labels: clustering, cluster


Genie (R Package)

This project has been superseded by genieclust, which offers a faster and more feature-rich implementation of Genie, available for both R and Python.

A Fast and Robust Hierarchical Clustering Algorithm

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying structure of the analysed data unless the clusters are well separated. To overcome these limitations, we proposed a hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that the Gini measure of inequity of the cluster sizes does not exceed a given threshold. On benchmark data, this method most often outperforms Ward and average linkage in terms of clustering quality. At the same time, Genie retains the high speed of the single linkage approach, so it is also suitable for analysing larger data sets. The algorithm is easily parallelizable and can thus be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix to obtain a desired clustering.
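To make the linkage rule described above concrete, here is a minimal Python sketch. It is not the package's implementation: it is a naive, cubic-time illustration that recomputes single-linkage distances at every step, and it assumes the rule as we understand it from the cited paper, namely that clusters are merged as in single linkage while the normalised Gini index of the cluster sizes stays at or below the threshold, and that otherwise the search is restricted to pairs involving a cluster of the smallest size. The names `gini_index` and `genie_sketch`, and the threshold value of 0.3, are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def gini_index(sizes):
    """Normalised Gini index of cluster sizes.

    0 means all clusters have the same size; values near 1 mean that
    one cluster dominates. Assumed form: sum of pairwise absolute
    differences divided by (n - 1) times the sum of the sizes.
    """
    n = len(sizes)
    if n < 2:
        return 0.0
    total_diff = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return total_diff / ((n - 1) * sum(sizes))


def genie_sketch(points, n_clusters, gini_threshold=0.3):
    """Naive Genie-style agglomeration (illustrative, not optimised).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def single_link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        sizes = [len(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini_index(sizes) > gini_threshold:
            # Sizes are too unequal: only consider merges that involve
            # one of the smallest clusters.
            smallest = min(sizes)
            pairs = [(u, v) for u, v in pairs
                     if sizes[u] == smallest or sizes[v] == smallest]
        u, v = min(pairs,
                   key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        # u < v always holds, so deleting index v leaves index u valid.
        clusters[u] = clusters[u] + clusters[v]
        del clusters[v]
    return clusters


# Two well-separated groups of three points each.
example_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
example_clusters = genie_sketch(example_points, n_clusters=2)
```

Setting `gini_threshold=1.0` disables the size correction in this sketch, so it reduces to plain single linkage; lower thresholds push the algorithm towards more balanced cluster sizes.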

A detailed description of the algorithm can be found in:

Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences, 2016, doi:10.1016/j.ins.2016.05.003.

Authors: Marek Gagolewski, Maciej Bartoszuk, and Anna Cena

CRAN entry: http://cran.r-project.org/web/packages/genie/
