
shubhamjha97 / hierarchical-clustering

Licence: other
A Python implementation of divisive and agglomerative hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset, and dendrograms were plotted.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to hierarchical-clustering

genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (-45.16%)
Mutual labels:  data-mining, clustering, hierarchical-clustering
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (-66.13%)
Mutual labels:  data-mining, clustering
SparseLSH
A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.
Stars: ✭ 127 (+104.84%)
Mutual labels:  data-mining, clustering
FPGrowth-and-Apriori-algorithm-Association-Rule-Data-Mining
Implementation of the FP-Growth and Apriori algorithms for finding frequent patterns in a transactional database.
Stars: ✭ 19 (-69.35%)
Mutual labels:  data-mining, data-mining-algorithms
Data-Mining-and-Warehousing
Data Mining algorithms for IDMW632C course at IIIT Allahabad, 6th semester
Stars: ✭ 19 (-69.35%)
Mutual labels:  data-mining, data-mining-algorithms
kmeans
A simple implementation of K-means (and Bisecting K-means) clustering algorithm in Python
Stars: ✭ 18 (-70.97%)
Mutual labels:  data-mining, clustering
NIDS-Intrusion-Detection
A simple implementation of a Network Intrusion Detection System. The KDD Cup '99 dataset is used: kdd_cup_10_percent for training and the 'correct' set for testing. PCA is used for dimensionality reduction, and SVM and KNN are the supervised classification algorithms. Accuracy: 83.5% for SVM, 80% for KNN.
Stars: ✭ 45 (-27.42%)
Mutual labels:  data-mining, data-mining-algorithms
graphgrove
A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search
Stars: ✭ 29 (-53.23%)
Mutual labels:  clustering, hierarchical-clustering
R
All Algorithms implemented in R
Stars: ✭ 294 (+374.19%)
Mutual labels:  data-mining, clustering
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+888.71%)
Mutual labels:  data-mining, clustering
Pyclustering
pyclustering is a Python/C++ data mining library.
Stars: ✭ 806 (+1200%)
Mutual labels:  data-mining, clustering
Apriori-and-Eclat-Frequent-Itemset-Mining
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions, implemented in Python.
Stars: ✭ 36 (-41.94%)
Mutual labels:  data-mining, data-mining-algorithms
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+46.77%)
Mutual labels:  data-mining, clustering
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (-30.65%)
Mutual labels:  data-mining, clustering
Clustering-in-Python
Clustering methods in Machine Learning, including both theory and Python code for each algorithm. Algorithms include K-Means, K-Modes, Hierarchical, DBSCAN, and Gaussian Mixture Model (GMM). Interview questions on clustering are also included at the end.
Stars: ✭ 27 (-56.45%)
Mutual labels:  clustering, hierarchical-clustering
Matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Stars: ✭ 141 (+127.42%)
Mutual labels:  data-mining, clustering
Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Stars: ✭ 2,936 (+4635.48%)
Mutual labels:  data-mining, clustering
Data mining
The Ruby DataMining gem is a small collection of data-mining algorithms.
Stars: ✭ 10 (-83.87%)
Mutual labels:  data-mining, clustering
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+4983.87%)
Mutual labels:  data-mining, clustering
A-quantum-inspired-genetic-algorithm-for-k-means-clustering
Implementation of a Quantum inspired genetic algorithm proposed by A quantum-inspired genetic algorithm for k-means clustering paper.
Stars: ✭ 28 (-54.84%)
Mutual labels:  clustering

Agglomerative and Divisive Hierarchical Clustering

Course assignment for CS F415 - Data Mining at BITS Pilani, Hyderabad Campus.

Done under the guidance of Dr. Aruna Malapati, Assistant Professor, BITS Pilani, Hyderabad Campus.


Introduction

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:

  1. Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

  2. Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.

The main purpose of this project is to gain an in-depth understanding of how the divisive and agglomerative hierarchical clustering algorithms work.
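As a quick, self-contained illustration of the bottom-up approach, the sketch below uses SciPy to merge the closest clusters step by step on a toy dataset and plot the resulting dendrogram. This is only an illustration, not this repository's implementation; the points and labels are made up.

```python
# Illustrative sketch only: agglomerative clustering of a toy dataset with SciPy,
# not the implementation contained in this repository.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Toy 2-D points (made up for the example)
points = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [4.2, 3.9], [9.0, 9.0]])

# Bottom-up merging with complete (maximum) linkage
Z = linkage(points, method='complete')

# Each merge in the dendrogram joins the two closest clusters found so far
dendrogram(Z, labels=['p0', 'p1', 'p2', 'p3', 'p4'])
plt.title('Agglomerative clustering (complete linkage)')
plt.show()
```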

More on Hierarchical clustering

Data

We used the Human Gene DNA Sequence dataset, which can be found here. The dataset contains 311 gene sequences. The data can be found in the folder 'data'.

Instructions to run the scripts

Run one of the following commands:

Divisive clustering:
python divisive.py

Agglomerative clustering:
python agglomerative.py

Equations used

Maximum or complete-linkage clustering: max { d(a, b) : a ∈ A, b ∈ B }
Minimum or single-linkage clustering: min { d(a, b) : a ∈ A, b ∈ B }
Mean or average-linkage clustering: (1 / (|A| · |B|)) · Σ_{a ∈ A} Σ_{b ∈ B} d(a, b)
Diameter of a cluster: max { d(x, y) : x, y in the same cluster }

where A and B are the two clusters being compared and d is the distance between two points.
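The following is a minimal sketch of these measures, assuming clusters A and B are lists of points and dist is any pairwise distance function; it mirrors the formulas above rather than the repository's actual code.

```python
# Minimal sketch of the linkage measures above; `dist` is any pairwise
# distance function, and clusters A and B are lists of points.
from itertools import combinations

def complete_linkage(A, B, dist):
    # Maximum distance between any a in A and b in B
    return max(dist(a, b) for a in A for b in B)

def single_linkage(A, B, dist):
    # Minimum distance between any a in A and b in B
    return min(dist(a, b) for a in A for b in B)

def average_linkage(A, B, dist):
    # Mean of all pairwise distances between A and B
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))

def diameter(cluster, dist):
    # Largest distance between any two points in the same cluster
    return max((dist(x, y) for x, y in combinations(cluster, 2)), default=0.0)
```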

Pre-processing done

The file was read sequence by sequence and saved as a dictionary, where the key is the gene sequence's name and the value is the entire gene string.

A mapping was created from the unique gene sequences in the dataset to integers so that each sequence corresponded to a unique integer.

The entire dataset was mapped to integers to reduce storage and computational requirements.
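A rough sketch of this preprocessing is given below. The FASTA-like file format ('>name' header followed by sequence lines) and the file path are assumptions made for illustration, not details taken from the repository.

```python
# Sketch of the preprocessing described above. The file format (FASTA-like,
# '>name' header followed by sequence lines) and the file path are assumptions.
def read_sequences(path='data/human_gene_sequences.txt'):
    sequences = {}            # gene name -> full gene string
    name, parts = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                if name is not None:
                    sequences[name] = ''.join(parts)
                name, parts = line[1:], []
            elif line:
                parts.append(line)
        if name is not None:
            sequences[name] = ''.join(parts)
    return sequences

def map_to_integers(sequences):
    # Assign each unique gene string a unique integer to cut storage and
    # computation; returns the id map and the integer-encoded data.
    seq_to_id, encoded = {}, {}
    for gene_name, seq in sequences.items():
        if seq not in seq_to_id:
            seq_to_id[seq] = len(seq_to_id)
        encoded[gene_name] = seq_to_id[seq]
    return seq_to_id, encoded
```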

Machine specs

Processor: i7-7500U

RAM: 16 GB DDR4

OS: Ubuntu 16.04 LTS

Results

Clustering was performed using the agglomerative and divisive methods, and the following dendrograms were obtained:

Agglomerative

Dendrograms: Agglomerative-Centroid, Agglomerative-Max, Agglomerative-Min

Divisive

Dendrogram: Divisive

Group Members

Shubham Jha

Praneet Mehta

Abhinav Jain

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].