liuguiyangnwpu / Densitycluster
Machine learning. Clustering by fast search and find of density peaks.
Stars: ✭ 27
Projects that are alternatives of or similar to Densitycluster
Biolitmap
Code for the paper "BIOLITMAP: a web-based geolocated and temporal visualization of the evolution of bioinformatics publications" in Oxford Bioinformatics.
Stars: ✭ 18 (-33.33%)
Mutual labels: science
Repotools
Short names, big time savings – a collection of commands for the git operations you perform most often
Stars: ✭ 24 (-11.11%)
Mutual labels: science
Gitscience
A curated list of science- and engineering related repositories on GitHub and in neighboring counties
Stars: ✭ 8 (-70.37%)
Mutual labels: science
Pydhamed
Dynamic Histogram Analysis To Determine Free Energies and Rates from Biased Simulations
Stars: ✭ 17 (-37.04%)
Mutual labels: science
Kitti Track Collection
Data and devtools for the "Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video" paper.
Stars: ✭ 20 (-25.93%)
Mutual labels: clustering
Gab2019sciencelab
This project contains instructions to deploy the Global Azure Bootcamp 2019 Science Lab
Stars: ✭ 25 (-7.41%)
Mutual labels: science
Thrive
The main repository for the development of the evolution game Thrive.
Stars: ✭ 874 (+3137.04%)
Mutual labels: science
Arxiv Equations
🚀 Provides equations in latex format from arxiv paper.
Stars: ✭ 23 (-14.81%)
Mutual labels: science
Data mining
The Ruby DataMining Gem, is a little collection of several Data-Mining-Algorithms
Stars: ✭ 10 (-62.96%)
Mutual labels: clustering
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-33.33%)
Mutual labels: clustering
Glumpy
Python+Numpy+OpenGL: fast, scalable and beautiful scientific visualization
Stars: ✭ 882 (+3166.67%)
Mutual labels: science
Events
Repository for *SEM Paper on Event Coreference Resolution in ECB+
Stars: ✭ 20 (-25.93%)
Mutual labels: clustering
Density Cluster
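This project applies a density-based clustering algorithm to high-dimensional features. It extracts groups of similar, useful information from high-dimensional data, which reduces the number of features and removes some redundant information. Clustering algorithms fall into several families:
- Partitioning methods: K-Means
- Hierarchical methods: CURE
- Density-based methods: DBSCAN, DPCA (Density Peaks Clustering Algorithm)
- Grid-based methods: CLIQUE
- Model-based methods: mostly probabilistic algorithms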
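The main idea of "Clustering by fast search and find of density peaks", published by Alex Rodriguez and Alessandro Laio, is to find high-density regions that are separated by low-density regions. It rests on the following assumption: in a dataset, each cluster center is surrounded by data points of lower local density, and it lies relatively far from any point of higher local density.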
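How is local density defined?
The local density $\rho_i$ of data point $i$ is the number of data points whose distance to $i$ is smaller than the cutoff distance.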
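The counting rule above corresponds to $\rho_i = \sum_{j \ne i} \chi(d_{ij} - d_c)$ with $\chi(x) = 1$ for $x < 0$ and $0$ otherwise, where $d_c$ is the cutoff distance. The following is a minimal sketch of that count, not the repository's code. The function name `local_density` and the dense row-major layout of the distance matrix are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository).
// dist: dense n x n distance matrix in row-major order (assumed layout).
// dc:   cutoff distance.
std::vector<int> local_density(const std::vector<double>& dist,
                               std::size_t n, double dc) {
    assert(dist.size() == n * n);
    std::vector<int> rho(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Count neighbours strictly closer than the cutoff, excluding i itself.
            if (i != j && dist[i * n + j] < dc) {
                ++rho[i];
            }
        }
    }
    return rho;
}
```

The result depends strongly on the cutoff: a very small `dc` gives almost every point a density of zero, while a very large one gives every point nearly the same density.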
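How is the distance to higher-density points found?
- For data point $i$, take the minimum distance between $i$ and any data point whose local density is greater than that of $i$; this is $\sigma_i$ (written $\delta_i$ in the paper);
- For the data point with the highest density, the convention is $\sigma_i = \max_{j}(d_{ij})$;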
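The two rules above can be sketched as a single pass over the distance matrix. This is an illustration under assumptions, not the repository's implementation: the name `higher_density_distance` and the dense row-major matrix layout are hypothetical, and points tied for the maximum density all fall back to the $\max_{j}(d_{ij})$ rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository).
// dist: dense n x n distance matrix in row-major order (assumed layout).
// rho:  local density of each point.
std::vector<double> higher_density_distance(const std::vector<double>& dist,
                                            const std::vector<int>& rho,
                                            std::size_t n) {
    assert(dist.size() == n * n && rho.size() == n);
    std::vector<double> sigma(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        bool found_denser = false;
        double nearest_denser = 0.0;  // min distance to a denser point
        double farthest = 0.0;        // max distance to any other point
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const double d = dist[i * n + j];
            farthest = std::max(farthest, d);
            if (rho[j] > rho[i] && (!found_denser || d < nearest_denser)) {
                nearest_denser = d;
                found_denser = true;
            }
        }
        // A point with no denser neighbour is a density maximum: use max_j d_ij.
        sigma[i] = found_denser ? nearest_denser : farthest;
    }
    return sigma;
}
```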
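How are cluster centers and outliers determined?
- DPCA defines as cluster centers the points that have both a large distance $\sigma_i$ and a large local density $\rho_i$.
- Points that have a large distance but a low density are called outliers.
- Potential cluster centers are found using the decision graph and the product curve (the sorted values of $\rho_i \sigma_i$) described in the paper.
- On the curve, drop the zero-valued part, then take the specified top percentage of the remaining data.
- Cluster the data hierarchically to split the curve into layers and locate the likely cluster centers.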
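The "drop the zeros, keep the top percentage" rule for the product curve can be sketched as follows. This is one possible reading of that step, not the repository's code: the function name `candidate_centers` and the parameter `top_fraction` are hypothetical, and the number of points kept is rounded up.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository).
// Returns indices of candidate cluster centers, ordered by decreasing
// gamma_i = rho_i * sigma_i, after discarding points with gamma_i == 0.
std::vector<std::size_t> candidate_centers(const std::vector<int>& rho,
                                           const std::vector<double>& sigma,
                                           double top_fraction) {
    assert(rho.size() == sigma.size());
    assert(top_fraction > 0.0 && top_fraction <= 1.0);
    std::vector<double> gamma(rho.size());
    std::vector<std::size_t> order;
    for (std::size_t i = 0; i < rho.size(); ++i) {
        gamma[i] = rho[i] * sigma[i];
        if (gamma[i] > 0.0) order.push_back(i);  // remove the zero part of the curve
    }
    // Sort the surviving indices so the largest products come first.
    std::sort(order.begin(), order.end(),
              [&gamma](std::size_t a, std::size_t b) { return gamma[a] > gamma[b]; });
    // Keep the requested fraction of the non-zero points, rounded up.
    const std::size_t keep = std::min(
        order.size(),
        static_cast<std::size_t>(
            std::ceil(top_fraction * static_cast<double>(order.size()))));
    order.resize(keep);
    return order;
}
```

A fixed fraction is only a heuristic. The candidates it returns still need to be checked against the decision graph, since a point with a very large distance but a low density can also rank high on the product curve.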
Requirements
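- g++ 4.7 or later
- As much memory as possible, since at minimum the distance between every pair of vectors has to be stored
- libopm (presumably OpenMP) is used to parallelize the algorithm and improve runtime performance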
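To make the memory requirement concrete: $n$ vectors have $n(n-1)/2$ distinct pairs, so 100,000 vectors stored as 4-byte floats need about 20 GB for the distances alone. The sketch below shows one way to fill such a condensed distance array in parallel. It is not the repository's code: it assumes that "libopm" refers to OpenMP, that the metric is Euclidean, and that points are stored row-major; `pair_index` and `pairwise_distances` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Position of pair (i, j), i < j, in a condensed upper-triangular array.
inline std::size_t pair_index(std::size_t i, std::size_t j, std::size_t n) {
    assert(i < j && j < n);
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Hypothetical helper: points holds n vectors of length dim, row-major.
// Returns the n*(n-1)/2 Euclidean distances (assumed metric).
std::vector<float> pairwise_distances(const std::vector<float>& points,
                                      std::size_t n, std::size_t dim) {
    assert(points.size() == n * dim);
    std::vector<float> dist(n * (n - 1) / 2);
    // Every (i, j) pair writes to its own slot, so rows can run in parallel
    // without locking. Compiled without -fopenmp, the pragma is ignored.
    #pragma omp parallel for schedule(dynamic)
    for (long long row = 0; row < static_cast<long long>(n); ++row) {
        const std::size_t i = static_cast<std::size_t>(row);
        for (std::size_t j = i + 1; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < dim; ++k) {
                const float diff = points[i * dim + k] - points[j * dim + k];
                sum += diff * diff;
            }
            dist[pair_index(i, j, n)] = std::sqrt(sum);
        }
    }
    return dist;
}
```

With g++ this can be built with `-std=c++11 -fopenmp`. The dynamic schedule is a design choice for this sketch: early rows have more pairs than late rows, so handing rows out on demand balances the work better than equal static chunks.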
Program architecture
Program run demonstration
Distribution of the test data
- Sample data