Clustering methods in machine learning, with both theory and Python code for each algorithm. Algorithms include K-Means, K-Modes, hierarchical clustering, DBSCAN, and Gaussian Mixture Models (GMM). Interview questions on clustering are included at the end.

Stars: ✭ 27 (-78.57%)

Mutual labels: clustering, clustering-algorithm, clustering-evaluation

Hdbscan

A high performance implementation of HDBSCAN clustering.

Stars: ✭ 2,032 (+1512.7%)

Mutual labels: clustering, clustering-algorithm, clustering-evaluation

Hazelcast Python Client

Hazelcast IMDG Python Client

Stars: ✭ 92 (-26.98%)

Mutual labels: big-data, clustering, scalability

leaflet heatmap

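A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. The data is first processed in parallel with Apache Spark, Apache Spark is then used to render the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer for good interactivity. Rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than a single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .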

Stars: ✭ 13 (-89.68%)

Mutual labels: big-data, bigdata

clustering-python

Different clustering approaches applied on different problemsets

Stars: ✭ 36 (-71.43%)

Mutual labels: clustering, clustering-algorithm

clope

Elixir implementation of CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data

Stars: ✭ 18 (-85.71%)

Mutual labels: clustering, clustering-algorithm

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (-73.02%)

Mutual labels: big-data, bigdata

Uproot4

ROOT I/O in pure Python and NumPy.

Stars: ✭ 80 (-36.51%)

Mutual labels: big-data, bigdata

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+8623.02%)

Mutual labels: big-data, bigdata

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (-15.08%)

Mutual labels: big-data, bigdata

Clustering

Implements "Clustering a Million Faces by Identity"

Stars: ✭ 128 (+1.59%)

Mutual labels: clustering, clustering-algorithm

clueminer

interactive clustering platform

Stars: ✭ 13 (-89.68%)

Mutual labels: clustering, clustering-algorithm

awesome-coder-resources

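A refueling station on the programming road! [Continuously updated; stars welcome, and do come back often.] [Contents: programming, learning, and reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more.]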

Stars: ✭ 54 (-57.14%)

Mutual labels: big-data, bigdata

meetups-archivos

Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (-52.38%)

Mutual labels: big-data, bigdata

v6.dooring.public

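A large-screen data visualization solution that provides a visual editing engine, helping individuals and businesses easily build their own large-screen visualization applications.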

Stars: ✭ 323 (+156.35%)

Mutual labels: big-data, bigdata

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-84.92%)

Mutual labels: big-data, bigdata

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (-54.76%)

Mutual labels: big-data, bigdata

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-73.02%)

Mutual labels: big-data, bigdata

Circosjs

d3 library to build circular graphs

Stars: ✭ 436 (+246.03%)

Mutual labels: big-data, bigdata

Hazelcast

Open-source distributed computation and storage platform

Stars: ✭ 4,662 (+3600%)

Mutual labels: big-data, scalability

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-96.03%)

Mutual labels: big-data, bigdata

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+491.27%)

Mutual labels: big-data, bigdata

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and Phonegap

Stars: ✭ 69 (-45.24%)

Mutual labels: big-data, bigdata

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (+147.62%)

Mutual labels: big-data, bigdata

Awesome Scalability

The Patterns of Scalable, Reliable, and Performant Large-Scale Systems

Stars: ✭ 36,688 (+29017.46%)

Mutual labels: big-data, scalability

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-43.65%)

Mutual labels: big-data, bigdata

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (-65.87%)

Mutual labels: big-data, bigdata

clusters

Cluster analysis library for Golang

Stars: ✭ 68 (-46.03%)

Mutual labels: clustering, clustering-algorithm

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (+238.1%)

Mutual labels: big-data, bigdata

Coherence

Oracle Coherence Community Edition

Stars: ✭ 328 (+160.32%)

Mutual labels: clustering, scalability

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-13.49%)

Mutual labels: big-data, bigdata

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+961.9%)

Mutual labels: big-data, bigdata

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (+1125.4%)

Mutual labels: big-data, bigdata

scarf

Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (-57.14%)

Mutual labels: big-data, clustering

nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

Stars: ✭ 8,196 (+6404.76%)

Mutual labels: big-data, scalability

genieclust

Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R

Stars: ✭ 34 (-73.02%)

Mutual labels: clustering, clustering-algorithm

pytorch kmeans

Implementation of the k-means algorithm in PyTorch that works for large datasets

Stars: ✭ 38 (-69.84%)

Mutual labels: big-data, clustering

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+70.63%)

Mutual labels: big-data, bigdata

Big Data Study

🐳 big data study

Stars: ✭ 141 (+11.9%)

Mutual labels: big-data, bigdata

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+94.44%)

Mutual labels: big-data, bigdata

TrajectoryTracking

Trajectory Tracking Project

Stars: ✭ 16 (-87.3%)

Mutual labels: clustering

bullet-core

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Stars: ✭ 36 (-71.43%)

Mutual labels: big-data

NNet

Algorithms for study: multi-layer perceptron, cluster graph, CNN, RNN, restricted Boltzmann machine, Bayesian network

Stars: ✭ 24 (-80.95%)

Mutual labels: clustering

microservice workshop

The Microservices Architecture Workshop helps developers and architects understand the key architecture paradigms through hands-on sections. The course helps developers move from a monolithic-app mindset to microservices-based app development. It also gives developers hands-on development experience with key Microservices infra…

Stars: ✭ 69 (-45.24%)

Mutual labels: scalability

clusterix

Visual exploration of clustered data.

Stars: ✭ 44 (-65.08%)

Mutual labels: clustering

hayabusa

Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data

Stars: ✭ 43 (-65.87%)

Mutual labels: bigdata

Spark-MLlib-Tutorial

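A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files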