MoosefsMooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+713.49%)
Clustering-in-PythonClustering methods in Machine Learning includes both theory and python code of each algorithm. Algorithms include K Mean, K Mode, Hierarchical, DB Scan and Gaussian Mixture Model GMM. Interview questions on clustering are also added in the end.
Stars: ✭ 27 (-78.57%)
HdbscanA high performance implementation of HDBSCAN clustering.
Stars: ✭ 2,032 (+1512.7%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-89.68%)
clustering-pythonDifferent clustering approaches applied on different problemsets
Stars: ✭ 36 (-71.43%)
clopeElixir implementation of CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data
Stars: ✭ 18 (-85.71%)
Uproot4ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-36.51%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-15.08%)
ClusteringImplements "Clustering a Million Faces by Identity"
Stars: ✭ 128 (+1.59%)
clueminerinteractive clustering platform
Stars: ✭ 13 (-89.68%)
awesome-coder-resources编程路上加油站!------【持续更新中...欢迎star,欢迎常回来看看......】【内容:编程/学习/阅读资源,开源项目,面试题,网站,书,博客,教程等等】
Stars: ✭ 54 (-57.14%)
meetups-archivosPpts, códigos y videos de las meetups, data science days, videollamadas y workshops. Data Science Research es una organización sin fines de lucro que busca difundir, descentralizar y difundir los conocimientos en Ciencia de Datos e Inteligencia Artificial en el Perú, dando oportunidades a nuevos talentos mediante MeetUps, Workshops y Semilleros …
Stars: ✭ 60 (-52.38%)
v6.dooring.public可视化大屏解决方案, 提供一套可视化编辑引擎, 助力个人或企业轻松定制自己的可视化大屏应用.
Stars: ✭ 323 (+156.35%)
gan deeplearning4jAutomatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-84.92%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-73.02%)
Circosjsd3 library to build circular graphs
Stars: ✭ 436 (+246.03%)
HazelcastOpen-source distributed computation and storage platform
Stars: ✭ 4,662 (+3600%)
Hadoop For GeoeventArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-96.03%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+491.27%)
Countly Sdk CordovaCountly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-45.24%)
Uproot3ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+147.62%)
Awesome ScalabilityThe Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+29017.46%)
clustersCluster analysis library for Golang
Stars: ✭ 68 (-46.03%)
CortxCORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+238.1%)
CoherenceOracle Coherence Community Edition
Stars: ✭ 328 (+160.32%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-13.49%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+961.9%)
GenieDistributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1125.4%)
scarfToolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.
Stars: ✭ 54 (-57.14%)
nebulaA distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+6404.76%)
genieclustGenie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (-73.02%)
pytorch kmeansImplementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-69.84%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+70.63%)
Aws Etl OrchestratorA serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+94.44%)
bullet-coreBullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (-71.43%)
NNetalgorithm for study: multi-layer-perceptron, cluster-graph, cnn, rnn, restricted boltzmann machine, bayesian network
Stars: ✭ 24 (-80.95%)
microservice workshopMicroservices Architecture Workshop focuses on helping the developers / architects to understand the key Architecture paradigms with hands on section. The course helps the developers from Monolithic App mindset to a Microservices based App development. It also helps the developers with hands on development experience with key Microservices infra…
Stars: ✭ 69 (-45.24%)
clusterixVisual exploration of clustered data.
Stars: ✭ 44 (-65.08%)
hayabusaHayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data
Stars: ✭ 43 (-65.87%)
tsp-essayA fun study of some heuristics for the Travelling Salesman Problem.
Stars: ✭ 15 (-88.1%)
swanagerA high-level Docker Services management tool built on top of Swarm
Stars: ✭ 12 (-90.48%)
bagriXML/Document DB on top of distributed cache
Stars: ✭ 40 (-68.25%)
scikit-learn-intelexIntel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Stars: ✭ 887 (+603.97%)
Revisiting-Contrastive-SSLRevisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-35.71%)
consul roleAnsible role to install Consul (cluster of) server/agent
Stars: ✭ 14 (-88.89%)