308 open source projects that are alternatives to or similar to Tdigest

pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-73.72%)
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-87.59%)
Mutual labels:  pyspark, mapreduce
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-90.88%)
Mutual labels:  distributed-computing, pyspark
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+246.35%)
Mutual labels:  mapreduce, distributed-computing
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-87.59%)
Mutual labels:  pyspark, mapreduce
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-45.26%)
Mutual labels:  pyspark, distributed-computing
ParallelUtilities.jl
Fast and easy parallel mapreduce on HPC clusters
Stars: ✭ 28 (-89.78%)
Mutual labels:  distributed-computing, mapreduce
future.batchtools
🚀 R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
Stars: ✭ 77 (-71.9%)
Mutual labels:  distributed-computing
dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Stars: ✭ 112 (-59.12%)
Mutual labels:  mapreduce
dicodile
Experiments for "Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals"
Stars: ✭ 15 (-94.53%)
Mutual labels:  distributed-computing
kar
KAR: A Runtime for the Hybrid Cloud
Stars: ✭ 18 (-93.43%)
Mutual labels:  distributed-computing
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-81.75%)
Mutual labels:  pyspark
server
Hashtopolis - A Hashcat wrapper for distributed hashcracking
Stars: ✭ 954 (+248.18%)
Mutual labels:  distributed-computing
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-90.51%)
Mutual labels:  pyspark
Awesome-Federated-Machine-Learning
Everything about federated learning, including research papers, books, codes, tutorials, videos and beyond
Stars: ✭ 190 (-30.66%)
Mutual labels:  distributed-computing
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-58.03%)
Mutual labels:  pyspark
frovedis
Framework of vectorized and distributed data analytics
Stars: ✭ 59 (-78.47%)
Mutual labels:  distributed-computing
SciFlow
Scientific workflow management
Stars: ✭ 49 (-82.12%)
Mutual labels:  distributed-computing
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-93.8%)
Mutual labels:  pyspark
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-90.88%)
Mutual labels:  pyspark
bloomfilter
Bloomfilter written in Golang, includes rotation and RPC
Stars: ✭ 61 (-77.74%)
Mutual labels:  distributed-computing
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-82.85%)
Mutual labels:  pyspark
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+72.99%)
Mutual labels:  pyspark
easyFL
An experimental platform to quickly realize and compare with popular centralized federated learning algorithms. A realization of federated learning algorithm on fairness (FedFV, Federated Learning with Fair Averaging, https://fanxlxmu.github.io/publication/ijcai2021/) was accepted by IJCAI-21 (https://www.ijcai.org/proceedings/2021/223).
Stars: ✭ 104 (-62.04%)
Mutual labels:  distributed-computing
Distributed-Data-Structures
[GSoC] Distributed Data Structures - Collections Framework for Chapel language
Stars: ✭ 13 (-95.26%)
Mutual labels:  distributed-computing
infantry
Run MapReduce in user's browser.
Stars: ✭ 14 (-94.89%)
Mutual labels:  mapreduce
DevOps
DevOps code to deploy eScience services
Stars: ✭ 19 (-93.07%)
Mutual labels:  distributed-computing
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-59.49%)
Mutual labels:  pyspark
ai-deployment
Focused on bringing AI models into production and on model deployment
Stars: ✭ 149 (-45.62%)
Mutual labels:  pyspark
mobius
Mobius is an AI infra platform including realtime computing and training.
Stars: ✭ 22 (-91.97%)
Mutual labels:  distributed-computing
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Stars: ✭ 71 (-74.09%)
Mutual labels:  pyspark
st-hadoop
ST-Hadoop is an open-source MapReduce extension of Hadoop designed specially to analyze your spatio-temporal data efficiently
Stars: ✭ 17 (-93.8%)
Mutual labels:  mapreduce
mapreduce-examples
A collection of mapreduce problems and solutions
Stars: ✭ 23 (-91.61%)
Mutual labels:  mapreduce
interbit
To the end of servers
Stars: ✭ 23 (-91.61%)
Mutual labels:  distributed-computing
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+797.45%)
Mutual labels:  pyspark
CipherCompute
The free EAP version of the Cosmian Collaborative Confidential Computing platform. Try it!
Stars: ✭ 20 (-92.7%)
Mutual labels:  distributed-computing
Spark-and-Kafka IoT-Data-Processing-and-Analytics
Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time
Stars: ✭ 42 (-84.67%)
Mutual labels:  pyspark
Springboard-Data-Science-Immersive
No description or website provided.
Stars: ✭ 52 (-81.02%)
Mutual labels:  pyspark
swarm-learning
A simplified library for decentralized, privacy preserving machine learning
Stars: ✭ 142 (-48.18%)
Mutual labels:  distributed-computing
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-88.32%)
Mutual labels:  pyspark
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-94.16%)
Mutual labels:  pyspark
Charm4py
Parallel Programming with Python and Charm++
Stars: ✭ 259 (-5.47%)
Mutual labels:  distributed-computing
GooglePlay-Web-Crawler
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-93.43%)
Mutual labels:  mapreduce
optimism-v2
ARCHIVE of monorepo implementing Boba, an L2 Compute solution built on Optimistic Ethereum - active repo is at https://github.com/bobanetwork/boba
Stars: ✭ 34 (-87.59%)
Mutual labels:  distributed-computing
mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Stars: ✭ 93 (-66.06%)
Mutual labels:  mapreduce
SadlyDistributed
Distributing your code(soul), in almost any language(state), among a cluster of idle browsers(voids)
Stars: ✭ 20 (-92.7%)
Mutual labels:  distributed-computing
QCFractal
A distributed compute and database platform for quantum chemistry.
Stars: ✭ 107 (-60.95%)
Mutual labels:  distributed-computing
PFL-Non-IID
The origin of the Non-IID phenomenon is the personalization of users, who generate the Non-IID data. With Non-IID (Not Independent and Identically Distributed) issues existing in the federated learning setting, a myriad of approaches has been proposed to crack this hard nut. In contrast, the personalized federated learning may take the advantage…
Stars: ✭ 58 (-78.83%)
Mutual labels:  distributed-computing
distex
Distributed process pool for Python
Stars: ✭ 101 (-63.14%)
Mutual labels:  distributed-computing
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-90.88%)
Mutual labels:  pyspark
realtimemap-dotnet
A showcase for Proto.Actor - an ultra-fast distributed actors solution for Go, C#, and Java/Kotlin.
Stars: ✭ 47 (-82.85%)
Mutual labels:  distributed-computing
IoTPy
Python for streams
Stars: ✭ 24 (-91.24%)
Mutual labels:  distributed-computing
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-93.8%)
Mutual labels:  pyspark
Prime95
Prime95 source code from GIMPS to find Mersenne Prime.
Stars: ✭ 25 (-90.88%)
Mutual labels:  distributed-computing
pyspark-asyncactions
Asynchronous actions for PySpark
Stars: ✭ 30 (-89.05%)
Mutual labels:  pyspark
connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (-75.18%)
Mutual labels:  mapreduce
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-90.88%)
Mutual labels:  pyspark
protoactor-go
Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin
Stars: ✭ 4,138 (+1410.22%)
Mutual labels:  distributed-computing
blockchain-reading-list
A reading list on blockchain and related technologies, targeted at technical people who want a deep understanding of those topics.
Stars: ✭ 93 (-66.06%)
Mutual labels:  distributed-computing
Gleam
Fast, efficient, and scalable distributed map/reduce system, DAG execution, in memory or on disk, written in pure Go, runs standalone or distributedly.
Stars: ✭ 2,949 (+976.28%)
Mutual labels:  distributed-computing