Cook
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-91.62%)
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-74.55%)
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-92.05%)
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-90.8%)
Datafusion
DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-83.7%)
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-99.47%)
Awesome Ada
A curated list of awesome resources related to the Ada and SPARK programming languages
Stars: ✭ 299 (-92.02%)
Kontraktor
Distributed actors for Java 8 / JavaScript
Stars: ✭ 333 (-91.12%)
Reshifter
Kubernetes cluster state management
Stars: ✭ 292 (-92.21%)
Spark Druid Olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (-92.48%)
Nodejsstarterkit
Starter Kit for Node.js v14.x, minimum dependencies 🚀
Stars: ✭ 348 (-90.72%)
Hbase Rdd
Spark RDD to read, write and delete from HBase
Stars: ✭ 277 (-92.61%)
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-91.92%)
Ytk Learn
Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-91.01%)
Postgresql cluster
PostgreSQL High-Availability Cluster (based on "Patroni" and "DCS(etcd)"). Automating deployment with Ansible.
Stars: ✭ 294 (-92.16%)
Diplomat
An HTTP Ruby API for Consul
Stars: ✭ 358 (-90.45%)
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (-13.18%)
Icewater
16,432 Free Yara rules created by
Stars: ✭ 324 (-91.36%)
Cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-92.58%)
Sidekick
High Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (-90.23%)
Xcat Core
Code repo for xCAT core packages
Stars: ✭ 273 (-92.72%)
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-92.74%)
K8s Tew
Kubernetes - The Easier Way
Stars: ✭ 269 (-92.82%)
Helk
The Hunting ELK
Stars: ✭ 3,097 (-17.37%)
K8s Multicluster Ingress
kubemci: Command line tool to configure L7 load balancers using multiple Kubernetes clusters
Stars: ✭ 345 (-90.8%)
Coolplayspark
Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (-11.47%)
Docker Spark Cluster
A simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (-93.04%)
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-91.84%)
Iql
An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (-90.9%)
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-90.34%)
Spark Hbase Connector
Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-92.02%)
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (-17.8%)
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (-90.18%)
Jaas
Run jobs (tasks/one-shot containers) with Docker
Stars: ✭ 291 (-92.24%)
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-91.14%)
Kube No Trouble
Easily check your cluster for use of deprecated APIs
Stars: ✭ 280 (-92.53%)
Sparkstreaming
Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Stars: ✭ 349 (-90.69%)
Broccoli
Broccoli - distributed task queues for ESP32 cluster
Stars: ✭ 280 (-92.53%)
Koa Redis
Redis storage for Koa session middleware/cache with Sentinel and Cluster support
Stars: ✭ 324 (-91.36%)
Supervizer
NodeJS Application Manager
Stars: ✭ 278 (-92.58%)
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-90.07%)
Arvados
An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (-92.69%)
Sparklint
A tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (-91.57%)
Oap
Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-90.85%)
Pygogo
A Python logging library with superpowers
Stars: ✭ 265 (-92.93%)
Fabrikate
Making GitOps with Kubernetes easier one component at a time
Stars: ✭ 263 (-92.98%)
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-90.31%)
Learningsparkv2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (-91.81%)
Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-93.06%)
Spark Jupyter Aws
A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-93.09%)
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (-93.14%)
Big Data Rosetta Code
Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.
Stars: ✭ 254 (-93.22%)
Crayon
Simple framework-agnostic UI router for SPAs
Stars: ✭ 310 (-91.73%)
Meza
A Python toolkit for processing tabular data
Stars: ✭ 374 (-90.02%)