GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-48.69%)
AngelA Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+1433.97%)
Pyspark ExamplesCode examples on Apache Spark using python
Stars: ✭ 58 (-86.22%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (-80.29%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+308.79%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-71.5%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+21.85%)
Utils4sA collection of test cases and related reference material gathered while using Scala and Spark
Stars: ✭ 1,070 (+154.16%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-66.75%)
CdapAn open source framework for building data analytic applications.
Stars: ✭ 509 (+20.9%)
CoolplaysparkCoolplay Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+688.12%)
MobiusC# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+120.67%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+340.86%)
Example SparkSpark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (-51.31%)
Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-41.33%)
SparkmeasureThis is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (-12.59%)
CrayonSimple framework agnostic UI router for SPAs
Stars: ✭ 310 (-26.37%)
SplineData Lineage Tracking And Visualization Solution
Stars: ✭ 306 (-27.32%)
KyuubiKyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (-13.78%)
Awesome AdaA curated list of awesome resources related to the Ada and SPARK programming language
Stars: ✭ 299 (-28.98%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (-27.08%)
TutorialA summary of the Java full-stack knowledge architecture
Stars: ✭ 407 (-3.33%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+827.08%)
SidekickHigh Performance HTTP Sidecar Load Balancer
Stars: ✭ 366 (-13.06%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-28.03%)
Enterprise gatewayA lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Stars: ✭ 412 (-2.14%)
ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-29.22%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-14.25%)
Spark Hbase ConnectorConnect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-28.98%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+631.83%)
IcebergIceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-6.65%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-14.01%)
Spark Druid OlapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (-33.02%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-33.97%)
SylphStream computing platform for bigdata
Stars: ✭ 362 (-14.01%)
Hbase RddSpark RDD to read, write and delete from HBase
Stars: ✭ 277 (-34.2%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-35.39%)
SparkleHaskell on Apache Spark.
Stars: ✭ 419 (-0.48%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-1.66%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+4685.51%)
SparkstreamingReal-time log analysis and statistics with Spark Streaming+Flume+Kafka+HBase+Hadoop+Zookeeper; data visualization with SpringBoot+Echarts
Stars: ✭ 349 (-17.1%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+635.63%)
Docker Spark ClusterA simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (-38%)
OapOptimized Analytics Package for Spark* Platform
Stars: ✭ 343 (-18.53%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-38.24%)
Docker practiceLearn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+4595.49%)
SparklensQubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (-18.05%)
Spark Jupyter AwsA guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-38.48%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-38.95%)
ScalnetA Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (-18.76%)