
microsoft / Data Accelerator

License: MIT
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Projects that are alternatives of or similar to Data Accelerator

Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-43.32%)
Mutual labels:  azure, kafka, spark, apache-spark, spark-streaming, streaming
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+107.69%)
Mutual labels:  kafka, spark, spark-streaming, streaming-data, streaming
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+596.76%)
Mutual labels:  azure, spark, apache-spark, spark-streaming, streaming
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1073.68%)
Mutual labels:  azure, spark, big-data, apache-spark
Seldon Server
Machine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+480.97%)
Mutual labels:  azure, kafka, spark, kafka-streams
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (+60.73%)
Mutual labels:  kafka, apache-spark, kafka-streams, streaming-data
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources, and sinks are available.
Stars: ✭ 97 (-60.73%)
Mutual labels:  kafka, spark, big-data, kafka-streams
Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-85.02%)
Mutual labels:  kafka, spark, apache-spark, spark-streaming
StreamLine - Streaming Analytics
Stars: ✭ 151 (-38.87%)
Mutual labels:  kafka, kafka-streams, spark-streaming, streaming
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+276.11%)
Mutual labels:  spark, apache-spark, spark-streaming, streaming
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-12.55%)
Mutual labels:  kafka, spark, big-data, spark-streaming
Bigdata Playground
A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-28.34%)
Mutual labels:  kafka, big-data, apache-spark, spark-streaming
Bigdata Notebook
Stars: ✭ 100 (-59.51%)
Mutual labels:  kafka, spark, streaming
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+4349.8%)
Mutual labels:  kafka, spark, big-data
Example Spark Kafka
Apache Spark and Apache Kafka integration example
Stars: ✭ 120 (-51.42%)
Mutual labels:  kafka, spark, spark-streaming
Flink Learning
Flink learning blog, covering Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes study examples for Flink Connectors, Metrics, Libraries, the DataStream API, and Table API & SQL, plus large production case studies (PV/UV, log storage, real-time deduplication at the scale of tens of billions of records, monitoring and alerting). Also shares the author's column on real-time big data computing with Flink.
Stars: ✭ 11,378 (+4506.48%)
Mutual labels:  kafka, spark, streaming
Kafka Ui
Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (-6.88%)
Mutual labels:  kafka, big-data, kafka-streams
kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-61.13%)
Mutual labels:  kafka, big-data, streaming
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-44.53%)
Mutual labels:  spark, big-data, apache-spark
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-12.96%)
Mutual labels:  spark, big-data, apache-spark

Data Accelerator for Apache Spark


Data Accelerator for Apache Spark democratizes streaming big data on Spark by offering key features such as a no-code experience for setting up a data pipeline and a fast dev-test loop for creating complex logic. Our team has used the project for two years within Microsoft, processing streamed data across many internal deployments at Microsoft scale. It offers an easy-to-use platform for learning and evaluating streaming needs and requirements. We are thrilled to share this project with the wider community as open source!

Azure Friday: We are now featured on Azure Friday! See the video here.

Data Accelerator offers three levels of experience:

  • The first requires no code at all, using rules to create alerts on data content.
  • The second lets you quickly write a Spark SQL query, with additions like LiveQuery, time windowing, in-memory accumulators, and more.
  • The third enables integrating custom code written in Scala or via Azure Functions.
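As a rough illustration of the second level, a time-windowed aggregation in plain Spark SQL looks like the following. The table and column names (`DataXProcessedInput`, `deviceId`, `eventTime`) are hypothetical, and Data Accelerator's own query experience layers constructs such as LiveQuery and accumulators on top of queries like this:

```sql
-- Count events per device over 1-minute tumbling windows.
-- Table and column names are illustrative, not part of Data Accelerator.
SELECT
  deviceId,
  window(eventTime, '1 minute') AS timeWindow,
  COUNT(*) AS eventCount
FROM DataXProcessedInput
GROUP BY deviceId, window(eventTime, '1 minute')
```

In Data Accelerator you iterate on a query like this interactively with LiveQuery before committing it to the running pipeline.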

You can get started locally on Windows, macOS, and Linux by following these instructions.
To deploy to Azure, you can use the ARM template; see the deploy-to-Azure instructions.

The data-accelerator repository contains everything needed to set up an end-to-end data pipeline. There are many ways you can participate in the project.

Getting Started

To unleash the full power of Data Accelerator, deploy to Azure and check out the cloud mode tutorials.

We have also enabled a "hello world" experience that you can try out locally by running a Docker container. When running locally there are no dependencies on Azure; however, the functionality is very limited and is only there to give you a cursory overview of Data Accelerator. To run Data Accelerator locally, deploy locally and then check out the local mode tutorials.
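A local run follows the usual Docker pull-and-run pattern. The image name, tag, and port below are assumptions for illustration only; use the exact values from the local deployment instructions:

```shell
# Pull and run the local-mode container.
# Image name and port are illustrative -- check the deploy-locally
# instructions for the actual values.
docker pull mcr.microsoft.com/datax/dataaccelerator:latest
docker run -it -p 2020:2020 --name dataaccelerator \
  mcr.microsoft.com/datax/dataaccelerator:latest
```

Once the container is up, the local tutorials walk through the browser-based experience against it.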

Data Accelerator for Spark runs on the following:

  • Azure HDInsight with Spark 2.4 (2.3 also supported)
  • Azure Databricks with Spark 2.4
  • Service Fabric (v6.4.637.9590) with
    • .NET Core 2.2
    • ASP.NET
  • App Service with Node 10.6

See the wiki pages for further information on how to build, diagnose and maintain your data pipelines built using Data Accelerator for Spark.


If you are interested in fixing issues and contributing to the code base, we would love to partner with you. Try things out, join in the design conversations and make pull requests.


Please also see our Code of Conduct.

Security issues

Security issues and bugs should be reported privately, via email, to the Microsoft Security Response Center (MSRC) [email protected]. You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Further information, including the MSRC PGP key, can be found in the Security TechCenter.


This repository is licensed under the MIT License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].