Stratio / Sparta

Licence: apache-2.0
Real Time Analytics and Data Pipelines based on Spark Streaming

Programming language: Scala

Projects that are alternatives of or similar to Sparta

Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-51.85%)
Mutual labels:  kafka, spark, spark-streaming, streaming-data, streaming
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-72.71%)
Mutual labels:  kafka, spark, spark-streaming, streaming, real-time
Streamline
StreamLine - Streaming Analytics
Stars: ✭ 151 (-70.57%)
Mutual labels:  kafka, spark-streaming, streaming, real-time
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms and more.
Stars: ✭ 92 (-82.07%)
Mutual labels:  kafka, spark, olap, hdfs
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+235.48%)
Mutual labels:  spark, analytics, spark-streaming, streaming
Bigdata Notes
Beginner's guide to big data ⭐
Stars: ✭ 10,991 (+2042.5%)
Mutual labels:  kafka, spark, hdfs
Bigdata Notebook
Stars: ✭ 100 (-80.51%)
Mutual labels:  kafka, spark, streaming
Example Spark Kafka
Apache Spark and Apache Kafka integration example
Stars: ✭ 120 (-76.61%)
Mutual labels:  kafka, spark, spark-streaming
Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-92.79%)
Mutual labels:  kafka, spark, spark-streaming
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-19.49%)
Mutual labels:  kafka, spark, analytics
Kafka Connect Hdfs
Kafka Connect HDFS connector
Stars: ✭ 400 (-22.03%)
Mutual labels:  kafka, hdfs, streaming
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-81.09%)
Mutual labels:  kafka, spark, analytics
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
Stars: ✭ 180 (-64.91%)
Mutual labels:  kafka, spark, spark-streaming
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-57.89%)
Mutual labels:  kafka, spark, spark-streaming
Flink Learning
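Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus write-ups of large production Flink projects (PV/UV, log storage, real-time deduplication over tens of billions of records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".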
Stars: ✭ 11,378 (+2117.93%)
Mutual labels:  kafka, spark, streaming
Hydra
A real-time data replication platform that "unbundles" the receiving, transforming, and transport of data streams.
Stars: ✭ 68 (-86.74%)
Mutual labels:  kafka, streaming, real-time
God Of Bigdata
Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+1071.15%)
Mutual labels:  kafka, spark, hdfs
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-70.76%)
Mutual labels:  spark, analytics, hdfs
Bigdata Interview
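🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.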
Stars: ✭ 857 (+67.06%)
Mutual labels:  kafka, spark, hdfs
Kafka Streams In Action
Source code for the Kafka Streams in Action Book
Stars: ✭ 167 (-67.45%)
Mutual labels:  kafka, streaming-data, streaming

Discontinued

After around two years of development, we have decided to discontinue this project because of a major refactor of its structure. We will launch Sparta 2.0 in the near future.

We would like to thank the whole open source community for its contributions. You can of course continue using this repository as a basis for your own developments: it contains the latest stable version as of today, and minor issues will still be attended to.

If you are interested in the new Sparta 2.0 with pipelines and workflows, please contact us by email at [email protected]

About Stratio Sparta

At Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra, ElasticSearch and MongoDB. These technologies were always a perfect fit, but we soon found ourselves writing the same pieces of integration code over and over again. Stratio Sparta is the easiest way to make use of Apache Spark Streaming and its whole ecosystem. Choose your input, operations and outputs, and start extracting insights from your data in real time.

Strata Twitter Analytics with Kibana

Main Features

  • Pure Spark
  • No coding needed, only declarative analytical workflows
  • Data continuously streamed in & processed in near real-time
  • Ready to use out-of-the-box
  • Plug & play: flexible workflows (inputs, outputs, transformations, etc…)
  • High performance and Fault Tolerance
  • Scalable and highly available
  • Real-time OLAP that reduces big data to small data
  • ETLs
  • Triggers over streaming data
  • Spark SQL language with streaming and batch data
  • Kerberos and CAS compatible

Architecture

Send a workflow as JSON to the Sparta API and run your own real-time plugins on a Spark cluster.
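As a rough illustration of what a declarative workflow sent as JSON could look like, the sketch below assembles a workflow from an input, a transformation and an output, and serializes it. The field names (`name`, `input`, `transformations`, `outputs`, `type`, `configuration`) and the helpers `build_workflow` and `step` are assumptions made for this example; they are not Sparta's actual schema, which should be taken from the project's own documentation.

```python
import json


def step(step_type, **configuration):
    """Describe one plugin by its type and configuration (hypothetical shape)."""
    return {"type": step_type, "configuration": configuration}


def build_workflow(name, input_step, transformations, outputs):
    """Assemble a declarative workflow description (hypothetical schema)."""
    if not outputs:
        raise ValueError("a workflow needs at least one output")
    return {
        "name": name,
        "input": input_step,
        "transformations": list(transformations),
        "outputs": list(outputs),
    }


workflow = build_workflow(
    name="tweets-to-mongo",
    input_step=step("Kafka", topics="tweets"),
    transformations=[step("Json", inputField="value")],
    outputs=[step("MongoDB", database="sparta")],
)

# The serialized document is what a client would send to the API.
payload = json.dumps(workflow, indent=2)
print(payload)
```

The point of the sketch is that the whole pipeline is data rather than code: changing the input or adding an output means editing the document, not recompiling a job.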

Sparta as a Job Manager

Submit more than one streaming job to the Spark cluster and manage them with a simple UI.

Run workflows on Mesos, YARN or Spark Standalone.

Sparta as a SDK

Modular components, extensible with a simple SDK:

  • You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and transformations.
  • You can add new functions to the Kite SDK to extend the data cleaning, enrichment and normalization capabilities.
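To give a feel for what an extension point of this kind involves, here is a minimal sketch of a pluggable output. It is written in plain Python for brevity, while the real SDK targets Scala; the names `Output`, `InMemoryOutput`, `OUTPUT_REGISTRY` and `register_output` are hypothetical and do not come from the Sparta SDK.

```python
from abc import ABC, abstractmethod


class Output(ABC):
    """Hypothetical extension point: receives each processed batch."""

    @abstractmethod
    def save(self, rows):
        """Persist one batch of rows (a list of dicts)."""


class InMemoryOutput(Output):
    """Toy output that keeps rows in a list, useful for tests."""

    def __init__(self):
        self.rows = []

    def save(self, rows):
        self.rows.extend(rows)


# A registry maps the type name used in a workflow to its implementation.
OUTPUT_REGISTRY = {}


def register_output(type_name, factory):
    """Make a new output type available under the given name."""
    OUTPUT_REGISTRY[type_name] = factory


register_output("InMemory", InMemoryOutput)

sink = OUTPUT_REGISTRY["InMemory"]()
sink.save([{"user": "ana", "count": 3}])
print(sink.rows)
```

A new output only has to implement the one abstract method and register itself under a name; the rest of the platform can then refer to it by that name in a workflow.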

Components

Each workflow can define multiple components, and all of them currently share the same architecture.

Core components

Several plugins have been implemented by the Stratio Sparta team.

Trigger component

With Sparta it is possible to execute queries over the streaming data and to run ETL, aggregations and simple event processing that mix streaming data with batch data in the trigger process.
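The idea of a trigger query that mixes streaming and batch data can be sketched without Spark. The example below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL: a static `products` table plays the role of the batch data, and each micro-batch replaces the contents of a `stream_batch` table before the query runs. The table names, the query and the `run_trigger` helper are assumptions for illustration only, not Sparta's trigger implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Static "batch" data, loaded once.
conn.execute("CREATE TABLE products (product_id TEXT PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("p1", "books"), ("p2", "games")],
)

# Table that holds only the current micro-batch of streaming events.
conn.execute("CREATE TABLE stream_batch (product_id TEXT, amount REAL)")

TRIGGER_SQL = """
    SELECT p.category, SUM(s.amount) AS total
    FROM stream_batch AS s
    JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.category
    ORDER BY p.category
"""


def run_trigger(micro_batch):
    """Replace the streaming table with the new micro-batch and run the query."""
    conn.execute("DELETE FROM stream_batch")
    conn.executemany("INSERT INTO stream_batch VALUES (?, ?)", micro_batch)
    return conn.execute(TRIGGER_SQL).fetchall()


result = run_trigger([("p1", 10.0), ("p2", 5.0), ("p1", 2.5)])
print(result)
```

Each call to `run_trigger` sees only the latest micro-batch joined against the static table, which is the essence of enriching streaming events with batch data.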

Aggregation component

The aggregation process in Sparta is very powerful because it makes it possible to generate efficient OLAP processes over streaming data.

Advanced features have been implemented to optimize stateful operations over Spark Streaming.
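A stateful OLAP-style aggregation over a stream can be pictured as a running table keyed by a time bucket plus the chosen dimensions, updated by every micro-batch. The plain-Python sketch below shows that idea only; it does not use Spark and is not Sparta's implementation. The function `update_cube`, the event fields and the 60-second bucket are assumptions chosen for the example.

```python
from collections import defaultdict

# Running state: (time bucket, dimension values...) -> aggregated measures.
state = defaultdict(lambda: {"count": 0, "sum": 0.0})


def update_cube(events, dimensions, measure, bucket_seconds=60):
    """Fold one micro-batch of events into the running state."""
    for event in events:
        # Truncate the timestamp to the start of its time bucket.
        bucket = event["timestamp"] - event["timestamp"] % bucket_seconds
        key = (bucket,) + tuple(event[d] for d in dimensions)
        cell = state[key]
        cell["count"] += 1
        cell["sum"] += event[measure]


batch_1 = [
    {"timestamp": 5, "country": "ES", "amount": 10.0},
    {"timestamp": 42, "country": "ES", "amount": 4.0},
]
batch_2 = [
    {"timestamp": 59, "country": "ES", "amount": 1.0},
    {"timestamp": 61, "country": "FR", "amount": 7.0},
]

for batch in (batch_1, batch_2):
    update_cube(batch, dimensions=["country"], measure="amount")

print(dict(state))
```

Because only the small aggregated table is kept between batches, the raw events never need to be stored, which is what turns a large stream into a small, queryable result.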

Inputs

  • Twitter
  • Kafka
  • Flume
  • RabbitMQ
  • Socket
  • WebSocket
  • HDFS/S3

Outputs

  • MongoDB
  • Cassandra
  • ElasticSearch
  • Redis
  • JDBC
  • CSV
  • Parquet
  • Http
  • Kafka
  • HDFS/S3
  • Http Rest
  • Avro
  • Logger

Key technologies

Advantages

Sparta provides several advantages to end users.

Build

You can generate rpm and deb packages by running:

mvn clean package -Ppackage

Note: you need to have the following programs installed in order to build these packages:

On a Debian distribution:

  • fakeroot
  • dpkg-dev
  • rpm
  • jq

On a CentOS distribution:

  • fakeroot
  • dpkg-dev
  • rpmdevtools
  • jq

In all distributions:

  • Java 8
  • Maven 3
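Before running the build, it can help to check that the needed executables are on the PATH. The sketch below does this with Python's standard `shutil.which`. The command names in `REQUIRED` are an assumption about which executables the prerequisites above provide (`java`, `mvn`, `fakeroot`, `jq`); packages such as dpkg-dev or rpmdevtools install several tools and are not checked here. The script does not verify versions.

```python
import shutil


def missing_tools(commands):
    """Return the commands that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Executables assumed to be provided by the prerequisites listed above.
REQUIRED = ["java", "mvn", "fakeroot", "jq"]

missing = missing_tools(REQUIRED)
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All checked build tools were found.")
```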

License

Licensed to STRATIO (C) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The STRATIO (C) licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
