All Projects → lightbend → Cloudflow

lightbend / Cloudflow

Licence: apache-2.0
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Cloudflow

Flink Learning
flink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例,还有 Flink 落地应用的大型项目案例(PVUV、日志存储、百亿数据实时去重、监控告警)分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》
Stars: ✭ 11,378 (+3992.81%)
Mutual labels:  spark, flink
Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+555.04%)
Mutual labels:  spark, flink
Java learning practice
java 进阶之路:面试高频算法、akka、多线程、NIO、Netty、SpringBoot、Spark&&Flink 等
Stars: ✭ 110 (-60.43%)
Mutual labels:  spark, flink
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-69.78%)
Mutual labels:  spark, flink
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-11.15%)
Mutual labels:  spark, streaming-data
Repository
个人学习知识库涉及到数据仓库建模、实时计算、大数据、Java、算法等。
Stars: ✭ 92 (-66.91%)
Mutual labels:  spark, flink
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-54.68%)
Mutual labels:  spark, flink
Model Serving Tutorial
Code and presentation for Strata Model Serving tutorial
Stars: ✭ 57 (-79.5%)
Mutual labels:  spark, flink
Sparkstreaming
💥 🚀 封装sparkstreaming动态调节batch time(有数据就执行计算);🚀 支持运行过程中增删topic;🚀 封装sparkstreaming 1.6 - kafka 010 用以支持 SSL。
Stars: ✭ 179 (-35.61%)
Mutual labels:  spark, flink
Big Whale
Spark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-41.37%)
Mutual labels:  spark, flink
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+329.86%)
Mutual labels:  spark, flink
cassandra.realtime
Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Stars: ✭ 25 (-91.01%)
Mutual labels:  akka, flink
Kamu Cli
Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-75.18%)
Mutual labels:  spark, flink
Bigdata Notebook
Stars: ✭ 100 (-64.03%)
Mutual labels:  spark, flink
Thingsboard
Open-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+3686.33%)
Mutual labels:  spark, akka
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+567.63%)
Mutual labels:  spark, flink
Utils4s
scala、spark使用过程中,各种测试用例以及相关资料整理
Stars: ✭ 1,070 (+284.89%)
Mutual labels:  spark, akka
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-80.22%)
Mutual labels:  spark, flink
Ecommercerecommendsystem
商品大数据实时推荐系统。前端:Vue + TypeScript + ElementUI,后端 Spring + Spark
Stars: ✭ 139 (-50%)
Mutual labels:  spark, flink
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-10.43%)
Mutual labels:  spark, akka

Join the chat at https://cloudflow.zulipchat.com/   Build status

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes. Cloudflow allows you to easily break down your streaming application to smaller composable components and wire them together with schema-based contracts. Cloudflow integrates with popular streaming engines like Akka, Spark and Flink. It also comes with a powerful CLI tool to easily manage, scale and configure your streaming applications at runtime. With its powerful abstractions, Cloudflow allows to define, build and deploy the most complex streaming applications.
  • Develop: Focus only on business logic, leave the boilerplate to us.
  • Build: We provide all the tooling for going from business logic to a deployable Docker image.
  • Deploy: We provide Kubernetes tooling to deploy your distributed system with a single command, and manage durable connections between processing stages.

As data pipelines become first-class citizens in microservices architectures, Cloudflow gives developers data-optimized programming abstractions and run-time tooling for Kubernetes. In a nutshell, Cloudflow is an application development toolkit comprising:

  • An API definition for Streamlet, the core abstraction in Cloudflow.
  • An extensible set of runtime implementations for Streamlet(s). Cloudflow provides support for popular streaming runtimes, like Spark's Structured Streaming, Flink, and Akka.
  • A Streamlet composition model driven by a blueprint definition.
  • A sandbox execution mode that accelerates the development and testing of your applications.
  • A set of sbt plugins that are able to package your application into a deployable container.
  • The Cloudflow operator, a Kubernetes operator that manages the application lifecycle on Kubernetes.
  • A CLI, in the form of a kubectl plugin, that facilitates manual and scripted management of the application.

The different parts of Cloudflow work in unison to dramatically accelerate your application development efforts, reducing the time required to create, package, and deploy an application from weeks to hours.

Basic Concepts

Basic components of a Cloudflow Application

As we discussed above, Cloudflow allows developers to quickly build and deploy distributed stream processing applications by breaking such applications into smaller stream processing units called Streamlets. Each Streamlet represents an independent stream processing component that implements a self-contained stage of the application logic. Streamlets let you break down your application into logical pieces that communicate with each other in a streaming fashion to accomplish an end to end goal.

Streamlets can be composed into larger systems using blueprints, which specify how Streamlets can be connected together to form a topology.

Streamlets expose one or more inlets and outlets that represent the data consumed and produced by the Streamlet, respectively. Inlets and outlets are schema-driven, ensuring that data flows are always consistent and that connections between streamlets are compatible. In the diagram above Streamlet 1 has one outlet which feeds data to Streamlet 2 inlet. Streamlet 1 is a component that generates data or could get its data from an external system eg. via an http request. Then Streamlet 2 outlet feeds its data output to Streamlet 3 inlet. Streamlet 2 in this application does the actual data processing. Streamlet 3 then may store its data to some external system. The example described here is a minimal Cloudflow application. The data sent between Streamlets is safely persisted in the underlying pub-sub system, allowing for independent lifecycle management of the different components.

Streamlets can be scaled up and down to meet the load requirements of the application. The underlying data streams are partitioned to allow for parallelism in a distributed application execution.

The Streamlet logic can be written using an extensible choice of streaming runtimes, such as Akka Streams and Spark. The lightweight API exposes the raw power of the underlying runtime and its libraries while providing a higher-level abstraction for composing streamlets and expressing data schemas. Your code is written in your familiar Structured Streaming, Flink, or Akka Streams native API.

Applications are deployed as a whole. Cloudflow takes care of deploying the individual streamlets and making sure connections get translated into data flowing between them at runtime.

Learn more about the Cloudflow building blocks in our Cloudflow Core Concepts.

The Drivers Behind Cloudflow

Technologies like mobile, the Internet of Things (IoT), Big Data analytics, machine learning, and others are driving enterprises to modernize how they process large volumes of data. A rapidly growing percentage of that data is now arriving in the form of data streams. To extract value from that data as soon as it arrives, those streams require near-realtime processing. We use the term "Fast Data" to describe applications and systems that deal with such requirements.

The Fast Data landscape has been rapidly evolving, with tools like Spark, Flink, and Kafka Streams emerging from the world of large-scale data processing while projects like Reactive Streams and Akka Streams have emerged from the world of application development and high-performance networking.

The demand for availability, scalability, and resilience is forcing fast data architectures to become more like microservice architectures. Conversely, successful organizations building microservices find their data needs grow with their organization while their data sources are becoming more stream-like and more real-time. Hence, there is a unification happening between streaming data and microservice architectures.

It can be quite hard to develop, deploy, and operate large-scale microservices-based systems that can take advantage of streaming data and seamlessly integrate with systems for analytics processing and machine learning. The individual technologies are well-documented, but combining them into fully integrated, unified systems is no easy task.

Cloudflow aims to make this easier by integrating the most popular streaming frameworks into a single platform for creating and running distributed Fast Data applications on Kubernetes.


Where to Go Next?

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].