Azure-Samples / Streaming At Scale

License: MIT
How to implement a streaming at scale solution in Azure


page_type: sample
languages:
  • azurecli
  • csharp
  • json
  • sql
  • scala
products:
  • azure
  • azure-container-instances
  • azure-cosmos-db
  • azure-databricks
  • azure-data-explorer
  • azure-event-hubs
  • azure-functions
  • azure-kubernetes-service
  • azure-sql-database
  • azure-stream-analytics
  • azure-storage
  • azure-time-series-insights
statusNotificationTargets:
  • [email protected]
description: "How to set up an end-to-end solution to implement a streaming at scale scenario using a choice of different Azure technologies."

Streaming at Scale


These samples show how to set up an end-to-end solution implementing a streaming at scale scenario using a choice of different Azure technologies. There are many possible ways to implement such a solution in Azure, following the Kappa or Lambda architecture, a variation of them, or even a custom design. Each architectural solution can also be implemented with different technologies, each with its own pros and cons.

More information on streaming architectures can also be found here:

Here is also a list of scenarios where a streaming solution fits nicely:

A good document that describes the stream technologies available on Azure is:

Choosing a stream processing technology in Azure

The goal of this repository is to showcase a variety of common architectural solutions and implementations, describe their pros and cons, and provide you with sample scripts to deploy each solution with 100% automation.

Ingestion → Processing → Serving

Running the samples

Please note that the scripts have been tested on Ubuntu 18.04 LTS, so make sure to use that environment to run them. You can run them using Docker, WSL or a VM:

Just do a git clone of the repo and you'll be good to go.

Each sample may have additional requirements: they will be listed in the sample's README.

Streamed Data

The streamed data simulates an IoT device sending the following JSON payload:

{
    "eventId": "b81d241f-5187-40b0-ab2a-940faf9757c0",
    "complexData": {
        "moreData0": 57.739726013343247,
        "moreData1": 52.230732688620829,
        "moreData2": 57.497518587807189,
        "moreData3": 81.32211656749469,
        "moreData4": 54.412361539409427,
        "moreData5": 75.36416309399911,
        "moreData6": 71.53407865773488,
        "moreData7": 45.34076957651598,
        "moreData8": 51.3068118685458,
        "moreData9": 44.44672606436184,
        [...]
    },
    "value": 49.02278128887753,
    "deviceId": "contoso-device-id-000154",
    "deviceSequenceNumber": 0,
    "type": "CO2",
    "createdAt": "2019-05-16T17:16:40.000003Z"
}
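As a rough illustration, an event matching the schema above could be generated like this (a minimal Python sketch; the field names and value ranges are taken from the sample payload, while the `generate_event` helper itself is hypothetical and not part of the repository's simulator):

```python
import json
import random
import uuid
from datetime import datetime, timezone

def generate_event(device_number: int, sequence_number: int) -> dict:
    """Build one simulated IoT event matching the sample schema above."""
    return {
        "eventId": str(uuid.uuid4()),
        # ten (or more) synthetic readings, as hinted by the [...] in the sample
        "complexData": {f"moreData{i}": random.uniform(40, 90) for i in range(10)},
        "value": random.uniform(40, 60),
        "deviceId": f"contoso-device-id-{device_number:06d}",
        "deviceSequenceNumber": sequence_number,
        "type": "CO2",
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }

print(json.dumps(generate_event(154, 0), indent=2))
```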

Duplicate event handling

Event delivery guarantees are a critical aspect of streaming solutions. Azure Event Hubs provides an at-least-once event delivery guarantee. In addition, the upstream components that compose a real-world deployment will typically send events to Event Hubs with at-least-once guarantees (i.e. for reliability purposes, they should be configured to retry if they do not get an acknowledgement of message reception from the Event Hubs endpoint, even though the message might actually have been ingested). Finally, the stream processing system typically only has at-least-once guarantees when delivering data into the serving layer. Duplicate messages are therefore unavoidable and are better dealt with explicitly.
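To see why at-least-once delivery produces duplicates, consider a sender that retries whenever an acknowledgement is lost, even though the endpoint may already have ingested the message. Here is a self-contained Python sketch of that behavior (the `send_once` endpoint is simulated; none of these names come from an Azure SDK):

```python
received = []     # messages the simulated ingestion endpoint has actually stored
_call_count = 0   # used to deterministically lose every third acknowledgement

def send_once(message: str) -> bool:
    """Simulate a send: the message is always ingested, but the ack can be lost."""
    global _call_count
    _call_count += 1
    received.append(message)        # the endpoint ingests the message...
    return _call_count % 3 != 0     # ...but every third ack goes missing

def send_at_least_once(message: str, max_retries: int = 5) -> None:
    """Retry until acknowledged: guarantees delivery, but may duplicate."""
    for _ in range(max_retries):
        if send_once(message):
            return
    raise RuntimeError("no acknowledgement after retries")

for i in range(10):
    send_at_least_once(f"event-{i}")

print(f"sent 10 distinct events; endpoint stored {len(received)} copies")
```

With 10 distinct events and every third acknowledgement lost, the endpoint ends up storing 14 copies: delivery never fails, but duplicates accumulate, which is exactly why the serving layer or processor must handle them.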

Depending on the type of application, it might be acceptable to store and serve duplicate messages, or it might be desirable to deduplicate them. The serving layer might even have strong uniqueness guarantees (e.g. a unique key in Azure SQL Database). The solution templates therefore demonstrate, where possible, effective duplicate handling strategies for the given combination of stream processing and serving technologies. In most solutions, the event simulator is configured to randomly duplicate a small fraction of the messages (0.1% on average).
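As a simple illustration of consumer-side deduplication, the sketch below keeps only the first occurrence of each eventId (assuming, as in the sample payload, that eventId is the uniqueness key; a serving layer such as Azure SQL Database would typically enforce this with a unique key constraint instead):

```python
def deduplicate(events: list[dict]) -> list[dict]:
    """Keep only the first occurrence of each eventId, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["eventId"] not in seen:
            seen.add(event["eventId"])
            unique.append(event)
    return unique

incoming = [
    {"eventId": "a", "value": 1},
    {"eventId": "b", "value": 2},
    {"eventId": "a", "value": 1},  # duplicate delivered by the at-least-once pipeline
]
print(deduplicate(incoming))  # the duplicate "a" is dropped
```

In-memory sets like this only work for a single process and a bounded key space; at scale, the same idea is usually expressed as an idempotent upsert or a unique constraint in the serving store.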

Integration tests

End-to-end integration tests are configured to run. You can check the latest closed pull requests ("View Details") to navigate to the integration test run in Azure DevOps. The integration test suite deploys each solution and runs verification jobs in Azure Databricks that pull data from the serving layer of the given solution and verify the solution's event processing rate and duplicate handling guarantees.

Available solutions

At the present time, the available solutions are:

Kafka on AKS + Azure Databricks + Cosmos DB

Implement a stream processing architecture using:

  • Kafka on Azure Kubernetes Service (AKS) (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Cosmos DB (Serve)

Event Hubs Capture Sample

Implement stream processing architecture using:

  • Event Hubs (Ingest)
  • Event Hubs Capture (Store)
  • Azure Blob Store (Data Lake)
  • Apache Drill (Query/Serve)

Event Hubs + Azure Databricks + Azure SQL

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Azure SQL (Serve)

Event Hubs + Azure Databricks + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Cosmos DB (Serve)

Event Hubs Kafka + Azure Databricks + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log) with Kafka endpoint
  • Azure Databricks (Stream Process)
  • Cosmos DB (Serve)

Event Hubs + Azure Databricks + Delta

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Delta Lake (Serve)

Event Hubs + Azure Functions + Azure SQL

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Functions (Stream Process)
  • Azure SQL (Serve)

Event Hubs + Azure Functions + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Functions (Stream Process)
  • Cosmos DB (Serve)

Event Hubs + Stream Analytics + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Stream Analytics (Stream Process)
  • Cosmos DB (Serve)

Event Hubs + Stream Analytics + Azure SQL

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Stream Analytics (Stream Process)
  • Azure SQL (Serve)

Event Hubs + Stream Analytics + Event Hubs

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Stream Analytics (Stream Process)
  • Event Hubs (Serve)

HDInsight Kafka + Flink + HDInsight Kafka

Implement a stream processing architecture using:

  • HDInsight Kafka (Ingest / Immutable Log)
  • Flink on HDInsight or Azure Kubernetes Service (Stream Process)
  • HDInsight Kafka (Serve)

Event Hubs Kafka + Flink + Event Hubs Kafka

Implement a stream processing architecture using:

  • Event Hubs Kafka (Ingest / Immutable Log)
  • Flink on HDInsight or Azure Kubernetes Service (Stream Process)
  • Event Hubs Kafka (Serve)

Event Hubs Kafka + Azure Functions + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs Kafka (Ingest / Immutable Log)
  • Azure Functions (Stream Process)
  • Cosmos DB (Serve)

HDInsight Kafka + Azure Databricks + Azure SQL

Implement a stream processing architecture using:

  • HDInsight Kafka (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Azure SQL Data Warehouse (Serve)

Event Hubs + Azure Data Explorer

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Azure Data Explorer (Stream Process / Serve)

Event Hubs + Data Accelerator + Cosmos DB

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Microsoft Data Accelerator on HDInsight and Service Fabric (Stream Process)
  • Cosmos DB (Serve)

Event Hubs + Time Series Insights

Implement a stream processing architecture using:

  • Event Hubs (Ingest / Immutable Log)
  • Time Series Insights (Stream Process / Serve / Store to Parquet)
  • Azure Storage (Serve for data analytics)

IoT Hub + Azure Functions + Azure SQL

Implement a stream processing architecture using:

  • IoT Hub (Ingest)
  • Azure Functions (Stream Process)
  • Azure SQL (Serve)

Storage Blobs + Databricks + Delta

Implement a stream processing architecture using:

  • Azure Storage (Azure Data Lake Storage Gen2) (Ingest / Immutable Log)
  • Azure Databricks (Stream Process)
  • Delta Lake (Serve)

IoT Hub + Azure Digital Twins + Time Series Insights

Implement a stream processing architecture using:

  • IoT Hub (Ingest)
  • Azure Digital Twins (Model Management / Stream Process / Routing)
  • Time Series Insights (Serve / Store to Parquet)
  • Azure Storage (Serve for data analytics)

Note

Performance and services change quickly in the cloud, so please keep in mind that all values used in the samples were tested at the moment of writing. If you find any discrepancies with what you observe when running the scripts, please create an issue to report it and/or create a PR to update the documentation and the sample. Thanks!
