dashbitco / Broadway

Concurrent and multi-stage data ingestion and data processing with Elixir

Projects that are alternatives of or similar to Broadway

Javaok
A must-read for Java backend developers: the Java learning roadmap and key technical points.
Stars: ✭ 867 (-33.82%)
Mutual labels:  concurrent
Cbrain
CBRAIN is a flexible Ruby on Rails framework for accessing and processing of large data on high-performance computing infrastructures.
Stars: ✭ 51 (-96.11%)
Mutual labels:  data-processing
Conget
A CLI app for downloading file concurrently.
Stars: ✭ 72 (-94.5%)
Mutual labels:  concurrent
Tdm
R package for normalizing RNA-seq data to make them comparable to microarray data.
Stars: ✭ 33 (-97.48%)
Mutual labels:  data-processing
Mdsplus
The MDSplus data management system
Stars: ✭ 47 (-96.41%)
Mutual labels:  data-processing
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-95.8%)
Mutual labels:  data-processing
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-34.81%)
Mutual labels:  data-processing
Weibo Album Crawler
A multithreaded crawler for full-size images in Sina Weibo albums.
Stars: ✭ 83 (-93.66%)
Mutual labels:  concurrent
Recloser
A concurrent circuit breaker implemented with ring buffers
Stars: ✭ 51 (-96.11%)
Mutual labels:  concurrent
Dashmap
Blazing fast concurrent HashMap for Rust.
Stars: ✭ 1,128 (-13.89%)
Mutual labels:  concurrent
Routine
Goroutine control, an abstraction of Main, and some useful Executors. If you don't know how to manage goroutines, use it.
Stars: ✭ 40 (-96.95%)
Mutual labels:  concurrent
Fgbase
Ready-send coordination layer on top of goroutines.
Stars: ✭ 45 (-96.56%)
Mutual labels:  concurrent
Freelancer
👔 An implementation of on-the-fly defined WebWorkers that are created inline using data URIs, rather than separate physical files — for the benefit of all humanity.
Stars: ✭ 57 (-95.65%)
Mutual labels:  concurrent
Javacore
☕️ JavaCore is a summary of practical experience with core Java technologies.
Stars: ✭ 909 (-30.61%)
Mutual labels:  concurrent
Dialogpt
Large-scale pretraining for dialogue
Stars: ✭ 1,177 (-10.15%)
Mutual labels:  data-processing
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (-34.05%)
Mutual labels:  data-processing
2019 Electronic Design Competition
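[Electronic Design Contest] 2019 National Undergraduate Electronic Design Contest (Problem F): paper sheet count detection device (based on STM32F407 & FDC2214 & USART HMI)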
Stars: ✭ 53 (-95.95%)
Mutual labels:  data-processing
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (-93.21%)
Mutual labels:  data-processing
Oh
A new Unix shell.
Stars: ✭ 1,206 (-7.94%)
Mutual labels:  concurrent
Suman
🌇 🌆 🌉 Advanced, user-friendly, language-agnostic, super-high-performance test runner. http://sumanjs.org
Stars: ✭ 57 (-95.65%)
Mutual labels:  concurrent

Broadway

Build concurrent and multi-stage data ingestion and data processing pipelines with Elixir. It allows developers to consume data efficiently from different sources, known as producers, such as Amazon SQS, Apache Kafka, Google Cloud PubSub, RabbitMQ, and others.

The name Broadway was taken from the famous Broadway street in New York City, renowned for its stages, actors, and producers. :)

Documentation, examples, and how-tos can be found at https://hexdocs.pm/broadway.

Built-in features

Broadway takes on the burden of defining concurrent GenStage topologies and provides a simple configuration API that automatically defines concurrent producers, concurrent processing, batch handling, and more, leading to time- and cost-efficient ingestion and processing of data. It features:

  • Back-pressure
  • Automatic acknowledgements at the end of the pipeline
  • Batching
  • Fault tolerance with minimal data loss
  • Graceful shutdown
  • Built-in testing
  • Custom failure handling
  • Ordering and partitioning
  • Rate-limiting
  • Metrics
  • Back-off (TODO)

Installation

Add :broadway to the list of dependencies in mix.exs:

def deps do
  [
    {:broadway, "~> 0.6.0"}
  ]
end

Official Broadway Producers

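Currently we officially support four Broadway producers:

  • Amazon SQS (broadway_sqs)
  • Apache Kafka (broadway_kafka)
  • Google Cloud Pub/Sub (broadway_cloud_pub_sub)
  • RabbitMQ (broadway_rabbitmq)

More producers are on the way.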

A quick example: SQS integration

Assuming you have added broadway_sqs as a dependency and configured your SQS credentials accordingly, you can consume Amazon SQS events in a few dozen lines of code:

defmodule MyBroadway do
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwaySQS.Producer, queue_url: "https://us-east-2.queue.amazonaws.com/100000000001/my_queue"}
      ],
      processors: [
        default: [concurrency: 50]
      ],
      batchers: [
        s3: [concurrency: 5, batch_size: 10, batch_timeout: 1000]
      ]
    )
  end

  def handle_message(_processor_name, message, _context) do
    message
    |> Message.update_data(&process_data/1)
    |> Message.put_batcher(:s3)
  end

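  def handle_batch(:s3, messages, _batch_info, _context) do
    # Send batch of messages to S3, then return the messages
    # so that Broadway can acknowledge them.
    messages
  end

  defp process_data(data) do
    # Do some calculations, generate a JSON representation, process images.
    # Return the transformed data (returned unchanged in this placeholder).
    data
  end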
end

Once your Broadway module is defined, you just need to add it as a child of your application supervision tree as {MyBroadway, []}.
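
As a minimal sketch of that wiring, the application module below starts the pipeline under a supervisor. The names MyApp.Application and MyApp.Supervisor are assumptions chosen for illustration; only MyBroadway comes from the example above.

```elixir
defmodule MyApp.Application do
  # Hypothetical application module; adjust the name to your project.
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # `use Broadway` defines child_spec/1, so this tuple makes the
      # supervisor call MyBroadway.start_link([]).
      {MyBroadway, []}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

With this in place, the pipeline starts and stops together with the application, and the supervisor restarts it if it crashes.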

Non-official (Off-Broadway) Producers

For those interested in rolling their own Broadway Producers (which we actively encourage!), we recommend using the OffBroadway namespace, mirroring the Off-Broadway theaters. For example, if you want to publish your own integration with Amazon SQS, you can package it as off_broadway_sqs, which uses the OffBroadway.SQS namespace.
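
To make the idea concrete, here is a rough sketch of what a custom producer under the OffBroadway namespace might look like. It assumes a Broadway producer is a GenStage producer that emits Broadway.Message structs carrying an acknowledger. The module name OffBroadway.Counter.Producer and the :start option are hypothetical, and a real integration would pull from an external source and acknowledge messages upstream rather than emit integers.

```elixir
defmodule OffBroadway.Counter.Producer do
  # Hypothetical producer that emits an increasing sequence of integers.
  use GenStage

  alias Broadway.Message

  @behaviour Broadway.Acknowledger

  @impl GenStage
  def init(opts) do
    # :start is an illustrative option, not part of Broadway itself.
    {:producer, Keyword.get(opts, :start, 0)}
  end

  @impl GenStage
  def handle_demand(demand, counter) when demand > 0 do
    # Emit exactly as many messages as the consumers asked for (back-pressure).
    messages =
      for n <- counter..(counter + demand - 1) do
        %Message{data: n, acknowledger: {__MODULE__, :ack_ref, nil}}
      end

    {:noreply, messages, counter + demand}
  end

  @impl Broadway.Acknowledger
  def ack(_ack_ref, _successful, _failed) do
    # Nothing to acknowledge for an in-memory counter.
    :ok
  end
end
```

Under these assumptions, the producer could be plugged into a pipeline with `producer: [module: {OffBroadway.Counter.Producer, start: 0}]` in the options passed to Broadway.start_link.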

The following Off-Broadway libraries are available (feel free to send a PR adding your own in alphabetical order):

Comparison to Flow

You may also be interested in Flow by Dashbit. Both Broadway and Flow are built on top of GenStage. Flow is a more general abstraction than Broadway that focuses on data as a whole, providing features like aggregation, joins, windows, etc. Broadway focuses on events and on operational features, such as metrics, automatic acknowledgements, failure handling, and so on.

License

Copyright 2019 Plataformatec
Copyright 2020 Dashbit

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
