
apache / Beam

License: apache-2.0
Apache Beam is a unified programming model for Batch and Streaming

Programming Languages

python
139335 projects - #7 most used programming language
java
68154 projects - #9 most used programming language
go
31211 projects - #10 most used programming language
groovy
2714 projects
dart
5743 projects
shell
77523 projects

Projects that are alternatives of or similar to Beam

Scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (-56.36%)
Mutual labels:  batch, streaming, beam
Materialize
Materialize lets you ask questions of your live data, which it answers and then maintains for you as your data continue to change. The moment you need a refreshed answer, you can get it in milliseconds. Materialize is designed to help you interactively explore your streaming data, perform data warehousing analytics against live relational data, or just increase the freshness and reduce the load of your dashboard and monitoring tasks.
Stars: ✭ 3,341 (-35.11%)
Mutual labels:  sql, streaming
Calcite
Apache Calcite
Stars: ✭ 2,816 (-45.31%)
Mutual labels:  sql, big-data
openmessaging.github.io
OpenMessaging homepage
Stars: ✭ 12 (-99.77%)
Mutual labels:  streaming, batch
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-97.09%)
Mutual labels:  sql, big-data
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+151.64%)
Mutual labels:  sql, big-data
beam-site
Apache Beam Site
Stars: ✭ 28 (-99.46%)
Mutual labels:  big-data, beam
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-98.04%)
Mutual labels:  sql, big-data
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (-36.8%)
Mutual labels:  sql, big-data
Sylph
Stream computing platform for bigdata
Stars: ✭ 362 (-92.97%)
Mutual labels:  sql, big-data
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-92.99%)
Mutual labels:  sql, big-data
Efcore.bulkextensions
Entity Framework Core Bulk Batch Extensions for Insert Update Delete Read (CRUD), Truncate and SaveChanges operations on SQL Server, PostgreSQL, SQLite
Stars: ✭ 2,295 (-55.43%)
Mutual labels:  sql, batch
Join Monster Graphql Tools Adapter
Use Join Monster to fetch your data with Apollo Server.
Stars: ✭ 130 (-97.48%)
Mutual labels:  sql, batch
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (-96.45%)
Mutual labels:  sql, big-data
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (-97.48%)
Mutual labels:  sql, big-data
Clickhouse
ClickHouse® is a free analytics DBMS for big data
Stars: ✭ 21,089 (+309.57%)
Mutual labels:  sql, big-data
Ignite
Apache Ignite
Stars: ✭ 4,027 (-21.79%)
Mutual labels:  sql, big-data
Spark Website
Apache Spark Website
Stars: ✭ 75 (-98.54%)
Mutual labels:  sql, big-data
Fiflow
flink-sql: a platform for running SQL and building data flows on Flink, based on Apache Flink 1.10.0
Stars: ✭ 100 (-98.06%)
Mutual labels:  sql, streaming
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-11.03%)
Mutual labels:  sql, big-data

Apache Beam

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow, and Hazelcast Jet.

Status

[Badges: Maven version, PyPI version, Python coverage, and build/test status; see the rendered README for live badges.]

Post-commit tests status (on master branch)

[Badge matrix: per-runner build status for the Go, Java, Python, and XLang SDKs across the Dataflow, Flink, Samza, Spark, and Twister2 runners; see the rendered README for current results.]

Overview

Beam provides a general approach to expressing embarrassingly parallel data processing pipelines and supports three categories of users, each of which has relatively disparate backgrounds and needs.

  1. End Users: Writing pipelines with an existing SDK and running them on an existing runner. These users want to focus on writing their application logic and have everything else just work.
  2. SDK Writers: Developing a Beam SDK targeted at a specific user community (Java, Python, Scala, Go, R, graphical, etc.). These users are language geeks and would prefer to be shielded from all the details of various runners and their implementations.
  3. Runner Writers: Adapting an execution environment for distributed processing to run programs written against the Beam Model. These users would prefer to be shielded from the details of multiple SDKs.

The Beam Model

The model behind Beam evolved from a number of internal Google data processing projects, including MapReduce, FlumeJava, and MillWheel. This model was originally known as the “Dataflow Model”.

To learn more about the Beam Model (though still under the original name of Dataflow), see the World Beyond Batch: Streaming 101 and Streaming 102 posts on O’Reilly’s Radar site, and the VLDB 2015 paper.

The key concepts in the Beam programming model are:

  • PCollection: represents a collection of data, which could be bounded or unbounded in size.
  • PTransform: represents a computation that transforms input PCollections into output PCollections.
  • Pipeline: manages a directed acyclic graph of PTransforms and PCollections that is ready for execution.
  • PipelineRunner: specifies where and how the pipeline should execute.
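
A minimal sketch of how these concepts fit together, using the Python SDK (the transform labels and sample data are illustrative; apache-beam must be installed):

  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions

  # The PipelineRunner is chosen via options; DirectRunner executes locally.
  options = PipelineOptions(runner='DirectRunner')

  # The Pipeline manages the DAG of PTransforms and PCollections.
  with beam.Pipeline(options=options) as p:
      lines = p | 'Create' >> beam.Create(['hello world', 'hello beam'])  # a bounded PCollection
      counts = (
          lines
          | 'Split' >> beam.FlatMap(str.split)           # PTransform: line -> words
          | 'PairWithOne' >> beam.Map(lambda w: (w, 1))  # PTransform: word -> (word, 1)
          | 'CountPerWord' >> beam.CombinePerKey(sum)    # PTransform: aggregate per key
      )
      counts | 'Print' >> beam.Map(print)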

SDKs

Beam supports multiple language-specific SDKs for writing pipelines against the Beam Model.

Currently, this repository contains SDKs for Java, Python, and Go.

Have ideas for new SDKs or DSLs? See the JIRA.

Runners

Beam supports executing programs on multiple distributed processing backends through PipelineRunners. Currently, the following PipelineRunners are available:

  • The DirectRunner runs the pipeline on your local machine.
  • The DataflowRunner submits the pipeline to Google Cloud Dataflow.
  • The FlinkRunner runs the pipeline on an Apache Flink cluster. The code has been donated from dataArtisans/flink-dataflow and is now part of Beam.
  • The SparkRunner runs the pipeline on an Apache Spark cluster. The code has been donated from cloudera/spark-dataflow and is now part of Beam.
  • The JetRunner runs the pipeline on a Hazelcast Jet cluster. The code has been donated from hazelcast/hazelcast-jet and is now part of Beam.
  • The Twister2Runner runs the pipeline on a Twister2 cluster. The code has been donated from DSC-SPIDAL/twister2 and is now part of Beam.
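
Switching runners is typically a matter of pipeline options rather than pipeline code. Below is a sketch using the Python SDK; the Google Cloud project, region, and bucket are placeholder values, and each remote runner needs backend-specific options along these lines:

  from apache_beam.options.pipeline_options import PipelineOptions

  # Local execution on this machine:
  local_options = PipelineOptions(runner='DirectRunner')

  # Submission to Google Cloud Dataflow (all values below are placeholders):
  dataflow_options = PipelineOptions(
      runner='DataflowRunner',
      project='my-gcp-project',            # placeholder GCP project id
      region='us-central1',
      temp_location='gs://my-bucket/tmp',  # placeholder staging bucket
  )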

Have ideas for new Runners? See the JIRA.

Getting Started

To learn how to write Beam pipelines, read the Quickstart for Java, Python, or Go, available on our website.
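
As a quick local check after installing the Python SDK, the WordCount example bundled in this repository can also be invoked from Python. This sketch assumes the example's run() entry point, which accepts command-line-style flags; the input and output paths are placeholders:

  # Runs the canonical WordCount example shipped with the Python SDK on the
  # default DirectRunner; point --input at any local text file.
  from apache_beam.examples import wordcount

  wordcount.run(argv=['--input', '/tmp/input.txt', '--output', '/tmp/counts'])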

Contact Us

To get involved in Apache Beam, see the contribution guide, which also covers building and testing Beam itself.

More Information

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].