Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-73.13%)

Mutual labels: spark, big-data

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-61.5%)

Mutual labels: spark, big-data

Js Spark

Distributed real-time computation system, a.k.a. distributed lodash.

Stars: ✭ 187 (-48.2%)

Mutual labels: spark, distributed-computing

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+270.64%)

Mutual labels: spark, big-data

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-31.58%)

Mutual labels: spark, big-data

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (-31.86%)

Mutual labels: spark, big-data

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+0.55%)

Mutual labels: sql, spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (-40.17%)

Mutual labels: spark, big-data

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+69.25%)

Mutual labels: sql, spark

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+1326.32%)

Mutual labels: sql, big-data

Sylph

Stream computing platform for big data

Stars: ✭ 362 (+0.28%)

Mutual labels: sql, big-data

Hazelcast

Open-source distributed computation and storage platform

Stars: ✭ 4,662 (+1191.41%)

Mutual labels: big-data, distributed-computing

Awesome Business Intelligence

Actively curated list of awesome BI tools. PRs welcome!

Stars: ✭ 1,157 (+220.5%)

Mutual labels: sql, etl

Ether sql

A Python library to push Ethereum blockchain data into an SQL database.

Stars: ✭ 41 (-88.64%)

Mutual labels: sql, etl

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-93.07%)

Mutual labels: spark, etl

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+164.82%)

Mutual labels: sql, etl

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (-63.99%)

Mutual labels: sql, big-data

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-69.81%)

Mutual labels: sql, spark

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-96.12%)

Mutual labels: big-data, spark

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-94.46%)

Mutual labels: big-data, etl

Bitcoin Etl

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 174 (-51.8%)

Mutual labels: sql, etl

Linq2db

Linq to database provider.

Stars: ✭ 2,211 (+512.47%)

Mutual labels: sql, etl

Bulk Writer

Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.

Stars: ✭ 210 (-41.83%)

Mutual labels: sql, etl

Linkis

Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+543.49%)

Mutual labels: sql, spark

DIRECT

DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.

Stars: ✭ 20 (-94.46%)

Mutual labels: etl, etl-framework

vixtract

www.vixtract.ru

Stars: ✭ 40 (-88.92%)

Mutual labels: etl, etl-framework

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Stars: ✭ 72 (-80.06%)

Mutual labels: big-data, distributed-computing

Phoenix

Mirror of Apache Phoenix

Stars: ✭ 867 (+140.17%)

Mutual labels: sql, big-data

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+69.53%)

Mutual labels: etl, etl-framework

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (-64.82%)

Mutual labels: big-data, distributed-computing

etlflow

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (-89.47%)

Mutual labels: etl, etl-framework

OpenKettleWebUI

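A Kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.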

Stars: ✭ 138 (-61.77%)

Mutual labels: etl, etl-framework

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (-94.18%)

Mutual labels: etl, etl-framework

DataBridge.NET

Configurable data bridge for permanent ETL jobs

Stars: ✭ 16 (-95.57%)

Mutual labels: etl, etl-framework

leaflet heatmap

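A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, the heatmap is also drawn with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; perhaps Apache Spark is not well suited to this kind of computation, or I did not design the algorithm well, but the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .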

Stars: ✭ 13 (-96.4%)

Mutual labels: big-data, spark

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (-81.44%)

Mutual labels: etl, etl-framework

Bender

Bender - Serverless ETL Framework

Stars: ✭ 171 (-52.63%)

Mutual labels: etl, etl-framework

Etlbox

A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.