
660 open-source projects that are alternatives to or similar to bandar-log

A monitoring tool that measures the throughput of data sources and processing components in data ingestion and ETL pipelines.

Stars: ✭ 19 (-5%)

Mutual labels: big-data, presto, etl, spark-streaming

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

Stars: ✭ 101 (+405%)

Mutual labels: big-data, presto

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps

Stars: ✭ 11,983 (+59815%)

Mutual labels: presto, athena

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Stars: ✭ 177 (+785%)

Mutual labels: big-data, spark-streaming

architect big data solutions with spark

code, labs and lectures for the course

Stars: ✭ 40 (+100%)

Mutual labels: etl, spark-streaming

Presto Go Client

A Presto client for the Go programming language.

Stars: ✭ 183 (+815%)

Mutual labels: big-data, presto

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (+40%)

Mutual labels: athena, etl

Sylph

Stream computing platform for big data

Stars: ✭ 362 (+1710%)

Mutual labels: big-data, spark-streaming

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+295%)

Mutual labels: big-data, etl

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+22805%)

Mutual labels: big-data, presto

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+600%)

Mutual labels: big-data, etl

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (+1075%)

Mutual labels: big-data, etl

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+980%)

Mutual labels: big-data, spark-streaming

Aws Etl Orchestrator

A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

Stars: ✭ 245 (+1125%)

Mutual labels: big-data, etl

Smooks

An extensible Java framework for building XML and non-XML streaming applications

Stars: ✭ 293 (+1365%)

Mutual labels: big-data, etl

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+64685%)

Mutual labels: big-data, presto

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-15%)

Mutual labels: big-data, presto

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+1705%)

Mutual labels: big-data, etl

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+1135%)

Mutual labels: big-data, spark-streaming

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).

Stars: ✭ 2,385 (+11825%)

Mutual labels: athena, etl

Sqlpad

Web-based SQL editor that runs in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more via ODBC

Stars: ✭ 4,113 (+20465%)

Mutual labels: presto, vertica

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+620%)

Mutual labels: big-data, etl

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+95%)

Mutual labels: big-data, etl

bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

Stars: ✭ 17 (-15%)

Mutual labels: big-data, etl

openrefine-docker

OpenRefine is a free, open-source power tool for working with messy data and improving it. This repository contains Docker build files for automated builds.

Stars: ✭ 19 (-5%)

Mutual labels: etl

pipeline

OONI data processing pipeline

Stars: ✭ 36 (+80%)

Mutual labels: big-data

etl

M-Lab ingestion pipeline

Stars: ✭ 15 (-25%)

Mutual labels: etl

gamechanger-data

GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements

Stars: ✭ 17 (-15%)

Mutual labels: etl

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-30%)

Mutual labels: big-data

cardano-py

Python 3 library and CLI for operating a Cardano passive node and using the APIs. (PRE-ALPHA)

Stars: ✭ 17 (-15%)

Mutual labels: etl

bigtable

TypeScript Bigtable Client with 🔋🔋 included.

Stars: ✭ 13 (-35%)

Mutual labels: big-data

mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Stars: ✭ 37 (+85%)

Mutual labels: etl

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (+10%)

Mutual labels: etl

Spark-and-Kafka IoT-Data-Processing-and-Analytics

Final project for the IoT: Big Data Processing and Analytics class. Analyzes nationwide U.S. temperature data from IoT sensors in real time

Stars: ✭ 42 (+110%)

Mutual labels: spark-streaming

AverageShiftedHistograms.jl

⚡ Lightning fast density estimation in Julia ⚡

Stars: ✭ 52 (+160%)

Mutual labels: big-data

mmtf-workshop-2018

Structural Bioinformatics Training Workshop & Hackathon 2018

Stars: ✭ 50 (+150%)

Mutual labels: big-data

sync-addons

Odoo Integration Addons

Stars: ✭ 69 (+245%)

Mutual labels: etl

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+455%)

Mutual labels: big-data

predictionio-template-attribute-based-classifier

PredictionIO Classification Engine Template (Scala-based parallelized engine)

Stars: ✭ 38 (+90%)

Mutual labels: big-data

spdr-etf-holdings

ETL for the SPDR ETF holdings XLS documents

Stars: ✭ 14 (-30%)

Mutual labels: etl

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (+70%)

Mutual labels: big-data

horgh-replicator

Golang binlog replication from MySQL to MySQL, PostgreSQL, Vertica, and ClickHouse

Stars: ✭ 46 (+130%)

Mutual labels: vertica

AirflowDataPipeline

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (+20%)

Mutual labels: etl

incubator-linkis

Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), and provides multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+12195%)

Mutual labels: presto

vxquery

Mirror of Apache VXQuery

Stars: ✭ 19 (-5%)

Mutual labels: big-data

openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

Stars: ✭ 67 (+235%)

Mutual labels: etl

predictionio-template-java-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Java-based parallelized engine)

Stars: ✭ 36 (+80%)

Mutual labels: big-data

alluxio-py

Alluxio Python client - Access Any Data Source with Python

Stars: ✭ 18 (-10%)

Mutual labels: big-data

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (+375%)

Mutual labels: big-data

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (+25%)

Mutual labels: etl

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-20%)

Mutual labels: presto

dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

Stars: ✭ 30 (+50%)

Mutual labels: etl

leaflet heatmap

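A simple visualization of Huzhou call data. Because the data volume is assumed to be very large, the heatmap cannot be drawn directly in the browser, so the heatmap rendering step is moved to offline computation and analysis. The data is first computed in parallel with Apache Spark, Apache Spark then renders the heatmap, and leafletjs loads an OpenStreetMap layer and the heatmap layer to provide good interactivity. Rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because the algorithm is not well designed, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git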

Stars: ✭ 13 (-35%)

Mutual labels: big-data

predictionio-template-similar-product

PredictionIO Similar Product Engine Template (Scala-based parallelized engine)

Stars: ✭ 50 (+150%)

Mutual labels: big-data

TEAM

The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.

Stars: ✭ 27 (+35%)

Mutual labels: etl

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Stars: ✭ 64 (+220%)

Mutual labels: etl

hotmap

WebGL Heatmap Viewer for Big Data and Bioinformatics

Stars: ✭ 13 (-35%)

Mutual labels: big-data

carry

Python ETL (Extract-Transform-Load) tool / data migration tool