Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC

Stars: ✭ 4,113 (+25606.25%)

Mutual labels: bigquery, snowflake

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+125818.75%)

Mutual labels: bigquery, redshift

Ddlparse

DDL parse and convert to BigQuery JSON schema and DDL statements

Stars: ✭ 52 (+225%)

Mutual labels: bigquery, redshift

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (+231.25%)

Mutual labels: bigquery, etl

Bitcoin Etl

ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 174 (+987.5%)

Mutual labels: bigquery, etl

bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

Stars: ✭ 17 (+6.25%)

Mutual labels: bigquery, etl

etlflow

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.

Stars: ✭ 38 (+137.5%)

Mutual labels: bigquery, etl

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+5875%)

Mutual labels: bigquery, etl


About Starlake

Complete documentation available here

Introduction

The purpose of this project is to efficiently ingest data sources in different formats and make them available for analytics. Usually, ingestion is done by writing custom parsers by hand to transform input files into datasets of records.

This project aims at automating this parsing task by making data ingestion purely declarative.

The workflow below is a typical use case:

Export your data as a set of DSV (Delimiter-separated values) or JSON files
Define each DSV/JSON file with a schema using YAML syntax
Configure the ingestion process
Watch your data become available as Hive tables in your data lake
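
To make the idea of declarative ingestion concrete, here is a minimal Python sketch. It is not Starlake's API or schema format: the `SCHEMA` dictionary, the `parse_dsv` helper, and the sample data are all hypothetical, and a plain dictionary stands in for the YAML schema mentioned in the workflow. It only illustrates how a schema, rather than a hand-written parser, can drive the conversion of text into typed records.

```python
import csv
import io
from datetime import date

# Hypothetical declarative schema: column name -> converter.
# A dict stands in here for the YAML schema described above.
SCHEMA = {
    "id": int,
    "name": str,
    "signup": date.fromisoformat,
}


def parse_dsv(text, schema, delimiter=";"):
    """Turn delimiter-separated text into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {column: convert(row[column]) for column, convert in schema.items()}
        for row in reader
    ]


# Hypothetical sample input with a header line.
SAMPLE = "id;name;signup\n1;Ada;2021-03-04\n2;Linus;2022-11-30\n"
records = parse_dsv(SAMPLE, SCHEMA)
```

Adding a column or changing its type only requires editing the schema; the parsing function stays the same.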

The main advantages of the Starlake Data Pipeline project are:

Eliminates manual coding for data ingestion
Assigns metadata to each dataset
Exposes data ingestion metrics and history
Transforms text files into strongly typed records
Supports semantic types
Enforces privacy on specific fields (GDPR)
Very simple to administer
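
Two of these points, semantic types and field-level privacy, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the `SEMANTIC_TYPES` patterns, the `FIELDS` declarations, and the choice of SHA-256 hashing are hypothetical and do not describe how Starlake implements either feature.

```python
import hashlib
import re

# Hypothetical semantic types: a type name mapped to a validating pattern.
SEMANTIC_TYPES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zipcode": re.compile(r"\d{5}"),
}

# Hypothetical field declarations: a semantic type plus a privacy flag.
FIELDS = {
    "email": {"type": "email", "private": True},
    "zip": {"type": "zipcode", "private": False},
}


def is_valid(value, type_name):
    """Check that the whole value matches the pattern of its semantic type."""
    return SEMANTIC_TYPES[type_name].fullmatch(value) is not None


def protect(record, fields):
    """Return a copy of the record with private fields replaced by a SHA-256 hex digest."""
    protected = dict(record)
    for name, spec in fields.items():
        if spec["private"]:
            digest = hashlib.sha256(record[name].encode("utf-8")).hexdigest()
            protected[name] = digest
    return protected


record = {"email": "ada@example.com", "zip": "75001"}
protected = protect(record, FIELDS)
```

Declaring privacy next to the field definition keeps the rule in one place, so it applies to every file loaded with that schema.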

How it works

Starlake Data Pipeline automates the loading and parsing of files and their ingestion into a Hadoop Datalake where datasets become available as Hive tables.

Landing Area: Files are first stored in the local file system
Staging Area: Files associated with a schema are imported into the datalake
Working Area: Staged files are parsed against their schema; records are rejected or accepted, and accepted records are made available in parquet/orc/... files as Hive tables.
Business Area: Tables in the working area may be joined to provide a holistic view of the data through the definition of AutoJob.
Data visualization: parquet/orc/... tables may be exposed in warehouses or Elasticsearch indexes
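
The working-area step, where staged rows are checked against a schema and split into accepted and rejected records, can be sketched in Python as follows. This is an illustrative sketch only: the `validate` and `ingest` functions, the `SCHEMA` dictionary, and the sample rows are hypothetical and are not taken from Starlake, which performs this step on files in the datalake.

```python
def validate(row, schema):
    """Return (typed_record, None) on success or (None, error_message) on failure."""
    try:
        return {col: convert(row[col]) for col, convert in schema.items()}, None
    except (KeyError, ValueError) as exc:
        # KeyError: a declared column is missing; ValueError: a value has the wrong type.
        return None, f"{type(exc).__name__}: {exc}"


def ingest(staged_rows, schema):
    """Split staged rows into accepted typed records and rejected rows with a reason."""
    accepted, rejected = [], []
    for row in staged_rows:
        record, error = validate(row, schema)
        if error is None:
            accepted.append(record)
        else:
            rejected.append({"row": row, "error": error})
    return accepted, rejected


# Hypothetical schema and staged rows.
SCHEMA = {"id": int, "amount": float}
STAGED = [
    {"id": "1", "amount": "9.5"},  # valid
    {"id": "x", "amount": "3"},    # id is not an integer
    {"id": "3"},                   # amount is missing
]
accepted, rejected = ingest(STAGED, SCHEMA)
```

Keeping the rejected rows together with the reason for rejection makes it possible to report on ingestion quality instead of silently dropping bad input.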


starlake-ai / starlake
