308 open source projects that are alternatives to or similar to pangeo-forge-recipes

Example of an ETL Pipeline using Airflow

Stars: ✭ 24 (-62.5%)

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+3626.56%)

Mutual labels: etl, data-engineering

Dataform

Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift

Stars: ✭ 342 (+434.38%)

Mutual labels: etl, data-engineering

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (-26.56%)

Mutual labels: etl, data-engineering

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+96.88%)

Mutual labels: etl, data-engineering

hive-metastore-client

A client for connecting and running DDLs on hive metastore.

Stars: ✭ 37 (-42.19%)

Mutual labels: etl, data-engineering

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+179.69%)

Mutual labels: etl, data-engineering

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+125%)

Mutual labels: etl, data-engineering

etl manager

A Python package to create a database on the platform using our MoJ data warehousing framework

Stars: ✭ 14 (-78.12%)

Mutual labels: etl, data-engineering

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+23.44%)

Mutual labels: etl, data-engineering

xarray-beam

Distributed Xarray with Apache Beam

Stars: ✭ 83 (+29.69%)

Mutual labels: xarray, zarr

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (-17.19%)

Mutual labels: etl, data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+335.94%)

Mutual labels: etl, data-engineering

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-68.75%)

Mutual labels: etl, data-engineering

Benthos

Fancy stream processing made operationally mundane

Stars: ✭ 3,705 (+5689.06%)

Mutual labels: etl, data-engineering

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (-31.25%)

Mutual labels: etl, data-engineering

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (+20.31%)

Mutual labels: etl, data-engineering

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+889.06%)

Mutual labels: etl, data-engineering

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (+23.44%)

Mutual labels: etl, data-engineering

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (-65.62%)

Mutual labels: etl, data-engineering

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (+1.56%)

Mutual labels: etl, data-engineering

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+7585.94%)

Mutual labels: etl, data-engineering

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (-10.94%)

Mutual labels: etl, data-engineering

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+856.25%)

Mutual labels: etl, data-engineering

climate system

Notes and practicals for my "Physics of the Climate System" lecture

Stars: ✭ 13 (-79.69%)

Mutual labels: xarray

hypothesis-gufunc

Extension to hypothesis for testing numpy general universal functions

Stars: ✭ 32 (-50%)

Mutual labels: xarray

maxwell-sink

Consumes Maxwell-generated messages from Kafka and exports them to another MySQL instance.

Stars: ✭ 16 (-75%)

Mutual labels: etl

Data-Engineering-Projects

Personal Data Engineering Projects

Stars: ✭ 167 (+160.94%)

Mutual labels: data-engineering

carry

Python ETL (Extract-Transform-Load) tool / data migration tool

Stars: ✭ 115 (+79.69%)

Mutual labels: etl

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-60.94%)

Mutual labels: data-engineering

mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Stars: ✭ 37 (-42.19%)

Mutual labels: etl

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (-67.19%)

Mutual labels: etl

gcpy

Python toolkit for GEOS-Chem.

Stars: ✭ 34 (-46.87%)

Mutual labels: xarray

xpublish

Publish Xarray Datasets via a REST API.

Stars: ✭ 86 (+34.38%)

Mutual labels: xarray

ruby-for-pentaho-kettle

Ruby scripting for pentaho-kettle

Stars: ✭ 42 (-34.37%)

Mutual labels: etl

openrefine-docker

OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.

Stars: ✭ 19 (-70.31%)

Mutual labels: etl

spdr-etf-holdings

ETL for the SPDR ETF holdings XLS documents

Stars: ✭ 14 (-78.12%)

Mutual labels: etl

restee

Python package to call processed EE objects via the REST API to local data

Stars: ✭ 26 (-59.37%)

Mutual labels: xarray

persistity

A persistence framework for game developers

Stars: ✭ 34 (-46.87%)

Mutual labels: etl

koza

Data transformation framework for LinkML data models

Stars: ✭ 21 (-67.19%)

Mutual labels: etl

dswarm

An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

Stars: ✭ 57 (-10.94%)

Mutual labels: etl

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-78.12%)

Mutual labels: etl

openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

Stars: ✭ 67 (+4.69%)

Mutual labels: etl

es2postgres

ElasticSearch to PostgreSQL loader

Stars: ✭ 18 (-71.87%)

Mutual labels: etl

oic-options-chains

ETL for OIC Options Chains

Stars: ✭ 22 (-65.62%)

Mutual labels: etl

clisops

Climate Simulation Operations

Stars: ✭ 17 (-73.44%)

Mutual labels: xarray

oesophagus

Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)

Stars: ✭ 12 (-81.25%)

Mutual labels: etl

dflib

In-memory Java DataFrame library

Stars: ✭ 50 (-21.87%)

Mutual labels: etl

Addax

Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.

Stars: ✭ 615 (+860.94%)

Mutual labels: etl

wxee

A Python interface between Earth Engine and xarray for processing time series data

Stars: ✭ 113 (+76.56%)

Mutual labels: xarray

yt-channels-DS-AI-ML-CS

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

Stars: ✭ 1,038 (+1521.88%)

Mutual labels: data-engineering

astro

Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Stars: ✭ 79 (+23.44%)

Mutual labels: etl

mydataharbor

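🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any source to any destination. It helps users synchronize massive data volumes reliably, quickly, and stably, either as near-real-time incremental sync or as scheduled full sync. It is primarily positioned to serve real-time transaction systems and can also be used for big-data synchronization (the ETL domain).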

Stars: ✭ 28 (-56.25%)

Mutual labels: etl

aospy

Python package for automated analysis and management of gridded climate data