Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+10740.91%)

Mutual labels: etl, data-engineering

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+2777.27%)

Mutual labels: etl, data-engineering

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+2681.82%)

Mutual labels: etl, data-engineering

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+713.64%)

Mutual labels: etl, data-engineering

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+159.09%)

Mutual labels: etl, data-engineering

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Stars: ✭ 64 (+190.91%)

Mutual labels: etl, data-engineering

ruby-for-pentaho-kettle

Ruby scripting for pentaho-kettle

Stars: ✭ 42 (+90.91%)

Mutual labels: etl

koza

Data transformation framework for LinkML data models

Stars: ✭ 21 (-4.55%)

Mutual labels: etl

persistity

A persistence framework for game developers

Stars: ✭ 34 (+54.55%)

Mutual labels: etl

mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Stars: ✭ 37 (+68.18%)

Mutual labels: etl

es2postgres

ElasticSearch to PostgreSQL loader

Stars: ✭ 18 (-18.18%)

Mutual labels: etl

dswarm

An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

Stars: ✭ 57 (+159.09%)

Mutual labels: etl

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-36.36%)

Mutual labels: etl

oesophagus

Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)

Stars: ✭ 12 (-45.45%)

Mutual labels: etl

oic-options-chains

ETL for OIC Options Chains

Stars: ✭ 22 (+0%)

Mutual labels: etl

gamechanger-data

GAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements

Stars: ✭ 17 (-22.73%)

Mutual labels: etl

spdr-etf-holdings

ETL for the SPDR ETF holdings XLS documents

Stars: ✭ 14 (-36.36%)

Mutual labels: etl

dflib

In-memory Java DataFrame library

Stars: ✭ 50 (+127.27%)

Mutual labels: etl

mydataharbor

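🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any source to any destination. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization over massive data sets. It is positioned mainly to serve real-time transaction systems, and can also be used for big-data synchronization (the ETL domain).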

Stars: ✭ 28 (+27.27%)

Mutual labels: etl

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-27.27%)

Mutual labels: etl

PDAP-Scrapers

Code relating to scraping public police data.

Stars: ✭ 72 (+227.27%)

Mutual labels: etl

viewflow

Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Stars: ✭ 110 (+400%)

Mutual labels: data-engineering

openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

Stars: ✭ 67 (+204.55%)

Mutual labels: etl

DataEngineering

This repo contains commands that data engineers use in day-to-day work.

Stars: ✭ 47 (+113.64%)

Mutual labels: data-engineering

DataBridge.NET

Configurable data bridge for permanent ETL jobs

Stars: ✭ 16 (-27.27%)

Mutual labels: etl

go-bqloader

bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

Stars: ✭ 16 (-27.27%)

Mutual labels: etl

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…

Stars: ✭ 474 (+2054.55%)

Mutual labels: elt

cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Stars: ✭ 109 (+395.45%)

Mutual labels: etl

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (-4.55%)

Mutual labels: etl

open-semantic-desktop-search

Virtual Machine for Desktop Search with Open Semantic Search

Stars: ✭ 22 (+0%)

Mutual labels: etl

openrefine-docker

OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files for automated builds.

Stars: ✭ 19 (-13.64%)

Mutual labels: etl

yt-channels-DS-AI-ML-CS

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

Stars: ✭ 1,038 (+4618.18%)

Mutual labels: data-engineering

DataXServer

Adds remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) to DataX (https://github.com/alibaba/DataX)

Stars: ✭ 130 (+490.91%)

Mutual labels: etl

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (+9.09%)

Mutual labels: etl

neon-workshop

A Pachyderm deep learning tutorial for conference workshops

Stars: ✭ 19 (-13.64%)

Mutual labels: data-engineering

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-22.73%)

Mutual labels: etl

wrangle

A data transformation package for deep learning with Autonomio, Keras and TensorFlow.

Stars: ✭ 15 (-31.82%)

Mutual labels: etl

mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).

Stars: ✭ 32 (+45.45%)

Mutual labels: etl

TEAM

The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.

Stars: ✭ 27 (+22.73%)

Mutual labels: etl

1-60 of 281 similar projects
