A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

Stars: ✭ 612 (+3725%)

Mutual labels: data-engineering

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (+62.5%)

Mutual labels: data-engineering

DataEngineering

This repo contains commands that data engineers use in day-to-day work.

Stars: ✭ 47 (+193.75%)

Mutual labels: data-engineering

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+2637.5%)

Mutual labels: data-engineering

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (+193.75%)

Mutual labels: data-engineering

airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Stars: ✭ 111 (+593.75%)

Mutual labels: data-engineering

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

Stars: ✭ 64 (+300%)

Mutual labels: data-engineering

Gspread Pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Stars: ✭ 226 (+1312.5%)

Mutual labels: data-engineering

practical-data-engineering

Real estate Dagster pipeline

Stars: ✭ 110 (+587.5%)

Mutual labels: data-engineering

Yuniql

Free and open-source schema versioning and database migration made natively with .NET Core.

Stars: ✭ 156 (+875%)

Mutual labels: data-engineering

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (+175%)

Mutual labels: data-engineering

Data Engineering Howto

A list of useful resources to learn Data Engineering from scratch

Stars: ✭ 2,056 (+12750%)

Mutual labels: data-engineering

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (+381.25%)

Mutual labels: data-engineering

deordie-meetups

DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.

Stars: ✭ 48 (+200%)

Mutual labels: data-engineering

Pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

Stars: ✭ 127 (+693.75%)

Mutual labels: data-engineering

neon-workshop

A Pachyderm deep learning tutorial for conference workshops

Stars: ✭ 19 (+18.75%)

Mutual labels: data-engineering

contessa

Easy way to define, execute and store quality rules for your data.

Stars: ✭ 17 (+6.25%)

Mutual labels: data-engineering

mpc-DL-controller

Deep Neural Network architecture as a predictive optimal controller for a disturbance-afflicted {HVAC + solar cell + battery} system, compared with classic Model Predictive Control

Stars: ✭ 37 (+131.25%)

Mutual labels: data-engineering

papilo

DEPRECATED: Stream data processing micro-framework

Stars: ✭ 24 (+50%)

Mutual labels: data-engineering

h4sci-course

ETH PhD Program course

Stars: ✭ 19 (+18.75%)

Mutual labels: data-engineering

lrmr

Less-Resilient MapReduce framework for Go

Stars: ✭ 32 (+100%)

Mutual labels: data-engineering

arthur-redshift-etl

ELT Code for your Data Warehouse

Stars: ✭ 22 (+37.5%)

Mutual labels: data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+1643.75%)

Mutual labels: data-engineering

dbt-sugar

dbt-sugar is a CLI tool that lets dbt users have fun and perform actions around dbt models with ease

Stars: ✭ 139 (+768.75%)

Mutual labels: data-engineering

hive-metastore-client

A client for connecting to and running DDLs on the Hive metastore.

Stars: ✭ 37 (+131.25%)

Mutual labels: data-engineering

ml-in-production

Practical use cases for making your Machine Learning pipelines robust and reliable using Apache Airflow.

Stars: ✭ 29 (+81.25%)

Mutual labels: data-engineering

awesome-dbt

A curated list of awesome dbt resources

Stars: ✭ 520 (+3150%)

Mutual labels: data-engineering

funsies

funsies is a lightweight workflow engine 🔧

Stars: ✭ 37 (+131.25%)

Mutual labels: data-engineering

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (+1456.25%)

Mutual labels: data-engineering

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (+306.25%)

Mutual labels: data-engineering

Ploomber

A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.

Stars: ✭ 221 (+1281.25%)

Mutual labels: data-engineering

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+256.25%)

Mutual labels: data-engineering

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+1018.75%)

Mutual labels: data-engineering

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (+56.25%)

Mutual labels: data-engineering

Auptimizer

An automatic ML model optimization tool.

Stars: ✭ 166 (+937.5%)

Mutual labels: data-engineering

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (+231.25%)

Mutual labels: data-engineering

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+850%)

Mutual labels: data-engineering

yt-channels-DS-AI-ML-CS

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

Stars: ✭ 1,038 (+6387.5%)

Mutual labels: data-engineering

Gcp Data Engineer Exam

Study materials for the Google Cloud Professional Data Engineering Exam

Stars: ✭ 144 (+800%)

Mutual labels: data-engineering

Azure-Certification-DP-200

Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution

Stars: ✭ 54 (+237.5%)

Mutual labels: data-engineering

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.