A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

Stars: ✭ 612 (+2010.34%)

Mutual labels: data-engineering

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-10.34%)

Mutual labels: data-engineering

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+96.55%)

Mutual labels: data-engineering

soda-spark

Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

Stars: ✭ 58 (+100%)

Mutual labels: data-engineering

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (+51.72%)

Mutual labels: data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+862.07%)

Mutual labels: data-engineering

CogStack-NiFi

Building data processing pipelines for document processing with NLP using Apache NiFi and related services

Stars: ✭ 22 (-24.14%)

Mutual labels: data-pipelines

hive-metastore-client

A client for connecting to the Hive metastore and running DDL statements on it.

Stars: ✭ 37 (+27.59%)

Mutual labels: data-engineering

pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 970 (+3244.83%)

Mutual labels: data-engineering

awesome-dbt

A curated list of awesome dbt resources

Stars: ✭ 520 (+1693.1%)

Mutual labels: data-engineering

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (+82.76%)

Mutual labels: data-engineering

Data-Engineering-Projects

Personal Data Engineering Projects

Stars: ✭ 167 (+475.86%)

Mutual labels: data-engineering

Ploomber

A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.

Stars: ✭ 221 (+662.07%)

Mutual labels: data-engineering

Azure-Certification-DP-200

Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution

Stars: ✭ 54 (+86.21%)

Mutual labels: data-engineering

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+517.24%)

Mutual labels: data-engineering

datajoint-python

Relational data pipelines for the science lab

Stars: ✭ 140 (+382.76%)

Mutual labels: data-pipelines

Auptimizer

An automatic ML model optimization tool.

Stars: ✭ 166 (+472.41%)

Mutual labels: data-engineering

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (+165.52%)

Mutual labels: data-engineering

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+424.14%)

Mutual labels: data-engineering

Data Engineering Howto

A list of useful resources to learn Data Engineering from scratch

Stars: ✭ 2,056 (+6989.66%)

Mutual labels: data-engineering

Gcp Data Engineer Exam

Study materials for the Google Cloud Professional Data Engineering Exam

Stars: ✭ 144 (+396.55%)

Mutual labels: data-engineering

awesome-bigquery-views

Useful SQL queries for Blockchain ETL datasets in BigQuery.

Stars: ✭ 325 (+1020.69%)

Mutual labels: data-engineering

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (+62.07%)

Mutual labels: data-engineering

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (+372.41%)

Mutual labels: data-engineering

deordie-meetups

DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.

Stars: ✭ 48 (+65.52%)

Mutual labels: data-engineering

Airflow Autoscaling Ecs

Airflow Deployment on AWS ECS Fargate Using Cloudformation

Stars: ✭ 136 (+368.97%)

Mutual labels: data-engineering

rivery cli

Rivery CLI

Stars: ✭ 16 (-44.83%)

Mutual labels: data-pipelines

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-13.79%)

Mutual labels: data-engineering

fairflow

Functional Airflow DAG definitions.

Stars: ✭ 38 (+31.03%)

Mutual labels: apache-airflow

Pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

Stars: ✭ 127 (+337.93%)

Mutual labels: data-engineering

