adilkhash / Data Engineering Howto

A list of useful resources to learn Data Engineering from scratch

Projects that are alternatives to or similar to Data Engineering Howto

AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-99.03%)
Mutual labels:  data-engineering, data-pipeline
practical-data-engineering
Real estate dagster pipeline
Stars: ✭ 110 (-94.65%)
Mutual labels:  data-engineering, data-pipeline
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-98.78%)
Mutual labels:  data-engineering, data-pipeline
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-93.87%)
Mutual labels:  data-engineering
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+25.29%)
Mutual labels:  data-engineering
Saltie
🚗 Rocket League Distributed Deep Reinforcement Learning Bot
Stars: ✭ 134 (-93.48%)
Mutual labels:  distributed-systems
Examples
DC/OS examples
Stars: ✭ 139 (-93.24%)
Mutual labels:  distributed-systems
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+16%)
Mutual labels:  data-engineering
Airflow Autoscaling Ecs
Airflow Deployment on AWS ECS Fargate Using Cloudformation
Stars: ✭ 136 (-93.39%)
Mutual labels:  data-engineering
Go Grpc
A simpler grpc framework
Stars: ✭ 133 (-93.53%)
Mutual labels:  distributed-systems
Go Archaius
A dynamic configuration framework used in distributed systems
Stars: ✭ 133 (-93.53%)
Mutual labels:  distributed-systems
Panic Server
Testing for collaborative apps and tools
Stars: ✭ 128 (-93.77%)
Mutual labels:  distributed-systems
Temporal
Temporal service
Stars: ✭ 3,212 (+56.23%)
Mutual labels:  distributed-systems
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-93.82%)
Mutual labels:  data-engineering
Mit6.824 distributedsystem
MIT 6.824 Distributed Systems (Fall 2018)
Stars: ✭ 135 (-93.43%)
Mutual labels:  distributed-systems
Faust
Python Stream Processing. A Faust fork
Stars: ✭ 124 (-93.97%)
Mutual labels:  distributed-systems
Swim Js
JavaScript implementation of SWIM membership protocol
Stars: ✭ 135 (-93.43%)
Mutual labels:  distributed-systems
Rucio
Rucio - Scientific Data Management
Stars: ✭ 131 (-93.63%)
Mutual labels:  distributed-systems
Kronos
Distributed Time Synchronization Service
Stars: ✭ 131 (-93.63%)
Mutual labels:  distributed-systems
Vertx In Action
Examples for the Manning "Vert.x in Action" book
Stars: ✭ 134 (-93.48%)
Mutual labels:  distributed-systems

How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, a series about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
  • Apache Spark is a unified analytics engine for large-scale data processing
  • Apache Kafka is a distributed streaming platform
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease
  • lakeFS lets you build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics
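
Several of the tools above (Airflow, Luigi, Dagster, Prefect) share one core idea: a pipeline is a set of tasks plus the dependencies between them, and the tool runs each task once its upstream tasks have finished. The sketch below illustrates that idea using only the Python standard library (`graphlib`, Python 3.9+). It does not use any of these tools' APIs; the task names, the `TASKS` layout, the `run_pipeline` helper and the sample rows are all made up for illustration.

```python
from graphlib import TopologicalSorter


def extract():
    # Made-up sample rows standing in for a real data source.
    return [{"city": "A", "temp_c": 21}, {"city": "B", "temp_c": 15}]


def transform(rows):
    # Add a Fahrenheit column to every row.
    return [{**row, "temp_f": row["temp_c"] * 9 / 5 + 32} for row in rows]


def load(rows):
    # Stand-in for writing to a database or a data lake.
    return f"loaded {len(rows)} rows"


# Hypothetical layout: task name -> (callable, list of upstream task names).
TASKS = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_pipeline(tasks):
    """Run every task after its upstream tasks, passing their results in."""
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results = {}
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_pipeline(TASKS)
print(results["load"])
```

Real orchestrators add what this sketch leaves out, such as scheduling, retries, logging and distributed execution, which is the main reason to reach for one of them instead of a hand-rolled runner.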

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests
