Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

✭ 2,385

python Jupyter Notebook aws mysql data-science aws-lambda pandas lambda etl data-engineering redshift emr athena apache-parquet amazon-athena apache-arrow aws-glue glue-catalog amazon-sagemaker-notebook

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

✭ 122

scala data-science spark data-engineering

D6t Python

Accelerate data science

✭ 118

html data-science pandas data-engineering

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

✭ 1,511

javascript CSS Makefile json data-science dashboard data-visualization data chart csv big-data yaml d3 d3js data-engineering gist business-intelligence data-driven github-gist just-dashboard

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Applied Ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

✭ 17,824

deep-learning machine-learning computer-vision data-science natural-language-processing reinforcement-learning search data-engineering production data-discovery recsys data-quality applied-data-science applied-machine-learning

Dataengineeringproject

Example end to end data engineering project.

✭ 82

python hacktoberfest redis mongodb elasticsearch kafka big-data s3 django-rest-framework scraping airflow data-engineering kafka-connect minio

Setl

A simple Spark-powered ETL framework that just works 🍺

✭ 79

scala machine-learning framework data-science dataset spark data-analysis big-data pipeline etl data-engineering modularization

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

✭ 79

python data-science automation sql analytics etl data-engineering

Ansible Playbook

Ansible playbook to deploy distributed technologies

✭ 61

python aws devops ansible kafka zookeeper data-engineering ansible-playbooks

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

✭ 60

scala spark hadoop data-engineering

Quilt

Quilt is a self-organizing data hub for S3

✭ 1,007

python jupyter-notebook data serialization data-engineering parquet

Dbt Sqlserver

dbt adapter for SQL Server and Azure SQL

✭ 41

python microsoft sql-server data-engineering mssql

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

✭ 864

jupyter-notebook machine-learning data-science data-visualization data-analysis cloud-computing data-engineering data-processing

Lakefs

Git-like capabilities for your object storage

✭ 847

go aws-s3 data-engineering object-storage

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

✭ 793

python spark s3 scheduler apache-spark airflow data-engineering etl-framework redshift

Prefect

The easiest way to automate your data

✭ 7,956

python data-science automation workflow infrastructure workflow-engine data-engineering orchestration orion prefect data-ops ml-ops

Pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

✭ 647

python hacktoberfest data pandas dataframe data-engineering pydata

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

✭ 633

python data-science spark etl data-engineering pyspark

Pointblank

Data validation and organization of metadata for data frames and database tables

✭ 480

r mysql postgresql spark sqlite easy-to-use data-engineering mssql data-frame data-validation

Data Engineering Book

Accumulated knowledge and experience in the field of Data Engineering

✭ 471

data engineering data-engineering

Udacity Data Engineering Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

✭ 458

python aws data postgres cluster infrastructure cassandra aws-s3 cloudformation airflow data-engineering aws-sdk aws-ec2 postgresql-database

Active workflow

Turn complex requirements into workflows without leaving the comfort of your technology stack.

✭ 413

ruby workflow scheduler event-driven scheduling data-engineering ifttt

Awesome Opensource Data Engineering

An Awesome List of Open-Source Data Engineering Projects

✭ 381

awesome-list data-engineering

Learn Something Every Day

📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.

✭ 362

css aws data-science blog algorithm learning education unix math research computer-science mathematics engineering software-engineering data-engineering educational university course-materials

Dataform

Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift

✭ 342

typescript hacktoberfest analytics etl data-engineering business-intelligence

Egeria

Open Metadata and Governance

✭ 328

java hacktoberfest data-engineering

Benthos

Fancy stream processing made operationally mundane

✭ 3,705

go javascript CSS HTML shell Makefile kafka rabbitmq cqrs event-sourcing etl stream-processing amqp message-queue logs data-engineering nats message-bus streaming-data stream-processor data-ops

Around Dataengineering

A Data Engineering & Machine Learning Knowledge Hub

✭ 257

machine-learning devops spark infrastructure datascience airflow data-engineering

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Feast

Feature Store for Machine Learning

✭ 2,576

python java shell HCL go Makefile machine-learning big-data ml feature-engineering features data-science data-engineering mlops feature-store

Cookbook

The Data Engineering Cookbook

✭ 9,829

big-data best-practices cookbook data-engineering data-engineer

etl manager

A Python package to create a database on the platform using our MoJ data warehousing framework

✭ 14

python etl data-engineering

ClassifyBot

Automate building ML classification pipelines in .NET

✭ 16

C# python machine-learning dotnet data-engineering

AirflowDataPipeline

Example of an ETL Pipeline using Airflow

✭ 24

python airflow etl postgresql data-engineering data-pipelines

beneath

Beneath is a serverless real-time data platform ⚡️

✭ 65

go typescript python java Jupyter Notebook javascript kubernetes data-science streaming sql etl analytics data-warehouse data-engineering dataops developer-tools data-pipelines mlops beneath

growthbook

Open Source Feature Flagging and A/B Testing Platform

arthur-redshift-etl

ELT Code for your Data Warehouse

✭ 22

python shell javascript Dockerfile CSS HTML open-source aws etl data-engineering elt arthur

pangeo-forge-recipes

Python library for building Pangeo Forge recipes.

✭ 64

python cloud etl xarray data-engineering zarr

yt-channels-DS-AI-ML-CS

A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.

✭ 1,038

data-science data machine-learning youtube awesome statistics web-development programming ai deep-learning math ml resources artificial-intelligence data-engineering coding data-analysis software-engineering awesome-list

Kaggle-project-list

Summary of my projects on kaggle

✭ 20

python data-science data-mining deep-learning text-classification data-engineering kaggle-competition

mpc-DL-controller

Deep Neural Network architecture as a predictive optimal controller for a disturbance-afflicted {HVAC + solar cell + battery} system, compared with classic Model Predictive Control

✭ 37

python deep-learning keras data-engineering robust-optimization optimal-control simulated-data keras-tensorflow model-predictive-control casadi

DataEngineering

This repo contains commands that data engineers use in day to day work.

✭ 47

python linux docker aws devops terraform pyspark data-engineering hacktoberfest eks hacktoberfest2020

1-60 of 96 data-engineering projects
