Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).

Stars: ✭ 2,385 (+12452.63%)

Mutual labels: data-engineering

D6t Python

Accelerate data science

Stars: ✭ 118 (+521.05%)

Mutual labels: data-engineering

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Stars: ✭ 42,634 (+224289.47%)

Mutual labels: data-engineering

datart

Datart is a next-generation Data Visualization Open Platform

Stars: ✭ 1,042 (+5384.21%)

Mutual labels: data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+1368.42%)

Mutual labels: data-engineering

Dataengineeringproject

Example end to end data engineering project.

Stars: ✭ 82 (+331.58%)

Mutual labels: data-engineering

airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Stars: ✭ 111 (+484.21%)

Mutual labels: data-engineering

prefect-saturn

Python client for using Prefect Cloud with Saturn Cloud

Stars: ✭ 15 (-21.05%)

Mutual labels: data-engineering

Elastik Nearest Neighbors

Go to: https://github.com/alexklibisz/elastiknn

Stars: ✭ 249 (+1210.53%)

Mutual labels: data-engineering

preprocessy

Python package for Customizable Data Preprocessing Pipelines

Stars: ✭ 34 (+78.95%)

Mutual labels: data-engineering

Gspread Pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Stars: ✭ 226 (+1089.47%)

Mutual labels: data-engineering

deordie-meetups

DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.

Stars: ✭ 48 (+152.63%)

Mutual labels: data-engineering

Soda Sql

Metric collection, data testing and monitoring for SQL accessible data

Stars: ✭ 173 (+810.53%)

Mutual labels: data-engineering

datajoint-python

Relational data pipelines for the science lab

Stars: ✭ 140 (+636.84%)

Mutual labels: data-pipelines

Yuniql

Free and open source schema versioning and database migration made natively with .NET Core.

Stars: ✭ 156 (+721.05%)

Mutual labels: data-engineering

contessa

Easy way to define, execute and store quality rules for your data.

Stars: ✭ 17 (-10.53%)

Mutual labels: data-engineering

Data Engineering Nanodegree

Projects done in the Data Engineering Nanodegree by Udacity.com

Stars: ✭ 151 (+694.74%)

Mutual labels: data-engineering

CogStack-NiFi

Building data processing pipelines for document processing with NLP using Apache NiFi and related services

Stars: ✭ 22 (+15.79%)

Mutual labels: data-pipelines

Data Engineering Howto

A list of useful resources to learn Data Engineering from scratch

Stars: ✭ 2,056 (+10721.05%)

Mutual labels: data-engineering

Everything-Tech

A collection of online resources to help you on your Tech journey.

Stars: ✭ 396 (+1984.21%)

Mutual labels: data-engineering

Airflow Autoscaling Ecs

Airflow Deployment on AWS ECS Fargate Using CloudFormation

Stars: ✭ 136 (+615.79%)

Mutual labels: data-engineering

h4sci-course

ETH PhD Program course

Stars: ✭ 19 (+0%)

Mutual labels: data-engineering

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+563.16%)

Mutual labels: data-engineering

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (+36.84%)

Mutual labels: data-engineering

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (+542.11%)

Mutual labels: data-engineering

polygon-etl

ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub

Stars: ✭ 53 (+178.95%)

Mutual labels: data-engineering

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+7852.63%)

Mutual labels: data-engineering

soda-spark

Soda Spark is a PySpark library that helps you test your data in Spark DataFrames

Stars: ✭ 58 (+205.26%)

Mutual labels: data-engineering

Applied Ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Stars: ✭ 17,824 (+93710.53%)

Mutual labels: data-engineering

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (+147.37%)

Mutual labels: data-engineering

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+2205.26%)

Mutual labels: data-engineering

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+315.79%)

Mutual labels: data-engineering

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (+315.79%)

Mutual labels: data-engineering

Ansible Playbook

Ansible playbook to deploy distributed technologies

Stars: ✭ 61 (+221.05%)

Mutual labels: data-engineering

Azure-Certification-DP-200

Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution

Stars: ✭ 54 (+184.21%)

Mutual labels: data-engineering

hive-metastore-client

A client for connecting to the Hive Metastore and running DDLs on it.

Stars: ✭ 37 (+94.74%)

Mutual labels: data-engineering

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (+215.79%)

Mutual labels: data-engineering

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+5200%)

Mutual labels: data-engineering

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (+5.26%)

Mutual labels: data-engineering

hamilton

A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

Stars: ✭ 612 (+3121.05%)

Mutual labels: data-engineering
