Bulk Stash is a Docker rclone service to sync, or copy, files between different storage services. For example, you can copy files between remote storage services, such as from Amazon S3 to Google Cloud Storage, or from your laptop to remote storage.

Stars: ✭ 113 (+2.73%)

Mutual labels: data-pipeline

Airflow Autoscaling Ecs

Airflow Deployment on AWS ECS Fargate Using Cloudformation

Stars: ✭ 136 (+23.64%)

Mutual labels: data-engineering

hive-metastore-client

A client for connecting to the Hive Metastore and running DDL statements on it.

Stars: ✭ 37 (-66.36%)

Mutual labels: data-engineering

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (+10.91%)

Mutual labels: data-engineering

Applied Ml

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Stars: ✭ 17,824 (+16103.64%)

Mutual labels: data-engineering

Gspread Pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

Stars: ✭ 226 (+105.45%)

Mutual labels: data-engineering

big-data-engineering-indonesia

A curated list of big data engineering tools, resources and communities.

Stars: ✭ 26 (-76.36%)

Mutual labels: data-engineering

Yuniql

Free and open source schema versioning and database migration made natively with .NET Core.

Stars: ✭ 156 (+41.82%)

Mutual labels: data-engineering

deordie-meetups

DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.

Stars: ✭ 48 (-56.36%)

Mutual labels: data-engineering

Gcp Data Engineer Exam

Study materials for the Google Cloud Professional Data Engineering Exam

Stars: ✭ 144 (+30.91%)

Mutual labels: data-engineering

qsv

CSVs sliced, diced & analyzed.

Stars: ✭ 438 (+298.18%)

Mutual labels: data-engineering

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+14.55%)

Mutual labels: data-engineering

Azure-Certification-DP-200

Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution

Stars: ✭ 54 (-50.91%)

Mutual labels: data-engineering

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+1273.64%)

Mutual labels: data-engineering

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Stars: ✭ 42,634 (+38658.18%)

Mutual labels: data-engineering

contessa

Easy way to define, execute and store quality rules for your data.

Stars: ✭ 17 (-84.55%)

Mutual labels: data-engineering

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-28.18%)

Mutual labels: data-engineering

awesome-dbt

A curated list of awesome dbt resources

Stars: ✭ 520 (+372.73%)

Mutual labels: data-engineering

Ansible Playbook

Ansible playbook to deploy distributed technologies

Stars: ✭ 61 (-44.55%)

Mutual labels: data-engineering

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+815.45%)

Mutual labels: data-engineering

Ploomber

A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.

Stars: ✭ 221 (+100.91%)

Mutual labels: data-engineering

saisoku

Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.

Stars: ✭ 40 (-63.64%)

Mutual labels: data-pipeline

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+62.73%)

Mutual labels: data-engineering

awesome-bigquery-views

Useful SQL queries for Blockchain ETL datasets in BigQuery.

Stars: ✭ 325 (+195.45%)

Mutual labels: data-engineering

Auptimizer

An automatic ML model optimization tool.

Stars: ✭ 166 (+50.91%)

Mutual labels: data-engineering

lrmr

Less-Resilient MapReduce framework for Go

Stars: ✭ 32 (-70.91%)

Mutual labels: data-engineering

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (+38.18%)

Mutual labels: data-engineering

datart

Datart is a next generation Data Visualization Open Platform

Stars: ✭ 1,042 (+847.27%)

Mutual labels: data-engineering

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+153.64%)

Mutual labels: data-engineering

airflow-dbt-python

A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.

Stars: ✭ 111 (+0.91%)

Mutual labels: data-engineering

Data Science On Gcp

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

Stars: ✭ 864 (+685.45%)

Mutual labels: data-engineering

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (+24.55%)

Mutual labels: data-engineering

machine-learning-data-pipeline

Pipeline module for parallel real-time data processing, for developing machine learning models and running them in production.

Stars: ✭ 22 (-80%)

Mutual labels: data-pipeline

Pipelinex

PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more

Stars: ✭ 127 (+15.45%)

Mutual labels: data-engineering

dc-sdk-js

A data collection SDK for the browser environment

Stars: ✭ 52 (-52.73%)

Mutual labels: data-pipeline

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+2068.18%)

Mutual labels: data-engineering

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-64.55%)

Mutual labels: data-pipeline

D6t Python

Accelerate data science

Stars: ✭ 118 (+7.27%)

Mutual labels: data-engineering

datajob

Build and deploy a serverless data pipeline on AWS with no effort.

Stars: ✭ 101 (-8.18%)

Mutual labels: data-pipeline

get smarties

Dummy variable generation with fit/transform capabilities

Stars: ✭ 23 (-79.09%)

Mutual labels: data-engineering

Dataengineeringproject

Example end-to-end data engineering project.

Stars: ✭ 82 (-25.45%)

Mutual labels: data-engineering

aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript

Stars: ✭ 141 (+28.18%)

Mutual labels: data-pipeline

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (-28.18%)

Mutual labels: data-engineering

morph-kgc

Powerful RDF Knowledge Graph Generation with [R2]RML Mappings

Stars: ✭ 77 (-30%)

Mutual labels: data-engineering

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.