116 open source projects that are alternatives to or similar to practical-data-engineering

Data Engineering Howto
A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+1769.09%)
Mutual labels:  data-engineering, data-pipeline
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-81.82%)
Mutual labels:  data-engineering, data-pipeline
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-77.27%)
Mutual labels:  data-engineering, data-pipeline
Data Engineering Nanodegree
Projects done in the Data Engineering Nanodegree by Udacity.com
Stars: ✭ 151 (+37.27%)
Mutual labels:  data-engineering
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+57.27%)
Mutual labels:  data-engineering
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-47.27%)
Mutual labels:  data-engineering
ob bulkstash
Bulk Stash is a Docker rclone service to sync or copy files between different storage services. For example, you can copy files between remote storage services such as Amazon S3 and Google Cloud Storage, or from your laptop to remote storage.
Stars: ✭ 113 (+2.73%)
Mutual labels:  data-pipeline
Airflow Autoscaling Ecs
Airflow Deployment on AWS ECS Fargate Using Cloudformation
Stars: ✭ 136 (+23.64%)
Mutual labels:  data-engineering
hive-metastore-client
A client for connecting and running DDLs on hive metastore.
Stars: ✭ 37 (-66.36%)
Mutual labels:  data-engineering
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (+10.91%)
Mutual labels:  data-engineering
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+16103.64%)
Mutual labels:  data-engineering
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (+105.45%)
Mutual labels:  data-engineering
big-data-engineering-indonesia
A curated list of big data engineering tools, resources and communities.
Stars: ✭ 26 (-76.36%)
Mutual labels:  data-engineering
Yuniql
Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (+41.82%)
Mutual labels:  data-engineering
deordie-meetups
DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (-56.36%)
Mutual labels:  data-engineering
Gcp Data Engineer Exam
Study materials for the Google Cloud Professional Data Engineering Exam
Stars: ✭ 144 (+30.91%)
Mutual labels:  data-engineering
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+298.18%)
Mutual labels:  data-engineering
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+14.55%)
Mutual labels:  data-engineering
Azure-Certification-DP-200
Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
Stars: ✭ 54 (-50.91%)
Mutual labels:  data-engineering
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1273.64%)
Mutual labels:  data-engineering
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+38658.18%)
Mutual labels:  data-engineering
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-84.55%)
Mutual labels:  data-engineering
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-28.18%)
Mutual labels:  data-engineering
awesome-dbt
A curated list of awesome dbt resources
Stars: ✭ 520 (+372.73%)
Mutual labels:  data-engineering
Ansible Playbook
Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (-44.55%)
Mutual labels:  data-engineering
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+815.45%)
Mutual labels:  data-engineering
Ploomber
A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (+100.91%)
Mutual labels:  data-engineering
saisoku
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Stars: ✭ 40 (-63.64%)
Mutual labels:  data-pipeline
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (+62.73%)
Mutual labels:  data-engineering
awesome-bigquery-views
Useful SQL queries for Blockchain ETL datasets in BigQuery.
Stars: ✭ 325 (+195.45%)
Mutual labels:  data-engineering
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (+50.91%)
Mutual labels:  data-engineering
lrmr
Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (-70.91%)
Mutual labels:  data-engineering
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+38.18%)
Mutual labels:  data-engineering
datart
Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+847.27%)
Mutual labels:  data-engineering
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+153.64%)
Mutual labels:  data-engineering
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+0.91%)
Mutual labels:  data-engineering
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+685.45%)
Mutual labels:  data-engineering
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+24.55%)
Mutual labels:  data-engineering
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
Stars: ✭ 22 (-80%)
Mutual labels:  data-pipeline
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (+15.45%)
Mutual labels:  data-engineering
dc-sdk-js
A data collection SDK for the browser environment
Stars: ✭ 52 (-52.73%)
Mutual labels:  data-pipeline
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+2068.18%)
Mutual labels:  data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-64.55%)
Mutual labels:  data-pipeline
D6t Python
Accelerate data science
Stars: ✭ 118 (+7.27%)
Mutual labels:  data-engineering
datajob
Build and deploy a serverless data pipeline on AWS with no effort.
Stars: ✭ 101 (-8.18%)
Mutual labels:  data-pipeline
get smarties
Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (-79.09%)
Mutual labels:  data-engineering
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-25.45%)
Mutual labels:  data-engineering
aws-pdf-textract-pipeline
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript
Stars: ✭ 141 (+28.18%)
Mutual labels:  data-pipeline
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-28.18%)
Mutual labels:  data-engineering
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-30%)
Mutual labels:  data-engineering
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-45.45%)
Mutual labels:  data-engineering
scicloj.ml
A Clojure machine learning library
Stars: ✭ 152 (+38.18%)
Mutual labels:  data-pipeline
Dbt Sqlserver
dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (-62.73%)
Mutual labels:  data-engineering
Everything-Tech
A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+260%)
Mutual labels:  data-engineering
Lakefs
Git-like capabilities for your object storage
Stars: ✭ 847 (+670%)
Mutual labels:  data-engineering
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+126.36%)
Mutual labels:  data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-51.82%)
Mutual labels:  data-engineering
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-83.64%)
Mutual labels:  data-pipeline
prefect-saturn
Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-86.36%)
Mutual labels:  data-engineering
papilo
DEPRECATED: Stream data processing micro-framework
Stars: ✭ 24 (-78.18%)
Mutual labels:  data-engineering