Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-60.1%)
Gcp Data Engineer Exam: Study materials for the Google Cloud Professional Data Engineering Exam
Stars: ✭ 144 (-62.2%)
Data Engineering Howto: A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+439.63%)
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-64.04%)
Pipelinex: PipelineX is a Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-66.67%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (-66.93%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+525.98%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-67.98%)
D6t Python: Accelerate data science
Stars: ✭ 118 (-69.03%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+296.59%)
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+11090.03%)
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+4578.22%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-79.27%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-79.27%)
Ansible Playbook: Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (-83.99%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-84.25%)
Quilt: Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+164.3%)
Dbt Sqlserver: dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (-89.24%)
Data Science On Gcp: Source code accompanying the book Data Science on the Google Cloud Platform (Valliappa Lakshmanan, O'Reilly, 2017)
Stars: ✭ 864 (+126.77%)
Lakefs: Git-like capabilities for your object storage
Stars: ✭ 847 (+122.31%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+108.14%)
Prefect: The easiest way to automate your data
Stars: ✭ 7,956 (+1988.19%)
Pyjanitor: Clean APIs for data cleaning. Python implementation of the R package Janitor
Stars: ✭ 647 (+69.82%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+66.14%)
Pointblank: Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+25.98%)
Data Engineering Book: Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+23.62%)
Udacity Data Engineering Projects: A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+20.21%)
Active workflow: Turn complex requirements into workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (+8.4%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+1191.08%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+576.12%)
Cookbook: The Data Engineering Cookbook
Stars: ✭ 9,829 (+2479.79%)