
alanchn31 / Data-Engineering-Projects

Licence: other
Personal Data Engineering Projects

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives of or similar to Data-Engineering-Projects

jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-85.03%)
Mutual labels:  airflow, data-engineering, data-lake, data-modeling
Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+174.25%)
Mutual labels:  postgres, airflow, cassandra, data-engineering
Data Engineering Nanodegree
Projects done in the Data Engineering Nanodegree by Udacity.com
Stars: ✭ 151 (-9.58%)
Mutual labels:  postgres, cassandra, data-engineering
Beyond Jupyter
🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Stars: ✭ 135 (-19.16%)
Mutual labels:  postgres, airflow
Migrate
Database migrations. CLI and Golang library.
Stars: ✭ 7,712 (+4517.96%)
Mutual labels:  postgres, cassandra
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-52.69%)
Mutual labels:  postgres, airflow
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+3.59%)
Mutual labels:  airflow, data-engineering
Migrate
Database migrations. CLI and Golang library.
Stars: ✭ 2,315 (+1286.23%)
Mutual labels:  postgres, cassandra
Quill
Compile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+1096.41%)
Mutual labels:  postgres, cassandra
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-33.53%)
Mutual labels:  airflow, data-engineering
Azure-Certification-DP-200
Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
Stars: ✭ 54 (-67.66%)
Mutual labels:  data-engineering, data-lake
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-88.02%)
Mutual labels:  airflow, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-68.26%)
Mutual labels:  airflow, data-engineering
Sqlpad
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Stars: ✭ 4,113 (+2362.87%)
Mutual labels:  postgres, cassandra
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-34.13%)
Mutual labels:  airflow, data-engineering
Data Science Stack Cookiecutter
🐳📊🤓Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, AirFlow & API Star)
Stars: ✭ 153 (-8.38%)
Mutual labels:  postgres, airflow
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-46.71%)
Mutual labels:  airflow, cassandra
Airflow Autoscaling Ecs
Airflow Deployment on AWS ECS Fargate Using Cloudformation
Stars: ✭ 136 (-18.56%)
Mutual labels:  airflow, data-engineering
pipeline
PipelineAI Kubeflow Distribution
Stars: ✭ 4,154 (+2387.43%)
Mutual labels:  airflow, cassandra
Cassandra-Data-Modeling
Basic Rules of Cassandra Data Modeling
Stars: ✭ 29 (-82.63%)
Mutual labels:  cassandra, data-modeling

Description


  • This repo contains projects that apply data engineering principles.
  • Notes taken during the course can be found in the folder "0. Back to Basics".

Projects


  1. Postgres ETL ✔️
  • This project looks at data modelling for Sparkify, a fictitious music startup, applying a star schema to the ingested data to simplify queries that answer the business questions the product owner may have.
  2. Cassandra ETL ✔️
  • In the realm of big data, Cassandra, a NoSQL database, can ingest large amounts of data. This project takes a query-centric approach, modelling tables in Cassandra around the queries needed to answer business questions about a music app.
  3. Web Scraping using Scrapy, MongoDB ETL ✔️
  • One way to store semi-structured data is as documents. MongoDB supports this, grouping related documents into a collection. Each document contains fields of data that can be queried.
  • In this project, data is scraped from a book-listing website using Scrapy. Each book's fields, such as its price, rating, and availability, are stored as a document in the books collection in MongoDB.
  4. Data Warehousing with AWS Redshift ✔️
  • This project creates a data warehouse in AWS Redshift. A data warehouse provides a reliable and consistent foundation for users to query data and answer business questions based on requirements.
  5. Data Lake with Spark & AWS S3 ✔️
  • This project creates a data lake in AWS S3 using Spark.
  • Why create a data lake? A data lake provides a reliable store for large amounts of data, whether unstructured, semi-structured, or structured. In this project, we ingest JSON files, denormalize them into fact and dimension tables, and upload them to an AWS S3 data lake as Parquet files.
  6. Data Pipelining with Airflow ✔️
  • This project schedules data pipelines that perform ETL from JSON files in S3 to Redshift using Airflow.
  • Why use Airflow? Because Airflow lets workflows be defined as code, they become more maintainable, versionable, testable, and collaborative.
  7. Capstone Project ✔️
  • This project is the finale of Udacity's Data Engineering Nanodegree. Udacity provides a default dataset, but I chose to embark on my own project.
  • My project builds a movies data warehouse, which can be used to build a movie recommendation system and to predict box-office earnings. View the project here: Movies Data Warehouse
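The star schema idea from the Postgres ETL project can be sketched as follows. This is a minimal illustration, not the project's code: it uses Python's built-in `sqlite3` as a stand-in for Postgres so it runs without a server, and the tables (`songplays`, `users`, `songs`) and their columns are hypothetical and trimmed to a few fields. The point is that a business question becomes one fact table joined to a small number of dimension tables.

```python
import sqlite3

# sqlite3 stands in for Postgres so this runs without a server.
# Hypothetical, trimmed schema: one fact table (songplays), two dimensions.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, level TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    song_id TEXT REFERENCES songs(song_id)
);
"""


def build_star_schema(conn):
    """Create the tables and load a few illustrative rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "paid"), (2, "free")])
    conn.executemany("INSERT INTO songs VALUES (?, ?)", [("S1", "Song A"), ("S2", "Song B")])
    conn.executemany(
        "INSERT INTO songplays (user_id, song_id) VALUES (?, ?)",
        [(1, "S1"), (1, "S1"), (2, "S1"), (1, "S2")],
    )


def top_songs_for_level(conn, level):
    """Business question: which songs do users on a given plan play most?"""
    # Each dimension is a single join away from the fact table.
    return conn.execute(
        """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays f
        JOIN users u ON f.user_id = u.user_id
        JOIN songs s ON f.song_id = s.song_id
        WHERE u.level = ?
        GROUP BY s.title
        ORDER BY plays DESC, s.title
        """,
        (level,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(top_songs_for_level(conn, "paid"))
```

Against the rows above, the paid-plan query returns `[('Song A', 2), ('Song B', 1)]`.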
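For the Cassandra ETL project, query-centric modelling means designing each table around the query it serves: the columns in the `WHERE` clause become the primary key. The sketch below is an assumption-laden illustration. The `session_songs` table and its columns are hypothetical, and the `SessionSongsTable` class is an in-memory analogy of how partitions and clustering columns behave, not the Cassandra driver API.

```python
from collections import defaultdict

# Hypothetical CQL: the table exists to answer
#   SELECT artist, song_title, song_length FROM session_songs
#   WHERE session_id = ... AND item_in_session = ...
# so the PRIMARY KEY mirrors that WHERE clause: session_id is the partition key
# and item_in_session is a clustering column.
CREATE_SESSION_SONGS = """
CREATE TABLE IF NOT EXISTS session_songs (
    session_id int,
    item_in_session int,
    artist text,
    song_title text,
    song_length float,
    PRIMARY KEY (session_id, item_in_session)
)
"""


class SessionSongsTable:
    """In-memory analogy of the table above (not the Cassandra driver API)."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, session_id, item_in_session, artist, song_title, song_length):
        # Writing the same primary key again overwrites the row, like a Cassandra upsert.
        self.partitions[session_id][item_in_session] = (artist, song_title, song_length)

    def select(self, session_id, item_in_session):
        # A lookup on the full primary key reads exactly one partition.
        return self.partitions.get(session_id, {}).get(item_in_session)

    def select_session(self, session_id):
        # Rows within a partition come back ordered by the clustering column.
        return sorted(self.partitions.get(session_id, {}).items())
```

With the DataStax Python driver, the CQL string would be run through a session's `execute` method. The analogy simply shows why a query on the full primary key is cheap, while a query filtering on a non-key column would need a different table.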
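For the Scrapy and MongoDB project, each scraped book ends up as one document. The sketch below assumes hypothetical raw strings (a price with a currency symbol, a rating written as a word, an availability message) and converts them into a plain dict with typed fields. The real spider's selectors and field names may differ.

```python
# Hypothetical mapping from a word rating to a number.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def to_book_document(title, price_text, rating_text, availability_text):
    """Turn raw scraped strings into a MongoDB-ready document (a plain dict)."""
    return {
        "title": title.strip(),
        # Strip a leading currency symbol, e.g. "£51.77" -> 51.77
        "price": float(price_text.strip().lstrip("£$")),
        # Unknown rating words map to None rather than raising.
        "rating": RATING_WORDS.get(rating_text.strip()),
        "available": "in stock" in availability_text.lower(),
    }


print(to_book_document(" Example Book ", "£51.77", "Three", "In stock (22 available)"))
```

In a Scrapy item pipeline, a dict like this could then be written with pymongo's `insert_one` on the books collection, so each book becomes one queryable document.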
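The data lake project does its transformation in Spark. The stdlib sketch below only illustrates the shape of that step under assumed field names (`ts` in epoch milliseconds, `userId`, `level`, `song`). It splits JSON event lines into a deduplicated users dimension and songplays fact rows grouped by Hive-style `year=/month=` partition paths, which matches the directory layout Spark produces when writing Parquet with partition columns.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def split_log_records(lines):
    """Split raw JSON event lines into a users dimension and partitioned songplays facts."""
    users = {}                     # dimension: one row per user_id
    songplays = defaultdict(list)  # fact rows grouped by partition path
    for line in lines:
        event = json.loads(line)
        # ts is assumed to be epoch milliseconds.
        start = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
        # Later events overwrite earlier ones, so the last level seen wins.
        users[event["userId"]] = {"user_id": event["userId"], "level": event["level"]}
        partition = f"songplays/year={start.year}/month={start.month}"
        songplays[partition].append(
            {"start_time": start.isoformat(), "user_id": event["userId"], "song": event["song"]}
        )
    return users, dict(songplays)


events = [
    '{"ts": 1541903636796, "userId": "8", "level": "free", "song": "Song A"}',
    '{"ts": 1543626000000, "userId": "8", "level": "paid", "song": "Song B"}',
    '{"ts": 1543626000000, "userId": "9", "level": "free", "song": "Song A"}',
]
users, songplays = split_log_records(events)
print(sorted(songplays))
```

In PySpark, the equivalent write would be along the lines of `df.write.partitionBy("year", "month").parquet(path)`. The last-level-wins rule for users assumes events arrive in time order.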
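Defining the Airflow pipeline as code means its task graph can be checked like any other data structure. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) rather than Airflow itself, and the task names are hypothetical. In Airflow, the same dependencies would be declared between operators with `>>` inside a DAG. The sketch computes a valid execution order and rejects a pipeline that contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical S3 -> Redshift pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_songplays_fact": {"stage_events", "stage_songs"},
    "load_user_dim": {"load_songplays_fact"},
    "load_song_dim": {"load_songplays_fact"},
    "run_quality_checks": {"load_user_dim", "load_song_dim"},
}


def execution_order(pipeline):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(pipeline).static_order())


def is_valid_dag(pipeline):
    """A workflow is only schedulable if its dependency graph has no cycles."""
    try:
        TopologicalSorter(pipeline).prepare()
    except CycleError:
        return False
    return True


print(execution_order(PIPELINE))
```

Because the graph is plain data, checks like `is_valid_dag` can run in a unit test before anything is deployed, which is one concrete sense in which workflows as code are more testable.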