san089 / Udacity Data Engineering Projects

Licence: other
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

Data Engineering Projects

Project 1: Data Modeling with Postgres

In this project, we apply data modeling with Postgres and build an ETL pipeline using Python. A startup wants to analyze the data it has been collecting on songs and user activity in its new music streaming app. The data is currently collected in JSON format, and the analytics team is particularly interested in understanding which songs users are listening to.
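
As a rough illustration of the extract step, the sketch below parses JSON-lines activity events and turns song-play events into row tuples for a parameterized insert. The field names (`userId`, `page`, `song`, `artist`, `ts`), the `NextSong` marker, and the `songplays` table are assumptions made for this example; the actual log schema is not described here.

```python
import json

# Hypothetical table and column names, chosen only for illustration.
INSERT_SONGPLAY = (
    "INSERT INTO songplays (user_id, song, artist, played_at) "
    "VALUES (%s, %s, %s, %s)"
)


def extract_songplays(json_lines):
    """Turn JSON-lines activity events into row tuples for INSERT_SONGPLAY."""
    rows = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Keep only events that record a song being played (assumed marker).
        if event.get("page") != "NextSong":
            continue
        rows.append((event["userId"], event["song"], event["artist"], event["ts"]))
    return rows


sample = [
    '{"userId": 8, "page": "NextSong", "song": "Intro", "artist": "The xx", "ts": 1541105830796}',
    '{"userId": 8, "page": "Home", "ts": 1541105900000}',
]
rows = extract_songplays(sample)
```

With a driver such as psycopg2, the resulting rows could then be loaded with `cursor.executemany(INSERT_SONGPLAY, rows)`.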

Link: Data_Modeling_with_Postgres

Project 2: Data Modeling with Cassandra

In this project, we apply data modeling with Cassandra and build an ETL pipeline using Python. The data model is built around the queries we want to answer. For our use case, these are:

  • Get the details of a song that was heard during a particular session in the music app history.
  • Get the songs played by a user during a particular session on the music app.
  • Get all users in the music app history who listened to a particular song.
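
One way to picture the query-first approach is to pair each of the three queries above with its own table, whose primary key is chosen from the query's filter columns. The table names, column names, and types below are hypothetical; this is a sketch of the idea, not the schema used in the project.

```python
# Hypothetical schema: one table per query, keyed by the columns the query filters on.
TABLES = {
    # Query 1: song details for a given session and item within that session.
    "song_by_session": (
        "CREATE TABLE IF NOT EXISTS song_by_session ("
        "session_id int, item_in_session int, artist text, song text, length float, "
        "PRIMARY KEY (session_id, item_in_session))"
    ),
    # Query 2: songs a user played in a session; the clustering column
    # item_in_session keeps rows ordered within each (user, session) partition.
    "songs_by_user_session": (
        "CREATE TABLE IF NOT EXISTS songs_by_user_session ("
        "user_id int, session_id int, item_in_session int, artist text, song text, "
        "PRIMARY KEY ((user_id, session_id), item_in_session))"
    ),
    # Query 3: all users who listened to a song; user_id keeps one row per user.
    "users_by_song": (
        "CREATE TABLE IF NOT EXISTS users_by_song ("
        "song text, user_id int, first_name text, last_name text, "
        "PRIMARY KEY (song, user_id))"
    ),
}

# Each query filters only on the key columns of its own table.
QUERIES = {
    "song_by_session": (
        "SELECT artist, song, length FROM song_by_session "
        "WHERE session_id = %s AND item_in_session = %s"
    ),
    "songs_by_user_session": (
        "SELECT artist, song FROM songs_by_user_session "
        "WHERE user_id = %s AND session_id = %s"
    ),
    "users_by_song": (
        "SELECT first_name, last_name FROM users_by_song "
        "WHERE song = %s"
    ),
}
```

The same event data is written to all three tables. This duplication is the usual trade-off in Cassandra, where reads are served from a single partition instead of joins.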

Link: Data_Modeling_with_Apache_Cassandra

Project 3: Data Warehouse

In this project, we apply the data warehouse architectures we learned and build a data warehouse on the AWS cloud. We build an ETL pipeline that extracts data stored in JSON format in S3 buckets, transforms it, and loads it into a warehouse hosted on Amazon Redshift.
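
Loading JSON from S3 into Redshift is commonly done with the `COPY` command. The helper below is a minimal sketch that only builds such a statement. The table name, bucket path, role ARN, and region are placeholders, and `build_copy_statement` is a hypothetical helper, not part of the project.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region="us-west-2"):
    """Build a Redshift COPY statement that loads JSON files from S3.

    The values are interpolated directly into the SQL text, so they should
    come from trusted configuration and never from user input.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON 'auto' "
        f"REGION '{region}';"
    )


# Placeholder values, for illustration only.
statement = build_copy_statement(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/exampleRedshiftRole",
)
```

A typical pattern is to `COPY` into staging tables first and then fill the analytics tables with `INSERT INTO ... SELECT` statements.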

Use the Redshift IaC script: Redshift_IaC_README

Link: Data_Warehouse

Project 4: Data Lake

In this project, we build a data lake on the AWS cloud using Spark and an AWS EMR cluster. The data lake serves as a single source of truth for the analytics platform. We write Spark jobs that perform ELT operations: they pick up data from the landing zone on S3, transform it, and store the results in the processed zone on S3.
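
The landing-to-processed flow can be pictured with a small path-mapping sketch in plain Python (not Spark). It assumes, purely for illustration, that processed data is laid out in Hive-style `year=`/`month=` partitions derived from an event timestamp; the zone prefix, table name, and partitioning scheme are all hypothetical.

```python
from datetime import datetime, timezone
from posixpath import join


def processed_key(table, ts_ms, filename):
    """Map an event timestamp (milliseconds since epoch, UTC) to a
    Hive-style partitioned object key in an assumed 'processed' zone."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return join(
        "processed",
        table,
        f"year={dt.year}",
        f"month={dt.month:02d}",
        filename,
    )


key = processed_key("songplays", 1541105830796, "part-0000.parquet")
```

In a Spark job, a comparable layout is usually produced by writing a DataFrame with `partitionBy("year", "month")` instead of building keys by hand.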

Link: Data_Lake

Project 5: Data Pipelines with Airflow

In this project, we orchestrate our data pipeline workflow using Apache Airflow, an open-source Apache project. We schedule our ETL jobs in Airflow, create project-specific custom plugins and operators, and automate the pipeline execution.
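
The core idea behind orchestration is that tasks run in an order that respects their dependencies. The sketch below shows that idea with the standard library's `graphlib` (Python 3.9+) and does not use Airflow itself; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each task maps to the set of tasks it depends on.
dependencies = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact_table": {"stage_events", "stage_songs"},
    "run_quality_checks": {"load_fact_table"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
```

In an Airflow DAG, the same dependencies would typically be declared between operators with the `>>` operator, and the scheduler would work out a valid execution order.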

Link: Airflow_Data_Pipelines

Project 6: API Data to Postgres

In this project, we build an ETL pipeline that fetches data from the Yelp API and inserts it into a Postgres database. It is a very basic example of fetching real-time data from a public API.
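
A minimal sketch of the fetch-and-load shape follows. It assumes the Yelp Fusion business search endpoint with bearer-token authentication. The selected response fields, the `businesses` table, and both helper functions are assumptions made for illustration, and no network request is actually sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: Yelp Fusion business search.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

# Hypothetical target table; re-running the load updates existing rows.
UPSERT_BUSINESS = (
    "INSERT INTO businesses (id, name, rating, review_count, city) "
    "VALUES (%s, %s, %s, %s, %s) "
    "ON CONFLICT (id) DO UPDATE SET "
    "rating = EXCLUDED.rating, review_count = EXCLUDED.review_count"
)


def build_search_request(api_key, term, location):
    """Build (but do not send) an authenticated search request."""
    query = urllib.parse.urlencode({"term": term, "location": location})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def business_to_row(business):
    """Flatten one business record into a tuple matching UPSERT_BUSINESS."""
    return (
        business["id"],
        business["name"],
        business.get("rating"),
        business.get("review_count"),
        business.get("location", {}).get("city"),
    )


request = build_search_request("YOUR_API_KEY", "coffee", "San Francisco")
row = business_to_row(
    {"id": "abc123", "name": "Example Cafe", "rating": 4.5,
     "review_count": 120, "location": {"city": "San Francisco"}}
)
```

The request could be sent with `urllib.request.urlopen(request)`, and each flattened row passed to a Postgres driver together with `UPSERT_BUSINESS`.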

Link: API to Postgres

CAPSTONE PROJECT

Udacity provides its own Capstone project, with a main dataset on immigration to the United States and supplementary datasets on airport codes, U.S. city demographics, and temperature.

Instead, I worked on my own open-ended project.
Link: goodreads_etl_pipeline