
Data Engineering Nanodegree

Projects and resources developed in the Data Engineering Nanodegree (DEND) from Udacity.

Project 1: Relational Databases - Data Modeling with PostgreSQL.

Developed a relational database using PostgreSQL to model user activity data for a music streaming app. Skills include:

  • Created a relational database in PostgreSQL to store the app's user activity data.
  • Designed a star schema with optimized definitions of the fact and dimension tables, and normalized the source tables (see the sketch below).
  • Built an ETL pipeline that loads the data and optimizes queries for understanding which songs users listen to.

Proficiencies include: Python, PostgreSQL, star schemas, ETL pipelines, normalization
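A minimal sketch of what the star-schema DDL and connection handling could look like, assuming a local PostgreSQL instance; the database name, credentials, and the tables/columns shown (a songplays fact table with users and songs dimensions) are illustrative placeholders, not the full project schema.

```python
# Minimal star-schema sketch for the music-streaming data model.
# Assumes a local PostgreSQL instance; database name, credentials, and
# table/column definitions are illustrative placeholders.
import psycopg2

DDL_STATEMENTS = [
    """CREATE TABLE IF NOT EXISTS users (
           user_id    INT PRIMARY KEY,
           first_name TEXT,
           last_name  TEXT,
           level      TEXT
       )""",
    """CREATE TABLE IF NOT EXISTS songs (
           song_id   TEXT PRIMARY KEY,
           title     TEXT,
           artist_id TEXT,
           year      INT,
           duration  FLOAT
       )""",
    # Fact table referencing the dimension tables above.
    """CREATE TABLE IF NOT EXISTS songplays (
           songplay_id SERIAL PRIMARY KEY,
           start_time  TIMESTAMP,
           user_id     INT REFERENCES users (user_id),
           song_id     TEXT REFERENCES songs (song_id),
           session_id  INT,
           location    TEXT
       )""",
]

def create_tables(dsn="host=localhost dbname=sparkifydb user=student password=student"):
    """Create the fact and dimension tables, committing once at the end."""
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    for statement in DDL_STATEMENTS:
        cur.execute(statement)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    create_tables()
```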

Project 2: NoSQL Databases - Data Modeling with Apache Cassandra.

Designed a NoSQL database using Apache Cassandra based on the original schema outlined in project one. Skills include:

  • Created a NoSQL database using Apache Cassandra (both locally and in Docker containers)
  • Developed denormalized tables optimized for a specific set of queries and business needs (see the sketch below)

Proficiencies used: Python, Apache Cassandra, Denormalization
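A minimal sketch of the query-first modeling approach with the Python cassandra-driver, assuming a single node reachable on localhost (for example, started in Docker); the keyspace, table, and query are illustrative placeholders rather than the exact project schema.

```python
# Query-first Cassandra sketch: one denormalized table per access pattern.
# Assumes a single local node (e.g. a Docker container) and the
# cassandra-driver package; keyspace/table names are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace("sparkify")

# Table modeled around a single query:
# "songs played in a given session, ordered by item_in_session".
session.execute("""
    CREATE TABLE IF NOT EXISTS songs_by_session (
        session_id       int,
        item_in_session  int,
        artist           text,
        song_title       text,
        length           float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

# The partition key (session_id) matches the WHERE clause of the target query.
rows = session.execute(
    "SELECT artist, song_title, length FROM songs_by_session WHERE session_id = %s",
    (338,),
)
for row in rows:
    print(row.artist, row.song_title, row.length)

cluster.shutdown()
```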

Project 3: Data Warehouse - Amazon Redshift.

Created a data warehouse using Amazon Redshift. Skills include:

  • Created a Redshift cluster, IAM roles, and security groups.
  • Developed an ETL pipeline that copies data from S3 buckets into staging tables, which are then processed into a star schema (see the sketch below).
  • Optimized the star schema for the specific queries required by the data analytics team.

Proficiencies used: Python, Amazon Redshift, AWS CLI, AWS SDK, SQL, PostgreSQL
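A minimal sketch of the S3-to-Redshift staging and loading steps with psycopg2, assuming the cluster, staging tables, and an IAM role with S3 read access already exist; the endpoint, role ARN, bucket paths, and table/column names are placeholders.

```python
# Sketch of the Redshift ETL steps: COPY raw files from S3 into a staging
# table, then INSERT ... SELECT into the star-schema fact table.
# Assumes the cluster, staging tables, and IAM role already exist;
# endpoint, ARN, bucket, and table names below are placeholders.
import psycopg2

COPY_STAGING_EVENTS = """
    COPY staging_events
    FROM 's3://example-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/exampleRedshiftRole'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
"""

INSERT_SONGPLAYS = """
    INSERT INTO songplays (start_time, user_id, song_id, session_id, location)
    SELECT TIMESTAMP 'epoch' + e.ts / 1000 * INTERVAL '1 second',
           e.user_id,
           s.song_id,
           e.session_id,
           e.location
    FROM staging_events e
    LEFT JOIN staging_songs s
      ON e.song = s.title AND e.artist = s.artist_name
    WHERE e.page = 'NextSong';
"""

conn = psycopg2.connect(
    "host=<cluster-endpoint> dbname=dev user=awsuser password=<password> port=5439"
)
cur = conn.cursor()
cur.execute(COPY_STAGING_EVENTS)  # bulk-load raw JSON logs into staging
cur.execute(INSERT_SONGPLAYS)     # transform staging rows into the fact table
conn.commit()
conn.close()
```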

Project 4: Data Lake - Spark

Scaled up the existing ETL pipeline by moving the data warehouse to a data lake. Skills include:

  • Created an EMR Hadoop cluster.
  • Extended the ETL pipeline to copy datasets from S3 buckets, process them with Spark, and write the results back to S3 with efficient partitioning in Parquet format (see the sketch below).
  • Fast-tracked the data lake build-out using serverless AWS Lambda and cataloged tables with an AWS Glue crawler.

Technologies used: Spark, S3, EMR, Athena, AWS Glue, Parquet.
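A minimal sketch of the Spark side of the data lake, assuming a Spark session with S3 access (for example, on an EMR cluster); bucket names, paths, and columns are placeholders.

```python
# Data-lake sketch: read raw JSON from S3, build a dimension table with Spark,
# and write it back to S3 as partitioned Parquet.
# Assumes S3 access is configured (e.g. on EMR); paths and columns are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sparkify-data-lake")
    .getOrCreate()
)

# Read the raw song data from the input bucket.
songs = spark.read.json("s3a://example-input-bucket/song_data/*/*/*/*.json")

# Keep only the dimension columns and deduplicate.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Partitioned Parquet output keeps downstream queries (e.g. Athena) efficient.
(
    songs_table.write
    .mode("overwrite")
    .partitionBy("year", "artist_id")
    .parquet("s3a://example-output-bucket/songs/")
)

spark.stop()
```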

Project 5: Data Pipelines - Airflow

Automated the ETL pipeline and the creation of the data warehouse using Apache Airflow. Skills include:

  • Automated ETL pipelines with Airflow, Python, and Amazon Redshift (see the DAG sketch below).
  • Wrote custom operators to perform tasks such as staging data, filling the data warehouse, and validating results through data quality checks.
  • Transformed data from various sources into a star schema optimized for the analytics team's use cases.

Technologies used: Apache Airflow, S3, Amazon Redshift, Python.
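A minimal sketch of how such a DAG could be wired together, assuming Airflow 2.x; the PythonOperator callables stand in for the project's custom operators, and all IDs, schedules, and names are illustrative.

```python
# Airflow DAG sketch: stage data, load the warehouse, then run quality checks.
# Assumes Airflow 2.x; the PythonOperator callables are placeholders for the
# project's custom operators, and every name here is illustrative.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "sparkify",
    "depends_on_past": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

def stage_events_to_redshift(**context):
    """Placeholder: COPY raw S3 logs into a Redshift staging table."""

def load_songplays_fact_table(**context):
    """Placeholder: INSERT ... SELECT from staging into the fact table."""

def run_data_quality_checks(**context):
    """Placeholder: e.g. assert that key tables contain at least one row."""

with DAG(
    dag_id="sparkify_etl",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    stage_events = PythonOperator(
        task_id="stage_events", python_callable=stage_events_to_redshift
    )
    load_fact = PythonOperator(
        task_id="load_songplays", python_callable=load_songplays_fact_table
    )
    quality_checks = PythonOperator(
        task_id="data_quality_checks", python_callable=run_data_quality_checks
    )

    stage_events >> load_fact >> quality_checks
```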
