All Projects → Flor91 → Data Engineering Nanodegree

Flor91 / Data Engineering Nanodegree

Licence: mit
Projects done in the Data Engineering Nanodegree by Udacity.com

Projects that are alternatives of or similar to Data Engineering Nanodegree

Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+203.31%)
Mutual labels:  aws, data-engineering, postgres, cassandra
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+10.6%)
Mutual labels:  postgres, cassandra, data-engineering
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-41.06%)
Mutual labels:  aws, jupyter-notebook, cassandra
Amazon Sagemaker Examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Stars: ✭ 6,346 (+4102.65%)
Mutual labels:  aws, jupyter-notebook
Sqlpad
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Stars: ✭ 4,113 (+2623.84%)
Mutual labels:  postgres, cassandra
Labnotebook
LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.
Stars: ✭ 526 (+248.34%)
Mutual labels:  jupyter-notebook, postgres
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+562.25%)
Mutual labels:  aws, cassandra
Migrate
Database migrations. CLI and Golang library.
Stars: ✭ 7,712 (+5007.28%)
Mutual labels:  postgres, cassandra
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+566.89%)
Mutual labels:  jupyter-notebook, data-engineering
Pragmaticai
[Book-2019] Pragmatic AI: An Introduction to Cloud-based Machine Learning
Stars: ✭ 79 (-47.68%)
Mutual labels:  aws, jupyter-notebook
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1479.47%)
Mutual labels:  aws, data-engineering
Learn Something Every Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->
Stars: ✭ 362 (+139.74%)
Mutual labels:  aws, data-engineering
Aws Security Workshops
A collection of the latest AWS Security workshops
Stars: ✭ 332 (+119.87%)
Mutual labels:  aws, jupyter-notebook
Spark Jupyter Aws
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+71.52%)
Mutual labels:  aws, jupyter-notebook
Serverless Pg
A package for managing PostgreSQL connections at SERVERLESS scale
Stars: ✭ 142 (-5.96%)
Mutual labels:  aws, postgres
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+472.19%)
Mutual labels:  jupyter-notebook, data-engineering
Firecamp
Serverless Platform for the stateful services
Stars: ✭ 194 (+28.48%)
Mutual labels:  aws, cassandra
Retail Demo Store
AWS Retail Demo Store is a sample retail web application and workshop platform demonstrating how AWS infrastructure and services can be used to build compelling customer experiences for eCommerce, retail, and digital marketing use-cases
Stars: ✭ 238 (+57.62%)
Mutual labels:  aws, jupyter-notebook
Ansible Playbook
Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (-59.6%)
Mutual labels:  aws, data-engineering
Beyond Jupyter
🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Stars: ✭ 135 (-10.6%)
Mutual labels:  jupyter-notebook, postgres

Data-engineering-nanodegree

Projects done in the Data Engineering Nanodegree by Udacity.com

Icon

Course 1: Data Modeling

Introduction to Data Modeling

➔ Understand the purpose of data modeling

➔ Identify the strengths and weaknesses of different types of databases and data storage techniques

➔ Create a table in Postgres and Apache Cassandra

Relational Data Models

➔ Understand when to use a relational database

➔ Understand the difference between OLAP and OLTP databases

➔ Create normalized data tables

➔ Implement denormalized schemas (e.g. STAR, Snowflake)

NoSQL Data Models

➔ Understand when to use NoSQL databases and how they differ from relational databases

➔ Select the appropriate primary key and clustering columns for a given use case

➔ Create a NoSQL database in Apache Cassandra

Project: Data Modeling with Postgres and Apache Cassandra

Course 2: Cloud Data Warehouses

Introduction to the Data Warehouses

➔ Understand Data Warehousing architecture

➔ Run an ETL process to denormalize a database (3NF to Star)

➔ Create an OLAP cube from facts and dimensions

➔ Compare columnar vs. row oriented approaches

Introduction to the Cloud with AWS

➔ Understand cloud computing

➔ Create an AWS account and understand their services

➔ Set up Amazon S3, IAM, VPC, EC2, RDS PostgreSQL

Implementing Data Warehouses on AWS

➔ Identify components of the Redshift architecture

➔ Run ETL process to extract data from S3 into Redshift

➔ Set up AWS infrastructure using Infrastructure as Code (IaC)

➔ Design an optimized table by selecting the appropriate distribution style and sorting key

Project 2: Data Infrastructure on the Cloud

Course 3: Data Lakes with Spark

The Power of Spark

➔ Understand the big data ecosystem

➔ Understand when to use Spark and when not to use it

Data Wrangling with Spark

➔ Manipulate data with SparkSQL and Spark Dataframes

➔ Use Spark for ETL purposes

Debugging and Optimization

➔ Troubleshoot common errors and optimize their code using the Spark WebUI

Introduction to Data Lakes

➔ Understand the purpose and evolution of data lakes

➔ Implement data lakes on Amazon S3, EMR, Athena, and Amazon Glue

➔ Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages

➔ Understand the components and issues of data lakes

Project 3: Big Data with Spark

Course 4: Automate Data Pipelines

Data Pipelines

➔ Create data pipelines with Apache Airflow

➔ Set up task dependencies

➔ Create data connections using hooks

Data Quality

➔ Track data lineage

➔ Set up data pipeline schedules

➔ Partition data to optimize pipelines

➔ Write tests to ensure data quality

➔ Backfill data

Production Data Pipelines

➔ Build reusable and maintainable pipelines

➔ Build your own Apache Airflow plugins

➔ Implement subDAGs

➔ Set up task boundaries

➔ Monitor data pipelines

Project: Data Pipelines with Airflow

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].