Flor91 / Data Engineering Nanodegree

Licence: MIT

Projects done in the Data Engineering Nanodegree by Udacity.com

Labels

jupyter-notebook aws postgres cassandra data-engineering

Projects that are alternatives to or similar to Data Engineering Nanodegree

Udacity Data Engineering Projects

Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

Stars: ✭ 458 (+203.31%)

Mutual labels: aws, data-engineering, postgres, cassandra

Data-Engineering-Projects

Personal Data Engineering Projects

Stars: ✭ 167 (+10.6%)

Mutual labels: postgres, cassandra, data-engineering

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (-41.06%)

Mutual labels: aws, jupyter-notebook, cassandra

Amazon Sagemaker Examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Stars: ✭ 6,346 (+4102.65%)

Mutual labels: aws, jupyter-notebook

Sqlpad

Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC

Stars: ✭ 4,113 (+2623.84%)

Mutual labels: postgres, cassandra

Labnotebook

LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.

Stars: ✭ 526 (+248.34%)

Mutual labels: jupyter-notebook, postgres

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+562.25%)

Mutual labels: aws, cassandra

Migrate

Database migrations. CLI and Golang library.

Stars: ✭ 7,712 (+5007.28%)

Mutual labels: postgres, cassandra

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+566.89%)

Mutual labels: jupyter-notebook, data-engineering

Pragmaticai

[Book-2019] Pragmatic AI: An Introduction to Cloud-based Machine Learning

Stars: ✭ 79 (-47.68%)

Mutual labels: aws, jupyter-notebook

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+1479.47%)

Mutual labels: aws, data-engineering

Learn Something Every Day

📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

Stars: ✭ 362 (+139.74%)

Mutual labels: aws, data-engineering

Aws Security Workshops

A collection of the latest AWS Security workshops

Stars: ✭ 332 (+119.87%)

Mutual labels: aws, jupyter-notebook

Spark Jupyter Aws

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+71.52%)

Mutual labels: aws, jupyter-notebook

Serverless Pg

A package for managing PostgreSQL connections at SERVERLESS scale

Stars: ✭ 142 (-5.96%)

Mutual labels: aws, postgres

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+472.19%)

Mutual labels: jupyter-notebook, data-engineering

Firecamp

Serverless Platform for the stateful services

Stars: ✭ 194 (+28.48%)

Mutual labels: aws, cassandra

Retail Demo Store

AWS Retail Demo Store is a sample retail web application and workshop platform demonstrating how AWS infrastructure and services can be used to build compelling customer experiences for eCommerce, retail, and digital marketing use-cases

Stars: ✭ 238 (+57.62%)

Mutual labels: aws, jupyter-notebook

Ansible Playbook

Ansible playbook to deploy distributed technologies

Stars: ✭ 61 (-59.6%)

Mutual labels: aws, data-engineering

Beyond Jupyter

🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)

Stars: ✭ 135 (-10.6%)

Mutual labels: jupyter-notebook, postgres


Data-engineering-nanodegree

Projects done in the Data Engineering Nanodegree by Udacity.com

Course 1: Data Modeling

Introduction to Data Modeling

➔ Understand the purpose of data modeling

➔ Identify the strengths and weaknesses of different types of databases and data storage techniques

➔ Create a table in Postgres and Apache Cassandra

Relational Data Models

➔ Understand when to use a relational database

➔ Understand the difference between OLAP and OLTP databases

➔ Create normalized data tables

➔ Implement denormalized schemas (e.g., star and snowflake schemas)
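
As a rough illustration of a star schema (one fact table whose rows reference several dimension tables), the sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a server. The table and column names (songplays, users, songs) are hypothetical examples, not taken from the project code.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE users (
            user_id    INTEGER PRIMARY KEY,
            first_name TEXT,
            level      TEXT
        );
        CREATE TABLE songs (
            song_id TEXT PRIMARY KEY,
            title   TEXT,
            artist  TEXT
        );
        -- Fact table: one row per event; foreign keys point at the dimensions.
        CREATE TABLE songplays (
            songplay_id INTEGER PRIMARY KEY,
            start_time  TEXT NOT NULL,
            user_id     INTEGER REFERENCES users(user_id),
            song_id     TEXT REFERENCES songs(song_id)
        );
        """
    )


def plays_per_level(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Typical star-schema query: join the fact table to a dimension, then aggregate."""
    return conn.execute(
        """
        SELECT u.level, COUNT(*) AS plays
        FROM songplays AS f
        JOIN users AS u ON u.user_id = f.user_id
        GROUP BY u.level
        ORDER BY u.level
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                     [(1, "Ada", "free"), (2, "Lin", "paid")])
    conn.execute("INSERT INTO songs VALUES ('S1', 'Song A', 'Artist X')")
    conn.executemany(
        "INSERT INTO songplays (start_time, user_id, song_id) VALUES (?, ?, ?)",
        [("2018-11-01", 1, "S1"), ("2018-11-01", 2, "S1"), ("2018-11-02", 2, "S1")],
    )
    print(plays_per_level(conn))  # [('free', 1), ('paid', 2)]
```

A normalized (3NF) design would split these tables further; the star layout trades some redundancy for simpler, faster analytical joins.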

NoSQL Data Models

➔ Understand when to use NoSQL databases and how they differ from relational databases

➔ Select the appropriate primary key and clustering columns for a given use case

➔ Create a NoSQL database in Apache Cassandra
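
In Cassandra the primary key has two parts: the partition key decides which node stores a row, and the clustering columns decide the sort order of rows inside that partition. Tables are designed around one query. The CQL below is a hypothetical table for "which songs were played in session X, in order?", and the small Python class only imitates that storage layout in memory; it is not the Cassandra driver API.

```python
from collections import defaultdict

# Hypothetical CQL: session_id is the partition key,
# item_in_session is the clustering column.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS session_songs (
    session_id      int,
    item_in_session int,
    artist          text,
    song            text,
    PRIMARY KEY ((session_id), item_in_session)
);
"""


class PartitionedTable:
    """In-memory imitation: rows grouped by partition key, read back sorted by clustering key."""

    def __init__(self) -> None:
        self._partitions: dict[int, dict[int, tuple[str, str]]] = defaultdict(dict)

    def insert(self, session_id: int, item_in_session: int, artist: str, song: str) -> None:
        # Like a CQL INSERT, writing the same full primary key again overwrites (upsert).
        self._partitions[session_id][item_in_session] = (artist, song)

    def select_session(self, session_id: int) -> list[tuple[int, str, str]]:
        # Equivalent of: SELECT ... WHERE session_id = ? (a single-partition read),
        # returned in clustering-column order.
        rows = self._partitions.get(session_id, {})
        return [(item, *rows[item]) for item in sorted(rows)]


if __name__ == "__main__":
    table = PartitionedTable()
    table.insert(7, 2, "Artist B", "Song Y")
    table.insert(7, 1, "Artist A", "Song X")
    print(table.select_session(7))  # [(1, 'Artist A', 'Song X'), (2, 'Artist B', 'Song Y')]
```

A query filtering on a column that is not part of the key (e.g. artist alone) would not fit this table; in Cassandra you would typically create a second table keyed for that query instead.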

Project 1: Data Modeling with Postgres and Apache Cassandra

Course 2: Cloud Data Warehouses

Introduction to Data Warehouses

➔ Understand Data Warehousing architecture

➔ Run an ETL process to denormalize a database (3NF to star schema)

➔ Create an OLAP cube from facts and dimensions

➔ Compare columnar and row-oriented storage approaches
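
An OLAP cube pre-aggregates a measure over every combination of the chosen dimensions. In SQL this is usually written as GROUP BY CUBE (...) (Postgres, for example, supports it); the plain-Python sketch below computes the same grouping sets so the idea is visible. The dimension and measure names (month, branch, revenue) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cube(rows: list[dict], dims: list[str], measure: str) -> dict[tuple, float]:
    """Sum `measure` over every subset of `dims`, like SQL GROUP BY CUBE.

    A dimension left out of a grouping is reported as None ("all values"),
    mirroring the NULLs that SQL CUBE emits for rolled-up columns.
    """
    totals: Counter = Counter()
    for size in range(len(dims) + 1):
        for subset in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in subset else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    sales = [
        {"month": "Jan", "branch": "NY", "revenue": 10},
        {"month": "Jan", "branch": "SF", "revenue": 5},
        {"month": "Feb", "branch": "NY", "revenue": 7},
    ]
    result = cube(sales, ["month", "branch"], "revenue")
    print(result[(None, None)])  # 22 (grand total)
    print(result[("Jan", None)])  # 15 (January, all branches)
```

With two dimensions the cube holds four grouping sets: (month, branch), (month), (branch), and the grand total.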

Introduction to the Cloud with AWS

➔ Understand cloud computing

➔ Create an AWS account and understand its services

➔ Set up Amazon S3, IAM, VPC, EC2, RDS PostgreSQL
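
One common IAM setup step is a role that lets Redshift read from S3. The sketch below only builds the request payloads in plain Python so it runs without AWS credentials; the boto3 calls in the comments (iam.create_role, iam.attach_role_policy) are real, but the role name is hypothetical and the exact permissions your project needs may differ.

```python
import json

# Trust policy: allows the Redshift service to assume this role.
REDSHIFT_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS-managed policy granting read-only access to S3.
S3_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def role_request(role_name: str) -> dict:
    """Keyword arguments for boto3's iam.create_role (role_name is hypothetical)."""
    return {
        "RoleName": role_name,
        "Description": "Allows Redshift to read from S3",
        "AssumeRolePolicyDocument": json.dumps(REDSHIFT_TRUST_POLICY),
    }


if __name__ == "__main__":
    # With boto3 installed and credentials configured, the calls would be:
    #   iam = boto3.client("iam")
    #   iam.create_role(**role_request("dwh-redshift-s3-role"))
    #   iam.attach_role_policy(RoleName="dwh-redshift-s3-role", PolicyArn=S3_READ_ONLY_ARN)
    print(role_request("dwh-redshift-s3-role")["AssumeRolePolicyDocument"])
```

Keeping these definitions in code rather than clicking through the console is the Infrastructure as Code idea covered in the next lesson.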

Implementing Data Warehouses on AWS

➔ Identify components of the Redshift architecture

➔ Run an ETL process to load data from S3 into Redshift

➔ Set up AWS infrastructure using Infrastructure as Code (IaC)

➔ Design optimized tables by selecting the appropriate distribution style and sort key
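
A sketch of the Redshift SQL these objectives refer to, kept as Python strings as you might in a sql_queries.py module. The table names, bucket path, and IAM role ARN are hypothetical; the statements would typically be executed against the cluster endpoint through a Postgres driver such as psycopg2.

```python
# Hypothetical fact table: DISTKEY stores rows with the same user_id on the
# same slice; SORTKEY lets range filters on start_time skip disk blocks.
FACT_DDL = """
CREATE TABLE IF NOT EXISTS songplays (
    songplay_id BIGINT IDENTITY(0, 1),
    start_time  TIMESTAMP NOT NULL SORTKEY,
    user_id     INT NOT NULL DISTKEY,
    song_id     VARCHAR(32)
);
"""

# Small dimension tables are often copied to every node (DISTSTYLE ALL)
# so joins against them need no data redistribution.
DIM_DDL = """
CREATE TABLE IF NOT EXISTS users (
    user_id    INT PRIMARY KEY SORTKEY,
    first_name VARCHAR(64),
    level      VARCHAR(8)
) DISTSTYLE ALL;
"""


def staging_copy_sql(table: str, s3_path: str, iam_role_arn: str, region: str) -> str:
    """Build a Redshift COPY that bulk-loads JSON files from S3 into a staging table."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON 'auto' REGION '{region}';"
    )


if __name__ == "__main__":
    print(staging_copy_sql(
        "staging_events",
        "s3://example-bucket/log_data",
        "arn:aws:iam::123456789012:role/example-role",
        "us-west-2",
    ))
```

COPY loads files in parallel across slices, which is why it is generally preferred over row-by-row INSERT statements for bulk loads.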

Project 2: Data Infrastructure on the Cloud

Course 3: Data Lakes with Spark

The Power of Spark

➔ Understand the big data ecosystem

➔ Understand when to use Spark and when not to use it

Data Wrangling with Spark

➔ Manipulate data with Spark SQL and Spark DataFrames

➔ Use Spark for ETL purposes

Debugging and Optimization

➔ Troubleshoot common errors and optimize code using the Spark Web UI

Introduction to Data Lakes

➔ Understand the purpose and evolution of data lakes

➔ Implement data lakes on Amazon S3, EMR, Athena, and AWS Glue

➔ Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages

➔ Understand the components and issues of data lakes
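
Data lakes on S3 are commonly laid out in Hive-style partition directories. Spark's DataFrameWriter.partitionBy("year", "month") writes this shape, and engines such as Spark and Athena can skip directories that do not match a filter (partition pruning). The stdlib-only sketch below just models the path layout; the bucket and table names are hypothetical.

```python
from datetime import date


def partition_path(root: str, d: date) -> str:
    """Hive-style layout, e.g. <root>/year=2018/month=11."""
    return f"{root.rstrip('/')}/year={d.year}/month={d.month}"


def prune(paths: list[str], year: int, month: int) -> list[str]:
    """Keep only partition directories matching the filter, without reading any files."""
    wanted = f"/year={year}/month={month}"
    return [p for p in paths if p.endswith(wanted)]


if __name__ == "__main__":
    root = "s3a://example-lake/songplays/"
    paths = [partition_path(root, date(2018, m, 1)) for m in (10, 11, 12)]
    print(prune(paths, 2018, 11))  # ['s3a://example-lake/songplays/year=2018/month=11']
```

Choosing partition columns that match common filters (here, time) is what lets a query touch only a small slice of the lake.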

Project 3: Big Data with Spark

Course 4: Automate Data Pipelines

Data Pipelines

➔ Create data pipelines with Apache Airflow

➔ Set up task dependencies

➔ Create data connections using hooks
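
Airflow turns declared task dependencies (e.g. stage_events >> load_songplays) into a directed acyclic graph and runs each task only after its upstream tasks finish. The sketch below is not Airflow code: it uses Python's standard graphlib module to show how such a graph resolves into a valid execution order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key lists the tasks it depends on (its upstreams).
# In an Airflow DAG file the same edges would be written with >>,
# e.g. [stage_events, stage_songs] >> load_songplays.
PIPELINE = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_songplays": {"stage_events", "stage_songs"},
    "load_users": {"load_songplays"},
    "quality_checks": {"load_users"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE))
```

Tasks with no dependency between them (here the two staging tasks) may appear in either order; Airflow is free to run such tasks in parallel.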

Data Quality

➔ Track data lineage

➔ Set up data pipeline schedules

➔ Partition data to optimize pipelines

➔ Write tests to ensure data quality

➔ Backfill data
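
A data quality test usually runs a few SQL checks after a load and fails the task if any check fails; in Airflow an uncaught exception marks the task as failed, so downstream tasks with the default trigger rule do not run. The sketch below uses sqlite3 as a stand-in warehouse, and the function names are hypothetical rather than an Airflow API.

```python
import sqlite3


def quality_failures(conn: sqlite3.Connection, table: str,
                     not_null_columns: list[str]) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed.

    `table` and the column names are interpolated into SQL, so they must be
    trusted identifiers (e.g. from the pipeline's own config), never user input.
    """
    failures = []
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if count == 0:
        failures.append(f"{table} is empty")
    for column in not_null_columns:
        (nulls,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
        ).fetchone()
        if nulls:
            failures.append(f"{table}.{column} has {nulls} NULL value(s)")
    return failures


def assert_quality(conn: sqlite3.Connection, table: str, not_null_columns: list[str]) -> None:
    """Raise so the surrounding pipeline task is marked as failed."""
    failures = quality_failures(conn, table, not_null_columns)
    if failures:
        raise ValueError("Data quality check failed: " + "; ".join(failures))
```

Returning the list of failures separately from raising keeps the checks easy to unit test and lets one task report every problem at once.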

Production Data Pipelines

➔ Build reusable and maintainable pipelines

➔ Build your own Apache Airflow plugins

➔ Implement subDAGs

➔ Set up task boundaries

➔ Monitor data pipelines
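
One way to draw task boundaries that keep pipelines maintainable is to make each task own one date partition and rewrite it atomically, so retries and backfills never double-count. The sketch below assumes a hypothetical daily_plays table and uses sqlite3 as a stand-in warehouse; ds mirrors the logical date that Airflow exposes to templates as {{ ds }}.

```python
import sqlite3


def load_day(conn: sqlite3.Connection, ds: str, rows: list[tuple[str, int]]) -> None:
    """Replace all rows for one logical date (ds) so re-running the task is idempotent."""
    with conn:  # sqlite3: commits on success, rolls back if anything raises
        # Clear this run's partition first, so a retry or backfill overwrites
        # instead of appending duplicates.
        conn.execute("DELETE FROM daily_plays WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_plays (ds, song_id, plays) VALUES (?, ?, ?)",
            [(ds, song_id, plays) for song_id, plays in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_plays (ds TEXT, song_id TEXT, plays INTEGER)")
    for _ in range(2):  # simulate a retry of the same run
        load_day(conn, "2018-11-01", [("S1", 3), ("S2", 1)])
    print(conn.execute("SELECT COUNT(*), SUM(plays) FROM daily_plays").fetchone())  # (2, 4)
```

Because the function is parameterized only by the date and its input rows, the same task definition can be reused for daily runs and for backfilling past dates.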

Project 4: Data Pipelines with Airflow

Stars: ✭ 151
