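8704 open-source projects that are alternatives to or similar to Udacity Data Engineering. The percentage after each star count compares that project's stars with those of Udacity Data Engineering.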

Fluent Plugin S3
Amazon S3 input and output plugin for Fluentd
Stars: ✭ 276 (+210.11%)
Mutual labels:  aws, s3
Nodb
NoDB isn't a database, but it sort of looks like one.
Stars: ✭ 353 (+296.63%)
Mutual labels:  aws, s3
Aws Airflow Stack
Turbine: the bare metal that gets you Airflow
Stars: ✭ 352 (+295.51%)
Mutual labels:  aws, airflow
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+205.62%)
Mutual labels:  spark, etl
Kiba Plus
Kiba enhancement for Ruby ETL.
Stars: ✭ 47 (-47.19%)
Mutual labels:  etl, postgresql
Ddlparse
Parse DDL and convert it to BigQuery JSON schema and DDL statements
Stars: ✭ 52 (-41.57%)
Mutual labels:  postgresql, redshift
Enterprise gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Stars: ✭ 412 (+362.92%)
Mutual labels:  jupyter-notebook, spark
Pglogical
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (+411.24%)
Mutual labels:  etl, postgresql
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+356.18%)
Mutual labels:  aws, spark
Pointblank
Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+439.33%)
Mutual labels:  spark, postgresql
Evolve
Database migration tool for .NET and .NET Core projects. Inspired by Flyway.
Stars: ✭ 477 (+435.96%)
Mutual labels:  postgresql, cassandra
Helk
The Hunting ELK
Stars: ✭ 3,097 (+3379.78%)
Mutual labels:  jupyter-notebook, spark
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-34.83%)
Mutual labels:  s3, spark
Justenoughscalaforspark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+504.49%)
Mutual labels:  jupyter-notebook, spark
Labnotebook
LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.
Stars: ✭ 526 (+491.01%)
Mutual labels:  jupyter-notebook, postgresql
Django S3direct
Directly upload files to S3 compatible services with Django.
Stars: ✭ 570 (+540.45%)
Mutual labels:  aws, s3
S3 Benchmark
Measure Amazon S3's performance from any location.
Stars: ✭ 525 (+489.89%)
Mutual labels:  aws, s3
Pixiedust
Python helper library for Jupyter Notebooks
Stars: ✭ 998 (+1021.35%)
Mutual labels:  jupyter-notebook, spark
Aws Utilities
Docker images and scripts to deploy to AWS
Stars: ✭ 52 (-41.57%)
Mutual labels:  aws, s3
Pyspark Examples
Code examples for Apache Spark using Python
Stars: ✭ 58 (-34.83%)
Mutual labels:  jupyter-notebook, spark
Data Science Cookbook
🎓 Jupyter notebooks from UFC data science course
Stars: ✭ 60 (-32.58%)
Mutual labels:  jupyter-notebook, spark
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+6255.06%)
Mutual labels:  jupyter-notebook, spark
Aws Mobile React Sample
A React Starter App that demonstrates how web developers can integrate their front end with AWS on the backend. The App interacts with AWS Cognito, API Gateway, Lambda and DynamoDB on the backend.
Stars: ✭ 650 (+630.34%)
Mutual labels:  aws, s3
Falcon
Free, open-source SQL client for Windows and Mac 🦅
Stars: ✭ 4,848 (+5347.19%)
Mutual labels:  postgresql, redshift
Pgbackrest
Reliable PostgreSQL Backup & Restore
Stars: ✭ 766 (+760.67%)
Mutual labels:  s3, postgresql
Phila Airflow
Stars: ✭ 16 (-82.02%)
Mutual labels:  airflow, etl
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+737.08%)
Mutual labels:  jupyter-notebook, spark
Automating Your Data Pipeline With Apache Airflow
Stars: ✭ 19 (-78.65%)
Mutual labels:  airflow, jupyter-notebook
S3 Permission Checker
Check read, write permissions on S3 buckets in your account
Stars: ✭ 18 (-79.78%)
Mutual labels:  aws, s3
Github To S3 Lambda Deployer
⚓️ GitHub webhook extension, written in Node.js, that uploads static pages to AWS S3 via Lambda directly after committing to master
Stars: ✭ 23 (-74.16%)
Mutual labels:  aws, s3
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+188.76%)
Mutual labels:  airflow, spark
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that combines AWS CloudWatch and AWS Lambda with a Python script using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-76.4%)
Mutual labels:  aws, etl
Awslib scala
An idiomatic Scala wrapper around the AWS Java SDK
Stars: ✭ 20 (-77.53%)
Mutual labels:  aws, s3
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+971.91%)
Mutual labels:  jupyter-notebook, spark
Panther
Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+894.38%)
Mutual labels:  aws, etl
Aws S3 Scala
Scala client for Amazon S3
Stars: ✭ 35 (-60.67%)
Mutual labels:  aws, s3
Vagrant Projects
Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (-61.8%)
Mutual labels:  spark, cassandra
Airflow On Kubernetes
Bare minimal Airflow on Kubernetes (Local, EKS, AKS)
Stars: ✭ 38 (-57.3%)
Mutual labels:  aws, airflow
Tedsds
Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-84.27%)
Mutual labels:  jupyter-notebook, spark
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-53.93%)
Mutual labels:  etl, postgresql
Aws Data Replication Hub
Seamless User Interface for replicating data into AWS.
Stars: ✭ 40 (-55.06%)
Mutual labels:  aws, s3
Aws Testing Library
Chai (https://chaijs.com) and Jest (https://jestjs.io/) assertions for testing services built with AWS
Stars: ✭ 52 (-41.57%)
Mutual labels:  aws, s3
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+1023.6%)
Mutual labels:  aws, cassandra
Scrapy S3pipeline
Scrapy pipeline to store chunked items into Amazon S3 or Google Cloud Storage bucket.
Stars: ✭ 57 (-35.96%)
Mutual labels:  aws, s3
Dbbench
🏋️ dbbench is a simple database benchmarking tool that supports several databases and custom scripts
Stars: ✭ 52 (-41.57%)
Mutual labels:  postgresql, cassandra
Discreetly
ETLy is an add-on dashboard service on top of Apache Airflow.
Stars: ✭ 60 (-32.58%)
Mutual labels:  airflow, etl
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+851.69%)
Mutual labels:  spark, cassandra
React Deploy S3
Deploy Create React App projects to AWS S3
Stars: ✭ 66 (-25.84%)
Mutual labels:  aws, s3
Terraform Aws S3 Log Storage
This module creates an S3 bucket suitable for receiving logs from other AWS services such as S3, CloudFront, and CloudTrail
Stars: ✭ 65 (-26.97%)
Mutual labels:  aws, s3
S3 Blob Store
☁️ Amazon S3 blob-store
Stars: ✭ 66 (-25.84%)
Mutual labels:  aws, s3
Cloud Security Audit
A command line security audit tool for Amazon Web Services
Stars: ✭ 68 (-23.6%)
Mutual labels:  aws, s3
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-28.09%)
Mutual labels:  jupyter-notebook, spark
Aws
Swift wrapper around AWS API
Stars: ✭ 67 (-24.72%)
Mutual labels:  aws, s3
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-23.6%)
Mutual labels:  jupyter-notebook, etl
Aws Inventory
Python script for AWS resources inventory (cheaper than AWS Config)
Stars: ✭ 69 (-22.47%)
Mutual labels:  aws, s3
Terraform Aws Airflow
Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker with CeleryExecutor
Stars: ✭ 69 (-22.47%)
Mutual labels:  aws, airflow
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-20.22%)
Mutual labels:  jupyter-notebook, spark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-29.21%)
Mutual labels:  jupyter-notebook, spark
Sql Runner
Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
Stars: ✭ 68 (-23.6%)
Mutual labels:  postgresql, redshift
Transporter
Sync data between persistence engines: like ETL, only not stodgy
Stars: ✭ 1,175 (+1220.22%)
Mutual labels:  etl, postgresql