A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.

Stars: ✭ 9,895 (+11017.98%)

Mutual labels: aws, s3, redshift

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+22537.08%)

Mutual labels: spark, postgresql, redshift

Storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (+160.67%)

Mutual labels: s3, etl, postgresql

Data Engineering Nanodegree

Projects done in the Data Engineering Nanodegree by Udacity.com

Stars: ✭ 151 (+69.66%)

Mutual labels: aws, jupyter-notebook, cassandra

astro

Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Stars: ✭ 79 (-11.24%)

Mutual labels: airflow, etl, s3

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+2579.78%)

Mutual labels: aws, etl, redshift

Firecamp

Serverless platform for stateful services

Stars: ✭ 194 (+117.98%)

Mutual labels: aws, postgresql, cassandra

Dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+1242.7%)

Mutual labels: airflow, spark, etl

Deploy Strapi On Aws

Deploying a Strapi API on AWS (EC2 & RDS & S3)

Stars: ✭ 121 (+35.96%)

Mutual labels: aws, s3, postgresql

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+364.04%)

Mutual labels: airflow, jupyter-notebook, spark

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-71.91%)

Mutual labels: airflow, s3, redshift

Udacity Data Engineering Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

Stars: ✭ 458 (+414.61%)

Mutual labels: aws, airflow, cassandra

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+6180.9%)

Mutual labels: aws, spark, postgresql

Awslib scala

An idiomatic Scala wrapper around the AWS Java SDK

Stars: ✭ 20 (-77.53%)

Mutual labels: aws, s3

Aws Auto Terminate Idle Emr

AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script built on Boto3, to terminate AWS EMR clusters that have been idle for a specified period of time.

Stars: ✭ 21 (-76.4%)

Mutual labels: aws, etl

Tbls

tbls is a CI-Friendly tool for documenting a database, written in Go.

Stars: ✭ 940 (+956.18%)

Mutual labels: postgresql, redshift

Dropdot

☁️ Direct Upload to Amazon S3 With CORS demo. Built with Node/Express

Stars: ✭ 87 (-2.25%)

Mutual labels: aws, s3

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+971.91%)

Mutual labels: jupyter-notebook, spark

Vagrant Projects

Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR

Stars: ✭ 34 (-61.8%)

Mutual labels: spark, cassandra

Terraform Aws Redshift

Terraform module which creates Redshift resources on AWS

Stars: ✭ 36 (-59.55%)

Mutual labels: aws, redshift

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+1023.6%)

Mutual labels: aws, cassandra

Panther

Detect threats with log data and improve cloud security posture

Stars: ✭ 885 (+894.38%)

Mutual labels: aws, etl

Tedsds

Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-84.27%)

Mutual labels: jupyter-notebook, spark

Workshop Donkeytracker

Workshop to build a serverless tracking application for your mobile device with an AWS backend

Stars: ✭ 27 (-69.66%)

Mutual labels: aws, s3

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+851.69%)

Mutual labels: spark, cassandra

Objinsync

Continuously synchronize directories from remote object store to local filesystem

Stars: ✭ 29 (-67.42%)

Mutual labels: s3, airflow

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+974.16%)

Mutual labels: aws, etl

Aws S3 Scala

Scala client for Amazon S3

Stars: ✭ 35 (-60.67%)

Mutual labels: aws, s3

S3 Deploy Website

Deploy website to S3/CloudFront from Python

Stars: ✭ 26 (-70.79%)

Mutual labels: aws, s3

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (-2.25%)

Mutual labels: aws, spark

Pixiedust

Python Helper library for Jupyter Notebooks

Stars: ✭ 998 (+1021.35%)

Mutual labels: jupyter-notebook, spark

Airflow On Kubernetes

Bare minimal Airflow on Kubernetes (Local, EKS, AKS)

Stars: ✭ 38 (-57.3%)

Mutual labels: aws, airflow

Ether sql

A Python library to push Ethereum blockchain data into an SQL database.

Stars: ✭ 41 (-53.93%)

Mutual labels: etl, postgresql

Aws Data Replication Hub

Seamless User Interface for replicating data into AWS.

Stars: ✭ 40 (-55.06%)

Mutual labels: aws, s3

Simple S3 Setup

Code examples used in the post "How to Setup Amazon S3 in a Django Project"

Stars: ✭ 46 (-48.31%)

Mutual labels: aws, s3

Ddlparse

Parse DDL and convert it to BigQuery JSON schema and DDL statements

Stars: ✭ 52 (-41.57%)

Mutual labels: postgresql, redshift

Aws Testing Library

Chai (https://chaijs.com) and Jest (https://jestjs.io/) assertions for testing services built with AWS

Stars: ✭ 52 (-41.57%)

Mutual labels: aws, s3

Aws Utilities

Docker images and scripts to deploy to AWS

Stars: ✭ 52 (-41.57%)

Mutual labels: aws, s3

Scrapy S3pipeline

Scrapy pipeline to store chunked items into Amazon S3 or Google Cloud Storage bucket.

Stars: ✭ 57 (-35.96%)

Mutual labels: aws, s3

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-74.16%)

Mutual labels: jupyter-notebook, spark

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+1007.87%)

Mutual labels: jupyter-notebook, spark

Kiba Plus

Kiba enhancement for Ruby ETL.

Stars: ✭ 47 (-47.19%)

Mutual labels: etl, postgresql

Dbbench

🏋️ dbbench is a simple database benchmarking tool which supports several databases and your own scripts

Stars: ✭ 52 (-41.57%)

Mutual labels: postgresql, cassandra

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-34.83%)

Mutual labels: s3, spark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-29.21%)

Mutual labels: jupyter-notebook, spark

S3reverse

Converts the formats of various S3 buckets into one format, for bug bounty and security testing.

Stars: ✭ 61 (-31.46%)

Mutual labels: aws, s3

W2v

Word2Vec models with Twitter data using Spark. Blog: