Luigi Warehouse: A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-19.1%)
Goodreads etl pipeline: An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Stars: ✭ 793 (+791.01%)
Locopy: Loading/unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-17.98%)
Aws Ecs Airflow: Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+20.22%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+191.01%)
Awesome Aws: A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.
Stars: ✭ 9,895 (+11017.98%)
Redash: Make your company data driven. Connect to any data source, easily visualize, dashboard, and share your data.
Stars: ✭ 20,147 (+22537.08%)
Storagetapper: StorageTapper is a scalable real-time MySQL change data streaming, logical backup, and logical replication service
Stars: ✭ 232 (+160.67%)
astro: Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-11.24%)
Aws Data Wrangler: Pandas on AWS. Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+2579.78%)
Firecamp: Serverless platform for stateful services
Stars: ✭ 194 (+117.98%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1242.7%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+364.04%)
jobAnalytics and search: The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-71.91%)
Udacity Data Engineering Projects: A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
Stars: ✭ 458 (+414.61%)
Dev Setup: macOS development environment setup with easy-to-understand instructions and automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+6180.9%)
Awslib scala: An idiomatic Scala wrapper around the AWS Java SDK
Stars: ✭ 20 (-77.53%)
Aws Auto Terminate Idle Emr: The AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and AWS Lambda, with a Python script that uses Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-76.4%)
Tbls: tbls is a CI-friendly tool for documenting a database, written in Go.
Stars: ✭ 940 (+956.18%)
Dropdot: ☁️ Demo of direct upload to Amazon S3 with CORS, built with Node/Express.
Stars: ✭ 87 (-2.25%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+971.91%)
Vagrant Projects: Vagrant projects for various use cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (-61.8%)
Nagios Plugins: 450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera, etc.
Stars: ✭ 1,000 (+1023.6%)
Panther: Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+894.38%)
Tedsds: Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-84.27%)
Workshop Donkeytracker: Workshop to build a serverless tracking application for your mobile device with an AWS backend
Stars: ✭ 27 (-69.66%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes (Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools) built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+851.69%)
Objinsync: Continuously synchronize directories from a remote object store to the local filesystem
Stars: ✭ 29 (-67.42%)
Ethereum Etl: Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, and internal transactions. Data is available in Google BigQuery: https://goo.gl/oY5BCQ
Stars: ✭ 956 (+974.16%)
Aws S3 Scala: Scala client for Amazon S3
Stars: ✭ 35 (-60.67%)
Pixiedust: Python helper library for Jupyter Notebooks
Stars: ✭ 998 (+1021.35%)
Ether sql: A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-53.93%)
Simple S3 Setup: Code examples used in the post "How to Setup Amazon S3 in a Django Project"
Stars: ✭ 46 (-48.31%)
Ddlparse: Parse DDL and convert it to BigQuery JSON schema and DDL statements
Stars: ✭ 52 (-41.57%)
Aws Testing Library: Chai (https://chaijs.com) and Jest (https://jestjs.io/) assertions for testing services built with AWS
Stars: ✭ 52 (-41.57%)
Aws Utilities: Docker images and scripts to deploy to AWS
Stars: ✭ 52 (-41.57%)
Scrapy S3pipeline: Scrapy pipeline to store chunked items in an Amazon S3 or Google Cloud Storage bucket.
Stars: ✭ 57 (-35.96%)
Optimus: 🚚 Agile data preparation workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1007.87%)
Kiba Plus: Kiba enhancement for Ruby ETL.
Stars: ✭ 47 (-47.19%)
Dbbench: 🏋️ dbbench is a simple database benchmarking tool that supports several databases and your own scripts
Stars: ✭ 52 (-41.57%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-34.83%)
Pysparkgeoanalysis: 🌐 Interactive workshop on geoanalysis using PySpark
Stars: ✭ 63 (-29.21%)
S3reverse: Converts the formats of various S3 buckets into one format, for bug bounty and security testing.
Stars: ✭ 61 (-31.46%)
W2v: Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-28.09%)
Terraform Aws S3 Log Storage: This module creates an S3 bucket suitable for receiving logs from other AWS services such as S3, CloudFront, and CloudTrail
Stars: ✭ 65 (-26.97%)
Etl with python: ETL with Python, taught in the DWH course, 2017 (TAU)
Stars: ✭ 68 (-23.6%)
Sql Runner: Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
Stars: ✭ 68 (-23.6%)
Aws Inventory: Python script for AWS resource inventory (cheaper than AWS Config)
Stars: ✭ 69 (-22.47%)