Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+78.21%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+208.56%)
Toc
A Table of Contents of all Gruntwork Code
Stars: ✭ 111 (-56.81%)
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-76.65%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-69.26%)
Awesome Devops
A curated list of resources for DevOps
Stars: ✭ 697 (+171.21%)
Docker practice
Learn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+7591.83%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+57.98%)
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+60.7%)
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-56.81%)
Awesome Learning
A curated list of DevOps learning resources. Join the Slack channel to discuss more.
Stars: ✭ 327 (+27.24%)
Chef
Chef Infra, a powerful automation platform that transforms infrastructure into code, automating how infrastructure is configured, deployed and managed across any environment, at any scale
Stars: ✭ 6,766 (+2532.68%)
Minicron
🕰️ Monitor your cron jobs
Stars: ✭ 2,351 (+814.79%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+2995.72%)
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+229.57%)
Terrascan
Detect compliance and security violations across Infrastructure as Code to mitigate risk before provisioning cloud native infrastructure.
Stars: ✭ 2,687 (+945.53%)
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting and want to share with you.
Stars: ✭ 249 (-3.11%)
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+902.33%)
Airflow Pipeline
An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-50.19%)
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+364.98%)
awesome-open-mlops
The Fuzzy Labs guide to the universe of open source MLOps
Stars: ✭ 304 (+18.29%)
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-57.2%)
k3ai
A lightweight tool to get an AI Infrastructure Stack up in minutes, not days. K3ai will take care of setting up K8s for you, deploy the AI tool of your choice, and even run your code on it.
Stars: ✭ 105 (-59.14%)
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-90.27%)
Howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Stars: ✭ 6,962 (+2608.95%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-79.38%)
Sceptre
Build better AWS infrastructure
Stars: ✭ 1,160 (+351.36%)
Terraform Multienv
A template for maintaining a multi-environment infrastructure with Terraform. This template includes a CI/CD process that applies the infrastructure in an AWS account.
Stars: ✭ 107 (-58.37%)
funsies
funsies is a lightweight workflow engine 🔧
Stars: ✭ 37 (-85.6%)
Opunit
🕵️‍♂️ Sanity checking containers, VMs, and servers
Stars: ✭ 176 (-31.52%)
Terrahub
Terraform Automation and Orchestration Tool (Open Source)
Stars: ✭ 148 (-42.41%)
Ansible Playbook
Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (-76.26%)
Mitogen
Distributed self-replicating programs in Python
Stars: ✭ 1,779 (+592.22%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+146.3%)
Pointblank
Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+86.77%)
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+3577.82%)
Openuba
A robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-50.58%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-40.86%)
Data science blogs
A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-45.91%)
openverse-catalog
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-89.49%)
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-32.68%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-52.53%)
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-92.22%)
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-89.88%)
bigkube
Minikube for big data with Scala and Spark
Stars: ✭ 16 (-93.77%)
Covid19Tracker
A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (-74.71%)
helpdesk
Yet another helpdesk based on multiple providers
Stars: ✭ 14 (-94.55%)
redis-inventory
CLI tool to see Redis memory usage by keys in a hierarchical way. Think of disk inventory but for Redis.
Stars: ✭ 163 (-36.58%)
ycsm
This is a quick installation script for a resilient redirector using an nginx reverse proxy and Let's Encrypt, compatible with some popular Post-Ex Tools (Cobalt Strike, Empire, Metasploit, PoshC2).
Stars: ✭ 73 (-71.6%)
Youtube Videos
Documentation for Techno Tim YouTube Videos
Stars: ✭ 250 (-2.72%)
infra
🚀 INFRA: your infrastructure as a GraphQL service
Stars: ✭ 48 (-81.32%)
blog
blog entries
Stars: ✭ 39 (-84.82%)