Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+315.79%)
Ansible Playbook
Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (+221.05%)
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (+215.79%)
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+5200%)
Dbt Sqlserver
dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (+115.79%)
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+4447.37%)
Lakefs
Git-like capabilities for your object storage
Stars: ✭ 847 (+4357.89%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+4073.68%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+41773.68%)
Pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+3305.26%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+3231.58%)
Pointblank
Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+2426.32%)
Data Engineering Book
Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+2378.95%)
Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+2310.53%)
Active workflow
Turn complex requirements to workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (+2073.68%)
Learn Something Every Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (+1805.26%)
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+1700%)
Egeria
Open Metadata and Governance
Stars: ✭ 328 (+1626.32%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+19400%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+25789.47%)
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+13457.89%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+51631.58%)
etl manager
A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-26.32%)
ClassifyBot
Automate building ML classification pipelines in .NET
Stars: ✭ 16 (-15.79%)
growthbook
Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+12226.32%)
yt-channels-DS-AI-ML-CS
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (+5363.16%)
mpc-DL-controller
Deep Neural Network architecture as a predictive optimal controller for {HVAC+Solar cell + battery} disturbance afflicted system vs classic Model Predictive Control
Stars: ✭ 37 (+94.74%)
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (+147.37%)
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+31.58%)
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+131.58%)
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+478.95%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+21473.68%)
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+20968.42%)
arakat
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Stars: ✭ 23 (+21.05%)
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (+105.26%)