soda-spark - Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+262.5%)
preprocessy - Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (+112.5%)
Everything-Tech - A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+2375%)
pyjanitor - Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+5962.5%)
AirflowETL - Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (+25%)
Soda Sql - Metric collection, data testing and monitoring for SQL-accessible data
Stars: ✭ 173 (+981.25%)
datart - Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+6412.5%)
get smarties - Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (+43.75%)
hamilton - A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+3725%)
DataEngineering - This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (+193.75%)
qsv - CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+2637.5%)
uptasticsearch - An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+193.75%)
airflow-dbt-python - A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+593.75%)
Gspread Pandas - A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (+1312.5%)
Yuniql - Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (+875%)
gallia-core - A schema-aware Scala library for data transformation
Stars: ✭ 44 (+175%)
Data Engineering Howto - A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+12750%)
morph-kgc - Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+381.25%)
deordie-meetups - DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (+200%)
Pipelinex - PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (+693.75%)
neon-workshop - A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (+18.75%)
contessa - Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (+6.25%)
mpc-DL-controller - Deep Neural Network architecture as a predictive optimal controller for {HVAC+Solar cell + battery} disturbance afflicted system vs classic Model Predictive Control
Stars: ✭ 37 (+131.25%)
papilo - DEPRECATED: Stream data processing micro-framework
Stars: ✭ 24 (+50%)
lrmr - Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (+100%)
etl - [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+1643.75%)
dbt-sugar - dbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
Stars: ✭ 139 (+768.75%)
ml-in-production - The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (+81.25%)
awesome-dbt - A curated list of awesome dbt resources
Stars: ✭ 520 (+3150%)
funsies - funsies is a lightweight workflow engine 🔧
Stars: ✭ 37 (+131.25%)
Every Single Day I Tldr - A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+1456.25%)
beneath - Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+306.25%)
Ploomber - A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (+1281.25%)
blockchain-etl-streaming - Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+256.25%)
jobAnalytics and search - JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+56.25%)
Auptimizer - An automatic ML model optimization tool.
Stars: ✭ 166 (+937.5%)
polygon-etl - ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+231.25%)
Geni - A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+850%)
yt-channels-DS-AI-ML-CS - A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (+6387.5%)
Gcp Data Engineer Exam - Study materials for the Google Cloud Professional Data Engineering Exam
Stars: ✭ 144 (+800%)
Accelerator - The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+756.25%)
versatile-data-kit - Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+800%)
prefect-saturn - Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-6.25%)
growthbook - Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+14537.5%)
viewflow - Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+587.5%)