Dagster: An orchestration platform for the development, production, and observation of data assets.
Hub: Dataset format for AI. Build, manage, and visualize datasets for deep learning. Stream data in real time to PyTorch/TensorFlow and version-control it. https://activeloop.ai
arakat: ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
beneath: Beneath is a serverless real-time data platform ⚡️
spark-transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
ml-in-production: Practical use cases showing how to make machine learning pipelines robust and reliable using Apache Airflow.
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
neon-workshop: A Pachyderm deep learning tutorial for conference workshops
CogStack-NiFi: Building data processing pipelines for document processing with NLP using Apache NiFi and related services
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines