Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-98.07%)
beneath: Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-98.41%)
Covid19 Dashboard: A site that displays up-to-date COVID-19 stats, powered by fastpages.
Stars: ✭ 1,212 (-70.43%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+20%)
Plynx: PLynx is a domain-agnostic platform for managing reproducible experiments and data-oriented workflows.
Stars: ✭ 192 (-95.32%)
Drake Examples: Example workflows for the drake R package
Stars: ✭ 57 (-98.61%)
Suspeitando: Project for analyzing contracts suspected of overpricing and poor quality of service delivery.
Stars: ✭ 76 (-98.15%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (-41.82%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (-96.93%)
Airflow: Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+487.97%)
thain: Thain is a distributed flow schedule platform.
Stars: ✭ 81 (-98.02%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-96.49%)
Model Describer: model-describer: Making machine learning interpretable to humans
Stars: ✭ 22 (-99.46%)
Datofutbol: Dato Fútbol repository
Stars: ✭ 23 (-99.44%)
Ds With Pysimplegui: Data science and Machine Learning GUI programs/desktop apps with PySimpleGUI package
Stars: ✭ 93 (-97.73%)
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+940.11%)
Batchflow: BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Stars: ✭ 156 (-96.19%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.07%)
Wedatasphere: WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-90.92%)
Powerjob: Enterprise job scheduling middleware with distributed computing ability.
Stars: ✭ 3,231 (-21.18%)
Wexflow: An easy and fast way to build automation and workflows on Windows, Linux, macOS, and the cloud.
Stars: ✭ 2,435 (-40.6%)
ibis: IBIS is a workflow creation engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-98.83%)
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+11.76%)
Threatpursuit Vm: Threat Pursuit Virtual Machine (VM): A fully customizable, open-sourced Windows-based distribution focused on threat intelligence analysis and hunting designed for intel and malware analysts as well as threat hunters to get up and running quickly.
Stars: ✭ 814 (-80.14%)
Awesome Streamlit: The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Stars: ✭ 769 (-81.24%)
Vds: Verteego Data Suite
Stars: ✭ 9 (-99.78%)
Prefect: The easiest way to automate your data
Stars: ✭ 7,956 (+94.1%)
Etl with python: ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-98.34%)
Drake: An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (-68.26%)
Ml: A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (-69.02%)
Auto ml: [UNMAINTAINED] Automated machine learning for analytics & production
Stars: ✭ 1,559 (-61.97%)
Flyte: Accelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome and others, and is truly open-source.
Stars: ✭ 1,242 (-69.7%)
Qlik Py Tools: Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
Stars: ✭ 135 (-96.71%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-84.56%)
Awesome Datascience: 📝 An awesome Data Science repository to learn and apply for real world problems.
Stars: ✭ 17,520 (+327.42%)
Ploomber: A convention-over-configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (-94.61%)
Active workflow: Turn complex requirements into workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (-89.92%)
Elastic: R client for the Elasticsearch HTTP API
Stars: ✭ 227 (-94.46%)
Aiida Core: The official repository for the AiiDA code
Stars: ✭ 238 (-94.19%)
Schedulis: Schedulis is a high-performance workflow task scheduling system that supports high availability and multi-tenant financial-level features and the Linkis computing middleware, and has been integrated into the data application development portal DataSphere Studio
Stars: ✭ 222 (-94.58%)
zdh web: Big data collection and extraction platform
Stars: ✭ 292 (-92.88%)
Cql: Categorical Query Language IDE
Stars: ✭ 196 (-95.22%)
monopacker: A tool for managing builds of monorepo frontend projects (with e.g. npm or yarn workspaces, lerna or similar tools) into a standalone application - no other tools needed.
Stars: ✭ 17 (-99.59%)
Polyaxon: Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (-27.64%)
Pachyderm: Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+29.42%)
Data Science Career: Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-84.63%)
Data Science Live Book: An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-95.29%)
zenaton-node: ⚡ Node.js library to run and orchestrate background jobs with Zenaton Workflow Engine
Stars: ✭ 50 (-98.78%)
Hub: Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data in real time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (-2.34%)