flyteorg / Flyte

Licence: apache-2.0

Accelerate your ML and data workflows to production. Flyte is a production-grade orchestration system for your data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome, and others, and is truly open source.

Programming Languages

python

golang

Labels

machine-learning kubernetes data-science data workflow grpc data-analysis kubernetes-operator scale production

Flyte is a production-grade, container-native, type-safe workflow and pipeline platform, written in Golang and optimized for large-scale processing and machine learning. Workflows can be written in any language, with out-of-the-box support for Python, Java, and Scala.

Introduction

Flyte is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine: it provides workflows as a core concept, but it also provides a single unit of execution, the task, as a top-level concept. Multiple tasks arranged in data producer-consumer order create a workflow. Flyte workflows are pure specification and can be created using any language. Every task can also be written in any language. We provide first-class support for Python, making it well suited to modern machine learning and data processing pipelines.
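To make the task and workflow idea concrete, here is a minimal plain-Python sketch. It is not Flyte's API: `Task` and `chain` are hypothetical names invented for illustration, and the sketch only models the general idea that each task exposes a typed interface and that wiring a producer to a consumer with mismatched types is rejected when the workflow is declared, before anything runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, get_type_hints


@dataclass(frozen=True)
class Task:
    """A single unit of execution with a typed interface (hypothetical model)."""
    fn: Callable

    @property
    def inputs(self) -> Dict[str, type]:
        # Input interface: every annotated parameter except the return value.
        hints = dict(get_type_hints(self.fn))
        hints.pop("return", None)
        return hints

    @property
    def output(self) -> type:
        # Output interface: the annotated return type.
        return get_type_hints(self.fn)["return"]


def chain(producer: Task, consumer: Task, param: str) -> Callable:
    """Wire producer -> consumer, rejecting mismatched types at declaration time."""
    expected = consumer.inputs[param]
    if producer.output is not expected:
        raise TypeError(
            f"{producer.fn.__name__} produces {producer.output.__name__}, "
            f"but {consumer.fn.__name__}.{param} expects {expected.__name__}"
        )

    def workflow(**kwargs):
        # The producer's output becomes the consumer's named input.
        return consumer.fn(**{param: producer.fn(**kwargs)})

    return workflow


def count_words(text: str) -> int:
    return len(text.split())


def describe(n: int) -> str:
    return f"{n} words"


def shout(text: str) -> str:
    return text.upper()


wf = chain(Task(count_words), Task(describe), "n")
print(wf(text="type safe data dependency graph"))  # 5 words
```

Calling `chain(Task(count_words), Task(shout), "text")` raises `TypeError` immediately, because `count_words` produces an `int` while `shout` expects a `str`. For real pipelines, refer to the flytekit documentation rather than this model.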

QuickStart

With Docker installed, run this command:

  docker run --rm --privileged -p 30081:30081 -p 30084:30084 ghcr.io/flyteorg/flyte-sandbox

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message: "Flyte is ready! Flyte UI is available at http://localhost:30081/console." Visit http://localhost:30081/console to open the console.

Refer to Docs - Getting Started for a complete end-to-end example.
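If you start the sandbox from a script, you may want to wait until the console responds before continuing. The sketch below polls the console URL given in the ready message above, using only the Python standard library. The function name `wait_for_console`, the retry count, and the delay are illustrative assumptions, not part of Flyte.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Console address taken from the sandbox's ready message.
CONSOLE_URL = "http://localhost:30081/console"


def wait_for_console(
    url: str = CONSOLE_URL,
    attempts: int = 30,
    delay: float = 2.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the URL answers with HTTP 200, False after all attempts fail."""
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Sandbox is not accepting connections yet; retry after a pause.
            pass
        sleep(delay)
    return False


if __name__ == "__main__":
    if wait_for_console():
        print(f"Console is up at {CONSOLE_URL}")
    else:
        print("Console did not respond in time")
```

The `opener` and `sleep` parameters exist only so the function can be exercised without a running sandbox.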

Community & Resources

Resources that would help you get a better understanding of Flyte.

Communication channels

Biweekly Community Sync

📣 Flyte OSS Community Sync: every alternate Tuesday, 9am-10am PDT (check out the events calendar and subscribe).
You can join via the Zoom link.
Meeting notes and a backlog of topics are captured in the doc.
Video recordings

Conference Talks

Kubecon 2019 - Flyte: Cloud Native Machine Learning and Data Processing Platform video | deck
Kubecon 2019 - Running LargeScale Stateful workloads on Kubernetes at Lyft video
re:invent 2019 - Implementing ML workflows with Kubernetes and Amazon Sagemaker video
Cloud-native machine learning at Lyft with AWS Batch and Amazon EKS video
OSS + ELC NA 2020 splash
Datacouncil splash
FB [email protected] Making MLOps & DataOps a reality
GAIC 2020

Blog Posts

Podcasts

TWIML&AI - Scalable and Maintainable ML Workflows at Lyft - Flyte
Software Engineering Daily - Flyte: Lyft Data Processing Platform
MLOps Coffee session - Flyte: an open-source tool for scalable, extensible , and portable workflows

Features

Used at scale in production by 500+ users at Lyft, with more than 900k workflows executed per month and more than 30 million container executions per month
Fast registration - from local to remote in one second.
Centralized Inventory of Tasks, Workflows and Executions
Single Task Execution support - Start executing a task and then convert it to a workflow
gRPC / REST interface to define and execute tasks and workflows
Type-safe construction of pipelines: each task has an interface characterized by its inputs and outputs, so illegal construction of pipelines fails during declaration rather than at runtime
Types that help in creating machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
Memoization and lineage tracking
Workflow features

Multiple Schedules for every workflow
Parallel step execution
Extensible Backend to add customized plugin experiences (with simplified User experiences)
Arbitrary container execution
Branching
Inline Subworkflows (a workflow can be embedded within one node of the top-level workflow)
Distributed Remote Child workflows (a remote workflow can be triggered and statically verified at compile time)
Array Tasks (map some function over a large dataset, with controlled execution of thousands of containers)
Dynamic Workflow creation and execution - with runtime type safety
Container-side plugins with first-class support in Python
PreAlpha: Arbitrary flytekit-less containers supported (RawContainer)

Maintain an inventory of tasks and workflows
Record the history of all executions; executions are completely repeatable as long as they follow convention
Multi-cloud support (AWS, GCP and others)
Extensible core
Modularized
Automated notifications to Slack, Email, Pagerduty
Deep observability
Multi K8s cluster support
Comes with many systems supported out of the box on K8s, such as Spark
Snappy Console
Python CLI
Written in Golang and optimized for performance of large running jobs
Golang CLI - flytectl
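The memoization feature listed above can be pictured with a small plain-Python sketch. This is an illustration of memoization in general, not Flyte's implementation: the names `cache_key` and `run_memoized`, the in-memory store, and the choice to key results on task name, a version string, and inputs are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# In-memory store standing in for a real result catalog (assumption).
_results: Dict[str, Any] = {}


def cache_key(task_name: str, version: str, inputs: Dict[str, Any]) -> str:
    """Derive a stable key from the task identity and its JSON-serializable inputs."""
    payload = json.dumps(
        {"task": task_name, "version": version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_memoized(fn: Callable[..., Any], version: str, **inputs: Any) -> Any:
    """Run fn only if no result is stored for this (task, version, inputs)."""
    key = cache_key(fn.__name__, version, inputs)
    if key not in _results:
        _results[key] = fn(**inputs)
    return _results[key]


calls: List[int] = []


def square(x: int) -> int:
    calls.append(x)  # record real executions so reuse is observable
    return x * x


first = run_memoized(square, "v1", x=4)
second = run_memoized(square, "v1", x=4)  # served from the store, no re-run
third = run_memoized(square, "v2", x=4)   # a new version forces a re-run
print(first, second, third, len(calls))   # 16 16 16 2
```

Including a version in the key is one simple way to invalidate stored results when a task's logic changes while its inputs stay the same.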

In progress

Grafana templates (user/system observability)
Helm chart for Flyte
Performance optimization
Flink-K8s

Available Plugins

Containers
K8s Pods
AWS Batch Arrays
K8s Pod arrays
K8s Spark (native pyspark and java/scala)
AWS Athena
Qubole Hive
Presto Queries
Distributed Pytorch (K8s Native) - Pytorch Operator
Sagemaker (builtin algorithms & custom models)
Distributed Tensorflow (K8s Native) - TFOperator
Papermill Notebook execution (python and spark)
Type-safe data checking for Pandas DataFrames using Pandera

Coming Soon

Reactive pipelines
More integrations

Current Usage

Component Repos

Repo	Language	Purpose	Status
flyte	Kustomize,RST	deployment, documentation, issues	Production-grade
flyteidl	Protobuf	interface definitions	Production-grade
flytepropeller	Go	execution engine	Production-grade
flyteadmin	Go	control plane	Production-grade
flytekit	Python	Python SDK and tools	Production-grade
flyteconsole	Typescript	admin console	Production-grade
datacatalog	Go	manage input & output artifacts	Production-grade
flyteplugins	Go	flyte plugins	Production-grade
flytestdlib	Go	standard library	Production-grade
flytesnacks	Python	examples, tips, and tricks	Incubating
flytekit-java	Java/Scala	Java & Scala SDK for authoring Flyte workflows	Incubating
flytectl	Go	A standalone Flyte CLI	Incomplete

Production K8s Operators

Repo	Language	Purpose
Spark	Go	Apache Spark batch
Flink	Go	Apache Flink streaming

Top Contributors

Thank you to the community for making Flyte possible.
