
flyteorg / Flyte

Licence: apache-2.0
Accelerate your ML and data workflows to production. Flyte is a production-grade orchestration system for your data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome, and others, and is truly open source.

Programming Languages

python
139335 projects - #7 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Flyte

Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-95.33%)
Mutual labels:  data-science, data-analysis, data
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-94.61%)
Mutual labels:  data-science, data-analysis, data
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+296.05%)
Mutual labels:  data-science, data-analysis, data
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (-13.37%)
Mutual labels:  data-science, data-analysis, data
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (-68.52%)
Mutual labels:  data-science, data-analysis, data
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+586.88%)
Mutual labels:  data-science, data-analysis, data
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-78.02%)
Mutual labels:  data-science, data-analysis, data
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-86.23%)
Mutual labels:  data-science, data-analysis, data
Production Data Science
Production Data Science: a workflow for collaborative data science aimed at production
Stars: ✭ 388 (-68.76%)
Mutual labels:  data-science, production, workflow
Akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library
Stars: ✭ 4,334 (+248.95%)
Mutual labels:  data-science, data-analysis, data
Datascience course
Data Science course in Portuguese
Stars: ✭ 294 (-76.33%)
Mutual labels:  data-science, data-analysis, data
Skdata
Python tools for data analysis
Stars: ✭ 16 (-98.71%)
Mutual labels:  data-science, data-analysis, data
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+299.03%)
Mutual labels:  data-science, data-analysis, data
Gopup
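Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; Qianlima and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…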
Stars: ✭ 1,229 (-1.05%)
Mutual labels:  data-science, data-analysis, data
Drake Examples
Example workflows for the drake R package
Stars: ✭ 57 (-95.41%)
Mutual labels:  data-science, workflow
Dex
Dex : The Data Explorer -- A data visualization tool written in Java/Groovy/JavaFX capable of powerful ETL and publishing web visualizations.
Stars: ✭ 1,238 (-0.32%)
Mutual labels:  data-science, data-analysis
Tiledb
The Universal Storage Engine
Stars: ✭ 1,072 (-13.69%)
Mutual labels:  data-science, data-analysis
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-95.33%)
Mutual labels:  data-science, scale
Data Science Lunch And Learn
Resources for weekly Data Science Lunch & Learns
Stars: ✭ 49 (-96.05%)
Mutual labels:  data-science, data-analysis
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (-6.84%)
Mutual labels:  data-science, data-analysis


Flyte is a production-grade, container-native, type-safe workflow and pipelines platform, written in Golang and optimized for large-scale processing and machine learning. Workflows can be written in any language, with out-of-the-box support for Python, Java, and Scala.




Introduction

Flyte is a fabric that connects disparate computation backends using a type-safe data dependency graph. It records all changes to a pipeline, making it possible to rewind time. It also stores a history of all executions and provides an intuitive UI, CLI, and REST/gRPC API to interact with the computation.

Flyte is more than a workflow engine. It provides workflows as a core concept, but it also provides a single unit of execution, the task, as a top-level concept. Multiple tasks arranged in data producer-consumer order form a workflow. Flyte workflows are pure specification and can be created in any language; every task can also be written in any language. We provide first-class support for Python, making Flyte a good fit for modern machine learning and data processing pipelines.
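To make the producer-consumer idea concrete, here is a minimal toy sketch in plain Python. It is not the flytekit API and not how Flyte executes workflows; the `Task` type and `run_workflow` helper are hypothetical names invented for illustration. It only shows that once each task declares what it consumes and produces, the execution order follows from the data dependencies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Task:
    """A single unit of execution (hypothetical toy type, not a Flyte class)."""
    name: str
    fn: Callable[..., Any]
    inputs: List[str]  # names of the values this task consumes
    output: str        # name of the value this task produces


def run_workflow(tasks: List[Task], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run each task once all of its inputs have been produced."""
    values = dict(initial)
    pending = list(tasks)
    while pending:
        ready = [t for t in pending if all(i in values for i in t.inputs)]
        if not ready:
            missing = {t.name: [i for i in t.inputs if i not in values] for t in pending}
            raise ValueError(f"unsatisfied inputs: {missing}")
        for t in ready:
            values[t.output] = t.fn(*(values[i] for i in t.inputs))
            pending.remove(t)
    return values


# Tasks are listed out of order on purpose; the data dependencies decide the order.
tasks = [
    Task("total", lambda xs: sum(xs), inputs=["cleaned"], output="total"),
    Task("clean", lambda xs: [x for x in xs if x is not None], inputs=["raw"], output="cleaned"),
]
result = run_workflow(tasks, {"raw": [1, None, 2, 3]})
print(result["total"])
```

In this sketch `clean` runs before `total` even though it is declared second, because `total` consumes the value that `clean` produces.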

QuickStart

With Docker installed, run this command:

  docker run --rm --privileged -p 30081:30081 -p 30084:30084 ghcr.io/flyteorg/flyte-sandbox

This creates a local Flyte sandbox. Once the sandbox is ready, you should see the following message:

  Flyte is ready! Flyte UI is available at http://localhost:30081/console.

Then visit http://localhost:30081/console. A quick visual tour of the console:

[Figure: Flyte console example]

Refer to Docs - Getting Started for a complete end-to-end example.
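If you script the QuickStart, you may want to wait for the console instead of watching the container logs. The following is a minimal sketch that assumes the console URL from the QuickStart answers with a 2xx status once the sandbox is up. The helper names `http_ok` and `wait_until_ready`, the timeout, and the polling interval are illustrative choices, not part of Flyte.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

CONSOLE_URL = "http://localhost:30081/console"  # URL taken from the QuickStart


def http_ok(url: str) -> bool:
    """Return True if the URL answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    timeout_s: float = 300.0,   # assumed upper bound, adjust to your machine
    interval_s: float = 5.0,    # assumed polling interval
) -> bool:
    """Poll `probe(url)` until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready(CONSOLE_URL)` returns `True` as soon as the console responds and `False` if the timeout elapses first. The `probe` parameter is injectable so the loop can be exercised without a running sandbox.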

Community & Resources

Resources to help you better understand Flyte.

Communication channels

Biweekly Community Sync

Conference Talks

Blog Posts

  1. Introducing Flyte: A Cloud Native Machine Learning and Data Processing Platform
  2. Building a Gateway to Flyte

Podcasts

Features

  • Used at scale in production by 500+ users at Lyft, with more than 900k workflows executed per month and more than 30 million container executions per month
  • Fast registration - from local to remote in one second.
  • Centralized Inventory of Tasks, Workflows and Executions
  • Single Task Execution support - Start executing a task and then convert it to a workflow
  • gRPC / REST interface to define and execute tasks and workflows
  • Type-safe construction of pipelines: each task has an interface characterized by its inputs and outputs, so illegal construction of pipelines fails during declaration rather than at runtime
  • Types that help in creating machine learning and data processing pipelines, such as Blobs (images, arbitrary files), Directories, Schema (columnar structured data), collections, maps, etc.
  • Memoization and Lineage tracking
  • Workflows features
  • Multiple Schedules for every workflow
  • Parallel step execution
  • Extensible Backend to add customized plugin experiences (with simplified User experiences)
  • Arbitrary container execution
  • Branching
  • Inline Subworkflows (a workflow can be embedded within one node of the top-level workflow)
  • Distributed Remote Child workflows (a remote workflow can be triggered and statically verified at compile time)
  • Array Tasks (map a function over a large dataset, with controlled execution of thousands of containers)
  • Dynamic Workflow creation and execution, with runtime type safety
  • Container-side plugins with first-class support in Python
  • Pre-alpha: arbitrary flytekit-less containers supported (RawContainer)
  • Maintain an inventory of tasks and workflows
  • Record history of all executions; executions (as long as they follow convention) are completely repeatable
  • Multi Cloud support (AWS, GCP and others)
  • Extensible core
  • Modularized
  • Automated notifications to Slack, Email, Pagerduty
  • Deep observability
  • Multi K8s cluster support
  • Comes with support for many systems out of the box on K8s, such as Spark
  • Snappy Console
  • Python CLI
  • Written in Golang and optimized for performance of large running jobs
  • Golang CLI - flytectl
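The type-safe construction feature above says that wiring incompatible tasks together fails at declaration time rather than at runtime. The toy sketch below illustrates that idea with plain Python type annotations. It is not Flyte's type system or the flytekit API; `check_edge` and the three example functions are hypothetical names used only for illustration.

```python
from typing import Callable, get_type_hints


def check_edge(producer: Callable, consumer: Callable, param: str) -> None:
    """Verify, without running either function, that the producer's declared
    return type matches the declared type of the consumer's parameter."""
    produced = get_type_hints(producer).get("return")
    expected = get_type_hints(consumer).get(param)
    if produced is None or expected is None:
        raise TypeError("producer return and consumer parameter must both be annotated")
    if produced != expected:
        raise TypeError(
            f"{producer.__name__} produces {produced!r}, "
            f"but {consumer.__name__}({param}) expects {expected!r}"
        )


def count_rows() -> int:
    return 42


def describe(count: int) -> str:
    return f"{count} rows"


def shout(text: str) -> str:
    return text.upper()


# Compatible interfaces: accepted while the pipeline is being declared.
check_edge(count_rows, describe, "count")

# Incompatible interfaces: rejected before any task body has executed.
try:
    check_edge(count_rows, shout, "text")
except TypeError as err:
    print(f"rejected at declaration: {err}")
```

The point of the sketch is that neither `count_rows` nor `shout` is ever called: the mismatch is detected from the declared interfaces alone.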

In progress

  • Grafana templates (user/system observability)
  • Helm chart for Flyte
  • Performance optimization
  • Flink-K8s

Available Plugins

  • Containers
  • K8s Pods
  • AWS Batch Arrays
  • K8s Pod arrays
  • K8s Spark (native pyspark and java/scala)
  • AWS Athena
  • Qubole Hive
  • Presto Queries
  • Distributed Pytorch (K8s Native) - Pytorch Operator
  • Sagemaker (builtin algorithms & custom models)
  • Distributed Tensorflow (K8s Native) - TFOperator
  • Papermill Notebook execution (python and spark)
  • Type safety and data checking for Pandas dataframes using Pandera

Coming Soon

  • Reactive pipelines
  • More integrations

Current Usage

Component Repos

Repo            Language        Purpose                                          Status
flyte           Kustomize, RST  deployment, documentation, issues                Production-grade
flyteidl        Protobuf        interface definitions                            Production-grade
flytepropeller  Go              execution engine                                 Production-grade
flyteadmin      Go              control plane                                    Production-grade
flytekit        Python          Python SDK and tools                             Production-grade
flyteconsole    TypeScript      admin console                                    Production-grade
datacatalog     Go              manage input & output artifacts                  Production-grade
flyteplugins    Go              Flyte plugins                                    Production-grade
flytestdlib     Go              standard library                                 Production-grade
flytesnacks     Python          examples, tips, and tricks                       Incubating
flytekit-java   Java/Scala      Java & Scala SDK for authoring Flyte workflows   Incubating
flytectl        Go              a standalone Flyte CLI                           Incomplete

Production K8s Operators

Repo   Language  Purpose
Spark  Go        Apache Spark batch
Flink  Go        Apache Flink streaming

Top Contributors

Thank you to the community for making Flyte possible.
