dagster-io / Dagster

Licence: apache-2.0
An orchestration platform for the development, production, and observation of data assets.

Programming Languages

  • Python: 139335 projects (#7 most used programming language)
  • TypeScript: 32286 projects
  • Jupyter Notebook: 11667 projects
  • JavaScript: 184084 projects (#8 most used programming language)
  • Scala: 5932 projects
  • Dockerfile: 14818 projects

Projects that are alternatives of or similar to Dagster

beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-98.41%)
Mutual labels:  etl, analytics, data-pipelines
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-98.07%)
Mutual labels:  data-science, analytics, etl
zenaton-node
⚡ Node.js library to run and orchestrate background jobs with Zenaton Workflow Engine
Stars: ✭ 50 (-98.78%)
Mutual labels:  scheduler, workflow-automation
rivery cli
Rivery CLI
Stars: ✭ 16 (-99.61%)
Mutual labels:  etl, data-pipelines
bitnami-docker-airflow-scheduler
Bitnami Docker Image for Apache Airflow Scheduler
Stars: ✭ 19 (-99.54%)
Mutual labels:  workflow, scheduler
thain
Thain is a distributed flow schedule platform.
Stars: ✭ 81 (-98.02%)
Mutual labels:  etl, scheduler
zdh web
Big data collection and extraction platform
Stars: ✭ 292 (-92.88%)
Mutual labels:  etl, scheduler
open-semantic-desktop-search
Virtual Machine for Desktop Search with Open Semantic Search
Stars: ✭ 22 (-99.46%)
Mutual labels:  etl, analytics
Wexflow
An easy and fast way to build automation and workflows on Windows, Linux, macOS, and the cloud.
Stars: ✭ 2,435 (-40.6%)
Mutual labels:  scheduler, workflow
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-99.41%)
Mutual labels:  etl, data-pipelines
ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-98.83%)
Mutual labels:  workflow, workflow-automation
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (-27.64%)
Mutual labels:  data-science, workflow
Powerjob
Enterprise job scheduling middleware with distributed computing ability.
Stars: ✭ 3,231 (-21.18%)
Mutual labels:  scheduler, workflow
Aiida Core
The official repository for the AiiDA code
Stars: ✭ 238 (-94.19%)
Mutual labels:  scheduler, workflow
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-96.49%)
Mutual labels:  etl, data-pipelines
Schedulis
Schedulis is a high performance workflow task scheduling system that supports high availability and multi-tenant financial level features, Linkis computing middleware, and has been integrated into data application development portal DataSphere Studio
Stars: ✭ 222 (-94.58%)
Mutual labels:  scheduler, workflow
monopacker
A tool for building monorepo frontend projects (e.g. those using npm or yarn workspaces, lerna, or similar tools) into a standalone application, with no other tools needed.
Stars: ✭ 17 (-99.59%)
Mutual labels:  workflow, workflow-automation
Active workflow
Turn complex requirements to workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (-89.92%)
Mutual labels:  scheduler, workflow
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+487.97%)
Mutual labels:  scheduler, workflow
Introduction Datascience Python Book
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications
Stars: ✭ 275 (-93.29%)
Mutual labels:  data-science, analytics



Dagster

An orchestration platform for the development, production, and observation of data assets.

Dagster lets you define jobs in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of jobs and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke.

Dagster is designed for data platform engineers, data engineers, and full-stack data scientists. Building a data platform with Dagster makes your stakeholders more independent and your systems more robust. Developing data pipelines with Dagster makes testing easier and deploying faster.

Develop and test locally, then deploy anywhere

With Dagster’s pluggable execution, the same computations can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.

Model and type the data produced and consumed by each step

Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.

Link data to computations

Dagster’s Asset Manager tracks the data sets and ML models produced by your jobs, so you can understand how they were generated and trace issues when they don’t look the way you expect.

Build a self-service data platform

Dagster helps platform teams build systems for data practitioners. Jobs are built from shared, reusable, configurable data processing and infrastructure components. Dagit, Dagster’s web interface, lets anyone inspect these objects and discover how to use them.

Avoid dependency nightmares

Dagster’s repository model lets you isolate codebases so that problems in one job don’t bring down the rest. Each job can have its own package dependencies and Python version. Jobs are run in isolated processes so user code issues can't bring the system down.

Debug pipelines from a rich UI

Dagit, Dagster’s web interface, includes extensive facilities for understanding the jobs it orchestrates. When inspecting a run of your job, you can query the logs, find the most time-consuming tasks via a Gantt chart, re-execute subsets of steps, and more.

Getting Started

Installation

Dagster is available on PyPI, and officially supports Python 3.6+.

$ pip install dagster dagit

This installs two modules:

  • Dagster: the core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
  • Dagit: the UI for developing and operating Dagster pipelines, including a DAG browser, a type-aware config editor, and a live execution interface.

Learn

Next, jump right into our tutorial, or read our complete documentation. If you're actively using Dagster or have questions on getting started, we'd love to hear from you.


Contributing

For details on contributing or running the project for development, check out our contributing guide.

Integrations

Dagster works with the tools and systems that you're already using with your data, including:

Integration (Dagster library): description

  • Apache Airflow (dagster-airflow): Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs.
  • Apache Spark (dagster-spark · dagster-pyspark): Libraries for interacting with Apache Spark and PySpark.
  • Dask (dagster-dask): Provides a Dagster integration with Dask / Dask.Distributed.
  • Datadog (dagster-datadog): Provides a Dagster resource for publishing metrics to Datadog.
  • Jupyter / Papermill (dagstermill): Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into Dagster pipelines.
  • PagerDuty (dagster-pagerduty): A library for creating PagerDuty alerts from Dagster workflows.
  • Snowflake (dagster-snowflake): A library for interacting with the Snowflake Data Warehouse.

Cloud Providers

  • AWS (dagster-aws): A library for interacting with Amazon Web Services. Provides integrations with CloudWatch, S3, EMR, and Redshift.
  • Azure (dagster-azure): A library for interacting with Microsoft Azure.
  • GCP (dagster-gcp): A library for interacting with Google Cloud Platform. Provides integrations with GCS, BigQuery, and Cloud Dataproc.

This list is growing as we are actively building more integrations, and we welcome contributions!