llevar / butler

Licence: other
Butler is a framework for running scientific workflows on public and academic clouds.

Programming languages: Python, SaltStack, shell

Join the chat at https://gitter.im/butler-cloud/Lobby

A Framework for large-scale scientific analysis on the cloud

What is Butler?

Butler is a collection of tools that helps researchers carry out scientific analyses on a range of cloud computing platforms (AWS, OpenStack, Google Cloud Platform, Azure, and others). Butler builds on many other open source projects, including Apache Airflow, Terraform, SaltStack, Grafana, InfluxDB, PostgreSQL, Celery, Elasticsearch, and Consul.

Butler aims to be a comprehensive toolkit for analysing scientific data on clouds. To achieve this goal it provides functionality in four broad areas:

  • Provisioning - Creation and teardown of clusters of Virtual Machines on various clouds.
  • Configuration Management - Installation and configuration of software on Virtual Machines.
  • Workflow Management - Definition and execution of distributed scientific workflows at scale.
  • Operations Management - A set of tools for maintaining operational control of the virtualized environment as it performs work.

You can use Butler to create and execute workflows of arbitrary complexity using Python, or you can quickly wrap and execute tools that ship as Docker containers, or are described with the Common Workflow Language (CWL). Butler ships with a number of ready-made workflows that have been developed in the context of large-scale cancer genomics, including:

  • Genome Alignment using BWA
  • Germline and Somatic SNV detection and genotyping using freebayes, Pindel, and other tools
  • Germline and Somatic SV detection and genotyping using Delly
  • Variant filtering
  • R data analysis
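
As a rough illustration of what "a workflow defined in Python" means, the sketch below models a workflow as a set of tasks with dependencies and runs them in dependency order. It is not Butler's or Apache Airflow's API: it uses only the Python standard library (`graphlib`, Python 3.9+), and the task names and the `run_workflow` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, loosely modelled on the workflows listed above.
# Each key maps to the set of tasks that must finish before it can start.
WORKFLOW = {
    "align": set(),
    "call_snvs": {"align"},
    "call_svs": {"align"},
    "filter_variants": {"call_snvs", "call_svs"},
}


def run_workflow(dependencies, run_task):
    """Run every task exactly once, in an order that respects dependencies."""
    completed = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)  # a real system would dispatch this to a worker
        completed.append(task)
    return completed


if __name__ == "__main__":
    run_workflow(WORKFLOW, lambda task: print(f"running {task}"))
```

In a real deployment the workflow engine, rather than a local loop, decides when and where each task runs; the dependency structure is the part that carries over.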

A typical Butler deployment looks like this:

docs/images/embassy_butler_deployment_architecture.png

It can look like a bit of a tangle, but it is actually fairly simple. The Salt Master installs and configures software. The Tracker schedules workflows, puts them into a RabbitMQ queue, and keeps track of their state in a database. A fleet of Workers picks up workflow tasks and executes them. The Monitoring Server harvests logs and metrics from every component and visualizes them on graphical dashboards. Many more details about how everything works can be found in the Documentation.
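
The Tracker, queue, and Worker roles described above can be mimicked with a toy, single-process model. This is only a sketch of the pattern, not Butler's code: an in-memory `queue.Queue` stands in for RabbitMQ, a dictionary stands in for the state database, and all function names are hypothetical.

```python
import queue
import threading


def tracker(tasks, task_queue, state):
    """Schedule tasks: record each one as queued, then put it on the queue."""
    for task in tasks:
        state[task] = "queued"
        task_queue.put(task)


def worker(task_queue, state, lock):
    """Pull tasks off the queue until it is empty, recording each outcome."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        with lock:
            state[task] = "done"  # a real worker would execute the task here
        task_queue.task_done()


def run(tasks, n_workers=3):
    task_queue = queue.Queue()  # stands in for the RabbitMQ queue
    state = {}                  # stands in for the state database
    lock = threading.Lock()
    tracker(tasks, task_queue, state)  # queue is fully populated first
    threads = [
        threading.Thread(target=worker, args=(task_queue, state, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state


if __name__ == "__main__":
    print(run(["sample-1", "sample-2", "sample-3", "sample-4"]))
```

Separating scheduling from execution in this way is what lets the number of workers grow or shrink without changing how workflows are defined.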

Who uses Butler?

  • The Pan Cancer Analysis of Whole Genomes Project (PCAWG) - used Butler to run cancer genomics workflows on 2800+ high-coverage whole genome samples (725 TB of data) on OpenStack.
  • The European Open Science Cloud Pilot Project (EOSC) - using Butler to run cancer genomics workflows on multiple platforms (OpenStack, AWS).
  • The Pan Prostate Cancer Group - using Butler to run cancer genomics workflows on 2000+ whole genome prostate cancer samples on OpenStack.

Getting Started

To get started with Butler you need the following:

  • A target cloud computing environment.
  • Some data.
  • An analysis you want to perform (programs, scripts, etc.).
  • The Butler source repository.

The general sequence of steps you will use with Butler is as follows:

  • Install Terraform on your local machine
  • Clone the Butler GitHub repository
  • Populate cloud provider credentials
  • Select deployment parameters (VM flavours, networking and security settings, number of workers, etc.)
  • Deploy a Butler cluster onto your cloud provider
  • Use SaltStack to configure and deploy all of the software that Butler needs (this is highly automated)
  • Register some workflows with your Butler deployment
  • Register and configure an analysis (what workflow do you want to run on what data)
  • Launch your analysis
  • Monitor the progress of the analysis and the health of your infrastructure using a variety of dashboards
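
To make the "select deployment parameters" step more concrete, here is a small sketch that writes a set of parameters to a `terraform.tfvars.json` file, which Terraform loads automatically from its working directory. The variable names and values below are hypothetical placeholders; the variables that Butler's Terraform configurations actually expect are described in the Documentation.

```python
import json
import tempfile
from pathlib import Path


def write_tfvars(params, directory="."):
    """Write deployment parameters as terraform.tfvars.json in `directory`."""
    path = Path(directory) / "terraform.tfvars.json"
    path.write_text(json.dumps(params, indent=2, sort_keys=True) + "\n")
    return path


# Hypothetical parameter names, for illustration only.
EXAMPLE_PARAMS = {
    "worker_count": 4,
    "worker_flavor": "example.large",
    "key_pair": "my-key",
}

if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_tfvars(EXAMPLE_PARAMS, tmp).read_text())
```

Cloud credentials are best kept out of such a file and supplied through the mechanism your cloud provider recommends.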

Next Steps

Head over to the Documentation to learn about how to use Butler for your project.

Watch Sergei Yakneen's keynote presentation describing Butler, given at the de.NBI Cloud Computing Summer School in Giessen, Germany, in June 2017.

Read the Butler paper in Nature Biotechnology.
