BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

Stars: ✭ 156 (+173.68%)

Mutual labels: data-science, pipeline, workflow

Gtsummary

Presentation-Ready Data Summary and Analytic Result Tables

Stars: ✭ 450 (+689.47%)

Mutual labels: reproducible-research, reproducibility, rstats

Sarek

Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing

Stars: ✭ 124 (+117.54%)

Mutual labels: pipeline, reproducible-research, workflow

Vistrails

VisTrails is an open-source data analysis and visualization tool. It provides a comprehensive provenance infrastructure that maintains detailed history information about the steps followed and data derived in the course of an exploratory task: VisTrails maintains provenance of data products, of the computational processes that derive these products and their executions.

Stars: ✭ 94 (+64.91%)

Mutual labels: pipeline, reproducibility, workflow

Plynx

PLynx is a domain agnostic platform for managing reproducible experiments and data-oriented workflows.

Stars: ✭ 192 (+236.84%)

Mutual labels: data-science, reproducibility, workflow

targets-tutorial

Short course on the targets R package

Stars: ✭ 87 (+52.63%)

Mutual labels: pipeline, reproducible-research, reproducibility

open-solution-googleai-object-detection

Open solution to the Google AI Object Detection Challenge 🍁

Stars: ✭ 46 (-19.3%)

Mutual labels: pipeline, reproducible-research, reproducibility

DNAscan

DNAscan is a fast and efficient bioinformatics pipeline for analysing DNA next-generation sequencing data, requiring very little computational effort and memory usage.

Stars: ✭ 36 (-36.84%)

Mutual labels: workflow, pipeline

papers-as-modules

Software Papers as Software Modules: Towards a Culture of Reusable Results

Stars: ✭ 18 (-68.42%)

Mutual labels: reproducible-research, reproducibility

Polyaxon

Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)

Stars: ✭ 2,966 (+5103.51%)

Mutual labels: data-science, workflow

Rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.

Stars: ✭ 305 (+435.09%)

Mutual labels: pipeline, workflow

Open Solution Mapping Challenge

Open solution to the Mapping Challenge 🌎

Stars: ✭ 291 (+410.53%)

Mutual labels: data-science, pipeline

Funflow

Functional workflows

Stars: ✭ 318 (+457.89%)

Mutual labels: reproducible-research, workflow

Datmo

Open source production model management tool for data scientists

Stars: ✭ 334 (+485.96%)

Mutual labels: data-science, reproducibility

Great expectations

Always know what to expect from your data.

Stars: ✭ 5,808 (+10089.47%)

Mutual labels: data-science, pipeline

Pipeline

Pipeline is a package to build multi-staged concurrent workflows with a centralized logging output.

Stars: ✭ 433 (+659.65%)

Mutual labels: pipeline, workflow

Wdl

Workflow Description Language - Specification and Implementations

Stars: ✭ 438 (+668.42%)

Mutual labels: reproducibility, workflow

Labnotebook

LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.

Stars: ✭ 526 (+822.81%)

Mutual labels: reproducible-research, reproducibility

Dataexplorer

Automate Data Exploration and Treatment

Stars: ✭ 362 (+535.09%)

Mutual labels: data-science, rstats

Workflowr

Organize your project into a research website

Stars: ✭ 551 (+866.67%)

Mutual labels: workflow, rstats

Moderndive book

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse

Stars: ✭ 527 (+824.56%)

Mutual labels: data-science, rstats

Pdpipe

Easy pipelines for pandas DataFrames.

Stars: ✭ 590 (+935.09%)

Mutual labels: data-science, pipeline

reprozip-examples

Examples and demos for ReproZip

Stars: ✭ 13 (-77.19%)

Mutual labels: reproducible-research, reproducibility

r10e-ds-py

Reproducible Data Science in Python (SciPy 2019 Tutorial)

Stars: ✭ 12 (-78.95%)

Mutual labels: reproducible-research, reproducibility

cli-property-manager

Use this Property Manager CLI to automate Akamai property changes and deployments across many environments.

Stars: ✭ 22 (-61.4%)

Mutual labels: workflow, pipeline

reproducible

A set of tools for R that enhance reproducibility beyond package management

Stars: ✭ 33 (-42.11%)

Mutual labels: reproducible-research, reproducibility

Dagster

An orchestration platform for the development, production, and observation of data assets.

Stars: ✭ 4,099 (+7091.23%)

Mutual labels: data-science, workflow

Sacred

Sacred is a tool, developed at IDSIA, to help you configure, organize, log and reproduce experiments.

Stars: ✭ 3,678 (+6352.63%)

Mutual labels: reproducible-research, reproducibility

bistro

A library to build and execute typed scientific workflows

Stars: ✭ 43 (-24.56%)

Mutual labels: workflow, pipeline

Open Solution Home Credit

Open solution to the Home Credit Default Risk challenge 🏡

Stars: ✭ 397 (+596.49%)

Mutual labels: pipeline, reproducibility

Collective Knowledge framework (CK) helps to organize black-box research software as a database of reusable components and micro-services with common APIs, automation actions and extensible meta descriptions. See real-world use cases from Arm, General Motors, ACM, Raspberry Pi Foundation and others.

Stars: ✭ 395 (+592.98%)

Mutual labels: reproducibility, workflow

Production Data Science

Production Data Science: a workflow for collaborative data science aimed at production

Stars: ✭ 388 (+580.7%)

Mutual labels: data-science, workflow

Rrtools

rrtools: Tools for Writing Reproducible Research in R

Stars: ✭ 508 (+791.23%)

Mutual labels: reproducible-research, reproducibility

Awesome R

A curated list of awesome R packages, frameworks and software.

Stars: ✭ 4,858 (+8422.81%)

Mutual labels: data-science, rstats

Reproducibilty-Challenge-ECANET

Unofficial Implementation of ECANets (CVPR 2020) for the Reproducibility Challenge 2020.

Stars: ✭ 27 (-52.63%)

Mutual labels: reproducible-research, reproducibility

Toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

Stars: ✭ 733 (+1185.96%)

Mutual labels: pipeline, workflow

Prefect

The easiest way to automate your data

Stars: ✭ 7,956 (+13857.89%)

Mutual labels: data-science, workflow

Recsys2019 deeplearning evaluation

This is the repository of our article published in RecSys 2019 "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and of several follow-up studies.

Stars: ✭ 780 (+1268.42%)

Mutual labels: reproducible-research, reproducibility

Scipipe

Robust, flexible and resource-efficient pipelines using Go and the command line

Stars: ✭ 826 (+1349.12%)

Mutual labels: pipeline, workflow

Learn Julia The Hard Way

Learn Julia the hard way!

Stars: ✭ 679 (+1091.23%)

Mutual labels: makefile, data-science

Galaxy

Data intensive science for everyone.