SwissDataScienceCenter / r10e-ds-py

Licence: Apache-2.0 License
Reproducible Data Science in Python (SciPy 2019 Tutorial)

Materials for the Reproducible Data Science in Python tutorial at SciPy 2019.

Presenters: Chandrasekhar Ramakrishnan (Swiss Data Science Center) and Xu Fei (Code Ocean)

Versions

Date        Change
2019-06-18  Initial version
2019-06-25  Added instructions for Windows
2019-07-01  Updated environment.yml to work on conda 4.7 and 4.6
2019-07-02  Updated environment.yml to be cross-platform
2019-07-04  Added dot as a dependency

Description

Reproducibility has long been expected of scientific work, and communities and funding sources increasingly demand it. Within the Python ecosystem, a variety of tools support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we will take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We will spend the majority of the time on two solutions in particular, Renku and Code Ocean, and work through end-to-end scenarios in both.

Set Up

To avoid conflicts in dependencies, we recommend creating a dedicated environment for this tutorial. You can do this using any tool you like, for example pipenv or conda.

We provide instructions based on conda below. If you use Docker, we also provide a Dockerfile with instructions for setup and use. If you prefer something else, you will need to ensure that git, git-lfs, curl, and node are installed in your environment; you should be able to install the rest with pip from the requirements.txt file.
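If you set up the environment without conda or Docker, a quick check that the required tools are on your PATH may save some debugging later. The following is a minimal sketch, not part of the tutorial materials: the helper name missing_tools is hypothetical, and the executable names (in particular "git-lfs") are assumptions about how the tools are installed on your system.

```python
import shutil

# Tools named in the setup instructions. The executable names are
# assumptions; git-lfs normally installs a "git-lfs" binary.
REQUIRED_TOOLS = ["git", "git-lfs", "curl", "node"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

The lookup function is passed in as a parameter only so that the check can be exercised without the tools installed; in normal use the default, shutil.which, is what you want.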

If you do not wish to set up an environment on your computer, you can instead follow these instructions to use Renkulab, or run the tutorial on Code Ocean or MyBinder.

Step 1: Create Environment

Create environment using conda

  1. If you do not yet have conda, you should first install miniconda for your platform
  2. Download the conda environment file (environment.yml)
  3. In the directory where environment.yml is located, execute conda env create

Verifying the setup

  1. Activate the environment with conda activate r10eds
  2. Run git --version; the result should be "git version 2.21.0" (or newer)
  3. Run git lfs --version; the result should be "git-lfs/2.7.1" (or newer)
  4. Run renku --version; the result should be "0.5.0" (or newer)
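The version checks above can also be scripted. The sketch below is one possible way to do it, assuming each command prints a dotted version number somewhere in its output (as in the examples above); the helper names parse_version, meets_minimum, and check are hypothetical and not part of the tutorial. Run it inside the activated r10eds environment.

```python
import re
import subprocess

# Commands and minimum versions taken from the verification steps.
MINIMUM_VERSIONS = {
    ("git", "--version"): "2.21.0",
    ("git", "lfs", "--version"): "2.7.1",
    ("renku", "--version"): "0.5.0",
}


def parse_version(text):
    """Extract the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(output, minimum):
    """Return True if the version found in `output` is at least `minimum`."""
    found = parse_version(output)
    wanted = parse_version(minimum)
    width = max(len(found), len(wanted))

    def pad(version):
        # Pad with zeros so that 2.39 compares like 2.39.0.
        return version + (0,) * (width - len(version))

    return pad(found) >= pad(wanted)


def check(command, minimum):
    """Run `command` and describe whether its version is recent enough."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        return f"could not run ({error})"
    try:
        ok = meets_minimum(result.stdout, minimum)
    except ValueError:
        return "unrecognised output"
    return "ok" if ok else f"older than {minimum}"


if __name__ == "__main__":
    for command, minimum in MINIMUM_VERSIONS.items():
        print(" ".join(command) + ": " + check(command, minimum))
```

Only the first version-like number in the output is compared, so suffixes such as platform tags or development markers are ignored.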

Additional setup on Windows

On Windows, an additional step is necessary. Renku creates symbolic links, and on Windows creating them requires special privileges. Follow these instructions from StackExchange/Super User to give your user these privileges.
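To find out whether your user can already create symbolic links, you can try creating one in a temporary directory. This is a minimal sketch using only the Python standard library; the function name can_create_symlinks is hypothetical, and a successful result here does not guarantee that every Renku operation will succeed.

```python
import os
import tempfile


def can_create_symlinks():
    """Return True if the current user can create a symbolic link."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as handle:
            handle.write("symlink test\n")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Raised when the platform or the user's privileges
            # do not allow creating symbolic links.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    if can_create_symlinks():
        print("Symbolic links can be created.")
    else:
        print("Cannot create symbolic links; see the instructions above.")
```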

Step 2: Clone the tutorial repository

  1. Activate the environment with conda activate r10eds
  2. Clone the repository: git clone https://github.com/SwissDataScienceCenter/r10e-ds-py.git

Run the tutorial

Once you have set up the environment and cloned the repository, you can run the tutorial.

  1. Change into the tutorial repository: cd r10e-ds-py
  2. Activate the environment with conda activate r10eds
  3. Start JupyterLab: jupyter lab (you can also use plain Jupyter)

Optional Components

If you wish, you can install Docker Desktop. It is not a requirement, but it will make it possible to dig deeper into certain areas in the tutorial.

Schedule

Introduction (1h)
  15 min  Background & Theory: Terminology, history, and philosophy of reproducibility
  30 min  Building Blocks: Building blocks for achieving reproducibility
  15 min  Tools: Survey of the current tool landscape
Break (10 min)
Hands-on with Renku (1h 30m)
  30 min  Starting: Starting a project, importing data, building a workflow
  30 min  Iterating: Updating code and data to improve analysis
  30 min  Details and Reflection: What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?
Break (20 min)
Hands-on with Code Ocean (1h)
  10 min  Demo of a Compute Capsule: Intro to Code Ocean and its design philosophy
  30 min  Creating a Compute Capsule: Create a reproducible compute capsule using code and data from the existing Renku project. We will explore options to publish, collaborate, import from GitHub, export to a local server, etc.
  15 min  Q&A, Wrap-up: Any questions that you want to ask

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinksa from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.
