SwissDataScienceCenter / r10e-ds-py

Licence: Apache-2.0 License

Reproducible Data Science in Python (SciPy 2019 Tutorial)

Materials for the Reproducible Data Science in Python tutorial at SciPy 2019.

Presenters: Chandrasekhar Ramakrishnan (Swiss Data Science Center) and Xu Fei (Code Ocean)

Versions

Date	Change
2019-06-18	Initial version
2019-06-25	Added instructions for Windows
2019-07-01	Updated environment.yml to work on conda 4.7 and 4.6
2019-07-02	Updated environment.yml to be cross-platform
2019-07-04	Added dot as a dependency

Description

Reproducibility has long been an expectation of scientific work, and communities and funding sources increasingly demand it. Within the Python ecosystem, a variety of tools support reproducible data science, but choosing and using one is not always straightforward. In this tutorial, we take a closer look at the concept of reproducibility, examine the technologies that provide its building blocks, and survey the landscape of tools. We spend the majority of the time on two solutions in particular, Renku and Code Ocean, and work through end-to-end scenarios in both.

Set Up

To avoid conflicts in dependencies, we recommend creating a dedicated environment for this tutorial. You can do this using any tool you like, for example pipenv or conda.

We provide instructions based on conda below. If you use Docker, we also provide a Dockerfile with instructions for setup and use. If you prefer something else, ensure that git, git-lfs, curl, and node are installed in your environment; you should then be able to pip install the requirements.txt file for the rest.
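For readers who set up their own environment, the prerequisite check can be sketched in Python. This is a minimal sketch, not part of the tutorial materials: the function name `missing_tools` is hypothetical, and the executable names (`git-lfs`, `node`) are assumptions that may differ on some systems.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools the setup text lists as prerequisites outside conda/Docker.
# The executable names are assumptions; some systems use other names.
REQUIRED_TOOLS = ("git", "git-lfs", "curl", "node")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The lookup function is passed in as a parameter so the check can be exercised without depending on what is installed on the machine.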

If you do not wish to set up an environment on your computer, you can follow these instructions to use Renkulab, or you can run the tutorial on Code Ocean or MyBinder.

Step 1: Create Environment

Create environment using conda

If you do not yet have conda, first install Miniconda for your platform.
Download the conda environment file, environment.yml.
In the directory where environment.yml is located, run conda env create.

Verifying the setup

Activate the environment with conda activate r10eds
Run git --version -- the result should be "git version 2.21.0" (or newer)
Run git lfs --version -- the result should be "git-lfs/2.7.1" (or newer)
Run renku --version -- the result should be "0.5.0" (or newer)
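The version checks above can also be automated. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and it assumes each command prints a MAJOR.MINOR.PATCH version somewhere in its output, which holds for the examples quoted above but is not guaranteed for every release.

```python
import re
import subprocess
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]

# Commands and minimum versions quoted in the verification steps.
MINIMUM_VERSIONS: Dict[str, Tuple[List[str], Version]] = {
    "git": (["git", "--version"], (2, 21, 0)),
    "git-lfs": (["git", "lfs", "--version"], (2, 7, 1)),
    "renku": (["renku", "--version"], (0, 5, 0)),
}


def parse_version(text: str) -> Optional[Version]:
    """Extract the first MAJOR.MINOR.PATCH triple from command output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_new_enough(output: str, minimum: Version) -> bool:
    """True if the output contains a version at or above the minimum."""
    version = parse_version(output)
    return version is not None and version >= minimum


def run_version_command(command: List[str]) -> str:
    """Run a version command; return its output, or "" if it cannot be run."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout + result.stderr


if __name__ == "__main__":
    for name, (command, minimum) in MINIMUM_VERSIONS.items():
        output = run_version_command(command)
        status = "ok" if is_new_enough(output, minimum) else "missing or too old"
        print(f"{name}: {status}")
```

Run it inside the activated r10eds environment so that the versions reported are the ones the tutorial will use.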

Additional setup on Windows

On Windows, an additional step is necessary. Renku creates symbolic links, and on Windows this requires special privileges. Follow these instructions from StackExchange/Super User to give your user these privileges.
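To check whether your user can already create symbolic links, a small Python test along these lines may help. It is a sketch only: the function name `can_create_symlinks` is hypothetical, and it tests link creation in a temporary directory rather than in the tutorial repository itself.

```python
import os
import tempfile


def can_create_symlinks() -> bool:
    """Try to create a symbolic link inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as handle:
            handle.write("symlink test\n")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # On Windows this typically means the privilege is missing.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    if can_create_symlinks():
        print("Symbolic links can be created.")
    else:
        print("Cannot create symbolic links; see the Windows setup step.")
```

If the check fails on Windows, apply the privilege change described above and run it again in a new terminal session.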

Step 2: Clone the tutorial repository

Activate the environment with conda activate r10eds
Clone the repository: git clone https://github.com/SwissDataScienceCenter/r10e-ds-py.git

Run the tutorial

Once the environment is set up and the repository is cloned, you can run the tutorial.

Change into the tutorial repository: cd r10e-ds-py
Activate the environment with conda activate r10eds
Start JupyterLab: jupyter lab (you can also use plain Jupyter)

Optional Components

If you wish, you can install Docker Desktop. It is not a requirement, but it will make it possible to dig deeper into certain areas in the tutorial.

Schedule

Introduction (1h)
15 min	Background & Theory	Terminology, history, and philosophy of reproducibility
30 min	Building Blocks	Building blocks for achieving reproducibility
15 min	Tools	Survey of the current tool landscape

Break (10 min)

Hands-on with Renku (1h 30m)
30 min	Starting	Starting a project, importing data, building a workflow
30 min	Iterating	Updating code and data to improve analysis
30 min	Details and Reflection	What is the benefit? How much effort was it? How do we view, share, and reuse artifacts? How do things work under the covers?

Break (20 min)

Hands-on with Code Ocean (1h)
10 min	Demo of a Compute Capsule	Intro to Code Ocean and its design philosophy
30 min	Creating a Compute Capsule	Create a reproducible compute capsule using code and data from the existing Renku project. We will explore options to publish, collaborate, import from GitHub, export to a local server, etc.
15 min	Q&A, Wrap up	Any questions that you want to ask

Acknowledgements

Many thanks to Erica Moreira, Laura Levin-Gleba, and Maja Garbulinksa from the Harvard School of Public Health for their helpful comments and suggestions!

The icons used are from Icons8.
