alan-turing-institute / binderhub-deploy

Licence: MIT license
Deploy a BinderHub from scratch on Microsoft Azure

Automatically deploy a BinderHub to Microsoft Azure


BinderHub is a cloud-based, multi-server technology used for hosting reproducible computing environments and interactive Jupyter Notebooks built from code repositories.

This repository contains a set of scripts to automatically deploy a BinderHub onto Microsoft Azure, and connect either a Docker Hub account/organisation or an Azure Container Registry, so that you can host your own Binder service.

You will require a Microsoft Azure account and subscription. A Free Trial subscription can be obtained here. You will be asked to provide a credit card for verification purposes. You will not be charged. Your resources will be frozen once your subscription expires, then deleted if you do not reactivate your account within a given time period. If you are building a BinderHub as a service for an organisation, your institution may already have an Azure account. You should contact your IT Services for further information regarding permissions and access (see the Service Principal Creation section below).

Please read our 💜 Code of Conduct 💜 and 👾 Contributing Guidelines 👾


🚸 Usage

This repo can either be run locally or as "Platform as a Service" through the "Deploy to Azure" button in the "Deploy to Azure" Button section.

To use these scripts locally, clone this repo and change into the directory.

git clone https://github.com/alan-turing-institute/binderhub-deploy.git
cd binderhub-deploy

To make the scripts executable and then run them, do the following:

cd src
chmod 700 <script-name>.sh
./<script-name>.sh

[NOTE: The above command is UNIX specific. If you are running Windows 10, this blog post discusses using a bash shell in Windows.]

To build the BinderHub, you should run setup.sh first (to install the required command line tools), then deploy.sh (which will build the BinderHub). Once the BinderHub is deployed, you can run logs.sh and info.sh to get the JupyterHub logs and IP addresses respectively. teardown.sh should only be used to delete your BinderHub deployment.
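
The order described above can be sketched as a small wrapper. This is only an illustration, not part of the repository: the run_stage and deploy_in_order names and the DRY_RUN variable are hypothetical, and by default the sketch only prints what it would run.

```shell
# Hypothetical wrapper: run the scripts in the documented order.
# With DRY_RUN=1 (the default here) it only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run_stage() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: ./$1"
  else
    "./$1"
  fi
}

deploy_in_order() {
  # Stop at the first stage that fails.
  run_stage setup.sh && run_stage deploy.sh && run_stage info.sh
}

deploy_in_order
```

To execute the scripts for real, this would be run from the src directory with DRY_RUN=0 once the scripts have been made executable.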

You need to create a file called config.json which has the format described in the code block below. Fill the quotation marks with your desired namespaces, etc. config.json is git-ignored so sensitive information, such as passwords and Service Principals, cannot be pushed to GitHub.

  • For a list of available data centre regions, see here. This should be a region and not a location, for example "West Europe" or "Central US". These can be equivalently written as westeurope and centralus, respectively.
  • For a list of available Linux Virtual Machines, see here. This should be a VM size name such as Standard_D2s_v3.
  • The versions of the BinderHub Helm Chart can be found here and are of the form 0.2.0-<commit-hash>. It is advised to select the most recent version unless you specifically require an older one.
  • If you are deploying an Azure Container Registry, find out more about the SKU tiers here.
{
  "container_registry": "",        // Choose Docker Hub or ACR with 'dockerhub' or 'azurecr' values, respectively.
  "enable_https": "false",         // Choose whether to enable HTTPS with cert-manager. Boolean.
  "acr": {
    "registry_name": null,         // Name to give the ACR. This must be alpha-numerical and unique to Azure.
    "sku": "Basic"                 // The SKU capacity and pricing tier for the ACR
  },
  "azure": {
    "subscription": "",            // Azure subscription name or ID (a hex-string)
    "res_grp_name": "",            // Azure Resource Group name
    "location": "",                // Azure Data Centre region
    "node_count": 1,               // Number of nodes to deploy. 3 is preferable for a stable cluster, but may be liable to caps.
    "vm_size": "Standard_D2s_v3",  // Azure virtual machine type to deploy
    "sp_app_id": null,             // Azure service principal ID (optional)
    "sp_app_key": null,            // Azure service principal password (optional)
    "sp_tenant_id": null,          // Azure tenant ID (optional)
    "log_to_blob_storage": false   // Store logs in blob storage when not running from a container
  },
  "binderhub": {
    "name": "",                    // Name of your BinderHub
    "version": "",                 // Helm chart version to deploy, should be 0.2.0-<commit-hash>
    "image_prefix": ""             // The prefix to prepend to Docker images (e.g. "binder-prod")
  },
  "docker": {
    "username": null,              // Docker username (can be supplied at runtime)
    "password": null,              // Docker password (can be supplied at runtime)
    "org": null                    // A Docker Hub organisation to push images to (optional)
  },
  "https": {
    "certmanager_version": null,   // Version of cert-manager to install
    "contact_email": null,         // Contact email for Let's Encrypt
    "domain_name": null,           // Domain name to issue certificates for
    "nginx_version": null          // Version of nginx-ingress to install
  }
}

You can copy template-config.json should you require.

Please note that all entries in template-config.json must be surrounded by double quotation marks ("), with the exception of node_count.
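
The two spellings of a region given above ("West Europe" and westeurope) differ only in case and spacing. A minimal sketch of that conversion follows; the normalise_region name is hypothetical, and the rule is only inferred from the two examples in this document, so check the result against the list of regions before relying on it.

```shell
# Convert a display name such as "West Europe" into the compact form
# ("westeurope") by lower-casing it and removing spaces.
normalise_region() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' '
}

normalise_region "West Europe"   # prints: westeurope
normalise_region "Central US"    # prints: centralus
```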

Important for Free Trial subscriptions

If you have signed up to an Azure Free Trial subscription, you are not allowed to deploy more than 4 cores. How many cores you deploy depends on your choice of node_count and vm_size.

For example, a Standard_D2s_v3 machine has 2 cores. Therefore, setting node_count to 2 will deploy 4 cores and you will have reached your quota for cores on your Free Trial subscription.
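
The arithmetic above can be checked before deploying. The sketch below is illustrative only: the function names are hypothetical, the limit of 4 cores and the 2 cores of a Standard_D2s_v3 are taken from the text above, and the core count of any other VM size has to be looked up and supplied by hand.

```shell
# Total cores = number of nodes x cores per node.
total_cores() {
  echo $(( $1 * $2 ))
}

# Succeeds when node_count ($1) x cores per node ($2) fits within the limit ($3).
within_quota() {
  [ "$(total_cores "$1" "$2")" -le "$3" ]
}

FREE_TRIAL_CORE_LIMIT=4   # limit stated above for Free Trial subscriptions
CORES_PER_NODE=2          # Standard_D2s_v3, as in the example above

for nodes in 1 2 3; do
  cores="$(total_cores "$nodes" "$CORES_PER_NODE")"
  if within_quota "$nodes" "$CORES_PER_NODE" "$FREE_TRIAL_CORE_LIMIT"; then
    echo "node_count=$nodes: $cores cores, within quota"
  else
    echo "node_count=$nodes: $cores cores, exceeds quota"
  fi
done
```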

📦 Choosing between Docker Hub and Azure Container Registry

To select either a Docker Hub account/organisation or an Azure Container Registry (ACR), you must set the top-level container_registry key in config.json to either dockerhub or azurecr respectively. This will tell deploy.sh which variables and YAML templates to use. Then fill in the values under either the docker or acr key as required.
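
As a rough illustration of that choice, the sketch below maps a container_registry value to the config.json section that needs filling in. The registry_section name is hypothetical, and the section names (docker and acr) are taken from the config.json template shown earlier.

```shell
# Map the container_registry value to the config.json section to fill in.
registry_section() {
  case "$1" in
    dockerhub) echo "docker" ;;
    azurecr)   echo "acr" ;;
    *)         echo "unknown container_registry: $1" >&2; return 1 ;;
  esac
}

registry_section azurecr     # prints: acr
registry_section dockerhub   # prints: docker
```

Any other value is rejected with a non-zero exit status, which makes the function usable as a quick sanity check on a hand-edited config.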

Using a Docker Hub account/organisation has the benefit of being relatively simple to set up. However, all the BinderHub images pushed there will be publicly available. For a few extra steps, deploying an ACR will allow the BinderHub images to be pushed to a private repository.

Important Caveats when deploying an ACR

Service Principal:

In the Service Principal Creation section, we cover how to create a Service Principal in order to deploy a BinderHub. When following these steps, the --role argument of Contributor should be replaced with Owner. This is because the Service Principal will need the AcrPush role in order to push images to the ACR and the Contributor role does not have permission to create new role assignments.
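
The rule above (Owner for an ACR, Contributor otherwise) can be sketched as follows. The function names are hypothetical; the sketch only prints the az command described in the Service Principal Creation section and does not run it.

```shell
# Pick the Service Principal role for a given container_registry value.
sp_role_for_registry() {
  case "$1" in
    azurecr)   echo "Owner" ;;
    dockerhub) echo "Contributor" ;;
    *)         echo "unknown container_registry: $1" >&2; return 1 ;;
  esac
}

# Print (not run) the matching command for a subscription ID ($2).
print_sp_command() {
  role="$(sp_role_for_registry "$1")" || return 1
  printf 'az ad sp create-for-rbac --name binderhub-sp --role %s --scope /subscriptions/%s\n' "$role" "$2"
}

print_sp_command azurecr "<subscription ID>"
```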

🚦 setup.sh

This script checks whether the required command line tools are already installed. If any are missing, the script uses the system package manager or curl to install the command line interfaces (CLIs). The CLIs to be installed are:

Any dependencies that are not automatically installed by these packages will also be installed.

🔐 Enabling HTTPS for a Domain Name

If you have a domain name that you would like your BinderHub to be hosted at, the package can configure a DNS Zone to host the records for your domain name and configure the BinderHub to use these addresses rather than raw IP addresses. HTTPS certificates will also be requested for the DNS records using cert-manager and Let's Encrypt.

🔨 Manual steps required

While the package tries to automate as much as possible, when enabling HTTPS there are still a few steps that the user will have to do manually.

  1. Delegate the domain to the name servers

    The script will return four name servers that are hosting the DNS Zone, which will be saved to the log file name-servers.log. Your parent domain NS records need to be updated to delegate to these name servers (see the Azure documentation). How this is achieved will be different depending on your domain registrar.

  2. Point the A records to the Load Balancer IP Address

    Two A records are created for the Binder page and the JupyterHub and these records need to be set to the public IP address of the cluster's load balancer. The package tries to complete this step automatically but often fails, due to the long-running nature of Azure's process to update the CLI. It is recommended to wait some time (overnight is best) and then run set-a-records.sh. Alternatively, there are manual instructions for setting the A records in the Azure Portal.

  3. Switching from Let's Encrypt staging to production

    Let's Encrypt provides a staging platform to test against, and this is the environment the package will request certificates from. Once you have verified that the staging certificates have been issued correctly, you must switch to requesting certificates from Let's Encrypt's production environment to receive trusted certificates. Instructions for switching environments.

🚀 deploy.sh

This script reads in values from config.json and deploys a Kubernetes cluster. It then creates config.yaml and secret.yaml files which are used to install the BinderHub using the templates in the templates folder.

If you have chosen a Docker Hub account/organisation, the script will ask for your Docker ID and password if you haven't supplied them in the config file. The ID is your Docker username, NOT the associated email. If you have provided a Docker organisation in config.json, then the Docker ID MUST be a member of this organisation.

If you have chosen an ACR, the script will create one and assign the AcrPush role to your Service Principal. The registry server and Service Principal credentials will then be written into config.yaml and secret.yaml so that the BinderHub can connect to the ACR.

If you have requested HTTPS to be enabled, the script will create a DNS Zone and A records for the Binder and JupyterHub endpoints. The nginx-ingress and cert-manager helm charts will also be installed to provide a load balancer and automated requests for certificates from Let's Encrypt, respectively.

Both a JupyterHub and BinderHub are installed via a Helm Chart onto the deployed Kubernetes cluster and the config.yaml file is updated with the JupyterHub IP address.

config.yaml and secret.yaml are both git-ignored so that secrets cannot be pushed back to GitHub.

The script also outputs log files (<file-name>.log) for each stage of the deployment. These files are also git-ignored.

If the azure.log_to_blob_storage value in config.json is set to true and the script is running from the command line, then the log files will be stored in blob storage.

📥 set-a-records.sh

🚨 This script is only relevant if deploying a BinderHub with a domain name and HTTPS certificates 🚨

This script reads in values from config.json and tries to set the binder and hub A records in the DNS Zone to the Kubernetes public IP address.

📊 logs.sh

This script will print the JupyterHub logs to the terminal to assist with debugging issues with the BinderHub. It reads from config.json in order to get the BinderHub name.

ℹ️ info.sh

This script will print the pod status of the Kubernetes cluster and the IP addresses of both the JupyterHub and BinderHub to the terminal. It reads the BinderHub name from config.json.

⬆️ upgrade.sh

This script will automatically upgrade the Helm Chart deployment configuring the BinderHub and then print the Kubernetes pods. It reads the BinderHub name and Helm Chart version from config.json.

💥 teardown.sh

This script will purge the Helm Chart release, delete the Kubernetes namespace and then delete the Azure Resource Group containing the computational resources. It will read the namespaces from config.json. The user should check the Azure Portal to verify the resources have been deleted. It will also purge the cluster information from your kubectl configuration file.

🚀 "Deploy to Azure" Button

To deploy BinderHub to Azure in a single click (and some form-filling), use the deploy button below.

Deploy to Azure

✨ Service Principal Creation

You will be asked to provide a Service Principal in the form launched when you click the "Deploy to Azure" button above.

[NOTE: The following instructions can also be run in a local terminal session. They will require the Azure command line to be installed, so make sure to run setup.sh first.]

To create a Service Principal, go to the Azure Portal (and login!) and open the Cloud Shell:

Open Shell in Azure

You may be asked to create storage when you open the shell. This is expected, click "Create".

Make sure the shell is set to Bash, not PowerShell.

Bash Shell

Set the subscription you'd like to deploy your BinderHub on.

az account set --subscription <subscription>

This image shows the command being executed for an "Azure Pass - Sponsorship" subscription.

Set Subscription

You will need the subscription ID, which you can retrieve by running:

az account list --refresh --output table

List Subscriptions

Next, create the Service Principal with the following command. Make sure to give it a sensible name!

az ad sp create-for-rbac \
    --name binderhub-sp \
    --role Contributor \
    --scope /subscriptions/<subscription ID from above>

NOTE: If you are deploying an ACR rather than connecting to Docker Hub, then the role should be Owner instead of Contributor, so this command should be:

az ad sp create-for-rbac \
    --name binderhub-sp \
    --role Owner \
    --scope /subscriptions/<subscription ID from above>

Create Service Principal

The fields appId, password and tenant are the required pieces of information. These should be copied into the "Service Principal App ID", "Service Principal App Key" and "Service Principal Tenant ID" fields in the form, respectively.

Keep this information safe as the password cannot be recovered after this step!

📈 Monitoring Deployment Progress

To monitor the progress of the blue-button deployment, go to the Azure portal and select "Resource Groups" from the left hand pane. Then in the central pane select the resource group you chose to deploy into.

Select Resource Group

This will give you a right hand pane containing the resources within the group. You may need to "refresh" until you see a new container instance.

Select Container Instance

When it appears, select it and then in the new pane go to "Settings->Containers". You should see your new container listed.

Container Events

Select it, then in the lower right hand pane select "Logs". You may need to "refresh" this to display the logs until the container starts up. The logs are also not auto-updating, so keep refreshing them to see progress.

Container Logs

📦 Retrieving Deployment Output from Azure

When BinderHub is deployed using the "Deploy to Azure" button (or with a local container), output logs, YAML files, and ssh keys are pushed to an Azure storage account to preserve them once the container exits. The storage account is created in the same resource group as the Kubernetes cluster, and files are pushed into a storage blob within the account.

Both the storage blob name and the storage account name are derived from the name you gave to your BinderHub instance, but may be modified and/or have a random seed appended. To find the storage account name, navigate to your resource group by selecting "Resource Groups" in the left-most panel of the Azure Portal, then clicking on the resource group containing your BinderHub instance. Along with any pre-existing resources (for example, if you re-used an existing resource group), you should see three new resources: a container instance, a Kubernetes service, and a storage account. Make a note of the name of the storage account (referred to in the following commands as ACCOUNT_NAME) then select this storage account.

Storage Account

In the new pane that opens, select "Blobs" from the "Services" section. You should see a single blob listed. Make a note of the name of this blob, which will be BLOB_NAME in the following commands.

Blob Storage

Select Blob Storage

The Azure CLI can be used to fetch files from the blob (either in the cloud shell in the Azure Portal, or in a local terminal session if you've run setup.sh first). Files are fetched into a local directory, which must already exist, referred to as OUTPUT_DIRECTORY in the following commands.

You can run setup.sh to install the Azure CLI or use the cloud shell on the Azure Portal.

To fetch all files:

az storage blob download-batch \
    --account-name <ACCOUNT_NAME> \
    --source <BLOB_NAME> \
    --pattern "*" \
    --destination "<OUTPUT_DIRECTORY>"

The --pattern argument can be used to fetch particular files, for example all log files:

az storage blob download-batch \
    --account-name <ACCOUNT_NAME> \
    --source <BLOB_NAME> \
    --pattern "*.log" \
    --destination "<OUTPUT_DIRECTORY>"

To fetch a single file, specify REMOTE_FILENAME for the name of the file in blob storage, and LOCAL_FILENAME for the filename it will be fetched into:

az storage blob download \
    --account-name <ACCOUNT_NAME> \
    --container-name <BLOB_NAME> \
    --name <REMOTE_FILENAME> \
    --file <LOCAL_FILENAME>

For full documentation, see the az storage blob documentation.

🔓 Accessing your BinderHub after Deployment

Once the deployment has succeeded and you've downloaded the log files, visit the IP address of your Binder page to test it's working.

The Binder IP address can be found by running the following:

cat <OUTPUT_DIRECTORY>/binder-ip.log

A good repository to test your BinderHub with is binder-examples/requirements
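
A quick way to turn the logged IP address into something you can open or probe is sketched below. The binder_url name is hypothetical, and the sketch assumes that binder-ip.log contains only the IP address and that the page is served over plain HTTP (that is, HTTPS has not been enabled).

```shell
# Build the URL of the Binder page from the IP stored in binder-ip.log.
# Assumes the file contains only the IP address (possibly with a newline).
binder_url() {
  ip="$(tr -d '[:space:]' < "$1")"
  if [ -z "$ip" ]; then
    echo "no IP address found in $1" >&2
    return 1
  fi
  printf 'http://%s\n' "$ip"
}

# Example usage once the logs have been fetched:
#   curl --silent --head "$(binder_url "<OUTPUT_DIRECTORY>/binder-ip.log")"
```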

🏡 Running the Container Locally

The third way to deploy BinderHub to Azure is to pull the Docker image and run it directly, passing the values you would have entered in config.json as environment variables.

You will need the Docker CLI installed. Installation instructions can be found here.

First, pull the binderhub-setup image.

docker pull sgibson91/binderhub-setup:<TAG>

where <TAG> is your chosen image tag.

A list of available tags can be found here. It is recommended to use the most recent version number. The latest tag is the most recent build from the default branch and may be subject to fluctuations.

Then, run the container with the following arguments, replacing the <> fields as necessary:

# Every variable is required except the DOCKERHUB_* and REGISTRY_* ones.
# A comment cannot follow a line-continuation backslash, so the notes are kept up here.
docker run \
-e "AKS_NODE_COUNT=1" \
-e "AKS_NODE_VM_SIZE=Standard_D2s_v3" \
-e "AZURE_SUBSCRIPTION=<Azure Subscription ID>" \
-e "BINDERHUB_CONTAINER_MODE=true" \
-e "BINDERHUB_NAME=<Chosen BinderHub name>" \
-e "BINDERHUB_VERSION=<Chosen BinderHub version>" \
-e "CONTAINER_REGISTRY=<dockerhub or azurecr>" \
-e "DOCKER_IMAGE_PREFIX=binder-dev" \
-e "DOCKERHUB_ORGANISATION=<Docker organisation>" \
-e "DOCKERHUB_PASSWORD=<Docker password>" \
-e "DOCKERHUB_USERNAME=<Docker ID>" \
-e "REGISTRY_NAME=<Registry Name>" \
-e "REGISTRY_SKU=Basic" \
-e "RESOURCE_GROUP_LOCATION=westeurope" \
-e "RESOURCE_GROUP_NAME=<Chosen Resource Group name>" \
-e "SP_APP_ID=<Service Principal ID>" \
-e "SP_APP_KEY=<Service Principal Key>" \
-e "SP_TENANT_ID=<Service Principal Tenant ID>" \
-it sgibson91/binderhub-setup:<TAG>

The output will be printed to your terminal and the files will be pushed to blob storage, as in the button deployment. See the Retrieving Deployment Output from Azure section for how to retrieve these files.
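
With this many variables it is easy to leave one out. One possible approach, sketched below, is to keep them in a file of NAME=value lines, check it for the variables marked as required in the command above, and pass it to Docker's --env-file option. The missing_required_vars name and the binderhub.env file name are hypothetical.

```shell
# Names marked as required in the docker run command above.
REQUIRED_VARS="AKS_NODE_COUNT AKS_NODE_VM_SIZE AZURE_SUBSCRIPTION \
BINDERHUB_CONTAINER_MODE BINDERHUB_NAME BINDERHUB_VERSION CONTAINER_REGISTRY \
DOCKER_IMAGE_PREFIX RESOURCE_GROUP_LOCATION RESOURCE_GROUP_NAME \
SP_APP_ID SP_APP_KEY SP_TENANT_ID"

# Print each required name that has no non-empty NAME=value line in the file.
missing_required_vars() {
  for name in $REQUIRED_VARS; do
    grep -Eq "^${name}=.+" "$1" || echo "$name"
  done
}

# Usage (hypothetical file name):
#   missing_required_vars binderhub.env
#   docker run --env-file binderhub.env -it sgibson91/binderhub-setup:<TAG>
```

An env file of this kind holds the Service Principal key, so it should be kept out of version control in the same way as config.json.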

🎨 Customising your BinderHub Deployment

Customising your BinderHub deployment is as simple as editing config.yaml and/or secret.yaml and then upgrading the BinderHub Helm Chart. The Helm Chart can be upgraded by running upgrade.sh (make sure you have the CLIs installed by running setup.sh first).

The Jupyter guide to customising the underlying JupyterHub can be found here.

The BinderHub guide for changing the landing page logo can be found here.

💻 Developers Guide

🔧 Building the Docker image for testing

The Docker image will automatically be built by Docker Hub when new pushes are made to main. However, a developer may wish to build the image to test deployments before merging code.

Firstly, make sure config.json has been removed from the repository. Otherwise, secrets within the file may be built into the image.

The command to build a Docker image from the root of the repo is as follows.

docker build -t <DOCKER_USERNAME>/binderhub-setup:<TAG> .
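
The check for a stray config.json can be automated before building. The sketch below is an assumption about how one might do it, not something the repository provides: the function names are hypothetical, and it searches the whole directory tree (excluding .git) because the text above does not say where the file would be.

```shell
# List any config.json files below a directory, ignoring the .git folder.
find_config_files() {
  find "$1" -name config.json ! -path '*/.git/*'
}

# Fail if a config.json is present, so secrets are not built into the image.
check_no_config() {
  found="$(find_config_files "$1")"
  if [ -n "$found" ]; then
    echo "remove these files before building:" >&2
    echo "$found" >&2
    return 1
  fi
}

# Usage from the root of the repo:
#   check_no_config . && docker build -t <DOCKER_USERNAME>/binderhub-setup:<TAG> .
```

Note that template-config.json is not matched, so the template can stay in place.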

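It is not necessary to push this image to a container registry. If you choose to do so, push it under the same name it was built with. To push to a registry other than Docker Hub, the image name must also be prefixed with the registry host when it is built or tagged.

docker push <DOCKER_USERNAME>/binderhub-setup:<TAG>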

🏷️ Tagging a Release

Docker Hub will automatically build the image from the repo with every push to main and tag this as latest.

To release a specific version, update the Azure ARM template with the new/desired version on line 166 and the block starting at line 170. We follow SemVer versioning format.

Once the Pull Request containing the new code/version/release has been merged, run the following commands, where vX.Y.Z is the new/desired version release.

git checkout main
git pull
git tag -a vX.Y.Z -m "vX.Y.Z"  # Annotated tag with an inline message
# git tag vX.Y.Z               # Alternative: lightweight tag with no extra data
git push --tags

This will trigger Docker Hub to build an image with the SemVer version as a tag.
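
Since the build is triggered by the tag name, it can help to check the name before pushing. The sketch below is a simplification: the is_release_tag name is hypothetical, and it only accepts the plain vX.Y.Z form used in this section, not the pre-release or build-metadata suffixes that full SemVer allows.

```shell
# Succeed only for tags of the form vX.Y.Z, where X, Y and Z are numbers.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

for tag in v1.2.3 1.2.3 v1.2; do
  if is_release_tag "$tag"; then
    echo "$tag: ok"
  else
    echo "$tag: not of the form vX.Y.Z"
  fi
done
```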

See the following documentation for information on tagging:

💜 Contributors

Please read our 💜 Code of Conduct 💜 and 👾 Contributing Guidelines 👾 to get you started!

Thanks goes to these wonderful people (emoji key):


Diego

🐛 🤔 👀

Gerard Gorman

🤔 👀

James Robinson

💻

Nicholas Paldino

💻

Sarah Gibson

🐛 💻 📖 🤔 🚇 🚧 📦 📆 💬 👀 🔧 ⚠️

Simon Li

🐛

Tania Allard

🐛 💻 🤔 ✅ 💬

Tim Greaves

🐛 💻 🤔 🚇 📦 🔧