NVIDIA / ngc-container-replicator

Licence: BSD-3-Clause License
NGC Container Replicator

NGC Replicator

Clones nvcr.io using either DGX (compute.nvidia.com) or NGC (ngc.nvidia.com) API keys.

The replicator will make an offline clone of the NGC/DGX container registry. In its current form, the replicator will download every CUDA container image as well as each Deep Learning framework image in the NVIDIA project.

Tarfiles will be saved in /output inside the container, so be sure to volume mount that directory. In the following example, we will collect our images in /tmp on the host.

Use --min-version to limit the number of versions to download. In the example below, we will only clone DL framework images at version 17.12 or later.

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output \
    deepops/replicator --project=nvidia --min-version=17.12 \
                       --api-key=<your-dgx-or-ngc-api-key>
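
Once the tarfiles are in the host directory (/tmp in the example above), they can be carried to the offline machine and imported into its Docker daemon. The following is a minimal sketch, assuming the tarfiles are in `docker save` format and end in `.tar`; neither detail is stated here, so check the contents of your output directory first. The `load_images` helper and the `DOCKER` override variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: import every saved image tarfile from a directory.
# Assumption: files end in .tar and were written in `docker save` format.

load_images() {
    dir="$1"
    for tarfile in "$dir"/*.tar; do
        # Skip the unexpanded pattern when the directory holds no tarfiles.
        [ -e "$tarfile" ] || continue
        # DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands.
        ${DOCKER:-docker} load -i "$tarfile"
    done
}

# Example: load_images /tmp
```

Running `DOCKER=echo load_images /tmp` prints the commands that would be executed without touching the Docker daemon.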

You can also filter on specific images. To restrict the download to images whose names contain the strings "tensorflow", "pytorch", or "tensorrt", add one --image flag per string, e.g.

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output \
    deepops/replicator --project=nvidia --min-version=17.12 \
                       --image=tensorflow --image=pytorch --image=tensorrt \
                       --dry-run \
                       --api-key=<your-dgx-or-ngc-api-key>

Note: the --dry-run option lets you see what will happen without committing to a lengthy download.

By default, the --image flag does a substring match in order to ensure you match all images that may be desired. Sometimes, however, you only want to download a specific image with no substring matching. In this case, you can add the --strict-name-match flag, e.g.

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/output \
    deepops/replicator --project=nvidia --min-version=17.12 \
                       --image=tensorflow \
                       --strict-name-match \
                       --dry-run \
                       --api-key=<your-dgx-or-ngc-api-key>
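
The difference between the two matching modes can be sketched in a few lines of shell. This is an illustration of substring versus exact matching in general, not the replicator's own code; the helper names `matches_substring` and `matches_strict` and the image names in the loop are hypothetical.

```shell
#!/bin/sh
# Illustration only: mimics the two matching modes described above;
# this is not the replicator's own code.

# Substring match: true if the filter occurs anywhere in the image name.
matches_substring() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Strict match: true only if the filter equals the image name exactly.
matches_strict() {
    [ "$1" = "$2" ]
}

# Hypothetical image names, used only for this demonstration.
for name in tensorrt tensorrtserver tensorflow; do
    if matches_substring "$name" tensorrt; then echo "substring match: $name"; fi
    if matches_strict "$name" tensorrt; then echo "strict match: $name"; fi
done
```

With the filter "tensorrt", the substring mode selects both "tensorrt" and "tensorrtserver", while the strict mode selects only "tensorrt".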

Note: a state.yml file will be created in the output directory. This saved state is used to avoid pulling images that were previously pulled. If you wish to re-pull and save an image, delete the entry in state.yml corresponding to the image_name and tag you wish to refresh.

Kubernetes Deployment

If you don't already have a deepops namespace, create one now.

kubectl create namespace deepops

Next, create a secret with your NGC API key:

kubectl -n deepops create secret generic ngc-secret \
    --from-literal=apikey=<your-api-key-goes-here>

Next, create a persistent volume claim that will live outside the lifecycle of the CronJob. If you are using DeepOps, you can use a Rook/Ceph PVC similar to:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ngc-replicator-pvc
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  storageClassName: rook-raid0-retain  # <== Replace with your StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Mi

Finally, create a CronJob that executes the replicator on a schedule. This example runs the replicator once a day at 04:00 (the schedule below is "0 4 * * *"; use "0 * * * *" to run it every hour instead). Note: This example uses Rook block storage to provide a persistent volume that holds state.yml between executions, which ensures that only new container images are downloaded. For more details, see our DeepOps project.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: replicator-config
  namespace: deepops
data:
  ngc-update.sh: |
    #!/bin/bash
    ngc_replicator                                        \
      --project=nvidia                                    \
      --min-version=$(date +"%y.%m" -d "1 month ago")     \
      --py-version=py3                                    \
      --image=tensorflow --image=pytorch --image=tensorrt \
      --no-exporter                                       \
      --registry-url=registry.local  # <== Replace with your local repo
---
apiVersion: batch/v1beta1  # On Kubernetes 1.21+ use batch/v1; v1beta1 was removed in 1.25
kind: CronJob
metadata:
  name: ngc-replicator
  namespace: deepops
  labels:
    app: ngc-replicator
spec:
  schedule: "0 4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
            - name: replicator
              image: deepops/replicator
              imagePullPolicy: Always
              command: [ "/bin/sh", "-c", "/ngc-update/ngc-update.sh" ]
              env:
              - name: NGC_REPLICATOR_API_KEY
                valueFrom:
                  secretKeyRef:
                    name: ngc-secret
                    key: apikey
              volumeMounts:
              - name: registry-config
                mountPath: /ngc-update
              - name: docker-socket
                mountPath: /var/run/docker.sock
              - name: ngc-replicator-storage
                mountPath: /output
          volumes:
            - name: registry-config
              configMap:
                name: replicator-config
                defaultMode: 0777
            - name: docker-socket
              hostPath:
                path: /var/run/docker.sock
                type: File
            - name: ngc-replicator-storage
              persistentVolumeClaim:
                claimName: ngc-replicator-pvc
          restartPolicy: Never
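
The --min-version value in the ConfigMap script above is computed at run time, so each run only considers images tagged for the previous month or later. The sketch below shows what that expression evaluates to for a fixed reference date. It assumes GNU date (the -d option), as the script itself does; the helper name `min_version_for` is hypothetical.

```shell
#!/bin/sh
# Requires GNU date (the -d option), as does the ConfigMap script above.

# Print the YY.MM tag for the month before the given YYYY-MM-DD date.
min_version_for() {
    date -d "$1 1 month ago" +"%y.%m"
}

min_version_for 2018-01-15   # prints 17.12

# Variant pinned to the 15th of the current month: this avoids surprises
# on days such as the 31st, where "1 month ago" can name a date that
# does not exist in the previous month.
date -d "$(date +%Y-%m-15) 1 month ago" +"%y.%m"
```

The GNU date documentation recommends the pinned form when the previous month must be determined reliably; whether the difference matters depends on the day of the month on which the CronJob fires.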

Developer Quickstart

make dev
py.test

TODOs

  • save markdown readmes for each image. these are not version controlled
  • test local registry push service. coded, beta testing
  • add templater to workflow