matpool / mpu

License: GPL-2.0

A shim driver that lets nvidia-smi inside Docker containers show the correct process list without modifying anything


Projects that are alternatives to or similar to mpu

MATE Desktop container designed for Kubernetes supporting OpenGL GLX and Vulkan for NVIDIA GPUs with WebRTC and HTML5, providing an open source remote cloud graphics or game streaming platform. Spawns its own fully isolated X Server instead of using the host X server, therefore not requiring /tmp/.X11-unix host sockets or host configuration.

Stars: ✭ 47 (+74.07%)

Mutual labels: nvidia, nvidia-docker

zabbix-nvidia-smi-integration

The Zabbix template for monitoring Nvidia graphics cards.

Stars: ✭ 22 (-18.52%)

Mutual labels: nvidia, nvidia-smi

nvidia gpu exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Stars: ✭ 85 (+214.81%)

Mutual labels: nvidia, nvidia-smi

GPU-Jupyterhub

Setting up a Jupyterhub Docker container to spawn Jupyter Notebooks with GPU support (containing Tensorflow, Pytorch and Keras)

Stars: ✭ 23 (-14.81%)

Mutual labels: nvidia, nvidia-docker

fahclient

Dockerized Folding@home client with NVIDIA GPU support to help battle COVID-19

Stars: ✭ 38 (+40.74%)

Mutual labels: nvidia, nvidia-docker

handbrake-nvenc-docker

Handbrake GUI with Web browser and VNC access. Supports NVENC encoding

Stars: ✭ 32 (+18.52%)

Mutual labels: nvidia, nvidia-docker

nvhtop

A tool for enriching the output of nvidia-smi forked from peci1/nvidia-htop.

Stars: ✭ 21 (-22.22%)

Mutual labels: nvidia, nvidia-smi

DistributedDeepLearning

Tutorials on running distributed deep learning on Batch AI

Stars: ✭ 23 (-14.81%)

Mutual labels: nvidia, nvidia-docker

lc0-docker

lc0docker: run lc0 chess client and lichess bot under Docker and Kubernetes

Stars: ✭ 26 (-3.7%)

Mutual labels: nvidia-docker

dofbot-jetson nano

Yahboom DOFBOT AI Vision Robotic Arm with ROS for Jetson NANO 4GB B01

Stars: ✭ 24 (-11.11%)

Mutual labels: nvidia

nvidia smi exporter

nvidia-smi exporter for Prometheus

Stars: ✭ 66 (+144.44%)

Mutual labels: nvidia-smi

NVTabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

Stars: ✭ 797 (+2851.85%)

Mutual labels: nvidia

nvidia-video-codec-rs

Bindings for the NVIDIA Video Codec SDK

Stars: ✭ 24 (-11.11%)

Mutual labels: nvidia

gnome-nvidia-extension

A Gnome extension to show NVIDIA GPU information

Stars: ✭ 29 (+7.41%)

Mutual labels: nvidia

constant-memory-waveglow

PyTorch implementation of NVIDIA WaveGlow with constant memory cost.

Stars: ✭ 36 (+33.33%)

Mutual labels: nvidia

grepo

GKISS - A fork of KISS Linux that uses the GNU C library, mirror of https://codeberg.org/kiss-community/grepo

Stars: ✭ 51 (+88.89%)

Mutual labels: nvidia

Grub-Nvidia-Entry

Enable Nvidia driver only with the last entry in grub.

Stars: ✭ 40 (+48.15%)

Mutual labels: nvidia

hashcat-benchmark-comparison

Hashcat Benchmark Comparison

Stars: ✭ 22 (-18.52%)

Mutual labels: nvidia

nvidia-jetson-rt

Real-Time Scheduling with NVIDIA Jetson TX2

Stars: ✭ 38 (+40.74%)

Mutual labels: nvidia

faucon

NVIDIA Falcon Microprocessor Suite

Stars: ✭ 28 (+3.7%)

Mutual labels: nvidia


MPU


The problems

The NVIDIA driver is not aware of PID namespaces, and nvidia-smi has no way to map global PIDs to the virtual PIDs used inside a container, so inside a container it shows an empty process list. Moreover, the NVIDIA driver is proprietary: even though a small part of the Linux NVIDIA driver is open source, we have little visibility into what happens inside it.
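To make the global/virtual mismatch concrete: the kernel records a PID for every namespace level a process belongs to, and since Linux 4.1 /proc/<pid>/status exposes them on its NSpid line. The leftmost value is relative to the namespace that mounted that /proc, so when read on the host, a containerized process might show something like "NSpid: 4242 17", where 4242 is the global PID the NVIDIA driver reports and 17 is the PID the container sees. The C sketch below extracts the outermost and innermost values from that line; parse_nspid and read_nspid are hypothetical helper names for illustration, not part of mpu.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse a line such as "NSpid:\t4242\t17\n" from /proc/<pid>/status.
 * The leftmost value is the PID in the namespace that mounted this /proc
 * (the global PID when read on the host); the rightmost is the PID in the
 * process's own, innermost namespace.
 * Returns the number of levels found, or -1 if it is not an NSpid line. */
int parse_nspid(const char *line, long *outermost, long *innermost)
{
    static const char prefix[] = "NSpid:";
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return -1;

    const char *p = line + sizeof prefix - 1;
    int levels = 0;
    for (;;) {
        char *end;
        long v = strtol(p, &end, 10);   /* skips leading tabs/spaces */
        if (end == p)                   /* no more numbers on the line */
            break;
        if (levels == 0)
            *outermost = v;
        *innermost = v;
        levels++;
        p = end;
    }
    return levels > 0 ? levels : -1;
}

/* Read the NSpid line for a given PID (the field exists on Linux 4.1+). */
int read_nspid(pid_t pid, long *outermost, long *innermost)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        rc = parse_nspid(line, outermost, innermost);
        if (rc != -1)
            break;
    }
    fclose(f);
    return rc;
}
```

Run on the host against a process inside a container, read_nspid would be expected to report two levels; run inside the container against its own /proc, only the virtual PID is visible, which is exactly the view nvidia-smi is missing.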

The alternatives

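Add 'hostPID: true' to the (Kubernetes) pod specification
Add '--pid=host' when starting a Docker container

Both make the container share the host's PID namespace, which gives up PID isolation between the container and the host.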

Installation

NOTE: as of kernel 5.7.7, the kernel no longer exports the kallsyms functions (such as kallsyms_lookup_name) to modules, so this module may not work properly on those kernels.
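Because of the kallsyms change, it can help to check the running kernel before building. The sketch below is a hypothetical helper, not part of mpu's build: it compares a release string, as reported by uname(2) or uname -r, against the 5.7.7 threshold from the note, ignoring distribution suffixes such as "-136-generic". The threshold simply mirrors the note; adjust it if your kernel behaves differently.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* True if a release string such as "5.10.0-8-amd64" is at least
 * req_major.req_minor.req_patch. Missing components count as 0;
 * anything after the numeric part (e.g. "-generic") is ignored. */
bool release_at_least(const char *release, int req_major, int req_minor, int req_patch)
{
    int ma = 0, mi = 0, pa = 0;
    if (sscanf(release, "%d.%d.%d", &ma, &mi, &pa) < 1)
        return false;                  /* not a version string */
    if (ma != req_major)
        return ma > req_major;
    if (mi != req_minor)
        return mi > req_minor;
    return pa >= req_patch;
}

/* Hypothetical pre-build check against the threshold from the note. */
bool kernel_may_lack_kallsyms_exports(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return release_at_least(u.release, 5, 7, 7);
}
```

For the kernels listed as tested (4.15.0-136 and 4.19.0-14), the check returns false.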

On Debian, install the kernel headers with sudo apt install linux-headers-$(uname -r), and run sudo apt-get install build-essential to get make and the build toolchain
Clone this repo
cd into it and run make
After the build succeeds, run sudo make install to install the module
Use Docker to create a GPU-enabled instance (--gpus), run several test cases, and check the process list via nvidia-smi to confirm that all associated processes are shown correctly

The steps

Figure out the basic mechanism of the NVIDIA driver from its open-source part
Run reverse-engineering tests on the driver with GDB and several scripts (CUDA/NVML)
Use our module to intercept syscalls and rewrite fields of the driver's data structures, based on what reverse engineering revealed
Run nvidia-smi with our module loaded against several test cases

The details

nvidia-smi issues ioctl command 0x20 with flag 0xee4 to get the global PID list (under init_pid_ns) ①
After getting a non-empty PID list, it issues ioctl command 0x20 with flag 0x1f48, passing the previously returned PIDs as input arguments, to get each process's GPU memory consumption ②
We hook the syscalls system-wide and intercept only ioctl calls on the NVIDIA control device (major number 195, minor number 255, as defined in the NVIDIA header files)
Check whether the requesting task is inside a PID namespace; do nothing if it is in the global one (init_pid_ns)
If it is inside a container's namespace, for ① convert the returned PID list from global to virtual PIDs
② is a little more complicated and needs two interceptors, pre and post:
- pre-stage, before invoking the NVIDIA ioctl: the virtual PIDs (returned by ① after conversion) must be converted back to global ones, since the NVIDIA driver only recognizes global PIDs
- post-stage, after the NVIDIA ioctl returns: convert the global PIDs back to virtual ones
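The per-request logic above can be modelled in user space to make the data flow explicit. The C sketch below is not mpu's code. The kernel's PID-namespace lookups (in a real module, kernel helpers such as find_pid_ns() and pid_nr_ns()) are replaced by a hypothetical pid_map table, and struct mem_query is an invented stand-in for the undocumented 0x1f48 argument layout. Only the device numbers (major 195, minor 255), the init_pid_ns check, and the order of the pre/post conversions come from the description. The sketch also drops PIDs that have no virtual counterpart in the caller's namespace; that is one plausible policy, and the description does not say how mpu treats such PIDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

#define NV_MAJOR       195  /* NVIDIA device major number (from the description) */
#define NV_CTL_MINOR   255  /* minor number of the control device */
#define MAX_QUERY_PIDS 8    /* arbitrary size for this sketch */

/* Hypothetical stand-in for one container's PID namespace:
 * global[i] (init_pid_ns view) corresponds to virt[i] (container view). */
struct pid_map {
    const pid_t *global;
    const pid_t *virt;
    size_t len;
};

/* Hypothetical layout for request ② (the real 0x1f48 layout is undocumented). */
struct mem_query {
    pid_t pids[MAX_QUERY_PIDS];
    size_t count;
    unsigned long long used_bytes[MAX_QUERY_PIDS];  /* filled by the driver */
};

bool is_nvidia_ctl(dev_t rdev)
{
    return major(rdev) == NV_MAJOR && minor(rdev) == NV_CTL_MINOR;
}

/* Translate only control-device ioctls from tasks outside init_pid_ns. */
bool needs_translation(dev_t rdev, bool caller_in_init_ns)
{
    return is_nvidia_ctl(rdev) && !caller_in_init_ns;
}

/* Returns the virtual PID, or 0 if the process is not visible in the namespace. */
pid_t global_to_virtual(const struct pid_map *m, pid_t g)
{
    for (size_t i = 0; i < m->len; i++)
        if (m->global[i] == g)
            return m->virt[i];
    return 0;
}

/* Returns the global PID, or 0 if the virtual PID is unknown. */
pid_t virtual_to_global(const struct pid_map *m, pid_t v)
{
    for (size_t i = 0; i < m->len; i++)
        if (m->virt[i] == v)
            return m->global[i];
    return 0;
}

/* ①: rewrite the driver's global PID list in place to virtual PIDs,
 * dropping processes outside the caller's namespace. Returns the new count. */
size_t rewrite_pid_list(const struct pid_map *m, pid_t *pids, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        pid_t v = global_to_virtual(m, pids[i]);
        if (v != 0)
            pids[out++] = v;
    }
    return out;
}

/* ② pre-stage: virtual -> global before the driver sees the request. */
void mem_query_pre(const struct pid_map *m, struct mem_query *q)
{
    for (size_t i = 0; i < q->count; i++)
        q->pids[i] = virtual_to_global(m, q->pids[i]);
}

/* ② post-stage: global -> virtual after the driver has filled in results. */
void mem_query_post(const struct pid_map *m, struct mem_query *q)
{
    for (size_t i = 0; i < q->count; i++)
        q->pids[i] = global_to_virtual(m, q->pids[i]);
}
```

The pre/post pair is symmetric by design: the caller only ever sees virtual PIDs, while the driver only ever sees global ones.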

NOTE

Tested on:

kernel 4.15.0-136 x64, Docker 19.03.15, NVIDIA driver 440.64
kernel 4.19.0-14 x64, NVIDIA driver 460.32

Going forward, we'd like to keep maintaining the project, with fuller testing and support for more kernels and NVIDIA drivers. However, we sincerely hope NVIDIA will fix this itself, in a simple and professional way. Thanks.
