msalvaris / Gpu_monitor

Licence: mit
Monitor your GPUs whether they are on a single computer or in a cluster

GPU Monitor

This is an app for monitoring GPUs on a single machine or across a cluster. You can use it to record various GPU measurements during a specific period using the context-based loggers, or continuously using the gpumon CLI command. The context logger can record either to a file, which can be read back into a dataframe, or to an InfluxDB database. Data in the InfluxDB database can then be accessed using the Python InfluxDB client or viewed in real time using dashboards such as Grafana. Examples in Jupyter notebooks can be found here.

When logging to InfluxDB, the logger uses the Python bindings for the NVIDIA Management Library (NVML), a C-based API for monitoring NVIDIA GPU devices. Querying NVML through the bindings is more efficient than using nvidia-smi, which allows a higher sampling frequency for the measurements.
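To make the NVML approach concrete, here is a minimal sketch of polling GPUs through the pynvml bindings. It is not the code gpumon itself runs; the function names `sample_gpus` and `main` and the keys in the returned dictionaries are assumptions made for illustration. Only the `nvml*` calls come from pynvml.

```python
import time


def sample_gpus(nvml):
    """Return one utilisation/memory reading per GPU.

    `nvml` is any object exposing the pynvml function names used below,
    so the function can be exercised without a GPU by passing a stub.
    """
    readings = []
    for index in range(nvml.nvmlDeviceGetCount()):
        handle = nvml.nvmlDeviceGetHandleByIndex(index)
        util = nvml.nvmlDeviceGetUtilizationRates(handle)
        mem = nvml.nvmlDeviceGetMemoryInfo(handle)
        readings.append({
            "gpu": index,
            "gpu_util_percent": util.gpu,
            "mem_used_bytes": mem.used,
            "mem_total_bytes": mem.total,
            "timestamp": time.time(),
        })
    return readings


def main(samples=5, interval_seconds=0.2):
    """Poll the local GPUs a few times (needs pynvml and an NVIDIA driver)."""
    import pynvml  # imported lazily so this file loads on machines without a GPU

    pynvml.nvmlInit()
    try:
        for _ in range(samples):
            print(sample_gpus(pynvml))
            time.sleep(interval_seconds)
    finally:
        pynvml.nvmlShutdown()
```

Calling `main()` on a machine with an NVIDIA driver and pynvml installed prints a few readings. Because each reading is a handful of library calls rather than a new nvidia-smi process, a short polling interval is practical.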

Below is an example Grafana dashboard fed by the InfluxDB log context.

Grafana GPU Dashboard

Installation

To install, either clone the repository:

git clone https://github.com/msalvaris/gpu_monitor.git

Then install it:

pip install -e /path/to/repo

For now I recommend the -e flag, since the project is in active development and an editable install is easy to update by pulling the latest changes from the repo.

Or install directly using pip:

pip install git+https://github.com/msalvaris/gpu_monitor.git

Docker

You can also run it straight from a Docker image (masalvar/gpumon).

nvidia-docker run -it masalvar/gpumon gpumon msdlvm.southcentralus.cloudapp.azure.com admin password gpudb 8086 gpuseries

Usage

Running gpu monitor in Jupyter notebook with file based log context

from gpumon.file import log_context
from bokeh.io import output_notebook, show

output_notebook()  # Without this the plot won't show in Jupyter notebook

with log_context('log.txt') as log:
    pass  # Replace with your GPU code

show(log.plot())  # Will plot the utilisation during the context

log()  # Will return dataframe with all the logged properties

Click here to see the example notebook

Running gpu monitor in Jupyter notebook with InfluxDB based log context

To do this you need to set up and install InfluxDB and Grafana. There are many ways to install and run them; in this example we will use Docker containers and docker-compose.

If you haven't got docker-compose installed see here for instructions

You must also be able to execute docker commands without sudo. To do this on Ubuntu, execute the following:

sudo groupadd docker
sudo usermod -aG docker $USER

If you haven't downloaded the whole repo, download the scripts directory. It should contain three files. The file example.env contains the following variables:
INFLUXDB_DB=gpudb
INFLUXDB_USER=admin
INFLUXDB_USER_PASSWORD=password
INFLUXDB_ADMIN_ENABLED=true
GF_SECURITY_ADMIN_PASSWORD=password
GRAFANA_DATA_LOCATION=/tmp/grafana
INFLUXDB_DATA_LOCATION=/tmp/influxdb
GF_PATHS_PROVISIONING=/grafana-conf

Please change them to appropriate values. The data location entries (GRAFANA_DATA_LOCATION, INFLUXDB_DATA_LOCATION) tell Grafana and InfluxDB where to store their data, so that the data remains when the containers are destroyed. Once you have edited the file, rename example.env to .env.

Now, inside the folder that contains the file, run the command below to list the commands you can execute.

make

To start InfluxDB and Grafana you run

make run

Now in your Jupyter notebook simply add these lines

from gpumon.influxdb import log_context

with log_context('localhost', 'admin', 'password', 'gpudb', 'gpuseries'):
    pass  # Replace with your GPU code

Make sure you replace the values in the call to log_context with the appropriate values. gpudb is the name of the database and gpuseries is the name we have given to our series; feel free to change these. If the database name given in the context isn't the same as the one supplied in the .env file, a new database will be created. Have a look at this notebook for a full example.
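Once the context has logged some data, you can read it back with the Python InfluxDB client. The sketch below assumes the `influxdb` package (`pip install influxdb`), the default InfluxDB port 8086, and the example credentials used above. The helper name `recent_points` is hypothetical, and the query selects every field because the exact field names written by gpumon are not listed here.

```python
def recent_points(client, series_name="gpuseries", limit=5):
    """Return the newest `limit` points of a series as a list of dicts.

    `client` is expected to behave like influxdb.InfluxDBClient: it needs a
    query() method whose result offers get_points().
    """
    limit = int(limit)  # make sure only a number ends up in the query text
    query = 'SELECT * FROM "{}" ORDER BY time DESC LIMIT {}'.format(series_name, limit)
    return list(client.query(query).get_points())


def main():
    # Requires the influxdb package and the InfluxDB container to be running.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, username="admin",
                            password="password", database="gpudb")
    for point in recent_points(client):
        print(point)
```

Call `main()` from the notebook after the logging context has exited; each printed point is a dictionary keyed by `time` plus whatever fields and tags were written to the series.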

If you want to use the CLI version run the following command:

gpumon localhost admin password gpudb --series_name=gpuseries

The above command will connect to the InfluxDB database running on localhost with the following settings:
user=admin
password=password
database=gpudb
series_name=gpuseries

You can also put your parameters in a .env file in the directory from which you run the CLI logger or the logging context, and the necessary information will be pulled from it. Any arguments you pass explicitly will override what is in your .env file.
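The precedence described above (explicit arguments first, then the .env file) can be sketched as follows. This is an illustration only, not gpumon's own loading code; `parse_env_text` and `resolve_settings` are hypothetical helpers, and the keys are taken from example.env.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(env_values, **explicit):
    """Merge settings: explicit arguments that are not None win over the file."""
    settings = dict(env_values)
    settings.update({k: v for k, v in explicit.items() if v is not None})
    return settings


example = """
# values as they might appear in .env
INFLUXDB_DB=gpudb
INFLUXDB_USER=admin
INFLUXDB_USER_PASSWORD=password
"""

settings = resolve_settings(
    parse_env_text(example),
    INFLUXDB_DB="experiments",  # explicit value overrides the file
    INFLUXDB_USER=None,         # not supplied, so the file value is kept
)
print(settings)
```

Here the database name comes from the explicit argument while the user and password fall back to the file.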

Setting up Grafana dashboard

By default a datasource and dashboard are set up for you. If everything above went well, GPU metrics should be flowing to your database. You can log in to Grafana by pointing a browser to the IP of your VM or computer on port 3000. If you are running on a VM, make sure that port is open. Once there, log in with the credentials you specified in your .env file.

Manually setting up datasource and dashboard

Below is an example screenshot of the datasource config.

Datasource config

Once that is set up you will need to also set up your dashboard. The dashboard shown in the gif above can be found here and is installed by default.
