andyljones / Coolgpus

Licence: mit
GPU fan control for headless Linux

This script lets you set a custom GPU fan curve on a headless Linux server.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:08:00.0 Off |                  N/A |
| 75%   60C    P2   254W / 250W |   9560MiB / 11019MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:41:00.0  On |                  N/A |
| 90%   70C    P2   237W / 250W |   9556MiB / 11016MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

It does not work on partially-headless servers, where some of the GPUs have displays and some don't.

Instructions

pip install coolgpus
sudo $(which coolgpus) --speed 99 99

If you hear your server take off, it works! Now interrupt it and re-run either with Sensible Defaults (TM),

sudo $(which coolgpus)

or you can pass your own fan curve with

sudo $(which coolgpus) --temp 17 84 --speed 15 99 

This will make the fan speed increase linearly from 15% at <17C to 99% at >84C. You can also increase --hyst if you want to smooth out oscillations, at the cost of the fans possibly going faster than they need to.
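
To make the smoothing idea concrete, here is a minimal sketch of one way hysteresis can be applied to the two-point curve above. It is an illustration, not the script's actual code: the function names, the band definition (between the curve at the current temperature and the curve at temperature plus hyst) and the default of 2 degrees are all assumptions.

```python
def linear_curve(temp, t_lo=17, t_hi=84, s_lo=15, s_hi=99):
    """Fan speed (%) rising linearly from s_lo at t_lo to s_hi at t_hi, flat outside."""
    frac = (temp - t_lo) / (t_hi - t_lo)
    frac = min(max(frac, 0.0), 1.0)  # clamp so the curve is flat below t_lo and above t_hi
    return s_lo + (s_hi - s_lo) * frac


def next_speed(current, temp, hyst=2):
    """Keep `current` while it sits inside the band allowed for `temp` (assumed scheme)."""
    lowest = linear_curve(temp)          # never run slower than the curve asks for
    highest = linear_curve(temp + hyst)  # but tolerate running somewhat faster
    return min(max(current, lowest), highest)


speed = linear_curve(50)
# A one-degree wobble leaves the speed untouched, so the fans don't hunt up and down.
print(next_speed(speed, 49) == speed)
```

Under this scheme a wider hyst means fewer speed changes, but after the GPU cools the fans can sit above what the curve strictly requires, which is the trade-off described above.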

Piecewise Linear Control

More generally, you can list any sequence of (increasing!) temperatures and speeds, and they'll be linearly interpolated:

sudo $(which coolgpus) --temp 20 55 80 --speed 5 30 99

Now the fan speed will be 5% below 20C, rise linearly to 30% at 55C, then rise linearly again to 99% at 80C and stay there above that.
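
The interpolation itself can be sketched in a few lines of Python. This is a standalone illustration of piecewise linear interpolation under the rules described above, not the script's own implementation; fan_speed is a hypothetical name.

```python
def fan_speed(temp, temps, speeds):
    """Interpolate a fan speed (%) for `temp` from matching lists of curve points."""
    if not temps or len(temps) != len(speeds):
        raise ValueError("temps and speeds must be non-empty and the same length")
    if any(a >= b for a, b in zip(temps, temps[1:])):
        raise ValueError("temps must be strictly increasing")
    # Outside the listed range the curve is flat.
    if temp <= temps[0]:
        return float(speeds[0])
    if temp >= temps[-1]:
        return float(speeds[-1])
    # Find the segment that contains temp and interpolate linearly within it.
    for t0, t1, s0, s1 in zip(temps, temps[1:], speeds, speeds[1:]):
        if t0 <= temp <= t1:
            return s0 + (s1 - s0) * (temp - t0) / (t1 - t0)


# Halfway between 20C and 55C gives halfway between 5% and 30%.
print(fan_speed(37.5, [20, 55, 80], [5, 30, 99]))  # 17.5
```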

systemd

If your system uses systemd and you want to run this as a service, create a systemd unit file at /etc/systemd/system/coolgpus.service as per this template:

[Unit]
Description=Headless GPU Fan Control
After=syslog.target

[Service]
ExecStart=/home/ajones/conda/bin/coolgpus --kill 
Restart=on-failure
RestartSec=5s
ExecStop=/bin/kill -2 $MAINPID
KillMode=none 

[Install]
WantedBy=multi-user.target

You just need to sub in your own install location (which you can find with which coolgpus), and any flags you want. Then enable and start it with

sudo systemctl enable coolgpus
sudo systemctl start coolgpus
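
If you'd rather not edit the template by hand, a small Python helper can fill in the ExecStart line for you. This is only a sketch: render_unit is a hypothetical helper, and the path in the example is a placeholder for whatever which coolgpus prints on your machine.

```python
UNIT_TEMPLATE = """\
[Unit]
Description=Headless GPU Fan Control
After=syslog.target

[Service]
ExecStart={command}
Restart=on-failure
RestartSec=5s
ExecStop=/bin/kill -2 $MAINPID
KillMode=none

[Install]
WantedBy=multi-user.target
"""


def render_unit(executable, flags=("--kill",)):
    """Return the unit file text with ExecStart set to `executable` plus `flags`."""
    command = " ".join([executable, *flags])
    return UNIT_TEMPLATE.format(command=command)


# Placeholder path: substitute the output of `which coolgpus`.
print(render_unit("/usr/local/bin/coolgpus", ["--kill", "--temp", "20", "55", "80", "--speed", "5", "30", "99"]))
```

Write the printed text to /etc/systemd/system/coolgpus.service (as root) before enabling the service.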

Troubleshooting

  • You've got a display attached: it won't work, but see this issue for progress.
  • You've got an X server hanging around for some reason: assuming you don't actually need it, run the script with --kill, which'll murder any existing X servers and let the script set up its own. Sometimes the OS might automatically recreate its X servers, and that's tricky enough to handle that it's up to you to sort out.
  • coolgpus: command not found: the pip script folder probably isn't on your PATH. On Ubuntu with the apt-get-installed pip, look in ~/.local/bin.
  • You hit Ctrl+C twice and now your fans are stuck at a certain speed: run the script again and interrupt it once, then let it shut down gracefully. Double interrupts stop it from handing control back to the driver. Don't double-interrupt things you barbarian.
  • General troubleshooting:
    • Read coolgpus --help
    • See if sudo /path/to/coolgpus actually works
    • Check that Xorg, nvidia-settings and nvidia-smi can all be called from your terminal.
    • Open coolgpus in a text editor, add an import pdb; pdb.set_trace() somewhere, and explore till you hit the error.
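
The check for Xorg, nvidia-settings and nvidia-smi can be done in one go with a short Python snippet. This is a convenience sketch, not part of coolgpus; missing_tools is a hypothetical name.

```python
import shutil

REQUIRED = ["Xorg", "nvidia-settings", "nvidia-smi"]


def missing_tools(names=REQUIRED):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

It's worth running this under sudo as well, since root's PATH may differ from your own.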

Why's this necessary?

If you want to install multiple GPUs in a single machine, you have to use blower-style GPUs, else the hot exhaust builds up in your case. Blower-style GPUs can get very loud, so to avoid annoying customers nvidia artificially limits their fans to ~50% duty. At 50% duty and a heavy workload, blower-style GPUs will heat up to 85C or so and throttle themselves.

Now if you're on Windows nvidia happily lets you override that limit by setting a custom fan curve. If you're on Linux though you need to use nvidia-settings, which - as of Sept 2019 - requires a display attached to each GPU you want to set the fan for. This is a pain to set up, as is checking the GPU temp every few seconds and adjusting the fan speed.

This script does all that for you.

How it works

When you run coolgpus, it sets up a temporary X server for each GPU with a fake display attached. Then, it loops over the GPUs every few seconds and sets the fan speed according to their temperature. When the script dies, it returns control of the fans to the drivers and cleans up the X servers.
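
The shape of that loop can be sketched as follows. This is a simplified illustration rather than the script's real code: the X server setup is left out, and read_temp, set_speed, release and curve are hypothetical callables you would back with whatever actually talks to the driver.

```python
import time


def control_loop(gpus, read_temp, set_speed, release, curve, interval=5.0, max_iters=None):
    """Poll each GPU, apply the fan curve, and always hand control back on exit."""
    iters = 0
    try:
        while max_iters is None or iters < max_iters:
            for gpu in gpus:
                set_speed(gpu, curve(read_temp(gpu)))
            iters += 1
            time.sleep(interval)
    finally:
        # Runs on normal exit, on exceptions, and on a single Ctrl+C (KeyboardInterrupt).
        for gpu in gpus:
            release(gpu)


# Dry run with fake hardware: one pass over two GPUs, no real sleeping.
control_loop(
    gpus=[0, 1],
    read_temp=lambda gpu: 60,
    set_speed=lambda gpu, speed: print(f"gpu {gpu}: fan -> {speed}%"),
    release=lambda gpu: print(f"gpu {gpu}: control returned to driver"),
    curve=lambda temp: 75,
    interval=0,
    max_iters=1,
)
```

Putting the cleanup in a finally block is what makes a single interrupt safe: the release step still runs. A second interrupt during cleanup can cut it short, which matches the stuck-fan symptom described under Troubleshooting.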
