
eBay / Nvidiagpubeat

Licence: apache-2.0
nvidiagpubeat is an Elastic Beat that uses the NVIDIA System Management Interface (nvidia-smi) to monitor NVIDIA GPU devices and can ingest metrics into an Elasticsearch cluster, with support for both 6.x and 7.x versions of beats. nvidia-smi is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.


nvidiagpubeat

Welcome to nvidiagpubeat. nvidiagpubeat is an Elastic Beat that uses the NVIDIA System Management Interface (nvidia-smi - https://developer.nvidia.com/nvidia-system-management-interface) to monitor NVIDIA GPU devices and can ingest metrics into an Elasticsearch cluster. nvidia-smi is a command-line utility, built on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.

nvidiagpubeat is built using the Beats framework described at https://www.elastic.co/guide/en/beats/devguide/current/new-beat.html.

With the help of nvidia-smi, nvidiagpubeat allows administrators to query GPU device state. It is targeted at Tesla™, GRID™, Quadro™ and Titan X products, though limited support is also available on other NVIDIA GPUs.

nvidia-smi ships with NVIDIA GPU display drivers on Linux, and with 64-bit Windows Server 2008 R2 and Windows 7.

nvidiagpubeat lets you configure which metrics are monitored (see nvidiagpubeat.yml). By default it queries utilization.gpu, utilization.memory, memory.total, memory.free, memory.used, temperature.gpu and pstate, and can ingest them into an Elasticsearch cluster, for example for visualization with Kibana.
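
As a rough sketch of what querying these metrics could look like, the Go snippet below builds an nvidia-smi argument list from the default metric names and pairs one line of CSV output with those names. The helpers buildArgs and parseRow are hypothetical, the use of --format=csv,noheader,nounits is an assumption about how the output is requested, and the sample row is made up; none of this is taken from nvidiagpubeat's source.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMetrics mirrors the default query list named in this README.
var defaultMetrics = []string{
	"utilization.gpu", "utilization.memory",
	"memory.total", "memory.free", "memory.used",
	"temperature.gpu", "pstate",
}

// buildArgs returns nvidia-smi arguments for the given metrics.
// The CSV format flags are an assumption, not nvidiagpubeat's confirmed behavior.
func buildArgs(metrics []string) []string {
	return []string{
		"--query-gpu=" + strings.Join(metrics, ","),
		"--format=csv,noheader,nounits",
	}
}

// parseRow pairs one CSV output line with the metric names, in order.
func parseRow(metrics []string, line string) (map[string]string, error) {
	values := strings.Split(line, ",")
	if len(values) != len(metrics) {
		return nil, fmt.Errorf("expected %d values, got %d", len(metrics), len(values))
	}
	row := make(map[string]string, len(metrics))
	for i, name := range metrics {
		row[name] = strings.TrimSpace(values[i])
	}
	return row, nil
}

func main() {
	fmt.Println("nvidia-smi", strings.Join(buildArgs(defaultMetrics), " "))

	// Made-up sample line; real output depends on the GPU and driver.
	row, err := parseRow(defaultMetrics, "50, 50, 16280, 13024, 3256, 24, P0")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("utilization.gpu =", row["utilization.gpu"])
}
```

Keeping the metric list as the single source for both the query string and the parsed field names means a change to the configured metrics cannot put the two out of step.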

Getting Started with nvidiagpubeat

Build on macOS

Prerequisites

sudo curl https://bootstrap.pypa.io/get-pip.py | sudo python
sudo pip install virtualenv
brew install glide

Build on RedHat EL7

Prerequisites

yum install python-virtualenv
yum install golang

Initialize Project

Once the prerequisites have been installed, the rest of the steps are common across operating systems, as nvidiagpubeat is written in Golang.

#Start with an empty directory
mkdir beats_dev

#Build with elastic beats branch=6.5 (the master branch might not always be up to date).
export WORKSPACE=`pwd`/beats_dev
export GOPATH=$WORKSPACE
git clone https://github.com/elastic/beats ${GOPATH}/src/github.com/elastic/beats --branch 6.5

#Clone nvidiagpubeat
mkdir $WORKSPACE/src/github.com/ebay
cd $WORKSPACE/src/github.com/ebay
git clone git@github.com:eBay/nvidiagpubeat.git

#Build
cd $WORKSPACE/src/github.com/ebay/nvidiagpubeat/
make setup
make

nvidiagpubeat and beats compatibility

To keep nvidiagpubeat compatible with upcoming versions of beats, nvidiagpubeat is maintained on separate branches, because beats sometimes makes breaking changes across major releases. For instance, cmd.GenRootCmd is not compatible between beats 6.x and beats 7.x.

beats version | nvidiagpubeat branch
6.x           | master
7.x           | withBeats7.3

For instance, to use version 7.3 of beats with a compatible nvidiagpubeat, perform the following steps before compilation.

#Build with elastic beats branch=7.3 (the master branch might not always be up to date).
export WORKSPACE=`pwd`/beats_dev
export GOPATH=$WORKSPACE
git clone https://github.com/elastic/beats ${GOPATH}/src/github.com/elastic/beats --branch 7.3

#Clone nvidiagpubeat
mkdir $WORKSPACE/src/github.com/ebay
cd $WORKSPACE/src/github.com/ebay
git clone git@github.com:eBay/nvidiagpubeat.git --branch withBeats7.3

The above instructions generate a binary named nvidiagpubeat in the same directory.

Run in production environment

To run nvidiagpubeat with a pre-installed nvidia-smi that is available in PATH, switch to the "production" environment in nvidiagpubeat.yml and run:

env: "production"
export PATH=$PATH:.
./nvidiagpubeat -c nvidiagpubeat.yml -e -d "*" -E seccomp.enabled=false

seccomp.enabled setting: nvidiagpubeat uses the libbeat framework. For security purposes, libbeat by default drops the ability to fork/exec. Because nvidiagpubeat executes nvidia-smi, this security setting must be disabled on the command line with -E seccomp.enabled=false.
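
To make the fork/exec requirement concrete, here is a minimal sketch of running an external binary from Go with the standard os/exec package. The helper runQuery is hypothetical and is not nvidiagpubeat's own code; the --format flags in main are likewise an assumption. Starting a child process in this way is the kind of operation that a seccomp policy forbidding process creation would block.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// runQuery starts an external binary and returns its trimmed standard output.
// It returns an error if the binary is missing or exits unsuccessfully.
func runQuery(binary string, args ...string) (string, error) {
	out, err := exec.Command(binary, args...).Output()
	if err != nil {
		return "", fmt.Errorf("running %s: %w", binary, err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	// On a machine without nvidia-smi in PATH this prints the error instead.
	out, err := runQuery("nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println(out)
}
```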

Run in test environment (macOS)

To run nvidiagpubeat with the pre-packaged localnvidiasmi, switch to the "test" environment in nvidiagpubeat.yml and run:

env: "test"
export PATH=$PATH:.
./nvidiagpubeat -c nvidiagpubeat.yml -e -d "*" -E seccomp.enabled=false

The localnvidiasmi executable is built for macOS and is a mock GPU event generator that supports events for --query-compute-apps and --query-gpu. The executable is generated from the nvidiasmilocal/localnvidiasmi.go file.
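
The idea of a mock event generator can be sketched as a small Go program that prints one fabricated CSV row per GPU for a list of fields. This is an illustration only and does not reproduce nvidiasmilocal/localnvidiasmi.go; mockRow, the field list and the value range are all hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// mockRow fabricates one CSV line with a pseudo-random value in [0, 100)
// for each requested field. The same seed always yields the same line.
func mockRow(fields []string, seed int64) string {
	rng := rand.New(rand.NewSource(seed))
	values := make([]string, len(fields))
	for i := range fields {
		values[i] = fmt.Sprintf("%d", rng.Intn(100))
	}
	return strings.Join(values, ", ")
}

func main() {
	fields := []string{"utilization.gpu", "utilization.memory", "temperature.gpu"}
	// Pretend the machine has four GPUs, each seeded by its index.
	for gpu := int64(0); gpu < 4; gpu++ {
		fmt.Println(mockRow(fields, gpu))
	}
}
```

Seeding per GPU keeps the fake output reproducible, which is convenient when the mock feeds tests.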

Sample event

The file nvidiagpubeat.yml defines the nvidiagpubeat beat with multiple options for the query. For example, a query beginning with "--query-gpu=" provides information about the GPU, and a query beginning with "--query-compute-apps=" lists the currently active compute processes.

With --query-gpu, nvidiagpubeat generates an event like the one below.

Publish event: {
  "@timestamp": "2021-01-03T07:27:16.080Z",
  "@metadata": {
    "beat": "nvidiagpubeat",
    "type": "doc",
    "version": "6.5.5"
  },
  "type": "nvidiagpubeat",
  "gpu_uuid": "GPU-b884db58-6340-7969-a79f-b937f3583884",
  "driver_version": "418.87.01",
  "index": 3,
  "gpu_serial": 3.20218176911e+11,
  "memory": {
    "used": 3256,
    "total": 16280
  },
  "name": "Tesla100-PCIE-16GB",
  "host": {
    "name": "AB-SJC-11111111"
  },
  "utilization": {
    "memory": 50,
    "gpu": 50
  },
  "beat": {
    "name": "AB-SJC-11111111",
    "hostname": "AB-SJC-11111111",
    "version": "6.5.5"
  },
  "pstate": 0,
  "gpu_bus_id": "00000000:19:00.0",
  "count": 4,
  "fan": {
    "speed": "[NotSupported]"
  },
  "gpuIndex": 3,
  "power": {
    "draw": 25.28,
    "limit": 250
  },
  "temperature": {
    "gpu": 24
  },
  "clocks": {
    "gr": 405,
    "sm": 405,
    "mem": 715
  }
}
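
The nested objects in this event (memory, utilization, power and so on) suggest that dotted nvidia-smi field names such as memory.used map onto nested JSON keys. The sketch below shows one way such a mapping could be done; nest is a hypothetical helper, and nothing here is taken from how nvidiagpubeat actually builds its events.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nest turns flat dotted keys such as "memory.used" into nested maps.
// It assumes no key is both a leaf and a prefix of another key.
func nest(flat map[string]interface{}) map[string]interface{} {
	root := map[string]interface{}{}
	for key, value := range flat {
		parts := strings.Split(key, ".")
		node := root
		// Walk or create the intermediate objects for all but the last part.
		for _, p := range parts[:len(parts)-1] {
			child, ok := node[p].(map[string]interface{})
			if !ok {
				child = map[string]interface{}{}
				node[p] = child
			}
			node = child
		}
		node[parts[len(parts)-1]] = value
	}
	return root
}

func main() {
	// Values copied from the sample event above.
	flat := map[string]interface{}{
		"memory.used":     3256,
		"memory.total":    16280,
		"utilization.gpu": 50,
		"temperature.gpu": 24,
		"pstate":          0,
	}
	out, err := json.MarshalIndent(nest(flat), "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out))
}
```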

With --query-compute-apps, nvidiagpubeat generates an event like the one below.

Publish event: {
  "@timestamp": "2021-01-03T07:29:53.633Z",
  "@metadata": {
    "beat": "nvidiagpubeat",
    "type": "doc",
    "version": "6.5.5"
  },
  "pid": 222414,
  "process_name": "[NotFound]",
  "used_gpu_memory": 10,
  "gpu_bus_id": "00000000:19:00.0",
  "gpu_serial": 3.20218176911e+11,
  "beat": {
    "name": "AB-SJC-11111111",
    "hostname": "AB-SJC-11111111",
    "version": "6.5.5"
  },
  "gpu_name": "Tesla100-PCIE-16GB",
  "used_memory": 15,
  "gpuIndex": 3,
  "type": "nvidiagpubeat",
  "gpu_uuid": "GPU-b884db58-6340-7969-a79f-b937f3583884",
  "host": {
    "name": "AB-SJC-11111111"
  }
}

Build

Make changes (if any), then build the nvidiagpubeat binary with the command below. This generates a binary named nvidiagpubeat in the same directory.

make

Test cases

To test nvidiagpubeat, run the following commands:

make unit-tests
make integration-tests
make coverage-report

The test coverage is reported in the folder ./build/coverage/

Cleanup dependencies

To remove the dependent libraries, builds, temporary files, files created during tests and the executable, run:

 rm -rf build/
 rm -rf data/
 rm -rf nvidiagpubeat
 rm -rf nvidiagpubeat.test
 rm -rf vendor
 rm -rf logs

Clone

To clone nvidiagpubeat from the git repository, run the following commands:

mkdir -p ${GOPATH}/src/github.com/ebay
cd ${GOPATH}/src/github.com/ebay
git clone git@github.com:eBay/nvidiagpubeat.git
cd nvidiagpubeat
git remote add upstream git@github.com:eBay/nvidiagpubeat.git

For further development, check out the beat developer guide.

Testimonials

I would love to hear about your use case. It will help me improve nvidiagpubeat. Please add a few lines about your use case, affiliation and location. All fields are optional.

  1. Use Case: I am running deeplearning models on a headless linux machine. Therefore, it is essential for me to track my gpu load and related stats.

    Country: Germany

    Full Name: Julius Zimmermann

    Affiliation: Student

  2. Use Case:

    Country:

    Full Name:

    Affiliation:

  3. Use Case:

    Country:

    Full Name:

    Affiliation:

License

Copyright 2016-2018 eBay Inc. Architect/Developer: Deepak Vasthimal

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
