
nvpro-samples / gl_dynamic_lod

Licence: Apache-2.0 license
GPU classifies how to render millions of particles

Programming Languages

C++
GLSL
CMake


gl_dynamic_lod

With the addition of indirect rendering (ARB_draw_indirect and ARB_multi_draw_indirect), OpenGL gained an efficient mechanism that lets the GPU create or modify its own work without stalling the pipeline. Because the CPU and GPU are best used when working asynchronously, it pays to avoid reading results back to the CPU to drive decision making.

In this sample we use ARB_draw_indirect and ARB_shader_atomic_counters to build three distinct render lists for drawing particles as spheres, each using a different shader and representing a different level of detail (LOD):

  • Draw as point
  • Draw as instanced low resolution mesh
  • Draw as instanced adaptively tessellated mesh

sample screenshot

This allows us to limit the total amount of geometry being rasterized, and still benefit from high geometric quality where needed.
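The classification into the three lists can be sketched on the CPU. This is a minimal sketch, not the sample's shader: it assumes view-space particle positions, a symmetric perspective frustum, and hypothetical pixel thresholds standing in for the sample's "pixelsize" parameters. `classify`, `buildLists`, `LodParams` and the threshold names are illustrative. The `fetch_add` on a per-list counter plays the role of a global atomic counter: each particle reserves a unique slot in its list, so the loop body could run concurrently.

```cpp
#include <array>
#include <atomic>
#include <cmath>
#include <cstdint>
#include <vector>

enum Lod { LOD_POINT = 0, LOD_MESH = 1, LOD_TESS = 2, LOD_CULLED = 3 };

struct Particle {
  float x, y, z;  // view-space center (camera at origin, looking down -z)
  float radius;   // world-space radius
};

// Hypothetical parameters; not the sample's actual names.
struct LodParams {
  float tanHalfFovX, tanHalfFovY;  // symmetric perspective frustum
  float zNear, zFar;
  float focalPixels;     // viewport height / (2 * tanHalfFovY)
  float pointMaxPixels;  // projected diameter below this: draw as point
  float meshMaxPixels;   // below this: low-res mesh, otherwise tessellated
};

// Signed distance from a sphere center to a side plane through the eye
// (positive = outside). lateral is x or y, negated for the left/bottom plane.
static float sideDistance(float lateral, float z, float tanHalf) {
  return (lateral + tanHalf * z) / std::sqrt(1.0f + tanHalf * tanHalf);
}

Lod classify(const Particle& p, const LodParams& lp) {
  const float depth = -p.z;
  // Frustum culling: reject spheres entirely outside any frustum plane.
  if (depth + p.radius < lp.zNear || depth - p.radius > lp.zFar) return LOD_CULLED;
  if (sideDistance(p.x, p.z, lp.tanHalfFovX) > p.radius ||
      sideDistance(-p.x, p.z, lp.tanHalfFovX) > p.radius ||
      sideDistance(p.y, p.z, lp.tanHalfFovY) > p.radius ||
      sideDistance(-p.y, p.z, lp.tanHalfFovY) > p.radius)
    return LOD_CULLED;
  // Approximate projected diameter in pixels.
  const float pixels = 2.0f * p.radius * lp.focalPixels / std::fmax(depth, lp.zNear);
  if (pixels < lp.pointMaxPixels) return LOD_POINT;
  if (pixels < lp.meshMaxPixels) return LOD_MESH;
  return LOD_TESS;
}

// Appends each visible particle index to one of three preallocated lists.
std::array<std::vector<uint32_t>, 3> buildLists(const std::vector<Particle>& particles,
                                                const LodParams& lp) {
  std::atomic<uint32_t> counters[3] = {{0u}, {0u}, {0u}};
  std::array<std::vector<uint32_t>, 3> lists;
  for (auto& list : lists) list.resize(particles.size());  // worst-case capacity
  for (uint32_t i = 0; i < particles.size(); ++i) {
    const Lod lod = classify(particles[i], lp);
    if (lod == LOD_CULLED) continue;
    const uint32_t slot = counters[lod].fetch_add(1);  // reserve a unique slot
    lists[lod][slot] = i;
  }
  for (int l = 0; l < 3; ++l) lists[l].resize(counters[l].load());
  return lists;
}
```

The final counter values are exactly what the later command-building step needs: the number of particles to draw from each list.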

sample screenshot

The frame timeline is therefore split into two parts:

  1. LOD classification:
     • Frustum culling is applied first; each remaining particle is then appended to the appropriate list, based on its projected size in the viewport, using global atomic counters.
     • A single shader invocation updates the DrawIndirect commands from the atomic counter values. This step is required because the sample uses an alternative to classic instancing.
  2. Rendering:
     • Each list is drawn with one or two glDrawElementsIndirect calls.
     • Instancing is done via batching in two steps (see below).
```glsl
struct DrawElementsIndirect {
  uint  elementCount;   // modified at runtime
  uint  instanceCount;  // modified at runtime
  uint  first;          // 0
  int   baseVertex;     // 0 (signed in the GL specification)
  uint  baseInstance;   // 0
};
```
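A host-side mirror of this struct is handy when filling or inspecting the indirect buffer from C++. The sketch below, assuming fixed-width integer types, checks the tightly packed 20-byte layout GL reads per command and computes the byte offset that is passed as the `indirect` argument of glDrawElementsIndirect when a buffer is bound to GL_DRAW_INDIRECT_BUFFER. The helper `commandOffset` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side mirror of the indirect command: five 32-bit values per command.
struct DrawElementsIndirect {
  uint32_t elementCount;   // number of indices to draw
  uint32_t instanceCount;  // number of instances
  uint32_t first;          // offset into the index buffer, in indices
  int32_t baseVertex;      // added to each fetched index (signed)
  uint32_t baseInstance;   // offset for instanced vertex attributes
};

static_assert(sizeof(DrawElementsIndirect) == 20, "GL expects 20-byte commands");
static_assert(offsetof(DrawElementsIndirect, instanceCount) == 4, "field order");
static_assert(offsetof(DrawElementsIndirect, baseInstance) == 16, "field order");

// Byte offset of command i inside a tightly packed command buffer.
constexpr std::size_t commandOffset(std::size_t i) {
  return i * sizeof(DrawElementsIndirect);
}
```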

Batched low complexity mesh instancing

When instancing meshes that have very few triangles, the classic approach of relying on the graphics API's instance counter may not use the hardware efficiently, so we use batching to improve performance. Instead of drawing all particles with a single instanced call, we draw them in two steps whose parameters depend on the total number of particles in the list (listSize); meshSize is the number of indices of one mesh:

  1. elementCount = batchSize * meshSize; instanceCount = listSize / batchSize;
  2. elementCount = (listSize % batchSize) * meshSize; instanceCount = 1;

The first call draws listSize / batchSize instances of a batch of batchSize meshes via classic instancing; the second call draws the remaining listSize % batchSize meshes.
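The two formulas are presumably what the single command-building shader invocation evaluates from the atomic counter value. A CPU sketch, with the hypothetical helper `makeBatchedCommands`:

```cpp
#include <cstdint>

struct DrawElementsIndirect {
  uint32_t elementCount;
  uint32_t instanceCount;
  uint32_t first;
  int32_t baseVertex;
  uint32_t baseInstance;
};

struct BatchedCommands {
  DrawElementsIndirect full;  // complete batches, classic instancing
  DrawElementsIndirect rest;  // leftover meshes, single instance
};

// listSize:  particles in the list (the atomic counter value)
// batchSize: mesh copies stored in the VBO/IBO
// meshSize:  indices of one mesh
BatchedCommands makeBatchedCommands(uint32_t listSize, uint32_t batchSize, uint32_t meshSize) {
  BatchedCommands cmds{};  // first, baseVertex and baseInstance stay 0
  cmds.full.elementCount = batchSize * meshSize;
  cmds.full.instanceCount = listSize / batchSize;
  cmds.rest.elementCount = (listSize % batchSize) * meshSize;
  cmds.rest.instanceCount = 1;
  return cmds;
}
```

For example, 1000 particles with batchSize 64 and 36 indices per mesh give 15 instances of 2304 indices plus one draw of 1440 indices (40 meshes). When listSize is a multiple of batchSize, the second command has elementCount 0 and draws nothing.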

The instanced mesh is replicated batchSize times in the source VBO/IBO instead of being stored only once. That way each hardware instance does more work, which helps exploit GPU parallelism. The memory cost can typically be neglected, because we specifically target low-complexity meshes with just a few triangles and vertices; for meshes with many triangles, classic instancing works well.
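Building the replicated buffers can be sketched as follows, assuming a minimal position-only vertex (the `Vertex` type and `replicateMesh` are hypothetical) and 32-bit indices. In this sketch the vertex data of every copy is identical; copy b differs only in that its indices are offset by b times the vertex count, which is what allows the copy to be recovered from gl_VertexID later.

```cpp
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };  // hypothetical: position relative to the sphere center

struct BatchedMesh {
  std::vector<Vertex> vertices;
  std::vector<uint32_t> indices;
};

// Replicates one small mesh batchSize times into a single VBO/IBO pair.
BatchedMesh replicateMesh(const std::vector<Vertex>& verts,
                          const std::vector<uint32_t>& indices, uint32_t batchSize) {
  BatchedMesh out;
  const uint32_t vertexCount = static_cast<uint32_t>(verts.size());
  out.vertices.reserve(verts.size() * batchSize);
  out.indices.reserve(indices.size() * batchSize);
  for (uint32_t b = 0; b < batchSize; ++b) {
    // Copy the vertices unchanged; offset the indices so copy b references its own vertices.
    out.vertices.insert(out.vertices.end(), verts.begin(), verts.end());
    for (uint32_t i : indices) out.indices.push_back(i + b * vertexCount);
  }
  return out;
}
```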

With classic instancing we would simply use gl_InstanceID to find out which instance we are, but here we use an alternative formula:

```glsl
instanceID = batchedID + gl_InstanceID * MESH_BATCHSIZE;
```

batchedID identifies which of the replicated meshes within a batch is currently being rendered. It is not a built-in vertex shader variable, but we can derive it from gl_VertexID: because baseVertex is 0, gl_VertexID equals the fetched index value, and the index buffer already accounts for the vertex replication in the VBO. The index values of batched mesh batchedID lie in the range [MESH_VERTICES * batchedID, MESH_VERTICES * (batchedID + 1) - 1], so

```glsl
instanceID = (gl_VertexID / MESH_VERTICES) + gl_InstanceID * MESH_BATCHSIZE;
```
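The formula can be checked on the CPU. A minimal sketch with hypothetical values for MESH_VERTICES and MESH_BATCHSIZE; `batchedInstanceID` stands in for the vertex shader expression:

```cpp
// CPU version of the vertex-shader formula
//   instanceID = (gl_VertexID / MESH_VERTICES) + gl_InstanceID * MESH_BATCHSIZE;
constexpr int MESH_VERTICES = 12;   // hypothetical vertices per mesh
constexpr int MESH_BATCHSIZE = 64;  // hypothetical meshes per batch

constexpr int batchedInstanceID(int vertexID, int instanceID) {
  // vertexID / MESH_VERTICES recovers batchedID, the mesh copy within the batch.
  return vertexID / MESH_VERTICES + instanceID * MESH_BATCHSIZE;
}

static_assert(batchedInstanceID(11, 0) == 0, "last vertex of copy 0");
static_assert(batchedInstanceID(12, 0) == 1, "first vertex of copy 1");
static_assert(batchedInstanceID(0, 1) == MESH_BATCHSIZE, "instance 1 starts a new batch");
```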

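When drawing the remaining meshes with the second draw call, instanceID must be offset by the number of meshes already drawn by the first call: its instanceCount multiplied by the batch size, which is recovered from its elementCount.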

```glsl
instanceID += int(firstCmd.instanceCount) *
              (int(firstCmd.elementCount) / MESH_INDICES);
```
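Replaying both draw calls on the CPU shows that, with this offset, every particle in the list is visited exactly once. A sketch with small hypothetical mesh and batch constants; `particleID` mirrors the shader expressions and `coverage` is an illustrative test helper:

```cpp
#include <cstdint>
#include <vector>

constexpr int MESH_VERTICES = 4;   // hypothetical
constexpr int MESH_INDICES = 6;    // hypothetical
constexpr int MESH_BATCHSIZE = 8;  // hypothetical

struct FirstCmd { uint32_t elementCount, instanceCount; };  // fields read by the shader

int particleID(int vertexID, int instanceID, bool secondDraw, const FirstCmd& firstCmd) {
  int id = vertexID / MESH_VERTICES + instanceID * MESH_BATCHSIZE;
  if (secondDraw)  // skip everything the first draw already covered
    id += int(firstCmd.instanceCount) * (int(firstCmd.elementCount) / MESH_INDICES);
  return id;
}

// Replays both draws for listSize particles, visiting vertex 0 of every mesh
// copy, and counts how often each particle ID is produced.
std::vector<int> coverage(int listSize) {
  const FirstCmd first{MESH_BATCHSIZE * MESH_INDICES, uint32_t(listSize / MESH_BATCHSIZE)};
  const int rest = listSize % MESH_BATCHSIZE;
  std::vector<int> hits(listSize, 0);
  for (int inst = 0; inst < int(first.instanceCount); ++inst)
    for (int b = 0; b < MESH_BATCHSIZE; ++b)
      ++hits[particleID(b * MESH_VERTICES, inst, false, first)];
  for (int b = 0; b < rest; ++b)  // second draw: one instance, rest mesh copies
    ++hits[particleID(b * MESH_VERTICES, 0, true, first)];
  return hits;
}
```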

Performance

The UI exposes several options to modify the sample. For example, "invisible rendering" via glEnable(GL_RASTERIZER_DISCARD) can be used to time the classification or compute shaders alone. The entire task can also be split into multiple jobs, which reduces the size of the temporary list buffers. Finally, one can choose whether the lists record the particle data directly or only particle indices. The default configuration (compute, single job, indices) gives the best performance for large particle counts.
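Splitting into jobs can be sketched with a hypothetical scheme in which particles are divided into contiguous ranges and every job reuses the same temporary lists. Assuming 32-bit index entries and that each of the three lists is sized for the worst case (every particle of a job landing in that list), list storage shrinks roughly in proportion to the job count. `JobRange`, `splitJobs` and `listBufferBytes` are illustrative names, not the sample's code; numJobs must be greater than 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct JobRange { uint32_t first, count; };

// Splits numParticles into at most numJobs contiguous ranges.
std::vector<JobRange> splitJobs(uint32_t numParticles, uint32_t numJobs) {
  std::vector<JobRange> jobs;
  const uint32_t perJob = (numParticles + numJobs - 1) / numJobs;  // ceiling division
  for (uint32_t first = 0; first < numParticles; first += perJob)
    jobs.push_back({first, std::min(perJob, numParticles - first)});
  return jobs;
}

// Worst-case bytes for three index lists, each able to hold one whole job.
uint64_t listBufferBytes(uint32_t numParticles, uint32_t numJobs) {
  const uint64_t perJob = (uint64_t(numParticles) + numJobs - 1) / numJobs;
  return 3 * perJob * sizeof(uint32_t);
}
```

Under these assumptions, 1048574 particles need about 12.6 MB of list storage in a single job and about 3.1 MB with four jobs.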

Timings in microseconds, measured with GL timer queries on a Quadro M6000 with 1048574 particles:

```
 Timer Frame;    GL    4206;
  Timer Lod;     GL     151;
   Timer Cont;   GL     139;  // Particle classification (content)
   Timer Cmds;   GL       8;  // DrawIndirect struct (commands)
  Timer Draw;    GL    3888;
   Timer Tess;   GL     256;  // Adaptively-tessellated spheres
   Timer Mesh;   GL    3586;  // Simple sphere mesh
   Timer Pnts;   GL      39;  // Spheres drawn as points
  Timer TwDraw;  GL     160;
```
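The nested timers can be cross-checked with a little arithmetic: the children of each scope add up to within a few microseconds of their parent (the remainder is presumably timer overhead or untimed work), classification costs roughly 0.13 ns per particle, and the simple-mesh list accounts for about 92% of the draw time. The helper names below are illustrative.

```cpp
// GL timer values from the table, in microseconds.
constexpr int kFrame = 4206, kLod = 151, kCont = 139, kCmds = 8;
constexpr int kDraw = 3888, kTess = 256, kMesh = 3586, kPnts = 39, kTwDraw = 160;
constexpr int kParticles = 1048574;

// Time inside a parent timer that none of its child timers account for.
constexpr int untracked(int parent, int childSum) { return parent - childSum; }

// Classification cost per particle, in nanoseconds.
constexpr double classifyNsPerParticle() { return kCont * 1000.0 / kParticles; }

// Fraction of the draw time spent on the simple-mesh list.
constexpr double meshShareOfDraw() { return double(kMesh) / kDraw; }
```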

Sample Highlights

The user can influence the viewport-size-based classification through the "pixelsize" parameters. The classification can also be paused and reused while the camera moves, which is useful for watching frustum culling in action or inspecting the low-resolution representations.

Key functionality is found in

  • Sample::drawLod()

As well as in helper functions

  • Sample::initParticleBuffer()
  • Sample::initLodBuffers()

In common.h, you can set USE_COMPACT_PARTICLE to 1 to shrink each particle to a single vec4 by giving all particles the same world size. This mode allows rendering around 130 million particles on NVIDIA hardware, twice as many as with the default setting of 0.
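For a sense of scale: in compact mode each particle is one vec4 of 32-bit floats, so 130 million particles need about 2.08 GB (roughly 1.94 GiB) for the particle buffer alone. A minimal sketch assuming 4-byte floats; the LOD list and command buffers are not included, and since the default layout is not described here, the number of vec4s per particle is left as a parameter. The helper names are illustrative.

```cpp
#include <cstdint>

// One vec4 of 32-bit floats.
constexpr uint64_t kVec4Bytes = 4 * sizeof(float);

// Size of the particle buffer alone (lists and commands not included).
constexpr uint64_t particleBufferBytes(uint64_t particleCount, uint64_t vec4PerParticle) {
  return particleCount * vec4PerParticle * kVec4Bytes;
}

constexpr double toGiB(uint64_t bytes) { return double(bytes) / (1024.0 * 1024.0 * 1024.0); }
```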

Building

Ideally, clone this and other interesting nvpro-samples repositories into a common subdirectory. You will always need nvpro_core, which is searched for either as a subdirectory of the sample or one directory up.

If you are interested in multiple samples, you can use the build_all CMake project as an entry point. It also lets you enable or disable individual samples when creating the solutions.

Related Samples

gl_occlusion_culling makes use of similar OpenGL functionality to perform more accurate visibility culling.
