harrism / mini-nbody

License: Apache-2.0
A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.

mini-nbody: A simple N-body Code

Benchmarks

Five benchmarks are provided for the CUDA and MIC (Intel Xeon Phi) platforms. Each one applies a further optimization on top of the previous one.

  1. nbody-orig : the original, unoptimized simulation (also for CPU)
  2. nbody-soa : conversion from array of structures (AOS) data layout to structure of arrays (SOA) data layout
  3. nbody-flush : flush denormals to zero (no code changes, just a command-line option)
  4. nbody-block : cache blocking
  5. nbody-unroll / nbody-align : platform-specific final optimizations (loop unrolling in CUDA, data alignment on MIC)
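
To illustrate the AOS-to-SOA conversion in step 2, here is a minimal C sketch. The type and function names (BodyAOS, BodiesSOA, soa_alloc, aos_to_soa, soa_free) are hypothetical and are not taken from the repository; the six-float body layout is likewise an assumption.

```c
#include <stdlib.h>

/* Array of structures (AOS): the six fields of one body sit next to each
   other, so reading only x for many bodies strides through memory. */
typedef struct { float x, y, z, vx, vy, vz; } BodyAOS;

/* Structure of arrays (SOA): each field is its own contiguous array, so
   consecutive bodies (or GPU threads) touch consecutive addresses. */
typedef struct {
    float *x, *y, *z;
    float *vx, *vy, *vz;
    int n;
} BodiesSOA;

/* Allocate one buffer for n > 0 bodies and carve it into six arrays.
   Returns 0 on success, -1 if the allocation fails. */
int soa_alloc(BodiesSOA *s, int n) {
    size_t count = (size_t)n;
    float *buf = malloc(6 * count * sizeof *buf);
    if (buf == NULL) return -1;
    s->x  = buf;
    s->y  = buf + count;
    s->z  = buf + 2 * count;
    s->vx = buf + 3 * count;
    s->vy = buf + 4 * count;
    s->vz = buf + 5 * count;
    s->n  = n;
    return 0;
}

/* Release the single buffer; x points at its start. */
void soa_free(BodiesSOA *s) {
    free(s->x);
    s->x = s->y = s->z = s->vx = s->vy = s->vz = NULL;
    s->n = 0;
}

/* Copy s->n bodies from the AOS array into the SOA arrays. */
void aos_to_soa(const BodyAOS *a, BodiesSOA *s) {
    for (int i = 0; i < s->n; i++) {
        s->x[i]  = a[i].x;  s->y[i]  = a[i].y;  s->z[i]  = a[i].z;
        s->vx[i] = a[i].vx; s->vy[i] = a[i].vy; s->vz[i] = a[i].vz;
    }
}
```

With the SOA layout, a loop that reads only positions never pulls velocity data into cache, which is one plausible reason this step pays off on both platforms.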

Files

nbody.c : simple, unoptimized OpenMP C code
timer.h : simple cross-OS timing code

Each directory below includes scripts for building and running a "shmoo" (a parameter sweep) of five successive optimizations of the code over a range of data sizes from 1,024 to 524,288 bodies.

cuda/ : folder containing CUDA optimized versions of the original C code (in order of performance on Tesla K20c GPU)

  1. nbody-orig.cu : a straight port of the code to CUDA (shmoo-cuda-nbody-orig.sh)
  2. nbody-soa.cu : conversion to structure of arrays (SOA) data layout (shmoo-cuda-nbody-soa.sh)
  3. nbody-soa.cu + ftz : enable flushing of denormals to zero (shmoo-cuda-nbody-ftz.sh)
  4. nbody-block.cu : cache blocking in CUDA shared memory (shmoo-cuda-nbody-block.sh)
  5. nbody-unroll.cu : addition of "#pragma unroll" to inner loop (shmoo-cuda-nbody-unroll.sh)

mic/ : folder containing Intel Xeon Phi (MIC) optimized versions of the original C code (in order of performance on Xeon Phi 7110P)

  1. ../nbody.c : original code (shmoo-mic-nbody-orig.sh)
  2. nbody-soa.c : conversion to structure of arrays (SOA) data layout (shmoo-mic-nbody-soa.sh)
  3. nbody-soa.c + ftz : enable flushing of denormals to zero (shmoo-mic-nbody-ftz.sh)
  4. nbody-block.c : cache blocking via loop splitting (shmoo-mic-nbody-block.sh)
  5. nbody-align.c : aligned memory allocation and vector access (shmoo-mic-nbody-align.sh)