spcl / Dace

Licence: bsd-3-clause
DaCe - Data Centric Parallel Programming

Programming Languages

python

Projects that are alternatives of or similar to Dace

bifrost
A stream processing framework for high-throughput applications.
Stars: ✭ 48 (-54.72%)
Mutual labels:  cuda, high-performance-computing
GOSH
An ultra-fast, GPU-based large graph embedding algorithm utilizing a novel coarsening algorithm requiring not more than a single GPU.
Stars: ✭ 12 (-88.68%)
Mutual labels:  cuda, high-performance-computing
Claymore
Stars: ✭ 135 (+27.36%)
Mutual labels:  cuda, high-performance-computing
Arraymancer
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Stars: ✭ 793 (+648.11%)
Mutual labels:  cuda, high-performance-computing
Hipsycl
Implementation of SYCL for CPUs, AMD GPUs, NVIDIA GPUs
Stars: ✭ 377 (+255.66%)
Mutual labels:  cuda, high-performance-computing
rbcuda
CUDA bindings for Ruby
Stars: ✭ 57 (-46.23%)
Mutual labels:  cuda, high-performance-computing
Relion
Image-processing software for cryo-electron microscopy
Stars: ✭ 219 (+106.6%)
Mutual labels:  cuda, high-performance-computing
Bayadera
High-performance Bayesian Data Analysis on the GPU in Clojure
Stars: ✭ 342 (+222.64%)
Mutual labels:  cuda, high-performance-computing
Taskflow
A General-purpose Parallel and Heterogeneous Task Programming System
Stars: ✭ 6,128 (+5681.13%)
Mutual labels:  cuda, high-performance-computing
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+774.53%)
Mutual labels:  cuda, high-performance-computing
Region Conv
Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
Stars: ✭ 95 (-10.38%)
Mutual labels:  cuda
Pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
Stars: ✭ 95 (-10.38%)
Mutual labels:  cuda
Pygraphistry
PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer
Stars: ✭ 1,365 (+1187.74%)
Mutual labels:  cuda
Neorv32
A small and customizable full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL.
Stars: ✭ 106 (+0%)
Mutual labels:  fpga
Fbtt Embedding
This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed this library can reduce the total model size by up to 100x in Facebook’s open sourced DLRM model while achieving same model quality. Our implementation is faster than the state-of-the-art implementations. Existing the state-of-the-art library also decompresses the whole embedding tables on the fly therefore they do not provide memory reduction during runtime of the training. Our library decompresses only the requested rows therefore can provide 10,000 times memory footprint reduction per embedding table. The library also includes a software cache to store a portion of the entries in the table in decompressed format for faster lookup and process.
Stars: ✭ 92 (-13.21%)
Mutual labels:  cuda
Deepnet
Deep.Net machine learning framework for F#
Stars: ✭ 99 (-6.6%)
Mutual labels:  cuda
Vgasim
A Video display simulator
Stars: ✭ 94 (-11.32%)
Mutual labels:  fpga
Boinc
Open-source software for volunteer computing and grid computing.
Stars: ✭ 1,320 (+1145.28%)
Mutual labels:  high-performance-computing
Icestation 32
Compact FPGA game console
Stars: ✭ 93 (-12.26%)
Mutual labels:  fpga
Fpga Soc Linux
FPGA+SoC+Linux+Device Tree Overlay+FPGA Manager U-Boot&Linux Kernel&Debian10 Images (for Xilinx:Zynq-Zybo:PYNQ-Z1 Altera:de0-nano-soc)
Stars: ✭ 106 (+0%)
Mutual labels:  fpga

DaCe - Data-Centric Parallel Programming

Decoupling domain science from performance optimization.

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. At the same time, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.

DaCe generates high-performance programs for:

  • Multi-core CPUs (tested on Intel and IBM POWER9)
  • NVIDIA GPUs
  • AMD GPUs (with HIP)
  • Xilinx FPGAs
  • Intel FPGAs

DaCe programs can be written inline in Python and transformed from the command line or in Jupyter Notebooks. Alternatively, SDFGs can be modified interactively using the Data-centric Interactive Optimization Development Environment (DIODE, currently experimental).

For more information, see our paper.

See an example SDFG in the standalone viewer (SDFV).

Tutorials

Installation and Dependencies

To install: pip install dace

Runtime dependencies:

  • A C++14-capable compiler (e.g., gcc 5.3+)
  • Python 3.6 or newer
  • CMake 3.15 or newer

Running

Python scripts: Run DaCe programs (in implicit or explicit syntax) using Python directly.

SDFV (standalone SDFG viewer): To view SDFGs separately, run the sdfv installed script with the .sdfg file as an argument. Alternatively, you can use the link or open diode/sdfv.html directly and choose a file in the browser.

Visual Studio Code plugin: Install from the VSCode marketplace or open an .sdfg file for interactive SDFG viewing and transformation.

DIODE interactive development (experimental): Either run the installed script diode, or call python3 -m diode from the shell. Then, follow the printed instructions to enter the web interface.

The sdfgcc tool: Compile .sdfg files with sdfgcc program.sdfg. Interactive command-line optimization is possible with the --optimize flag.
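If you want to drive sdfgcc from a build script, the invocation above can be wrapped as in the following minimal sketch. The helper names `sdfgcc_command` and `compile_sdfg` are hypothetical and not part of DaCe; placing `--optimize` after the file name is an assumption, and only the `sdfgcc program.sdfg` form and the `--optimize` flag come from the description above.

```python
import shutil
import subprocess


def sdfgcc_command(sdfg_path, optimize=False):
    """Build the argument list for an sdfgcc call (hypothetical helper)."""
    cmd = ["sdfgcc", sdfg_path]
    if optimize:
        # Flag position after the file name is an assumption.
        cmd.append("--optimize")
    return cmd


def compile_sdfg(sdfg_path, optimize=False):
    """Run sdfgcc on an .sdfg file, failing early if the tool is missing."""
    if shutil.which("sdfgcc") is None:
        raise FileNotFoundError("sdfgcc not found on PATH; is DaCe installed?")
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(sdfgcc_command(sdfg_path, optimize), check=True)
```

Because `--optimize` is interactive, `compile_sdfg(..., optimize=True)` only makes sense when the script is attached to a terminal.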

Jupyter Notebooks: DaCe is Jupyter-compatible. If a result is an SDFG or a state, it will show up directly in the notebook. See the tutorials for examples.

Octave scripts (experimental): .m files can be run using the installed script dacelab, which will create the appropriate SDFG file.

Note for Windows/Visual C++ users: If compilation fails in the linkage phase, try setting the following environment variable to force Visual C++ to use Multi-Threaded linkage:

X:\path\to\dace> set _CL_=/MT

Publication

If you use DaCe, cite us:

@inproceedings{dace,
  author    = {Ben-Nun, Tal and de~Fine~Licht, Johannes and Ziogas, Alexandros Nikolaos and Schneider, Timo and Hoefler, Torsten},
  title     = {Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures},
  year      = {2019},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '19}
}

Configuration

DaCe creates a file called .dace.conf in the user's home directory. It provides useful settings that can be modified either directly in the file (YAML), within DIODE, or overridden on a case-by-case basis using environment variables that begin with DACE_ and specify the setting (where categories are separated by underscores). The full configuration schema is located here.
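The naming rule for environment variables can be illustrated with a short sketch. The helper `dace_env_var` is hypothetical (it is not a DaCe function); it only shows how a setting's category path turns into a variable name under the rule described above.

```python
def dace_env_var(*path):
    """Join a setting's category path into a DACE_ environment variable name.

    Hypothetical helper for illustration; not part of the DaCe API.
    """
    if not path:
        raise ValueError("at least one path component is required")
    return "DACE_" + "_".join(path)


# A nested setting: category "compiler", subcategory "cuda", key "backend".
print(dace_env_var("compiler", "cuda", "backend"))

# A top-level setting has a single component.
print(dace_env_var("profiling"))
```

Note that some setting keys contain underscores themselves (for example `use_cache`), so a variable name cannot always be split back into its path without consulting the configuration schema.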

Useful environment variable configurations include:

  • DACE_CONFIG (default: ~/.dace.conf): Override DaCe configuration file choice.

General configuration:

  • DACE_debugprint (default: False): Print debugging information.
  • DACE_compiler_use_cache (default: False): Uses DaCe program cache instead of re-optimizing and compiling programs.
  • DACE_compiler_default_data_types (default: Python): Chooses default types for integer and floating-point values. If Python is chosen, int and float are both 64-bit wide. If C is chosen, int and float are 32-bit wide.
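To see why the choice of default data types matters, the sketch below uses only the Python standard library (no DaCe calls) to compare the two widths. The dictionary and function names are hypothetical and exist only for illustration.

```python
import struct

# Sizes in bytes for the two conventions described above,
# using struct's standard (platform-independent) sizes.
PYTHON_STYLE = {"int": struct.calcsize("=q"), "float": struct.calcsize("=d")}
C_STYLE = {"int": struct.calcsize("=i"), "float": struct.calcsize("=f")}


def roundtrip_float32(value):
    """Store a Python float in 32 bits and read it back."""
    return struct.unpack("=f", struct.pack("=f", value))[0]


def fits_int32(value):
    """Check whether an integer is representable as a signed 32-bit int."""
    return -2**31 <= value < 2**31


# 0.1 is not exactly representable, so 32-bit storage changes the value;
# 0.5 is a power of two and survives unchanged.
print(roundtrip_float32(0.1) == 0.1)
print(roundtrip_float32(0.5) == 0.5)
print(fits_int32(2**31))
```

In practice this means that switching to C-style defaults can change numerical results and the range of representable integers.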

GPU programming and debugging:

  • DACE_compiler_cuda_backend (default: cuda): Chooses the GPU backend to use (can be cuda for NVIDIA GPUs or hip for AMD GPUs).
  • DACE_compiler_cuda_syncdebug (default: False): If True, calls device-synchronization after every GPU kernel and checks for errors. Good for checking crashes or invalid memory accesses.

FPGA programming:

  • DACE_compiler_fpga_vendor (default: xilinx): Can be xilinx for Xilinx FPGAs, or intel_fpga for Intel FPGAs.

SDFG interactive transformation:

  • DACE_optimizer_transform_on_call (default: False): Uses the transformation command line interface every time a @dace function is called.
  • DACE_optimizer_interface (default: dace.transformation.optimizer.SDFGOptimizer): Controls the SDFG optimization process if transform_on_call is enabled. By default, uses the transformation command line interface.
  • DACE_optimizer_automatic_strict_transformations (default: True): If False, skips automatic strict transformations in the Python frontend (see transformations tutorial for more information).

Profiling:

  • DACE_profiling (default: False): Enables profiling measurement of the DaCe program runtime in milliseconds. Produces a log file and prints out median runtime.
  • DACE_treps (default: 100): Number of repetitions to run a DaCe program when profiling is enabled.
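As a rough illustration of what the profiling settings measure, the following sketch times a callable for a fixed number of repetitions and reports the median in milliseconds. It is not DaCe's implementation; `median_runtime_ms` is a hypothetical helper, and its `repetitions` argument only mirrors the role of DACE_treps.

```python
import statistics
import time


def median_runtime_ms(func, repetitions=100):
    """Call func `repetitions` times; return (median, all timings) in ms."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        func()
        # perf_counter returns seconds; convert to milliseconds.
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings), timings


# Example: time a small pure-Python workload.
median, _ = median_runtime_ms(lambda: sum(range(10_000)), repetitions=20)
print(f"median runtime: {median:.3f} ms")
```

The median is less sensitive than the mean to occasional slow runs, such as a first call that warms up caches.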

Contributing

DaCe is an open-source project. We are happy to accept Pull Requests with your contributions! Please follow the contribution guidelines before submitting a pull request.

License

DaCe is published under the New BSD license, see LICENSE.
