spcl / Dace

Licence: bsd-3-clause

DaCe - Data Centric Parallel Programming

Labels

cuda fpga high-performance-computing

DaCe - Data-Centric Parallel Programming

Decoupling domain science from performance optimization.

DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages, and maps it to high-performance CPU, GPU, and FPGA programs, which can be optimized to achieve state-of-the-art performance. Internally, DaCe uses the Stateful DataFlow multiGraph (SDFG) data-centric intermediate representation: a transformable, interactive representation of code based on data movement. Since the input code and the SDFG are separate, it is possible to optimize a program without changing its source, so that it stays readable. On the other hand, transformations are customizable and user-extensible, so they can be written once and reused in many applications. With data-centric parallel programming, we enable direct knowledge transfer of performance optimization, regardless of the application or the target processor.

DaCe generates high-performance programs for:

Multi-core CPUs (tested on Intel and IBM POWER9)
NVIDIA GPUs
AMD GPUs (with HIP)
Xilinx FPGAs
Intel FPGAs

DaCe can be written inline in Python and transformed in the command-line/Jupyter Notebooks, or SDFGs can be interactively modified using the Data-centric Interactive Optimization Development Environment (DIODE, currently experimental).

For more information, see our paper.

See an example SDFG in the standalone viewer (SDFV).

Tutorials

Installation and Dependencies

To install: pip install dace

Runtime dependencies:

A C++14-capable compiler (e.g., gcc 5.3+)
Python 3.6 or newer
CMake 3.15 or newer

Running

Python scripts: Run DaCe programs (in implicit or explicit syntax) using Python directly.

SDFV (standalone SDFG viewer): To view SDFGs separately, run the sdfv installed script with the .sdfg file as an argument. Alternatively, you can use the link or open diode/sdfv.html directly and choose a file in the browser.

Visual Studio Code plugin: Install from the VSCode marketplace or open an .sdfg file for interactive SDFG viewing and transformation.

DIODE interactive development (experimental): Either run the installed script diode, or call python3 -m diode from the shell. Then, follow the printed instructions to enter the web interface.

The sdfgcc tool: Compile .sdfg files with sdfgcc program.sdfg. Interactive command-line optimization is possible with the --optimize flag.

Jupyter Notebooks: DaCe is Jupyter-compatible. If a result is an SDFG or a state, it will show up directly in the notebook. See the tutorials for examples.

Octave scripts (experimental): .m files can be run using the installed script dacelab, which will create the appropriate SDFG file.

Note for Windows/Visual C++ users: If compilation fails in the linkage phase, try setting the following environment variable to force Visual C++ to use Multi-Threaded linkage:

X:\path\to\dace> set _CL_=/MT

Publication

If you use DaCe, cite us:

@inproceedings{dace,
  author    = {Ben-Nun, Tal and de~Fine~Licht, Johannes and Ziogas, Alexandros Nikolaos and Schneider, Timo and Hoefler, Torsten},
  title     = {Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures},
  year      = {2019},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '19}
}

Configuration

DaCe creates a file called .dace.conf in the user's home directory. It provides useful settings that can be modified directly in the file (YAML), within DIODE, or overridden on a case-by-case basis using environment variables that begin with DACE_ and specify the setting (where categories are separated by underscores). The full configuration schema is located here.
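The naming rule for these environment variables can be sketched in a few lines of plain Python. This is only an illustration of the convention described above, not DaCe's own implementation; the helper name `dace_env_name` is hypothetical.

```python
def dace_env_name(*path: str) -> str:
    """Build an environment variable name from a setting path.

    Hypothetical helper: prefixes DACE_ and joins the categories
    and the setting name with underscores.
    """
    if not path or any(not part for part in path):
        raise ValueError("setting path must contain non-empty parts")
    return "DACE_" + "_".join(path)


# The category path compiler -> cuda -> backend becomes one variable name.
print(dace_env_name("compiler", "cuda", "backend"))
# A top-level setting only gets the prefix.
print(dace_env_name("profiling"))
```

Setting names that already contain underscores (such as use_cache) are passed through unchanged, so the name alone does not show where a category ends and a setting begins.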

Useful environment variable configurations include:

DACE_CONFIG (default: ~/.dace.conf): Override DaCe configuration file choice.

General configuration:

DACE_debugprint (default: False): Print debugging information.
DACE_compiler_use_cache (default: False): Uses DaCe program cache instead of re-optimizing and compiling programs.
DACE_compiler_default_data_types (default: Python): Chooses default types for integer and floating-point values. If Python is chosen, int and float are both 64 bits wide. If C is chosen, int and float are 32 bits wide.
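To see why the choice of default width matters, the following sketch uses only the Python standard library (it does not call DaCe) to show what happens when the same values are stored in 32-bit and 64-bit types. The variable and function names are illustrative.

```python
import ctypes
import struct

# One past the largest signed 32-bit integer.
big = 2**31

# ctypes does no overflow checking, so a 32-bit integer wraps around,
# while a 64-bit integer keeps the value.
as_int32 = ctypes.c_int32(big).value
as_int64 = ctypes.c_int64(big).value


def round_trip_float32(x: float) -> float:
    """Store x as a 32-bit float and read it back as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(as_int32, as_int64)
# 0.1 is not exactly representable, and 32 bits keep fewer digits than 64.
print(round_trip_float32(0.1) == 0.1)
```

Programs that rely on large index ranges or double-precision accuracy are the ones most likely to behave differently under the C setting.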

GPU programming and debugging:

DACE_compiler_cuda_backend (default: cuda): Chooses the GPU backend to use (can be cuda for NVIDIA GPUs or hip for AMD GPUs).
DACE_compiler_cuda_syncdebug (default: False): If True, calls device-synchronization after every GPU kernel and checks for errors. Good for checking crashes or invalid memory accesses.

FPGA programming:

DACE_compiler_fpga_vendor (default: xilinx): Can be xilinx for Xilinx FPGAs, or intel_fpga for Intel FPGAs.

SDFG interactive transformation:

DACE_optimizer_transform_on_call (default: False): Uses the transformation command line interface every time a @dace function is called.
DACE_optimizer_interface (default: dace.transformation.optimizer.SDFGOptimizer): Controls the SDFG optimization process if transform_on_call is enabled. By default, uses the transformation command line interface.
DACE_optimizer_automatic_strict_transformations (default: True): If False, skips automatic strict transformations in the Python frontend (see transformations tutorial for more information).

Profiling:

DACE_profiling (default: False): Enables profiling measurement of the DaCe program runtime in milliseconds. Produces a log file and prints out median runtime.
DACE_treps (default: 100): Number of repetitions to run a DaCe program when profiling is enabled.
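As a rough sketch of a case-by-case override, the snippet below launches a script in a child process with the two profiling variables set, leaving the configuration file untouched. The helper names and the script path are hypothetical, and writing the boolean as "1" is an assumption about the accepted value format.

```python
import os
import subprocess
import sys


def profiling_env(repetitions=100, base=None):
    """Return a copy of the environment with profiling overrides added."""
    env = dict(os.environ if base is None else base)
    env["DACE_profiling"] = "1"  # assumption: "1" is accepted as True
    env["DACE_treps"] = str(repetitions)
    return env


def run_profiled(script, repetitions=100):
    """Run a Python script with the overrides applied to that process only."""
    return subprocess.run(
        [sys.executable, script],
        env=profiling_env(repetitions),
        check=True,
    )


# Example (hypothetical script name):
# run_profiled("my_dace_program.py", repetitions=10)
```

Because the overrides live only in the child process environment, other runs on the same machine keep using the values from the configuration file.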

Contributing

DaCe is an open-source project. We are happy to accept Pull Requests with your contributions! Please follow the contribution guidelines before submitting a pull request.

License

DaCe is published under the New BSD license, see LICENSE.
