intel / yask

Licence: other
YASK--Yet Another Stencil Kit: a domain-specific language and framework to create high-performance stencil code for implementing finite-difference methods and similar applications.

Programming Languages

C++
Perl
Makefile
Python

Projects that are alternatives to or similar to yask

Training Material
A collection of code examples as well as presentations for training purposes
Stars: ✭ 85 (+4.94%)
Mutual labels:  hpc, optimization, openmp, mpi
wxparaver
wxParaver is a trace-based visualization and analysis tool designed to study quantitative detailed metrics and obtain qualitative knowledge of the performance of applications, libraries, processors and whole architectures.
Stars: ✭ 23 (-71.6%)
Mutual labels:  hpc, openmp, mpi
gpubootcamp
This repository contains GPU bootcamp material for HPC and AI
Stars: ✭ 227 (+180.25%)
Mutual labels:  hpc, openmp, mpi
EFDCPlus
www.eemodelingsystem.com
Stars: ✭ 9 (-88.89%)
Mutual labels:  openmp, mpi, finite-difference-method
libquo
Dynamic execution environments for coupled, thread-heterogeneous MPI+X applications
Stars: ✭ 21 (-74.07%)
Mutual labels:  hpc, openmp, mpi
pyccel
Python extension language using accelerators
Stars: ✭ 189 (+133.33%)
Mutual labels:  hpc, openmp, mpi
Hiop
HPC solver for nonlinear optimization problems
Stars: ✭ 75 (-7.41%)
Mutual labels:  hpc, optimization, mpi
Foundations of HPC 2021
This repository collects the materials from the course "Foundations of HPC", 2021, at the Data Science and Scientific Computing Department, University of Trieste
Stars: ✭ 22 (-72.84%)
Mutual labels:  hpc, openmp, mpi
arbor
The Arbor multi-compartment neural network simulation library.
Stars: ✭ 87 (+7.41%)
Mutual labels:  hpc, mpi
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
Stars: ✭ 39 (-51.85%)
Mutual labels:  hpc, mpi
analisis-numerico-computo-cientifico
Numerical analysis and scientific computing
Stars: ✭ 42 (-48.15%)
Mutual labels:  openmp, mpi
research-computing-with-cpp
UCL-RITS *C++ for Research* engineering course
Stars: ✭ 16 (-80.25%)
Mutual labels:  openmp, mpi
Singularity-tutorial
Singularity 101
Stars: ✭ 31 (-61.73%)
Mutual labels:  hpc, mpi
gardenia
GARDENIA: Graph Analytics Repository for Designing Efficient Next-generation Accelerators
Stars: ✭ 22 (-72.84%)
Mutual labels:  openmp, xeon-phi
t8code
Parallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
Stars: ✭ 37 (-54.32%)
Mutual labels:  hpc, mpi
claw-compiler
CLAW Compiler for Performance Portability
Stars: ✭ 38 (-53.09%)
Mutual labels:  hpc, openmp
ultra-sort
DSL for SIMD Sorting on AVX2 & AVX512
Stars: ✭ 29 (-64.2%)
Mutual labels:  intel, avx512
az-hop
The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
Stars: ✭ 33 (-59.26%)
Mutual labels:  hpc, mpi
gslib
sparse communication library
Stars: ✭ 22 (-72.84%)
Mutual labels:  hpc, mpi
fml
Fused Matrix Library
Stars: ✭ 24 (-70.37%)
Mutual labels:  hpc, mpi

YASK--Yet Another Stencil Kit

Overview

YASK is a framework for rapidly creating high-performance stencil code, with optimizations and features such as:

  • Support for boundary layers and staggered-grid stencils.
  • Vector-folding to increase data reuse via non-traditional data layout.
  • Multi-level OpenMP parallelism to exploit multiple cores and threads.
  • Scaling to multiple sockets and nodes via MPI with overlapped communication and compute.
  • Spatial tiling with automatically-tuned block sizes.
  • Temporal tiling in multiple dimensions to further increase cache locality.
  • APIs for C++ and Python.

YASK contains a domain-specific compiler to convert stencil-equation specifications to SIMD-optimized code for Intel(R) Xeon Phi(TM) and Intel(R) Xeon(R) processors.
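
For a flavor of the DSL, below is a minimal sketch of a 1-D averaging stencil written against the v3 compiler API. Only the EQUALS operator is taken directly from this document; the class and helper names (yc_solution_base, new_step_index, new_domain_index, yc_var_proxy, get_soln) are recalled from the YASK API documentation and should be verified there before use.

    // Minimal sketch of a 1-D averaging stencil for the YASK DSL compiler.
    // Names other than EQUALS are assumptions; check the compiler API docs.
    #include "yask_compiler_api.hpp"
    using namespace yask;

    class AvgStencil : public yc_solution_base {
    public:
        AvgStencil() : yc_solution_base("avg_1d") { }

        // Define the stencil equation(s) for this solution.
        virtual void define() override {
            auto t = new_step_index("t");     // step (time) index
            auto x = new_domain_index("x");   // spatial domain index

            // A solution variable ("var", formerly "grid") indexed by t and x.
            yc_var_proxy u("u", get_soln(), { t, x });

            // Next time-step is the average of three neighboring points.
            u(t + 1, x) EQUALS (u(t, x - 1) + u(t, x) + u(t, x + 1)) / 3.0;
        }
    };

The DSL compiler turns a specification like this into SIMD-optimized kernel code for the processors listed below.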

Supported Platforms and Processors:

  • 64-bit Linux.
  • Intel(R) Xeon(R) processors supporting the AVX, AVX2, or CORE_AVX512 instruction sets.
  • Intel(R) Xeon Phi(TM) x200-family processors supporting the MIC_AVX512 instruction set.
  • Intel(R) Xeon Phi(TM) x100-family coprocessors supporting the Knights-Corner instruction set (no longer tested).

Pre-requisites:

  • Intel(R) Parallel Studio XE Cluster Edition for Linux for multi-socket and multi-node operation, or Intel(R) Parallel Studio XE Composer Edition for C++ Linux for single-socket operation only (2020.1.217, a.k.a. 19.1.1.217, or later recommended).
    • There was an issue in Intel(R) MPI versions 2019u1 and 2019u2 that caused the application to crash when allocating very large shared-memory (shm) regions, so you may have to use the -no-use_shm option with these versions. This issue was resolved in MPI version 2019u3.
    • There was an issue in the Intel(R) C++ compiler 2019.1.0 that caused an internal error when building YASK kernels. This has been fixed in 19.1.1.x and later versions.
    • If you are using the Intel(R) C++ compiler with g++ version 8.x or later, Intel(R) C++ version 2019 or later is required.
    • Building a YASK kernel with clang or the "nextgen" Intel(R) C++ compiler is possible; however, SIMD operations for functions such as sin() are not supported in the nextgen compiler at this time. Also, the Python interface may not work with the nextgen compiler.
    • Building a YASK kernel with the Gnu C++ compiler is possible. Limited testing with g++ 8.2.0 shows the "iso3dfd" kernel runs about 30% slower compared to the same kernel built with the Intel C++ compiler. Older Gnu C++ compilers can produce kernels that run many times slower.
  • Gnu C++ compiler, g++ (4.9.0 or later; 9.1.0 or later recommended). Even when using Intel compilers, they rely on functionality provided by a g++ installation.
  • Linux libraries librt and libnuma.
  • Perl (5.010 or later).
  • Awk.
  • Gnu make.
  • Bash shell.
  • Numactl utility.
  • Optional utilities and their purposes:
    • The indent or gindent utility, used automatically during the build process to make the generated code easier for humans to read. You'll get a warning when running make if neither exists; everything will still work, but the generated code will be difficult to read. Reading the generated code is only necessary for debugging or out of curiosity.
    • SWIG (3.0.12 or later; 4.0.0 or later recommended), http://www.swig.org, for creating the Python interface.
    • Python 2 (2.7.5 or later) or 3 (3.6.1 or later), https://www.python.org/downloads, for creating and using the Python interface.
    • Doxygen (1.8.11 or later), http://doxygen.org, for creating updated API documentation. If you're not changing the API documentation, you can view the existing documentation at the link at the top of this page.
    • Graphviz (2.30.1 or later), http://www.graphviz.org, for rendering stencil diagrams.
    • Intel(R) Software Development Emulator, https://software.intel.com/en-us/articles/intel-software-development-emulator, for functional testing if you don't have native support for any given instruction set.

Backward-compatibility notices

Version 3

  • Version 3.05.00 changed the default setting of -use_shm to true. Use -no-use_shm to disable shared-memory inter-rank communication.
  • Version 3.04.00 changed the terms "pack" and "pass" to "stage", which may affect user-written result parsers. Option auto_tune_each_pass changed to auto_tune_each_stage.
  • Version 3.01.00 moved the -trace and -msg_rank options from the kernel library to the kernel utility, so those options may no longer be set via yk_solution::apply_command_line_options(). APIs to set the corresponding options are now in yk_env, which allows configuring debug output before a yk_solution is created (see the sketch after this list).
  • Version 3.00.00 was a major release with a number of backward-compatibility notices:
    • The old (v1 and v2) internal DSL that used undocumented types such as SolutionBase and GridValue and undocumented macros such as MAKE_GRID was replaced with an expanded version of the documented YASK compiler API. Canonical v2 DSL code should still work using the Soln.hpp backward-compatibility header file. To convert v2 DSL code to v3 format, use the ./utils/bin/convert_v2_stencil.pl utility. Conversion is recommended.
    • For both the compiler and kernel APIs, all uses of the term "grid" were changed to "var". (Historically, early versions of YASK allowed only variables whose elements were points on the domain grid, so the terms were essentially interchangeable. Later, variables became more flexible. They could be defined with a subset of the domain dimensions, include non-domain or "miscellaneous" indices, or even be simple scalar values, so the term "grid" to describe any variable became inaccurate. This change addresses that contradiction.) Again, backward-compatibility features in the API should maintain functionality of v2 DSL and kernel code.
    • The default strings used in the kernel library and filenames to identify the targeted architecture were changed from Intel CPU codenames to [approximate] instruction-set architecture (ISA) names "avx512", "avx2", "avx", "knl", "knc", or "intel64". The YASK targets used in the YASK compiler were updated to be consistent with this list.
    • The "mid" (roughly, median) performance results are now the first ones printed by the utils/bin/yask_log_to_csv.pl script.
    • In general, any old DSL and kernel code or user-written output-parsing scripts that use any undocumented files, data, or types may have to be updated.
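
As a concrete illustration of the 3.01.00 item above, the sketch below shows where option handling now lives in the kernel API. yk_solution::apply_command_line_options() and the -no-use_shm option appear elsewhere in this document; the yk_factory, new_env(), and new_solution() entry points are assumptions recalled from the kernel API documentation and should be verified there.

    // Sketch of v3.01+ option handling; names flagged above are assumptions.
    #include "yask_kernel_api.hpp"
    using namespace yask;

    int main() {
        yk_factory kfac;

        // In v3.01+, trace/debug settings are configured via yk_env,
        // before any yk_solution exists.
        auto env = kfac.new_env();
        auto soln = kfac.new_solution(env);

        // Per-solution options may still be applied here, but -trace and
        // -msg_rank are no longer accepted via this call.
        soln->apply_command_line_options("-no-use_shm");

        return 0;
    }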

Version 2

  • Version 2.22.00 changed the heuristic to determine vector-folding sizes when some sizes are specified. This did not affect the default folding sizes.
  • Version 2.21.02 simplified the example 3-D stencils (3axis, 3plane, etc.) to calculate simple averages like those in the MiniGhost benchmark. This reduced the number of floating-point operations but not the number of points read for each stencil.
  • Version 2.20.00 added checking of the step-dimension index value in the yk_grid::get_element() and similar APIs. Previously, invalid step values silently "wrapped" around to valid values; now, by default, the step index must be valid when reading, and the valid step indices are updated when writing. The old wrapping behavior may be restored via set_step_wrap(true). The default for all strict_indices API parameters is now true, to catch more programming errors and to make the "set" and "get" APIs behave more consistently. Also, the advanced share_storage() APIs have been replaced with fuse_grids(). (See the sketch after this list.)
  • Version 2.19.01 turned off multi-pass tuning by default. Enable with -auto_tune_each_pass.
  • Version 2.18.03 allowed the default radius to be stencil-specific and changed the names of example stencil "9axis" to "3axis_with_diags".
  • Version 2.18.00 added the ability to specify the global-domain size, and it will calculate the local-domain sizes from it. There is no longer a default local-domain size. Output changed terms "overall-problem" to "global-domain" and "rank-domain" to "local-domain".
  • Version 2.17.00 made make and bin/yask.sh determine the host architecture automatically, and made bin/yask.sh determine the number of MPI ranks. This changed the old behavior of make defaulting to the snb architecture and bin/yask.sh requiring -arch and -ranks. Those options are still available to override the host-based defaults.
  • Version 2.16.03 moved the position of the log-file name to the last column in the CSV output of utils/bin/yask_log_to_csv.pl.
  • Version 2.15.04 required a call to yc_grid::set_dynamic_step_alloc(true) to allow changing the allocation in the step (time) dimension at run-time for grid variables created at YASK compile-time.
  • Version 2.15.02 required all "misc" indices to be yask-compiler-time constants.
  • Version 2.14.05 changed the meaning of temporal sizes so that 0 means never do temporal blocking and 1 allows blocking within a single time-step for multi-pack solutions. The default setting is 0, which keeps the old behavior.
  • Version 2.13.06 changed the default behavior of the performance-test utility (yask.sh) to run trials for a given amount of time instead of a given number of steps. As of version 2.13.08, use the -trial_time option to specify the number of seconds to run. To force a specific number of trials as in previous versions, use the -trial_steps option.
  • Version 2.13.02 required some changes in perf statistics due to step (temporal) conditions. Both the text output and the yk_stats APIs were affected.
  • Version 2.12.00 removed the long-deprecated == operator for asserting equality between a grid point and an equation. Use EQUALS instead.
  • Version 2.11.01 changed the plain-text format of some of the performance data in the test-utility output. Specifically, some leading spaces were added, SI multipliers for values < 1 were added, and the phrase "time in" no longer appears before each time breakdown. This may affect some user programs that parse the output to collect stats.
  • Version 2.10.00 changed the location of temporary files created during the build process. This will not affect most users, although you may need to manually remove old src/compiler/gen and src/kernel/gen directories.
  • Version 2.09.00 changed the location of stencils in the internal DSL from .hpp to .cpp files. See the notes in https://github.com/intel/yask/releases/tag/v2.09.00 if you have any new or modified code in src/stencils.
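
To make the 2.20.00 item above concrete, here is a minimal sketch of the stricter step-index behavior. get_element() and set_step_wrap() are named in that notice; the get_grid() lookup, the var name "u", and the exact object on which set_step_wrap() is called are illustrative assumptions and should be checked against the API documentation.

    // Sketch of the v2.20.00 step-index behavior; see the caveats above.
    #include "yask_kernel_api.hpp"
    using namespace yask;

    void check_step_behavior(yk_solution_ptr soln) {
        // v2-era lookup by name; "grid" became "var" in v3.
        auto u = soln->get_grid("u");

        // With the v2.20.00 defaults, reading with an out-of-range step index
        // is an error instead of silently wrapping to a valid index.
        double val = u->get_element({ /*t=*/0, /*x=*/10 });

        // Restore the old wrap-around behavior if needed.
        u->set_step_wrap(true);

        (void) val;   // silence unused-variable warnings in this sketch
    }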