LLNL / scr

Licence: other
SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.

Programming Languages

C
Python
Perl
Shell
CMake

Projects that are alternatives of or similar to scr

XH5For
XDMF parallel partitioned mesh I/O on top of HDF5
Stars: ✭ 23 (-72.62%)
Mutual labels:  scalable, mpi
conduit
Simplified Data Exchange for HPC Simulations
Stars: ✭ 114 (+35.71%)
Mutual labels:  data-management, radiuss
azurehpc
This repository provides easy automation scripts for building an HPC environment in Azure. It also includes examples to build an e2e environment and run some of the key HPC benchmarks and applications.
Stars: ✭ 102 (+21.43%)
Mutual labels:  mpi
Singularity-tutorial
Singularity 101
Stars: ✭ 31 (-63.1%)
Mutual labels:  mpi
ParMmg
Distributed parallelization of 3D volume mesh adaptation
Stars: ✭ 19 (-77.38%)
Mutual labels:  mpi
Foundations of HPC 2021
This repository collects the materials from the course "Foundations of HPC", 2021, at the Data Science and Scientific Computing Department, University of Trieste
Stars: ✭ 22 (-73.81%)
Mutual labels:  mpi
maritime-charting-sample-scripts
Sample scripts and models to automate work in ArcGIS for Maritime: Charting
Stars: ✭ 19 (-77.38%)
Mutual labels:  data-management
Coerce Rs
Coerce - an asynchronous (async/await) Actor runtime and cluster framework for Rust
Stars: ✭ 231 (+175%)
Mutual labels:  scalable
arbor
The Arbor multi-compartment neural network simulation library.
Stars: ✭ 87 (+3.57%)
Mutual labels:  mpi
hydra-zen
Pythonic functions for creating and enhancing Hydra applications
Stars: ✭ 165 (+96.43%)
Mutual labels:  scalable
ravel
Ravel MPI trace visualization tool
Stars: ✭ 26 (-69.05%)
Mutual labels:  mpi
raster-tiles-compactcache
Compact Cache V2 is used by ArcGIS to store raster tiles. The bundle file structure is very simple and optimized for quick access, resulting in improved performance over alternative formats.
Stars: ✭ 49 (-41.67%)
Mutual labels:  data-management
api-spec
API Specifications
Stars: ✭ 30 (-64.29%)
Mutual labels:  mpi
t8code
Parallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
Stars: ✭ 37 (-55.95%)
Mutual labels:  mpi
hp2p
Heavy Peer To Peer: an MPI-based benchmark for network diagnostics
Stars: ✭ 17 (-79.76%)
Mutual labels:  mpi
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+33.33%)
Mutual labels:  data-management
React Native Scalable Image
React Native Image component which scales width or height automatically to keep the original aspect ratio
Stars: ✭ 241 (+186.9%)
Mutual labels:  scalable
simplx
C++ development framework for building reliable cache-friendly distributed and concurrent multicore software
Stars: ✭ 61 (-27.38%)
Mutual labels:  scalable
az-hop
The Azure HPC On-Demand Platform provides an HPC Cluster Ready solution
Stars: ✭ 33 (-60.71%)
Mutual labels:  mpi
portalpy
A module that allows you to administer Portal for ArcGIS and ArcGIS Online.
Stars: ✭ 50 (-40.48%)
Mutual labels:  data-management

Scalable Checkpoint / Restart (SCR) Library

The Scalable Checkpoint / Restart (SCR) library enables MPI applications to utilize distributed storage on Linux clusters to attain high file I/O bandwidth for checkpointing, restarting, and output in large-scale jobs. With SCR, jobs run more efficiently, recompute less work upon a failure, and reduce load on critical shared resources such as the parallel file system.
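To make the caching idea concrete, here is a minimal C sketch of a periodic-flush policy under a deliberately simplified model: every checkpoint is kept in fast node-local storage, and only every `flush_interval`-th checkpoint is also copied to the parallel file system. The function names and the `flush_interval` parameter are hypothetical and are not part of the SCR API; SCR's actual behavior is configurable and documented at scr.readthedocs.io.

```c
#include <assert.h>

/* Hypothetical sketch, not SCR's implementation or API.
 * Checkpoints are numbered 1, 2, 3, ...  Every checkpoint is written to
 * fast node-local storage (the cache); only every flush_interval-th one is
 * also copied to the parallel file system (PFS).  A flush_interval <= 0
 * disables periodic flushing in this model. */

/* Return 1 if checkpoint ckpt_id should also be copied to the PFS. */
static int should_flush(int ckpt_id, int flush_interval)
{
    if (flush_interval <= 0) {
        return 0;
    }
    return ckpt_id % flush_interval == 0;
}

/* Number of PFS writes issued for the first n_ckpts checkpoints. */
static int pfs_writes(int n_ckpts, int flush_interval)
{
    assert(n_ckpts >= 0);
    int count = 0;
    for (int id = 1; id <= n_ckpts; id++) {
        count += should_flush(id, flush_interval);
    }
    return count;
}

/* Newest checkpoint id present on the PFS after n_ckpts checkpoints
 * (0 if none).  In this simplified model it is the restart point if
 * all cached copies were lost. */
static int last_flushed(int n_ckpts, int flush_interval)
{
    if (flush_interval <= 0) {
        return 0;
    }
    return (n_ckpts / flush_interval) * flush_interval;
}
```

For example, with `flush_interval = 5`, ten checkpoints cause only two PFS writes (`pfs_writes(10, 5)` is 2), and after twelve checkpoints the newest copy on the PFS is checkpoint 10. A larger interval lowers the load on the shared file system at the cost of more recomputation if only the PFS copy survives a failure.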

Users

Instructions to build and use SCR are hosted at scr.readthedocs.io.

For new users, the Quick Start guide shows how to build and run an example using SCR.

For more detailed build instructions, refer to Build SCR.

Contribute

SCR is an open source project, and we welcome contributions via pull requests, as well as questions, feature requests, and bug reports via issues. Please refer to both our code of conduct and our contributing guidelines.

Developers

Developer documentation is provided at SCR-dev.ReadTheDocs.io.

SCR uses components from ECP-VeloC, which have their own user and developer docs.

A development build is useful for those who wish to modify how SCR works. It checks out and builds SCR and many of its dependencies separately. The process is more complicated than the user build described above, but the development build is helpful when one intends to commit changes back to the project.

For a development build of SCR and its dependencies on SLURM systems, one can use the bootstrap.sh script:

git clone https://github.com/LLNL/scr.git
cd scr

./bootstrap.sh

cd build
cmake -DCMAKE_INSTALL_PREFIX=../install ..
make install

When using a debugger with SCR, one can build with the following flags to disable compiler optimizations:

./bootstrap.sh --debug

cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=Debug ..
make install

One can then run a test program from the build directory, here with four MPI processes spread across four nodes:

cd examples
srun -n4 -N4 ./test_api

For developers installing SCR outside of an HPC cluster on Fedora with sudo access, the following steps install and activate most of the necessary base dependencies:

sudo dnf groupinstall "Development Tools"
sudo dnf install cmake gcc-c++ mpi mpi-devel environment-modules zlib-devel pdsh
[restart shell]
module load mpi

Authors

Numerous people have contributed to the SCR project.

To reference SCR in a publication, please cite the following paper:

Additional information and research publications can be found here:

http://computation.llnl.gov/projects/scalable-checkpoint-restart-for-mpi