
kleinhenz / SlurmClusterManager.jl

License: other
Julia package for running code on Slurm clusters

Programming Languages

  • Julia (2,034 projects)
  • Dockerfile (14,818 projects)

Projects that are alternatives to or similar to SlurmClusterManager.jl

future.batchtools
🚀 R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
Stars: ✭ 77 (+185.19%)
Mutual labels:  distributed-computing, slurm
open-stream-processing-benchmark
This repository contains the code base for the Open Stream Processing Benchmark.
Stars: ✭ 37 (+37.04%)
Mutual labels:  distributed-computing
prometheus-spec
Censorship-resistant, trustless protocols for smart contracts, generic and high-load computing, and machine learning on top of Bitcoin
Stars: ✭ 24 (-11.11%)
Mutual labels:  distributed-computing
paleo
An analytical performance modeling tool for deep neural networks.
Stars: ✭ 76 (+181.48%)
Mutual labels:  distributed-computing
fleex
Fleex makes it easy to create multiple VPS on cloud providers and use them to distribute workloads.
Stars: ✭ 181 (+570.37%)
Mutual labels:  distributed-computing
slurmR
slurmR: A Lightweight Wrapper for Slurm
Stars: ✭ 43 (+59.26%)
Mutual labels:  slurm
whitepaper
📄 The Ambients protocol white paper
Stars: ✭ 44 (+62.96%)
Mutual labels:  distributed-computing
rslurm
Submit R code to a Slurm cluster
Stars: ✭ 40 (+48.15%)
Mutual labels:  slurm
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+166.67%)
Mutual labels:  distributed-computing
mesos-pinspider
A framework called "pinspider" on Apache Mesos, for fetching basic user information from a user's Pinterest page.
Stars: ✭ 18 (-33.33%)
Mutual labels:  distributed-computing
Orleans.CosmosDB
Orleans providers for Azure Cosmos DB
Stars: ✭ 36 (+33.33%)
Mutual labels:  distributed-computing
fahclient
Dockerized Folding@home client with NVIDIA GPU support to help battle COVID-19
Stars: ✭ 38 (+40.74%)
Mutual labels:  distributed-computing
meesee
Task queue with long-lived workers for work-based parallelization, using processes and Redis as the back-end. For distributed computing.
Stars: ✭ 14 (-48.15%)
Mutual labels:  distributed-computing
launcher-scripts
(DEPRECATED) A set of launcher scripts to be used with OAR and Slurm for running jobs on the UL HPC platform
Stars: ✭ 14 (-48.15%)
Mutual labels:  slurm
HPC
A collection of various resources, examples, and executables for the general NREL HPC user community's benefit. Use the following website for accessing documentation.
Stars: ✭ 64 (+137.04%)
Mutual labels:  slurm
job stream
An MPI-based C++ or Python library for easy distributed pipeline processing
Stars: ✭ 32 (+18.52%)
Mutual labels:  distributed-computing
yakut
Simple CLI tool for diagnostics and debugging of Cyphal networks
Stars: ✭ 29 (+7.41%)
Mutual labels:  distributed-computing
dislib
The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (+44.44%)
Mutual labels:  distributed-computing
tasq
A simple task queue implementation to enqueue jobs on local or remote processes.
Stars: ✭ 83 (+207.41%)
Mutual labels:  distributed-computing
LSTM-TensorSpark
Implementation of an LSTM with TensorFlow, distributed on Apache Spark
Stars: ✭ 40 (+48.15%)
Mutual labels:  distributed-computing

SlurmClusterManager.jl


This package provides support for using Julia within the Slurm cluster environment. The code is adapted from ClusterManagers.jl with some modifications.

Usage

This script uses all resources from a Slurm allocation as Julia workers and prints the worker id and hostname on each one.

#!/usr/bin/env julia

using Distributed, SlurmClusterManager
addprocs(SlurmManager())
@everywhere println("hello from $(myid()):$(gethostname())")

If the code is saved in script.jl, it can be queued and executed on two nodes with 64 workers per node by running

sbatch -N 2 --ntasks-per-node=64 script.jl
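
The same pattern extends to distributing real work across the allocation. Below is a minimal sketch assuming the same sbatch setup; compute_something and the input range are placeholders for your own workload, not part of this package.

#!/usr/bin/env julia

using Distributed, SlurmClusterManager

# Attach one worker per Slurm task in the current allocation.
addprocs(SlurmManager())

# Define the workload on every worker before calling it remotely.
@everywhere function compute_something(x)
    # Placeholder computation; replace with real work.
    return x^2
end

# Distribute the inputs across all Slurm-provided workers.
results = pmap(compute_something, 1:1000)
println("sum of results: ", sum(results))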

Differences from ClusterManagers.jl

  • Only supports Slurm (see this issue for some background).
  • Requires that SlurmManager be created inside a Slurm allocation created by sbatch/salloc. Specifically, SLURM_JOBID and SLURM_NTASKS must be defined in order to construct SlurmManager (see the sketch after this list). This matches typical HPC workflows, where resources are requested with sbatch and then used by the application code. In contrast, ClusterManagers.jl will dynamically request resources when run outside of an existing Slurm allocation. I found that this was almost never what I wanted: it leaves the manager process running on a login node, and it makes the script wait until resources are granted, which is better handled by the actual Slurm queueing system.
  • Does not take any Slurm arguments. All Slurm arguments are inherited from the external Slurm allocation created by sbatch/salloc.
  • Output from workers is redirected to the manager process instead of requiring a separate output file for every task.
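
For illustration, here is a hedged sketch of what the allocation requirement means in practice: a script can guard on the two environment variables before constructing SlurmManager. The local-worker fallback is purely illustrative and not part of SlurmClusterManager.jl.

using Distributed, SlurmClusterManager

# SlurmManager expects to run inside an allocation created by sbatch/salloc,
# which is what defines these environment variables.
if haskey(ENV, "SLURM_JOBID") && haskey(ENV, "SLURM_NTASKS")
    addprocs(SlurmManager())
else
    # Fall back to a few local workers when testing outside of Slurm
    # (illustrative only; not provided by this package).
    addprocs(4)
end

@everywhere println("worker $(myid()) on $(gethostname())")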