
soumyadipghosh / eventgrad

License: BSD-3-Clause
Event-Triggered Communication in Parallel Machine Learning

Programming Languages

C++
36643 projects - #6 most used programming language
CMake
9771 projects

Projects that are alternatives to or similar to eventgrad

ATOMO
Atomo: Communication-efficient Learning via Atomic Sparsification
Stars: ✭ 23 (+64.29%)
Mutual labels:  distributed-machine-learning, communication-efficient
h5fortran-mpi
HDF5-MPI parallel Fortran object-oriented interface
Stars: ✭ 15 (+7.14%)
Mutual labels:  mpi
Galaxy
Galaxy is an asynchronous parallel visualization ray tracer for performant rendering in distributed computing environments. Galaxy builds upon Intel OSPRay and Intel Embree, including ray queueing and sending logic inspired by TACC GraviT.
Stars: ✭ 18 (+28.57%)
Mutual labels:  mpi
XH5For
XDMF parallel partitioned mesh I/O on top of HDF5
Stars: ✭ 23 (+64.29%)
Mutual labels:  mpi
raptor
General, high performance algebraic multigrid solver
Stars: ✭ 50 (+257.14%)
Mutual labels:  mpi
FluxUtils.jl
Sklearn Interface and Distributed Training for Flux.jl
Stars: ✭ 12 (-14.29%)
Mutual labels:  mpi
mpiBench
MPI benchmark to test and measure collective performance
Stars: ✭ 39 (+178.57%)
Mutual labels:  mpi
pbdML
No description or website provided.
Stars: ✭ 13 (-7.14%)
Mutual labels:  mpi
bsuir-csn-cmsn-helper
Repository containing ready-made laboratory works in the specialty of computing machines, systems and networks
Stars: ✭ 43 (+207.14%)
Mutual labels:  mpi
EDLib
Exact diagonalization solver for quantum electron models
Stars: ✭ 18 (+28.57%)
Mutual labels:  mpi
faabric
Messaging and state layer for distributed serverless applications
Stars: ✭ 39 (+178.57%)
Mutual labels:  mpi
SIRIUS
Domain specific library for electronic structure calculations
Stars: ✭ 87 (+521.43%)
Mutual labels:  mpi
nbodykit
Analysis kit for large-scale structure datasets, the massively parallel way
Stars: ✭ 93 (+564.29%)
Mutual labels:  mpi
mls
CSCE 585 - Machine Learning Systems
Stars: ✭ 36 (+157.14%)
Mutual labels:  distributed-machine-learning
fml
Fused Matrix Library
Stars: ✭ 24 (+71.43%)
Mutual labels:  mpi
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (+178.57%)
Mutual labels:  mpi
sst-core
SST Structural Simulation Toolkit Parallel Discrete Event Core and Services
Stars: ✭ 82 (+485.71%)
Mutual labels:  mpi
SWCaffe
A Deep Learning Framework customized for Sunway TaihuLight
Stars: ✭ 37 (+164.29%)
Mutual labels:  mpi
ACCL
Accelerated Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
Stars: ✭ 28 (+100%)
Mutual labels:  mpi
fdtd3d
fdtd3d is an open source 1D, 2D, 3D FDTD electromagnetics solver with MPI, OpenMP and CUDA support for x86, arm, arm64 architectures
Stars: ✭ 77 (+450%)
Mutual labels:  mpi

Event-Triggered Communication in Parallel Machine Learning

The primary objective of this repository is to introduce EventGraD, a communication algorithm that uses event-triggered communication to reduce the number of messages exchanged in parallel machine learning. EventGraD targets the decentralized setting, where each processor communicates only with its neighbor processors at every iteration instead of participating in an AllReduce involving every processor. The main idea is to trigger communication as events, only when the parameter to be communicated has changed by more than a threshold. For details on how to choose an adaptive threshold, along with convergence proofs, please refer to the publications below. EventGraD saves around 70% of the messages on MNIST and 60% of the messages on CIFAR-10. Please see /dmnist/event/ for the EventGraD code on MNIST and /dcifar10/event for the EventGraD code on CIFAR-10.
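Below is a minimal, self-contained sketch of the event-triggered communication pattern described above. It is not the code in /dmnist/event/: it assumes a ring topology with two neighbors per rank, a flattened parameter vector, and a fixed threshold, and the drift loop, threshold value, and names such as `params` and `last_sent` are illustrative placeholders.

```cpp
// Toy event-triggered neighbor communication on a ring (illustrative only).
#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 4;               // toy "model": a short parameter vector
    const double threshold = 0.5;  // fixed event threshold (hypothetical value)
    const int iterations = 10;
    std::vector<double> params(n, 0.0), last_sent(n, 0.0), recv_buf(n, 0.0);

    const int left = (rank - 1 + size) % size;   // ring topology: two neighbors
    const int right = (rank + 1) % size;
    int events = 0, received = 0;

    for (int iter = 0; iter < iterations; ++iter) {
        // Stand-in for a local SGD step: parameters drift by a rank-dependent amount.
        for (int i = 0; i < n; ++i) params[i] += 0.1 * (rank + 1);

        // Event trigger: send to the neighbors only if the parameters have
        // changed by more than the threshold since the last send.
        double change = 0.0;
        for (int i = 0; i < n; ++i)
            change += (params[i] - last_sent[i]) * (params[i] - last_sent[i]);
        if (std::sqrt(change) > threshold) {
            MPI_Request req[2];
            MPI_Isend(params.data(), n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(params.data(), n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
            last_sent = params;  // remember what the neighbors last saw
            ++events;
        }

        // Consume whatever neighbor updates have arrived so far; tolerating
        // slightly stale neighbor values is what makes skipping messages possible.
        int flag = 1;
        while (flag) {
            MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &flag, MPI_STATUS_IGNORE);
            if (!flag) break;
            MPI_Recv(recv_buf.data(), n, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            ++received;
            // A real decentralized SGD step would average recv_buf into params here.
        }
    }

    // Each neighbor sent one message per event it triggered; exchange event
    // counts so every outstanding message is received before MPI_Finalize.
    int ev_right = 0, ev_left = 0;
    MPI_Sendrecv(&events, 1, MPI_INT, left, 1, &ev_right, 1, MPI_INT, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&events, 1, MPI_INT, right, 1, &ev_left, 1, MPI_INT, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    while (received < ev_left + ev_right) {
        MPI_Recv(recv_buf.data(), n, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        ++received;
    }

    std::printf("rank %d triggered %d sends in %d iterations\n", rank, events, iterations);
    MPI_Finalize();
    return 0;
}
```

Run with, for example, mpirun -np 4 ./eventgrad_sketch (the binary name is arbitrary); ranks whose parameters drift slowly skip sends, which is where the message savings reported above come from.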

PyTorch C++ API meets MPI

The secondary objective of this repository is to serve as a starting point for implementing parallel/distributed machine learning using the PyTorch C++ API (LibTorch) and MPI. Apart from EventGraD, other popular distributed algorithms are covered, such as AllReduce-based training (/dmnist/cent/) and decentralized training with neighbors (/dmnist/decent/). The AllReduce-based training code was contributed to the pytorch/examples repository through a pull request.
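The following is a condensed sketch of that AllReduce pattern using LibTorch and MPI, not the exact code in /dmnist/cent/: every rank computes gradients on its own batch, the gradients are summed with MPI_Allreduce and divided by the number of ranks, and each rank then takes an identical optimizer step. The toy model, synthetic data, and hyperparameters are placeholders rather than the repository's MNIST setup.

```cpp
// Toy AllReduce-based data-parallel training with LibTorch and MPI (illustrative only).
#include <mpi.h>
#include <torch/torch.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    torch::manual_seed(rank);  // each rank draws different synthetic data below

    // Placeholder model; a real run would use the MNIST network from the repository.
    auto model = torch::nn::Sequential(torch::nn::Linear(10, 32),
                                       torch::nn::ReLU(),
                                       torch::nn::Linear(32, 1));
    torch::optim::SGD optimizer(model->parameters(), /*lr=*/0.01);

    // Broadcast rank 0's initial parameters so every replica starts identically.
    for (auto& p : model->parameters()) {
        MPI_Bcast(p.data_ptr<float>(), static_cast<int>(p.numel()),
                  MPI_FLOAT, 0, MPI_COMM_WORLD);
    }

    for (int step = 0; step < 100; ++step) {
        // Stand-in for a sharded data loader: each rank gets its own batch.
        auto x = torch::randn({32, 10});
        auto y = torch::randn({32, 1});

        optimizer.zero_grad();
        auto loss = torch::mse_loss(model->forward(x), y);
        loss.backward();

        // Core of AllReduce-based training: average the gradients across ranks
        // so every replica applies the same update.
        for (auto& p : model->parameters()) {
            auto grad = p.grad();
            MPI_Allreduce(MPI_IN_PLACE, grad.data_ptr<float>(),
                          static_cast<int>(grad.numel()),
                          MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
            grad.div_(static_cast<double>(size));
        }
        optimizer.step();

        if (rank == 0 && step % 20 == 0)
            std::printf("step %d, loss %.4f\n", step, loss.item<float>());
    }

    MPI_Finalize();
    return 0;
}
```

A sketch like this would be compiled against LibTorch and an MPI library and launched with, for example, mpirun -np 4 ./allreduce_sketch (again, the binary name is arbitrary).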

Publications

  1. Soumyadip Ghosh, Bernardo Aquino and Vijay Gupta, "EventGraD: Event-Triggered Communication in Parallel Machine Learning", Neurocomputing, November 2021 (arXiv)

  2. Soumyadip Ghosh and Vijay Gupta, "EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent", Workshop on Machine Learning in HPC Environments (MLHPC), Supercomputing Conference (SC), Virtual, November 2020
