
"What I cannot create, I do not understand." - Richard Feynman.

This project shows how to train an artificial neural network in an actor framework. Traditional neural networks are monolithic blobs trained on static hardware infrastructure. Here I propose an approach that distributes the components of a neural net (layers, data providers, ...) over multiple processes that run independently (asynchronously and concurrently), possibly on different machines. It also makes it possible to dynamically add or remove training, validation and test modules, and thus provides the infrastructure for online learning.

Akkordeon: Training a neural net with Akka

The world is asynchronous.

This project shows how to train a neural net with Akka.

The mechanics are as follows:

A layer is embedded in a gate. A gate is an actor. The results of the forward and backward pass are passed as messages from one gate to the next. Calculations inside a layer are performed asynchronously from other layers. Thus, a layer does not have to wait for the backward pass of one batch before performing the forward pass of the next.
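To make this concrete, here is a hedged sketch of what a gate's message handling could look like. All names (Forward, Backward, Module, Optimizer) are hypothetical stand-ins, not the project's actual types:

```scala
import akka.actor.{Actor, ActorRef}

// Hypothetical message protocol: activations flow downstream,
// gradients flow back upstream, one gate at a time.
case class Forward(activations: Array[Float])
case class Backward(gradients: Array[Float])

// Stand-ins for the real layer and optimizer abstractions.
trait Module {
  def forward(x: Array[Float]): Array[Float]
  def backward(g: Array[Float]): Array[Float] // gradient w.r.t. the input
}
trait Optimizer { def step(): Unit }

class Gate(module: Module, optimizer: Optimizer,
           next: ActorRef, prev: ActorRef) extends Actor {
  def receive: Receive = {
    case Forward(x) =>
      // A real gate would also cache x per batch for the backward pass.
      next ! Forward(module.forward(x))
    case Backward(g) =>
      val inputGrad = module.backward(g) // gradient w.r.t. this gate's input
      optimizer.step()                   // local, per-gate weight update
      prev ! Backward(inputGrad)         // continue the backward pass upstream
  }
}
```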

Every gate has its own optimizer. Optimization on a gate runs asynchronously from other gates. To alleviate the 'delayed gradient' problem, I use an implementation of the 'Asynchronous Stochastic Gradient Descent with Delay Compensation' optimizer (see References).
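The idea behind delay compensation is to correct a stale gradient with a first-order approximation of how far the weights have moved since that gradient was computed. A minimal sketch of one such update step, with hypothetical parameter names (lambda is the compensation strength):

```scala
// Sketch of one delay-compensated SGD step (after Zheng et al., 2017).
// w: current weights; g: possibly stale gradient; wBak: snapshot of the
// weights taken when the forward pass that produced g started.
def dcAsgdStep(w: Array[Double], g: Array[Double], wBak: Array[Double],
               lr: Double, lambda: Double): Array[Double] =
  Array.tabulate(w.length) { i =>
    // First-order correction of the stale gradient: g + λ·g⊙g⊙(w − wBak)
    val compensated = g(i) + lambda * g(i) * g(i) * (w(i) - wBak(i))
    w(i) - lr * compensated
  }
```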

Data providers are embedded in sentinels and implemented as actors. You can have multiple sentinels running at the same time, each holding a subset of the training data, for example. This also allows me to run the training and validation phases concurrently.

All actors can be deployed on a single machine or in a cluster of machines, leveraging both vertical and horizontal scaling.
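In the cluster case, each JVM would start its own actor system bound to a reachable host and port. A hedged sketch using Akka's standard configuration mechanism (the system name and choice of settings are assumptions, not the project's actual bootstrap code):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Bind this JVM's actor system to host:port so that gates and
// sentinels running on other machines can reach it.
def startSystem(host: String, port: Int): ActorSystem = {
  val config = ConfigFactory.parseString(
    s"""
       |akka.actor.provider = cluster
       |akka.remote.artery.canonical.hostname = $host
       |akka.remote.artery.canonical.port = $port
       |""".stripMargin).withFallback(ConfigFactory.load())
  ActorSystem("akkordeon", config)
}
```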

Components


Gate

A gate is similar to a layer. Every gate is an actor. Whereas in a traditional network there is only one optimizer for the complete network, here every gate has its own optimizer. This makes no functional difference, however, since optimizers do not share data between layers.

A gate can itself consist of an arbitrarily complex network. You can put multiple convolutional, pooling, batch norm, dropout, and other layers in a single gate, or you can assign them to different gates, thus distributing the work over multiple actors.
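Stacking several layers into one gate could be expressed as a sequential composition of modules; a hypothetical sketch, reusing the Module abstraction from the gate sketch above:

```scala
// Hypothetical layer abstraction, as in the gate sketch above.
trait Module {
  def forward(x: Array[Float]): Array[Float]
  def backward(g: Array[Float]): Array[Float]
}

// Composing layers into one module lets a single gate host an
// arbitrarily deep sub-network.
class Sequential(layers: List[Module]) extends Module {
  def forward(x: Array[Float]): Array[Float] =
    layers.foldLeft(x)((acc, layer) => layer.forward(acc))
  def backward(g: Array[Float]): Array[Float] =
    layers.foldRight(g)((layer, grad) => layer.backward(grad))
}
```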

Network

A network is a sequence of gates. The sequence is open-ended: you can attach multiple sentinels, each with its own data provider, to the network.

Sentinel

The sentinel is an actor and does a number of things:

  • provide data, through the data provider, for training, validation and test
  • calculate and report loss and accuracy during training and validation
  • trigger the forward pass for each batch during training, validation and test
  • trigger the backward pass for each batch when training

You can attach multiple sentinels to a network. Typically, one or more sentinels are provided for training, and one for validation. The latter runs every 20 seconds, for example, whereas the training sentinels run continuously.
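Such a periodic validation run maps naturally onto Akka's timers. A hedged sketch of how a validation sentinel could trigger itself on a fixed interval (the message and actor names are hypothetical; the timers API shown is Akka 2.6's):

```scala
import akka.actor.{Actor, Timers}
import scala.concurrent.duration._

// Hypothetical self-message that starts one validation pass.
case object RunValidation

class ValidationTrigger extends Actor with Timers {
  // Fire RunValidation every 20 seconds.
  timers.startTimerAtFixedRate("validate", RunValidation, 20.seconds)

  def receive: Receive = {
    case RunValidation =>
      // Here the sentinel would push the validation set through the
      // network and report loss and accuracy.
  }
}
```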

Prepare

After cloning or downloading the source code of this project, get the MNIST dataset by executing the script scripts/download_mnist.sh, or by manually downloading the files from the URLs in the script and putting them in the folder data/mnist.

You will need sbt to build the project.

Build and run

Single JVM

sbt 'runMain botkop.akkordeon.SimpleAkkordeon'

This will produce output similar to this:

[info] tdp        epoch:     1 loss:  2.939994 duration: 7105.075212ms scores: (0.22618558114035087)
[info] tdp        epoch:     2 loss:  1.848889 duration: 2339.476822ms scores: (0.4044360040590681)
[info] tdp        epoch:     3 loss:  1.463448 duration: 2278.748975ms scores: (0.5158070709745762)
[info] tdp        epoch:     4 loss:  1.136699 duration: 2245.955278ms scores: (0.6229231711161338)
[info] tdp        epoch:     5 loss:  0.968350 duration: 2309.301106ms scores: (0.6776098002821712)
[info] tdp        epoch:     6 loss:  0.880695 duration: 2259.42184ms scores: (0.7060564301781735)
[info] tdp        epoch:     7 loss:  0.892328 duration: 2856.552759ms scores: (0.7027704402551813)
[info] vdp        epoch:     1 loss:  0.866831 duration: 1768.835725ms scores: (0.7107204861111112)

Multiple JVMs

In this scenario, I show how to deploy the neural net on one JVM and the sentinels on other JVMs. The JVMs can run on the same machine or on different machines. Note that when deploying the sentinels on separate machines, you will need to make the data accessible on those machines.

Another scenario that comes to mind is to split the network itself into separate entities and deploy those on different JVMs. For now, I leave this as an exercise for the reader.

Obtain the IP address of the machine on which you want to run the neural net. If you run all JVMs on the same machine, you can use 127.0.0.1. Append a free port number, separated by a colon:

export NNADDR=192.168.1.23:25520

Start the neural net in a terminal window:

sbt "runMain botkop.akkordeon.examples.NetworkApp $NNADDR"

Obtain the IP address of a machine on which you want to run a sentinel. If you run all JVMs on the same machine, you can use 127.0.0.1. The parameter 60000 in the command below is the number of samples from the data set you want to use. Start a training sentinel in another terminal:

MY_IP=192.168.0.158
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 60000 $NNADDR"

And another one:

MY_IP=192.168.0.159
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP train 3000 $NNADDR"

Also start a validation sentinel:

MY_IP=192.168.0.160
sbt "runMain botkop.akkordeon.examples.SentinelApp $MY_IP validate 10000 $NNADDR"

References
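
Shuxin Zheng et al., "Asynchronous Stochastic Gradient Descent with Delay Compensation", ICML 2017. arXiv:1609.08326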
