
jsikyoon / Pathnet

License: MIT
Tensorflow Implementation of PathNet: Evolution Channels Gradient Descent in Super Neural Networks

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pathnet

Visual Interaction Networks tensorflow
Tensorflow Implementation of Visual Interaction Networks
Stars: ✭ 133 (+38.54%)
Mutual labels:  agi, deepmind
He4o
和 (he, for Objective-C): an "information entropy reduction machine" system
Stars: ✭ 284 (+195.83%)
Mutual labels:  agi, transfer-learning
Pathnet Pytorch
PyTorch implementation of PathNet: Evolution Channels Gradient Descent in Super Neural Networks
Stars: ✭ 63 (-34.37%)
Mutual labels:  agi, transfer-learning
Jiant
jiant is an NLP toolkit
Stars: ✭ 1,147 (+1094.79%)
Mutual labels:  transfer-learning
Tensorflow
This Repository contains all tensorflow tutorials.
Stars: ✭ 68 (-29.17%)
Mutual labels:  transfer-learning
Imageclassification
Deep Learning: Image classification, feature visualization and transfer learning with Keras
Stars: ✭ 83 (-13.54%)
Mutual labels:  transfer-learning
Awesome Computer Vision
Awesome Resources for Advanced Computer Vision Topics
Stars: ✭ 92 (-4.17%)
Mutual labels:  transfer-learning
Farm
🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Stars: ✭ 1,140 (+1087.5%)
Mutual labels:  transfer-learning
Nfnets pytorch
Pre-trained NFNets with 99% of the accuracy of the official paper "High-Performance Large-Scale Image Recognition Without Normalization".
Stars: ✭ 85 (-11.46%)
Mutual labels:  deepmind
Starpy
Mirror of Python twisted library for AMI and FastAGI: No pull requests here please. Use Gerrit: https://gerrit.asterisk.org
Stars: ✭ 77 (-19.79%)
Mutual labels:  agi
Causalworld
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Stars: ✭ 76 (-20.83%)
Mutual labels:  transfer-learning
Libtlda
Library of transfer learners and domain-adaptive classifiers.
Stars: ✭ 71 (-26.04%)
Mutual labels:  transfer-learning
Ddc Transfer Learning
A simple implementation of Deep Domain Confusion: Maximizing for Domain Invariance
Stars: ✭ 83 (-13.54%)
Mutual labels:  transfer-learning
Deep Transfer Learning
Deep Transfer Learning Papers
Stars: ✭ 68 (-29.17%)
Mutual labels:  transfer-learning
Allie
Allie: A UCI compliant chess engine
Stars: ✭ 89 (-7.29%)
Mutual labels:  deepmind
Cross Domain ner
Cross-domain NER using cross-domain language modeling, code for ACL 2019 paper
Stars: ✭ 67 (-30.21%)
Mutual labels:  transfer-learning
Intrusion Detection System Using Deep Learning
VGG-19 deep learning model trained using ISCX 2012 IDS Dataset
Stars: ✭ 85 (-11.46%)
Mutual labels:  transfer-learning
Dmc2gym
OpenAI Gym wrapper for the DeepMind Control Suite
Stars: ✭ 75 (-21.87%)
Mutual labels:  deepmind
Voicer
AGI-server voice recognizer for #Asterisk
Stars: ✭ 73 (-23.96%)
Mutual labels:  agi
Transfer Learning Conv Ai
🦄 State-of-the-Art Conversational AI with Transfer Learning
Stars: ✭ 1,217 (+1167.71%)
Mutual labels:  transfer-learning

pathnet

TensorFlow implementation of PathNet from Google DeepMind.

The implementation targets TensorFlow r1.2.

https://arxiv.org/pdf/1701.08734.pdf

"Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropogation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning." Form Paper

[Figure: PathNet overview]
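As a rough illustration of the evolutionary loop described in the quote above, the sketch below (hypothetical helper names, not the repository's exact code) draws two candidate paths, compares their fitness in a binary tournament, and lets the winner overwrite the loser with mutation; in the repository, fitness is the accuracy reached after training the modules on the active path.

```python
import numpy as np

L, M, N = 3, 10, 3            # layers, modules per layer, active modules per layer
MUTATION_PROB = 1.0 / (L * N)

def random_path():
    """A path picks N of the M modules in each layer."""
    return np.array([np.random.choice(M, N, replace=False) for _ in range(L)])

def mutate(path):
    """With small probability, shift each gene by an integer in [-2, 2]."""
    new_path = path.copy()
    for l in range(L):
        for n in range(N):
            if np.random.rand() < MUTATION_PROB:
                new_path[l, n] = (new_path[l, n] + np.random.randint(-2, 3)) % M
    return new_path

def evaluate(path):
    """Stand-in fitness; the real code trains the path and returns accuracy."""
    return np.random.rand()

population = [random_path() for _ in range(20)]
for generation in range(100):
    a, b = np.random.choice(len(population), 2, replace=False)
    winner, loser = (a, b) if evaluate(population[a]) >= evaluate(population[b]) else (b, a)
    population[loser] = mutate(population[winner])   # the winner's path overwrites the loser's
```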

Failure Story

A memory leak occurred when geopath was not fed through a placeholder. Without a placeholder, changing the value of a tensor variable allocates new memory, so assigning a new path for each generation caused a memory leak and slow learning.
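A minimal sketch of the fix (TensorFlow 1.x API, illustrative names): the active path is fed through a single placeholder each generation instead of being assigned to a variable, so no new ops are added to the graph.

```python
import numpy as np
import tensorflow as tf   # TensorFlow 1.x, as used in this repository

L, M = 3, 10   # layers and modules per layer

# Leaky pattern (avoid): sess.run(geopath_var.assign(new_path)) with a fresh
# numpy array adds new constant/assign ops to the graph on every generation.

# Placeholder pattern: one graph node that is simply re-fed each generation.
geopath_ph = tf.placeholder(tf.float32, shape=[L, M], name="geopath")
masked = tf.reduce_sum(geopath_ph)   # stand-in for masking the module outputs

with tf.Session() as sess:
    for generation in range(3):
        new_path = (np.random.rand(L, M) < 0.3).astype(np.float32)
        sess.run(masked, feed_dict={geopath_ph: new_path})
```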

Binary MNIST classification tasks

python binary_mnist_pathnet.py

If you want to run it repeatedly, use the following script:

./auto_binary_mnist_pathnet.sh

Settings

L, M, N, B and the population size are 3, 10, 3, 2 and 20, respectively (in the paper, the population size is 64). Gradient descent is used with learning rate 0.05 (in the paper, 0.0001). The aggregation function between layers is the average (in the paper, summation). Skip-connection, ResNet and linear modules are used in every layer except the input layer. The fixed path from the first task is always activated when feed-forwarding the network on the second task (in the paper, the path is not always activated). Learning is considered converged when training accuracy exceeds 99%.

Chrisantha Fernando (the first author of the paper) and I confirmed that the paper's results were generated with a population size of 20, so I set it to 20. I use a larger learning rate than the paper to get results faster. A higher learning rate can accelerate plain network learning more than it helps positive transfer, so the average aggregation function is used to slow convergence down. The author and I also confirmed that the paper's results were generated with the average as the last aggregation function (all the others are summation). Always activating the fixed path is intended to make the transfer effect more pronounced, and a lower convergence threshold than before (99% instead of 99.8%) is used for faster convergence.

The B candidates use the same data batches. After the first task is finished, the geopath set and all parameters except those on the optimal path of the first task are reset.
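To make the settings above concrete, here is a rough sketch of the forward pass (TensorFlow 1.x, only linear modules shown for brevity; the real graph also contains skip-connection and ResNet modules): a geopath mask selects the active modules in each layer and their outputs are averaged before being passed to the next layer.

```python
import tensorflow as tf   # TensorFlow 1.x

L, M, N = 3, 10, 3        # layers, modules per layer, active modules per layer
HIDDEN = 20

x = tf.placeholder(tf.float32, [None, 784])    # flattened MNIST images
geopath = tf.placeholder(tf.float32, [L, M])   # 1.0 where a module is active, else 0.0

def linear_module(inp, in_dim, out_dim, scope):
    """One candidate module: a single fully connected ReLU layer."""
    with tf.variable_scope(scope):
        w = tf.get_variable("w", [in_dim, out_dim],
                            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable("b", [out_dim], initializer=tf.zeros_initializer())
        return tf.nn.relu(tf.matmul(inp, w) + b)

h, in_dim = x, 784
for l in range(L):
    outs = [geopath[l, m] * linear_module(h, in_dim, HIDDEN, "l%d_m%d" % (l, m))
            for m in range(M)]
    h = tf.add_n(outs) / float(N)   # average the active modules (the paper sums them)
    in_dim = HIDDEN

logits = tf.matmul(h, tf.get_variable("out_w", [HIDDEN, 2]))   # binary classification head
```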

Results

[Figures: convergence plots for the binary MNIST transfer experiments]

The experiments are 1vs3 <-> 1vs2 and 4vs5 <-> 6vs7. These class pairs were chosen to check for positive transfer both when the two tasks share a class and when they do not.

In the 1vs3 experiments, the mean convergence generation is about 168.25 when 1vs3 is the first task and about 82.64 when it is the second task after 1vs2; PathNet converges about 2 times faster than learning from scratch.

In the 1vs2 experiments, the mean convergence generation is about 196.60 as the first task and about 118.32 as the second task after 1vs3; PathNet converges about 1.7 times faster than learning from scratch.

In the 4vs5 experiments, the mean convergence generation is about 270.68 as the first task and about 149.31 as the second task after 6vs7; PathNet converges about 1.8 times faster than learning from scratch.

In the 6vs7 experiments, the mean convergence generation is about 93.69 as the first task and about 55.91 as the second task after 4vs5; PathNet converges about 1.7 times faster than learning from scratch.

On binary MNIST classification, PathNet converged about 1.7 to 2 times faster than learning from scratch, whether or not the two tasks shared a class.

CIFAR10 and SVHN classification tasks

python cifar_svhn_pathnet.py

If you want to run it repeatedly, use the following script:

./auto_cifar_svhn_pathnet.sh

Settings

L, M, N, B and the population size are 3, 20, 5, 2 and 20, respectively. Gradient descent is used with learning rate 0.2 (with learning rate 0.05 this task cannot be learned, so a higher learning rate than before is used). Accuracy is checked after 500 epochs.

Except for M, N and the learning rate, the other parameters are the same as in the binary MNIST classification task.

Results

[Figure: accuracy plot for the CIFAR10 and SVHN transfer experiments]

The experiments are CIFAR10 <-> SVHN.

In the CIFAR10 experiments, the mean accuracy is about 38.56% as the first task and about 41.75% as the second task after SVHN; PathNet reaches about 1.1 times higher accuracy than learning from scratch.

In the SVHN experiments, the mean accuracy is about 19.68% as the first task and about 56.25% as the second task after CIFAR10; PathNet reaches about 2.86 times higher accuracy than learning from scratch.

PathNet showed positive transfer on both datasets. The transfer effect is considerably larger for SVHN than for CIFAR10, because the CIFAR10 dataset contains a much wider variety of patterns than SVHN.

Atari Game (Pong)

./auto_atari_pathnet.sh

This module is implemented with Distributed TensorFlow. You can set the numbers of parameter servers and workers in the shell script; before running it, check that the ports are idle (the ports used range from 2222 to 2222 + #ps + #workers).

The basic A3C code is based on https://github.com/miyosuda/async_deep_reinforce.
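For reference, the port layout described above corresponds roughly to a cluster spec like the following (a hypothetical sketch; the actual host and port assignment is handled by the shell script):

```python
import tensorflow as tf   # TensorFlow 1.x

N_PS, N_WORKERS = 1, 4    # set in auto_atari_pathnet.sh (example values)
BASE_PORT = 2222

ps_hosts = ["localhost:%d" % (BASE_PORT + i) for i in range(N_PS)]
worker_hosts = ["localhost:%d" % (BASE_PORT + N_PS + i) for i in range(N_WORKERS)]

cluster = tf.train.ClusterSpec({"ps": ps_hosts, "worker": worker_hosts})
# Each process then starts its own server, e.g.
# server = tf.train.Server(cluster, job_name="worker", task_index=0)
```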

Settings

L, M and N are 4, 10 and 4, respectively (same as in the paper). Each conv layer has 8 feature maps (same as the author's original setting, which I confirmed with him). B and the population size are 3 and 10, respectively, which differ from the paper because my server cannot run 64 workers in parallel, so I reduced both. The aggregation function between layers is summation, for faster learning than the average (as in the paper).

I implemented PathNet with Distributed TensorFlow by adding one extra worker that runs the genetic algorithm. This worker checks the score set containing each worker's score and applies the genetic algorithm (here, tournament selection). This is done every 5 seconds. As in the paper, the winner's score is not initialized to -1000.

I add an LSTM layer after the last PathNet layer to make learning more efficient than the original model. (The LSTM layer is also re-initialized after the first task.) The LSTM makes learning considerably more efficient: the model without the LSTM saturates at about 100M steps, while the model with the LSTM needs only about 20M steps.
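A rough sketch of where the LSTM sits (hypothetical sizes; the actual A3C head follows miyosuda's implementation): the output of the last PathNet layer is unrolled through an LSTM before the policy and value heads.

```python
import tensorflow as tf   # TensorFlow 1.x

FEATURES, LSTM_UNITS, N_ACTIONS = 256, 256, 3   # illustrative sizes

pathnet_out = tf.placeholder(tf.float32, [None, FEATURES])   # last PathNet layer output
cell = tf.contrib.rnn.BasicLSTMCell(LSTM_UNITS)

# A3C unrolls over one trajectory at a time, so the batch axis is used as time.
seq = tf.expand_dims(pathnet_out, 0)                          # [1, time, FEATURES]
lstm_out, _state = tf.nn.dynamic_rnn(cell, seq, dtype=tf.float32)
lstm_out = tf.reshape(lstm_out, [-1, LSTM_UNITS])

w_pi = tf.get_variable("w_pi", [LSTM_UNITS, N_ACTIONS])
w_v = tf.get_variable("w_v", [LSTM_UNITS, 1])
policy = tf.nn.softmax(tf.matmul(lstm_out, w_pi))             # action probabilities
value = tf.matmul(lstm_out, w_v)                              # state value estimate
```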

I used only the Pong game to check positive transfer learning (the parameters except those on the fixed path are re-initialized after the first task), assuming the second Pong run would saturate faster than the first. Each task trains on Pong for 15M steps, and I checked the score graph in TensorBoard.

Results

[Figure: TensorBoard score curves for the two consecutive Pong runs]

The experiment simply runs Pong twice in succession, as described above, to check positive transfer.

The second Pong run saturated noticeably more quickly than the first.
