
pemami4911 / sinkhorn-policy-gradient.pytorch

License: BSD-3-Clause
Code accompanying the paper "Learning Permutations with Sinkhorn Policy Gradient"

Programming Languages

python
shell

Projects that are alternatives of or similar to sinkhorn-policy-gradient.pytorch

ecole
Extensible Combinatorial Optimization Learning Environments
Stars: ✭ 249 (+591.67%)
Mutual labels:  combinatorial-optimization
mxfactorial
a payment application intended for deployment by the united states treasury
Stars: ✭ 36 (+0%)
Mutual labels:  combinatorial-optimization
contextual
Contextual Bandits in R - simulation and evaluation of Multi-Armed Bandit Policies
Stars: ✭ 72 (+100%)
Mutual labels:  contextual-bandits
mabwiser
[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
Stars: ✭ 120 (+233.33%)
Mutual labels:  contextual-bandits
conjure
Conjure: The Automated Constraint Modelling Tool
Stars: ✭ 84 (+133.33%)
Mutual labels:  combinatorial-optimization
onn
Online Deep Learning: Learning Deep Neural Networks on the Fly / Non-linear Contextual Bandit Algorithm (ONN_THS)
Stars: ✭ 139 (+286.11%)
Mutual labels:  contextual-bandits
genstar
Generation of Synthetic Populations Library
Stars: ✭ 17 (-52.78%)
Mutual labels:  combinatorial-optimization
gym-forest
Reinforcement learning environment for the classical synthesis of quantum programs.
Stars: ✭ 25 (-30.56%)
Mutual labels:  combinatorial-optimization
pack
📦 lightweight rectangle packing algorithm
Stars: ✭ 31 (-13.89%)
Mutual labels:  combinatorial-optimization
eco-dqn
Implementation of ECO-DQN as reported in "Exploratory Combinatorial Optimization with Reinforcement Learning".
Stars: ✭ 49 (+36.11%)
Mutual labels:  combinatorial-optimization
graph-convnet-tsp
Code for the paper 'An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem' (INFORMS Annual Meeting Session 2019)
Stars: ✭ 196 (+444.44%)
Mutual labels:  combinatorial-optimization
adversarial-code-generation
Source code for the ICLR 2021 work "Generating Adversarial Computer Programs using Optimized Obfuscations"
Stars: ✭ 16 (-55.56%)
Mutual labels:  combinatorial-optimization
submodlib
Summarize Massive Datasets using Submodular Optimization
Stars: ✭ 36 (+0%)
Mutual labels:  combinatorial-optimization
GHOST
General meta-Heuristic Optimization Solving Toolkit
Stars: ✭ 28 (-22.22%)
Mutual labels:  combinatorial-optimization
ThinkMatch
Code & pretrained models of novel deep graph matching methods.
Stars: ✭ 639 (+1675%)
Mutual labels:  combinatorial-optimization
Agents
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Stars: ✭ 2,135 (+5830.56%)
Mutual labels:  contextual-bandits
Vowpal wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Stars: ✭ 7,815 (+21608.33%)
Mutual labels:  contextual-bandits
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-58.33%)
Mutual labels:  contextual-bandits

sinkhorn-policy-gradient.pytorch

This repository contains code accompanying "Learning Permutations with Sinkhorn Policy Gradient". It can be used to replicate the experiments for sorting with N={20,50}, maximum weight matching with N={10,15,20,25}, and the Euclidean TSP with N=20.

What is Sinkhorn Policy Gradient?

SPG is an off-policy actor-critic deterministic policy gradient algorithm. It can be used to train the SPG+Matching and SPG+Sequential deep network architectures from the paper to solve combinatorial optimization problems involving permutations.
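For intuition, here is a minimal sketch (assuming a recent PyTorch and SciPy; this is not the repository's implementation) of the two ingredients the method combines: the Sinkhorn operator, which turns the actor's raw scores into a doubly stochastic relaxation of a permutation matrix, and a Hungarian rounding step, which projects that relaxation onto a hard permutation that the environment can score.

import torch
from scipy.optimize import linear_sum_assignment

def sinkhorn(log_alpha, n_iters=10, tau=1.0):
    # log_alpha: [N, N] unnormalized scores from the actor (single instance, not batched).
    # Alternating row and column normalization in log space converges to a
    # doubly stochastic matrix (the Sinkhorn operator).
    log_alpha = log_alpha / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)  # columns
    return torch.exp(log_alpha)

def round_to_permutation(doubly_stochastic):
    # Hungarian method: find the hard permutation closest to the relaxed matrix,
    # i.e. the assignment that maximizes the total matrix weight.
    cost = -doubly_stochastic.detach().cpu().numpy()
    rows, cols = linear_sum_assignment(cost)
    perm = torch.zeros_like(doubly_stochastic)
    perm[torch.from_numpy(rows), torch.from_numpy(cols)] = 1.0
    return perm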

[Figure: SPG+Matching architecture]

Dependencies

  • PyTorch
    • Tested with version 0.2 (cuda80) and version 0.3.1 (cuda90)
  • h5py
  • tqdm
  • tensorboard_logger
  • pathos
  • scikit-learn (0.19.1)

Data

Download the data used for all experiments in the paper here. Create a directory called data in the base directory of the repo, and unzip the three zip files there.
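A minimal sketch of that step, assuming the three downloaded archives are ordinary .zip files sitting in the repository root (the filenames are whatever the download provides):

import glob
import os
import zipfile

# Create the data/ directory expected by the experiments and extract every
# downloaded archive into it.
os.makedirs('data', exist_ok=True)
for archive in glob.glob('*.zip'):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall('data')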

For sorting and Euclidean TSP, train and test datasets will be created automatically if you run an experiment and the datasets do not already exist in the required folder. For MWM, you can set a variable (see MAKE_ONLY below) to optionally force the creation of new train/test/val datasets.

Running the experiments

To run an experiment, modify the variables in run_spg.sh or run_nco.sh. I prefer this extra layer around argparse so you don't have to type out the long list of command line arguments. The important variables are briefly explained below.

N.B. I have --_id set up with argparse for FGMachine.

SPG (run_spg.sh)

  • N_NODES Sets the problem size.
  • N_FEATURES Feature dimension of problem instance.
  • COP The Combinatorial Optimization Problem. Choose from {mwm2D_$N_NODES, sort_0-19, sort_0-49, tsp_$N_NODES}.
  • ACTOR_WORKERS The number of cores to split the batch of problem instances across for parallel Hungarian method.
  • ARCH Choose from {sequential, matching}.
  • RANDOM_SEED Passed as a CLI argument to run_spg.sh, e.g., ./run_spg.sh 1234.
  • RUN_NUM Passed as a CLI argument to run_spg.sh, e.g., ./run_spg.sh 1234 -1.
  • PARALLEL_ENVS Number of problem instances in each batch in the forward pass.
  • BATCH_SIZE Number of problem instances to use in each backward-pass minibatch for gradient estimation.
  • N_EPOCHS Number of passes of the training set.
  • DISABLE_TENSORBOARD Disable logging to a tensorboard outfile.
  • RNN_DIM Hidden layer dim for the GRU. Automatically doubled for the bidirectional GRU in SPG+Sequential.
  • CUDA_DEVICE Set the GPU device ID, default is 0.
  • REPLAY_BUFFER_GPU Store the replay buffer on the GPU or on the CPU (the CPU option requires passing more tensors back and forth, but can use system RAM).
  • SAVE_STATS Store rewards in an h5py file and test scores in a JSON file for FGLab.
  • SAVE_MODEL Save model weights after each epoch.
  • BASE_DIR The directory where logs, models, fglab results, etc. will be saved.
  • MAKE_ONLY [mwm2D only] Controls dataset creation: -1 = make all, 0 = only train, 1 = only test, 2 = only val, 3 = make none (default).

SPG Examples

sort-20:

N_NODES=20
N_FEATURES=1
COP='sort_0-19'
ARCH='sequential'

mwm2D-10:

N_NODES=10
N_FEATURES=2
COP="mwm2D_$N_NODES"
ARCH='matching'

tsp-20:

N_NODES=20
N_FEATURES=2
COP="tsp_$N_NODES"
ARCH='sequential'

PN-AC, PN-AC+Matching, AC+Matching (run_nco.sh)

  • INPUT_SIZE Sets the problem size.
  • TASK Equivalent to COP.
  • USE_DECODER Set to True for PN-AC (non-matching tasks) and PN-AC+Matching (matching tasks), and False for AC+Matching.
  • N_GLIMPSES Number of "glimpses" taken by the glimpse attention module in the pointer network. Default is 1 glimpse over the input sequence.
  • USE_TANH Apply tanh to the logits and scale by 10 in the pointer network's attention layer, following Bello et al. (2017).
  • CRITIC_BETA Beta hyperparameter for the exponential moving average (EMA) critic. Default is 0.8.
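For example, a hypothetical PN-AC configuration for tsp_20 in run_nco.sh, mirroring the SPG examples above (the values are illustrative, not defaults taken from the repo):

INPUT_SIZE=20
TASK='tsp_20'
USE_DECODER='True'
N_GLIMPSES=1
USE_TANH='True'
CRITIC_BETA=0.8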

Adding new environments

See the env directory and create a new file yournewenv_task.py that follows the structure of sorting_task.py. Basically, there should be a create_dataset(...) function, an EnvDataset class extending Dataset, and a reward function. Then, modify envs/dataset.py so that, when COP or TASK is set to the name of the new env, its build(args, ...) function appropriately sets the env, training_dataloader, and test_dataloader variables that it returns, using yournewenv_task.py. The env variable here is just an alias for yournewenv_task.reward.
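As a rough guide, a hypothetical skeleton for a new envs/yournewenv_task.py is sketched below; the signatures and tensor shapes are illustrative assumptions, not the repo's exact API, so follow sorting_task.py for the authoritative structure.

import torch
from torch.utils.data import Dataset

def reward(solutions):
    # Map a batch of candidate solutions to scalar rewards.
    # Problem-specific; this placeholder just returns zeros.
    return torch.zeros(solutions.size(0))

def create_dataset(n_samples, n_nodes, n_features, path):
    # Generate problem instances and persist them so splits are reproducible.
    data = torch.rand(n_samples, n_nodes, n_features)
    torch.save(data, path)
    return path

class EnvDataset(Dataset):
    # Wraps the saved instances so envs/dataset.py can build DataLoaders from it.
    def __init__(self, path):
        self.data = torch.load(path)

    def __len__(self):
        return self.data.size(0)

    def __getitem__(self, idx):
        return self.data[idx]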

Licensing

Please read and respect the license. :)

Citations

Use this citation for the paper:

@article{emami2018learning,
   title = {Learning Permutations with Sinkhorn Policy Gradient},
   author = {Emami, Patrick and Ranka, Sanjay},
   journal = {arXiv:1805.07010 [cs.LG]},
   year = {2018}
}

If you use or modify this code for your work, please use the following citation:

@misc{emami2018spg,
  title = {sinkhorn-policy-gradient.pytorch}, 
  author = {Emami, Patrick and Ranka, Sanjay},
  howpublished = {\url{https://github.com/pemami4911/sinkhorn-policy-gradient.pytorch}},
  note = {Accessed: [Insert date here]}
}