coallaoh / WhitenBlackBox

Licence: MIT
Towards Reverse-Engineering Black-Box Neural Networks, ICLR'18


Towards Reverse-Engineering Black-Box Neural Networks, ICLR'18

Seong Joon Oh, Max Augustin, Bernt Schiele, Mario Fritz.

Max Planck Institute for Informatics.

Many deployed learned models are black boxes: given an input, they return an output. Internal information about the model, such as the architecture, optimisation procedure, or training data, is not disclosed explicitly, as it might contain proprietary information or make the system more vulnerable. This work shows that such attributes of neural networks can be exposed from a sequence of queries. This has multiple implications. On the one hand, our work exposes the vulnerability of black-box neural networks to different types of attacks -- we show that the revealed internal information helps generate more effective adversarial examples against the black-box model. On the other hand, this technique can be used to better protect private content from automatic recognition models using adversarial examples. Our paper suggests that it is actually hard to draw a line between white-box and black-box models.

Metamodels for reverse-engineering network hyperparameters

We extract diverse types of information from a black-box neural network (which we call model attributes; examples include the non-linear activation type, optimisation algorithm, training dataset) by observing its output with respect to certain query inputs. This is achieved by learning the correlation between the network attributes and certain patterns in the network's output. The correlation is learned by training a classifier over outputs from multiple models to predict the model attributes - we call this a metamodel because it literally classifies classifiers. We introduce three novel metamodel methods in this project. They differ in the way they choose the query inputs and interpret the corresponding outputs.

kennen-o

The simplest one - kennen-o - selects the query inputs at random from a dataset. An MLP classifier is trained over the outputs with respect to the selected inputs to predict network attributes. See the figure above.
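
As an illustration, the kennen-o setup can be sketched in PyTorch as follows. All names, layer sizes, and the number of queries are hypothetical placeholders, not the repository's actual code:

```python
# Hypothetical sketch of kennen-o: query each black box with a fixed set of
# random inputs, concatenate its output probability vectors, and train an MLP
# (the metamodel) to predict a model attribute from that feature vector.
import torch
import torch.nn as nn

def kennen_o_features(model, queries):
    """Concatenate a model's softmax outputs over the fixed query set."""
    with torch.no_grad():
        probs = torch.softmax(model(queries), dim=1)  # (n_queries, n_classes)
    return probs.flatten()  # one feature vector per black box

class MetaModel(nn.Module):
    """MLP that classifies classifiers: output features -> attribute label."""
    def __init__(self, n_queries=100, n_classes=10, n_attr_values=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_queries * n_classes, 1000), nn.ReLU(),
            nn.Linear(1000, n_attr_values))

    def forward(self, x):
        return self.net(x)
```

Training such a metamodel then reduces to ordinary supervised classification, with one training sample per white-box model in the training set.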

kennen-i

Our second approach - kennen-i - approaches the problem from a completely different point of view. For the sake of clarity, we take an MNIST digit classifier as an example. Over multiple white-box models (training set models), we craft an input that is designed to expose inner secrets of the training set models. This crafted input turns out to generalise very well to unseen black-box models, in the sense that it also reveals the secrets of the unseen black box. More specifically, using gradient signals from a diverse set of white box models, we design a query input that forces an MNIST digit classifier to predict 0 if the classifier has the attribute A, and 1 if it doesn't. In other words, the crafted input re-purposes a digit classifier into a model attribute classifier. See the figure above for the training procedure. We also show below some learned query inputs which are designed to induce the prediction of label 0 if the victim black box has a max-pooling layer, train-time dropout layer, and kernel size 3, respectively, and 1 otherwise.

Reverse-engineering success rates of the crafted inputs (random chance in parentheses):

  Crafted input              Success rate (random chance)
  Max-Pooling, yes or no?    94.8% (50%)
  Dropout, yes or no?        77.0% (50%)
  Kernel Size, 3 or 5?       88.5% (50%)

These crafted inputs share similarities with adversarial examples for neural networks (Explaining and Harnessing Adversarial Examples), which are also designed to alter the behaviour of a neural network. The only difference is the goal: adversarial examples aim to induce a specific output (e.g. a wrong output, or a specific prediction for a malicious purpose), whereas kennen-i inputs aim to expose the model attributes. Both seem to generalise well to unseen models, enabling attacks on black boxes. (See Delving into Transferable Adversarial Examples and Black-Box Attacks for the transferability of adversarial examples.)

kennen-io

Our final metamodel - kennen-io - combines kennen-o and kennen-i.
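
One way to picture the combination is as a single end-to-end module in which the query inputs are learnable parameters optimised jointly with the metamodel weights. The sketch below is a hypothetical illustration under that assumption; names and shapes are not the repository's actual code:

```python
# Hypothetical sketch of kennen-io: the query images (as in kennen-i) and the
# metamodel classifier (as in kennen-o) are learned jointly end-to-end.
import torch
import torch.nn as nn

class KennenIO(nn.Module):
    def __init__(self, n_queries=10, n_classes=10, n_attr_values=2):
        super().__init__()
        # Learnable query images, optimised together with the metamodel.
        self.queries = nn.Parameter(torch.zeros(n_queries, 1, 28, 28))
        self.meta = nn.Linear(n_queries * n_classes, n_attr_values)

    def forward(self, classifier):
        # Query the (white-box, during training) model with learned inputs,
        # then classify its concatenated outputs into an attribute value.
        probs = torch.softmax(classifier(self.queries), dim=1)
        return self.meta(probs.flatten().unsqueeze(0))
```

During training, gradients flow through the white-box training models into both the query images and the metamodel; at test time, only the forward pass through the black box is needed.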

More in the ICLR paper!

The ICLR paper contains much more detailed experimental results on MNIST, including the prediction of 12 diverse model attributes, as well as extrapolation setups where the test black-box model is significantly different from the training models. We also show results on attacking black-box ImageNet classifiers with adversarial examples generated using the reverse-engineered information.

Environment

This project supports Python 2.7 and 3.5. We used a Conda environment with PyTorch (CUDA 8.0 or 10.0); tested on PyTorch versions 0.4.1 and 1.1.0.

Installation

VERY IMPORTANT: Clone this repository recursively, so that its submodules are also fetched.

$ git clone https://github.com/coallaoh/WhitenBlackBox.git --recursive

Download data

Run the following commands to download and untar the necessary data (6.3MB).

$ mkdir cache && wget https://datasets.d2.mpi-inf.mpg.de/joon18iclr/mnist_val.pkl.tar.gz -P cache/ && cd cache && tar xvf mnist_val.pkl.tar.gz && cd ..

(Optional) Download the MNIST-NET dataset

MNIST-NET is a dataset of 11,282 diverse MNIST digit classifiers. The full pipeline for generating MNIST-NET is included in the repository (see below); generating it took about 40 GPU-days on NVIDIA Tesla K80 GPUs. Alternatively, download the dataset (19GB) and untar it into the cache/ folder:

$ wget https://datasets.d2.mpi-inf.mpg.de/joon18iclr/MNIST-NET.tar.gz
$ tar -xvzf MNIST-NET.tar.gz -C cache/

Running the code

Running

$ ./run.py

will (1) generate the MNIST-NET dataset and (2) train and evaluate various metamodels (kennen variants -- see the paper) on MNIST-NET. See run.py for more information on configuration options.

Contact

For any problem with implementation or bug, please contact Seong Joon Oh (coallaoh at gmail).

Citation

  @article{joon18iclr,
    title = {Towards Reverse-Engineering Black-Box Neural Networks},
    author = {Oh, Seong Joon and Augustin, Max and Schiele, Bernt and Fritz, Mario},
    year = {2018},
    journal = {International Conference on Learning Representations},
  }