dtak / adversarial-robustness-public

License: MIT
Code for AAAI 2018 accepted paper: "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients"

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives to, or similar to, adversarial-robustness-public

Aspect Based Sentiment Analysis
💭 Aspect-Based-Sentiment-Analysis: Transformer & Explainable ML (TensorFlow)
Stars: ✭ 206 (+320.41%)
Mutual labels:  interpretability
ViTs-vs-CNNs
[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)
Stars: ✭ 145 (+195.92%)
Mutual labels:  robustness
concept-based-xai
Library implementing state-of-the-art Concept-based and Disentanglement Learning methods for Explainable AI
Stars: ✭ 41 (-16.33%)
Mutual labels:  interpretability
recentrifuge
Recentrifuge: robust comparative analysis and contamination removal for metagenomics
Stars: ✭ 79 (+61.22%)
Mutual labels:  robustness
kernel-mod
NeurIPS 2018. Linear-time model comparison tests.
Stars: ✭ 17 (-65.31%)
Mutual labels:  interpretability
xai-iml-sota
Interesting resources related to Explainable Artificial Intelligence, Interpretable Machine Learning, Interactive Machine Learning, Human-in-the-Loop, and Visual Analytics.
Stars: ✭ 51 (+4.08%)
Mutual labels:  interpretability
Explainx
Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code.
Stars: ✭ 196 (+300%)
Mutual labels:  interpretability
ArenaR
Data generator for Arena - interactive XAI dashboard
Stars: ✭ 28 (-42.86%)
Mutual labels:  interpretability
spatial-smoothing
(ICML 2022) Official PyTorch implementation of “Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness”.
Stars: ✭ 68 (+38.78%)
Mutual labels:  robustness
SimP-GCN
Implementation of the WSDM 2021 paper "Node Similarity Preserving Graph Convolutional Networks"
Stars: ✭ 43 (-12.24%)
Mutual labels:  robustness
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+157.14%)
Mutual labels:  interpretability
pre-training
Pre-Training Buys Better Robustness and Uncertainty Estimates (ICML 2019)
Stars: ✭ 90 (+83.67%)
Mutual labels:  robustness
Advances-in-Label-Noise-Learning
A curated (most recent) list of resources for Learning with Noisy Labels
Stars: ✭ 360 (+634.69%)
Mutual labels:  robustness
Torch Cam
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM)
Stars: ✭ 249 (+408.16%)
Mutual labels:  interpretability
POPQORN
An Algorithm to Quantify Robustness of Recurrent Neural Networks
Stars: ✭ 44 (-10.2%)
Mutual labels:  robustness
Captum
Model interpretability and understanding for PyTorch
Stars: ✭ 2,830 (+5675.51%)
Mutual labels:  interpretability
ALPS 2021
XAI Tutorial for the Explainable AI track in the ALPS winter school 2021
Stars: ✭ 55 (+12.24%)
Mutual labels:  interpretability
aliyun-mns
Alibaba Cloud MNS
Stars: ✭ 13 (-73.47%)
Mutual labels:  robustness
EgoCNN
Code for "Distributed, Egocentric Representations of Graphs for Detecting Critical Structures" (ICML 2019)
Stars: ✭ 16 (-67.35%)
Mutual labels:  interpretability
Denoised-Smoothing-TF
Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow.
Stars: ✭ 19 (-61.22%)
Mutual labels:  robustness

Adversarial Robustness (and Interpretability) via Gradient Regularization

This repository contains the Python code and IPython notebooks used to run the experiments in "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients" (AAAI 2018).

(Animated GIF: iterated TGSM attacks on SVHN images.)

Main Idea

If you add an imperceptibly small amount of carefully crafted noise to an image which a neural network classifies correctly, you can usually cause it to make an incorrect prediction. This type of noise addition is called "adversarial perturbation," and the perturbed images are called adversarial examples. Unfortunately, it turns out to be fairly easy to generate adversarial examples which (1) fool almost any model trained on the same dataset, and (2) continue to fool models even when printed out or viewed from different perspectives and at different scales. As neural networks start being used for things like face recognition and self-driving cars, this vulnerability poses an increasingly pressing problem.

In this repository, we try to tackle this problem directly, by training neural networks with a type of regularization that penalizes how sensitive their predictions are to infinitesimal changes in their inputs. This type of regularization moves examples further away from the decision boundary in input-space, and has the side-effect of making gradient-based explanations of the model -- as well as the adversarial perturbations themselves -- more human-interpretable. Check out the experiments below or the paper for more details!
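
As a concrete illustration, the core penalty can be written in a few lines of TF 1.x-style graph code. This is a minimal sketch, not the repository's actual API: `build_model` is a hypothetical function standing in for the models in adversarial_robustness/, and the regularization strength is an arbitrary illustrative value.

```python
import tensorflow as tf

# Placeholders for a batch of MNIST-style images and one-hot labels
# (shapes are illustrative).
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, 10])

logits = build_model(x)  # hypothetical: any graph mapping x to class logits

# Standard cross-entropy objective.
xent = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Input gradient penalty: squared L2 norm of d(cross-entropy)/d(input).
# Differentiating this penalty during training requires "double backprop"
# (second derivatives), hence the TensorFlow version note under Replication.
grad_x = tf.gradients(xent, x)[0]
grad_penalty = tf.reduce_sum(tf.square(grad_x))

lam = 100.0  # illustrative regularization strength, not the paper's value
loss = xent + lam * grad_penalty
train_op = tf.train.AdamOptimizer().minimize(loss)
```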

Repository Structure

  • notebooks/ contains IPython notebooks replicating the main experiments from the paper:
    • MNIST compares robustness to two adversarial attack methods (the FGSM and TGSM; see the sketch after this list) when CNNs are trained on the MNIST dataset with various forms of regularization: defensive distillation, adversarial training, and two forms of input gradient regularization. This is a good one to look at first, since it's got both the results and some textual explanation of what's going on.
    • notMNIST does the same accuracy comparisons, but for the notMNIST dataset. We omit the textual explanations, since they would be redundant with what's in the MNIST notebook.
    • SVHN does the same for the Street View House Numbers dataset.
  • scripts/ contains code used to train models and generate / animate adversarial examples.
  • cached/ contains data files with trained model parameters and adversarial examples. The actual data is gitignored, but you can download it (see instructions below).
  • adversarial_robustness/ contains the core Python code for representing neural networks and datasets, and for training, explanation, visualization, and adversarial perturbation. Some of the code is strongly influenced by cleverhans and tensorflow-adversarial, but we've modified everything to be more object-oriented.
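
For orientation, here is a minimal sketch of the two attacks the notebooks compare, written in TF 1.x graph style. It is not the repository's implementation; the `eps` default and the [0, 1] pixel range are illustrative assumptions.

```python
import tensorflow as tf

def fgsm(x, y_true, logits, eps=0.1):
    """Untargeted FGSM: one signed step *up* the loss gradient.
    Assumes `logits` was built from `x` in the same graph."""
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_true, logits=logits))
    grad = tf.gradients(loss, x)[0]
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

def tgsm(x, y_target, logits, eps=0.1):
    """Targeted gradient sign method: a signed step *down* the loss
    toward a chosen target class; iterate it for stronger attacks."""
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_target, logits=logits))
    grad = tf.gradients(loss, x)[0]
    return tf.clip_by_value(x - eps * tf.sign(grad), 0.0, 1.0)
```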

Replication

To run the notebooks immediately, using the models and adversarial examples that generated the figures in the paper, you can download this zipped directory and use it to replace the cached/ subdirectory of this repository.

To fully replicate all experiments, you can use the files in the scripts directory to retrain models and regenerate adversarial examples.

This code was tested with Python 3.5 and TensorFlow >= 1.2.1. Most files should also work with Python 2.7, but training may not work with earlier versions of TensorFlow, which lack second-derivative support for many CNN operations.
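
If you're unsure whether your TensorFlow build supports the second derivatives needed for gradient-regularized training, a quick check along these lines should tell you. This is a hedged sketch assuming TF 1.x graph mode; the shapes and layer choices are arbitrary.

```python
import tensorflow as tf

# Older TensorFlow versions typically raise a LookupError here, because
# no gradient is registered for the gradient ops of conv/pool layers.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
w = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev=0.1))
h = tf.nn.max_pool(tf.nn.relu(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'SAME')),
                   [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
grad_x = tf.gradients(tf.reduce_sum(h), x)[0]   # first derivative w.r.t. input
try:
    tf.gradients(tf.reduce_sum(tf.square(grad_x)), w)  # double backprop
    print("Double backprop supported.")
except LookupError:
    print("This TensorFlow version lacks second-derivative support.")
```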
