dtak / adversarial-robustness-public

License: MIT
Code for AAAI 2018 accepted paper: "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients"

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives to, or similar to, adversarial-robustness-public

Aspect Based Sentiment Analysis
💭 Aspect-Based-Sentiment-Analysis: Transformer & Explainable ML (TensorFlow)
Stars: ✭ 206 (+320.41%)
Mutual labels:  interpretability
ViTs-vs-CNNs
[NeurIPS 2021]: Are Transformers More Robust Than CNNs? (Pytorch implementation & checkpoints)
Stars: ✭ 145 (+195.92%)
Mutual labels:  robustness
concept-based-xai
Library implementing state-of-the-art Concept-based and Disentanglement Learning methods for Explainable AI
Stars: ✭ 41 (-16.33%)
Mutual labels:  interpretability
recentrifuge
Recentrifuge: robust comparative analysis and contamination removal for metagenomics
Stars: ✭ 79 (+61.22%)
Mutual labels:  robustness
kernel-mod
NeurIPS 2018. Linear-time model comparison tests.
Stars: ✭ 17 (-65.31%)
Mutual labels:  interpretability
xai-iml-sota
Interesting resources related to Explainable Artificial Intelligence, Interpretable Machine Learning, Interactive Machine Learning, Human-in-the-Loop, and Visual Analytics.
Stars: ✭ 51 (+4.08%)
Mutual labels:  interpretability
Explainx
Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code.
Stars: ✭ 196 (+300%)
Mutual labels:  interpretability
ArenaR
Data generator for Arena - interactive XAI dashboard
Stars: ✭ 28 (-42.86%)
Mutual labels:  interpretability
spatial-smoothing
(ICML 2022) Official PyTorch implementation of “Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness”.
Stars: ✭ 68 (+38.78%)
Mutual labels:  robustness
SimP-GCN
Implementation of the WSDM 2021 paper "Node Similarity Preserving Graph Convolutional Networks"
Stars: ✭ 43 (-12.24%)
Mutual labels:  robustness
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+157.14%)
Mutual labels:  interpretability
pre-training
Pre-Training Buys Better Robustness and Uncertainty Estimates (ICML 2019)
Stars: ✭ 90 (+83.67%)
Mutual labels:  robustness
Advances-in-Label-Noise-Learning
A curated (most recent) list of resources for Learning with Noisy Labels
Stars: ✭ 360 (+634.69%)
Mutual labels:  robustness
Torch Cam
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM)
Stars: ✭ 249 (+408.16%)
Mutual labels:  interpretability
POPQORN
An Algorithm to Quantify Robustness of Recurrent Neural Networks
Stars: ✭ 44 (-10.2%)
Mutual labels:  robustness
Captum
Model interpretability and understanding for PyTorch
Stars: ✭ 2,830 (+5675.51%)
Mutual labels:  interpretability
ALPS 2021
XAI Tutorial for the Explainable AI track in the ALPS winter school 2021
Stars: ✭ 55 (+12.24%)
Mutual labels:  interpretability
aliyun-mns
Alibaba Cloud MNS
Stars: ✭ 13 (-73.47%)
Mutual labels:  robustness
EgoCNN
Code for "Distributed, Egocentric Representations of Graphs for Detecting Critical Structures" (ICML 2019)
Stars: ✭ 16 (-67.35%)
Mutual labels:  interpretability
Denoised-Smoothing-TF
Minimal implementation of Denoised Smoothing (https://arxiv.org/abs/2003.01908) in TensorFlow.
Stars: ✭ 19 (-61.22%)
Mutual labels:  robustness

Adversarial Robustness (and Interpretability) via Gradient Regularization

This repository contains the Python code and IPython notebooks used to run the experiments in "Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients" (AAAI 2018).

(Animated GIF: iterated TGSM attacks on SVHN images.)

Main Idea

If you add an imperceptibly small amount of carefully crafted noise to an image which a neural network classifies correctly, you can usually cause it to make an incorrect prediction. This type of noise addition is called "adversarial perturbation," and the perturbed images are called adversarial examples. Unfortunately, it turns out to be fairly easy to generate adversarial examples which (1) fool almost any model trained on the same dataset, and (2) continue to fool models even when printed out or viewed from different perspectives and at different scales. As neural networks start being used for things like face recognition and self-driving cars, this vulnerability poses an increasingly pressing problem.

In this repository, we try to tackle this problem directly, by training neural networks with a type of regularization that penalizes how sensitive their predictions are to infinitesimal changes in their inputs. This type of regularization moves examples further away from the decision boundary in input-space, and has the side-effect of making gradient-based explanations of the model -- as well as the adversarial perturbations themselves -- more human-interpretable. Check out the experiments below or the paper for more details!
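
As a concrete illustration, the core penalty can be written in a few lines of TF 1.x-style graph code. This is a minimal sketch, not the repository's actual API: `build_model` is a hypothetical function standing in for the models in adversarial_robustness/, and the regularization strength is an arbitrary illustrative value.

```python
import tensorflow as tf

# Placeholders for a batch of MNIST-style images and one-hot labels
# (shapes are illustrative).
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, 10])

logits = build_model(x)  # hypothetical: any graph mapping x to class logits

# Standard cross-entropy objective.
xent = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

# Input gradient penalty: squared L2 norm of d(cross-entropy)/d(input).
# Differentiating this penalty during training requires "double backprop"
# (second derivatives), hence the TensorFlow version note under Replication.
grad_x = tf.gradients(xent, x)[0]
grad_penalty = tf.reduce_sum(tf.square(grad_x))

lam = 100.0  # illustrative regularization strength, not the paper's value
loss = xent + lam * grad_penalty
train_op = tf.train.AdamOptimizer().minimize(loss)
```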

Repository Structure

  • notebooks/ contains IPython notebooks replicating the main experiments from the paper:
    • MNIST compares robustness to two adversarial attack methods (the FGSM and TGSM; see the sketch after this list) when CNNs are trained on the MNIST dataset with various forms of regularization: defensive distillation, adversarial training, and two forms of input gradient regularization. This is a good one to look at first, since it's got both the results and some textual explanation of what's going on.
    • notMNIST does the same accuracy comparisons, but for the notMNIST dataset. We omit the textual explanations, since they would be redundant with what's in the MNIST notebook.
    • SVHN does the same for the Street View House Numbers dataset.
  • scripts/ contains code used to train models and generate / animate adversarial examples.
  • cached/ contains data files with trained model parameters and adversarial examples. The actual data is gitignored, but you can download it (see instructions below).
  • adversarial_robustness/ contains the core Python code for representing neural networks and datasets, and for training, explanation, visualization, and adversarial perturbation. Some of the code is strongly influenced by cleverhans and tensorflow-adversarial, but we've modified everything to be more object-oriented.
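
For orientation, here is a minimal sketch of the two attacks the notebooks compare, written in TF 1.x graph style. It is not the repository's implementation; the `eps` default and the [0, 1] pixel range are illustrative assumptions.

```python
import tensorflow as tf

def fgsm(x, y_true, logits, eps=0.1):
    """Untargeted FGSM: one signed step *up* the loss gradient.
    Assumes `logits` was built from `x` in the same graph."""
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_true, logits=logits))
    grad = tf.gradients(loss, x)[0]
    return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)

def tgsm(x, y_target, logits, eps=0.1):
    """Targeted gradient sign method: a signed step *down* the loss
    toward a chosen target class; iterate it for stronger attacks."""
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        labels=y_target, logits=logits))
    grad = tf.gradients(loss, x)[0]
    return tf.clip_by_value(x - eps * tf.sign(grad), 0.0, 1.0)
```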

Replication

To run the notebooks immediately, using the models and adversarial examples that generated the figures in the paper, you can download this zipped directory and use it to replace the cached/ subdirectory of this repository.

To fully replicate all experiments, you can use the files in the scripts directory to retrain models and regenerate adversarial examples.

This code was tested with Python 3.5 and TensorFlow >= 1.2.1. Most files should also work with Python 2.7, but training may not work with earlier versions of TensorFlow, which lack second-derivative support for many CNN operations.
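
If you're unsure whether your TensorFlow build supports the second derivatives needed for gradient-regularized training, a quick check along these lines should tell you. This is a hedged sketch assuming TF 1.x graph mode; the shapes and layer choices are arbitrary.

```python
import tensorflow as tf

# Older TensorFlow versions typically raise a LookupError here, because
# no gradient is registered for the gradient ops of conv/pool layers.
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
w = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev=0.1))
h = tf.nn.max_pool(tf.nn.relu(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'SAME')),
                   [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')
grad_x = tf.gradients(tf.reduce_sum(h), x)[0]   # first derivative w.r.t. input
try:
    tf.gradients(tf.reduce_sum(tf.square(grad_x)), w)  # double backprop
    print("Double backprop supported.")
except LookupError:
    print("This TensorFlow version lacks second-derivative support.")
```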
