
astorfi / attention-guided-sparsity

License: MIT License
Attention-Based Guided Structured Sparsity of Deep Neural Networks

Programming Languages

python

Projects that are alternatives of or similar to attention-guided-sparsity

nystrom-attention
Implementation of Nyström Self-attention, from the paper Nyströmformer
Stars: ✭ 83 (+219.23%)
Mutual labels:  attention-mechanism
nuwa-pytorch
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch
Stars: ✭ 347 (+1234.62%)
Mutual labels:  attention-mechanism
MoChA-pytorch
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Stars: ✭ 65 (+150%)
Mutual labels:  attention-mechanism
FragmentVC
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
Stars: ✭ 134 (+415.38%)
Mutual labels:  attention-mechanism
PAM
[TPAMI 2020] Parallax Attention for Unsupervised Stereo Correspondence Learning
Stars: ✭ 62 (+138.46%)
Mutual labels:  attention-mechanism
SelfAttentive
Implementation of A Structured Self-attentive Sentence Embedding
Stars: ✭ 107 (+311.54%)
Mutual labels:  attention-mechanism
SMSR
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
Stars: ✭ 205 (+688.46%)
Mutual labels:  sparsity
Attention
Implementations of several different attention mechanisms
Stars: ✭ 17 (-34.62%)
Mutual labels:  attention-mechanism
QuantumForest
Fast Differentiable Forest lib with the advantages of both decision trees and neural networks
Stars: ✭ 63 (+142.31%)
Mutual labels:  attention-mechanism
CompareModels TRECQA
Compare six baseline deep learning models on TrecQA
Stars: ✭ 61 (+134.62%)
Mutual labels:  attention-mechanism
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (+23.08%)
Mutual labels:  attention-mechanism
Video-Cap
🎬 Video Captioning: ICCV '15 paper implementation
Stars: ✭ 44 (+69.23%)
Mutual labels:  attention-mechanism
Transformer-in-Transformer
An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches
Stars: ✭ 40 (+53.85%)
Mutual labels:  attention-mechanism
enformer-pytorch
Implementation of Enformer, Deepmind's attention network for predicting gene expression, in Pytorch
Stars: ✭ 146 (+461.54%)
Mutual labels:  attention-mechanism
keras attention
🔖 An Attention Layer in Keras
Stars: ✭ 43 (+65.38%)
Mutual labels:  attention-mechanism
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-26.92%)
Mutual labels:  attention-mechanism
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (+38.46%)
Mutual labels:  attention-mechanism
pynmt
a simple and complete pytorch implementation of neural machine translation system
Stars: ✭ 13 (-50%)
Mutual labels:  attention-mechanism
co-attention
Pytorch implementation of "Dynamic Coattention Networks For Question Answering"
Stars: ✭ 54 (+107.69%)
Mutual labels:  attention-mechanism
ttslearn
ttslearn: Library for the book "Pythonで学ぶ音声合成" (Text-to-Speech with Python)
Stars: ✭ 158 (+507.69%)
Mutual labels:  attention-mechanism

Attention-Based Guided Structured Sparsity of Deep Neural Networks

Badges: Travis CI build · Coveralls coverage · contributions welcome · open source · MIT license · Project Status: Active – The project has reached a stable, usable state and is being actively developed.

This repository contains the TensorFlow code developed for our paper, "Attention-Based Guided Structured Sparsity of Deep Neural Networks".


Goal and Outcome

Network pruning imposes sparsity on a neural network architecture by increasing the portion of zero-valued weights, which reduces model size, improves energy efficiency, and speeds up evaluation. In most prior work, sparsity is enforced for network pruning without any attention to internal network characteristics such as unbalanced neuron outputs or, more specifically, the distribution of the weights and outputs of the neurons. Such uncontrolled sparsity may cause a severe accuracy drop. In this work, we propose an attention mechanism that simultaneously controls the sparsity intensity and supervises network pruning by keeping the network's important information bottlenecks active. On CIFAR-10, the proposed method outperforms the best baseline method by 6% and reduces the accuracy drop by 2.6× at the same level of sparsity.

Scope of the Work

In this work, we propose a controller mechanism for network pruning with the goals of (1) model compression, keeping few active parameters by enforcing group sparsity; (2) preventing accuracy drop by controlling the sparsity of the network through an additional loss function that forces a portion of the output neurons in each layer to stay alive; and (3) being applicable to any layer type.
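
As a rough illustration of the idea (this sketch is not the repository's actual implementation, and lambda_gl and lambda_var are placeholder hyperparameters), a group-lasso penalty over each layer's output neurons can be combined with a variance-based term that discourages all neurons of a layer from being suppressed at once:

# Illustrative sketch only: group-lasso penalty (one group per output neuron)
# plus a variance-based term that keeps a portion of neurons strongly active.
# lambda_gl and lambda_var are hypothetical hyperparameters.
import tensorflow as tf

def group_lasso(weights):
    # weights: [fan_in, fan_out]; the incoming weights of each output neuron form a group.
    group_norms = tf.sqrt(tf.reduce_sum(tf.square(weights), axis=0) + 1e-8)
    return tf.reduce_sum(group_norms)

def variance_term(weights):
    # Variance of the per-neuron group norms; a large variance means some
    # neurons stay clearly active while others are pruned away.
    group_norms = tf.sqrt(tf.reduce_sum(tf.square(weights), axis=0) + 1e-8)
    _, var = tf.nn.moments(group_norms, axes=[0])
    return var

def regularized_loss(cross_entropy, weight_list, lambda_gl=1e-4, lambda_var=1e-3):
    # Encourage group sparsity, but reward spread-out group norms so that
    # not all output neurons of a layer are driven to zero.
    gl = tf.add_n([group_lasso(w) for w in weight_list])
    var = tf.add_n([variance_term(w) for w in weight_list])
    return cross_entropy + lambda_gl * gl - lambda_var * var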


Requirements

TensorFlow

This code is written in Python and requires TensorFlow as the framework. On Ubuntu, TensorFlow with GPU support can be installed as follows:

sudo apt-get install python3-pip python3-dev # for Python 3.n
pip3 install tensorflow-gpu

Please refer to the official TensorFlow installation guidelines for further details specific to your system architecture.
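
As an optional sanity check after installation (assuming a Python 3 environment), you can confirm that TensorFlow imports correctly and sees a GPU:

# Optional sanity check after installation.
import tensorflow as tf

print(tf.__version__)              # installed TensorFlow version
print(tf.test.is_gpu_available())  # True if a CUDA-capable GPU is usable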

Code Implementation

Dataset

For this repository, the experiments are performed on the MNIST dataset, which is available online. MNIST can be downloaded directly using the following script supported by TensorFlow:

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(FLAGS.data_dir, fake_data=FLAGS.fake_data)

The FLAGS values are predefined by the argument parser.
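
For illustration, the flags referenced above (data_dir and fake_data, along with max_steps used later) could be declared with argparse roughly as follows; the repository's own parser may use different names and defaults:

# Hypothetical sketch of how the FLAGS above might be defined; the actual
# flag set and defaults live in the repository's code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--data_dir', type=str, default='/tmp/mnist_data',
                    help='Directory for storing/downloading the MNIST data.')
parser.add_argument('--fake_data', action='store_true',
                    help='Use synthetic data for quick testing.')
parser.add_argument('--max_steps', type=int, default=100000,
                    help='Number of training steps.')
FLAGS, unparsed = parser.parse_known_args()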

Architecture

In the experiment on the MNIST dataset, an architecture similar to LeNet is used as the baseline for investigating our proposed method, with no data augmentation. The baseline architecture is defined as follows:

def net(x, training_status):

    with tf.name_scope('reshape'):
        x_image = tf.reshape(x, [-1, 28, 28, 1])

    # Dropout keep probability, fed at training/evaluation time.
    keep_prob = tf.placeholder(tf.float32)

    # First convolutional layer: 5x5 kernels, 64 feature maps.
    h_conv1 = nn_conv_layer(x_image, [5, 5, 1, 64], [64], 'conv1',
                            training_status=training_status, act=tf.nn.relu)

    # First pooling layer.
    with tf.name_scope('pool1'):
        h_pool1 = max_pool_2x2(h_conv1)

    # Second convolutional layer: 5x5 kernels, 128 feature maps.
    h_conv2 = nn_conv_layer(h_pool1, [5, 5, 64, 128], [128], 'conv2',
                            training_status=training_status, act=tf.nn.relu)

    # Second pooling layer.
    with tf.name_scope('pool2'):
        h_pool2 = max_pool_2x2(h_conv2)

    # Flatten the 7x7x128 feature maps for the fully connected layers.
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 128])

    # First fully connected layer, followed by dropout.
    h_fc1 = nn_layer(h_pool2_flat, 7 * 7 * 128, 512, 'fc1',
                     training_status=training_status, act=tf.nn.relu)
    dropped_h_fc1 = tf.nn.dropout(h_fc1, keep_prob)

    # Second fully connected layer, followed by dropout.
    h_fc2 = nn_layer(dropped_h_fc1, 512, 256, 'fc2',
                     training_status=training_status, act=tf.nn.relu)
    dropped_h_fc2 = tf.nn.dropout(h_fc2, keep_prob)

    # Output layer; do not apply the softmax activation yet, see below.
    output = nn_layer(dropped_h_fc2, 256, 10, 'softmax',
                      training_status=training_status, act=tf.identity)

    return output, keep_prob
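
The final layer returns raw logits because, in TensorFlow 1.x, the softmax is usually folded into the loss for numerical stability. A minimal sketch of how the output could be consumed follows; the placeholders, optimizer, and learning rate here are illustrative assumptions, not the repository's exact training code:

# Illustrative only: feed the logits from net() into a softmax cross-entropy
# loss; x, y_ and the optimizer settings are assumptions for this sketch.
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

logits, keep_prob = net(x, training_status=True)

# The softmax is applied inside the loss, which is why net() returns raw logits.
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)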

Training / Evaluation

Demo


Description

First, clone the repository. Then cd into the dedicated directory:

cd python

Then, execute the main.py:

python main.py --max_steps=100000

Using the above script, the code does the following:

  • Automatically downloads the dataset
  • Starts training
  • Runs evaluation while training is in progress
  • Continues training up to 100,000 steps

NOTE: If you are using a virtual environment that contains TensorFlow, make sure to activate it before running the model.

Results

The figure below depicts a comparison at different levels of sparsity. As can be observed, our method shows its advantage at higher levels of sparsity. We refer to the proposed method as Guided Structured Sparsity (GSS).

[Figure: comparison of methods at different levels of sparsity]
