eborboihuc / Soundnet Tensorflow

Licence: mit
TensorFlow implementation of "SoundNet".

SoundNet-tensorflow

TensorFlow implementation of "SoundNet" that learns rich natural sound representations.

Code for paper "SoundNet: Learning Sound Representations from Unlabeled Video" by Yusuf Aytar, Carl Vondrick, Antonio Torralba. NIPS 2016

Prerequisites

  • Linux
  • NVIDIA GPU + CUDA 8.0 + CuDNNv5.1
  • Python 2.7 with numpy or Python 3.5
  • Tensorflow 1.0.0 (up to 1.3.0)
  • librosa

Getting Started

  • Clone this repo:
git clone git@github.com:eborboihuc/SoundNet-tensorflow.git
cd SoundNet-tensorflow
  • Pretrained Model

I provide pre-trained models ported from soundnet. You can download the 8-layer model here. Place it at ./models/sound8.npy in your folder.

  • Data

Prepare your input mp3 files and place them under ./data/

Generate an input file list in txt format, with one mp3 path per line, and place it under ./

./data/0001.mp3
./data/0002.mp3
./data/0003.mp3
...
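
The file list above can also be generated by a script. The following is a minimal sketch, not part of this repo: the helper name write_file_list and the default output name input.txt are hypothetical, and it assumes all mp3 files sit directly under the data folder.

```python
import os


def write_file_list(data_dir="./data", list_path="./input.txt"):
    """Write one mp3 path per line, sorted by filename, and return the paths.

    Hypothetical helper: neither the function name nor the default
    list_path comes from the repo.
    """
    paths = sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if name.lower().endswith(".mp3")
    )
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return paths


if __name__ == "__main__":
    for path in write_file_list():
        print(path)
```

Pass the resulting txt file to the extraction script with the -t option.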

Follow the steps in Feature Extraction.

  • NOTE

If an audio file has a nonzero start offset in FFMPEG, the waveform loaded by torch audio and the one loaded by librosa can differ substantially. In that case, convert the file with the following command:

sox {input.mp3} {output.mp3} trim 0

After this conversion, the results might be much better.
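
To apply the sox command to many files, it can be wrapped in Python. This is only a sketch under the assumption that the sox binary is installed and on PATH; the helper names are hypothetical and not part of this repo.

```python
import subprocess


def sox_trim_command(src, dst):
    """Build the argument list for `sox {src} {dst} trim 0`."""
    return ["sox", src, dst, "trim", "0"]


def convert(src, dst):
    """Run sox on one file; raises CalledProcessError if sox fails."""
    subprocess.run(sox_trim_command(src, dst), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert("./data/0001.mp3", "./data/0001_trim.mp3")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.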

Demo

For the demo, follow these steps:

i) Download a converted npy file demo.npy and place it under ./data/

ii) Extract multiple features from a pretrained model, using a sound track that was loaded with Torch (Lua) audio. This sound track is equivalent to the one used by the Torch version.

python extract_feat.py -m {start layer number} -x {end layer number} -s

Then you can compare the outputs with the Torch ones.
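
One way to do the comparison is to look at the largest element-wise difference between two feature arrays. The sketch below assumes the Torch features have already been exported to .npy by some other means; that export step and the file names in the usage part are hypothetical.

```python
import numpy as np


def max_abs_diff(tf_feat, torch_feat):
    """Return the largest absolute element-wise difference between two arrays."""
    a = np.asarray(tf_feat, dtype=np.float64)
    b = np.asarray(torch_feat, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("shape mismatch: %s vs %s" % (a.shape, b.shape))
    return float(np.max(np.abs(a - b)))


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever your features are stored.
    tf_feat = np.load("./sound_out/tf_fea04.npy")
    torch_feat = np.load("./torch_out/torch_fea04.npy")
    print("max abs diff:", max_abs_diff(tf_feat, torch_feat))
```

A shape mismatch usually points at a difference in loaded audio length rather than in the network itself.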

Feature Extraction

Minimum example

i) Download input file demo.mp3 and place it under ./data/

ii) Prepare a file list in txt format (demo.txt) that includes the input mp3 file(s) and place it under ./

./data/demo.mp3

iii) Extract features from the raw waveforms listed in demo.txt. Make sure the demo mp3 is at ./data/demo.mp3

python extract_feat.py -m {start layer number} -x {end layer number} -s -p extract -t demo.txt

More options

To extract multiple features from a pretrained model with a downloaded mp3 dataset:

python extract_feat.py -t {dataset_txt_name} -m {start layer number} -x {end layer number} -s -p extract

For example, to extract layer 4 through layer 17 and save the results as ./sound_out/tf_fea%02d.npy:

python extract_feat.py -o sound_out -m 4 -x 17 -s -p extract
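
The tf_fea%02d.npy pattern expands to one file per layer. The sketch below lists the expected file names, assuming that %02d is the layer number and that the end layer is inclusive; both are assumptions read off the example above, not confirmed behaviour of extract_feat.py. The helper name is hypothetical.

```python
import os


def feature_paths(out_dir, start_layer, end_layer):
    """List expected feature files for layers start_layer..end_layer.

    Assumes the file index is the layer number and the range is inclusive.
    """
    return [
        os.path.join(out_dir, "tf_fea%02d.npy" % layer)
        for layer in range(start_layer, end_layer + 1)
    ]


if __name__ == "__main__":
    for path in feature_paths("sound_out", 4, 17):
        print(path, "exists" if os.path.exists(path) else "missing")
```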

More details are in:

python extract_feat.py -h

Finetuning

To train from an existing model:

python main.py 

Training

To train from scratch:

python main.py -p train

To extract features:

python main.py -p extract -m {start layer number} -x {end layer number} -s

More details are in:

python main.py -h

TODOs

  • [x] Change audio loader to soundnet format
  • [x] Make it compatible with Python 3
  • [ ] Batch Norm behaviour different from Torch
  • [ ] Fix conv8 padding issue in training phase
  • [ ] Change all config into tf.app.flags
  • [ ] Change dummy distribution of scene and object to useful placeholder
  • [ ] Add sound and feature loader from Data section

Known issues

  • Loaded audio length is not consistent between torch7 audio and librosa. Here is the issue
  • Training on short audio makes conv8 fail, complaining that its output size would be negative
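
To see why short audio triggers the conv8 error, it helps to follow how the sequence length shrinks through a stack of unpadded or lightly padded 1-D layers. The sketch below uses the standard output-length formula floor((n + 2*pad - kernel) / stride) + 1. The layer list in the usage part is purely illustrative and is not the actual SoundNet configuration; substitute the real kernel, stride, and padding values from the model definition.

```python
def conv_out_len(n, kernel, stride, pad=0):
    """Output length of a 1-D convolution or pooling layer.

    A result <= 0 means the input is too short for this layer.
    """
    return (n + 2 * pad - kernel) // stride + 1


def chain_out_len(n, layers):
    """Propagate a length through (kernel, stride, pad) layers.

    Stops early and returns the non-positive length at the first
    layer whose input is too short.
    """
    for kernel, stride, pad in layers:
        n = conv_out_len(n, kernel, stride, pad)
        if n <= 0:
            return n
    return n


if __name__ == "__main__":
    # Illustrative layers only, NOT the real SoundNet configuration.
    layers = [(8, 2, 0), (8, 2, 0)]
    for n in (100, 10):
        print(n, "->", chain_out_len(n, layers))
```

If the final length is zero or negative for your shortest clip, pad or drop that clip before training.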

FAQs

  • Why is my loaded sound wave different between torch7 audio and librosa? Here is my Wiki

Acknowledgments

The code is ported from soundnet, and the Torch7-TensorFlow loader is from tf_videogan. Thanks for their excellent work!

Author

Hou-Ning Hu / @eborboihuc
