eborboihuc / Soundnet Tensorflow

Licence: mit

TensorFlow implementation of "SoundNet".

Labels

tensorflow deep-neural-networks audio-analysis

SoundNet-tensorflow

TensorFlow implementation of "SoundNet" that learns rich natural sound representations.

Code for paper "SoundNet: Learning Sound Representations from Unlabeled Video" by Yusuf Aytar, Carl Vondrick, Antonio Torralba. NIPS 2016

Prerequisites

Linux
NVIDIA GPU + CUDA 8.0 + CuDNNv5.1
Python 2.7 with numpy or Python 3.5
TensorFlow 1.0.0 (up to 1.3.0)
librosa

Getting Started

Clone this repo:

git clone git@github.com:eborboihuc/SoundNet-tensorflow.git
cd SoundNet-tensorflow

Pretrained Model

I provide pre-trained models ported from soundnet. You can download the 8-layer model here. Please place it at ./models/sound8.npy inside the cloned repository.

Data

Prepare your input mp3 files and place them under ./data/

Generate an input list in txt format, with one mp3 path per line, and place it under ./

./data/0001.mp3
./data/0002.mp3
./data/0003.mp3
...

Follow the steps in Feature Extraction
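The input list described above can be generated with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `write_file_list` and the output name `input.txt` are assumptions chosen for illustration.

```python
from pathlib import Path


def write_file_list(data_dir="./data", list_path="./input.txt"):
    """Write one mp3 path per line (sorted by file name) and return the lines."""
    # Collect only the mp3 files that sit directly under data_dir.
    names = sorted(p.name for p in Path(data_dir).glob("*.mp3"))
    # Keep the "./data/0001.mp3" style used in the example list.
    lines = [f"{data_dir.rstrip('/')}/{name}" for name in names]
    Path(list_path).write_text("".join(line + "\n" for line in lines))
    return lines
```

Calling `write_file_list()` from the repository root writes `./input.txt`, which can then be passed to the extraction script with `-t`.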

NOTE

If an audio file has a nonzero start offset as reported by FFmpeg, the waveform loaded by torch audio and by librosa can differ substantially. In that case, please convert the file with the following command.

sox {input.mp3} {output.mp3} trim 0

After this conversion, the two loaders may agree much more closely.
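To apply the same sox command to many files, it can be wrapped in a small Python helper. This is a sketch under the assumption that `sox` is installed and on the PATH; the function names are hypothetical and not part of the repository.

```python
import subprocess


def build_sox_trim_cmd(src, dst):
    """Return the argument list for: sox {src} {dst} trim 0"""
    return ["sox", str(src), str(dst), "trim", "0"]


def trim_offset(src, dst):
    """Run sox on one file; raises CalledProcessError if sox fails."""
    subprocess.run(build_sox_trim_cmd(src, dst), check=True)
```

For example, `trim_offset("./data/0001.mp3", "./data/0001_trim.mp3")` writes a converted copy and leaves the original file untouched.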

Demo

To run the demo, follow these steps:

i) Download a converted npy file demo.npy and place it under ./data/

ii) Extract multiple features from a pretrained model using a sound track loaded with Torch (Lua) audio. The sound track is equivalent to the one used by the Torch version:

python extract_feat.py -m {start layer number} -x {end layer number} -s

Then you can compare the outputs with torch ones.

Feature Extraction

Minimum example

i) Download input file demo.mp3 and place it under ./data/

ii) Prepare a file list in txt format (demo.txt) that includes the input mp3 file(s) and place it under ./

./data/demo.mp3

iii) Extract features from the raw waveforms listed in demo.txt (make sure the demo mp3 is at ./data/demo.mp3):

python extract_feat.py -m {start layer number} -x {end layer number} -s -p extract -t demo.txt

More options

To extract multiple features from a pretrained model with downloaded mp3 dataset:

python extract_feat.py -t {dataset_txt_name} -m {start layer number} -x {end layer number} -s -p extract

e.g. extract layer 4 to layer 17 and save as ./sound_out/tf_fea%02d.npy:

python extract_feat.py -o sound_out -m 4 -x 17 -s -p extract

More details are in:

python extract_feat.py -h
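Once features have been saved with the `tf_fea%02d.npy` pattern shown above, they can be read back with numpy. The sketch below assumes one file per layer, numbered by layer index, and that each file holds a plain numpy array; neither is confirmed here, and the helper names are hypothetical.

```python
import os


def feature_paths(out_dir, start_layer, end_layer):
    """Expected file names for layers start_layer..end_layer (inclusive)."""
    return [os.path.join(out_dir, "tf_fea%02d.npy" % i)
            for i in range(start_layer, end_layer + 1)]


def load_features(out_dir, start_layer, end_layer):
    """Load every feature file that exists into a {layer: array} dict."""
    import numpy as np  # imported here so feature_paths works without numpy

    layers = range(start_layer, end_layer + 1)
    paths = feature_paths(out_dir, start_layer, end_layer)
    # Skip layers whose file is missing instead of raising.
    return {layer: np.load(path)
            for layer, path in zip(layers, paths)
            if os.path.exists(path)}
```

For the `-o sound_out -m 4 -x 17` example, `load_features("sound_out", 4, 17)` would look for `tf_fea04.npy` through `tf_fea17.npy`.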

Finetuning

To train from an existing model:

python main.py

Training

To train from scratch:

python main.py -p train

To extract features:

python main.py -p extract -m {start layer number} -x {end layer number} -s

More details are in:

python main.py -h

TODOs

[x] Change audio loader to soundnet format
[x] Make it compatible with Python 3
[ ] Batch Norm behaviour different from Torch
[ ] Fix conv8 padding issue in training phase
[ ] Change all config into tf.app.flags
[ ] Change dummy distribution of scene and object to useful placeholder
[ ] Add sound and feature loader from Data section

Known issues

Loaded audio length is not consistent between torch7 audio and librosa. Here is the issue
Training with very short audio makes conv8 fail, because its output size would be negative
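The second issue follows from the standard output-length formula for an unpadded strided convolution. The sketch below illustrates the arithmetic only; the kernel size and stride are illustrative values, not read from this repository's conv8 definition.

```python
def conv1d_out_len(in_len, kernel, stride, pad=0):
    """Output length of a 1-D convolution: floor((L + 2p - k) / s) + 1."""
    return (in_len + 2 * pad - kernel) // stride + 1


# Illustrative values: a kernel of 8 with stride 2 and no padding.
# An input shorter than the kernel gives a non-positive output length.
too_short = conv1d_out_len(5, kernel=8, stride=2)
just_enough = conv1d_out_len(8, kernel=8, stride=2)
```

Because every earlier layer also shrinks the sequence, a short clip can reach the last convolution with fewer samples than its kernel, which is when the size goes non-positive.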

FAQs

Why is my loaded sound wave different between torch7 audio and librosa? Here is my Wiki

Acknowledgments

Code is ported from soundnet, and the Torch7-TensorFlow loader is from tf_videogan. Thanks for their excellent work!

Author

Hou-Ning Hu / @eborboihuc
