
iamvishnuks / AudioNet

License: MIT License
Audio Classification using Image Classification

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to AudioNet

TransforLearning TensorFlow
Classify your own data using a pre-trained InceptionV3 model; if you use this code, please give it a star.
Stars: ✭ 58 (+26.09%)
Mutual labels:  inceptionv3
OCR
Optical character recognition Using Deep Learning
Stars: ✭ 25 (-45.65%)
Mutual labels:  tensorflow-experiments
VoiceNET.Library
.NET library to easily create Voice Command Control feature.
Stars: ✭ 14 (-69.57%)
Mutual labels:  spectrogram
equalizer
SoundCloud music player with equalizer
Stars: ✭ 25 (-45.65%)
Mutual labels:  spectrogram
spectrogram-tutorial
A walkthrough of how to make spectrograms in python that are customized for human speech research.
Stars: ✭ 31 (-32.61%)
Mutual labels:  spectrogram
Tensorflow-Wide-Deep-Local-Prediction
This project demonstrates how to run and save predictions locally using an exported TensorFlow Estimator model
Stars: ✭ 28 (-39.13%)
Mutual labels:  tensorflow-experiments
RAE
A recursive autoencoder neural network built with TensorFlow, used for sentence clustering
Stars: ✭ 12 (-73.91%)
Mutual labels:  tensorflow-experiments
FineGrainedVisualRecognition
Fine grained visual recognition tensorflow baseline on CUB, Stanford Cars, Dogs, Aircrafts, and Flower102.
Stars: ✭ 19 (-58.7%)
Mutual labels:  tensorflow-experiments
Dog-or-Cat-TensorflowSharp-Example
An example for TensorflowSharp - classify an image as a dog or cat.
Stars: ✭ 15 (-67.39%)
Mutual labels:  tensorflow-experiments
VirtualBLU
A Virtual Assistant for Windows PC with wicked Qt Graphics.
Stars: ✭ 41 (-10.87%)
Mutual labels:  tensorflow-experiments
FftSharp
A .NET Standard library for computing the Fast Fourier Transform (FFT) of real or complex data
Stars: ✭ 132 (+186.96%)
Mutual labels:  spectrogram
PSO in TensorFlow
PSO algorithm written in TensorFlow
Stars: ✭ 18 (-60.87%)
Mutual labels:  tensorflow-experiments
bird species classification
Supervised classification of bird species 🐦 in high-resolution images, especially Himalayan birds, which have diverse species and a fairly low amount of labelled data
Stars: ✭ 59 (+28.26%)
Mutual labels:  inceptionv3
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (-2.17%)
Mutual labels:  tensorflow-experiments
Vn-Accent-Restorer
This project applies multiple deep learning models to the problem of restoring diacritical marks to sentences in Vietnamese.
Stars: ✭ 23 (-50%)
Mutual labels:  tensorflow-experiments
stock-volatility-google-trends
Deep Learning Stock Volatility with Google Domestic Trends: https://arxiv.org/pdf/1512.04916.pdf
Stars: ✭ 74 (+60.87%)
Mutual labels:  tensorflow-experiments
deeper-traffic-lights
[repo not maintained] Check out https://diffgram.com if you want to build a visual intelligence
Stars: ✭ 91 (+97.83%)
Mutual labels:  tensorflow-experiments
ultimatevocalremovergui
GUI for a Vocal Remover that uses Deep Neural Networks.
Stars: ✭ 370 (+704.35%)
Mutual labels:  spectrogram
spectrogram
Taking an audio signal (wav) and converting it into a spectrogram. Written in Go programming language.
Stars: ✭ 34 (-26.09%)
Mutual labels:  spectrogram
char-VAE
Inspired by the neural style algorithm in the computer vision field, we propose a high-level language model with the aim of adapting the linguistic style.
Stars: ✭ 18 (-60.87%)
Mutual labels:  tensorflow-experiments

AudioNet

This project has only been tested on Ubuntu 16.04.

AudioNet is an open-source experiment built with TensorFlow and Google's Inception model. The idea is simple: we convert audio files to spectrograms, then train the image-classification model on those spectrograms. Instead of inventing something new, we make use of what is already available, and it turned out to be a successful experiment. The applications of audio classification are almost limitless; speaker recognition and speech recognition are just two of them. Below, we will see how to build speaker recognition using Inception.
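
To make the idea concrete, here is a minimal sketch of turning a WAV file into a spectrogram image with matplotlib and scipy; the file names and spectrogram parameters are placeholders, and data_maker.py may do this differently.

    import matplotlib
    matplotlib.use("Agg")                      # render without a display
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    rate, data = wavfile.read("speech.wav")    # hypothetical input file
    if data.ndim > 1:
        data = data[:, 0]                      # take one channel
    plt.specgram(data, Fs=rate, NFFT=1024, noverlap=512)
    plt.axis("off")
    plt.savefig("speech.jpg", bbox_inches="tight", pad_inches=0)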

Speaker recognition

Before we begin, we need data to train the model. For our experiment we can download speeches of famous people from YouTube as MP3s. I have written a script that converts the MP3s to WAV files and then processes the WAV files into spectrograms. To continue with this experiment, make sure this file is downloaded, extracted, and kept in the same folder as the scripts folder you downloaded. If you are using your own MP3 files, check whether they are dual channel; if not, convert them with the sox command in a terminal.

$ sox testmono.mp3 -c 2 testdual.mp3
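
To check the channel count first, you can use soxi (ships with the sox package); the file name is a placeholder:

$ soxi -c testmono.mp3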

Steps in our experiment:

  • Data preparation
  • Training the model
  • Testing the model

Data preparation

In this step, the first thing to do is make a separate folder for each speaker, named after that speaker. For example, if you have voice clips of Barack Obama and APJ Abdul Kalam, make one folder for Obama and one for Kalam, and put each person's voice clips in the corresponding folder. The voice clips should be MP3 files. It is better if the total duration of the clips in each folder is roughly the same across all folders. Once you have a folder per speaker, put those folders inside the data_audio folder in the tf_files folder, as in the layout shown below.
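
For the two speakers above, the layout would look like this (file names are just examples):

    tf_files/
      data_audio/
        obama/
          speech1.mp3
          speech2.mp3
        kalam/
          speech1.mp3
          speech2.mp3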

Now we are ready to run the data_maker.py script in the scripts folder. Open a terminal in the scripts folder and enter:

$ cd scripts
$ python data_maker.py

For this script to run successfully, you should have the packages below installed on your machine (on Ubuntu they can be installed with apt, as shown after the list).

  • sox
  • libsox-fmt-mp3
  • ffmpeg
  • python-tk

After the script finishes, look inside each speaker's folder under tf_files/data_audio/: the MP3 voice clips will have been converted to WAV files, the WAV files split into 20-second chunks, and a spectrogram JPG generated for each chunk. This is our training data. If you want, you can edit the data_maker script and change the chunk duration.
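
If you want to experiment with the splitting step outside the script, sox can do the same kind of chunking; this is just an illustrative one-liner (it writes chunk001.wav, chunk002.wav, ...), not necessarily how data_maker.py does it:

$ sox input.wav chunk.wav trim 0 20 : newfile : restart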

Training

As mentioned above, we are using Google's Inception model. Run the commands below to start training.

$ cd scripts
$ IMAGE_SIZE=224
$ ARCHITECTURE="inception_v3"
$ python retrain.py \
    --bottleneck_dir=../tf_files/bottlenecks \
    --how_many_training_steps=500 \
    --model_dir=../tf_files/models/ \
    --summaries_dir=../tf_files/training_summaries/"${ARCHITECTURE}" \
    --output_graph=../tf_files/retrained_graph.pb \
    --output_labels=../tf_files/retrained_labels.txt \
    --architecture="${ARCHITECTURE}" \
    --image_dir=../tf_files/data_audio

You can increase the number of training steps if you want.
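
While training runs (or afterwards), you can monitor accuracy and loss with TensorBoard, pointing it at the summaries directory used above; this assumes TensorBoard was installed alongside TensorFlow:

$ tensorboard --logdir ../tf_files/training_summaries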

Testing

Get a voice clip of the speaker and generate a spectrogram of their voice using the data_maker.py script or Audio2Spectrogram. Then test the model by running the commands below.

$ cd scripts
$ python label_image.py \
    --graph=../tf_files/retrained_graph.pb  \
    --labels=../tf_files/retrained_labels.txt  \
    --image=../path/to/generated/spectrogram.jpg
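
If you would rather call the model directly, here is a minimal sketch of what label_image.py does, assuming TensorFlow 1.x; the tensor names ("Mul" for the inception_v3 input, "final_result" for the retrained output) follow the TensorFlow-for-Poets retrain script, so verify them against your copy of label_image.py, and the spectrogram path is a placeholder.

    import tensorflow as tf

    # Load the retrained graph exported by retrain.py.
    graph_def = tf.GraphDef()
    with open("../tf_files/retrained_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

    labels = [line.strip() for line in open("../tf_files/retrained_labels.txt")]

    # Decode the spectrogram and apply inception_v3 preprocessing:
    # resize to 299x299 and scale pixel values to roughly [-1, 1].
    img = tf.image.decode_jpeg(tf.read_file("spectrogram.jpg"), channels=3)
    img = tf.image.resize_images(img, [299, 299])
    img = tf.expand_dims((img - 128.0) / 128.0, 0)

    with tf.Session() as sess:
        preds = sess.run("final_result:0", feed_dict={"Mul:0": sess.run(img)})

    # Print every label with its score, best first.
    for i in preds[0].argsort()[::-1]:
        print(labels[i], preds[0][i])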

Watch the video:

Audionet

Queries?

For any queries, shoot a mail to [email protected]. Visit my blog too.
