
philayres / babble-rnn

Licence: Apache-2.0 license
babble-rnn is a research project in the use of machine learning to generate new speech by modelling human speech audio, without any intermediate text or word representations. The idea is to learn to speak through imitation, much like a baby might.

Programming Languages

Jupyter Notebook
Python
shell

Projects that are alternatives to or similar to babble-rnn

DeepLearningCode
Deep learning related code
Stars: ✭ 21 (-38.24%)
Mutual labels:  keras-models, keras-neural-networks
GTAV-Self-driving-car
Self driving car in GTAV with Deep Learning
Stars: ✭ 15 (-55.88%)
Mutual labels:  keras-models, keras-neural-networks
dl-relu
Deep Learning using Rectified Linear Units (ReLU)
Stars: ✭ 20 (-41.18%)
Mutual labels:  keras-models, keras-neural-networks
Deep-learning-model-deploy-with-django
Serving a keras model (neural networks) in a website with the python Django-REST framework.
Stars: ✭ 76 (+123.53%)
Mutual labels:  keras-models, keras-neural-networks
pytorch2keras
PyTorch to Keras model convertor
Stars: ✭ 788 (+2217.65%)
Mutual labels:  keras-models, keras-neural-networks
emusic net
Neural network to classify certain styles of Electronic music
Stars: ✭ 22 (-35.29%)
Mutual labels:  keras-models
Final-year-project-deep-learning-models
Deep learning for freehand sketch object recognition
Stars: ✭ 22 (-35.29%)
Mutual labels:  keras-neural-networks
Facial emotion recognition using Keras
Facial emotion recognition built with Keras on the FER2013 dataset
Stars: ✭ 16 (-52.94%)
Mutual labels:  keras-models
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+441.18%)
Mutual labels:  keras-models
Keras4Delphi
Keras4Delphi is a high-level neural networks API, written in Pascal with Python Binding
Stars: ✭ 37 (+8.82%)
Mutual labels:  keras-neural-networks
open-solution-cdiscount-starter
Open solution to the Cdiscount’s Image Classification Challenge
Stars: ✭ 20 (-41.18%)
Mutual labels:  keras-models
MI-MVI 2016
Semestral project for the subject Methods of computational intelligence @ fit.cvut.cz
Stars: ✭ 24 (-29.41%)
Mutual labels:  keras-neural-networks
Recurrent-Neural-Network-for-BitCoin-price-prediction
Recurrent Neural Network (LSTM) by using TensorFlow and Keras in Python for BitCoin price prediction
Stars: ✭ 53 (+55.88%)
Mutual labels:  keras-neural-networks
AI-Chatbot
AI Chatbot using Dynamic Memory Network in Keras.
Stars: ✭ 64 (+88.24%)
Mutual labels:  keras-neural-networks
vrn-torch-to-keras
Transfer pre-trained VRN model from torch to Keras/Tensorflow
Stars: ✭ 63 (+85.29%)
Mutual labels:  keras-models
Video Classification Cnn And Lstm
To classify video into various classes using keras library with tensorflow as back-end.
Stars: ✭ 218 (+541.18%)
Mutual labels:  keras-models
keras-aquarium
A small collection of models implemented in Keras, including matrix factorization (recommendation system), topic modeling, text classification, etc. Runs on TensorFlow.
Stars: ✭ 14 (-58.82%)
Mutual labels:  keras-models
cartpole-rl-remote
CartPole game by Reinforcement Learning, a journey from training to inference
Stars: ✭ 24 (-29.41%)
Mutual labels:  keras-neural-networks
Machine-Learning-Notebooks
15+ Machine/Deep Learning Projects in Ipython Notebooks
Stars: ✭ 66 (+94.12%)
Mutual labels:  keras-neural-networks
Artificial-Neural-Networks-Visualizer
Visualizing Artificial Neural Networks (ANNs) with just One Line of Code
Stars: ✭ 21 (-38.24%)
Mutual labels:  keras-neural-networks

babble-rnn: Generating speech from speech with LSTM networks

babble-rnn is a research project in the use of machine learning to generate new speech by modelling human speech audio, without any intermediate text or word representations. The idea is to learn to speak through imitation, much like a baby might. The goal is to generate a babbling audio output that emulates the speech patterns of the original speaker, ideally incorporating real words into the output.

The implementation is based on Keras / Theano, generating an LSTM RNN; and Codec 2, an open source speech audio compression algorithm. The resulting models have learned the most common audio sequences of a 'performer', and can generate a probable babbling audio sequence when provided a seed sequence.
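Generating from a seed sequence can be pictured as an autoregressive loop: the model predicts the next frame from a window of recent frames, and that prediction is fed back in as input. The sketch below is illustrative only and is not taken from the babble-rnn code. The names `generate`, `predict_next` and `mean_predictor` are hypothetical, the two-value frames are placeholders, and the window-averaging predictor merely stands in for a trained LSTM.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one frame of codec parameters (width here is a placeholder)


def generate(predict_next: Callable[[Sequence[Frame]], Frame],
             seed: Sequence[Frame],
             n_frames: int) -> List[Frame]:
    """Autoregressively produce `n_frames` new frames following `seed`."""
    window = [list(f) for f in seed]   # sliding context passed to the model
    output: List[Frame] = []
    for _ in range(n_frames):
        frame = predict_next(window)   # a trained model would be called here
        output.append(frame)
        window = window[1:] + [frame]  # drop the oldest frame, append the newest
    return output


def mean_predictor(window: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for a trained model: averages the window."""
    width = len(window[0])
    return [sum(f[i] for f in window) / len(window) for i in range(width)]


seed = [[0.0, 1.0], [1.0, 0.0]]
babble = generate(mean_predictor, seed, n_frames=3)
print(babble)
```

In a real run the generated frames would be denormalized and decoded back to audio by Codec 2; that step is omitted here.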

Read the babble-rnn tech post

View the babble-rnn code on GitHub

Wondering what babble-rnn can do? Listen to the latest babble produced by the experiments since the original tech report:

play the audio

This babbler is a stack of 11 bidirectional LSTMs, attempting to learn an encoded sequence of data (each frame holds 13 normalized parameters and represents 20ms of audio). Groups of LSTMs are trained together while the others are kept locked, to limit the complexity of learning such a deep network.
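The group-wise training described above can be sketched as a schedule of stages, each unlocking one group of layers while the rest stay locked. This is a minimal illustration under stated assumptions: the text does not say how the 11 layers were grouped or in what order, so the group size of 4, the contiguous grouping and the function name `training_stages` are all hypothetical.

```python
from typing import Iterator, List, Tuple


def training_stages(n_layers: int,
                    group_size: int) -> Iterator[Tuple[int, List[bool]]]:
    """Yield (stage, trainable_flags) with one contiguous group unlocked per stage."""
    stage = 0
    for start in range(0, n_layers, group_size):
        end = min(start + group_size, n_layers)
        # True means the layer is trained in this stage, False means locked.
        flags = [start <= i < end for i in range(n_layers)]
        yield stage, flags
        stage += 1


# 11 layers as in the text; a group size of 4 is an assumption for illustration.
stages = list(training_stages(n_layers=11, group_size=4))
for stage, flags in stages:
    print(stage, "".join("T" if f else "-" for f in flags))
```

In Keras, each flag would typically be copied to the corresponding layer's `trainable` attribute, and the model recompiled before training that stage.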

The audio itself is highly compressed by Codec 2 (see the original tech post for details), producing a 3200 bit per second stream of frequency, energy, sinusoidal and voicing parameters. An autoencoder, trained on a particular human speaker, learns the features of this stream in order to compress it further. The encoder stage mixes 2D convolutional layers, which pick features from the Codec 2 data over short time sequences, with a series of standard hidden layers running in parallel (to provide a compressed stream that helps feed through some of the original input). The two paths are then merged into a single encoded output at a quarter of the frame rate of the original Codec 2 input (80ms of audio per frame, although each frame carries more parameters than an original frame).
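The frame arithmetic behind these figures can be checked in a few lines. The inputs are the numbers quoted in the text (3200 bit/s, 20ms frames, a quarter-rate encoder); the variable names are only for illustration.

```python
BIT_RATE = 3200      # bits per second of the Codec 2 stream
FRAME_MS = 20        # milliseconds of audio per Codec 2 frame
RATE_DIVISOR = 4     # encoder output runs at a quarter of the input frame rate

frames_per_second = 1000 // FRAME_MS                          # Codec 2 frames per second
bits_per_frame = BIT_RATE * FRAME_MS // 1000                  # bits in each 20ms frame
encoded_frame_ms = FRAME_MS * RATE_DIVISOR                    # audio covered by one encoded frame
encoded_frames_per_second = frames_per_second / RATE_DIVISOR  # encoded frames per second

print(frames_per_second, bits_per_frame, encoded_frame_ms, encoded_frames_per_second)
```

So one second of speech is 50 Codec 2 frames of 64 bits each, which the encoder reduces to 12.5 frames of 80ms each.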
