
vishalshar / SpeakerDiarization_RNN_CNN_LSTM

Licence: other
Speaker Diarization is the problem of separating speakers in an audio recording. There can be any number of speakers, and the final result should state when each speaker starts and stops speaking. In this project, we analyze a given audio file with 2 channels and 2 speakers (one per channel).

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to SpeakerDiarization RNN CNN LSTM

tiny-rnn
Lightweight C++11 library for building deep recurrent neural networks
Stars: ✭ 41 (-26.79%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Deepseqslam
The Official Deep Learning Framework for Route-based Place Recognition
Stars: ✭ 49 (-12.5%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
sgrnn
Tensorflow implementation of Synthetic Gradient for RNN (LSTM)
Stars: ✭ 40 (-28.57%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Lstm Human Activity Recognition
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier
Stars: ✭ 2,943 (+5155.36%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+112.5%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (-23.21%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Rnnsharp
RNNSharp is a toolkit of deep recurrent neural networks widely used for many different kinds of tasks, such as sequence labeling, sequence-to-sequence, and so on. It's written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many different types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and different types of layers, such as LSTM, Softmax, sampled Softmax, and others.
Stars: ✭ 277 (+394.64%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
sequence-rnn-py
Sequence analyzing using Recurrent Neural Networks (RNN) based on Keras
Stars: ✭ 28 (-50%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Pytorch Learners Tutorial
PyTorch tutorial for learners
Stars: ✭ 97 (+73.21%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (+71.43%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Bitcoin Price Prediction Using Lstm
Bitcoin price prediction (time series) using an LSTM recurrent neural network
Stars: ✭ 67 (+19.64%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Stars: ✭ 220 (+292.86%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+3644.64%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+5630.36%)
Mutual labels:  recurrent-neural-networks, lstm, rnn
Human-Activity-Recognition
Human activity recognition using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six categories (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING).
Stars: ✭ 16 (-71.43%)
Mutual labels:  recurrent-neural-networks, rnn
Probabilistic-RNN-DA-Classifier
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model
Stars: ✭ 22 (-60.71%)
Mutual labels:  recurrent-neural-networks, rnn
lstm har
LSTM based human activity recognition using smart phone sensor dataset
Stars: ✭ 20 (-64.29%)
Mutual labels:  lstm, rnn
DrowsyDriverDetection
This is a project implementing Computer Vision and Deep Learning concepts to detect drowsiness of a driver and sound an alarm if drowsy.
Stars: ✭ 82 (+46.43%)
Mutual labels:  lstm, rnn
ACT
Alternative approach for Adaptive Computation Time in TensorFlow
Stars: ✭ 16 (-71.43%)
Mutual labels:  recurrent-neural-networks, rnn
dltf
Hands-on in-person workshop for Deep Learning with TensorFlow
Stars: ✭ 14 (-75%)
Mutual labels:  lstm, rnn

Citation

If you find our project helpful please cite our arxiv report below:

@misc{sharma2020speaker,
    title={Speaker Diarization: Using Recurrent Neural Networks},
    author={Vishal Sharma and Zekun Zhang and Zachary Neubert and Curtis Dyreson},
    year={2020},
    eprint={2006.05596},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

SpeakerDiarization

Speaker Diarization is the problem of separating speakers in an audio recording. There can be any number of speakers, and the final result should state when each speaker starts and stops speaking. In this project, we analyze a given audio file with 2 channels and 2 speakers (one per channel). We train neural networks to learn when a person is speaking. We use different types of neural networks, specifically a Single-Layer Perceptron (SLP), a Multi-Layer Perceptron (MLP), a Recurrent Neural Network (RNN), and a Convolutional Neural Network (CNN), and achieve 92% accuracy with the RNN.

Data

The data used in this project cannot be shared because of privacy concerns, but if you need to test this code, I can provide one sample file. Please email me for the sample data.

Dataset Description

Our dataset contains 37 audio files, each approximately 15 minutes long, with a sampling rate of 44100 samples/second, recorded on 2 channels with exactly 2 speakers on 2 different microphones. Each audio file has been hand-annotated with speaker timings: the times (in seconds) at which each speaker starts and stops speaking. We split this dataset into 3 parts for training, validation, and testing.

Preprocessing

Data Normalization

We normalize the audio files after observing that the recordings were not all on the same scale: a few audio files were louder than others, and normalization brings them all to the same scale.
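A minimal sketch of peak normalization, assuming stereo WAV input; the exact scheme used in the repo (per-file peak, per-channel, or RMS) is not specified, so the function here is illustrative:

```python
import numpy as np
from scipy.io import wavfile

def peak_normalize(path_in, path_out):
    """Rescale a recording so its loudest sample sits at full scale."""
    rate, audio = wavfile.read(path_in)      # stereo: shape (n_samples, 2)
    audio = audio.astype(np.float32)
    peak = np.max(np.abs(audio))
    if peak > 0:                             # avoid dividing a silent file by 0
        audio = audio / peak                 # samples now lie in [-1, 1]
    wavfile.write(path_out, rate, audio)     # written as 32-bit float WAV
```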

Sampling Audio

Because the sampling rate is high, we have a lot of data: a 15-minute audio file contains about 40M samples per channel. To reduce the data without losing much information, we downsample the audio files by keeping every 4th sample.
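A sketch of this step; `audio` below is stand-in data, and since the README does not say whether a low-pass filter is applied before decimating, both the naive and anti-aliased variants are shown:

```python
import numpy as np
from scipy import signal

rate = 44100
audio = np.random.randn(rate * 10, 2)      # stand-in for 10 s of stereo audio

# Keep every 4th sample, as described above: 44100 Hz -> 11025 Hz.
downsampled = audio[::4]

# A safer variant low-pass filters first to avoid aliasing:
downsampled_aa = signal.decimate(audio, 4, axis=0)
```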

Cleaning Labels

Provided labels needed some cleaning, described below (a minimal cleanup sketch follows the list):

  • Speaker names were not consistent throughout the data files; we cleaned them so each name is spelled consistently.
  • The files also contained stray Unicode characters, which tripped up Python's string handling and needed to be cleaned.
  • There were also misalignments in the data that needed to be removed or fixed.
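The speaker names and mapping below are hypothetical, since the real annotations are not public; this only illustrates the kind of cleanup involved:

```python
import unicodedata

# Hypothetical mapping from inconsistent spellings to one canonical name.
CANONICAL = {
    "spkr a": "Speaker_A", "speaker a": "Speaker_A",
    "spkr b": "Speaker_B", "speaker b": "Speaker_B",
}

def clean_label(raw_name):
    """Strip stray Unicode and normalize speaker names to one spelling."""
    # Fold non-ASCII characters down to plain ASCII (dropping what's left).
    ascii_name = (unicodedata.normalize("NFKD", raw_name)
                  .encode("ascii", "ignore").decode("ascii"))
    key = ascii_name.strip().lower()
    return CANONICAL.get(key, ascii_name.strip())
```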

Approach

Multi-layer Perceptron (MLP)

We start with a basic single-layer perceptron model. We implement 3 different models with hidden layers of 100, 200, and 500 neurons, achieving approximately 86% accuracy. We then move to multi-layer perceptron models that are 2 layers deep: one with 100 neurons in the first layer and 50 in the second, and two with more neurons (First Layer: 200, Second Layer: 100; First Layer: 300, Second Layer: 50). For all the networks used in this project, the hidden neurons use ReLU activations and the output neuron uses a sigmoid. The cost function is cross entropy, and the network is trained with mini-batch gradient descent using Adam optimization.

The code for the MLP is in the file MLP_1201_2.py.
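As a reference point, here is a minimal Keras sketch of the 100/50 configuration described above (illustrative only; the repo's actual implementation is in MLP_1201_2.py, and the input size `n_features` is an assumption):

```python
import tensorflow as tf

n_features = 1102  # assumed per-segment length (see the CNN section below)

# Two ReLU hidden layers (100 -> 50) and one sigmoid output neuron,
# trained with cross entropy and Adam, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # speaking / not speaking
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```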

Recurrent Neural Network (RNN)

Next we try a Recurrent Neural Network on the classification problem. The RNN gives us the best result, with 3 layers of 150 Long Short-Term Memory (LSTM) cells each. Each LSTM block in the diagram denotes an LSTM layer consisting of 150 cells. The output has a single sigmoid neuron to predict 0 or 1.

LSTM Network Architecture
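A minimal Keras sketch of that stacked-LSTM classifier; the sequence length and feature dimension are assumptions, since the README does not spell out how segments are framed into sequences:

```python
import tensorflow as tf

timesteps, n_features = 100, 64  # assumed framing, for illustration

# Three stacked LSTM layers of 150 cells each, ending in one sigmoid neuron.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(150, return_sequences=True),
    tf.keras.layers.LSTM(150, return_sequences=True),
    tf.keras.layers.LSTM(150),                       # last layer emits one vector
    tf.keras.layers.Dense(1, activation="sigmoid"),  # speaking / not speaking
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```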

Convolution Neural Network (CNN)

To apply a CNN, we first compute the spectrogram for each row of the data matrix and store the results in a new file using pickle. This way we don't need to compute spectrograms online during training, which saves a lot of time. The function scipy.signal.spectrogram is used to compute the spectrogram of each segment. The precomputed spectrograms of a channel are then organized into a 3-dimensional matrix with shape (number of segments, height, width). For example, if the downsampled data matrix of a channel returned by get_data has shape (100, 1102), i.e. 100 segments, the precomputed spectrogram matrix has shape (100, 129, 4). The number of segments remains the same; the height of 129 and width of 4 come from the default parameters of scipy.signal.spectrogram. Spectrogram matrices are computed and stored using the code in Spectrogram Generator.
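A sketch of that precomputation step, using stand-in data in place of the real get_data output:

```python
import pickle
import numpy as np
from scipy import signal

fs = 11025                                 # sampling rate after 4x downsampling
segments = np.random.randn(100, 1102)      # stand-in for one channel's data matrix

# With scipy.signal.spectrogram's defaults (nperseg=256, noverlap=32),
# each 1102-sample segment yields 129 frequency bins and 4 time bins,
# so stacking gives the (100, 129, 4) shape mentioned above.
specs = np.stack([signal.spectrogram(seg, fs=fs)[2] for seg in segments])
print(specs.shape)                         # (100, 129, 4)

with open("spectrograms.pkl", "wb") as f:  # cache so training skips this step
    pickle.dump(specs, f)
```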

CNN Network Architecture

Results

(Result plots for the MLP, CNN, and RNN models.)
