License: MIT


spoken language identification


Identify a spoken language using artificial intelligence (LID). The solution uses a convolutional neural network to detect language-specific phonemes. It supports three languages: English, German and Spanish. The inspiration for the project came from the TopCoder contest, Spoken Languages 2.

Take a look at the Demo section to try the project yourself against real-life content.

Dataset

A new dataset was created from scratch.

LibriVox recordings were used to prepare the dataset. Particular attention was paid to including a large variety of unique speakers. High speaker variance forces the network to concentrate on language properties rather than on the characteristics of a specific voice. Samples are equally balanced between languages, genders and speakers in order not to favour any subgroup. Finally, speakers present in the test set are not present in the train set. This helps estimate the generalization error.

More information at tomasz-oponowicz/spoken_language_dataset.

Architecture

The first step is to normalize the input audio. Each sample is a FLAC audio file with:

  • sample rate: 22,050 Hz
  • bit depth: 16 bits
  • channels: 1 (mono)
  • duration: 10 seconds (sharp)
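The target format above can be reached with an ffmpeg invocation. The snippet below only builds the command rather than running it; the file names (speech.mp3, sample.flac) and the use of ffmpeg here are illustrative, not necessarily how the repository preprocesses audio internally:

```python
import subprocess

def normalize_command(src, dst="sample.flac"):
    """Build an ffmpeg command that converts `src` to the target format:
    22,050 Hz sample rate, 16-bit depth, mono, trimmed to 10 seconds."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", "22050",        # sample rate
        "-sample_fmt", "s16",  # bit depth
        "-ac", "1",            # channels (mono)
        "-t", "10",            # duration
        dst,
    ]

cmd = normalize_command("speech.mp3")
# Execute with: subprocess.run(cmd, check=True)
```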

Next, filter banks are extracted from the samples and mean and variance normalization is applied. Then the data is scaled with a Min/Max scaler.
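A minimal numpy sketch of these two normalization steps, assuming a (frames, filters) filter-bank matrix; the actual scalers in features.py and folds.py may be fitted across the whole dataset rather than per sample:

```python
import numpy as np

def preprocess(fbanks):
    """fbanks: 2-D array of filter-bank energies, shape (frames, filters)."""
    # Mean and variance normalization, per filter-bank coefficient.
    normalized = (fbanks - fbanks.mean(axis=0)) / (fbanks.std(axis=0) + 1e-8)
    # Min/Max scaling into the [0, 1] range.
    lo, hi = normalized.min(), normalized.max()
    return (normalized - lo) / (hi - lo + 1e-8)

x = preprocess(np.random.randn(998, 20))
# All values now lie in [0, 1]; the shape is unchanged.
```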

Finally, the preprocessed data is passed to the convolutional neural network. Note the AveragePooling2D layer, which improved performance. This strategy is called global average pooling; it effectively forces the preceding layers to produce confidence maps.

The output is a multiclass prediction, one probability per language.
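To see why global average pooling encourages confidence maps, here is a numpy sketch (not the repository's Keras code): each feature map produced by the last convolutional layer is averaged down to a single per-class score before the softmax, so the only way for the network to raise a class's probability is to activate that class's map.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """feature_maps: (height, width, classes) activations from the last
    convolutional layer; returns one averaged score per class."""
    return feature_maps.mean(axis=(0, 1))

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Three confidence maps, one per language (English, German, Spanish).
maps = np.random.rand(12, 25, 3)
probs = softmax(global_average_pooling(maps))
# probs holds three probabilities summing to 1.
```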

Performance

The score against the test set (out-of-sample) is 97% (F1 metric). Additionally, the network generalizes well and achieves a high score against real-life content, for example podcasts or TV news.
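As a reminder of how the metric is computed, here is a minimal macro-averaged F1 sketch for a three-class prediction (whether the reported 97% uses macro or micro averaging is not stated here):

```python
import numpy as np

def f1_per_class(y_true, y_pred, label):
    """F1 score for a single class in a multiclass prediction."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy labels for the three languages: 0=English, 1=German, 2=Spanish.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
macro_f1 = np.mean([f1_per_class(y_true, y_pred, c) for c in (0, 1, 2)])
# macro_f1 is the unweighted mean of the per-class F1 scores.
```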

Sound effects or languages other than English, German or Spanish may be misclassified. If you want to work with noisy audio, consider filtering the noise out beforehand.

Demo

Prerequisites

  • docker is installed (tested with 18.03.0)

Steps

  1. Create a temporary directory and change the current directory:

    $ mkdir examples && cd $_
    
  2. Download samples:

    NOTE: An audio file should contain speech and silence only. For example, podcasts, interviews or audiobooks are a good fit. Sound effects or languages other than English, German or Spanish may be misclassified.

    • English (confidence 85.36%):

      $ wget "https://javascriptair.podbean.com/mf/player-preload/nkdkps/048_JavaScript_Air_-_JavaScript_and_the_Web_Platform_The_Grand_Finale_.mp3" -O en.mp3
      
    • German (confidence 85.53%):

      $ wget "http://mp3-download.ard.de/radio/radiofeature/auf-die-fresse-xa9c.l.mp3" -O de.mp3
      
    • Spanish (confidence 86.96%):

      $ wget "http://mvod.lvlt.rtve.es/resources/TE_SCINCOC/mp3/2/8/1526585716282.mp3" -O es.mp3
      

    ...other examples of real-life content are listed in EXAMPLES.md.

  3. Build the docker image:

    $ docker build -t sli --rm https://github.com/tomasz-oponowicz/spoken_language_identification.git
    
  4. Mount the examples directory and classify an audio file, for example:

    $ docker run --rm -it -v $(pwd):/data sli /data/en.mp3
    

    ...several options are available through the command line. For example, you can tweak the noise reducer by increasing or decreasing the silence-threshold (0.5 by default):

    $ docker run --rm -it -v $(pwd):/data sli --silence-threshold=1 /data/es.mp3
    

Train

Prerequisites

  • ffmpeg is installed (tested with 3.4.2)
  • sox is installed (tested with 14.4.2)
  • docker is installed (tested with 18.03.0)

Steps

  1. Clone the repository:

    $ git clone [email protected]:tomasz-oponowicz/spoken_language_identification.git
    
  2. Go to the newly created directory:

    $ cd spoken_language_identification
    
  3. Generate samples:

    1. Fetch the spoken_language_dataset dataset:

      $ git submodule update --init --recursive
      
    2. Go to the dataset directory:

      $ cd spoken_language_dataset
      
    3. Generate samples:

      NOTE: Alternatively, you can download the pregenerated dataset. Depending on your hardware, this can save you 1-2 hours. After downloading, extract the contents into the build/train and build/test directories.

      $ make build
      
    4. Fix file permission of newly generated samples:

      $ make fix_permissions
      
    5. Return to the spoken_language_identification directory:

      $ cd ..
      
  4. Install dependencies:

    $ pip install -r requirements.txt
    

    ...the tensorflow package is installed by default (i.e. CPU support only). To speed up training, install the tensorflow-gpu package instead (i.e. GPU support). More information at Installing TensorFlow.

  5. Generate features from samples:

    $ python features.py
    
  6. Normalize features and build folds:

    $ python folds.py
    
  7. Train the model:

    $ python model.py
    

    ...the new model is stored at model.h5.

Release history

  • 2018-07-06 / v1.0 / Initial version