py-lidbox / lidbox

Licence: MIT license
End-to-end spoken language identification out of the box.

Programming language: Python

Projects that are alternatives or similar to lidbox

lingua-go
👄 The most accurate natural language detection library for Go, suitable for long and short text alike
Stars: ✭ 684 (+1653.85%)
Mutual labels:  language-recognition, language-identification
audio noise clustering
https://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Stars: ✭ 24 (-38.46%)
Mutual labels:  speech, audio-analysis
Inaspeechsegmenter
CNN-based audio segmentation toolkit. Detects speech, music and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Stars: ✭ 352 (+802.56%)
Mutual labels:  speech, audio-analysis
ytpriv
YT metadata exporter
Stars: ✭ 28 (-28.21%)
Mutual labels:  big-data
sgd
An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (+41.03%)
Mutual labels:  big-data
dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
Stars: ✭ 39 (+0%)
Mutual labels:  big-data
SignDetect
This application is developed to help speechless people interact with others with ease. It detects voice and converts the input speech into a sign language based video.
Stars: ✭ 21 (-46.15%)
Mutual labels:  speech
eidos-audition
Collection of auditory models.
Stars: ✭ 25 (-35.9%)
Mutual labels:  speech
merkle-db
High-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (+12.82%)
Mutual labels:  big-data
awesome-coder-resources
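A refueling station on the programming road! [Continuously updated... stars welcome, come back and visit often...] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and so on]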
Stars: ✭ 54 (+38.46%)
Mutual labels:  big-data
dialectID siam
Dialect identification using Siamese network
Stars: ✭ 15 (-61.54%)
Mutual labels:  language-recognition
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (+28.21%)
Mutual labels:  big-data
leetspeek
Open and collaborative content from leet hackers!
Stars: ✭ 11 (-71.79%)
Mutual labels:  big-data
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-48.72%)
Mutual labels:  big-data
awesome-tools
curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-20.51%)
Mutual labels:  big-data
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+223.08%)
Mutual labels:  big-data
javaer-mind
Mind maps for advanced study by Java programmers
Stars: ✭ 66 (+69.23%)
Mutual labels:  big-data
phoenix-queryserver
Apache Phoenix Query Server
Stars: ✭ 33 (-15.38%)
Mutual labels:  big-data
couchdb-pkg
Apache CouchDB Packaging support files
Stars: ✭ 24 (-38.46%)
Mutual labels:  big-data
da-tacos
A Dataset for Cover Song Identification and Understanding
Stars: ✭ 50 (+28.21%)
Mutual labels:  audio-analysis

lidbox

  • Spoken language identification (LId) out of the box using TensorFlow.
  • Models implemented with tf.keras.
  • Metadata handling with pandas DataFrames.
  • High-performance, parallel preprocessing pipelines with tf.data.
  • Simple spectral and cepstral feature extraction on the GPU with tf.signal.
  • Average detection cost (C_avg) implemented as a tf.keras.metrics.Metric subclass.
  • Angular proximity loss implemented as a tf.keras.losses.Loss subclass.
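
To make the C_avg metric mentioned above more concrete, here is a minimal plain-Python sketch. It assumes the closed-set definition used in the NIST language recognition evaluations (equal miss and false-alarm costs, target prior 0.5) and hard accept/reject decisions at a fixed score threshold. The function name and arguments are illustrative only; lidbox's own tf.keras.metrics.Metric subclass may differ in details.

```python
from typing import Dict, List, Sequence


def average_detection_cost(
    true_labels: Sequence[int],
    scores: Sequence[Sequence[float]],
    threshold: float = 0.0,
    p_target: float = 0.5,
    c_miss: float = 1.0,
    c_fa: float = 1.0,
) -> float:
    """Closed-set C_avg (hypothetical helper, not part of lidbox).

    true_labels[i] is the language index of utterance i and scores[i][l] is
    the detection score of utterance i for language l. A detector accepts an
    utterance when its score is at or above the threshold.
    """
    num_langs = len(scores[0])
    # Group the score rows by their true language.
    by_lang: Dict[int, List[Sequence[float]]] = {l: [] for l in range(num_langs)}
    for label, row in zip(true_labels, scores):
        by_lang[label].append(row)
    if any(len(rows) == 0 for rows in by_lang.values()):
        raise ValueError("every language needs at least one utterance")

    # The non-target prior is shared equally among the other languages.
    p_nontarget = (1.0 - p_target) / (num_langs - 1)
    total = 0.0
    for target in range(num_langs):
        target_rows = by_lang[target]
        # Miss: the target detector rejects an utterance of its own language.
        p_miss = sum(row[target] < threshold for row in target_rows) / len(target_rows)
        cost = c_miss * p_target * p_miss
        for other in range(num_langs):
            if other == target:
                continue
            other_rows = by_lang[other]
            # False alarm: the target detector accepts another language.
            p_fa = sum(row[target] >= threshold for row in other_rows) / len(other_rows)
            cost += c_fa * p_nontarget * p_fa
        total += cost
    return total / num_langs


# Two languages, one utterance each, both detected correctly.
print(average_detection_cost([0, 1], [[1.0, -1.0], [-1.0, 1.0]]))  # 0.0
```

A system that accepts every utterance for every language never misses but always raises false alarms, which gives 0.5 under these default costs and priors.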

Why would I want to use this?

  • You need a simple, deep-learning-based speech classification pipeline. For example: waveform -> VAD filter -> augment audio data -> serialize all data to a single binary file -> extract log-scale Mel-spectra or MFCCs -> classify the signals by label with a DNN/CNN/LSTM/GRU/attention (etc.) model.
  • You want to train a language vector/embedding extractor model (e.g. x-vector) on large amounts of data.
  • You have a TensorFlow/Keras model that you train on the GPU and want the tf.data.Dataset extraction pipeline to also run on the GPU.
  • You want an end-to-end pipeline that uses TensorFlow 2 as much as possible.
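
As a rough illustration of the first step of the example pipeline above (waveform -> VAD filter), the sketch below frames a waveform and keeps only the frames with enough energy. It is a library-free stand-in written for this page: the energy threshold, the function names and the default parameters are all assumptions, and it does not show how lidbox performs voice activity detection with TensorFlow.

```python
import math
from typing import List, Sequence


def frame_signal(signal: Sequence[float], frame_length: int, frame_step: int) -> List[List[float]]:
    """Split a waveform into fixed-length frames, dropping the incomplete tail."""
    last_start = len(signal) - frame_length
    return [list(signal[start:start + frame_length])
            for start in range(0, last_start + 1, frame_step)]


def rms(frame: Sequence[float]) -> float:
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_vad(signal: Sequence[float], frame_length: int = 400,
               threshold_ratio: float = 0.5) -> List[float]:
    """Keep non-overlapping frames whose RMS is at least
    threshold_ratio times the loudest frame's RMS (hypothetical helper)."""
    frames = frame_signal(signal, frame_length, frame_length)
    if not frames:
        return []
    energies = [rms(frame) for frame in frames]
    peak = max(energies)
    if peak == 0.0:
        # The whole signal is silent, so nothing counts as speech.
        return []
    kept: List[float] = []
    for frame, energy in zip(frames, energies):
        if energy >= threshold_ratio * peak:
            kept.extend(frame)
    return kept


# 50 ms of silence, 50 ms of a 440 Hz tone, 50 ms of silence at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
waveform = [0.0] * 800 + tone + [0.0] * 800
voiced = energy_vad(waveform)
print(len(waveform), len(voiced))  # 2400 800
```

In a real pipeline the filtered waveform would then go through augmentation, serialization and feature extraction before reaching the classifier.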

Why would I not want to use this?

  • You are happy doing everything with Kaldi or some other toolkit.
  • You don't want to debug by reading the source code when something goes wrong.
  • You don't want to install TensorFlow 2 and configure its dependencies (CUDA etc.).
  • You want to train phoneme recognizers or use CTC.

Examples

Installing

Python 3.7 or 3.8 is required.

From source

python3 -m pip install https://github.com/py-lidbox/lidbox/archive/master.zip

Most recent version from PyPI

python3 -m pip install 'lidbox==1.0.0rc0'

TensorFlow

TensorFlow 2 is not included in the package requirements because you may need a custom configuration, for example to get GPU support working.

If you don't want to customize anything and instead prefer something that just works for now, the following should be enough:

python3 -m pip install tensorflow
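
After installing TensorFlow, it can be useful to check whether it sees a GPU at all. The helper below is a small sketch written for this page (the name visible_gpus is made up); it relies only on tf.config.list_physical_devices, which is available in recent TensorFlow 2 releases, and returns an empty list when TensorFlow is not installed.

```python
from typing import List


def visible_gpus() -> List[str]:
    """Return the names of the GPUs TensorFlow can see.

    Hypothetical helper: returns an empty list if TensorFlow cannot be
    imported or if no GPU is visible.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("TensorFlow sees these GPUs:", gpus)
else:
    print("No GPU visible to TensorFlow (or TensorFlow is not installed)")
```

An empty result on a machine that has a GPU usually means the CUDA setup needs the kind of custom configuration mentioned above.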

Editable install

If you plan on making changes to the code, it is easier to install lidbox as a Python package in setuptools develop mode:

git clone --depth 1 https://github.com/py-lidbox/lidbox.git
python3 -m pip install --editable ./lidbox

Then, if you make changes to the code, there's no need to reinstall the package, since the changes take effect immediately. Just be careful not to edit the code while lidbox is running: TensorFlow uses its autograph package to convert some of the Python functions to TF graphs, and that conversion might fail if the source changes underneath it.
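
One way to confirm that an editable install is active is to check where Python resolves the package from. This is a generic standard-library sketch, not something lidbox provides; package_location is a made-up helper name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory a top-level package is imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin is the path to its __init__.py file.
    return Path(spec.origin).resolve().parent


print(package_location("lidbox"))
```

With an editable install, the printed path should point inside the cloned lidbox directory rather than into site-packages. If it prints None, the package is not installed in the current environment.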

Citing lidbox

@inproceedings{Lindgren2020,
    author={Matias Lindgren and Tommi Jauhiainen and Mikko Kurimo},
    title={{Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={467--471},
    doi={10.21437/Interspeech.2020-2706},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2706}
}