Licence: apache-2.0

Lhotse

Lhotse is a Python library that aims to make speech and audio data preparation flexible and accessible to a wider community. Alongside k2, it is part of the next-generation Kaldi speech processing library.

⚠️ Lhotse is not fully stable yet - while many features are already implemented, the APIs are still subject to change! ⚠️

About

Main goals

  • Attract a wider community to speech processing tasks with a Python-centric design.
  • Accommodate experienced Kaldi users with an expressive command-line interface.
  • Provide standard data preparation recipes for commonly used corpora.
  • Provide PyTorch Dataset classes for speech and audio related tasks.
  • Enable flexible data preparation for model training with the notion of audio cuts.
  • Be efficient, especially in terms of I/O bandwidth and storage capacity.

Main ideas

Like Kaldi, Lhotse provides standard data preparation recipes, but extends them with seamless PyTorch integration through task-specific Dataset classes. The data and metadata are represented in human-readable text manifests and exposed to the user through convenient Python classes.
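To make the idea of a human-readable text manifest more concrete, here is a minimal sketch using only the Python standard library. It is not Lhotse's actual manifest format or API: the RecordingInfo class, its field names, and the to_jsonl/from_jsonl helpers are hypothetical and exist only for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RecordingInfo:
    """Hypothetical, simplified stand-in for one recording manifest entry."""
    id: str
    path: str
    sampling_rate: int
    num_samples: int

    @property
    def duration(self) -> float:
        # Duration in seconds, derived from the sample count.
        return self.num_samples / self.sampling_rate


def to_jsonl(items):
    # One JSON object per line keeps the manifest readable and easy to diff.
    return "\n".join(json.dumps(asdict(item)) for item in items)


def from_jsonl(text):
    # Rebuild the Python objects from the text manifest, skipping blank lines.
    return [RecordingInfo(**json.loads(line)) for line in text.splitlines() if line.strip()]


recordings = [
    RecordingInfo(id="rec-1", path="audio/rec-1.wav", sampling_rate=8000, num_samples=96000),
]
manifest_text = to_jsonl(recordings)
restored = from_jsonl(manifest_text)
```

The text form can be inspected or edited by hand, while the code works with typed objects; only metadata is stored, never the audio itself.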

Lhotse introduces the notion of audio cuts, designed to ease the training data construction with operations such as mixing, truncation and padding that are performed on-the-fly to minimize the amount of storage required. Data augmentation and feature extraction are supported both in pre-computed mode, with highly-compressed feature matrices stored on disk, and on-the-fly mode that computes the transformations upon request. Additionally, Lhotse introduces feature-space cut mixing to make the best of both worlds.
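The sketch below illustrates why operations such as truncation and padding need no extra storage when they are deferred: each one only changes a few numbers in the cut's metadata. It is a simplified illustration rather than Lhotse's implementation; the LazyCut class and its fields are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyCut:
    """Hypothetical sketch of a cut: metadata only, no audio samples."""
    recording_id: str
    start: float          # offset into the recording, in seconds
    duration: float       # length of audio taken from the recording, in seconds
    padding: float = 0.0  # silence to append once the audio is actually loaded

    @property
    def total_duration(self) -> float:
        return self.duration + self.padding

    def truncate(self, offset: float, duration: float) -> "LazyCut":
        # Narrow the window over the recording; any earlier padding is dropped
        # in this simplified version.
        if offset < 0 or offset >= self.duration:
            raise ValueError("offset must fall inside the cut")
        new_duration = min(duration, self.duration - offset)
        return replace(self, start=self.start + offset, duration=new_duration, padding=0.0)

    def pad(self, duration: float) -> "LazyCut":
        # Record how much silence is missing; no samples are created here.
        missing = max(0.0, duration - self.total_duration)
        return replace(self, padding=self.padding + missing)


cut = LazyCut(recording_id="rec-1", start=10.0, duration=2.0)
padded = cut.pad(duration=5.0)
shorter = cut.truncate(offset=0.5, duration=1.0)
```

Each call returns a new lightweight object and leaves the original untouched, so many differently shaped training examples can share one audio file on disk.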

Installation

Lhotse supports Python version 3.6 and later.

Pip

Lhotse is available on PyPI:

pip install lhotse

To install the latest, unreleased version, do:

pip install git+https://github.com/lhotse-speech/lhotse

Development installation

For development installation, you can fork/clone the GitHub repo and install with pip:

git clone https://github.com/lhotse-speech/lhotse
cd lhotse
pip install -e '.[dev]'

# Running unit tests
pytest test

This is an editable installation (-e option), meaning that your changes to the source code are automatically reflected when importing lhotse (no re-install needed). The [dev] part means you're installing the extra dependencies used to run tests, build documentation, or launch Jupyter notebooks.

Examples

We have example recipes showing how to prepare data and load it in Python as a PyTorch Dataset. They are located in the examples directory.

A short snippet showing how Lhotse can make audio data preparation quick and easy:

from torch.utils.data import DataLoader
from lhotse import CutSet, Fbank
from lhotse.dataset import VadDataset, SingleCutSampler
from lhotse.recipes import prepare_switchboard

# Prepare data manifests from a raw corpus distribution.
# The RecordingSet describes the metadata about audio recordings:
# the sampling rate, number of channels, duration, etc.
# The SupervisionSet describes metadata about supervision segments:
# the transcript, speaker, language, and so on.
swbd = prepare_switchboard('/export/corpora3/LDC/LDC97S62')

# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.  
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)

# We compute the log-Mel filter energies and store them on disk.
# Then, we pad the cuts to 5 seconds to ensure all cuts are of equal length,
# as the last window in each recording might have a shorter duration.
# The padding will be performed once the features are loaded into memory.
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)

# Construct a PyTorch Dataset class for the Voice Activity Detection task:
dataset = VadDataset(cuts)
sampler = SingleCutSampler(cuts)
dataloader = DataLoader(dataset, sampler=sampler, batch_size=None)
batch = next(iter(dataloader))

The VadDataset yields a batch with pairs of feature and supervision tensors such as the following, in which the speech starts at roughly the one-second mark (100 frames):

image
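The windowing and padding arithmetic in the snippet above can be sketched in plain Python. This is an illustration under stated assumptions, not Lhotse's implementation: window_bounds and num_frames are hypothetical helpers, and the 10 ms frame shift is inferred from the statement that one second corresponds to 100 frames.

```python
import math

FRAME_SHIFT = 0.01  # seconds per frame; assumed from "first second (100 frames)"


def window_bounds(recording_duration, window):
    """Return (start, duration) pairs covering a recording in consecutive windows."""
    bounds = []
    for i in range(math.ceil(recording_duration / window)):
        start = i * window
        # The last window may be shorter than the requested length.
        bounds.append((start, min(window, recording_duration - start)))
    return bounds


def num_frames(duration, frame_shift=FRAME_SHIFT):
    # Number of feature frames for a given duration at a fixed frame shift.
    return round(duration / frame_shift)


# A hypothetical 12-second recording cut into 5-second windows.
windows = window_bounds(12.0, 5.0)
# Seconds of padding each window needs to reach exactly 5 seconds.
padding_needed = [5.0 - duration for _, duration in windows]
```

Here only the final 2-second window needs padding (3 seconds), which is why the snippet pads after windowing: every cut then has the same length and the resulting feature matrices can be batched together.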
