markovka17 / Dla

License: MIT
Deep learning for audio processing

Projects that are alternatives to or similar to Dla

Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+281.69%)
Mutual labels:  speech-recognition, tts
Itri Speech Recognition Dataset Generation
Automatic Speech Recognition Dataset Generation
Stars: ✭ 32 (-77.46%)
Mutual labels:  jupyter-notebook, speech-recognition
Speech Emotion Analyzer
A neural network model capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)
Stars: ✭ 633 (+345.77%)
Mutual labels:  jupyter-notebook, speech-recognition
Tts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Stars: ✭ 305 (+114.79%)
Mutual labels:  jupyter-notebook, tts
Speech Emotion Recognition
Detecting emotions using MFCC features of human speech using Deep Learning
Stars: ✭ 89 (-37.32%)
Mutual labels:  jupyter-notebook, speech-recognition
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (+176.06%)
Mutual labels:  jupyter-notebook, speech-recognition
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+438.03%)
Mutual labels:  speech-recognition, signal-processing
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+492.25%)
Mutual labels:  tts, speech-recognition
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-63.38%)
Mutual labels:  jupyter-notebook, tts
Parrots
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine for Chinese.
Stars: ✭ 48 (-66.2%)
Mutual labels:  speech-recognition, tts
Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+3721.83%)
Mutual labels:  jupyter-notebook, tts
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-14.08%)
Mutual labels:  jupyter-notebook, tts
Audio Spectrum Analyzer In Python
A series of Jupyter notebooks and Python files which stream audio from a microphone using pyaudio and then process it.
Stars: ✭ 273 (+92.25%)
Mutual labels:  jupyter-notebook, signal-processing
Silero Models
Silero Models: pre-trained STT models and benchmarks made embarrassingly simple
Stars: ✭ 522 (+267.61%)
Mutual labels:  jupyter-notebook, speech-recognition
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-63.38%)
Mutual labels:  tts, speech-recognition
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+380.28%)
Mutual labels:  jupyter-notebook, tts
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (-37.32%)
Mutual labels:  tts, speech-recognition
Chinese-automatic-speech-recognition
Chinese speech recognition
Stars: ✭ 147 (+3.52%)
Mutual labels:  signal-processing, speech-recognition
Py Nltools
A collection of basic python modules for spoken natural language processing
Stars: ✭ 46 (-67.61%)
Mutual labels:  speech-recognition, tts
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-27.46%)
Mutual labels:  speech-recognition, tts


Deep Learning for Audio (DLA)

  • Lecture and seminar materials for each week are in the ./week* folders; see each week's README.md for materials and instructions
  • For any technical issues, ideas, bugs in the course materials, or contribution ideas, open an issue
  • The current version of the course is taught in autumn 2020 at the Faculty of Computer Science, HSE

Syllabus

  • week01 Introduction to Digital Signal Processing

    • Lecture: Signals, Fourier transform, Spectrograms, MFCC, etc. (see the torchaudio sketch after this syllabus)
    • Seminar: Intro to PyTorch, DevOps, R&D in Deep Learning
  • week02 Automatic Speech Recognition I

    • Lecture: Metrics, Attention, LAS, CTC, Beam Search
    • Seminar: Docker, W&B, Augmentations for Audio
  • week03 Automatic Speech Recognition II

    • Lecture: LM Fusing, RNN Transducer, Scheduled Sampling, BPE
    • Seminar: Jasper, QuartzNet, Mixed Precision Training, DDP/DP
  • week04 Keyword Spotting (KWS) and Voice Activity Detection (VAD)

    • Lecture: (DNN, CNN, RNN+Attention) based KWS, SVDF, Orthogonality Regularization and other Tricks
    • Seminar: Speeding Up NNs: Tensor Decomposition, Quantization, Pruning, Distillation and Architecture Design (see the distillation-loss sketch after this syllabus)
  • week05 Speaker verification and identification

    • Lecture: Metric Learning: Cosine, Contrastive, Triplet Losses. Angular Softmax. ArcFace
    • Seminar: Generalized End-to-End Loss for Speaker Verification
  • week06 Text to Speech

    • Lecture: Tacotron, DeepVoice, GST, FastSpeech, Attention Tricks
    • Seminar: Location-Sensitive Attention
  • week07 Neural Vocoders

    • Lecture: Introduction to generative models: AR, GAN, NF. WaveNet, ParallelWaveNet, WaveGlow, WaveFlow, MelGAN, PWG.
  • week08 Voice Conversion

    • Lecture: AutoVC, ConVoice, TTS Skins, StarGAN-VC-1-2, CycleGAN-1-2-3, Blow
  • week09 Music Generation

    • Lecture: VQVAE, Sparse Transformer, MuseNet, JukeBox
  • week10 Speech Enhancement, Denoising and Speaker Diarization

    • Lecture: SEGAN, TF Masking, HiFi Denoising, Speaker Diarization, VAD
  • week11 Self-supervision in Audio and Speech

    • Lecture: Intro to Self-Supervised Learning. InfoNCE, CPC
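
A minimal sketch of the week01 pipeline (waveform → spectrogram → MFCC) using torchaudio. The file name and all parameter values (FFT size, hop length, number of mel bands, number of coefficients) are illustrative assumptions, not the course's exact settings.

```python
import torch
import torchaudio

# Load a waveform; shape is (channels, time). "example.wav" is a placeholder path.
waveform, sample_rate = torchaudio.load("example.wav")

# Log-mel spectrogram: STFT magnitudes projected onto a mel filterbank.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,       # FFT window size
    hop_length=256,   # stride between consecutive frames
    n_mels=80,        # number of mel bands
)
log_mel = torch.log(mel_transform(waveform) + 1e-9)  # (channels, n_mels, frames)

# MFCCs: DCT of the log-mel energies, the classic compact speech feature.
mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=sample_rate,
    n_mfcc=13,
    melkwargs={"n_fft": 1024, "hop_length": 256, "n_mels": 80},
)
mfcc = mfcc_transform(waveform)  # (channels, n_mfcc, frames)
```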
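
The week04 seminar mentions distillation as one way to shrink a KWS model. Below is a generic sketch of logit distillation (soft-target KL divergence from a teacher plus hard-label cross-entropy); the temperature T and mixing weight alpha are illustrative choices, not values prescribed by the course.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Blend soft-target KL (teacher -> student) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student log-probs at temperature T
        F.softmax(teacher_logits / T, dim=-1),       # teacher probs at temperature T
        reduction="batchmean",
    ) * (T * T)                                      # rescale so the term is comparable to CE
    hard = F.cross_entropy(student_logits, targets)  # targets are class indices
    return alpha * soft + (1 - alpha) * hard
```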

Homeworks

  • DSP: implementation of basic ops such as FFT, Spectrogram, and MelScale

  • ASR: implementation of a small ASR model, beam search, and LM fusing (see the decoding sketch after this list)

  • KWS: implementation of an attention-based KWS model, streaming scoring, and model distillation

  • TTS: implementation of a TTS model with various tricks

  • NV: implementation of a neural vocoder model
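
For the ASR homework, two basic building blocks are a CTC decoder and an error-rate metric. The sketch below shows greedy CTC decoding (collapse repeated frames, drop blanks) and a plain word error rate computed with word-level Levenshtein distance; the toy vocabulary and blank index are hypothetical, and the homework's beam search and LM fusing are not covered here.

```python
import torch

VOCAB = ["<blank>", " ", "a", "b", "c"]  # hypothetical toy alphabet, blank at index 0
BLANK = 0

def ctc_greedy_decode(log_probs: torch.Tensor) -> str:
    """log_probs: (time, vocab) per-frame log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != BLANK:  # collapse repeats, then drop blanks
            out.append(VOCAB[idx])
        prev = idx
    return "".join(out)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```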

Contributors & course staff

Course materials and teaching performed by
