yandexdataschool / speech_course

Licence: MIT license
YSDA course in Speech Processing.


YSDA Speech Processing Course

  • Materials for each week are in ./week* folders

Course program

  • Week 1: Introduction to Speech

    • Lecture: This lecture introduces the area of speech processing and discusses its historical background and current trends. The second half introduces the concept of speech as a modality separate from text or images and foreshadows concepts from later lectures.
  • Week 2: Digital Signal Processing

    • Lecture: This lecture covers how to transform an audio signal into a form that is convenient for speech recognition and synthesis. Topics: how an audio wave is sampled and digitized; the Fourier Transform and the Discrete Fourier Transform, and how they are used to obtain the frequency spectrum of a signal; how to use the Short-Time Fourier Transform to represent sound as a spectrogram; and the mel scale and how to obtain a mel-spectrogram.
    • Seminar: In part 1 we implement the Short-Time Fourier Transform and obtain a mel-spectrogram. In part 2 we recover a spectrogram from a mel-spectrogram, reconstruct the original audio signal via the Griffin-Lim algorithm, and do some simple voice warping.
    • Homework: Audio-MNIST: Implement a Neural Network model to do simple digit classification based on a mel-spectrogram.
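
As a rough illustration of the Week 2 pipeline (not the seminar's reference solution), the sketch below frames a signal, applies a window, takes a DFT per frame to get a power spectrogram, and then pools the frequency bins with triangular mel filters. Everything specific here is an assumption made for the example: the function names are hypothetical, the window is a Hann window, the mel scale uses the common 2595 * log10(1 + f / 700) formula, and the frame size, hop, and number of mel bands are small illustrative values. It uses only the standard library so it runs anywhere; a practical implementation would likely use an FFT routine instead of the direct DFT sum.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_power(signal, n_fft=64, hop=16):
    """Power spectrogram: one list of n_fft // 2 + 1 bins per frame."""
    window = hann(n_fft)
    # Precompute the DFT factors exp(-2*pi*j*m/n_fft) once.
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Direct DFT sum for bin k (O(n_fft) per bin, fine for a demo).
            x_k = sum(frame[t] * twiddle[(k * t) % n_fft] for t in range(n_fft))
            spectrum.append(abs(x_k) ** 2)
        frames.append(spectrum)
    return frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge frequencies in Hz; filter m spans edges m .. m + 2.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=16, n_mels=8):
    """Apply the mel filterbank to every frame of the power spectrogram."""
    power = stft_power(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    return [[sum(w * p for w, p in zip(row, frame)) for row in bank]
            for frame in power]


# Usage: a 1 kHz tone sampled at 8 kHz (400 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(400)]
mel = mel_spectrogram(tone, sample_rate)
print(len(mel), len(mel[0]))  # prints "22 8": frames x mel bands
```

With these settings each DFT bin is 125 Hz wide, so the tone falls exactly on bin 8 and its energy ends up concentrated in a single mel band. Taking the logarithm of the mel energies, as is common for model inputs such as the Audio-MNIST homework, would be a one-line extension.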

Contributors & course staff

  • Andrey Malinin - Course admin, lectures, seminars, homeworks
  • Vladimir Kirichenko - lectures, seminars, homeworks
  • Segey Dukanov - lectures, seminars, homeworks