Cross-lingual Voice Conversion

I wish I could speak many languages. Wait. Actually I do. But only 4 or 5 languages with limited proficiency. Instead, can I create a voice model that can copy any voice in any language? Possibly! A while ago, my colleague Dabi and I opened a simple voice conversion project. Building on it, I extended the idea to work across languages. I found it very challenging with my limited knowledge. Unfortunately, the results I have for now are not good, but hopefully they will be helpful to some people.

February 2018

Author: Kyubyong Park ([email protected])

Version: 1.0

Requirements

NumPy >= 1.11.1
TensorFlow >= 1.3
librosa
tqdm
scipy

Data

Training 1: TIMIT
Training 2: CMU ARCTIC SLT
Conversion Sample Files: 50LANGUAGES MP3 audio files

Architecture

Train 1: MFCCs of TIMIT speakers -> Triphone PPGs
Train 2: MFCCs of ARCTIC speaker -> Triphone PPGs -> linear spectrogram
Convert: MFCCs of any speaker -> Triphone PPGs -> linear spectrogram -> (Griffin-Lim) -> wav file
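
The last arrow in the Convert line, from linear spectrogram to wav file, needs a phase estimate, because the predicted spectrogram only carries magnitudes. Below is a minimal NumPy-only sketch of Griffin-Lim phase reconstruction. It is not taken from this project's code: the function names, the frame size (n_fft=512), the hop length (128), and the iteration count are all assumptions chosen for illustration, and a real pipeline would more likely call a library routine.

```python
import numpy as np

def stft(signal, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window; returns (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=30, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between time and frequency domains,
    # keeping the known magnitude and updating only the phase.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        phase = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(magnitude * phase, n_fft, hop)

# Usage: a synthetic 440 Hz tone stands in for a predicted linear spectrogram.
sr = 16000
t = np.arange(int(0.25 * sr)) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
magnitude = np.abs(stft(tone))
wav = griffin_lim(magnitude)
```

Because the phase is only estimated, the output usually sounds somewhat metallic, which is one plausible contributor to the modest sample quality mentioned in this README.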

(PPGs are phonetic posteriorgrams. To see what they are, consult the paper by Sun et al. listed in References.)
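
To make the idea concrete: a phonetic posteriorgram is a time-by-class matrix in which each row holds the posterior probabilities of the phonetic classes for one frame. The sketch below is only an illustration under assumptions. Random logits stand in for the output of a trained phoneme classifier, and the frame and class counts are made up (a triphone model would have far more classes).

```python
import numpy as np

def posteriorgram(logits):
    """Turn per-frame class logits (n_frames, n_classes) into posteriors via softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical sizes: 5 frames, 4 phonetic classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))  # stand-in for classifier outputs
ppg = posteriorgram(logits)

# Each row is a probability distribution over classes for one frame.
most_likely_class = ppg.argmax(axis=1)
```

Since each row describes what is being said rather than who is saying it, the same representation can in principle be fed to a synthesis model trained on a different speaker.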

Training

STEP 0. Prepare datasets
STEP 1. Run python train1.py to train the phoneme recognition model.
STEP 2. Run python train2.py to train the speech synthesis model.

Training Curves

Training 1: [training curve figure]

Training 2: [training curve figure]

Sample Synthesis

Run python convert.py and check the generated samples in the 50lang-output folder.

Generated Samples

Check here to compare the original speech samples in 16 languages with their converted counterparts.
Don't expect too much!

References

L. Sun, S. Kang, K. Li, and H. Meng, “Personalized, cross-lingual TTS using phonetic posteriorgrams,” in Proc. INTERSPEECH, San Francisco, U.S.A., Sep. 2016, pp. 322–326.
Dabi Ahn & Kyubyong Park, Voice Conversion with Non-Parallel Data. https://github.com/andabi/deep-voice-conversion
