All Projects → ARBML → klaam

ARBML / klaam

Licence: MIT license
Arabic speech recognition, classification and text-to-speech.

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to klaam

leopard-chat-ui-teneo
Leopard Chat UI - A Teneo Chat Client based on Vue and Vuetify
Stars: ✭ 65 (-56.95%)
Mutual labels:  tts, asr
Zerospeech Tts Without T
A Pytorch implementation for the ZeroSpeech 2019 challenge.
Stars: ✭ 100 (-33.77%)
Mutual labels:  tts, asr
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-65.56%)
Mutual labels:  tts, asr
spokestack-tray-android
A UI component that makes it easy to add voice interaction to your app.
Stars: ✭ 13 (-91.39%)
Mutual labels:  tts, asr
Speech-Corpus-Collection
A Collection of Speech Corpus for ASR and TTS
Stars: ✭ 113 (-25.17%)
Mutual labels:  tts, asr
Mrcp Plugin With Freeswitch
使用FreeSWITCH接受用户手机呼叫,通过UniMRCP Server集成讯飞开放平台(xfyun)插件将用户语音进行语音识别(ASR),并根据自定义业务逻辑调用语音合成(TTS),构建简单的端到端语音呼叫中心。
Stars: ✭ 168 (+11.26%)
Mutual labels:  tts, asr
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+258.94%)
Mutual labels:  tts, asr
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-86.09%)
Mutual labels:  tts, asr
Wukong Robot
🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,还可能是首个支持脑机交互的开源智能音箱项目。
Stars: ✭ 3,110 (+1959.6%)
Mutual labels:  tts, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+1463.58%)
Mutual labels:  tts, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-64.9%)
Mutual labels:  tts, asr
speech course
YSDA course in Speech Processing.
Stars: ✭ 93 (-38.41%)
Mutual labels:  tts, asr
EMPHASIS-pytorch
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
Stars: ✭ 15 (-90.07%)
Mutual labels:  tts
State-TalentMAP
A comprehensive research, bidding, and matching system to match Foreign Service employees with the right skills to available posts and positions. API Layer - https://github.com/USStateDept/State-TalentMAP-API
Stars: ✭ 25 (-83.44%)
Mutual labels:  tts
JSpeak
A Text to Speech Reader Front-end that Reads from the Clipboard and with Exceptionable Features
Stars: ✭ 16 (-89.4%)
Mutual labels:  tts
qahiri
Qahiri (قاهري) is a manuscript Kufic typeface
Stars: ✭ 45 (-70.2%)
Mutual labels:  arabic
ukrainian-tts
Ukrainian TTS (text-to-speech) using Coqui TTS
Stars: ✭ 74 (-50.99%)
Mutual labels:  tts
language-detector
Detect the language of text
Stars: ✭ 28 (-81.46%)
Mutual labels:  arabic
ms-ra-forwarder
A free online TTS API
Stars: ✭ 397 (+162.91%)
Mutual labels:  tts
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-86.75%)
Mutual labels:  asr

klaam

Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows training and prediction using pretrained models.

Usage

Speech Classification

from klaam import SpeechClassification
model = SpeechClassification()
model.classify(wav_file)

Speech Recongnition

from klaam import SpeechRecognition
model = SpeechRecognition()
model.transcribe(wav_file)

Text To Speech

from klaam import TextToSpeech
prepare_tts_model_path = "../cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "../cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "../cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "../cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "../data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path)

model.synthesize(sample_text)

There are two avilable models for recognition trageting MSA and egyptian dialect . You can set any of them using the lang attribute

from klaam import SpeechRecognition
model = SpeechRecognition(lang = 'msa')
model.transcribe('file.wav')

Datasets

Dataset Description link
MGB-3 Egyptian Arabic Speech recognition in the wild. Every sentence was annotated by four annotators. More than 15 hours have been collected from YouTube. requires registeration here
ADI-5 More than 50 hours collected from Aljazeera TV. 4 regional dialectal: Egyptian (EGY), Levantine (LAV), Gulf (GLF), North African (NOR), and Modern Standard Arabic (MSA). This dataset is a part of the MGB-3 challenge. requires registeration here
Common voice Multlilingual dataset avilable on huggingface here.
Arabic Speech Corpus Arabic dataset with alignment and transcriptions here.

Models

We currently support four models, three of them are avilable on transformers.

Language Description Source
Egyptian Speech recognition wav2vec2-large-xlsr-53-arabic-egyptian
Standard Arabic Speech recognition wav2vec2-large-xlsr-53-arabic
EGY, NOR, LAV, GLF, MSA Speech classification wav2vec2-large-xlsr-dialect-classification
Standard Arabic Text-to-Speech fastspeech2

Example Notebooks

Name Description Notebook
Demo Classification, Recongition and Text-to-speech in a few lines of code.
Demo with mic Audio Recongition and classification with recording.

Training

The scripts are a modification of jqueguiner/wav2vec2-sprint.

classification

This script is used for the classification task on the 5 classes.

python run_classifier.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

Recognition

This script is for training on the dataset for pretraining on the egyption dialects dataset.

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

This script can be used for Arabic common voice training

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

Text To Speech

We use the pytorch implementation of fastspeech2 by ming024. The procedure is as follows

Download the dataset

wget http://en.arabicspeechcorpus.com/arabic-speech-corpus.zip
unzip arabic-speech-corpus.zip

Create multiple directories for data

mkdir -p raw_data/Arabic/Arabic preprocessed_data/Arabic/TextGrid/Arabic
cp arabic-speech-corpus/textgrid/* preprocessed_data/Arabic/TextGrid/Arabic

Prepare metadata

import os
base_dir = '/content/arabic-speech-corpus'
lines = []
for lab_file in os.listdir(f'{base_dir}/lab'):
  lines.append(lab_file[:-4]+'|'+open(f'{base_dir}/lab/{lab_file}', 'r').read())


open(f'{base_dir}/metadata.csv', 'w').write(('\n').join(lines))

Clone my fork

git clone --depth 1 https://github.com/zaidalyafeai/FastSpeech2
cd FastSpeech2
pip install -r requirements.txt

Prepare alignments and prepreocessed data

python3 prepare_align.py config/Arabic/preprocess.yaml
python3 preprocess.py config/Arabic/preprocess.yaml

Unzip vocoders

unzip hifigan/generator_LJSpeech.pth.tar.zip -d hifigan
unzip hifigan/generator_universal.pth.tar.zip -d hifigan

Start training

python3 train.py -p config/Arabic/preprocess.yaml -m config/Arabic/model.yaml -t config/Arabic/train.yaml
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].