m3hrdadfi / soxan

License: Apache-2.0

Wav2Vec for speech recognition, classification, and audio classification

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives of or similar to soxan

Awesome Speech Recognition Speech Synthesis Papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Stars: ✭ 2,085 (+1745.13%)

Mutual labels: speech-recognition, automatic-speech-recognition

deep avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Stars: ✭ 104 (-7.96%)

Mutual labels: speech-recognition, automatic-speech-recognition

Delta

DELTA is a deep learning based natural language and speech processing platform.

Stars: ✭ 1,479 (+1208.85%)

Mutual labels: speech-recognition, emotion-recognition

sova-asr

SOVA ASR (Automatic Speech Recognition)

Stars: ✭ 123 (+8.85%)

Mutual labels: speech-recognition, automatic-speech-recognition

2018-dlsl

UPC Deep Learning for Speech and Language 2018

Stars: ✭ 18 (-84.07%)

Mutual labels: speech-recognition, automatic-speech-recognition

kaldi-long-audio-alignment

Long audio alignment using Kaldi

Stars: ✭ 21 (-81.42%)

Mutual labels: speech-recognition, automatic-speech-recognition

speech-emotion-recognition

Speaker independent emotion recognition

Stars: ✭ 269 (+138.05%)

Mutual labels: emotion-recognition, speech-emotion-recognition

demo vietasr

Vietnamese Speech Recognition

Stars: ✭ 22 (-80.53%)

Mutual labels: speech-recognition, automatic-speech-recognition

hf-experiments

Experiments with Hugging Face 🔬 🤗

Stars: ✭ 37 (-67.26%)

Mutual labels: speech-recognition, automatic-speech-recognition

leopard

On-device speech-to-text engine powered by deep learning

Stars: ✭ 354 (+213.27%)

Mutual labels: speech-recognition, automatic-speech-recognition

Automatic speech recognition

End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

Stars: ✭ 2,751 (+2334.51%)

Mutual labels: speech-recognition, automatic-speech-recognition

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Stars: ✭ 841 (+644.25%)

Mutual labels: speech-recognition, speech-emotion-recognition

obvi

A Polymer 3+ webcomponent / button for doing speech recognition

Stars: ✭ 54 (-52.21%)

Mutual labels: speech-recognition, automatic-speech-recognition

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Stars: ✭ 2,384 (+2009.73%)

Mutual labels: speech-recognition, automatic-speech-recognition

Interaction-Aware-Attention-Network

[ICASSP19] An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs

Stars: ✭ 32 (-71.68%)

Mutual labels: emotion-recognition, speech-emotion-recognition

react-client

A React client library for Speechly API

Stars: ✭ 71 (-37.17%)

Mutual labels: speech-recognition

Deep-learning-And-Paper

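[For exchange and learning purposes only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, voiceprint recognition, speech synthesis, and more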

Stars: ✭ 62 (-45.13%)

Mutual labels: speech-recognition

srvk-eesen-offline-transcriber

Top level code to transcribe English audio/video files into text/subtitles

Stars: ✭ 22 (-80.53%)

Mutual labels: speech-recognition

converse

Conversational text Analysis using various NLP techniques

Stars: ✭ 147 (+30.09%)

Mutual labels: emotion-recognition

revai-node-sdk

Node.js SDK for the Rev AI API

Stars: ✭ 21 (-81.42%)

Mutual labels: speech-recognition

Soxan

In Persian, it is called Sokhan (سخن).

This repository contains models, scripts, and notebooks that help you apply Wav2Vec 2.0 in your research. Below, I show how to train models for speech tasks on your own dataset and how to use the pretrained models.

How to train

I'm just at the beginning of all the possible speech tasks. To start, the training examples focus on the speech emotion recognition problem.

Training - Notebook

Task	Notebook
Speech Emotion Recognition (Wav2Vec 2.0)
Speech Emotion Recognition (Hubert)
Audio Classification (Wav2Vec 2.0)

Training - CMD

# --model_mode accepts "wav2vec2" or "hubert"
python3 run_wav2vec_clf.py \
    --pooling_mode="mean" \
    --model_name_or_path="lighteternal/wav2vec2-large-xlsr-53-greek" \
    --model_mode="wav2vec2" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --train_file=/path/to/train.csv \
    --validation_file=/path/to/dev.csv \
    --test_file=/path/to/test.csv \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --gradient_accumulation_steps=2 \
    --learning_rate=1e-4 \
    --num_train_epochs=5.0 \
    --evaluation_strategy="steps" \
    --save_steps=100 \
    --eval_steps=100 \
    --logging_steps=100 \
    --save_total_limit=2 \
    --do_eval \
    --do_train \
    --fp16 \
    --freeze_feature_extractor
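
As a rough guide to the batch-related flags above, the sketch below works out the effective batch size and an approximate number of optimizer updates per epoch. It assumes the script delegates to the Hugging Face `Trainer`, where gradients from several batches are accumulated before each update. The helper names and the dataset size of 1,000 clips are hypothetical.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def approx_updates_per_epoch(num_examples, per_device_batch_size,
                             gradient_accumulation_steps, num_devices=1):
    """Approximate optimizer updates per epoch (assumes the last partial batch is kept)."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    return max(batches // gradient_accumulation_steps, 1)


# Flag values taken from the command above; 1,000 training clips is a made-up dataset size.
print(effective_batch_size(4, 2))            # 8
print(approx_updates_per_epoch(1000, 4, 2))  # 125
```

Under these assumptions, `--save_steps=100` and `--eval_steps=100` would checkpoint and evaluate roughly once per epoch on a dataset of that size.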

Prediction

import torch
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification

model_name_or_path = "path/to/your-pretrained-model"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate

# For Wav2Vec 2.0 checkpoints:
model = Wav2Vec2ForSpeechClassification.from_pretrained(model_name_or_path).to(device)

# For HuBERT checkpoints, use this line instead:
# model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)


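def speech_file_to_array_fn(path, sampling_rate):
    # Load the audio file; torchaudio returns (waveform, original sample rate).
    speech_array, _sampling_rate = torchaudio.load(path)
    # Resample to the rate the feature extractor expects.
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    speech = resampler(speech_array).squeeze().numpy()
    return speech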


def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    with torch.no_grad():
        logits = model(**inputs).logits

    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{round(score * 100, 3):.1f}%"} for i, score in
               enumerate(scores)]
    return outputs


path = "/path/to/disgust.wav"
outputs = predict(path, sampling_rate)
print(outputs)

Output:

[
    {'Emotion': 'anger', 'Score': '0.0%'},
    {'Emotion': 'disgust', 'Score': '99.2%'},
    {'Emotion': 'fear', 'Score': '0.1%'},
    {'Emotion': 'happiness', 'Score': '0.3%'},
    {'Emotion': 'sadness', 'Score': '0.5%'}
]
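
The scores in this output come from applying a softmax to the model's logits. The following standalone sketch reproduces that step in plain Python so it can be checked without a model. The logits and the `id2label` mapping are made-up values for illustration, and `rank_predictions` is a hypothetical helper, not part of the repository.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_predictions(logits, id2label):
    """Return (label, probability) pairs sorted from most to least likely."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits and label mapping, for illustration only.
id2label = {0: "anger", 1: "disgust", 2: "fear", 3: "happiness", 4: "sadness"}
logits = [-2.0, 5.0, -1.0, 0.0, 0.5]

ranked = rank_predictions(logits, id2label)
for label, prob in ranked:
    print(f"{label}: {prob * 100:.1f}%")
```

Sorting the pairs puts the most likely label first, which is convenient when only the top prediction is needed.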

Demos

Demo	Link
Speech To Text With Emotion Recognition (Persian) - soon	huggingface.co/spaces/m3hrdadfi/speech-text-emotion

Models

Dataset	Model
ShEMO: a large-scale validated database for Persian speech emotion detection	m3hrdadfi/wav2vec2-xlsr-persian-speech-emotion-recognition
ShEMO: a large-scale validated database for Persian speech emotion detection	m3hrdadfi/hubert-base-persian-speech-emotion-recognition
ShEMO: a large-scale validated database for Persian speech emotion detection	m3hrdadfi/hubert-base-persian-speech-gender-recognition
Speech Emotion Recognition (Greek) (AESDD)	m3hrdadfi/hubert-large-greek-speech-emotion-recognition
Speech Emotion Recognition (Greek) (AESDD)	m3hrdadfi/hubert-base-greek-speech-emotion-recognition
Speech Emotion Recognition (Greek) (AESDD)	m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition
Eating Sound Collection	m3hrdadfi/wav2vec2-base-100k-eating-sound-collection
GTZAN Dataset - Music Genre Classification	m3hrdadfi/wav2vec2-base-100k-gtzan-music-genres
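
Each checkpoint in the table has to be loaded with the matching class from the Prediction section. One way to choose between them, sketched below, is to look at the checkpoint name. This is a naming heuristic that happens to hold for the models listed above, not something the repository provides, and `guess_model_mode` is a hypothetical helper.

```python
def guess_model_mode(model_name_or_path):
    """Guess "hubert" or "wav2vec2" from a checkpoint name (heuristic only)."""
    name = model_name_or_path.lower()
    if "hubert" in name:
        return "hubert"
    if "wav2vec2" in name:
        return "wav2vec2"
    raise ValueError(f"Cannot infer model mode from: {model_name_or_path}")


# Checkpoint names copied from the table above.
for checkpoint in (
    "m3hrdadfi/hubert-base-persian-speech-emotion-recognition",
    "m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition",
):
    print(checkpoint, "->", guess_model_mode(checkpoint))
```

The returned string matches the values accepted by `--model_mode` in the training command, so it could also be used to pick between `HubertForSpeechClassification` and `Wav2Vec2ForSpeechClassification` at load time.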
