m3hrdadfi / soxan

License: Apache-2.0
Wav2Vec for speech recognition, classification, and audio classification

Projects that are alternatives of or similar to soxan

Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+1745.13%)
Mutual labels:  speech-recognition, automatic-speech-recognition
deep avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Stars: ✭ 104 (-7.96%)
Mutual labels:  speech-recognition, automatic-speech-recognition
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+1208.85%)
Mutual labels:  speech-recognition, emotion-recognition
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+8.85%)
Mutual labels:  speech-recognition, automatic-speech-recognition
2018-dlsl
UPC Deep Learning for Speech and Language 2018
Stars: ✭ 18 (-84.07%)
Mutual labels:  speech-recognition, automatic-speech-recognition
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-81.42%)
Mutual labels:  speech-recognition, automatic-speech-recognition
speech-emotion-recognition
Speaker independent emotion recognition
Stars: ✭ 269 (+138.05%)
Mutual labels:  emotion-recognition, speech-emotion-recognition
demo vietasr
Vietnamese Speech Recognition
Stars: ✭ 22 (-80.53%)
Mutual labels:  speech-recognition, automatic-speech-recognition
hf-experiments
Experiments with Hugging Face 🔬 🤗
Stars: ✭ 37 (-67.26%)
Mutual labels:  speech-recognition, automatic-speech-recognition
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+213.27%)
Mutual labels:  speech-recognition, automatic-speech-recognition
Automatic speech recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Stars: ✭ 2,751 (+2334.51%)
Mutual labels:  speech-recognition, automatic-speech-recognition
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+644.25%)
Mutual labels:  speech-recognition, speech-emotion-recognition
obvi
A Polymer 3+ webcomponent / button for doing speech recognition
Stars: ✭ 54 (-52.21%)
Mutual labels:  speech-recognition, automatic-speech-recognition
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+2009.73%)
Mutual labels:  speech-recognition, automatic-speech-recognition
Interaction-Aware-Attention-Network
[ICASSP19] An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs
Stars: ✭ 32 (-71.68%)
Mutual labels:  emotion-recognition, speech-emotion-recognition
react-client
A React client library for Speechly API
Stars: ✭ 71 (-37.17%)
Mutual labels:  speech-recognition
Deep-learning-And-Paper
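[For exchange and study purposes only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, voiceprint recognition, speech synthesis, and more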
Stars: ✭ 62 (-45.13%)
Mutual labels:  speech-recognition
srvk-eesen-offline-transcriber
Top level code to transcribe English audio/video files into text/subtitles
Stars: ✭ 22 (-80.53%)
Mutual labels:  speech-recognition
converse
Conversational text Analysis using various NLP techniques
Stars: ✭ 147 (+30.09%)
Mutual labels:  emotion-recognition
revai-node-sdk
Node.js SDK for the Rev AI API
Stars: ✭ 21 (-81.42%)
Mutual labels:  speech-recognition

Soxan

In Persian, the name is "Sokhan" (سخن), meaning "speech".

This repository consists of models, scripts, and notebooks that help you use Wav2Vec 2.0 in your research. Below, I'll show you how to train speech tasks on your own dataset and how to use the pretrained models.

How to train

I'm just at the beginning of all the possible speech tasks. To start, the training script covers the speech emotion recognition problem.

Training - Notebook

Task | Notebook
Speech Emotion Recognition (Wav2Vec 2.0) | Open In Colab
Speech Emotion Recognition (Hubert) | Open In Colab
Audio Classification (Wav2Vec 2.0) | Open In Colab

Training - CMD

# --model_mode accepts "wav2vec2" or "hubert"
python3 run_wav2vec_clf.py \
    --pooling_mode="mean" \
    --model_name_or_path="lighteternal/wav2vec2-large-xlsr-53-greek" \
    --model_mode="wav2vec2" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --train_file=/path/to/train.csv \
    --validation_file=/path/to/dev.csv \
    --test_file=/path/to/test.csv \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --gradient_accumulation_steps=2 \
    --learning_rate=1e-4 \
    --num_train_epochs=5.0 \
    --evaluation_strategy="steps" \
    --save_steps=100 \
    --eval_steps=100 \
    --logging_steps=100 \
    --save_total_limit=2 \
    --do_eval \
    --do_train \
    --fp16 \
    --freeze_feature_extractor
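
With the flags above, each optimizer update sees more samples than a single batch holds. The sketch below works through that arithmetic. The dataset size and device count are hypothetical, and the rounding is only an approximation, since the exact step count can depend on the installed transformers version.

```python
import math


def training_schedule(num_examples, per_device_batch_size=4,
                      gradient_accumulation_steps=2, num_devices=1,
                      num_train_epochs=5.0, eval_steps=100):
    """Approximate the schedule implied by the command-line flags."""
    # Samples consumed per optimizer update across all devices.
    effective_batch_size = (per_device_batch_size
                            * gradient_accumulation_steps * num_devices)
    # Forward/backward passes per epoch, then optimizer updates per epoch.
    batches_per_epoch = math.ceil(
        num_examples / (per_device_batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    total_updates = math.ceil(num_train_epochs * updates_per_epoch)
    return {
        "effective_batch_size": effective_batch_size,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        # Number of evaluation passes triggered by --eval_steps.
        "evaluations": total_updates // eval_steps,
    }


# num_examples=1000 is a made-up dataset size for illustration.
schedule = training_schedule(num_examples=1000)
print(schedule)
```

Under these assumptions the effective batch size is 8 (4 x 2), so doubling --gradient_accumulation_steps is one way to keep the same effective batch size when GPU memory forces a smaller --per_device_train_batch_size.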

Prediction

import torch
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification

model_name_or_path = "path/to/your-pretrained-model"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate

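# Load ONE of the two models below, matching the checkpoint's architecture.
# for wav2vec
model = Wav2Vec2ForSpeechClassification.from_pretrained(model_name_or_path).to(device)

# for hubert (uncomment and use instead of the line above)
# model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)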


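def speech_file_to_array_fn(path, sampling_rate):
    # Load the waveform and resample it to the rate the feature extractor expects.
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    # squeeze() drops the channel dimension, so this assumes mono audio.
    speech = resampler(speech_array).squeeze().numpy()
    return speech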


def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}

    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to per-class probabilities and format them as percentages.
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{score * 100:.1f}%"}
               for i, score in enumerate(scores)]
    return outputs


path = "/path/to/disgust.wav"
outputs = predict(path, sampling_rate)    

Output:

[
    {'Emotion': 'anger', 'Score': '0.0%'},
    {'Emotion': 'disgust', 'Score': '99.2%'},
    {'Emotion': 'fear', 'Score': '0.1%'},
    {'Emotion': 'happiness', 'Score': '0.3%'},
    {'Emotion': 'sadness', 'Score': '0.5%'}
]
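
The percentages above come from a softmax over the model's logits. As a dependency-free sketch of that step: the logits below are invented for illustration, and the label order is copied from the example output rather than read from a real config.id2label.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_scores(logits, id2label):
    """Mirror the output format of predict(): one dict per label."""
    return [
        {"Emotion": id2label[i], "Score": f"{p * 100:.1f}%"}
        for i, p in enumerate(softmax(logits))
    ]


# Hypothetical values, not produced by a real model.
id2label = {0: "anger", 1: "disgust", 2: "fear", 3: "happiness", 4: "sadness"}
logits = [-3.0, 5.0, -2.0, -1.0, 0.0]

scores = format_scores(logits, id2label)

# Pick the most likely label from the raw probabilities.
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best])
```

Selecting the top label from the numeric probabilities, rather than from the formatted "Score" strings, avoids parsing percentages back into numbers.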

Demos

Demo | Link
Speech To Text With Emotion Recognition (Persian) - soon | huggingface.co/spaces/m3hrdadfi/speech-text-emotion

Models

Dataset | Model
ShEMO: a large-scale validated database for Persian speech emotion detection | m3hrdadfi/wav2vec2-xlsr-persian-speech-emotion-recognition
ShEMO: a large-scale validated database for Persian speech emotion detection | m3hrdadfi/hubert-base-persian-speech-emotion-recognition
ShEMO: a large-scale validated database for Persian speech emotion detection | m3hrdadfi/hubert-base-persian-speech-gender-recognition
Speech Emotion Recognition (Greek) (AESDD) | m3hrdadfi/hubert-large-greek-speech-emotion-recognition
Speech Emotion Recognition (Greek) (AESDD) | m3hrdadfi/hubert-base-greek-speech-emotion-recognition
Speech Emotion Recognition (Greek) (AESDD) | m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition
Eating Sound Collection | m3hrdadfi/wav2vec2-base-100k-eating-sound-collection
GTZAN Dataset - Music Genre Classification | m3hrdadfi/wav2vec2-base-100k-gtzan-music-genres
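
The checkpoints listed above come in two families, matching the two classes imported in the Prediction section. The helper below is a hypothetical convenience and not part of the repository: it guesses the family from the checkpoint name, which relies on the naming pattern seen in this list and would not hold for arbitrarily named models.

```python
def guess_model_mode(model_name_or_path):
    """Return "hubert" or "wav2vec2" based on the checkpoint name.

    Heuristic only: assumes the name contains the architecture, as the
    checkpoints listed in this README do.
    """
    # Look only at the final path component, e.g. "hubert-base-...".
    name = model_name_or_path.lower().rsplit("/", 1)[-1]
    if "hubert" in name:
        return "hubert"
    if "wav2vec2" in name:
        return "wav2vec2"
    raise ValueError(f"Cannot infer model mode from {model_name_or_path!r}")


for checkpoint in (
    "m3hrdadfi/hubert-base-persian-speech-emotion-recognition",
    "m3hrdadfi/wav2vec2-xlsr-greek-speech-emotion-recognition",
):
    print(checkpoint, "->", guess_model_mode(checkpoint))
```

The returned string could then be used to choose between HubertForSpeechClassification and Wav2Vec2ForSpeechClassification when loading a model for prediction.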