cvqluu / simple_diarizer

Licence: GPL-3.0 License

Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

Programming Languages

python


Projects that are alternatives of or similar to simple diarizer

kaldi-long-audio-alignment

Long audio alignment using Kaldi

Stars: ✭ 21 (-19.23%)

Mutual labels: speech-to-text, transcription, asr

leopard

On-device speech-to-text engine powered by deep learning

Stars: ✭ 354 (+1261.54%)

Mutual labels: speech-to-text, transcription, asr

asr24

24-hour Automatic Speech Recognition

Stars: ✭ 27 (+3.85%)

Mutual labels: transcription, asr

ASR-Audio-Data-Links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Stars: ✭ 179 (+588.46%)

Mutual labels: speech-to-text, asr

speechmatics-python

Python library and CLI for Speechmatics

Stars: ✭ 24 (-7.69%)

Mutual labels: speech-to-text, transcription

megs

A merged version of multiple open-source German speech datasets.

Stars: ✭ 21 (-19.23%)

Mutual labels: speech-to-text, asr

wav2vec2-live

Live speech recognition using Facebook's wav2vec 2.0 model.

Stars: ✭ 205 (+688.46%)

Mutual labels: speech-to-text, asr

speech-recognition-evaluation

Evaluate results from ASR/Speech-to-Text quickly

Stars: ✭ 25 (-3.85%)

Mutual labels: speech-to-text, asr

Speecht

Open-source speech-to-text software written in TensorFlow

Stars: ✭ 152 (+484.62%)

Mutual labels: speech-to-text, asr

speech-to-text

Python helper for Google and IBM Watson speech-to-text cloud APIs.

Stars: ✭ 14 (-46.15%)

Mutual labels: speech-to-text, transcription

vosk-asterisk

Speech Recognition in Asterisk with Vosk Server

Stars: ✭ 52 (+100%)

Mutual labels: speech-to-text, asr

Kerasdeepspeech

A Keras CTC implementation of Baidu's DeepSpeech for model experimentation

Stars: ✭ 245 (+842.31%)

Mutual labels: speech-to-text, asr

Edgedict

Working online speech recognition based on RNN Transducer. (Trained model available in the releases.)

Stars: ✭ 205 (+688.46%)

Mutual labels: speech-to-text, asr

Lingvo

Stars: ✭ 2,361 (+8980.77%)

Mutual labels: speech-to-text, asr

spokestack-ios

Spokestack: give your iOS app a voice interface!

Stars: ✭ 27 (+3.85%)

Mutual labels: speech-to-text, asr

react-native-spokestack

Spokestack: give your React Native app a voice interface!

Stars: ✭ 53 (+103.85%)

Mutual labels: speech-to-text, asr

Asr audio data links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Stars: ✭ 128 (+392.31%)

Mutual labels: speech-to-text, asr

Speech To Text Russian

A project for Russian speech recognition based on pykaldi.

Stars: ✭ 151 (+480.77%)

Mutual labels: speech-to-text, asr

PCPM

Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.

Stars: ✭ 21 (-19.23%)

Mutual labels: speech-to-text, asr

kaldi helpers

🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.

Stars: ✭ 13 (-50%)

Mutual labels: speech-to-text, transcription


simple_diarizer

Simplified diarization pipeline using some pretrained models.

Made to be as simple as possible to go from an input audio file to diarized segments.

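import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

# Placeholders: set these to your own audio file and expected speaker count
WAV_FILE = "audio.wav"
NUM_SPEAKERS = 2

diar = Diarizer(
    embed_model='xvec',   # 'xvec' and 'ecapa' supported
    cluster_method='sc',  # 'ahc' and 'sc' supported
)

# Run the diarization pipeline on the audio file
segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

# Plot the waveform together with the diarized segments
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()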
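
Once the segments are available, a common next step is to collapse consecutive segments from the same speaker into turns and print them. The sketch below assumes each segment is a dict with "start", "end" (in seconds) and "label" keys; that layout is an assumption here, so check the actual output of diar.diarize before relying on it. The helper names merge_adjacent and format_segments and the max_gap parameter are hypothetical and not part of simple_diarizer.

```python
def merge_adjacent(segments, max_gap=0.5):
    """Merge consecutive segments that share a speaker label.

    Assumes each segment is a dict with "start", "end" (seconds) and "label".
    Segments from the same speaker separated by at most max_gap seconds
    are joined into a single turn.
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["label"] == seg["label"]
            and seg["start"] - last["end"] <= max_gap
        ):
            # Same speaker and a short pause: extend the current turn
            last["end"] = max(last["end"], seg["end"])
        else:
            merged.append(
                {"start": seg["start"], "end": seg["end"], "label": seg["label"]}
            )
    return merged


def format_segments(segments):
    """Return one human-readable line per segment."""
    return [
        f"{s['start']:7.2f}s - {s['end']:7.2f}s  speaker {s['label']}"
        for s in segments
    ]


# Hand-written stand-in for the output of diar.diarize(...)
example = [
    {"start": 0.0, "end": 2.1, "label": 0},
    {"start": 2.3, "end": 4.0, "label": 0},
    {"start": 4.2, "end": 6.5, "label": 1},
]

turns = merge_adjacent(example)
for line in format_segments(turns):
    print(line)
```

With real output, replace example with the segments returned by diar.diarize.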

Install

The package is available on PyPI:

pip install simple-diarizer

Source Video

"Some Quick Advice from Barack Obama!"

Pre-trained Models

The following pretrained models are used:

- Voice Activity Detection (VAD)
  - Silero VAD
- Deep speaker embedding extraction
  - SpeechBrain
    - X-Vector
    - ECAPA-TDNN
- (Optional/Experimental) Speech-to-text
  - ESPnet Model Zoo
    - English ASR model
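
To illustrate what happens after embedding extraction, here is a minimal sketch of agglomerative hierarchical clustering (the idea behind the 'ahc' option) on toy speaker embeddings. It is not the library's implementation: the function names, the average-linkage choice, the cosine distance, and the 2-dimensional toy vectors are all assumptions made for illustration. Real X-Vector or ECAPA-TDNN embeddings have many more dimensions.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def ahc(embeddings, num_speakers):
    """Average-linkage agglomerative clustering down to num_speakers clusters.

    Returns one integer label per embedding.
    """
    # Start with every embedding in its own cluster
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        dists = [
            cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > num_speakers:
        # Find the closest pair of clusters and merge them
        a, b = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy embeddings: three windows from one "speaker", two from another
toy = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (1.0, 0.0)]
labels = ahc(toy, num_speakers=2)
print(labels)
```

Each label corresponds to one speech window, so mapping the labels back to window timestamps is what would yield diarized segments.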

Demo

The demo at the link above attempts to diarize any input YouTube URL. It also uses YouTube's autogenerated transcriptions to produce a speaker-labelled transcript.

Hopefully this can be of use as a free basic tool to produce a diarized transcript of a video/audio of interest.
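
One plausible way to build a speaker-labelled transcript is to give each timed transcript cue the label of the diarized segment it overlaps most. The sketch below shows that idea only; it is not taken from the demo's code. The cue layout ("start", "end", "text"), the segment layout ("start", "end", "label"), and the function names are assumptions.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_transcript(cues, segments):
    """Attach a "speaker" key to each cue using maximum time overlap.

    Cues that overlap no segment get speaker None.
    """
    labelled = []
    for cue in cues:
        best_label, best_overlap = None, 0.0
        for seg in segments:
            ov = overlap(cue["start"], cue["end"], seg["start"], seg["end"])
            if ov > best_overlap:
                best_label, best_overlap = seg["label"], ov
        labelled.append({**cue, "speaker": best_label})
    return labelled


# Hand-written stand-ins for transcript cues and diarized segments
cues = [
    {"start": 0.0, "end": 1.8, "text": "hello there"},
    {"start": 4.5, "end": 6.0, "text": "hi, thanks for having me"},
    {"start": 9.0, "end": 9.5, "text": "(music)"},
]
segments = [
    {"start": 0.0, "end": 4.0, "label": 0},
    {"start": 4.2, "end": 6.5, "label": 1},
]

transcript = label_transcript(cues, segments)
for cue in transcript:
    print(f"[speaker {cue['speaker']}] {cue['text']}")
```

Cues that fall entirely in non-speech regions keep a speaker of None, which makes them easy to filter out or flag.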

Other References

Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster

Planned Features
