pbcquoc / voice_zaloai

Licence: other

Identifying gender and regional accent from speech

Zalo AI Challenge

Introduction

The Zalo AI Challenge is the first AI competition organized by Zalo. Its tasks involve audio, images, and other kinds of data. This source code is a basic walkthrough showing how to extract MFCC and chroma features with librosa using multiple processes, together with the simplest possible LSTM model, which reaches about 67% on the public leaderboard. In general, you can approach this problem with CNNs and LSTMs as well as tree-based models such as XGBoost.

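Feature Extraction

I extract several kinds of features:

MFCC
spectral centroid
chroma STFT
spectral contrast

These features are extracted with a hop length of 512 samples (librosa's hop_length is measured in samples, not milliseconds). I keep only the first 3 s, which corresponds to 128 timesteps: 128 × 512 samples is about 2.97 s, assuming librosa's default sample rate of 22,050 Hz. The features are then concatenated, and clips shorter than 3 s are padded.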
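The extraction step described above could look roughly like the sketch below. It is not the code from make_data.py: the function names, the 20 MFCC coefficients, the 22,050 Hz sample rate, zero-padding at the end of the clip, and the worker count are all assumptions made for illustration. Only the four feature types, the hop length of 512, the 128-timestep limit, and the use of multiple processes come from the description.

```python
from multiprocessing import Pool

import numpy as np

HOP_LENGTH = 512      # frame step in samples, as described above
MAX_FRAMES = 128      # about 3 s of audio at 22,050 Hz
SAMPLE_RATE = 22050   # assumption: librosa's default sample rate


def fix_length(features, max_frames=MAX_FRAMES):
    """Truncate or zero-pad a (frames, n_features) array to max_frames rows."""
    features = features[:max_frames]
    missing = max_frames - features.shape[0]
    if missing > 0:
        # Pad only along the time axis, at the end of the clip.
        features = np.pad(features, ((0, missing), (0, 0)))
    return features


def extract_features(path):
    """Load one audio file and return a (MAX_FRAMES, n_features) array."""
    import librosa  # imported lazily so fix_length works without librosa

    y, sr = librosa.load(path, sr=SAMPLE_RATE, duration=3.0)
    parts = [
        # n_mfcc=20 is an assumption; the README does not state it.
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, hop_length=HOP_LENGTH),
        librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=HOP_LENGTH),
        librosa.feature.chroma_stft(y=y, sr=sr, hop_length=HOP_LENGTH),
        librosa.feature.spectral_contrast(y=y, sr=sr, hop_length=HOP_LENGTH),
    ]
    # Each part has shape (n_features_i, frames); stack them along the
    # feature axis, then transpose to (frames, n_features) for the LSTM.
    stacked = np.concatenate(parts, axis=0).T
    return fix_length(stacked)


def extract_all(paths, workers=4):
    """Extract features for many files in parallel worker processes."""
    with Pool(workers) as pool:
        return np.stack(pool.map(extract_features, paths))
```

Calling extract_all on a list of audio paths would return an array of shape (number of files, 128, number of features). On platforms that spawn rather than fork processes, that call should sit under an `if __name__ == "__main__":` guard.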

Model

I use a simple two-layer LSTM. The features aggregated at the last timestep are passed through a softmax to predict the label of each sample: north/central/south for accent, and male/female for gender. Training takes about 10 s per epoch.
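To make the data flow concrete, here is a minimal NumPy sketch of a forward pass through two stacked LSTM layers, where only the hidden state of the last timestep feeds the softmax. It is not the model in lstm.py, which would normally be built and trained in a deep learning framework. The hidden size of 64, the 40 input features, the random untrained weights, and all function names are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, input_dim, hidden_dim):
    """Random weights for one LSTM layer; gates are stacked as [i, f, g, o]."""
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W": rng.uniform(-scale, scale, (input_dim, 4 * hidden_dim)),
        "U": rng.uniform(-scale, scale, (hidden_dim, 4 * hidden_dim)),
        "b": np.zeros(4 * hidden_dim),
    }


def lstm_forward(x, params):
    """Run one LSTM layer over x of shape (batch, time, input_dim).

    Returns the hidden states for every timestep,
    with shape (batch, time, hidden_dim).
    """
    batch, steps, _ = x.shape
    hidden_dim = params["U"].shape[0]
    h = np.zeros((batch, hidden_dim))
    c = np.zeros((batch, hidden_dim))
    outputs = []
    for t in range(steps):
        z = x[:, t] @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # update hidden state
        outputs.append(h)
    return np.stack(outputs, axis=1)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict_proba(x, layer1, layer2, w_out, b_out):
    """Two stacked LSTM layers, then softmax on the last timestep only."""
    h1 = lstm_forward(x, layer1)
    h2 = lstm_forward(h1, layer2)
    last = h2[:, -1]  # summary of the whole sequence
    return softmax(last @ w_out + b_out)


rng = np.random.default_rng(0)
n_features, hidden_dim, n_classes = 40, 64, 3  # 3 accents; use 2 for gender
layer1 = init_lstm(rng, n_features, hidden_dim)
layer2 = init_lstm(rng, hidden_dim, hidden_dim)
w_out = rng.normal(0.0, 0.1, (hidden_dim, n_classes))
b_out = np.zeros(n_classes)

batch = rng.normal(size=(4, 128, n_features))  # 4 clips, 128 timesteps each
probs = predict_proba(batch, layer1, layer2, w_out, b_out)
```

Each row of probs is a probability distribution over the classes. Since gender and accent are trained as separate models, the same structure would be instantiated twice, once with two output classes and once with three.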

Training

Download the train and test sets and place them in the folders expected by the code, then run the following command to build the training and test data. The data is saved to the configured directory and used to train the model.

python make_data.py

Once the data has been created, run the following command to train the model. I train separate models for gender and for accent. After about 600 epochs, validation accuracy is 96% for gender and 85% for accent. On the public leaderboard you should get about 67.8%, which places in the top 10.

python lstm.py

Results
