
JusperLee / Looking To Listen At The Cocktail Party

License: MIT

Runnable code based on Google's paper

Programming Languages

python

Labels

audio


Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

This project is a reimplementation of the audio-visual model described in the paper Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation.

Ephrat A, Mosseri I, Lang O, et al. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation. arXiv preprint arXiv:1804.03619, 2018.

Requirements

Python 3.7
TensorFlow 2.0.0
Keras 2.3.1
librosa 0.7.0
youtube-dl (https://github.com/ytdl-org/youtube-dl), any version
ffmpeg (https://www.ffmpeg.org/), any version
sox

To install requirements:

pip install -r requirements.txt

You can install ffmpeg and sox with Homebrew:

brew install ffmpeg
brew install sox

Preprocessing

Video Data

Download the dataset from here and place files in data/csv.
First, run this command to download the YouTube videos and use ffmpeg to extract each 3-second clip as 75 images (25 frames per second).

python3 video_download.py
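As a rough illustration of the frame-extraction step, the sketch below builds an ffmpeg command that cuts a 3-second clip and samples it at 25 frames per second, which yields the 75 images mentioned above. The function names, file paths, and output pattern are assumptions for illustration; video_download.py may do this differently.

```python
import subprocess

CLIP_SECONDS = 3
FPS = 25  # 3 s * 25 fps = 75 frames


def build_frame_cmd(video_path, start_sec, out_pattern):
    """Return an ffmpeg argument list that extracts 75 frames from a 3 s clip.

    All paths are hypothetical; adapt them to your own directory layout.
    """
    return [
        "ffmpeg",
        "-ss", str(start_sec),    # seek to the start of the clip
        "-t", str(CLIP_SECONDS),  # read only 3 seconds of input
        "-i", video_path,
        "-vf", f"fps={FPS}",      # resample to 25 frames per second
        out_pattern,              # e.g. "frames/clip0_%02d.jpg"
    ]


def extract_frames(video_path, start_sec, out_pattern):
    """Run ffmpeg (it must be on PATH) and raise if the command fails."""
    subprocess.run(build_frame_cmd(video_path, start_sec, out_pattern), check=True)
```

Building the argument list separately from running it makes the command easy to inspect or log before any video is processed.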

Then use MTCNN to detect the face bounding boxes in each image, and use the x, y coordinates from the CSV to locate the face center point.

pip install mtcnn
python3 face_detected.py
python3 check_vaild_face.py
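One way the CSV coordinates could be combined with the detector output is sketched below: among the boxes found in an image, pick the one whose center lies closest to the CSV point. It assumes boxes in the [x, y, width, height] pixel format that the mtcnn package reports under the "box" key, and it assumes the CSV x, y values are fractions of the image width and height; check both against your data. pick_face is a hypothetical helper, not a function from this repository.

```python
def pick_face(boxes, cx_frac, cy_frac, frame_w, frame_h):
    """Return the box whose center is nearest the CSV face-center point.

    boxes: list of [x, y, width, height] in pixels.
    cx_frac, cy_frac: CSV coordinates, assumed to lie in [0, 1].
    Returns None when no face was detected.
    """
    if not boxes:
        return None
    target_x = cx_frac * frame_w
    target_y = cy_frac * frame_h

    def squared_distance(box):
        x, y, w, h = box
        center_x = x + w / 2
        center_y = y + h / 2
        return (center_x - target_x) ** 2 + (center_y - target_y) ** 2

    return min(boxes, key=squared_distance)
```

With the mtcnn package, the boxes list could be gathered as [d["box"] for d in MTCNN().detect_faces(image)] before calling pick_face.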

Audio Data

For the audio, use youtube-dl to download the audio track, then resample it to 16000 Hz with the librosa library. Finally, normalize the audio data.

python3 audio_downloads.py
python3 audio_norm.py # audio_data normalized
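A minimal sketch of the resample-and-normalize step follows. librosa.load with sr=16000 resamples while loading; the peak normalization shown here is an assumption, since this README does not say which normalization audio_norm.py applies. The function names are illustrative.

```python
import numpy as np

SAMPLE_RATE = 16000


def peak_normalize(wave, eps=1e-8):
    """Scale a waveform so its largest absolute sample is 1.0 (assumed scheme)."""
    wave = np.asarray(wave, dtype=np.float32)
    if wave.size == 0:
        return wave
    peak = np.max(np.abs(wave))
    if peak < eps:  # silent clip: nothing to scale
        return wave
    return wave / peak


def load_normalized(path):
    """Load audio as mono at 16 kHz and peak-normalize it."""
    import librosa  # imported lazily so peak_normalize works without it

    wave, _ = librosa.load(path, sr=SAMPLE_RATE)
    return peak_normalize(wave)
```

Other schemes, such as scaling to a fixed RMS level, would slot into the same place if the clips need consistent loudness rather than a consistent peak.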

Preprocess the audio data: compute the STFT, apply power-law compression, mix the clips, generate the complex masks, and so on.

python3 audio_data.py
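The sketch below illustrates three of these steps on complex spectrograms: power-law compression, mixing, and a complex ratio mask. It assumes the inputs are already complex STFT arrays (for example from librosa.stft), uses the exponent 0.3 reported in the paper, and defines the mask as the element-wise complex ratio of the clean spectrogram to the mixed one. The function names are hypothetical, and audio_data.py may differ in detail.

```python
import numpy as np

POWER = 0.3  # compression exponent reported in the paper


def power_law(spec, power=POWER):
    """Compress magnitudes as |S|**power while keeping the phase."""
    spec = np.asarray(spec, dtype=np.complex64)
    return (np.abs(spec) ** power) * np.exp(1j * np.angle(spec))


def complex_ratio_mask(clean, mix, eps=1e-8):
    """Element-wise complex ratio clean / mix, stacked as (real, imag)."""
    clean = np.asarray(clean, dtype=np.complex64)
    mix = np.asarray(mix, dtype=np.complex64)
    ratio = clean * np.conj(mix) / (np.abs(mix) ** 2 + eps)
    return np.stack([ratio.real, ratio.imag], axis=-1)


def apply_mask(mask, mix):
    """Estimate a clean spectrogram from a (real, imag) mask and the mix."""
    mix = np.asarray(mix, dtype=np.complex64)
    return (mask[..., 0] + 1j * mask[..., 1]) * mix


# Two toy single-frame "speakers"; the STFT is linear, so summing
# uncompressed spectrograms matches mixing the waveforms first.
speaker_1 = np.array([[1 + 1j, 2 - 1j]])
speaker_2 = np.array([[0.5 - 0.5j, -1 + 2j]])
mixed = power_law(speaker_1 + speaker_2)
mask_1 = complex_ratio_mask(power_law(speaker_1), mixed)
```

Multiplying mask_1 back onto mixed with apply_mask recovers the compressed spectrogram of the first speaker, which is the relationship the network is trained to predict.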

Face Embedding Features

Here we use Google's FaceNet method to map face images to a high-dimensional Euclidean space. This project uses David Sandberg's open-source pretrained FaceNet model "20180402-114759", converted to Keras format with the Tensorflow_to_Keras.py script in this project (Model/face_embedding/).

Schroff F, Kalenichenko D, Philbin J. FaceNet: A unified embedding for face recognition and clustering. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 815-823.

Set the tf_model_dir path in Tensorflow_to_Keras.py to your model directory, then run:

python3 Tensorflow_to_Keras.py
python3 face_emb.py

Create AVdataset_train.txt and AVdataset_val.txt

python3 AV_data_log.py

Training

Supports resuming training after an interruption.
Supports multi-GPU, multi-process training.
According to the description in the paper, set the following parameters:

people_num = 2 # number of speakers to separate
epochs = 100
initial_epoch = 0
batch_size = 1 # batch sizes of 2 or 4 need a GPU
gamma_loss = 0.1
beta_loss = gamma_loss * 2

Then run the train.py script to start training.
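Resuming after an interruption usually means finding the last saved epoch and passing it as initial_epoch. The sketch below recovers that number from checkpoint file names. The naming pattern is a hypothetical example, not necessarily the one train.py uses, and latest_epoch is not a function from this repository.

```python
import re

# Hypothetical checkpoint naming scheme, e.g. "AVmodel-2p-015-0.52.h5".
CHECKPOINT_RE = re.compile(r"-(\d{3})-[\d.]+\.h5$")


def latest_epoch(filenames):
    """Return (epoch, filename) of the newest checkpoint, or (0, None)."""
    best_epoch, best_name = 0, None
    for name in filenames:
        match = CHECKPOINT_RE.search(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_name = int(match.group(1)), name
    return best_epoch, best_name
```

Keras Model.fit accepts initial_epoch and epochs arguments, so the returned epoch could be passed there after loading the weights from the returned file.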

Planned Work

[ ] Implement in PyTorch
[ ] Provide a trained model
[ ] Improve code style
[ ] ......

Part of the code is based on this GitHub repository: https://github.com/bill9800/speech_separation
