
mayurnewase / looking-to-listen-at-cocktail-party

Licence: other
Looking to listen at cocktail party

Programming Languages

Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to looking-to-listen-at-cocktail-party

video-audio-tools
To process/edit video and audio with Python + FFmpeg. [Simple and practical] Video and audio processing/editing based on Python + FFmpeg.
Stars: ✭ 164 (+396.97%)
Mutual labels:  video-processing, audio-processing
Auto Editor
Auto-Editor: Effort free video editing!
Stars: ✭ 382 (+1057.58%)
Mutual labels:  video-processing, audio-processing
DuME
A fast, versatile, easy-to-use and cross-platform Media Encoder based on FFmpeg
Stars: ✭ 66 (+100%)
Mutual labels:  video-processing, audio-processing
eloquent-ffmpeg
High-level API for FFmpeg's Command Line Tools
Stars: ✭ 71 (+115.15%)
Mutual labels:  video-processing, audio-processing
Avdemo
Demo projects for iOS Audio & Video development.
Stars: ✭ 136 (+312.12%)
Mutual labels:  video-processing, audio-processing
lecture-demos
Demonstrations for the interactive exploration of selected core concepts of audio, image and video processing as well as related topics
Stars: ✭ 12 (-63.64%)
Mutual labels:  video-processing, audio-processing
Vectorhub
Vector Hub - Library for easy discovery, and consumption of State-of-the-art models to turn data into vectors. (text2vec, image2vec, video2vec, graph2vec, bert, inception, etc)
Stars: ✭ 317 (+860.61%)
Mutual labels:  video-processing, audio-processing
ion-avp
Audio/Video Processing Service
Stars: ✭ 55 (+66.67%)
Mutual labels:  video-processing, audio-processing
Video2description
Video to Text: Generates description in natural language for given video (Video Captioning)
Stars: ✭ 107 (+224.24%)
Mutual labels:  video-processing, audio-processing
Arcan
Arcan - [Display Server, Multimedia Framework, Game Engine] -> "Desktop Engine"
Stars: ✭ 885 (+2581.82%)
Mutual labels:  video-processing, audio-processing
Mlt
MLT Multimedia Framework
Stars: ✭ 836 (+2433.33%)
Mutual labels:  video-processing, audio-processing
Unsilence
Console Interface and Library to remove silent parts of a media file 🔈
Stars: ✭ 197 (+496.97%)
Mutual labels:  video-processing, audio-processing
Mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
Stars: ✭ 15,338 (+46378.79%)
Mutual labels:  video-processing, audio-processing
ffcvt
ffmpeg convert wrapper tool
Stars: ✭ 32 (-3.03%)
Mutual labels:  video-processing, audio-processing
laav
Asynchronous Audio / Video Library for H264 / MJPEG / OPUS / AAC / MP2 encoding, transcoding, recording and streaming from live sources
Stars: ✭ 50 (+51.52%)
Mutual labels:  video-processing
RTspice
A real-time netlist based audio circuit plugin
Stars: ✭ 51 (+54.55%)
Mutual labels:  audio-processing
dspjargon
All the jargon you need to understand the world of Digital Signal Processing.
Stars: ✭ 37 (+12.12%)
Mutual labels:  audio-processing
RS-MET
Codebase for RS-MET products (Robin Schmidt's Music Engineering Tools)
Stars: ✭ 32 (-3.03%)
Mutual labels:  audio-processing
tsunami
A simple but powerful audio editor
Stars: ✭ 41 (+24.24%)
Mutual labels:  audio-processing
pyanime4k
An easy way to use anime4k in python
Stars: ✭ 80 (+142.42%)
Mutual labels:  video-processing


This is a Keras + TensorFlow implementation of the paper "Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation" by Ephrat et al. from Google Research. The project also uses ideas from the paper "Seeing Through Noise: Visually Driven Speaker Separation and Enhancement".

Compatibility

The code was tested with TensorFlow 1.13.1 on Ubuntu 18.04 with Python 3.6.

News

Date        Update
26-06-2019  Ready-made datasets removed from the Kaggle server due to a storage issue; please make your own with the scripts.
08-06-2019  Notebook added for the full pipeline with a pretrained model.
25-05-2019  Datasets added for mixed-speaker videos.
23-04-2019  Added automated scripts for creating the database structure.

External Dependencies

This repo uses code from facenet and face_recognition to track faces in videos and extract features from them.

Usage

Database structure

The layout below stores the audio and video datasets efficiently, with minimal duplication.

|--speaker_background_spectrograms/
|  |--per speaker part 1/
|  |  |--speaker_clean.pkl
|  |  |--speaker_chatter_i.pkl
|  |--per speaker part 2/
|  |  |--speaker_clean.pkl
|  |  |--speaker_chatter_i.pkl
|--two_speakers_mix_spectrograms/
|  |--per speaker/
|  |  |--clean.pkl
|  |  |--mix_with_other_i.pkl
|--speaker_video_spectrograms/
|  |--per_speaker part 1/
|  |  |--clean.pkl
|  |--per_speaker part 2/
|  |  |--clean.pkl
|--chatter audios/
|  |--part1/
|  |--part2/
|  |--part3/
|--clean audios/
|  |--videos/
|  |--frames/
|  |--pretrained_model/
|  |  |--facenet_model.h5

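The .pkl files above hold pickled spectrogram arrays. As a minimal sketch of reading one back (the array shape and dtype here are assumptions for illustration, not documented by the repo; the real files are produced by the data-preparation scripts):

```python
import pickle
import numpy as np

# For a self-contained demo, write a dummy spectrogram first.
# The shape (time, freq, 2) for real/imaginary parts is an assumption.
dummy = np.zeros((298, 257, 2), dtype=np.float32)
with open("speaker_clean.pkl", "wb") as f:
    pickle.dump(dummy, f)

# Loading a spectrogram, e.g. from
# speaker_background_spectrograms/per speaker part 1/speaker_clean.pkl
with open("speaker_clean.pkl", "rb") as f:
    spectrogram = pickle.load(f)
print(spectrogram.shape)  # (298, 257, 2)
```
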
Getting started

1. Install all dependencies

pip install -r requirements.txt

2. Run the prepare_directory script

./data/prepare_directory.sh

3. Download the AVSpeech train and test CSV files and put them in data/

4. Run the background-chatter downloader and slicer to download and slice the chatter files. This will download chatter files with the tag "/m/07rkbfh" from AudioSet

python data/chatter_download.py
python data/chatter_slicer.py
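
The downloader selects AudioSet segments carrying the "/m/07rkbfh" label. A rough sketch of that filtering step, assuming the standard AudioSet segment-list CSV format (YTID, start_seconds, end_seconds, positive_labels) — the helper below is illustrative, not the repo's actual script:

```python
import csv
import io

# Tiny stand-in for an AudioSet segment list such as balanced_train_segments.csv
SAMPLE = '''\
abc123, 30.000, 40.000, "/m/09x0r,/m/07rkbfh"
def456, 10.000, 20.000, "/m/04rlf"
ghi789, 0.000, 10.000, "/m/07rkbfh"
'''

CHATTER = "/m/07rkbfh"  # label id used by chatter_download.py

def chatter_segments(text):
    """Yield (ytid, start, end) for rows whose labels include the chatter tag."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for ytid, start, end, labels in reader:
        if CHATTER in labels.split(","):
            yield ytid, float(start), float(end)

segments = list(chatter_segments(SAMPLE))
print(segments)  # [('abc123', 30.0, 40.0), ('ghi789', 0.0, 10.0)]
```
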

5. Start downloading the AVSpeech dataset and process it as configured by the arguments.

python data/data_data_download.py --from_id=0 --to_id=1000 --type_of_dataset=audio_dataset

Arguments available

from_id -> start downloading YouTube clips from train.csv at this id

to_id -> download YouTube clips from train.csv up to this id

type_of_dataset -> type of dataset to prepare.
  audio_dataset -> create audio spectrograms mixed with background chatter
  audio_video_dataset -> create audio spectrograms, video embeddings, and spectrograms of the speaker mixed with other speakers' audio

low_memory -> clear unnecessary intermediate data to save memory

chatter_part -> use different slots of chatter files to be mixed with the clean speaker audio

sample_rate, duration, fps, mono, window, stride, fft_length, amp_norm, chatter_norm -> arguments for the STFT and audio processing

face_extraction_model -> select which model to use for facial embedding extraction
  hog -> faster on CPU but less accurate
  cnn -> slower on CPU, faster on an NVIDIA GPU, more accurate
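
The STFT arguments above (window, stride, fft_length, etc.) control how waveforms become spectrograms. A naive numpy sketch of that step, with illustrative parameter values (the repo's actual defaults and windowing may differ):

```python
import numpy as np

def stft(signal, window=400, stride=160, fft_length=512):
    """Naive STFT: slide a Hann window over the signal and FFT each frame."""
    win = np.hanning(window)
    n_frames = 1 + (len(signal) - window) // stride
    frames = np.stack([signal[i * stride : i * stride + window] * win
                       for i in range(n_frames)])
    # rfft keeps the fft_length // 2 + 1 non-redundant frequency bins
    return np.fft.rfft(frames, n=fft_length, axis=1)

sample_rate, duration = 16000, 3          # illustrative values
wave = np.random.randn(sample_rate * duration)
spec = stft(wave)
print(spec.shape)  # (298, 257) == (n_frames, fft_length // 2 + 1)
```
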

Datasets

  1. The video-mixed dataset is available on my Kaggle page in 10 parts (created using the default parameters above).
Go to my Kaggle profile (https://www.kaggle.com/mayurnewase)
Click on Datasets
Sort by new
The datasets are named mix_speakers_ultimate_*
All 10 parts are available.

To do

Check here

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].