TaoRuijie / TalkNet_ASD

License: MIT
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Programming Languages

python

Projects that are alternatives of or similar to TalkNet ASD

emacs-application-framework
EAF, an extensible framework that revolutionizes the graphical capabilities of Emacs
Stars: ✭ 2,454 (+2456.25%)
Mutual labels:  multimedia
gstcefsrc
A simple gstreamer wrapper around Chromium Embedded Framework
Stars: ✭ 46 (-52.08%)
Mutual labels:  multimedia
OCamlSDL2
OCaml interface to SDL 2.0 (for Linux, Windows, MacOS, and ChromeBook)
Stars: ✭ 42 (-56.25%)
Mutual labels:  multimedia
conan-sfml
[OBSOLETE] The recipe is now in https://github.com/bincrafters/community
Stars: ✭ 13 (-86.46%)
Mutual labels:  multimedia
UDLF
An Unsupervised Distance Learning Framework for Multimedia Retrieval
Stars: ✭ 40 (-58.33%)
Mutual labels:  multimedia
pd-lua
Lua bindings for Pd, updated for Lua 5.3+
Stars: ✭ 20 (-79.17%)
Mutual labels:  multimedia
RSS-to-Telegram-Bot
A Telegram RSS bot that cares about your reading experience
Stars: ✭ 482 (+402.08%)
Mutual labels:  multimedia
AVSD-DSTC10 Official
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
Stars: ✭ 22 (-77.08%)
Mutual labels:  audio-visual
rtsp-types
RTSP (RFC 7826) types and parsers/serializers
Stars: ✭ 16 (-83.33%)
Mutual labels:  multimedia
cottontaildb
Cottontail DB is a column store aimed at multimedia retrieval. It allows for classical boolean as well as vector-space retrieval (nearest neighbour search) used in similarity search using a unified data and query model.
Stars: ✭ 16 (-83.33%)
Mutual labels:  multimedia
HumanRecognition
Person Recognition System on PIPA dataset
Stars: ✭ 28 (-70.83%)
Mutual labels:  multimedia
ebml-go
A pure Go implementation of bi-directional EBML encoder/decoder
Stars: ✭ 60 (-37.5%)
Mutual labels:  multimedia
Modaily-Aware-Audio-Visual-Video-Parsing
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
Stars: ✭ 19 (-80.21%)
Mutual labels:  audio-visual
smk
SMK - Simple multimedia kit - C++ WebAssembly
Stars: ✭ 89 (-7.29%)
Mutual labels:  multimedia
blackcrownproject
The archive of The Black Crown Project, a now-dismembered narrative web game.
Stars: ✭ 18 (-81.25%)
Mutual labels:  multimedia
awesome-puredata
A list of Pure Data libraries, abstractions, projects and presentations
Stars: ✭ 36 (-62.5%)
Mutual labels:  multimedia
SSffmpegVideoOperation
This is a library of FFmpeg for android... 📸 🎞 🚑
Stars: ✭ 261 (+171.88%)
Mutual labels:  multimedia
awesome-vlc
👻 A curated list of awesome VLC and LibVLC resources.
Stars: ✭ 45 (-53.12%)
Mutual labels:  multimedia
AudioVisualSceneAwareDialog
No description or website provided.
Stars: ✭ 22 (-77.08%)
Mutual labels:  audio-visual
anchovy
D language multimedia library for games and gui applications
Stars: ✭ 22 (-77.08%)
Mutual labels:  multimedia

Is Someone Talking? TalkNet: Audio-Visual Active Speaker Detection Model

This repository contains the code for our ACM MM 2021 (oral) paper on TalkNet, an active speaker detection model that determines whether the face on screen is speaking or not. [Paper] [Video_English] [Video_Chinese]

(Figure: overall.png — overview of the TalkNet framework)

  • Awesome ASD: papers about active speaker detection from recent years.

  • TalkNet on the AVA-ActiveSpeaker dataset: the code to preprocess the AVA-ActiveSpeaker dataset, train TalkNet on the AVA train set, and evaluate it on the AVA val/test sets.

  • TalkNet on TalkSet and the Columbia ASD dataset: the code to generate TalkSet, an in-the-wild ASD dataset based on VoxCeleb2 and LRS3, train TalkNet on TalkSet, and evaluate it on the Columbia ASD dataset.

  • An ASD demo with the pretrained TalkNet model: an end-to-end script to detect and mark the speaking face with the pretrained TalkNet model.


Dependencies

Start by building the environment

conda create -n TalkNet python=3.7.9 anaconda
conda activate TalkNet
pip install -r requirement.txt

Start from an existing environment

pip install -r requirement.txt
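
After installation, a quick sanity check (a minimal sketch; it assumes PyTorch is one of the packages pulled in by requirement.txt) can confirm that the framework and the GPU are visible:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"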

TalkNet on the AVA-ActiveSpeaker dataset

Data preparation

The following script can be used to download and prepare the AVA dataset for training.

python trainTalkNet.py --dataPathAVA AVADataPath --download 

AVADataPath is the folder where the AVA dataset and its preprocessing outputs will be saved; the details can be found here. Please read them carefully.

Training

Then you can train TalkNet on AVA end-to-end with:

python trainTalkNet.py --dataPathAVA AVADataPath

The outputs are exps/exps1/score.txt (the score file), exps/exp1/model/model_00xx.model (the trained models), and exps/exps1/val_res.csv (the predictions for the val set).
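
As a quick check that training actually produced checkpoints, you can list the saved models (a minimal sketch; the folder path follows the layout above and the glob pattern is an assumption):

import glob

# List the checkpoints saved during training (path layout assumed from the description above)
ckpts = sorted(glob.glob("exps/exp1/model/model_*.model"))
print(f"{len(ckpts)} checkpoints found; latest: {ckpts[-1] if ckpts else 'none'}")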

Pretrained model

Our pretrained model achieves mAP 92.3 on the validation set; you can verify this with:

python trainTalkNet.py --dataPathAVA AVADataPath --evaluation

The pretrained model will automatically be downloaded into TalkNet_ASD/pretrain_AVA.model. It achieves mAP 90.8 on the test set.


TalkNet on TalkSet and the Columbia ASD dataset

Data preparation

We find it challenging to apply the model trained on AVA to videos outside AVA (the reason is here, Q3.1). So we built TalkSet, an active speaker detection dataset in the wild, based on VoxCeleb2 and LRS3.

We do not plan to upload this dataset, since we only modified existing data rather than collecting it. The TalkSet folder provides .txt files describing which files we used to generate TalkSet and their ASD labels. You can generate TalkSet yourself if you are interested in training an ASD model in the wild.

We also provide our TalkNet model pretrained on TalkSet. You can evaluate it on the Columbia ASD dataset or on other raw videos in the wild.

Usage

A model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model when you run the following script:

python demoTalkNet.py --evalCol --colSavePath colDataPath

The Columbia ASD dataset and its labels will also be downloaded into colDataPath. Finally, you should get the following F1 results.

Name   Bell   Boll   Lieb   Long   Sick   Avg.
F1     98.1   88.8   98.7   98.0   97.7   96.3

(These results differ from those in our paper because we retrained the model, but the average F1 is very similar.)
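
As a quick arithmetic check, the reported average is simply the mean of the five per-speaker F1 scores:

python -c "print(round(sum([98.1, 88.8, 98.7, 98.0, 97.7]) / 5, 1))"   # prints 96.3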


An ASD demo with the pretrained TalkNet model

Data preparation

We provide an end-to-end script that detects and extracts the active speaker from a raw video using our model pretrained on TalkSet.

You can put the raw video (.mp4 and .avi both work) into the demo folder, e.g. 001.mp4.

Usage

python demoTalkNet.py --videoName 001

A model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model. The structure of the output results can be found here.

You will get the output video demo/001/pyavi/video_out.avi, in which the active speaker is marked with a green box and non-active speakers with red boxes.
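
The coloring logic can be pictured with a short OpenCV sketch (this is only an illustration of the idea, not the repository's actual visualization code; the function name, score threshold, and box format are assumptions):

import cv2

def draw_speaker_box(frame, box, score, threshold=0.0):
    # Green box when the detected face is scored as speaking, red box otherwise
    x1, y1, x2, y2 = [int(v) for v in box]
    color = (0, 255, 0) if score > threshold else (0, 0, 255)
    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
    return frame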

If you want to run the evaluation on CPU only, modify demoTalkNet.py and talkNet.py by changing every occurrence of cuda to cpu, and replace line 83 of talkNet.py with loadedState = torch.load(path, map_location=torch.device('cpu')).
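
As a concrete illustration of that change, the loading step would look roughly as follows (only the torch.load call comes from the instruction above; the checkpoint path here is a hypothetical example):

import torch

path = 'pretrain_TalkSet.model'  # hypothetical path, for illustration only
# map_location forces tensors that were saved on a GPU to be loaded onto the CPU
loadedState = torch.load(path, map_location=torch.device('cpu'))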


Citation

Please cite the following if our paper or code is helpful to your research.

@inproceedings{tao2021someone,
  title={Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection},
  author={Tao, Ruijie and Pan, Zexu and Das, Rohan Kumar and Qian, Xinyuan and Shou, Mike Zheng and Li, Haizhou},
  booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
  pages = {3927–3935},
  year={2021}
}

I have summarized some potential FAQs. You can also check the issues on GitHub for other questions I have already answered.

This is my first open-source work. Please let me know how I can further improve this repository, or if there is anything wrong in our work. Thanks for your support!

Acknowledgements

We studied many useful projects during the coding process, including:

The structure of the project layout and the audio encoder are learnt from this repository.

The demo for visualization is modified from this repository.

AVA data download code is learnt from this repository.

The model for the visual frontend is learnt from this repository.

Thanks to these authors for open-sourcing their code!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].