adobe-research / MakeItTalk

License: other

Projects that are alternatives to or similar to MakeItTalk

Manipulation
Course notes for MIT manipulation class
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Tf objectdetection api
Tutorial on how to create your own object detection dataset and train using TensorFlow's API
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Cgoes
Research by Carlos Góes
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Spring2019 tutorials
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Kaggle Ds Bowl 2018 Baseline
Full train/inference/submission pipeline adapted to the competition from https://github.com/matterport/Mask_RCNN
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Ml4music Workshop
Machine Learning for Music and Sound Synthesis workshop
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Content Aws Mls C01
AWS Certified Machine Learning - Specialty (MLS-C01)
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Intro machine learning
Introduction to Machine Learning, a series of IPython Notebooks with accompanying slideshow and video
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Simple adversarial examples
Repo of simple adversarial examples on vanilla neural networks trained on MNIST
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Anomaly Detection
Anomaly detection algorithm implementation in Python
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Openplan
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Face Classification
Face model to classify gender and race. Trained on LFWA+ Dataset.
Stars: ✭ 104 (-0.95%)
Mutual labels:  jupyter-notebook
How To Generate Art Demo
This is the code for "How to Generate Art - Intro to Deep Learning #8" by Siraj Raval on YouTube
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Unet Segmentation In Keras Tensorflow
UNet is a fully convolutional network (FCN) that does image segmentation. Its goal is to predict each pixel's class. It is built upon the FCN and modified so that it yields better segmentation in medical imaging.
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Intro To Deep Learning For Nlp
The repository contains code walkthroughs that introduce deep learning for natural language processing.
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
D2l Torch
Dive into Deep Learning (《动手学深度学习》), PyTorch edition
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Deepai
Detection of Accounting Anomalies using Deep Autoencoder Neural Networks - A lab we prepared for NVIDIA's GPU Technology Conference 2018 that will walk you through the detection of accounting anomalies using deep autoencoder neural networks. The majority of the lab content is based on Jupyter Notebook, Python and PyTorch.
Stars: ✭ 104 (-0.95%)
Mutual labels:  jupyter-notebook
Ipywidgets Static
[obsolete] Static Widgets for IPython Notebooks
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Openomni
Documentation and library for decoding Omnipod communications.
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Pixel2style2pixel
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation"
Stars: ✭ 1,395 (+1228.57%)
Mutual labels:  jupyter-notebook

MakeItTalk: Speaker-Aware Talking-Head Animation

This is the code repository implementing the paper:

MakeItTalk: Speaker-Aware Talking-Head Animation

Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

SIGGRAPH Asia 2020

Abstract

We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

[Project page] [Paper] [Video] [arXiv] [Colab Demo] [Colab Demo TL;DR]


Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right). Neither the speech signal nor the input face image is observed during training. Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

Updates

  • [x] Pre-trained models
  • [x] Google colab quick demo for natural faces [detail] [TL;DR]
  • [ ] Training code for each module
  • [ ] Customized puppet creation tool

Requirements

  • Python environment 3.6

    conda create -n makeittalk_env python=3.6
    conda activate makeittalk_env

  • ffmpeg

    sudo apt-get install ffmpeg

  • python packages

    pip install -r requirements.txt

  • winehq-stable (needed on Linux to run the pre-trained non-photorealistic warping .exe)

    sudo dpkg --add-architecture i386
    wget -nc https://dl.winehq.org/wine-builds/winehq.key
    sudo apt-key add winehq.key
    sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
    sudo apt update
    sudo apt install --install-recommends winehq-stable
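
After installation, a quick sanity check can catch a broken environment before the first run. The snippet below is a minimal sketch, not part of the repository; it assumes PyTorch, NumPy, and OpenCV are among the packages pinned in requirements.txt, so adjust the imports to match the actual file:

    # check_env.py -- hypothetical helper script, not shipped with MakeItTalk
    import shutil
    import sys

    # ffmpeg must be on the PATH; it is used to assemble the output video
    if shutil.which("ffmpeg") is None:
        sys.exit("ffmpeg not found; install it with `sudo apt-get install ffmpeg`")

    # Core Python dependencies (assumed from requirements.txt; adjust as needed)
    try:
        import torch
        import numpy
        import cv2
    except ImportError as err:
        sys.exit(f"Missing Python dependency: {err}")

    print("Environment OK. CUDA available:", torch.cuda.is_available())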

Pre-trained Models

Download the following pre-trained models to the examples/ckpt folder to test your own animations.

Model                                 Link to the model
Voice Conversion                      Link
Speech Content Module                 Link
Speaker-aware Module                  Link
Image2Image Translation Module        Link
Non-photorealistic Warping (.exe)     Link
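
The animation scripts expect all five checkpoints to be in place, so a quick check can save a failed run. The sketch below is a hypothetical convenience script (the exact checkpoint filenames depend on the downloads, so it only verifies that the folder is populated):

    # verify_ckpt.py -- hypothetical check, not part of the repository
    from pathlib import Path

    ckpt_dir = Path("examples/ckpt")
    files = sorted(ckpt_dir.glob("*")) if ckpt_dir.is_dir() else []
    if not files:
        raise SystemExit("examples/ckpt is missing or empty; download the pre-trained models first")
    for f in files:
        print(f"{f.name}: {f.stat().st_size / 1e6:.1f} MB")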

Animate Your Portraits!

  • Download the pre-trained embedding [here] and save it to the examples/dump folder.

Natural Human Faces / Paintings

  • crop your portrait image to 256x256 and put it under the examples folder in .jpg format. Make sure the head is roughly centered (check the existing examples for reference); a cropping sketch follows this list.

  • put test audio files under the examples folder as well, in .wav format.

  • animate!

    python main_end2end.py --jpg <portrait_file>
  • use the additional arguments --amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos> to amplify lip motion (in the x/y-axis directions) and head motion displacements; the default values are <x>=2., <y>=2., <pos>=.5
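
If a portrait is not already 256x256, the following minimal sketch (using Pillow; the filenames are placeholders) center-crops the largest square from the image and resizes it to the expected input size:

    # crop_portrait.py -- hypothetical helper, not shipped with the repository
    from PIL import Image

    img = Image.open("my_portrait.jpg").convert("RGB")  # placeholder filename
    w, h = img.size
    side = min(w, h)
    # crop the central square, then resize to the 256x256 input size
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((256, 256), Image.LANCZOS)
    img.save("examples/my_portrait.jpg")

Note that a plain center crop does not guarantee the head is centered; shift left/top manually if necessary. A full run with amplified motion then looks like python main_end2end.py --jpg my_portrait.jpg --amp_lip_x 2. --amp_lip_y 2. --amp_pos .5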

Cartoon Faces

  • put test audio files under the examples folder as well, in .wav format.

  • animate one of the existing puppets

Puppet Name: wilk, roy, sketch, color, cartoonM, danbooru1 (preview images omitted)
    python main_end2end_cartoon.py --jpg <cartoon_puppet_name_with_extension> --jpg_bg <puppet_background_with_extension>
  • --jpg_bg takes an image of the same size as the puppet face image and uses it as the fixed background of the animation (e.g., the puppet's body). If you want to use a background, make sure the puppet face image (the --jpg image) is in .png format with the non-face area transparent. If you don't need a background, pass a same-size placeholder image (e.g., a pure white image) to fill the argument; a sketch for generating one follows this list.

  • use the additional arguments --amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos> to amplify lip motion (in the x/y-axis directions) and head motion displacements; the default values are <x>=2., <y>=2., <pos>=.5

  • create your own puppets (ToDo...)
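
For the no-background case, a same-size placeholder can be generated in a couple of lines. This is a minimal sketch using Pillow; the filenames are placeholders:

    # make_blank_bg.py -- hypothetical helper, not part of the repository
    from PIL import Image

    puppet = Image.open("my_puppet.png")  # the --jpg puppet face image (placeholder name)
    # pure white image with the same dimensions, usable as --jpg_bg
    Image.new("RGB", puppet.size, (255, 255, 255)).save("my_puppet_bg.jpg")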

Train

Train Voice Conversion Module

Todo...

Train Content Branch

  • Create dataset root directory <root_dir>

  • Dataset: download the preprocessed dataset [here] and put it under <root_dir>/dump.

  • Train script: run the script below. Models will be saved in <root_dir>/ckpt/<train_instance_name>.

    python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>
    

Train Speaker-Aware Branch

Todo...

Train Image-to-Image Translation

Todo...

License

Acknowledgement

We would like to thank Timothy Langlois for the narration, and Kaizhi Qian for the help with the voice conversion module. We thank Jakub Fiser for implementing the real-time GPU version of the triangle morphing algorithm. We thank Daichi Ito for sharing the caricature image and Dave Werner for Wilk, the gruff but ultimately lovable puppet.

This research is partially funded by NSF (EAGER-1942069) and a gift from Adobe. Our experiments were performed in the UMass GPU cluster obtained under the Collaborative Fund managed by the MassTech Collaborative.
