
leoli3024 / Portrait_FCN_and_3D_Reconstruction

License: MIT
This project converts PortraitFCN+ (by Xiaoyong Shen) from MATLAB to TensorFlow, then refines its outputs (converted to a trimap) using KNN and ResNet, supervised by Richard Berwick.


Projects that are alternatives to or similar to Portrait FCN and 3D Reconstruction

pytorch2keras
PyTorch to Keras model convertor
Stars: ✭ 788 (+1191.8%)
Mutual labels:  resnet, tensorflow-models
Pytorch2keras
PyTorch to Keras model convertor
Stars: ✭ 676 (+1008.2%)
Mutual labels:  resnet, tensorflow-models
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+18096.72%)
Mutual labels:  resnet, knn
R Centernet
Detector for rotated objects based on CenterNet
Stars: ✭ 226 (+270.49%)
Mutual labels:  resnet
Fusenet
Deep fusion project of deeply-fused nets, and the study on the connection to ensembling
Stars: ✭ 230 (+277.05%)
Mutual labels:  resnet
aistplusplus api
API to support AIST++ Dataset: https://google.github.io/aistplusplus_dataset
Stars: ✭ 277 (+354.1%)
Mutual labels:  3d-reconstruction
EEG-Motor-Imagery-Classification-CNNs-TensorFlow
EEG Motor Imagery Tasks Classification (by Channels) via Convolutional Neural Networks (CNNs) based on TensorFlow
Stars: ✭ 125 (+104.92%)
Mutual labels:  tensorflow-models
Tensorflow Computer Vision Tutorial
Tutorials of deep learning for computer vision.
Stars: ✭ 206 (+237.7%)
Mutual labels:  resnet
mxnet-retrain
Create mxnet finetuner (retrain) for mac/linux ,no need install docker and supports CPU, GPU(eGpu/cudnn).support the inception,resnet ,squeeznet,mobilenet...
Stars: ✭ 32 (-47.54%)
Mutual labels:  resnet
CNN-models
YOLO-v2, ResNet-32, GoogLeNet-lite
Stars: ✭ 32 (-47.54%)
Mutual labels:  resnet
pose-estimation-3d-with-stereo-camera
This demo uses a deep neural network and two generic cameras to perform 3D pose estimation.
Stars: ✭ 40 (-34.43%)
Mutual labels:  3d-reconstruction
Pyramidnet Pytorch
A PyTorch implementation for PyramidNets (Deep Pyramidal Residual Networks, https://arxiv.org/abs/1610.02915)
Stars: ✭ 234 (+283.61%)
Mutual labels:  resnet
drowsiness-detection
To identify driver drowsiness from real-time camera images and image processing techniques (a drowsy-driving detection system). OpenCV
Stars: ✭ 31 (-49.18%)
Mutual labels:  knn
Octconv.pytorch
PyTorch implementation of Octave Convolution with pre-trained Oct-ResNet and Oct-MobileNet models
Stars: ✭ 229 (+275.41%)
Mutual labels:  resnet
SSVIO
Graduation Project: A point cloud semantic segmentation and VIO based 3D reconstruction method using RGB-D and IMU
Stars: ✭ 25 (-59.02%)
Mutual labels:  3d-reconstruction
Deepfake Detection
Towards deepfake detection that actually works
Stars: ✭ 213 (+249.18%)
Mutual labels:  resnet
TensorFlow-Binary-Image-Classification-using-CNN-s
Binary Image Classification in TensorFlow
Stars: ✭ 26 (-57.38%)
Mutual labels:  tensorflow-models
Ai papers
AI Papers
Stars: ✭ 253 (+314.75%)
Mutual labels:  resnet
Wideresnet Pytorch
Wide Residual Networks (WideResNets) in PyTorch
Stars: ✭ 249 (+308.2%)
Mutual labels:  resnet
maks
Motion Averaging
Stars: ✭ 52 (-14.75%)
Mutual labels:  3d-reconstruction

High-level Objective

This project converts PortraitFCN+ (by Xiaoyong Shen, found here) from MATLAB to TensorFlow, then refines its outputs (converted to a trimap) using KNN & ResNet, supervised by Richard Berwick.

The second stage explores creating a 3D face model from a single 2D image.

This project was presented on May 2nd at the Sutardja Center for Entrepreneurship & Technology (SCET). The presentation slides can be accessed here.

Acknowledgements

Thanks to Xiaoyong Shen and his team for making this project possible. Their paper introducing this algorithm, PortraitFCN+, can be found here. We would also like to thank Patrik Huber and his team for enabling 3D face reconstruction from basic, simple 2D inputs. Their paper, "A Multiresolution 3D Morphable Face Model and Fitting Framework" (P. Huber, G. Hu, R. Tena, P. Mortazavian, W. Koppen, W. Christmas, M. Rätsch, J. Kittler), was published at the International Conference on Computer Vision Theory and Applications (VISAPP) 2016.

Please note that you might need to request the code and dataset from Xiaoyong's paper in order to train and run certain files, which we reference for preprocessing purposes. You might also need to pull certain datasets from PASCAL. The training datasets can also be accessed here: https://drive.google.com/file/d/1TuVO2N_vthca4_B8GhT4JFbV2CJVVIW7/view?usp=sharing.

Twindom - Deep Image Segmentation in Tensorflow (Portrait FCN+)

The green screen technique, used by filmmakers and producers in Hollywood, involves filming actors in front of a uniform background, typically green. This lets editors easily separate the subjects from the background and insert a new background, but it has two main problems: time and cost. A Hollywood movie can take six months to post-produce/edit. Moreover, the recent Avengers movie Avengers: Infinity War had a budget of $320 million, with 25% of that going to production costs. These time and cost problems are surmountable for Hollywood movies but can be deal-breakers for would-be filmmakers.

Overview

Our solution automates the green screen and editing process. We take in any image of a person and output a state-of-the-art alpha matte, an image of black (background) and white (foreground/person), which can be used to isolate the foreground and drop it onto a new background, as sketched below.
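As a concrete illustration of what the matte is for, here is a minimal alpha-compositing sketch in NumPy/OpenCV. The file names are hypothetical, and a real matte has fractional values along hair and soft edges:

```python
import numpy as np
import cv2

# Hypothetical file names, for illustration only.
portrait = cv2.imread("portrait.png").astype(np.float32) / 255.0
background = cv2.imread("new_background.png").astype(np.float32) / 255.0
alpha = cv2.imread("alpha_matte.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Match the background to the portrait's size; add a channel axis to alpha.
background = cv2.resize(background, (portrait.shape[1], portrait.shape[0]))
alpha3 = alpha[..., None]

# Standard alpha compositing: out = alpha * foreground + (1 - alpha) * background.
composite = alpha3 * portrait + (1.0 - alpha3) * background
cv2.imwrite("composite.png", (composite * 255).astype(np.uint8))
```

With that output in mind, our algorithm proceeds in three stages: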

    1. Our preprocessing stage captures the positional offsets of each portrait input with respect to a reference image using deep metric learning, a facial featurization technique. We a) identify 49 points that capture distinct facial features in both images, b) find the affine transformation between these two sets of points, and c) output the affine transforms of the mean mask and the positional offsets of the reference (see the alignment sketch after this list).
    2. We feed the 3 channels output by preprocessing, plus the portrait itself, into a 20-layer encoder-decoder network called PortraitFCN+, which outputs an unrefined alpha matte. We train PortraitFCN+ on Amazon EC2 against the ground-truth alpha matte (i.e., the true subject area) of each image. We then generate a trimap, an alpha matte with an additional grey region representing unknown pixels, by marking the 10 pixels on either side of the subject/background boundary as unknown (see the trimap sketch below).
    3. The trimap then goes through two refinement stages: a) KNN matting applies k-nearest neighbors to classify the unknown (grey) region, and b) a ResNet corrects the small errors that might remain from PortraitFCN+ (see the KNN sketch below). The output here is the final alpha matte. Our refinement algorithm is much less computationally expensive than the current state-of-the-art refinement procedure, DIM, while maintaining the same accuracy: a 97% IoU. In fact, we have shown that our refinement algorithm works on a Launchpad setup with 4 KB of memory, a minuscule amount compared to an iPhone's 64-256 GB.
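Step 1 reduces to solving for the affine transform that maps one set of facial landmarks onto another. Below is a minimal least-squares sketch in NumPy; the landmark arrays are hypothetical placeholders standing in for the 49 detected points.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform A mapping src_pts onto dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding landmark coordinates.
    """
    n = src_pts.shape[0]
    # Homogeneous coordinates [x, y, 1] for each source landmark.
    src_h = np.hstack([src_pts, np.ones((n, 1))])
    # Solve src_h @ A.T ~= dst_pts in the least-squares sense.
    A_t, *_ = np.linalg.lstsq(src_h, dst_pts, rcond=None)
    return A_t.T

# Hypothetical 49-point landmark sets for a reference face and an input portrait.
ref_landmarks = np.random.rand(49, 2) * 600
img_landmarks = np.random.rand(49, 2) * 600
A = fit_affine(img_landmarks, ref_landmarks)
# In the real pipeline, A (or its inverse) would warp the mean mask and
# positional-offset channels into the input portrait's frame, e.g. via cv2.warpAffine.
```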
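Step 2's trimap generation can be done with simple morphology: grow and shrink the binary matte and mark the band in between as unknown. A sketch with OpenCV, assuming a hard 0/255 matte as input; the 10-pixel band width comes from the description above.

```python
import numpy as np
import cv2

def make_trimap(binary_matte, band=10):
    """Turn a binary alpha matte (0 = background, 255 = foreground) into a trimap.

    Pixels within `band` pixels of the segmentation boundary become 128 (unknown).
    """
    kernel = np.ones((2 * band + 1, 2 * band + 1), np.uint8)
    dilated = cv2.dilate(binary_matte, kernel)  # grows the foreground outward
    eroded = cv2.erode(binary_matte, kernel)    # shrinks the foreground inward
    trimap = np.full_like(binary_matte, 128)    # start with everything unknown
    trimap[eroded == 255] = 255                 # confident foreground
    trimap[dilated == 0] = 0                    # confident background
    return trimap
```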
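Step 3a classifies each unknown trimap pixel from its nearest known neighbors in a color-plus-position feature space. The sketch below is a simplification of KNN matting using scikit-learn (which the project itself may not use), and it produces hard foreground/background labels rather than fractional alpha values.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_refine(image, trimap, k=10):
    """Label each unknown (128) trimap pixel as fg (255) or bg (0) via k-NN.

    image: (h, w, 3) uint8 portrait; trimap: (h, w) uint8 with values {0, 128, 255}.
    """
    h, w = trimap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Per-pixel features: normalized color plus normalized (x, y) position.
    feats = np.concatenate(
        [image.reshape(-1, 3).astype(np.float32) / 255.0,
         np.stack([xs.ravel() / w, ys.ravel() / h], axis=1)], axis=1)
    labels = trimap.ravel()
    known = labels != 128
    clf = KNeighborsClassifier(n_neighbors=k).fit(feats[known], labels[known])
    refined = labels.copy()
    refined[~known] = clf.predict(feats[~known])
    return refined.reshape(h, w)
```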

(Hardware setup testing the computational advantage of KNN)

User Interface

We built a website on Flask where users can upload a portrait and receive a trimap, an alpha matte, and a new image of themselves on a different background. The uploaded image is run through our algorithm and the result is served back to either a web or mobile front end.
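A minimal sketch of what such a Flask upload endpoint could look like; the route, the form field name, and the `run_matting_pipeline` helper are hypothetical stand-ins, not the project's actual code.

```python
import os
from flask import Flask, request, send_file

app = Flask(__name__)

def run_matting_pipeline(in_path):
    # Placeholder: the real project would run PortraitFCN+ here, build a trimap,
    # and refine it with KNN matting + ResNet before returning the matte's path.
    return in_path

@app.route("/upload", methods=["POST"])
def upload():
    # The browser submits the portrait as a multipart form field named "portrait".
    f = request.files["portrait"]
    os.makedirs("uploads", exist_ok=True)
    in_path = "uploads/input.png"
    f.save(in_path)
    matte_path = run_matting_pipeline(in_path)
    # Send the resulting alpha matte back to the client.
    return send_file(matte_path, mimetype="image/png")

if __name__ == "__main__":
    app.run(debug=True)
```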

Work in Progress - 3D Morphable Face Model

We are currently working on generating a 3D model from the segmented image. We use isomap texture extraction to obtain a pose-invariant representation of facial features and estimate the camera pose with a linear scaled orthographic projection. We then fit these features onto the pre-built Surrey Face Model, a Principal Component Analysis (PCA) model of shape variation built from 3D face scans.
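The linear scaled orthographic pose estimation mentioned above can be sketched as a least-squares fit of an affine camera to 2D-3D landmark correspondences. This simplified NumPy version uses hypothetical placeholder correspondences and skips the decomposition into rotation and scale that a full fitting framework such as Huber et al.'s performs:

```python
import numpy as np

def estimate_affine_camera(points_3d, points_2d):
    """Least-squares 2x4 affine (scaled orthographic) camera matrix P.

    points_3d: (N, 3) model landmarks; points_2d: (N, 2) image landmarks.
    Solves points_2d ~= [X Y Z 1] @ P.T in the least-squares sense.
    """
    n = points_3d.shape[0]
    X_h = np.hstack([points_3d, np.ones((n, 1))])
    P_t, *_ = np.linalg.lstsq(X_h, points_2d, rcond=None)
    return P_t.T

# Hypothetical correspondences, e.g. detected 2D landmarks matched to the
# mean-shape vertices of the Surrey Face Model.
pts3d = np.random.rand(49, 3)
pts2d = np.random.rand(49, 2) * 600
P = estimate_affine_camera(pts3d, pts2d)
projected = np.hstack([pts3d, np.ones((49, 1))]) @ P.T  # reprojected landmarks
```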

Applications

Our project automates the process of separating the subject from the background in an image, with the potential to replace an entire sector of tasks in Hollywood, VR, and 3D printing, all of which still use green screen techniques. In its current form, our project is useful for amateur photographers and filmmakers looking to change the background of an image: we kept the whole runtime to about 5 minutes, shorter than manually segmenting an image would take. Our project could be extended to segment video files, which would go a long way toward automating the green screen technique. We are also currently working on creating a 3D model from the segmented image.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].