W-net: Self-Supervised Learning of Depth Maps from Stereo Images

W-net is a self-supervised convolutional neural network architecture that learns to predict depth maps from pairs of stereo images. The network is trained directly on stereo pairs to jointly reconstruct the left view from the right view and the right view from the left view, using learned disparity maps and the L1 norm as the reconstruction error metric. A probabilistic selection layer that applies simple geometric transformations is used to reconstruct the left/right view from the right/left view and the corresponding disparity map. This probabilistic selection layer was first introduced by Deep3d (https://arxiv.org/abs/1604.03650), an architecture that predicts depth maps from monocular images to convert 2d movies to 3d. To handle texture-less regions, where the reconstruction problem is ill-defined, we use an auxiliary loss that minimizes the spatial gradient of the learned disparity maps. This auxiliary loss is weighted by the gradient of the original image, so that it is only enforced in regions of the image with no texture, while still allowing the disparity map to have sharp transitions at the edges of foreground objects.
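For concreteness, here is a minimal sketch of such a gradient-weighted smoothness term in TensorFlow. It reflects my reading of the description above rather than the repo's exact code, and the exponential edge weighting in particular is an assumption:

```python
import tensorflow as tf

def edge_aware_smoothness(disparity, image):
    # disparity: (B, H, W, 1) learned disparity map
    # image:     (B, H, W, 3) original view used to weight the penalty
    dx_disp = tf.abs(disparity[:, :, 1:, :] - disparity[:, :, :-1, :])
    dy_disp = tf.abs(disparity[:, 1:, :, :] - disparity[:, :-1, :, :])
    dx_img = tf.reduce_mean(tf.abs(image[:, :, 1:, :] - image[:, :, :-1, :]),
                            axis=-1, keepdims=True)
    dy_img = tf.reduce_mean(tf.abs(image[:, 1:, :, :] - image[:, :-1, :, :]),
                            axis=-1, keepdims=True)
    # The weight decays to zero at strong image edges, so sharp disparity
    # transitions are allowed there but penalized in texture-less regions.
    return (tf.reduce_mean(dx_disp * tf.exp(-dx_img)) +
            tf.reduce_mean(dy_disp * tf.exp(-dy_img)))
```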

The GIF shown on the right is an example of depth maps inferred from stereo images from the movie Point Break 2, one of the movies held out for validation.

Calculating the disparity map, reconstructing the two views, and computing the spatial gradients are all encompassed in a single end-to-end framework that can be trained without ground truth, by checking instead that the learned function is self-consistent.
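Putting the pieces together, the overall objective might look like the following sketch, which reuses the edge_aware_smoothness helper above; the relative weighting of the two terms is illustrative, not taken from the repo:

```python
import tensorflow as tf

def wnet_objective(left, right, left_recon, right_recon,
                   disp_left, disp_right, smoothness_weight=0.1):
    # L1 reconstruction error in both directions.
    recon = (tf.reduce_mean(tf.abs(left - left_recon)) +
             tf.reduce_mean(tf.abs(right - right_recon)))
    # Auxiliary edge-weighted gradient penalty on each disparity map.
    smooth = (edge_aware_smoothness(disp_left, left) +
              edge_aware_smoothness(disp_right, right))
    return recon + smoothness_weight * smooth
```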

A test on the KITTI dataset is coming soon. There is still a lot of room for improvement, but the model is capable of inferring depth maps at a rate of 20 fps from images of resolution 192x336 on a GTX 1070. More importantly, gathering more training data is easy in this case, since the model does not require depth-map ground truth. Eventually, this type of neural net for inferring depth from stereo images could become a much cheaper alternative to lidar systems, with much higher resolution and range.

Architecture

The architecture of w-net is heavily inspired by the u-net architecture, a residual learning architecture with multiscale contracting paths used to perform nerve segmentation in the 2016 Ultrasound Nerve Segmentation Kaggle competition. Its particularity is to bring activations from the lower layers back up to the higher levels, so that, in the case of the nerve segmentation task, the network has a highly detailed reference against which to draw the masks delineating the nerve cross sections.

In the case of w-net, the disparity needs to be calculated at the same spatial resolution as the original image, which is intuitively why bringing the lower-level activations back up helps.

[Schematic of the w-net architecture]

The second particularity of w-net is that its input consists of a pair of stereo images, concatenated along the channel axis, as shown in the schematic above. We use depthwise separable convolutions (https://arxiv.org/abs/1610.02357); a minimal sketch of this input stage follows.
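This sketch assumes the 192x336 per-eye resolution given in the Training section; the filter count and kernel size are illustrative, not the repo's values:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stereo pair as two RGB inputs at 192x336 per eye (resolution taken from
# the Training section below).
left = layers.Input(shape=(192, 336, 3), name='left_view')
right = layers.Input(shape=(192, 336, 3), name='right_view')

# Concatenate along the channel axis: two 3-channel views -> (192, 336, 6).
pair = layers.Concatenate(axis=-1)([left, right])

# Depthwise separable convolution applied to the stacked pair.
features = layers.SeparableConv2D(32, 3, padding='same',
                                  activation='relu')(pair)
```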

The third particularity of w-net is the presence of both a probabilistic selection layer, which uses the calculated disparity to apply geometric transformations between the left and right images, and a gradient layer, which computes the spatial gradient of the calculated disparity map in order to enforce some level of smoothness through the auxiliary loss function.

The selection layer is as described in Deep3d. The network predicts a probability distribution over possible disparity values d at each pixel location. To get the right image from the left image and the disparity map, and check that the disparity map is correct, we first build a stack of horizontally shifted copies of the left view, one per disparity level, then compute each pixel of the right image as the probability-weighted sum, over disparity levels, of the shifted pixel values.

This operation is differentiable with respect to the disparity. This means that we can train the network using backpropagation, adjusting the disparity map inferred from the left and right images until it is correct, i.e. until we can properly reconstruct the right image from the left image and the disparity map using the operation above.

This selection layer contains no learnable weights; its output is completely deterministic given an image and a disparity map. Its only role is to encode the simple geometric rules that allow one to calculate the right image from the left image and the disparity. This infusion of expert knowledge is all the network needs to start being able to compute depth from pairs of images.

In summary, the selection layer needs to do the following (a rough code sketch follows the list):

  • construct the shifted left image stack
  • compute the right image by performing a dot product along the disparity values axis

See the included notebook for a detailed explanation and implementation.
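As an illustration, a selection layer along these lines could be written as follows; the tensor layout and shift direction are assumptions on my part, and the notebook remains the authoritative version:

```python
import tensorflow as tf

def selection_layer(left, disparity_probs):
    # left:            (B, H, W, C) left view
    # disparity_probs: (B, H, W, D) softmax over D disparity levels per pixel
    num_levels = disparity_probs.shape[-1]
    # 1. Construct the shifted left-image stack: one horizontally shifted,
    #    zero-padded copy per candidate disparity level (the shift direction
    #    depends on the camera layout and is an assumption here).
    shifted = []
    for d in range(num_levels):
        padded = tf.pad(left[:, :, d:, :], [[0, 0], [0, 0], [0, d], [0, 0]])
        shifted.append(padded)
    stack = tf.stack(shifted, axis=-1)               # (B, H, W, C, D)
    # 2. Dot product along the disparity values axis: each output pixel is
    #    the probability-weighted sum of the shifted pixel values.
    probs = disparity_probs[:, :, :, tf.newaxis, :]  # (B, H, W, 1, D)
    return tf.reduce_sum(stack * probs, axis=-1)     # (B, H, W, C)
```

Because the layer is built entirely from shifts, a stack, and a weighted sum, gradients flow through it to the disparity probabilities, which is what makes the backpropagation-based training described above possible.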

Training

The model is implemented in Keras/TensorFlow and is trained on data from 22 3d movies, sampled at 1 fps. Validation is performed on 3 held-out movies. The total number of stereo frames is about 125K; training took 4 days on a GTX 1070 with batches of 6 stereo images at a resolution of 192x336 per eye. Batch normalization is used. A hypothetical assembly of this training run is sketched below.
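In this sketch, build_wnet, the dataset pipelines, the optimizer choice, and the epoch count are all assumptions for illustration:

```python
import tensorflow as tf

# Hypothetical constructor returning the Keras model, with the
# reconstruction and smoothness losses attached internally via add_loss.
model = build_wnet()
model.compile(optimizer=tf.keras.optimizers.Adam())

# train_dataset/val_dataset are assumed tf.data pipelines yielding stereo
# pairs at 192x336 per eye; batches of 6 match the setup described above.
model.fit(train_dataset.batch(6),
          validation_data=val_dataset.batch(6),
          epochs=20)  # epoch count illustrative
```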
