sayakpaul / Learnable-Image-Resizing

Licence: Apache-2.0 license
TF 2 implementation of Learning to Resize Images for Computer Vision Tasks (https://arxiv.org/abs/2103.09950v1).

Learnable-Image-Resizing

TensorFlow 2 implementation of Learning to Resize Images for Computer Vision Tasks by Talebi et al.

Accompanying blog post on keras.io: Learning to Resize in Computer Vision.

The paper proposes a simple framework for learning image representations that are optimal for a given network architecture and a given image resolution (such as 224x224). The authors find that representations that agree better with human perception do not always improve the performance of vision models. Instead, optimizing the representations for the models themselves can substantially improve their performance.

The diagram presents the proposed learnable resizer module (source: original paper):


Here's what the resized images look like after being passed through a learned resizer:

On the left-hand side, we see the outputs of an untrained learnable resizer. On the right, the outputs are from the same learnable resizer after 10 epochs of training. The images may not make sense to our eyes in terms of perceptual quality, but they help improve the recognition performance of vision models.
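To make the module more concrete, here is a minimal Keras sketch of a learnable resizer for the 300 x 300 to 150 x 150 setting used in the notebooks. It assumes the general layout described in the paper: a few convolutions, a bilinear bottleneck resize, one residual block, and a skip connection from the plain bilinear output. The function names (`conv_bn`, `build_learnable_resizer`), filter counts, and kernel sizes are illustrative assumptions and may differ from the notebook code.

```python
from tensorflow import keras
from tensorflow.keras import layers

INPUT_SIZE = (300, 300)   # resolution fed to the resizer (assumed from the notebooks)
TARGET_SIZE = (150, 150)  # resolution expected by the downstream classifier


def conv_bn(x, filters, kernel_size, use_activation=True):
    """Bias-free convolution + batch norm, optionally followed by LeakyReLU."""
    x = layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    if use_activation:
        x = layers.LeakyReLU(0.2)(x)
    return x


def build_learnable_resizer(filters=16, num_res_blocks=1):
    inputs = keras.Input(shape=(*INPUT_SIZE, 3))

    # Plain bilinear resize, used as the outer skip connection.
    naive = layers.Resizing(*TARGET_SIZE, interpolation="bilinear")(inputs)

    # Feature extraction at the original resolution.
    x = layers.Conv2D(filters, 7, padding="same")(inputs)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(filters, 1, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.BatchNormalization()(x)

    # Resize the feature maps down to the target resolution.
    bottleneck = layers.Resizing(*TARGET_SIZE, interpolation="bilinear")(x)

    # Residual blocks operating at the target resolution.
    x = bottleneck
    for _ in range(num_res_blocks):
        shortcut = x
        x = conv_bn(x, filters, 3)
        x = conv_bn(x, filters, 3, use_activation=False)
        x = layers.Add()([shortcut, x])

    # Projection followed by a skip connection from the bottleneck.
    x = conv_bn(x, filters, 3, use_activation=False)
    x = layers.Add()([bottleneck, x])

    # Map back to 3 channels and add the bilinear output, so the module
    # only has to learn a residual on top of ordinary resizing.
    x = layers.Conv2D(3, 7, padding="same")(x)
    outputs = layers.Add()([naive, x])
    return keras.Model(inputs, outputs, name="learnable_resizer")


resizer = build_learnable_resizer()
print(resizer.output_shape, resizer.count_params())
```

Because the final output is the bilinear resize plus a learned correction, the module can be placed in front of any classifier that expects 150 x 150 inputs and trained jointly with it.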

About the notebooks

  • Standard_Training.ipynb: Shows how to train a DenseNet-121 on the Cats and Dogs dataset with bilinear resizing (150 x 150).
  • Learnable_Resizer.ipynb: Shows how to train the same network with the learnable resizing module included. Here, the inputs are first resized to 300 x 300 and then the learnable resizer module helps learn optimal representations for 150 x 150.

Both notebooks use mixed-precision training along with distributed training.
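As a rough illustration of what that setup can look like in TensorFlow 2, the sketch below enables a `mixed_float16` policy and builds a model inside a `tf.distribute.MirroredStrategy` scope. The tiny model is a placeholder, not the DenseNet-121 used in the notebooks, and the choice of `MirroredStrategy` is an assumption about the kind of distribution used.

```python
import tensorflow as tf

# Compute in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# MirroredStrategy replicates the model across the devices it finds
# (it also runs with a single device).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Placeholder network; variables must be created inside the scope.
    model = tf.keras.Sequential(
        [
            tf.keras.Input(shape=(150, 150, 3)),
            tf.keras.layers.Conv2D(8, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(2),
            # Keep the final softmax in float32 for numerical stability.
            tf.keras.layers.Activation("softmax", dtype="float32"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
```

When scaling to several replicas, the global batch size is usually set to the per-replica batch size multiplied by `strategy.num_replicas_in_sync`.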

Results

Model                      Number of parameters (Million)  Top-1 accuracy
With learnable resizer     7.051717                        67.67%
Without learnable resizer  7.039554                        60.19%

Both models were trained for only 10 epochs from the same initial checkpoint.

You can reproduce these results with the model weights provided here.
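To put the table in perspective, the short calculation below works out how many parameters the resizer adds and how large the accuracy gap is. The input figures are copied from the results table; the variable names are only illustrative.

```python
# Figures copied from the results table.
params_with = 7.051717      # millions of parameters, with learnable resizer
params_without = 7.039554   # millions of parameters, without it
acc_with = 67.67            # top-1 accuracy (%)
acc_without = 60.19

extra_params = round((params_with - params_without) * 1_000_000)
overhead_pct = 100 * (params_with - params_without) / params_without
acc_gain_points = round(acc_with - acc_without, 2)

print(f"Extra parameters from the resizer: {extra_params:,}")
print(f"Relative parameter overhead: {overhead_pct:.2f}%")
print(f"Top-1 accuracy gain: {acc_gain_points} percentage points")
```

In other words, the resizer adds roughly twelve thousand parameters (well under 1% of the model) for a gain of about 7.5 percentage points in this 10-epoch setting.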

Paper citation

@InProceedings{Talebi_2021_ICCV,
    author    = {Talebi, Hossein and Milanfar, Peyman},
    title     = {Learning To Resize Images for Computer Vision Tasks},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {497-506}
}

Acknowledgements

  • ML-GDE program for providing GCP credit support.
  • Mark Doust (of Google) for feedback.