makatx / Yolo_resnet

Implementing YOLO using ResNet as the feature extraction network


Implementing YOLO using ResNet as feature extractor

Trained on Udacity's car dataset (no pre-trained classifier weights)


Abstract

In this project, I take a pre-trained ResNet50 network, remove its classifier layers so that it acts as a feature extractor, and add randomly initialized YOLO classifier layers in their place. I then train the network on Udacity's CrowdAI dataset to detect cars in image frames.

This project was made only as a means to learn more about deep learning, training networks, transfer learning and implementing an actual paper (my first!).

The Dataset

Udacity has made available an annotated dataset of cars (and a few other objects) here: https://github.com/udacity/self-driving-car/tree/master/annotations. I've used 'Dataset1', annotated by CrowdAI, for this project. Please take a look at the 'Dataset Exploration.ipynb' Jupyter notebook, where I've explored the dataset.

In summary, the dataset identifies 3 classes: Car, Truck, and Pedestrian, and lists bounding box coordinates for every object in each data point (image) in a CSV file. The class distribution is uneven, and a certain view of cars (rear) dominates the others (side and front). The lighting conditions are roughly constant throughout the capture, so we will need data augmentation to help the network generalize better.
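
As a quick sanity check before training, here is a minimal sketch of loading and summarizing the labels. The CSV file name and column names below are assumptions; check the actual header in the extracted dataset.

    import pandas as pd

    # Load the CrowdAI annotations; file and column names are assumptions,
    # adjust them to match the actual CSV in udacity-object-detection-crowdai/.
    labels = pd.read_csv('udacity-object-detection-crowdai/labels.csv')
    print(labels['Label'].value_counts())  # class balance: Car vs Truck vs Pedestrian
    print(labels.head())                   # one row per bounding box, with the source frame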

If you'd like to play with the training bit, download the dataset and extract it to 'udacity-object-detection-crowdai' in the root of the project folder.

The Model

The YOLO network has two components as do most networks:

  • A feature extractor
  • A classifier

The paper's authors explain that they used a GoogLeNet (Inception) inspired architecture for their feature extractor, pre-trained on ImageNet before being made part of the object detection network (which was then trained on PASCAL VOC). We can skip this step and use an existing pre-trained network that performs well on classification tasks. I've chosen ResNet for this purpose.

I then add two dense/fully connected layers on top of the feature extractor's output; these are randomly initialized and produce an output with the desired dimensions.

[Figure: network architecture]
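
For illustration, here is a minimal sketch of that arrangement in Keras. The repo itself modifies the standalone Keras ResNet50 code from fchollet/deep-learning-models; the tf.keras imports, hidden-layer width, and activations below are assumptions, not the repo's exact head.

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.layers import Flatten, Dense
    from tensorflow.keras.models import Model

    S, B, C = 11, 2, 3                     # grid size, boxes per cell, classes
    OUTPUT_DIM = S * S * (C + B * 5)       # 1573, derived in the grid section below

    # Pre-trained ResNet50 without its classifier layers acts as the feature extractor
    base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

    # Two randomly initialized dense layers form the YOLO head
    x = Flatten()(base.output)
    x = Dense(512, activation='relu')(x)             # hidden width is illustrative
    outputs = Dense(OUTPUT_DIM, activation='linear')(x)

    model = Model(inputs=base.input, outputs=outputs)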

YOLO

There have been many articles and videos describing this approach, which was originally presented in the paper: https://arxiv.org/abs/1506.02640

A resource I found particularly useful was the demo by the paper's author himself: https://youtu.be/NM6lrxy0bxs

This README won't go into how YOLO itself works; instead, it focuses on how to prepare and explore the dataset and train the network on it.

ResNet

ResNet (https://arxiv.org/abs/1512.03385) has won several competitions, and its architecture allows for better learning in deeper networks. I've used the Keras implementation and pre-trained weights of ResNet50 from https://github.com/fchollet/deep-learning-models.git and modified the code to attach the YOLO classifier at the end.

Training and Data Augmentation

The key part of this implementation was training the network, as it required defining the custom loss function in TensorFlow as well as manipulating the images and their bounding-box annotations for better generalization.

Grids and bounding boxes

The object detection approach in YOLO divides the image into an S x S grid, and each grid cell is responsible for predicting B bounding boxes (with confidence scores) and C class probabilities. I've chosen an 11x11 grid over the images and 2 bounding box predictions per grid cell, to keep sufficient resolution while keeping the output prediction small enough to train for.

Since we have 3 classes, the output size we need is: S x S x (C + B x 5) = 11 x 11 x (3 + 2 x 5) = 121 x 13 = 1573

[Figure: output tensor layout]

An 11x11 grid does, however, put some detections within the same grid cell; for the sake of simplicity (and compute), I've only kept one detection per cell in such cases.

[Figure: bounding boxes on the 11x11 grid]
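
To make the encoding concrete, here is a rough sketch of building the target tensor for one image. The per-cell field order and the helper below are assumptions for illustration, not the repo's exact layout.

    import numpy as np

    S, B, C = 11, 2, 3
    IMG_SIZE = 224

    def encode_target(boxes, classes):
        """boxes: list of [x1, y1, x2, y2] pixel coords; classes: list of class indices."""
        target = np.zeros((S, S, C + B * 5), dtype=np.float32)
        cell = IMG_SIZE / S
        for (x1, y1, x2, y2), cls in zip(boxes, classes):
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            col = min(int(cx // cell), S - 1)
            row = min(int(cy // cell), S - 1)
            if target[row, col, C + 4] == 1.0:        # one detection per cell, as noted above
                continue
            target[row, col, cls] = 1.0               # class one-hot
            target[row, col, C:C + 5] = [(cx % cell) / cell,     # x offset within the cell
                                         (cy % cell) / cell,     # y offset within the cell
                                         (x2 - x1) / IMG_SIZE,   # relative width
                                         (y2 - y1) / IMG_SIZE,   # relative height
                                         1.0]                    # objectness/confidence
        return target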

The network is trained on 224x224x3 images, so the dataset images are resized and their corresponding label coordinates adjusted to match.
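
A small sketch of that resize step (the [x1, y1, x2, y2] box layout is an assumption):

    import cv2

    def resize_with_labels(img, boxes, target=224):
        """Resize an image to target x target and scale its boxes accordingly."""
        h, w = img.shape[:2]
        resized = cv2.resize(img, (target, target))
        sx, sy = target / w, target / h
        scaled = [[x1 * sx, y1 * sy, x2 * sx, y2 * sy] for x1, y1, x2, y2 in boxes]
        return resized, scaled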

The Loss function

Keras and TF have the standard loss definitions; however, the YOLO paper uses a custom objective function that is tuned to improve stability (it down-weights the loss from grid cells that do not contain an object) and to weigh dimension errors in smaller boxes more heavily than those in larger boxes:

[Figure: the YOLO objective function]
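
For reference, the objective from the paper (what the figure above shows), written out in LaTeX:

    \begin{aligned}
    \mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
    &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
    &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2
    + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
    &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
    \end{aligned}

with lambda_coord = 5 and lambda_noobj = 0.5.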

I've used TensorFlow's 'while_loop' to build a graph that calculates the loss for each batch. All operations in my loss function (see loop_body() in model_continue_train.py) are TensorFlow operations, so they only run when the graph is executed and can take advantage of any hardware optimization.
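
A minimal sketch of that pattern (the per-example loss here is just a placeholder; the real computation lives in loop_body() in model_continue_train.py):

    import tensorflow as tf

    def batch_loss(y_true, y_pred, batch_size):
        """Accumulate a per-example loss over the batch with tf.while_loop."""
        def cond(i, total):
            return i < batch_size

        def loop_body(i, total):
            # Placeholder per-example loss; the actual YOLO terms go here
            example_loss = tf.reduce_sum(tf.square(y_true[i] - y_pred[i]))
            return i + 1, total + example_loss

        _, total = tf.while_loop(cond, loop_body, [tf.constant(0), tf.constant(0.0)])
        return total / tf.cast(batch_size, tf.float32)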

Data augmentation

As mentioned in the paper, I also randomly scale, translate, and adjust the saturation of each data point while generating a batch for training and validation:

    # Excerpt from the batch generator: img, frame (bounding boxes), rows, cols,
    # coord_translate() and coord_scale() are defined in the surrounding code.
    import cv2
    import numpy as np

    # Random translation factor (up to ~20% of the image size)
    tr = np.random.random() * 0.2 + 0.01
    tr_y = np.random.randint(int(-rows * tr), int(rows * tr))
    tr_x = np.random.randint(int(-cols * tr), int(cols * tr))
    # Random scale factor (this range is an illustrative stand-in; the repo defines sc elsewhere)
    sc = np.random.random() * 0.4 + 0.8
    r = np.random.rand()

    if r < 0.3:
        # Translate the image and shift the bounding boxes to match
        M = np.float32([[1, 0, tr_x], [0, 1, tr_y]])
        img = cv2.warpAffine(img, M, (cols, rows))
        frame = coord_translate(frame, tr_x, tr_y)
    elif r < 0.6:
        # Scale the image while keeping the same output size
        placeholder = np.zeros_like(img)
        meta = cv2.resize(img, (0, 0), fx=sc, fy=sc)
        if sc < 1:
            # Shrunk image: paste it into the top-left of a black canvas
            placeholder[:meta.shape[0], :meta.shape[1]] = meta
        else:
            # Enlarged image: crop it back to the original size
            placeholder = meta[:placeholder.shape[0], :placeholder.shape[1]]
        img = placeholder
        frame = coord_scale(frame, sc)

Learning Rate and Epochs

As described in the paper, I started training with a learning rate of 1e-3, then raised it to 1e-2, followed by 1e-3, 1e-4, and 1e-5, saving model checkpoints along the way using Keras' callback feature.
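
One way to express that schedule with Keras callbacks (a sketch; the epoch boundaries and checkpoint filename are illustrative, not the values used in this repo):

    from tensorflow.keras.callbacks import LearningRateScheduler, ModelCheckpoint

    def lr_schedule(epoch):
        # Staged schedule as described above; epoch boundaries are illustrative
        if epoch < 5:
            return 1e-3
        if epoch < 40:
            return 1e-2
        if epoch < 60:
            return 1e-3
        if epoch < 80:
            return 1e-4
        return 1e-5

    callbacks = [
        LearningRateScheduler(lr_schedule),
        ModelCheckpoint('yolo_resnet_{epoch:02d}.h5', save_weights_only=True),
    ]
    # model.fit(..., callbacks=callbacks)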

Amazon AWS GPU Instance

Training of this magnitude definitely needed some beefed-up hardware, and since I'm a console guy (PS4), I resorted to the EC2 instances Amazon provides (https://aws.amazon.com/ec2/instance-types/). Udacity's Amazon credits came in handy!

At first, I tried the g2.2xlarge instance that Udacity's Traffic Sign Classifier project had suggested (I had done that one on my laptop back then), but neither its memory nor its compute capability was anywhere near sufficient: TF apparently falls back to the CPU and system RAM when it detects that there isn't enough capacity on the GPU.

In the end, I trained my network on a p2.xlarge EC2 instance, which peaked at roughly 10 GB of GPU memory and ~92% GPU utilization. The network trained pretty well on this setup.

NOTE: I faced a lot of issues getting set up on the remote instance because certain libraries were out of date and Anaconda did not yet carry the updates. Luckily, Amazon had released its latest (v6 at the time) Deep Learning Ubuntu AMI, which worked just fine out of the box. So if you are using EC2, make sure to test sample code and library imports in Python first to confirm the platform is ready for your code.

Testing

Check out the 'Vehicle Detection.ipynb' notebook to see the network in use. It performs well on the dataset and also on a sample highway video that the network had never seen before.

If you'd like to try it out yourself, you can follow the notebook ('Vehicle Detection.ipynb') or use predict.py as follows:

python predict.py <path_to_image>

Sample output on images from the dataset:

Predictions on images the network has never seen:

Conclusion

It worked! Alhamdulillah. The network trained from scratch and was able to detect cars in a video it had never seen before. It still has some problems with far-away objects, and the detections are not very smooth across frames.

It would be worth trying to remove some layers from ResNet to see whether the network runs any faster.
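
If you want to experiment with that, one way to truncate the backbone is sketched below. The layer name is an assumption and depends on the Keras version; inspect base.summary() to pick a cut point.

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.models import Model

    base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    # Cut at the end of an earlier residual stage; pick the layer from base.summary()
    cut = base.get_layer('conv4_block6_out')
    smaller_backbone = Model(inputs=base.input, outputs=cut.output)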

I'd love to hear your suggestions for improving the project, and feel free to reach out if you have any questions.
