justadudewhohacks / Tfjs Tiny Yolov2

Licence: mit
Tiny YOLO v2 object detection with tensorflow.js.

Programming Languages

javascript
184084 projects - #8 most used programming language
typescript
32286 projects

Projects that are alternatives of or similar to Tfjs Tiny Yolov2

Opentpod
Open Toolkit for Painless Object Detection
Stars: ✭ 106 (-7.83%)
Mutual labels:  object-detection
Yolov3 tensorflow
Complete YOLO v3 TensorFlow implementation. Support training on your own dataset.
Stars: ✭ 1,498 (+1202.61%)
Mutual labels:  object-detection
Pytorch cpp
Deep Learning sample programs using PyTorch in C++
Stars: ✭ 114 (-0.87%)
Mutual labels:  object-detection
Ssd Pytorch
SSD: Single Shot MultiBox Detector pytorch implementation focusing on simplicity
Stars: ✭ 107 (-6.96%)
Mutual labels:  object-detection
Deep learning object detection
A paper list of object detection using deep learning.
Stars: ✭ 10,334 (+8886.09%)
Mutual labels:  object-detection
Tensorflow Yolov4 Tflite
YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights tensorflow, tensorrt and tflite
Stars: ✭ 1,881 (+1535.65%)
Mutual labels:  object-detection
Yolov3 Model Pruning
Model pruning (network slimming) of YOLOv3 on the Oxford Hand dataset
Stars: ✭ 1,386 (+1105.22%)
Mutual labels:  object-detection
Colab Mask Rcnn
How to run Object Detection and Segmentation on a Video Fast for Free
Stars: ✭ 114 (-0.87%)
Mutual labels:  object-detection
Links Detector
📖 👆🏻 Links Detector makes printed links clickable via your smartphone camera. No need to type a link in, just scan and click on it.
Stars: ✭ 106 (-7.83%)
Mutual labels:  object-detection
Tensorflow Object Detection Tutorial
The purpose of this tutorial is to learn how to install and prepare TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
Stars: ✭ 113 (-1.74%)
Mutual labels:  object-detection
Sod
An Embedded Computer Vision & Machine Learning Library (CPU Optimized & IoT Capable)
Stars: ✭ 1,460 (+1169.57%)
Mutual labels:  object-detection
Yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+17216.52%)
Mutual labels:  object-detection
Feature intertwiner
Codebase of the paper "Feature Intertwiner for Object Detection", ICLR 2019
Stars: ✭ 111 (-3.48%)
Mutual labels:  object-detection
Refinedet
Single-Shot Refinement Neural Network for Object Detection, CVPR, 2018
Stars: ✭ 1,430 (+1143.48%)
Mutual labels:  object-detection
Kerasobjectdetector
Keras Object Detection API with YOLK project 🍳
Stars: ✭ 113 (-1.74%)
Mutual labels:  object-detection
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+1114.78%)
Mutual labels:  object-detection
Bag Of Visual Words Python
Implementing Bag of Visual words approach for object classification and detection
Stars: ✭ 109 (-5.22%)
Mutual labels:  object-detection
Yolo mark
GUI for marking bounded boxes of objects in images for training neural network Yolo v3 and v2
Stars: ✭ 1,624 (+1312.17%)
Mutual labels:  object-detection
Id Card Detector
💳 Detecting the National Identification Cards with Deep Learning (Faster R-CNN)
Stars: ✭ 114 (-0.87%)
Mutual labels:  object-detection
Mobilenet Yolo
MobileNetV2-YoloV3-Nano: 0.5BFlops 3MB HUAWEI P40: 6ms/img, YoloFace-500k:0.1Bflops 420KB🔥🔥🔥
Stars: ✭ 1,566 (+1261.74%)
Mutual labels:  object-detection

tfjs-tiny-yolov2

JavaScript object detection in the browser based on a tensorflow.js implementation of tiny yolov2.

Pre Trained Models

The VOC and COCO models correspond to the quantized weights from the official darknet repo. The face detector uses depthwise separable convolutions instead of regular convolutions, which allows for much faster prediction and a tiny model size. This makes it well suited for object detection on mobile devices as well. I trained the face detection model from scratch. Have a look at the Training your own Object Detector section if you want to train such a model on your own dataset!

Pascal VOC

voc1 voc2

COCO

coco1 coco2

Face Detection

The face detection model is one of the models available in face-api.js.

face

Running the Examples

cd examples
npm i
npm start

Browse to http://localhost:3000/.

Usage

Get the latest build from dist/tiny-yolov2.js or dist/tiny-yolov2.min.js and include the script:

<script src="tiny-yolov2.js"></script>

Simply load the model:

const config = { /* yolo config, see below */ }
const net = new yolo.TinyYolov2(config)
await net.load(`voc_model-weights_manifest.json`)

The config file of the VOC model looks as follows:

{
  // the pre trained VOC model uses regular convolutions
  "withSeparableConvs": false,
  // iou threshold for nonMaxSuppression
  "iouThreshold": 0.4,
  // anchor box dimensions, relative to cell size (32px)
  "anchors": [
    { "x": 1.08, "y": 1.19 },
    { "x": 3.42, "y": 4.41 },
    { "x": 6.63, "y": 11.38 },
    { "x": 9.42, "y": 5.11 },
    { "x": 16.62, "y": 10.52 }
  ],
  // class labels in correct order
  "classes": [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor"
  ]
}

Inference and drawing the results:

const forwardParams = {
  inputSize: 416,
  scoreThreshold: 0.8
}

const detections = await net.detect('myInputImage', forwardParams)
yolo.drawDetection('myCanvas', detections)

Also check out the examples.

Training your own Object Detector

If you want to train your own object detector, I would suggest training a model using separable convolutions. It will allow for much faster inference times, and the training process will converge much faster, as there are significantly fewer parameters to train.

Training a multiclass detector will take quite some time, depending on how many classes you are training your object detector on. When training a single class detector, however, it is possible to get pretty good results after only a few epochs.

Defining your Model Config

{
  // use separable convolutions over regular convolutions
  "withSeparableConvs": true,
  // iou threshold for nonMaxSuppression
  "iouThreshold": 0.4,
  // instructions for how to determine anchors are given below
  "anchors": [...],
  // whatever kind of objects you are training your object detector on
  "classes": ["cat"],
  // optionally you can compute the mean RGB value for your dataset and
  // pass it in the config for performing mean value subtraction on your
  // input images
  "meanRgb": [...],
  // scale factors for each loss term (only required for training),
  // explained below
  "objectScale": 5,
  "noObjectScale": 1,
  "coordScale": 1,
  "classScale": 1
}

Labeling your Data with Ground Truth Boxes

For each image in your training set, you should create a corresponding json file containing the bounding box and class label of each object instance located in that image. The bounding box coordinates and dimensions should be relative to the image dimensions.

Consider an image with a width and height of 400px, showing a single cat, which is spanned by the bounding box at x = 50px, y = 100px (upper left corner) with a box size of width = 200px and height = 100px. The corresponding json file should look as follows (note, it is an array of all bounding boxes for that image):

[
  {
    "x": 0.125,
    "y": 0.25,
    "width": 0.5,
    "height": 0.25,
    "label": "cat"
  }
]
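If your labeling tool exports pixel coordinates, the conversion to relative boxes is a simple division by the image size. The following is a minimal sketch of that conversion; toRelativeBox is a hypothetical helper name and not part of tfjs-tiny-yolov2:

```javascript
// Hypothetical helper (not part of the library): converts a bounding box
// given in pixels into the relative format used by the json files above.
function toRelativeBox({ x, y, width, height, label }, imgWidth, imgHeight) {
  return {
    x: x / imgWidth,
    y: y / imgHeight,
    width: width / imgWidth,
    height: height / imgHeight,
    label
  }
}

// the cat from the example above, in a 400px x 400px image
const catBox = toRelativeBox(
  { x: 50, y: 100, width: 200, height: 100, label: 'cat' },
  400,
  400
)
console.log(JSON.stringify([catBox], null, 2))
```

This reproduces the values from the json file shown above (0.125, 0.25, 0.5 and 0.25).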

Computing Box Anchors

Before training your detector, you want to compute 5 anchor boxes over your training set. An anchor box is basically an object of shape { "x": boxWidth / 32, "y": boxHeight / 32 } where x and y are the anchor box sizes relative to the grid cell size (32px).

To determine the 5 anchor boxes, simply perform k-means clustering with 5 clusters over the widths and heights of the ground truth boxes in your training set. There are plenty of k-means implementations out there that you can use, but I will provide a script for that, coming soon...
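Until then, the clustering step might be sketched roughly as follows. This is not the script mentioned above, only an illustration under a few assumptions: computeAnchors is a hypothetical name, the relative box sizes are scaled to a single assumed input size (416 by default) before dividing by the cell size, the initial centroids are picked deterministically instead of randomly, and plain Euclidean distance is used (the YOLOv2 paper clusters with an IOU based distance instead).

```javascript
// Hypothetical helper (not part of the library): k-means over the sizes of
// the ground truth boxes, returning anchors in units of grid cells (32px).
function computeAnchors(boxes, k = 5, inputSize = 416, iterations = 50) {
  const CELL_SIZE = 32

  // assumption: relative box sizes are scaled to one fixed input size
  const points = boxes.map(b => [
    (b.width * inputSize) / CELL_SIZE,
    (b.height * inputSize) / CELL_SIZE
  ])
  if (points.length < k) {
    throw new Error('need at least k ground truth boxes')
  }

  // deterministic init: sort by area and pick k evenly spaced boxes
  const sorted = [...points].sort((a, b) => a[0] * a[1] - b[0] * b[1])
  let centroids = Array.from({ length: k }, (_, i) =>
    sorted[Math.floor(((i + 0.5) * sorted.length) / k)].slice()
  )

  for (let iter = 0; iter < iterations; iter++) {
    // per centroid: [sum of widths, sum of heights, number of boxes]
    const sums = centroids.map(() => [0, 0, 0])

    // assign each box to its nearest centroid (squared euclidean distance)
    for (const [w, h] of points) {
      let best = 0
      let bestDist = Infinity
      centroids.forEach(([cw, ch], i) => {
        const dist = (w - cw) ** 2 + (h - ch) ** 2
        if (dist < bestDist) {
          bestDist = dist
          best = i
        }
      })
      sums[best][0] += w
      sums[best][1] += h
      sums[best][2] += 1
    }

    // move each centroid to the mean of its boxes, keep empty clusters as is
    centroids = centroids.map((c, i) =>
      sums[i][2] > 0 ? [sums[i][0] / sums[i][2], sums[i][1] / sums[i][2]] : c
    )
  }

  // smallest anchors first, in the shape used by the config
  return centroids
    .sort((a, b) => a[0] * a[1] - b[0] * b[1])
    .map(([x, y]) => ({ x, y }))
}
```

You would call it with the concatenated contents of all your ground truth json files, e.g. computeAnchors(allBoxes), and paste the result into the "anchors" field of your config.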

Yolo Loss Function

The Yolo loss function computes the sum of the coordinate, object, class and no object loss. You can tune the weight of each loss term contributing to the total loss by adjusting the corresponding scale parameters in your config file, as mentioned above.

The no object loss term penalizes the scores of all box anchors in the grid that do not have a corresponding ground truth bounding box. In other words, they should ideally predict a score of 0 if there is no object of interest at that position.

On the other hand, the object, class and coordinate loss terms refer to the accuracy of the prediction at each anchor position where there is a ground truth bounding box. The coordinate loss simply penalizes the difference between the predicted bounding box coordinates and the ground truth box coordinates. The object loss penalizes the difference between the predicted confidence score and the box IOU.

The class loss penalizes errors in the predicted class scores. Note that when training a single class object detector you can simply ignore the classScale parameter, as the class loss is always 0 in that case.

PS: You can simply go with the default values in the above shown config example.
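To make the role of the scale parameters concrete, here is a minimal sketch of a weighted sum of the four loss terms. The function name weightedTotalLoss and the plain number inputs are assumptions for illustration; the library computes its losses on tensors internally, and this is not its implementation.

```javascript
// Hypothetical illustration (not the library's code): combines the four loss
// terms into a total loss using the scale parameters from the config.
function weightedTotalLoss(losses, config) {
  return (
    config.coordScale * losses.coordLoss +
    config.objectScale * losses.objectLoss +
    config.noObjectScale * losses.noObjectLoss +
    config.classScale * losses.classLoss
  )
}

// scale values from the config example above
const scales = { objectScale: 5, noObjectScale: 1, coordScale: 1, classScale: 1 }

// made-up loss values of a single class detector (class loss is 0)
const exampleLosses = { coordLoss: 0.5, objectLoss: 0.2, noObjectLoss: 1, classLoss: 0 }

console.log(weightedTotalLoss(exampleLosses, scales))
```

With an objectScale of 5, an error in the confidence score at a ground truth position weighs five times as much as the same error in any of the other terms.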

Initializing the Model Weights

To train a model from scratch, you need some weights to begin with. Simply open initWeights.html, located in the /train folder of the repo, in your browser. Enter the number of classes, hit save and use the saved file as the initial checkpoint weight file.

Start Training

For a complete example, also check out the /train folder at the root of this repo, which also contains some tooling to save intermediary checkpoints of your model weights as well as statistics of the average loss after each epoch.

Set up the model for training:

const config = { /* your config */ }

// simply use any of the optimizers provided by tfjs (I usually use adam)
const learningRate = 0.001
const optimizer = tf.train.adam(learningRate, 0.9, 0.999, 1e-8)

// initialize a trainable TinyYolov2
const net = new yolo.TinyYolov2Trainable(config, optimizer)

// load initial weights or the weights of any checkpoint
const checkpointUri = 'checkpoints/initial_glorot_1_classes.weights'
const weights = new Float32Array(await (await fetch(checkpointUri)).arrayBuffer())
await net.load(weights)

What I usually do is give each json file the same name as the corresponding image, e.g. img1.jpg and img1.json, and provide an endpoint to retrieve the json file names as an array:

const boxJsonUris = await (await fetch('/boxJsonUris')).json()

Furthermore you can choose to train your model on a fixed input size or you can perform multi scale training, which is a good way to improve the accuracy of your model at different scales. This can also be helpful to augment your data, in case you only have a limited number of training samples:

// should be multiples of 32 (grid cell size)
const trainSizes = [160, 224, 320, 416]
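Since each grid cell covers 32px, every input size maps to a grid with inputSize / 32 cells per side. As a small sketch, the hypothetical helper gridSizes below (not part of the library) checks a list of sizes and reports the resulting number of cells:

```javascript
// Hypothetical helper (not part of the library): validates that every input
// size is a multiple of the grid cell size and returns the cells per side.
const GRID_CELL_SIZE = 32

function gridSizes(sizes) {
  return sizes.map(inputSize => {
    if (inputSize % GRID_CELL_SIZE !== 0) {
      throw new Error(`${inputSize} is not a multiple of ${GRID_CELL_SIZE}`)
    }
    return { inputSize, numCells: inputSize / GRID_CELL_SIZE }
  })
}

console.log(gridSizes([160, 224, 320, 416]))
```

For the sizes above this gives grids of 5, 7, 10 and 13 cells per side.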

Then we can actually train it:

for (let epoch = startEpoch; epoch < maxEpoch; epoch++) {

  // always shuffle your inputs for each epoch
  const shuffledInputs = yolo.shuffleArray(boxJsonUris)

  // loop through shuffled inputs
  for (let dataIdx = 0; dataIdx < shuffledInputs.length; dataIdx++) {

    // fetch image and corresponding ground truth bounding boxes
    const boxJsonUri = shuffledInputs[dataIdx]
    const imgUri = boxJsonUri.replace('.json', '.jpg')

    const groundTruth = await (await fetch(boxJsonUri)).json()
    const img = await yolo.bufferToImage(await (await fetch(imgUri)).blob())

    // rescale and backward pass input image for each input size
    for (let sizeIdx = 0; sizeIdx < trainSizes.length; sizeIdx++) {

      const inputSize = trainSizes[sizeIdx]

      const backwardOptions = {
        // filter boxes with width < 32 or height < 32
        minBoxSize: 32,
        // log computed losses
        reportLosses: function({ losses, numBoxes, inputSize }) {
          console.log(`ground truth boxes: ${numBoxes} (${inputSize})`)
          console.log(`noObjectLoss[${dataIdx}]: ${yolo.round(losses.noObjectLoss, 4)}`)
          console.log(`objectLoss[${dataIdx}]: ${yolo.round(losses.objectLoss, 4)}`)
          console.log(`coordLoss[${dataIdx}]: ${yolo.round(losses.coordLoss, 4)}`)
          console.log(`classLoss[${dataIdx}]: ${yolo.round(losses.classLoss, 4)}`)
          console.log(`totalLoss[${dataIdx}]: ${yolo.round(losses.totalLoss, 4)}`)
        }
      }

      const loss = await net.backward(img, groundTruth, inputSize, backwardOptions)

      if (loss) {
        // don't forget to free the loss tensor
        loss.dispose()
      } else {
        console.log('no boxes remaining after filtering')
      }

    }
  }
}
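The 'no boxes remaining after filtering' case is more likely at small input sizes. As a rough sketch of why, the hypothetical helper below (not part of the library) assumes that the relative ground truth boxes are scaled to the current input size before being compared against minBoxSize; the actual filtering inside net.backward may differ in detail.

```javascript
// Hypothetical illustration (not the library's code): keeps only the ground
// truth boxes that are at least minBoxSize pixels wide and high once scaled
// to the given input size.
function filterSmallBoxes(groundTruth, inputSize, minBoxSize = 32) {
  return groundTruth.filter(box =>
    box.width * inputSize >= minBoxSize &&
    box.height * inputSize >= minBoxSize
  )
}

const exampleBoxes = [{ x: 0.4, y: 0.4, width: 0.15, height: 0.15, label: 'cat' }]

// 0.15 * 416 = 62.4px, so the box is kept
console.log(filterSmallBoxes(exampleBoxes, 416).length)
// 0.15 * 160 = 24px, so the box is filtered out
console.log(filterSmallBoxes(exampleBoxes, 160).length)
```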

Overfit first!

Generally it's a good idea to overfit on a small subset of your training data first, to verify that the loss is converging and that your detector is actually learning something. To do so, simply train your detector on 10 to 20 images of your training data for some epochs. Once the loss converges, save the model, run inference on these 10 to 20 images to view the predicted bounding boxes and compare them to the ground truth boxes.
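Besides looking at the drawn boxes, you can also compare them numerically via the IOU (intersection over union). The sketch below assumes that both the predicted and the ground truth boxes have been brought into the same { x, y, width, height } format; iou and bestIouPerGroundTruth are hypothetical helper names and not part of the library.

```javascript
// Hypothetical helper (not part of the library): intersection over union of
// two boxes given as { x, y, width, height } with x, y as upper left corner.
function iou(a, b) {
  const left = Math.max(a.x, b.x)
  const top = Math.max(a.y, b.y)
  const right = Math.min(a.x + a.width, b.x + b.width)
  const bottom = Math.min(a.y + a.height, b.y + b.height)

  const intersection = Math.max(0, right - left) * Math.max(0, bottom - top)
  const union = a.width * a.height + b.width * b.height - intersection

  return union > 0 ? intersection / union : 0
}

// for each ground truth box, the best IOU achieved by any predicted box
function bestIouPerGroundTruth(groundTruth, predicted) {
  return groundTruth.map(gt =>
    predicted.reduce((best, box) => Math.max(best, iou(gt, box)), 0)
  )
}

const truth = [{ x: 0.125, y: 0.25, width: 0.5, height: 0.25 }]
const predictions = [{ x: 0.15, y: 0.25, width: 0.5, height: 0.25 }]
console.log(bestIouPerGroundTruth(truth, predictions))
```

If the detector has overfitted on the subset as intended, the best IOU for each ground truth box should be close to 1.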
