
paul-pias / Object Detection And Distance Measurement

Using YOLOv3 & YOLOv4 weights, objects are detected from a live video frame, along with the measurement of each object's distance from the camera, without the support of any extra hardware device.

Projects that are alternatives of or similar to Object Detection And Distance Measurement

Yolov3
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 8,159 (+6994.78%)
Mutual labels:  yolov3
Yolox
More Than YOLO(v3, v4, v3-tiny, v4-tiny)
Stars: ✭ 83 (-27.83%)
Mutual labels:  yolov3
Yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+17216.52%)
Mutual labels:  yolov3
Ros yolo as template matching
Run 3 scripts to (1) Synthesize images (by putting few template images onto backgrounds), (2) Train YOLOv3, and (3) Detect objects for: one image, images, video, webcam, or ROS topic.
Stars: ✭ 32 (-72.17%)
Mutual labels:  yolov3
Facedetector Base Yolov3 Spp
Stars: ✭ 65 (-43.48%)
Mutual labels:  yolov3
Yolov3 Object Detection Tutorial
Stars: ✭ 95 (-17.39%)
Mutual labels:  yolov3
Darknet ros
Tracking balloons with a UAV for an on-campus competition
Stars: ✭ 27 (-76.52%)
Mutual labels:  yolov3
Mobilenet Yolo
MobileNetV2-YoloV3-Nano: 0.5BFlops 3MB HUAWEI P40: 6ms/img, YoloFace-500k:0.1Bflops 420KB🔥🔥🔥
Stars: ✭ 1,566 (+1261.74%)
Mutual labels:  yolov3
Pytorch Onnx Tensorrt
A set of tool which would make your life easier with Tensorrt and Onnxruntime. This Repo is designed for YoloV3
Stars: ✭ 66 (-42.61%)
Mutual labels:  yolov3
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+1114.78%)
Mutual labels:  yolov3
Tensornets
High level network definitions with pre-trained weights in TensorFlow
Stars: ✭ 982 (+753.91%)
Mutual labels:  yolov3
Imagenet
Trial on kaggle imagenet object localization by yolo v3 in google cloud
Stars: ✭ 56 (-51.3%)
Mutual labels:  yolov3
Person remover
People removal in images using Pix2Pix and YOLO.
Stars: ✭ 96 (-16.52%)
Mutual labels:  yolov3
Yolo Vehicle Counter
This project aims to count every vehicle (motorcycle, bus, car, cycle, truck, train) detected in the input video using YOLOv3 object-detection algorithm.
Stars: ✭ 28 (-75.65%)
Mutual labels:  yolov3
Yolov3 tensorflow
Complete YOLO v3 TensorFlow implementation. Support training on your own dataset.
Stars: ✭ 1,498 (+1202.61%)
Mutual labels:  yolov3
Tensorflow Yolo V3
Implementation of YOLO v3 object detector in Tensorflow (TF-Slim)
Stars: ✭ 862 (+649.57%)
Mutual labels:  yolov3
License Plate Detection
This project uses YOLOv3 to detect license plates in the street
Stars: ✭ 93 (-19.13%)
Mutual labels:  yolov3
Yolov3 On Android
Build an Android App for deploying YOLO V3 source code on mobile phone directly.
Stars: ✭ 113 (-1.74%)
Mutual labels:  yolov3
Tensorflow Yolov4 Tflite
YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights tensorflow, tensorrt and tflite
Stars: ✭ 1,881 (+1535.65%)
Mutual labels:  yolov3
Yolov3 Model Pruning
Model pruning (network slimming) of YOLOv3 on the Oxford Hand dataset
Stars: ✭ 1,386 (+1105.22%)
Mutual labels:  yolov3

Object Detection and Distance Measurement


Introduction

This repo contains object_detection.py, which is able to perform the following tasks:

  • Object detection from a live video frame, from any video file, or from an image
  • Counting the number of objects in a frame
  • Measuring the distance of objects using depth information
  • Inference on multiple camera feeds at a time

For object detection, YOLOv3 has been used, which is able to detect 80 different objects. Some of those are:

  • person
  • car
  • bus
  • stop sign
  • bench
  • dog
  • bear
  • backpack and so on.

User Instruction

Update

There is a new update with the new YOLOv4 release. All you have to do is one simple step: after downloading the project, run the following command and follow the rest of the process as it is.

  cd YOLOv4

You can also use Yolact++ as an object detector using this repo.

To execute object_detection.py you require Python version > 3.5 (depending on whether you are using a GPU or not) and have to install the following libraries.

Installation

    $ pip install -r requirements.txt
         or
    $ pip install opencv-python
    $ pip install numpy
    $ pip install pandas
    $ pip install matplotlib
    $ pip install Pillow
    $ pip install imutils

For installing torch using pip:

    $ pip3 install torch===1.2.0 torchvision===0.4.0 -f https://download.pytorch.org/whl/torch_stable.html

or please follow the instructions from PyTorch

For installing "win32com.client", which is the text-to-speech module for Windows, you have to follow these steps.

First open the cmd as an administrator, then run

   $ python -m pip install pywin32

   # After installing, open your Python shell and run
      import win32com.client
      speaker = win32com.client.Dispatch("SAPI.SpVoice")
      speaker.Speak("Good Morning")

You need to clone the repository using Git Bash (if Git Bash is already installed), or you can download the zip file.

    $ git clone https://github.com/paul-pias/Object-Detection-and-Distance-Measurement.git

After unzipping the project, there are two ways to run this. If you want to see your output in your browser, execute the "app.py" script; otherwise run "object_detection.py" to execute it locally.

If you want to run object detection and distance measurement on a video file, just assign the name of the video file to the variable id in either "app.py" or "object_detection.py"; if you want to run it on your webcam, just put 0 in id.
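For example (the file name below is only an illustration, not a file that ships with the repo), the assignment could look like this:

    id = "test_video.mp4"   # hypothetical file name: run on a video file
    # id = 0                # use 0 instead to run on the default webcam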

However, if you want to run the inference on the feed of an IP camera, use the following convention while assigning it to the variable "id":

    "rtsp://assigned_name_of_the_camera:[email protected]_ip/"

You can check the performance on different weights of YOLO, which I have added on Google Drive and which are also available from YOLO.

For multiple camera support you need to add a few pieces of code as follows in app.py:

   def simulate(camera):
       # Generator that yields frames from an ObjectDetection instance as an MJPEG stream
       while True:
           frame = camera.main()
           if frame != "":
               yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n\r\n')

   @app.route('/video_simulate')
   def video_simulate():
       id = 0   # source for this extra feed: webcam index, video file, or RTSP URL
       return Response(simulate(ObjectDetection(id)), mimetype='multipart/x-mixed-replace; boundary=frame')

Depending on how many feeds you need, you have to add these two methods in "app.py" with different names and add a corresponding section in index.html.

<div class="column is-narrow">
        <div class="box" style="width: 500px;">
            <p class="title is-5">Camera - 01</p>
            <hr>
            <img id="bg" width=640px height=360px src="{{ url_for('video_simulate') }}">
            <hr>

        </div>
    </div>
    <hr>

Note:

You have to use git-lfs to download the yolov3.weights file. However, you can also download it from here: YOLOv3 @ Google-Drive || YOLOv4 @ Google-Drive


Theory

In a traditional image classification approach for object detection there are two well-known strategies.

For a single object in an image there are two scenarios.

  • Classification
  • Localization

For multiple objects in an image there are two scenarios.

  • Object detection and localization
  • Object segmentation

For Single Objects

For Multiple Objects

Distance Measurement


Traditionally, we measure the distance of any object using ultrasonic sensors such as the HC-SR04 or other high-frequency devices which generate sound waves to calculate the distance they traverse. However, when you are working with an embedded device to make a compact design which has functionalities such as
  • Object detection (with camera) and
  • Distance measurement

you don't always want to make your device heavier by adding unnecessary hardware modules. To avoid such cases you can follow a more convenient and feasible approach. As you have already integrated a camera for object detection, you can use the depth information from the bounding boxes the detector draws when localizing objects to calculate the distance of each object from the camera.

How does the object detection work?

From the initial part we understood that, to measure the distance of an object from an image, we need to localize it first to get the depth information. Now, how does localization actually work?

Localize objects with regression

Regression is about returning a number instead of a class. The number can be represented as (x0, y0, width, height), which describes a bounding box. In the images illustrated above for a single object, if you only want to classify the object type then you don't need to draw the bounding box around that object; that's why this part is known as Classification. However, if we are interested in where this object is located in the image, then we need the 4 numbers that a regression layer will return. As you can see, there is a black rectangular box in the image of the white dog which was drawn using the regression layer. What happens here is that after the final convolutional layer + fully connected layers, instead of asking only for class scores, a regression layer is introduced that predicts, with some offsets, a rectangular box representing each individual object. For every frame/image, the following things happen to detect objects:

  • Using inference on a pre-trained ImageNet model, the last fully connected layer needs to be re-trained for the desired objects.
  • After that, all the proposals (roughly 2000 proposals per image) are resized to match the input size of the CNN.
  • An SVM needs to be trained to classify between object and background (one binary SVM (Support Vector Machine) for each class).
  • And to place the bounding box precisely over the object, a linear regression classifier needs to be trained which will output some correction factors. The problem with this approach is that one part of the network is dedicated to region proposals: after the fully connected layers, the model tries to propose certain regions of the image which may contain an object or objects, so it also requires a high-quality classifier to filter out valid proposals which definitely contain objects. Although this method is very accurate, it comes with a big computational cost (low frame rate), and that's why it is not suitable for embedded devices such as an Arduino or Raspberry Pi, which have less processing power.
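As a rough illustration of the idea of adding a box-regression output next to the class scores (a minimal PyTorch sketch with made-up layer sizes, not the network used in this repo):

    import torch
    import torch.nn as nn

    class ClassifyAndLocalize(nn.Module):
        """Toy model: class scores plus a 4-number (x0, y0, width, height) regression."""
        def __init__(self, feature_dim=512, num_classes=80):
            super().__init__()
            self.backbone = nn.Sequential(          # stand-in for a pre-trained CNN
                nn.Conv2d(3, feature_dim, kernel_size=3, stride=2, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            self.classifier = nn.Linear(feature_dim, num_classes)   # class scores
            self.box_regressor = nn.Linear(feature_dim, 4)          # bounding-box numbers

        def forward(self, x):
            features = self.backbone(x)
            return self.classifier(features), self.box_regressor(features)

    scores, box = ClassifyAndLocalize()(torch.randn(1, 3, 224, 224))
    print(scores.shape, box.shape)   # (1, 80) and (1, 4)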

Localizing with convolutional neural networks

Another way of doing object detection, and of reducing this tedious work, is by combining the previous two tasks into one network. Here, instead of proposing regions for every image, the model is fed a set of pre-defined boxes to look for objects. So, prior to the training phase of the neural network, some pre-defined rectangular boxes that represent some objects are given to the network to train with. When an image goes through the network, after the fully connected layer the trained model tries to match the pre-defined boxes to objects in that image, using the non-maxima suppression algorithm so that each object ends up with a single box. If the comparison crosses some threshold, the model draws the bounding box over the object. For example, in the case of the picture of the white dog, the model knows the coordinates of the box of the dog object, and when the image classification is done the model uses the L2 distance to calculate the loss between the pre-defined box coordinates and the coordinates the model produced, so that it can draw the bounding box precisely over the object in that image.
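A minimal sketch of that L2 comparison (the box values below are made up for illustration):

    import torch
    import torch.nn.functional as F

    # Hypothetical predicted and pre-defined (ground-truth) boxes as (x0, y0, width, height)
    predicted_box = torch.tensor([48.0, 30.0, 120.0, 90.0])
    target_box    = torch.tensor([50.0, 32.0, 118.0, 95.0])

    # L2 (mean squared error) localization loss between the two boxes
    print(F.mse_loss(predicted_box, target_box).item())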

The main idea is to use the convolutional feature maps from the later layers of a network and run small CONV filters over these feature maps to predict class scores and bounding-box offsets. Here we reuse the computation that has already been made during classification to localize objects, by grabbing the activations from the final conv layers. At this point we still have the spatial information of the image that the model started training with, but represented in a much smaller scope. So, in the final layers each "pixel" represents a larger area of the input image, and we can use those cells to infer object position. The tensor that contains the information of the original image is quite deep, as it has now been squeezed into a lower dimension. At this point a 1x1 CONV layer can be used to classify each cell as a class, and from the same layer we can add another CONV or FC (fully connected) layer to predict 4 numbers (the bounding box). In this way we get both class scores and location from one network. This approach is known as Single Shot Detection. The overall strategy in this approach can be summarised as follows:

  • Train a CNN with a regression (bounding box) and classification objective.
  • Gather activations from a particular layer, or layers, to infer classification and location with an FC layer or another CONV layer that works like an FC layer.
  • During prediction, use algorithms like non-maxima suppression to filter out multiple boxes around the same object.
  • During training, use algorithms like IoU to relate the predictions to the ground truth (see the sketch below).
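A small sketch of both steps (a hand-written IoU plus torchvision's non-maxima suppression; the boxes and scores are made-up values):

    import torch
    from torchvision.ops import nms

    def iou(box_a, box_b):
        """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    # Two hypothetical overlapping detections of the same object
    boxes  = torch.tensor([[100., 100., 210., 260.], [105., 102., 215., 258.]])
    scores = torch.tensor([0.9, 0.6])
    print(iou(boxes[0], boxes[1]))                 # overlap used to match a prediction to the ground truth
    print(nms(boxes, scores, iou_threshold=0.5))   # keeps only the index of the higher-scoring box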

YOLO follows the strategy of Single Shot Detection. It uses a single activation map for predicting classes and bounding boxes at the same time, which is why it is called "You Only Look Once".
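For intuition, a minimal sketch of such a single-shot prediction head (the feature-map size, channel count, and anchor count below are illustrative assumptions, not YOLOv3's actual configuration):

    import torch
    import torch.nn as nn

    num_classes, num_anchors = 80, 3
    feature_map = torch.randn(1, 1024, 13, 13)      # activations from the final conv layers

    # 1x1 conv heads: per-cell class scores and per-cell bounding-box offsets
    class_head = nn.Conv2d(1024, num_anchors * num_classes, kernel_size=1)
    box_head   = nn.Conv2d(1024, num_anchors * 4, kernel_size=1)

    print(class_head(feature_map).shape)   # torch.Size([1, 240, 13, 13])
    print(box_head(feature_map).shape)     # torch.Size([1, 12, 13, 13])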

Here a pre-trained YOLOv3 model has been used, which can detect 80 different objects. Although this model is faster, it doesn't always reliably predict the actual object in a given frame/image; it's a kind of trade-off between speed and accuracy.

How does the distance measurement work?

This formula is used for determining the distance:

    distance = (2 x 3.14 x 180) ÷ (w + h x 360) x 1000 + 3

For measuring distance, at first we have to understand how a camera sees an object.

You can relate this image to the white dog picture where the dog was localized. Again we will get 4 numbers from the bounding box, which are (x0, y0, width, height). Here x0, y0 are used to place or adjust the bounding box. Width and height are the two variables used in the formula for measuring the object, and they actually describe the detail of the detected object/objects. Width and height will vary depending on the distance of the object from the camera.

As we know, an image gets refracted when it goes through a lens, because rays of light enter the lens, whereas in the case of a mirror the light is reflected, which is why we get an exact reflection of the image. In the case of a lens, the image gets a little stretched. The following image illustrates how the image and the corresponding angles look when light enters through a lens.

As we can see, there are three variables named:
  • do (Distance of object from the lens)
  • di (Distance of the refracted image from the convex lens)
  • f (focal length or focal distance)

So the green line "do" represents the actual distance of the object from the convex lens, and "di" gives a sense of what the actual image looks like. Now consider a triangle on the left side of the image (the new refracted image) with base "do", and draw an opposite triangle similar to the left-side one, so that the new base of the opposite triangle is also "do" with the same perpendicular distance. Now, if we compare the two triangles on the right side, we see that "do" and "di" are parallel and the angles that they create on each side of both triangles are opposite to each other. From this we can infer that both triangles on the right side are also similar. As they are similar, the ratios of their corresponding sides are also equal, so do/di = A/B. Again, if we compare the two triangles on the right side of the image, the opposite angles are equal and one angle of each triangle is a right angle (90°) (dark blue area). So A and B are the hypotenuses of similar right triangles. So the new equation can be defined as:
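Written out as a sketch in LaTeX (introducing h_o and h_i for the object and image heights, which are symbols assumed here rather than named in the text), the similar-triangle ratios and the thin-lens relation they lead to are:

    \frac{d_o}{d_i} = \frac{A}{B} = \frac{h_o}{h_i},
    \qquad
    \frac{h_o}{h_i} = \frac{f}{d_i - f}
    \;\;\Longrightarrow\;\;
    \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}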

Now, if we derive from that equation, we will find:

And eventually we will arrive at

where f is the focal length, which is also called the arc length. By using the following distance formula, we will get our final result in inches:
    distance = (2 x 3.14 x 180) ÷ (w + h x 360) x 1000 + 3
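A minimal Python sketch of that calculation (w and h are the bounding-box width and height; the example values are made up):

    def distance_in_inches(w, h):
        """Approximate distance in inches from the bounding-box width and height,
        following the formula above."""
        return (2 * 3.14 * 180) / (w + h * 360) * 1000 + 3

    print(distance_in_inches(120, 90))   # hypothetical 120 x 90 pixel bounding box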
  • Note: As mentioned earlier, YOLO prefers performance over accuracy, which is why the model frequently predicts wrong objects.

If anyone is using this code for any kind of publication, kindly cite this work:

M. A. Khan, P. Paul, M. Rashid, M. Hossain and M. A. R. Ahad, "An AI-Based Visual Aid With Integrated Reading Assistant for the Completely Blind," in IEEE Transactions on Human-Machine Systems. doi: 10.1109/THMS.2020.3027534
