All Projects → noxouille → rt-mrcnn

noxouille / rt-mrcnn

Licence: other
Real time instance segmentation with Mask R-CNN, live from webcam feed.

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to rt-mrcnn

ee.Yrewind
Can rewind and save YouTube live stream
Stars: ✭ 133 (+182.98%)
Mutual labels:  livestream, live
image-segmentation
Mask R-CNN, FPN, LinkNet, PSPNet and UNet with multiple backbone architectures support readily available
Stars: ✭ 62 (+31.91%)
Mutual labels:  instance-segmentation, mask-rcnn
Super Bpd
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation (CVPR 2020)
Stars: ✭ 153 (+225.53%)
Mutual labels:  real-time, image-segmentation
Yolact edge
The first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
Stars: ✭ 697 (+1382.98%)
Mutual labels:  real-time, instance-segmentation
LiveHiddenCamera
Live Hidden Camera is a library which record live video and audio from Android device without displaying a preview.
Stars: ✭ 69 (+46.81%)
Mutual labels:  livestream, live
Jeelizweboji
JavaScript/WebGL real-time face tracking and expression detection library. Build your own emoticons animated in real time in the browser! SVG and THREE.js integration demos are provided.
Stars: ✭ 835 (+1676.6%)
Mutual labels:  real-time, webcam
instance-segmentation
No description or website provided.
Stars: ✭ 40 (-14.89%)
Mutual labels:  instance-segmentation, mask-rcnn
jeelizPupillometry
Real-time pupillometry in the web browser using a 4K webcam video feed processed by this WebGL/Javascript library. 2 demo experiments are included.
Stars: ✭ 78 (+65.96%)
Mutual labels:  real-time, webcam
DeepFaceLive
Real-time face swap for PC streaming or video calls
Stars: ✭ 7,917 (+16744.68%)
Mutual labels:  real-time, webcam
celldetection
Cell Detection with PyTorch.
Stars: ✭ 44 (-6.38%)
Mutual labels:  instance-segmentation, mask-rcnn
Centermask2
Real-time Anchor-Free Instance Segmentation, in CVPR 2020
Stars: ✭ 596 (+1168.09%)
Mutual labels:  real-time, instance-segmentation
jeelizGlanceTracker
JavaScript/WebGL lib: detect if the user is looking at the screen or not from the webcam video feed. Lightweight and robust to all lighting conditions. Great for play/pause videos if the user is looking or not, or for person detection. Link to live demo.
Stars: ✭ 68 (+44.68%)
Mutual labels:  real-time, webcam
Yolact
A simple, fully convolutional model for real-time instance segmentation.
Stars: ✭ 4,057 (+8531.91%)
Mutual labels:  real-time, instance-segmentation
Realtime Detectron
Real-time Detectron using webcam.
Stars: ✭ 42 (-10.64%)
Mutual labels:  real-time, webcam
yolact
Tensorflow 2.x implementation YOLACT
Stars: ✭ 23 (-51.06%)
Mutual labels:  real-time, instance-segmentation
Mocapnet
We present MocapNET2, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose NSRM representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body to an upper and lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance (70 fps in CPU-only execution).
Stars: ✭ 194 (+312.77%)
Mutual labels:  real-time, webcam
Live Dl
Download live streams from YouTube
Stars: ✭ 82 (+74.47%)
Mutual labels:  livestream, live
VideoRecognition-realtime-autotrainer-alerts
State of the art object detection in real-time using YOLOV3 algorithm. Augmented with a process that allows easy training of the classifier as a plug & play solution . Provides alert if an item in an alert list is detected.
Stars: ✭ 36 (-23.4%)
Mutual labels:  real-time, webcam
Deeplab-pytorch
Deeplab for semantic segmentation tasks
Stars: ✭ 61 (+29.79%)
Mutual labels:  image-segmentation, object-segmentation
Nager.VideoStream
Get images from a network camera stream or webcam
Stars: ✭ 27 (-42.55%)
Mutual labels:  webcam, webcam-streaming

Live Mask R-CNN for Object Detection and Segmentation

UPDATE: Confirmed to be working fine on Ubuntu (Gnome3 & Budgie) 18.04.1, with CUDA 9.0, cuDNN v7.3.1.20, Keras 2.2.4 and TensorFlow 1.11.0

This work is based on Matterport repo. A run_webcam.py script has been added for live demo purposes, which focuses on runtime performance instead of model accuracy. This was tested using a single 1080Ti. With no screen recording, it achieved 9-10 fps on average. During the recording, the performance dropped to 8-9fps on average. Any constructive comments and suggestions are greatly appreciated. Pull requests are also welcome!

Demo

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

Table of Contents

The repository includes:

  • Source code of Mask R-CNN built on FPN and ResNet101.
  • Training code for MS COCO
  • Pre-trained weights for MS COCO
  • Jupyter notebooks to visualize the detection pipeline at every step
  • ParallelModel class for multi-GPU training
  • Evaluation on MS COCO metrics (AP)
  • Example of training on your own dataset

Installation

For quick setup for webcam demo, run step 1-3.

  1. Clone this repository and go into your conda environment

  2. Install dependencies in order

    cat requirements.txt | xargs -n 1 -L 1 pip install
  3. Run setup from the repository root directory

    python setup.py install
  4. Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

  5. (Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).

Getting Started

  • demo.ipynb Is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.

  • train_shapes.ipynb shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.

  • (model.py, utils.py, config.py): These files contain the main Mask RCNN implementation.

  • inspect_data.ipynb. This notebook visualizes the different pre-processing steps to prepare the training data.

  • inspect_model.ipynb This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.

  • inspect_weights.ipynb This notebooks inspects the weights of a trained model and looks for anomalies and odd patterns.

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.

4. Layer activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.

6. Logging to TensorBoard

TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.

7. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line as such:

# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco

# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet

# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.

Training on Your Own Dataset

Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.

In summary, to train the model on your own dataset you'll need to extend two classes:

Config This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.

Differences from the Official Paper

This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

  • Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.

  • Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.

    To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.

  • Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high, and often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that.

Citation

Use this bibtex to cite this repository:

@misc{matterport_maskrcnn_2017,
  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
  author={Waleed Abdulla},
  year={2017},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

  • Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.
  • Training on other datasets.
  • Accuracy Improvements.
  • Visualizations and examples.

You can also join our team and help us build even more projects like this one.

Requirements

Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

If you use Docker, the code has been verified to work on this Docker container.

Projects Using this Model

If you extend this model to other datasets or build projects that use it, we'd love to hear from you.

4K Video Demo by Karol Majek.

Mask RCNN on 4K Video

Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.

Identify sport fields in satellite images

Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.

Balloon Color Splash

Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl

Code is in the samples/nucleus directory.

Nucleus Segmentation

Detection and Segmentation for Surgery Robots by the NUS Control & Mechatronics Lab.

Surgery Robot Detection and Segmentation

Reconstructing 3D buildings from aerial LiDAR

A proof of concept project by Esri, in collaboration with Nvidia and Miami-Dade County. Along with a great write up and code by Dmitry Kudinov, Daniel Hedges, and Omar Maher. 3D Building Reconstruction

Usiigaci: Label-free Cell Tracking in Phase Contrast Microscopy

A project from Japan to automatically track cells in a microfluidics platform. Paper is pending, but the source code is released.

Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery

Research project to understand the complex processes between degradations in the Arctic and climate change. By Weixing Zhang, Chandi Witharana, Anna Liljedahl, and Mikhail Kanevskiy. image

Mask-RCNN Shiny

A computer vision class project by HU Shiyu to apply the color pop effect on people with beautiful results.

Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.

Mapping Challenge

GRASS GIS Addon to generate vector masks from geospatial imagery. Based on a Master's thesis by Ondřej Pešek.

GRASS GIS Image

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].