
kevalmorabia97 / Object-and-Semantic-Part-Detection-pyTorch

Licence: Apache-2.0 license
Joint detection of Object and its Semantic parts using Attention-based Feature Fusion on PASCAL Parts 2010 dataset



Attention-based Joint Detection of Object and Semantic Part

Joint detection of an Object and its Semantic parts using Attention-based feature fusion between two Faster-RCNN models. This project was done as our CS543 Computer Vision project at UIUC.
Link To ArXiv Pre-Print

Model Architecture:

We build our model on top of torchvision's Faster-RCNN model. Our architecture is strongly motivated by this paper: we replace its Relationship modeling and LSTM-based feature fusion with an Attention-based feature fusion architecture. We define a hyperparameter called fusion_thresh that decides which object and part proposal boxes are related to each other and should undergo fusion. For example, fusion_thresh=0.9 means that we fuse only those object and part boxes whose intersection area is at least 0.9*area_of_part.
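The fusion_thresh test described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, the (x1, y1, x2, y2) box format, and the brute-force loop are assumptions made for the example.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a, b):
    """Area of the overlap between two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))


def related_pairs(object_boxes, part_boxes, fusion_thresh=0.9):
    """Return (object_index, part_index) pairs that pass the fusion test.

    A pair is related when intersection >= fusion_thresh * area_of_part.
    """
    pairs = []
    for i, obj in enumerate(object_boxes):
        for j, part in enumerate(part_boxes):
            part_area = box_area(part)
            if part_area > 0 and intersection_area(obj, part) >= fusion_thresh * part_area:
                pairs.append((i, j))
    return pairs


# Hypothetical boxes: the first part lies fully inside the object,
# the second one only overlaps it at a corner.
objects = [(0, 0, 100, 100)]
parts = [(10, 10, 30, 30), (90, 90, 130, 130)]
print(related_pairs(objects, parts))  # [(0, 0)]
```

Note that the ratio is normalized by the part area rather than by the union, so a small part box fully contained in a large object box always passes, which would not be the case with plain IoU.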

More details are in the Project_Report.pdf file and the Video Presentation.

Dataset Info:

Dataset: PASCAL VOC 2010 dataset
Annotations from: PASCAL Parts dataset
Part annotations are converted from *.mat format to *.json using scipy.io.loadmat, and the segmentation masks are used to derive the corresponding bounding boxes. Link to parsed part annotations
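Deriving a bounding box from a segmentation mask amounts to finding the extent of the nonzero pixels. The sketch below assumes a binary mask given as nested lists and returns inclusive pixel coordinates; the function name and the coordinate convention are illustrative assumptions, and the repository's preprocessing may differ.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None.

    `mask` is a 2D list of rows holding 0/1 values. Coordinates are
    inclusive pixel indices (an assumption made for this sketch).
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: no box can be derived
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])


# Hypothetical 4x5 part mask.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_bbox(mask))  # (1, 1, 3, 2)
```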

Directory structure for data is as follows:

data
└── VOCdevkit
    └── VOC2010
        ├── Annotations_Part_json
        │   └── *.json
        ├── Classes
        │   └── *class2ind.txt
        ├── ImageSets
        │   └── Main
        │       └── *train/val.txt
        └── JPEGImages
            └── *.jpg

Dataset Preprocessing:
Our goal is to show that having part information can improve object detection performance, and vice versa.
Some classes, e.g. boat, do not have part annotations, so we discard them from our vehicles_train/val.txt file.
Some images do not have part annotations for any object, so we discard them from the corresponding train/val.txt file.
Together, the images removed for these reasons amount to only about 0.5%.
No images have been removed from the combined train/val/trainval files; they are removed only from the animals/indoor/person/vehicles train/val files. Running the part detection model on the entire dataset for all classes would therefore produce many samples with no part annotations.


In total, there were 166 part classes. We coarsened these part annotations by merging multiple parts into a single class. For example, FACE is the new part class combining [beak, hair, head, nose, lear, lebrow, leye, mouth, rear, rebrow, reye]. After merging, the number of part classes is reduced to 19; they are listed in Classes/part_mergedclass2ind.txt.
The image below shows part classes before and after merging. More examples can be found in data/VOCdevkit/VOC2010/example_merged_part_images.

Before merging part classes | After merging part classes
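The merging step can be pictured as a lookup from fine-grained part names to merged classes. The sketch below is illustrative only: the FACE group is copied from the text above, while the dictionary and function names are hypothetical and the other 18 merged classes are deliberately left out rather than guessed.

```python
# Partial mapping: only the FACE group is taken from the README text.
MERGED_PART_CLASSES = {
    "FACE": ["beak", "hair", "head", "nose", "lear", "lebrow", "leye",
             "mouth", "rear", "rebrow", "reye"],
}

# Invert the mapping into a fine-grained -> merged lookup table.
FINE_TO_MERGED = {
    fine: merged
    for merged, fines in MERGED_PART_CLASSES.items()
    for fine in fines
}


def merge_part_label(fine_label):
    """Map a fine-grained part name to its merged class, or None if unmapped."""
    return FINE_TO_MERGED.get(fine_label)


print(merge_part_label("leye"))  # FACE
print(merge_part_label("tail"))  # None (not covered by this partial mapping)
```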

Running the code (with default parameters):

For training a single model for animal object detection:
python3 main.py -e 15 --use_objects -tr animals_train -val animals_val -cf animals_object_class2ind

For training a single model for animal part detection:
python3 main.py -e 15 --use_parts -tr animals_train -val animals_val -cf animals_part_mergedclass2ind

For training the joint model for simultaneous animal object and part detection:
python3 main_joint.py -e 15 -ft 0.9

Results:

In our limited experiments on Animal Object (bird, cat, cow, dog, horse, sheep) and Part (face, leg, neck, tail, torso, wings) detection, the Attention-based Joint Detection model improves Part detection mean Average Precision @IoU=0.5 from 51.3 to 52.0, while Object detection mAP changes from 87.2 to 87.5.

Model | Object Detection mAP@IoU=0.5 | Part Detection mAP@IoU=0.5
Single Object Detection Model | 87.2 | --
Single Part Detection Model | -- | 51.3
Joint Object and Part Detection | 87.5 | 52.0

Object Detections from Joint Model | Part Detections from Joint Model
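For readers unfamiliar with the metric, the IoU=0.5 criterion behind these numbers can be sketched as follows. This is a simplified illustration and not the evaluation code used for the table: it checks a single prediction against a single ground-truth box, it assumes (x1, y1, x2, y2) boxes and a >= comparison, and it omits the one-to-one matching and precision-recall integration that a full mAP computation needs.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, iou_thresh=0.5):
    """A prediction counts as correct if the label matches and IoU >= threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= iou_thresh


# Hypothetical ground truth and two predictions.
gt = (0, 0, 10, 10)
print(iou((0, 0, 10, 8), gt))                            # 0.8
print(is_true_positive((0, 0, 10, 8), "dog", gt, "dog"))   # True
print(is_true_positive((5, 0, 15, 10), "dog", gt, "dog"))  # False (IoU = 1/3)
```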

Requirements:

The code has been tested with the following package versions:

numpy==1.19.1
Pillow>=8.1.1
pycocotools==2.0.1
torch==1.4.0
torchvision==0.5.0
tqdm==4.48.0

Contributors:

  1. Keval Morabia (kevalmorabia97)
  2. Jatin Arora (jatinarora2702)
  3. Tara Vijaykumar (tara-vijaykumar)

Cite

If you find this useful in your research, please cite our ArXiv pre-print:

@misc{morabia2020attentionbased,
    title={Attention-based Joint Detection of Object and Semantic Part},
    author={Keval Morabia and Jatin Arora and Tara Vijaykumar},
    year={2020},
    eprint={2007.02419},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Abstract:

In this paper, we address the problem of joint detection of objects like dog and its semantic parts like face, leg, etc. Our model is created on top of two Faster-RCNN models that share their features to perform a novel Attention-based feature fusion of related Object and Part features to get enhanced representations of both. These representations are used for final classification and bounding box regression separately for both models. Our experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection in terms of mean Average Precision (mAP) at IoU=0.5.
