HOI-Learning-List
Some recent (2015-now) Human-Object Interaction (HOI) learning studies. If you find any errors or problems, please feel free to comment.
A list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.
Dataset
- PaStaNet-HOI (TPAMI2021) [Benchmark]
- HAKE (CVPR2020) [YouTube] [bilibili] [Website] [Paper] [HAKE-Action-Torch] [HAKE-Action-TF]
- PIC [Website]
More...
Method
HOI Recognition: image-based; the task is to recognize all the HOIs present in one image (see the evaluation sketch after this list).
- PaStaNet: Toward Human Activity Knowledge Engine (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Pairwise (ECCV2018) [Paper]
- Attentional Pooling for Action Recognition (NIPS2017) [Code] [Paper]
- Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering (ECCV2016) [Code] [Paper]
- Contextual Action Recognition with R*CNN (ICCV2015) [Code] [Paper]
- SGAP-Net (AAAI2020) [Paper]
More...
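Image-level HOI recognition is usually framed as multi-label classification over a fixed set of HOI categories (600 for HICO) and scored with mean average precision over categories. Below is a minimal sketch of that evaluation in Python/NumPy; the function names and the dense score/label matrices are illustrative assumptions, not taken from any of the codebases above.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one HOI category: scores (N,) floats, labels (N,) in {0, 1}."""
    order = np.argsort(-scores)          # rank images by descending score
    labels = labels[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    # AP here = mean precision at the ranks of the true positives (no interpolation).
    return float(precision[labels == 1].mean()) if labels.any() else 0.0

def hoi_recognition_map(scores, labels):
    """scores, labels: (num_images, num_hoi_categories) arrays; mAP over categories."""
    aps = [average_precision(scores[:, c], labels[:, c]) for c in range(scores.shape[1])]
    return float(np.mean(aps))

# Toy usage: 4 images, 3 HOI categories.
rng = np.random.default_rng(0)
scores = rng.random((4, 3))
labels = (rng.random((4, 3)) > 0.5).astype(int)
print(hoi_recognition_map(scores, labels))
```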
Unseen or zero-shot learning (image-level recognition); see the compositional-scoring sketch after this list.
- Compositional Learning for Human Object Interaction (ECCV2018) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
More...
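A recurring idea in these zero-shot works is to factorize an HOI into a verb and an object, so that unseen verb-object combinations can be scored by recombining embeddings learned from seen ones. The sketch below illustrates that compositional scoring under simple assumptions (PyTorch; the module name and dimensions are made up for illustration and are not the exact architecture of any listed paper).

```python
import torch
import torch.nn as nn

class CompositionalHOIScorer(nn.Module):
    """Score an HOI as a composition of a verb embedding and an object embedding."""
    def __init__(self, num_verbs, num_objects, feat_dim=512, emb_dim=128):
        super().__init__()
        self.verb_emb = nn.Embedding(num_verbs, emb_dim)
        self.obj_emb = nn.Embedding(num_objects, emb_dim)
        # Project the image/pair feature into the same space as the composed embedding.
        self.proj = nn.Linear(feat_dim, emb_dim)

    def forward(self, feat, verb_ids, obj_ids):
        # Compose verb + object embeddings; unseen pairs reuse seen verb/object parts.
        hoi = self.verb_emb(verb_ids) + self.obj_emb(obj_ids)   # (num_hois, emb_dim)
        feat = self.proj(feat)                                   # (batch, emb_dim)
        return feat @ hoi.t()                                    # (batch, num_hois) scores

# Toy usage: score 600 verb-object compositions (117 verbs, 80 objects, as in HICO) for 2 features.
scorer = CompositionalHOIScorer(num_verbs=117, num_objects=80)
verb_ids = torch.randint(0, 117, (600,))
obj_ids = torch.randint(0, 80, (600,))
print(scorer(torch.randn(2, 512), verb_ids, obj_ids).shape)  # torch.Size([2, 600])
```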
HOI Detection: instance-based; the task is to detect the human-object pairs and classify their interactions (see the pairing sketch after this list).
- End-to-End Human Object Interaction Detection with HOI Transformer (CVPR2021) [Paper] [Code]
- DIRV (AAAI2021) [Paper]
- DecAug (AAAI2021) [Paper]
- OSGNet (IEEE Access) [Paper]
- PFNet (CVM) [Paper]
- UniDet (ECCV2020) [Paper]
- FCMNet (ECCV2020) [Paper]
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection (ECCV2020) [Paper]
- Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection (IJCAI2020) [Paper]
- PaStaNet (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Cascaded Human-Object Interaction Recognition (CVPR2020) [Code] [Paper]
- Diagnosing Rarity in Human-Object Interaction Detection (CVPRW2020) [Paper]
- MLCNet (ICMR2020) [Paper]
- SIGN (ICME2020) [Paper]
- In-GraphNet (IJCAI-PRICAI 2020) [Paper]
- RPNN (ICCV2019) [Paper]
- Deep Contextual Attention for Human-Object Interaction Detection (ICCV2019) [Paper]
- Turbo (AAAI2019) [Paper]
- InteractNet (CVPR2018) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
- VS-GATs (Mar. 2020) [Paper]
- Classifying All Interacting Pairs in a Single Shot (Jan. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May 2020) [Paper]
- SABRA (Dec 2020) [Paper]
More...
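Most two-stage detectors in the list above share a common skeleton: run an off-the-shelf object detector, enumerate candidate human-object pairs, and classify each pair's interactions, often scoring an HOI as the product of the human, object, and verb confidences (an iCAN/TIN-style convention). Below is a minimal pairing sketch under those assumptions; `classify_pair` is a hypothetical stand-in for whatever interaction head a given method uses.

```python
from itertools import product

def enumerate_hoi_pairs(detections, score_thresh=0.3):
    """detections: list of dicts with 'box', 'label', 'score' from an object detector.
    Returns all candidate (human, object) pairs above the score threshold."""
    humans = [d for d in detections if d["label"] == "person" and d["score"] >= score_thresh]
    objects = [d for d in detections if d["score"] >= score_thresh]
    # Persons stay in `objects`: HOIs such as "hug person" are valid in HICO-DET/V-COCO.
    return [(h, o) for h, o in product(humans, objects) if h is not o]

def detect_hois(detections, classify_pair):
    """classify_pair(human, obj) -> {verb: prob} is a hypothetical interaction head.
    Final HOI score = human score * object score * verb score (a common two-stage convention)."""
    results = []
    for h, o in enumerate_hoi_pairs(detections):
        for verb, p in classify_pair(h, o).items():
            results.append({
                "human_box": h["box"], "object_box": o["box"],
                "verb": verb, "object": o["label"],
                "score": h["score"] * o["score"] * p,
            })
    return results
```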
Unseen or zero-shot learning (instance-level detection).
- Detecting Human-Object Interaction with Mixed Supervision (WACV 2021) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May 2020) [Paper]
- Functional (AAAI2020) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
More...
Video HOI methods
- Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN [Paper]
- Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [Code] [Paper]
More...
Results
PaStaNet-HOI:
Proposed by TIN (TPAMI version of the Transferable Interactiveness Network). It is built on HAKE data and includes 110K+ images and 520 HOIs (the 80 "no_interaction" HOIs of HICO-DET are excluded to avoid incomplete labeling). Its long-tailed data distribution is more severe, making it a harder benchmark.
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 11.00 |
iCAN+NIS | 13.13 |
TIN | 15.38 |
HICO-DET:
Settings: (def) = Default (evaluation over all test images), (ko) = Known Object (evaluation restricted to images containing the target object category); Rare/Non-Rare = HOI categories with fewer than / at least 10 training instances.
1) Detector: COCO pre-trained
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
Shen et al. | WACV2018 | 6.46 | 4.24 | 7.12 | - | - | - |
HO-RCNN | WACV2018 | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
InteractNet | CVPR2018 | 9.94 | 7.16 | 10.77 | - | - | - |
Turbo | AAAI2019 | 11.40 | 7.30 | 12.60 | - | - | - |
GPNN | ECCV2018 | 13.11 | 9.34 | 14.23 | - | - | - |
Xu et al. | ICCV2019 | 14.70 | 13.26 | 15.13 | - | - | - |
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
Wang et al. | ICCV2019 | 16.24 | 11.16 | 17.75 | 17.73 | 12.78 | 19.21 |
Lin et al. | IJCAI2020 | 16.63 | 11.30 | 18.22 | 19.22 | 14.56 | 20.61 |
Functional (suppl) | AAAI2020 | 16.96 | 11.73 | 18.52 | - | - | - |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
No-Frills | ICCV2019 | 17.18 | 12.17 | 18.68 | - | - | - |
RPNN | ICCV2019 | 17.35 | 12.78 | 18.71 | - | - | - |
PMFNet | ICCV2019 | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
SIGN | ICME2020 | 17.51 | 15.31 | 18.53 | 20.49 | 17.53 | 21.51 |
Interactiveness-optimized | CVPR2019 | 17.54 | 13.80 | 18.65 | 19.75 | 15.70 | 20.96 |
Wang et al. | ECCV2020 | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
In-GraphNet | IJCAI-PRICAI 2020 | 17.72 | 12.93 | 19.31 | - | - | - |
HOID | CVPR2020 | 17.85 | 12.85 | 19.34 | - | - | - |
MLCNet | ICMR2020 | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
SAG | arXiv | 18.26 | 13.40 | 19.71 | - | - | - |
Sarullo et al. | arXiv | 18.74 | - | - | - | - | - |
DRG | ECCV2020 | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
Analogy | ICCV2019 | 19.40 | 14.60 | 20.90 | - | - | - |
VCL | ECCV2020 | 19.43 | 16.55 | 20.29 | 22.00 | 19.09 | 22.87 |
VS-GATs | arXiv | 19.66 | 15.79 | 20.81 | - | - | - |
VSGNet | CVPR2020 | 19.80 | 16.05 | 20.91 | - | - | - |
PFNet | CVM | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
FCMNet | ECCV2020 | 20.41 | 17.34 | 21.56 | 22.04 | 18.97 | 23.12 |
ACP | ECCV2020 | 20.59 | 15.92 | 21.98 | - | - | - |
PD-Net | ECCV2020 | 20.81 | 15.90 | 22.28 | 24.78 | 18.88 | 26.54 |
TIN-PAMI | TPAMI2021 | 20.93 | 18.95 | 21.32 | 23.02 | 20.96 | 23.42 |
PMN | arXiv | 21.21 | 17.60 | 22.29 | - | - | - |
DJ-RN | CVPR2020 | 21.34 | 18.53 | 22.18 | 23.69 | 20.64 | 24.60 |
OSGNet | IEEE Access | 21.40 | 18.12 | 22.38 | - | - | - |
DIRV | AAAI2021 | 21.78 | 16.38 | 23.39 | 25.52 | 20.84 | 26.92 |
ConsNet | ACMMM2020 | 22.15 | 17.12 | 23.65 | - | - | - |
IDN | NeurIPS2020 | 23.36 | 22.47 | 23.63 | 26.43 | 25.01 | 26.85 |
2) Detector: pre-trained on COCO and fine-tuned on the HICO-DET train set (with its GT human-object boxes), or a one-stage detector
A fine-tuned detector learns to detect mainly the interactive humans and objects (i.e., it implicitly encodes interactiveness), which suppresses many wrong pairings (non-interactive human-object pairs) and boosts performance; a minimal suppression sketch follows the table.
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
UniDet | ECCV2020 | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
IP-Net | CVPR2020 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
PPDM (paper) | CVPR2020 | 21.10 | 14.46 | 23.09 | - | - | - |
PPDM (github-hourglass104) | CVPR2020 | 21.73/21.94 | 13.78/13.97 | 24.10/24.32 | 24.58/24.81 | 16.65/17.09 | 26.84/27.12 |
Functional | AAAI2020 | 21.96 | 16.43 | 23.62 | - | - | - |
SABRA-Res50 | arXiv | 23.48 | 16.39 | 25.59 | 28.79 | 22.75 | 30.54 |
VCL | ECCV2020 | 23.63 | 17.21 | 25.55 | 25.98 | 19.12 | 28.03 |
SABRA-Res50FPN | arXiv | 24.12 | 15.91 | 26.57 | 29.65 | 22.92 | 31.65 |
ConsNet | ACMMM2020 | 24.39 | 17.10 | 26.56 | - | - | - |
DRG | ECCV2020 | 24.53 | 19.47 | 26.04 | 27.98 | 23.11 | 29.43 |
SABRA-Res152 | arXiv | 26.09 | 16.29 | 29.02 | 31.08 | 23.44 | 33.37 |
IDN | NeurIPS2020 | 26.29 | 22.61 | 27.39 | 28.24 | 24.47 | 29.37 |
Zou et al. | CVPR2021 | 26.61 | 19.15 | 28.84 | 29.13 | 20.98 | 31.57 |
AS-Net | CVPR2021 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |
QPIC-Res50 | CVPR2021 | 29.07 | 21.85 | 31.23 | 31.68 | 24.14 | 33.93 |
FCL | CVPR2021 | 29.12 | 23.67 | 30.75 | 31.31 | 25.62 | 33.02 |
QPIC-Res101 | CVPR2021 | 29.90 | 23.92 | 31.69 | 32.38 | 26.06 | 34.27 |
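The interactiveness idea behind this also works at the pair level: predict a binary "is this pair interacting at all" score and use it to suppress non-interactive candidates before or after verb classification. Below is a minimal suppression sketch, assuming each candidate pair dict already carries `score` and `interactiveness` fields (illustrative names, not TIN's actual code).

```python
def suppress_non_interactive(pairs, interactiveness_thresh=0.1):
    """Drop candidate pairs whose interactiveness is below the threshold (hard suppression)
    and re-weight the HOI scores of the survivors (soft suppression)."""
    kept = []
    for pair in pairs:
        if pair["interactiveness"] < interactiveness_thresh:
            continue  # non-interaction suppression: discard the pair entirely
        pair = dict(pair)  # copy so the input list is left untouched
        pair["score"] *= pair["interactiveness"]  # soft re-weighting by interactiveness
        kept.append(pair)
    return kept
```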
3) Ground Truth human-object pair boxes (only evaluating HOI recognition)
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) |
---|---|---|---|---|
iCAN | BMVC2018 | 33.38 | 21.43 | 36.95 |
Interactiveness | CVPR2019 | 34.26 | 22.90 | 37.65 |
Analogy | ICCV2019 | 34.35 | 27.57 | 36.38 |
IDN | NeurIPS2020 | 43.98 | 40.27 | 45.09 |
FCL | CVPR2021 | 45.25 | 36.27 | 47.94 |
4) Enhanced with HAKE:
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
iCAN + HAKE-HICO-DET | CVPR2020 | 19.61 (+4.77) | 17.29 | 20.30 | 22.10 | 20.46 | 22.59 |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
Interactiveness + HAKE-HICO-DET | CVPR2020 | 22.12 (+5.09) | 20.19 | 22.69 | 24.06 | 22.19 | 24.62 |
Interactiveness + HAKE-Large | CVPR2020 | 22.66 (+5.63) | 21.17 | 23.09 | 24.53 | 23.00 | 24.99 |
Ambiguous-HOI
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 8.14 |
Interactiveness | 8.22 |
Analogy(reproduced) | 9.72 |
DJ-RN | 10.37 |
V-COCO: Scenario 1
1) Detector: COCO pre-trained or one-stage detector
Method | Pub | AP(role) |
---|---|---|
Gupta et al. | arXiv | 31.8 |
InteractNet | CVPR2018 | 40.0 |
Turbo | AAAI2019 | 42.0 |
GPNN | ECCV2018 | 44.0 |
iCAN | BMVC2018 | 45.3 |
Xu et al. | CVPR2019 | 45.9 |
Wang et al. | ICCV2019 | 47.3 |
UniDet | ECCV2020 | 47.5 |
Interactiveness | CVPR2019 | 47.8 |
Lin et al. | IJCAI2020 | 48.1 |
VCL | ECCV2020 | 48.3 |
Zhou et al. | CVPR2020 | 48.9 |
In-GraphNet | IJCAI-PRICAI 2020 | 48.9 |
Interactiveness-optimized | CVPR2019 | 49.0 |
TIN-PAMI | TPAMI2021 | 49.1 |
IP-Net | CVPR2020 | 51.0 |
DRG | ECCV2020 | 51.0 |
VSGNet | CVPR2020 | 51.8 |
PMN | arXiv | 51.8 |
PMFNet | ICCV2019 | 52.0 |
FCL | CVPR2021 | 52.35 |
PD-Net | ECCV2020 | 52.6 |
Wang et al. | ECCV2020 | 52.7 |
PFNet | CVM | 52.8 |
Zou et al. | CVPR2021 | 52.9 |
SIGN | ICME2020 | 53.1 |
ACP | ECCV2020 | 52.98 (53.23) |
FCMNet | ECCV2020 | 53.1 |
ConsNet | ACMMM2020 | 53.2 |
IDN | NeurIPS2020 | 53.3 |
OSGNet | IEEE Access | 53.43 |
SABRA-Res50 | arXiv | 53.57 |
AS-Net | CVPR2021 | 53.9 |
SABRA-Res50FPN | arXiv | 54.69 |
MLCNet | ICMR2020 | 55.2 |
DIRV | AAAI2021 | 56.1 |
SABRA-Res152 | arXiv | 56.62 |
QPIC-Res101 | CVPR2021 | 58.3 |
QPIC-Res50 | CVPR2021 | 58.8 |
2) Enhanced with HAKE:
Method | Pub | AP(role) |
---|---|---|
iCAN | BMVC2018 | 45.3 |
iCAN + HAKE-Large (transfer learning) | CVPR2020 | 49.2 (+3.9) |
Interactiveness | CVPR2019 | 47.8 |
Interactiveness + HAKE-Large (transfer learning) | CVPR2020 | 51.0 (+3.2) |
HICO
1) Default
Method | mAP |
---|---|
R*CNN | 28.5 |
Girdhar et al. | 34.6 |
Mallya et al. | 36.1 |
Pairwise | 39.9 |
2) Enhanced with HAKE:
Method | mAP |
---|---|
Mallya et al. | 36.1 |
Mallya et al.+HAKE-HICO | 45.0 (+8.9) |
Pairwise | 39.9 |
Pairwise+HAKE-HICO | 45.9 (+6.0) |
Pairwise+HAKE-Large | 46.3 (+6.4) |