vita-epfl / Monoloco

Licence: other
[ICCV 2019] Official implementation of "MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation" in PyTorch + Social Distancing


Projects that are alternatives of or similar to Monoloco

Convolutional Pose Machines Tensorflow
Stars: ✭ 758 (+213.22%)
Mutual labels:  pose-estimation, human-pose-estimation
Hierarchical perception library in Python for pose estimation, object detection, instance segmentation, keypoint estimation, face recognition, etc.
Stars: ✭ 131 (-45.87%)
Mutual labels:  object-detection, pose-estimation
💃 Real-time single person pose estimation for Android and iOS.
Stars: ✭ 783 (+223.55%)
Mutual labels:  pose-estimation, human-pose-estimation
Ai Basketball Analysis
🏀🤖🏀 AI web app and API to analyze basketball shots and shooting pose.
Stars: ✭ 582 (+140.5%)
Mutual labels:  object-detection, pose-estimation
Awesome Human Pose Estimation
A collection of awesome resources in Human Pose estimation.
Stars: ✭ 2,022 (+735.54%)
Mutual labels:  pose-estimation, human-pose-estimation
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.
Stars: ✭ 662 (+173.55%)
Mutual labels:  pose-estimation, human-pose-estimation
Awesome Computer Vision
Awesome Resources for Advanced Computer Vision Topics
Stars: ✭ 92 (-61.98%)
Mutual labels:  object-detection, pose-estimation
Dataset synthesizer
NVIDIA Deep learning Dataset Synthesizer (NDDS)
Stars: ✭ 417 (+72.31%)
Mutual labels:  object-detection, pose-estimation
SynthDet - An end-to-end object detection pipeline using synthetic data
Stars: ✭ 148 (-38.84%)
Mutual labels:  object-detection, pose-estimation
Gccpm Look Into Person Cvpr19.pytorch
Fast and accurate single-person pose estimation, ranked 10th at CVPR'19 LIP challenge. Contains implementation of "Global Context for Convolutional Pose Machines" paper.
Stars: ✭ 137 (-43.39%)
Mutual labels:  pose-estimation, human-pose-estimation
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System
Stars: ✭ 5,697 (+2254.13%)
Mutual labels:  pose-estimation, human-pose-estimation
Ml Auto Baseball Pitching Overlay
⚾🤖⚾ Automatic baseball pitching overlay in realtime
Stars: ✭ 200 (-17.36%)
Mutual labels:  object-detection, pose-estimation
Gluon Cv
Gluon CV Toolkit
Stars: ✭ 5,001 (+1966.53%)
Mutual labels:  object-detection, pose-estimation
Keras realtime multi Person pose estimation
Keras version of Realtime Multi-Person Pose Estimation project
Stars: ✭ 728 (+200.83%)
Mutual labels:  pose-estimation, human-pose-estimation
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Stars: ✭ 22,892 (+9359.5%)
Mutual labels:  pose-estimation, human-pose-estimation
Pytorch Pose
A PyTorch toolkit for 2D Human Pose Estimation.
Stars: ✭ 932 (+285.12%)
Mutual labels:  pose-estimation, human-pose-estimation
Pytorch Human Pose Estimation
Implementation of various human pose estimation models in pytorch on multiple datasets (MPII & COCO) along with pretrained models
Stars: ✭ 346 (+42.98%)
Mutual labels:  pose-estimation, human-pose-estimation
Tf Pose Estimation
Deep Pose Estimation implemented using Tensorflow with Custom Architectures for fast inference.
Stars: ✭ 3,856 (+1493.39%)
Mutual labels:  pose-estimation, human-pose-estimation
Tensorflow realtime multi Person pose estimation
Multi-Person Pose Estimation project for Tensorflow 2.0 with a small and fast model based on MobilenetV3
Stars: ✭ 129 (-46.69%)
Mutual labels:  pose-estimation, human-pose-estimation
pytorch implementation of MultiPoseNet (ECCV 2018, Muhammed Kocabas et al.)
Stars: ✭ 191 (-21.07%)
Mutual labels:  pose-estimation, human-pose-estimation


We tackle the fundamentally ill-posed problem of 3D human localization from monocular RGB images. Driven by the limitation of neural networks outputting point estimates, we address the ambiguity in the task by predicting confidence intervals through a loss function based on the Laplace distribution. Our architecture is a light-weight feed-forward neural network that predicts 3D locations and corresponding confidence intervals given 2D human poses. The design is particularly well suited for small training data, cross-dataset generalization, and real-time applications. Our experiments show that we (i) outperform state-of-the-art results on KITTI and nuScenes datasets, (ii) even outperform a stereo-based method for far-away pedestrians, and (iii) estimate meaningful confidence intervals. We further share insights on our model of uncertainty in cases of limited observations and out-of-distribution samples.
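The Laplace-based loss mentioned above can be sketched in a few lines. This is an illustrative, minimal plain-Python version, not the repository's actual implementation (which operates on PyTorch tensors): mu is the predicted distance, b the predicted Laplace scale, and target the ground-truth distance.

```python
import math

def laplace_nll(mu, b, target):
    """Negative log-likelihood of a Laplace(mu, b) distribution.

    Sketch of the loss family described in the paper: minimizing it
    trains the network to output both a distance estimate mu and a
    scale b, and b then serves as a confidence interval around mu.
    """
    return abs(target - mu) / b + math.log(2.0 * b)
```

For a perfect prediction with b = 0.5 the loss is 0; larger errors, or over-confident (too small) scales, increase it.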

@InProceedings{Bertoni_2019_ICCV,
  author = {Bertoni, Lorenzo and Kreiss, Sven and Alahi, Alexandre},
  title = {MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2019}
}


NEW! MonoLoco++ is available:

  • It estimates 3D localization, orientation, and bounding box dimensions
  • It verifies social distance requirements. More info: video and project page
  • It works with OpenPifPaf 0.12 and PyTorch 1.7


  • Paper on ICCV'19 website or ArXiv

  • Check our video with method description and qualitative results on YouTube

  • Live demo available! (more info in the webcam section)

  • Continuously tested with Travis CI



Python 3 is required; Python 2 is not supported. For the pip installation below, do not clone this repository, and make sure there is no folder named monoloco in your current directory.

pip3 install monoloco

For development of the monoloco source code itself, you need to clone this repository and then:

pip3 install -e '.[test, prep]'

Python 3.6 or 3.7 is required for the nuScenes development kit. All details for the PifPaf pose detector are at openpifpaf.

Data structure

data
├── arrays
├── models
├── kitti
├── nuscenes
└── logs

Run the following to create the folders:

mkdir data
cd data
mkdir arrays models kitti nuscenes logs

Pre-trained Models

  • Download a MonoLoco pre-trained model from Google Drive and save it in data/models (default) or in any folder and call it through the command line option --model <model path>
  • The PifPaf pre-trained model will be automatically downloaded at the first run. Three standard pre-trained models are available through the command-line options --checkpoint resnet50, --checkpoint resnet101, and --checkpoint resnet152. Alternatively, you can download a PifPaf pre-trained model from openpifpaf and call it with --checkpoint <pifpaf model path>


All the commands are run through a main file using subparsers. To check all the commands for the parser and the subparsers (including the openpifpaf ones), run:

  • python3 -m --help
  • python3 -m predict --help
  • python3 -m train --help
  • python3 -m eval --help
  • python3 -m prep --help

or check the file monoloco/


The predict script receives an image (or an entire folder, using glob expressions), calls PifPaf for 2D human pose detection over the image, and runs MonoLoco for 3D localization of the detected poses. The command --networks defines whether to save PifPaf outputs, MonoLoco outputs, or both. You can check all the commands for PifPaf at openpifpaf.

Output options include json files and/or visualization of the predictions on the image in frontal mode, bird's-eye-view mode, or combined mode, and can be specified with --output_types.

Ground truth matching

  • In case you provide a ground-truth json file to compare MonoLoco's predictions against, the script will match every detection using the Intersection over Union metric. The ground-truth file can be generated using the subparser prep and called with the command --path_gt. Check the preprocessing section for more details or download the file from here.

  • In case you don't provide a ground-truth file, the script will look for a predefined path. If it does not find the file, it will generate images with all the predictions without ground-truth matching.
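As a rough illustration of the matching step, Intersection over Union between a predicted and a ground-truth bounding box can be computed like this (a minimal sketch with boxes in [x1, y1, x2, y2] format, not the repository's actual matching code):

```python
def iou(box_a, box_b):
    # Intersection over Union of two [x1, y1, x2, y2] boxes.
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Identical boxes score 1.0 and disjoint boxes 0.0; a detection is typically kept as a match when its IoU with a ground-truth box exceeds a threshold.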

Below are examples with and without ground-truth matching. They have been created (adding or removing --path_gt) with: python3 -m predict --glob docs/002282.png --output_types combined --scale 2 --model data/models/monoloco-190513-1437.pkl --n_dropout 50 --z_max 30

With ground truth matching (only matching people): predict_ground_truth

Without ground_truth matching (all the detected people): predict_no_matching

Images without calibration matrix

To accurately estimate distance, the focal length is necessary. However, it is still possible to test MonoLoco on images where the calibration matrix is not available. Absolute distances are not meaningful, but relative distances still are. Below is an example on a generic image from the web, created with: python3 -m predict --glob docs/surf.jpg --output_types combined --model data/models/monoloco-190513-1437.pkl --n_dropout 50 --z_max 25

no calibration
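To see why the focal length matters, consider the simplified pinhole-camera relation z ≈ f·H / h, where f is the focal length in pixels, H a person's real height in meters, and h their height in pixels. This is an illustrative approximation, not the model MonoLoco actually learns:

```python
def distance_from_height(focal_px, real_height_m, pixel_height):
    # Pinhole approximation: z ~ f * H / h. Without the focal length f,
    # only the ratio of distances between two detections is meaningful,
    # which is why uncalibrated images yield relative distances only.
    return focal_px * real_height_m / pixel_height
```

With f = 720 px and H = 1.8 m, a 72-pixel-tall person comes out roughly 18 m away; doubling the pixel height halves the estimated distance, regardless of whether f is correct.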


example image

MonoLoco can run on personal computers with only a CPU and low-resolution images (e.g. 256x144) at ~2 fps. It supports three types of visualization: front, bird, and combined. Multiple visualizations can be combined in different windows.

The above gif has been obtained by running the following command on a MacBook:

python3 -m predict --webcam --scale 0.2 --output_types combined --z_max 10 --checkpoint resnet50 --model data/models/monoloco-190513-1437.pkl



1) KITTI dataset

Download KITTI ground-truth files and camera calibration matrices for training from here and save them into data/kitti/gt and data/kitti/calib, respectively. To extract PifPaf joints, you also need to download the training images and soft-link the folder in data/kitti/images

2) nuScenes dataset

Download the nuScenes dataset from nuScenes (either Mini or TrainVal), save it anywhere, and soft-link it in data/nuscenes

nuScenes preprocessing requires pip3 install nuscenes-devkit

Annotations to preprocess

MonoLoco is trained using 2D human pose joints. To create them, run PifPaf over the KITTI or nuScenes training images. You can create them by running the predict script with --networks pifpaf.

Input joints for training

MonoLoco is trained using 2D human pose joints matched with the ground-truth locations provided by the nuScenes or KITTI datasets. To create the joints, run python3 -m prep specifying:

  1. --dir_ann annotation directory containing Pifpaf joints of KITTI or nuScenes.

  2. --dataset specifies which dataset to preprocess. For nuScenes, all three versions of the dataset are supported: nuscenes_mini, nuscenes, nuscenes_teaser.

Ground truth file for evaluation

The preprocessing script also outputs a second json file called names-.json, which provides a dictionary indexed by image name to easily access ground-truth files for evaluation and prediction purposes.


Provide the json file containing the preprocessed joints as an argument.

As simple as python3 -m --train --joints <json file path>

All the hyperparameters options can be checked at python3 -m train --help.

Hyperparameters tuning

Random search in log space is provided. An example: python3 -m train --hyp --multiplier 10 --r_seed 1. One iteration of the multiplier includes 6 runs.
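Random search in log space can be sketched as follows; the hyperparameter names and ranges here are assumptions for illustration, not the repository's actual search space:

```python
import random

def sample_hyperparams(r_seed):
    # Draw one configuration uniformly in log space: sampling the exponent
    # uniformly spreads trials evenly across orders of magnitude.
    rng = random.Random(r_seed)
    lr = 10 ** rng.uniform(-4, -1)         # learning rate in [1e-4, 1e-1]
    hidden = int(2 ** rng.uniform(6, 10))  # hidden units in [64, 1024]
    return {"lr": lr, "hidden": hidden}
```

Fixing the seed makes the sampled configurations reproducible, which is the role the --r_seed option plays above.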

Evaluation (KITTI Dataset)

We provide evaluation on KITTI for models trained on nuScenes or KITTI. We compare them with other monocular and stereo baselines:

Mono3D, 3DOP, MonoDepth, and our Geometrical Baseline.

  • Mono3D: download validation files from here and save them into data/kitti/m3d
  • 3DOP: download validation files from here and save them into data/kitti/3dop
  • MonoDepth: compute an average depth for every instance using the following script here and save them into data/kitti/monodepth
  • GeometricalBaseline: a geometrical baseline comparison is provided. The average geometrical value for comparison can be obtained by running: python3 -m eval --geometric --model data/models/monoloco-190719-0923.pkl --joints data/arrays/joints-nuscenes_teaser-190717-1424.json

The following results are obtained running: python3 -m eval --model data/models/monoloco-190719-0923.pkl --generate --dir_ann <folder containing pifpaf annotations of KITTI images>

kitti_evaluation kitti_evaluation_table
