
Moving-AI / Virtual Walk

Licence: MIT
Virtual walks in Google Street View using PoseNet and applying Deep Learning models to recognize actions.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Virtual Walk

virtual-reality-tour
📍 Virtual reality travel in Google Street View.
Stars: ✭ 34 (-76.06%)
Mutual labels:  google-maps, virtual-reality
Centroui
CentroUI is a library for building user interfaces for WebVR
Stars: ✭ 135 (-4.93%)
Mutual labels:  virtual-reality
Viro
ViroReact: AR and VR using React Native
Stars: ✭ 1,735 (+1121.83%)
Mutual labels:  virtual-reality
Holoviveobserver
Shared Reality: Observe a VR session from the same room using a HoloLens!
Stars: ✭ 126 (-11.27%)
Mutual labels:  virtual-reality
Eon Map
Realtime maps with PubNub and MapBox.
Stars: ✭ 121 (-14.79%)
Mutual labels:  google-maps
Scan For Webcams
scan for webcams on the internet
Stars: ✭ 128 (-9.86%)
Mutual labels:  webcam
Googlemap
Google Map to use create path on map and play vehicle on path like Uber and Ola
Stars: ✭ 112 (-21.13%)
Mutual labels:  google-maps
Simplemap
A beautifully simple map field type for Craft CMS.
Stars: ✭ 136 (-4.23%)
Mutual labels:  google-maps
Unity Webxr Export
Develop and export WebXR experiences using Unity WebGL
Stars: ✭ 130 (-8.45%)
Mutual labels:  virtual-reality
Entropy
Entropy Toolkit is a set of tools to provide Netwave and GoAhead IP webcams attacks. Entropy Toolkit is a powerful toolkit for webcams penetration testing.
Stars: ✭ 126 (-11.27%)
Mutual labels:  webcam
React Native 360
A React Native wrapper for Google VR Cardboard SDK
Stars: ✭ 125 (-11.97%)
Mutual labels:  virtual-reality
Mapdrawingtools
this library Drawing polygon, polyline and points in Google Map and return coordinates to your App
Stars: ✭ 122 (-14.08%)
Mutual labels:  google-maps
Remixvr
RemixVR is a tool for collaboratively building customisable VR experiences.
Stars: ✭ 129 (-9.15%)
Mutual labels:  virtual-reality
Geomapping With Unity Mapbox
Geomap is the virtualization of data that maps a Country. Mapbox Unity SDK gives data(Global map layers of Streets, Buildings, Elev, and Satellite) generating custom 3D worlds for Mobile VR/AR apps.
Stars: ✭ 118 (-16.9%)
Mutual labels:  virtual-reality
Node Camera
Access and stream web camera in nodejs using opencv and websockets.
Stars: ✭ 135 (-4.93%)
Mutual labels:  webcam
Apertusvr
Virtual Reality Software Library
Stars: ✭ 112 (-21.13%)
Mutual labels:  virtual-reality
Ipyvolume
3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL
Stars: ✭ 1,696 (+1094.37%)
Mutual labels:  virtual-reality
Placepicker
Free Android Map Place Picker alternative using Geocoder instead of Google APIs
Stars: ✭ 126 (-11.27%)
Mutual labels:  google-maps
Arcarmovement
This is navigation example on google map. Here Marker move as vehicles moves with turns as uber does in their app. Using old and new coordinates animating bearing value the markers are moving.
Stars: ✭ 137 (-3.52%)
Mutual labels:  google-maps
Track My Location
Android real-time location tracker app (learn using Firebase 🔥, Google Maps & Location Api) 🌐
Stars: ✭ 136 (-4.23%)
Mutual labels:  google-maps

Virtual walks in Google Street View

For the Spanish version, click here.

During the quarantine we are currently experiencing due to the COVID-19 pandemic, our right to move freely on the street is restricted in favour of the common wellbeing. People can only go out in certain situations, such as doing the groceries. Many borders are closed and travelling is almost completely banned in most countries.

Virtual Walks is a project that uses pose estimation models together with LSTM neural networks to simulate walks in Google Street View. For pose estimation, the PoseNet model has been adapted, while for the action detection part an LSTM model has been developed using TensorFlow 2.0.

This project makes it possible to simulate walking around streets all over the world with the help of Google Street View.

TensorFlow 2.0, Selenium and Python 3.7 are the main technologies used in this project.

How does it work

PoseNet has been combined with an LSTM model to infer the action that the person is performing. Once the action is detected, it is passed to the controller, the part that interacts with Google Street View.

  1. A Selenium Firefox window is opened.

  2. Using the webcam, the system takes photos of the person, who will be performing one of the four main actions used for walking:

    • Stand
    • Walk
    • Turn right
    • Turn left
  3. For each photo taken, PoseNet is used to infer the position of the joints in the image.

  4. Groups of 5 frames are built, starting from a frame that must meet certain confidence requirements on the detected joints. Missing joints in the frames after the first one are inferred.

  5. Each group of frames is passed to an LSTM model with a feed-forward neural network attached after it, and an action is predicted.

  6. The predicted action is passed to the Selenium controller, which brings the action to reality in the opened Firefox window. A minimal sketch of this loop is shown below.
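
The following sketch condenses the loop above. The posenet, lstm_model and controller objects are hypothetical stand-ins for the project's actual classes; their method names are placeholders, not the real API.

import cv2
import numpy as np

ACTIONS = ["stand", "walk", "right", "left"]  # label order assumed; check the trained model

def walk_loop(posenet, lstm_model, controller, group_size=5):
    # posenet, lstm_model and controller are hypothetical stand-ins for the project's classes
    cap = cv2.VideoCapture(0)                      # default webcam
    keypoint_buffer = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        keypoints = posenet.infer(frame)           # hypothetical call: (n_joints, 3) array of x, y, score
        keypoint_buffer.append(keypoints)
        if len(keypoint_buffer) == group_size:     # groups of 5 frames
            batch = np.expand_dims(np.stack(keypoint_buffer), axis=0)
            probs = lstm_model.predict(batch)      # LSTM + feed-forward head
            controller.perform(ACTIONS[int(np.argmax(probs))])  # hypothetical Selenium wrapper
            keypoint_buffer = []
    cap.release()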

Currently, there is another model that can be used to run this program. Instead of an LSTM, joint velocities are calculated across the frames in each 5-frame group and passed, along with the joint positions, to a PCA and a feed-forward neural network to predict the action. The default model is the LSTM, as we consider it the methodologically sounder option and it is the model with the highest precision.
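
A rough sketch of that alternative feature pipeline follows; the pca and ff_model objects are assumed to be already fitted, and the array shapes are illustrative.

import numpy as np

def predict_with_velocities(keypoint_group, pca, ff_model):
    # keypoint_group: (5, n_joints, 2) array with the x, y position of each joint
    positions = keypoint_group.reshape(keypoint_group.shape[0], -1)   # (5, n_joints * 2)
    velocities = np.diff(positions, axis=0)                           # (4, n_joints * 2) frame-to-frame deltas
    features = np.concatenate([positions.ravel(), velocities.ravel()])[None, :]
    reduced = pca.transform(features)      # scikit-learn PCA fitted offline
    return ff_model.predict(reduced)       # feed-forward classifier probabilities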

As action prediction can be (depending on the host computer's specifications) much faster than the average walking speed, an action can only be executed once every 0.5 seconds. This parameter is customizable.
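
A minimal way to implement that rate limit, purely as an illustration:

import time

ACTION_PERIOD = 0.5        # seconds between executed actions; customizable
last_action_time = 0.0

def maybe_execute(action, controller):
    # controller is the same hypothetical Selenium wrapper as in the sketch above
    global last_action_time
    now = time.monotonic()
    if now - last_action_time >= ACTION_PERIOD:
        controller.perform(action)
        last_action_time = now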

Use case example

As can be seen in the image, the skeleton is inferred from the image and an action is predicted and executed.

Example walk in Paris

Installation and use

Remember that a webcam is needed to use this program, as actions are predicted from the frames taken with it.

It is recommended to install it in a new Python 3.7 environment to avoid issues and version conflicts.

Install tensorflowjs, required to run ResNet:

pip install tensorflowjs

Clone and install the tensorflowjs graph model converter, following the steps in tfjs-to-tf.

Clone the git repository

git clone https://github.com/Moving-AI/virtual-walk.git

Install dependencies by running

pip install -r requirements.txt

Install Firefox and download Geckodriver. Then specify the path in config_resnet.yml under the "driver_path" option.
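
As a quick smoke test that the path is picked up, something like the following can be used; only the "driver_path" key is taken from the repository's config, and the Selenium call shown is the pre-4.x style.

import yaml
from selenium import webdriver

with open("config_resnet.yml") as f:
    config = yaml.safe_load(f)

# Selenium 3 style; with Selenium 4+ pass the path through a Service object instead.
driver = webdriver.Firefox(executable_path=config["driver_path"])
driver.get("https://www.google.com/maps")   # any page works for a smoke test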

Download the required models by running the download_models script. It will download the PoseNet models (MobileNet and ResNet, each with output strides 16 and 32), the LSTM, the PCA, the scaler and the neural network. The links to download the models separately can be found below.

cd virtual-walk
python3 download_models.py

Finally, you can run execute.py to try it.

python3 execute.py

Considerations during usage:

  • Our experience with the model tells us that a moderately lit environment is preferable to a very bright one.

  • The system is sensitive to the position of the webcam.

To sum up, a position close to the one shown in the GIF should be used.

Links to our models

Training

The training part is probably the weakest point of this project, due to our lack of training data and computing power. Our training data generation process consisted of 40 minutes of recordings. In each video, one person performed one specific action for a certain period of time. As will be discussed in the Next steps section, our models tend to overfit despite producing a working system. An example of the training data can be seen below.

The models we have trained, which are the ones the examples have been generated with, can be downloaded by running the download_models script. The training performance is shown in the images below:

If someone wants to train another LSTM model, the DataProcessor class is provided. It can process the videos located in a folder, reading the valid frame numbers from a labels.txt file and generating a CSV file with the training examples. This file can be used in train.py to generate a new LSTM model. The path to this model would then be passed to the WebcamPredictor class and the system would use the new model, as sketched below.
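
A hypothetical usage sketch, with the constructor and method names assumed rather than taken from the repository, could look like this:

from source.dataprocessing import DataProcessor   # import path assumed

# Hypothetical API: read the videos and labels.txt, run PoseNet on the valid
# frames and write the resulting training examples to a CSV file.
processor = DataProcessor(videos_path="data/videos", labels_path="data/videos/labels.txt")
processor.process_videos(output_file="data/training_examples.csv")

# The CSV is then consumed by train.py to fit a new LSTM model, whose path is
# what the WebcamPredictor class expects to receive.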

Next steps

  • Generating more training data. In this project we have aimed for what could be considered an MVP; robustness has never been a main goal. As can be seen in the Training section, the model does not appear to overfit, even though LSTMs are very prone to overfitting. However, the training and testing data are very similar, as the videos show people performing "loop" actions. We therefore expect the model to have underlying overfitting that cannot be detected without more videos. Recording more videos in different lighting conditions would probably make the model more robust and consistent.

  • Turning to the right and to the left are not predicted with the same accuracy, in spite of being symmetric actions. A specular reflection of the coordinates could be used to make the turn predictions more consistent; a sketch of this idea is shown below.
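
A sketch of that mirroring idea, assuming joint coordinates normalised to [0, 1]:

import numpy as np

def mirror_example(keypoints, label):
    # keypoints: (n_frames, n_joints, 2) array with x, y normalised to [0, 1]
    mirrored = keypoints.copy()
    mirrored[..., 0] = 1.0 - mirrored[..., 0]          # reflect x around the vertical axis
    # For a faithful mirror, the left/right joint indices should also be swapped.
    flipped = {"Turn left": "Turn right", "Turn right": "Turn left"}.get(label, label)
    return mirrored, flipped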

Authors

License

This project is under MIT license. See LICENSE for more details.

Acknowledgments

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].