
Meituan-Dianping / vision-ml

License: MIT
An R-CNN machine learning model for handling pop-up windows in mobile apps.

Programming Languages

python
139335 projects - #7 most used programming language
Dockerfile
14818 projects

Projects that are alternatives of or similar to vision-ml

DonkeyDrift
Open-source self-driving car based on DonkeyCar and programmable chassis
Stars: ✭ 15 (-69.39%)
Mutual labels:  vision
keras-faster-rcnn
A Keras implementation of Faster R-CNN with end-to-end training and prediction; under continuous development, see todo...; you are welcome to try it, follow the project, and report issues.
Stars: ✭ 85 (+73.47%)
Mutual labels:  r-cnn
extensiveautomation-server
Extensive Automation server
Stars: ✭ 19 (-61.22%)
Mutual labels:  mobile-testing
non-contact-sleep-apnea-detection
Gihan Jayatilaka, Harshana Weligampola, Suren Sritharan, Pankayaraj Pathmanathan, Roshan Ragel and Isuru Nawinne, "Non-contact Infant Sleep Apnea Detection," 2019 14th Conference on Industrial and Information Systems (ICIIS), Kandy, Sri Lanka, 2019, pp. 260-265, doi: 10.1109/ICIIS47346.2019.9063269.
Stars: ✭ 15 (-69.39%)
Mutual labels:  vision
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-16.33%)
Mutual labels:  vision
fuse-med-ml
A python framework accelerating ML based discovery in the medical field by encouraging code reuse. Batteries included :)
Stars: ✭ 66 (+34.69%)
Mutual labels:  vision
mediapipe plus
The purpose of this project is to apply mediapipe to more AI chips.
Stars: ✭ 38 (-22.45%)
Mutual labels:  vision
Vision CoreML-App
This app predicts the age of a person from the picture input using camera or photos gallery. The app uses Core ML framework of iOS for the predictions. The Vision library of CoreML is used here. The trained model fed to the system is AgeNet.
Stars: ✭ 15 (-69.39%)
Mutual labels:  vision
MathSolver
⌨️Camera calculator with Vision
Stars: ✭ 70 (+42.86%)
Mutual labels:  vision
rt-mrcnn
Real time instance segmentation with Mask R-CNN, live from webcam feed.
Stars: ✭ 47 (-4.08%)
Mutual labels:  r-cnn
Vision
Computer Vision And Neural Network with Xamarin
Stars: ✭ 54 (+10.2%)
Mutual labels:  vision
SemanticSegmentation-Libtorch
Libtorch Examples
Stars: ✭ 38 (-22.45%)
Mutual labels:  vision
TinyCog
Small Robot, Toy Robot platform
Stars: ✭ 29 (-40.82%)
Mutual labels:  vision
xxy
xxy xxy.js alert, mobile pop-up windows, pull-up loading, pull-down refresh, mobile UI, carousel, banner
Stars: ✭ 84 (+71.43%)
Mutual labels:  popup-window
UAV-Stereo-Vision
A program for controlling a micro-UAV for obstacle detection and collision avoidance using disparity mapping
Stars: ✭ 30 (-38.78%)
Mutual labels:  vision
FaceData
A macOS app to parse face landmarks from a video for GANs training
Stars: ✭ 71 (+44.9%)
Mutual labels:  vision
vision-api
Google Vision API made easy!
Stars: ✭ 19 (-61.22%)
Mutual labels:  vision
halonet-pytorch
Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Stars: ✭ 181 (+269.39%)
Mutual labels:  vision
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (+316.33%)
Mutual labels:  vision
iOS14-Resources
A curated collection of iOS 14 projects ranging from SwiftUI to ML, AR etc.
Stars: ✭ 85 (+73.47%)
Mutual labels:  vision

Vision-ml


See also Vision-ui, a series of algorithms for mobile UI testing.

An R-CNN (Region-based Convolutional Neural Network) machine learning model for handling pop-up windows in mobile apps.

Mobile UI Recognition

Vision-ml is a machine learning model that identifies the UI element that closes a pop-up window and returns its UI coordinates (x, y) on the screen.

A typical usage scenario:

  • In mobile testing, when using Appium or a similar framework for UI automation, it is usually tricky to locate components on a pop-up window that is rendered on top of the current screen (see the sketch after this list).

  • Feed in a mobile app screenshot containing the pop-up, and you get the predicted result (shown in the blue box).
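
For illustration, a minimal sketch of feeding the predicted coordinates back into an Appium session to dismiss a pop-up. This glue code is not part of Vision-ml itself; it assumes an already-created Appium Python client session (driver) and uses that client's tap helper.

# Hypothetical glue code: tap the close button at the coordinates predicted by Vision-ml.
def dismiss_popup(driver, x, y):
    # (x, y) comes from the prediction result described below
    driver.tap([(x, y)])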


Requirements

Python 3.6.x

# create a virtual environment before installing requirements
pip install -r requirements.txt

Usage

You can use Vision with the pre-trained model in "model/trained_model_1.h5". The number in the file name is for version control; you can update it in the file named "all_config".

There are two ways of using Vision.

Predict an image with Python script

  1. Update your file path in "rcnn_predict.py":
model_predict("path/to/image.png", view=True)
  2. Run the script to get the result:
python rcnn_predict.py
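
If you prefer to call the predictor from your own code instead of editing the script, here is a minimal sketch; the import path is assumed from the file name and may differ.

# Hypothetical direct call; model_predict is defined in rcnn_predict.py.
from rcnn_predict import model_predict

# view=True displays the image with the predicted region, as in the script usage above
model_predict("path/to/image.png", view=True)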

Predict an image with a web server

  1. Start the web server

You can also build and run the server with the provided Dockerfile.

python vision_server.py
  2. Post an image to the web server
curl http://localhost:9092/client/vision -F "file=@${IMAGE_PATH}.png"
  3. The response from the web server contains the coordinates of the UI element, along with a score of 0 or 1.0 (0 means not found, 1.0 means found).
{
  "code": 0,
  "data": {
    "position": [
      618,
      1763
    ],
    "score": 1.0
  }
}
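
As an alternative to curl, here is a minimal Python client sketch; it assumes only the endpoint and response format shown above and uses the third-party requests package.

# Post a screenshot to the Vision-ml server and read back the predicted position.
import requests

with open("path/to/screenshot.png", "rb") as f:  # replace with your screenshot path
    response = requests.post("http://localhost:9092/client/vision", files={"file": f})

data = response.json()["data"]
if data["score"] == 1.0:
    x, y = data["position"]
    print(f"Close button predicted at ({x}, {y})")
else:
    print("No close button found")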

Train your own model

  • You can use your own training images if your close button looks different from those in the given training set. Just take a screenshot and put it in the "image/" folder.

  • Rename each training image with the prefix "1" for a close button and "0" for background.

  • You can refer to the training images included in the repo for examples.

Example button image: 1_1.png

Example background image: 0_3.png

  1. There are some images in this repo for training:
0_0.png 0_1.png 0_2.png 0_3.png 0_4.png 0_5.png 0_6.png 1_0.png 1_1.png 1_2.png 1_3.png 1_4.png 1_5.png 1_6.png
  2. Augment your images by running this method in "rcnn_train.py":
Image().get_augmentation()
  3. Train the model by running this method in "rcnn_train.py" (a combined sketch follows these steps):
train_model()
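
Putting steps 2 and 3 together, a minimal training script might look like the sketch below; the import path is assumed from the file name and may differ in the repo.

# Hypothetical end-to-end training run using the two methods above.
from rcnn_train import Image, train_model

Image().get_augmentation()  # generate augmented samples from the "image/" folder
train_model()               # train and save the model (version set in all_config)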

Model layers and params

Model layers

  • The model input image is converted from 3 channels to 1 channel, and each pixel value is binarized to 255 or 0, which helps the model classify reliably (see the preprocessing sketch after this list).
  • There are 5 layers and 196,450 parameters in total, so the model is lightweight, easy to train, and robust across different pop-up windows (the full layer summary follows).
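
As an illustration of the binarization described above, a short sketch using OpenCV; the actual preprocessing lives in the repo's training code, and the threshold value here is an assumption.

# Convert a 3-channel crop to a single binarized channel (pixel values 0 or 255).
import cv2

image = cv2.imread("path/to/region.png")                       # 3-channel BGR input
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)                 # 3 channels -> 1 channel
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # assumed threshold of 127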
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 48, 48, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 46, 46, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 23, 23, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 21, 21, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 10, 10, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               131200    
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 258       
=================================================================

Total params: 196,450
Trainable params: 196,450
Non-trainable params: 0
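
For reference, a layer stack consistent with the summary above; the kernel sizes and shapes follow from the printed output shapes and parameter counts, while the activations, dropout rate, and input size are assumptions (the authoritative definition is in rcnn_train.py).

# Reconstruction of the summarized architecture; shapes and parameter counts match the table.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(50, 50, 1)),  # -> 48x48x32, 320 params
    Conv2D(32, (3, 3), activation="relu"),                           # -> 46x46x32, 9,248 params
    MaxPooling2D((2, 2)),                                            # -> 23x23x32
    Conv2D(64, (3, 3), activation="relu"),                           # -> 21x21x64, 18,496 params
    MaxPooling2D((2, 2)),                                            # -> 10x10x64
    Conv2D(64, (3, 3), activation="relu"),                           # -> 8x8x64, 36,928 params
    MaxPooling2D((2, 2)),                                            # -> 4x4x64
    Dropout(0.5),                                                    # rate assumed
    Flatten(),                                                       # -> 1024
    Dense(128, activation="relu"),                                   # 131,200 params
    Dense(2, activation="softmax"),                                  # 258 params
])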

Training params

In all_config.py there are two training parameters, batch_size and epochs:

  • batch_size is the number of training images used per model update.
  • epochs is the number of passes over the entire training set.
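
For orientation, a hypothetical example of how these settings might appear in all_config.py; the names come from this README, and the values are illustrative only.

# Illustrative values only, not the repo's actual configuration.
batch_size = 32   # training images per gradient update
epochs = 10       # full passes over the training set (the timings below assume 10 epochs)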

Performance

With a CPU of [email protected]:

  • Training a model takes 30s with 10 epochs.
  • Predicting on a 1080p mobile pop-up screenshot takes 10s.

Reference

The R-CNN model is based on this paper.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].