
lannguyen0910 / food-detection-yolov5

Licence: MIT license
🍔🍟🍗 Food analysis baseline with Theseus. Integrate object detection, image classification and multi-class semantic segmentation. 🍞🍖🍕

Programming Languages

python
139335 projects - #7 most used programming language
HTML
75241 projects
CSS
56736 projects
javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to food-detection-yolov5

HugsVision
HugsVision is an easy to use huggingface wrapper for state-of-the-art computer vision
Stars: ✭ 154 (+126.47%)
Mutual labels:  yolo, image-classification, semantic-segmentation
Label Studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Stars: ✭ 7,264 (+10582.35%)
Mutual labels:  yolo, image-classification, semantic-segmentation
awesome-computer-vision-models
A list of popular deep learning models related to classification, segmentation and detection problems
Stars: ✭ 419 (+516.18%)
Mutual labels:  image-classification, semantic-segmentation, efficientnet
Yolov3
YOLOv3 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 8,159 (+11898.53%)
Mutual labels:  yolo, yolov5
Rectlabel Support
RectLabel - An image annotation tool to label images for bounding box object detection and segmentation.
Stars: ✭ 338 (+397.06%)
Mutual labels:  yolo, image-classification
Dmsmsgrcg
A photo OCR project aims to output DMS messages contained in sign structure images.
Stars: ✭ 18 (-73.53%)
Mutual labels:  yolo, image-classification
live-cctv
To detect any reasonable change in a live cctv to avoid large storage of data. Once, we notice a change, our goal would be track that object or person causing it. We would be using Computer vision concepts. Our major focus will be on Deep Learning and will try to add as many features in the process.
Stars: ✭ 23 (-66.18%)
Mutual labels:  yolo, image-classification
Pytorch cpp
Deep Learning sample programs using PyTorch in C++
Stars: ✭ 114 (+67.65%)
Mutual labels:  yolo, semantic-segmentation
Imagenet
Trial on kaggle imagenet object localization by yolo v3 in google cloud
Stars: ✭ 56 (-17.65%)
Mutual labels:  yolo, image-classification
Yolo segmentation
image (semantic segmentation) instance segmentation by darknet or yolo
Stars: ✭ 143 (+110.29%)
Mutual labels:  yolo, semantic-segmentation
Vehicle-Detection
Vehicle Detection Using Deep Learning and YOLO Algorithm
Stars: ✭ 96 (+41.18%)
Mutual labels:  yolo, yolov5
Lightnet
🌓 Bringing pjreddie's DarkNet out of the shadows #yolo
Stars: ✭ 322 (+373.53%)
Mutual labels:  yolo, image-classification
Alturos.yolo
C# Yolo Darknet Wrapper (real-time object detection)
Stars: ✭ 308 (+352.94%)
Mutual labels:  yolo, image-classification
MixNet-PyTorch
Concise, Modular, Human-friendly PyTorch implementation of MixNet with Pre-trained Weights.
Stars: ✭ 16 (-76.47%)
Mutual labels:  image-classification, efficientnet
UniFormer
[ICLR2022] official implementation of UniFormer
Stars: ✭ 574 (+744.12%)
Mutual labels:  image-classification, semantic-segmentation
Yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Stars: ✭ 19,914 (+29185.29%)
Mutual labels:  yolo, yolov5
TNN Demo
🍉 TNN mobile deployment study notes, supporting Android and iOS.
Stars: ✭ 51 (-25%)
Mutual labels:  yolo, yolov5
realtime-object-detection
Detects objects in images/streaming video
Stars: ✭ 16 (-76.47%)
Mutual labels:  yolo, yolov5
EfficientUNetPlusPlus
Decoder architecture based on the UNet++. Combining residual bottlenecks with depthwise convolutions and attention mechanisms, it outperforms the UNet++ in a coronary artery segmentation task, while being significantly more computationally efficient.
Stars: ✭ 37 (-45.59%)
Mutual labels:  efficientnet, unetplusplus
Alturos.ImageAnnotation
A collaborative tool for labeling image data for yolo
Stars: ✭ 47 (-30.88%)
Mutual labels:  yolo, image-classification

🍔🍟🍗 Meal analysis with Theseus 🍞🍖🍕



Dev logs
  • [07/03/2022] Big refactor. Integrated object detection, image classification, and semantic segmentation into one Ship of Theseus.
  • [31/01/2022] Updated to the latest YOLOv5 versions (P5-P6). Checkpoints from the original repo can be loaded.
  • [26/12/2021] Updated the app on Android.
  • [12/09/2021] Added all features to the web app.
  • [16/07/2021] All checkpoints trained on custom data have been lost. Inference now uses models pretrained on COCO.

📔 Notebook

  • For inference, use this notebook to run the web app: Notebook
  • For training, refer to these notebooks for your own training:
    • Detection: Notebook
    • Classification: Notebook
    • Semantic segmentation: Notebook

🥇 Pretrained-weights

  • Detection:

| Models | Image Size | Epochs | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|
| YOLOv5s | 640x640 | 172 | 90.7 | 67.1 |
| YOLOv5m | 640x640 | 112 | 89.7 | 66.6 |
| YOLOv5l | 640x640 | 118 | 94 | 73 |
| YOLOv5x | 640x640 | 62 | 77.9 | 53.3 |

  • Segmentation:

| Models | Image Size | Epochs | Pixel AP | Pixel AR | Dice score |
|---|---|---|---|---|---|
| UNet++ | 640x640 | 5 | 0.931 | 0.935 | 99.95 |

  • Classification:

| Models | Image Size | Epochs | Acc | Balanced Acc | F1-score |
|---|---|---|---|---|---|
| EfficientNet-B4 | 640x640 | 7 | 84.069 | 86.033 | 84.116 |

🌟 Logs detail

In total, there are 3 implementation versions:

  1. Training with our own object detection template. The model source code is inherited from the Ultralytics repo, the dataset is in COCO format, and the training and data processing steps are reimplemented by us in PyTorch. An ensemble technique merges the results of 4 models (images only). A label enhancement technique is also applied: if the output label after detection is either "Food" or "Food-drinks", a pretrained EfficientNet-B4 classifier (255 classes) re-classifies it to a more specific label.
  2. Big refactor. The training steps are updated and now also come from the Ultralytics repo. The models yield better accuracy. Test-time augmentation is added to the web app.
  3. Update to the Theseus template, which currently supports food detection, food classification, and multi-class food semantic segmentation, on images only. For this version, we introduce Theseus, which is just a part of the Theseus template. We also dropped some weak or unnecessary features to make the project more robust. Theseus is adapted from large project templates such as mmocr, fairseq, timm, and paddleocr.

If you want to try the first version, which retains some features that differ from the new version, check out the v1 branch.

🌟 Inference

  • File structure
this repo
│   app.py
└───configs
│     └───classification          # Contains classification's configurations
│             └───test.yaml
│     └───detection          # Contains detection's configurations
│             └───....
│     └───segmentation          # Contains segmentation's configurations
│             └───....

  • Install requirements.
pip install -e .
  • Start the app. It is safe to run over an insecure HTTP connection on localhost. To serve the app over HTTPS, generate an SSL certificate.
run.bat

🌟 Dataset

  • Detection: link (merged OID and Vietnamese Lunch dataset)
  • Classification: link (MAFood121)
  • Semantic segmentation: link (UECFood)

🌟 Dataset details

To train the food detection model, we surveyed the following datasets:
  • Open Images V6-Food: Open Images V6 is a very large dataset from Google for computer vision tasks. We extracted the subset with food-related labels, which contains 18 labels and more than 20,000 images.
  • School Lunch Dataset: 3940 photos of lunches of Japanese high school students, all taken from the same frontal angle, collected to assess student nutrition. Each annotation gives the coordinates and type of a dish. There are 21 dish labels, one of which is "Other Foods" for dishes that do not belong to the remaining 20.
  • Vietnamese Food: a self-collected dataset of 10 simple Vietnamese dishes, such as Pho, Com Tam, Hu Tieu, and Banh Mi. Each category has about 20-30 images, split 80-20 for training and evaluation.

We aggregate all the above datasets for training. Dishes that appear in more than one set are grouped into a single class to avoid duplication. The result is a dataset of 60,305 images covering 44 different foods from all regions of the world.
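
The grouping of duplicate dishes can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names, the `aliases` parameter, and the label lists are hypothetical, and labels are matched only after simple text normalization.

```python
def normalize(label):
    """Lowercase a label and collapse hyphens, underscores, and extra spaces."""
    return " ".join(label.lower().replace("-", " ").replace("_", " ").split())


def merge_label_sets(datasets, aliases=None):
    """Build one class list from several datasets (hypothetical helper).

    datasets: dict mapping dataset name -> list of label names.
    aliases:  optional dict mapping a normalized label -> canonical label,
              for dishes that are named differently across datasets.
    Returns (classes, mapping), where mapping[(dataset, label)] is the
    index of the merged class.
    """
    aliases = aliases or {}
    classes = []   # merged class names, in first-seen order
    index = {}     # merged class name -> class index
    mapping = {}   # (dataset, original label) -> class index
    for name, labels in datasets.items():
        for label in labels:
            key = normalize(label)
            key = aliases.get(key, key)
            if key not in index:
                index[key] = len(classes)
                classes.append(key)
            mapping[(name, label)] = index[key]
    return classes, mapping


# Hypothetical label lists, for illustration only.
datasets = {
    "open_images": ["Bread", "Salad", "Pizza"],
    "school_lunch": ["bread", "Miso-Soup", "Other Foods"],
    "vietnamese": ["Pho", "Banh Mi"],
}
classes, mapping = merge_label_sets(datasets)
print(classes)
```

Here "Bread" and "bread" end up in the same merged class, so annotations from both datasets can be remapped to one class index before training.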

In addition, we find that if we expand the problem to include classification, the amount of available data grows significantly. Therefore, to further increase the diversity of dishes, we collect additional datasets to train a classification model:

  • MAFood-121: consists of 21,175 training image samples. The dishes are selected from the top 11 most popular cuisines in the world according to Google Trends statistics. These cuisines come from many countries around the world, including Vietnam. For each cuisine, 11 typical traditional dishes are selected. The dataset has a total of 121 different dishes, each belonging to at least 1 of 10 food categories: Bread, Eggs, Fried, Meat, Noodles, Rice, Seafood, Soup, Dumplings, and Vegetables. 85% of the images are used for training and the remaining 15% for evaluation.
  • Food-101: includes 101 different dishes, with 101,000 images in total. For each dish, 250 images are used for testing and the remaining 750 for training. The training images still contain a lot of noise: sometimes the colors are too intense or some samples are mislabeled. This noise was left in intentionally by the authors (as mentioned in the study).

We also aggregate the two datasets above into one. The new set includes 93,748 training images and 26,825 evaluation images covering 180 different dishes. The number of recognizable dishes therefore increases significantly: if the detection model outputs a dish labeled "Other Foods", the classification model is applied to that dish to classify it again.

🌟 Server

Implementation details

The function get_prediction is the inference function for the detection, classification, and semantic segmentation tasks, depending on which inputs you choose. It is implemented in modules.py, where the image detection process calls the Edamam API to get nutritional information for the food. We also save the nutritional information as CSV files in the folder /static/csv.

We let the user customize the confidence and IoU thresholds so that they can find suitable values for the input image. To avoid rerunning the whole model every time these parameters change, the server computes a perceptual hash of each image sent by the client and uses the resulting string as the image's filename when saving it. When the client sends an image whose hash already exists in the database, the server only post-processes the previously predicted result instead of running the prediction again.
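
The caching idea can be sketched in a few lines. This is a simplified illustration under several assumptions, not the server's actual code: it uses a basic average hash on an already-decoded grayscale pixel grid (the project may use a different perceptual hash or a library), an in-memory dict instead of files or a database, and hypothetical names (`average_hash`, `PredictionCache`, `fake_model`). Only the confidence threshold is re-applied here; re-running non-maximum suppression with a new IoU threshold on the cached raw boxes would follow the same pattern.

```python
def average_hash(pixels, hash_size=8):
    """Return a hex average-hash string for a grayscale image.

    pixels: 2D list of brightness values (rows of equal length) with at
    least hash_size rows and hash_size columns.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: brighter than the overall mean or not.
    bits = "".join("1" if value > mean else "0" for value in cells)
    digits = hash_size * hash_size // 4
    return f"{int(bits, 2):0{digits}x}"


class PredictionCache:
    """Run the model once per distinct image hash, then only post-process."""

    def __init__(self, predict_fn):
        self.predict_fn = predict_fn  # hypothetical: returns raw detections
        self.raw = {}                 # image hash -> raw detections
        self.model_runs = 0

    def get(self, pixels, conf_thres=0.25):
        key = average_hash(pixels)
        if key not in self.raw:
            self.raw[key] = self.predict_fn(pixels)
            self.model_runs += 1
        # Cheap post-processing on the cached result.
        return [d for d in self.raw[key] if d["score"] >= conf_thres]


def fake_model(pixels):
    """Stand-in for the detector; the values are placeholders."""
    return [{"label": "Pho", "score": 0.9}, {"label": "Bread", "score": 0.3}]


image = [[(r * c) % 256 for c in range(32)] for r in range(32)]
cache = PredictionCache(fake_model)
first = cache.get(image, conf_thres=0.25)
second = cache.get(image, conf_thres=0.5)  # same image, stricter threshold
print(len(first), len(second), cache.model_runs)
```

The second request changes only the threshold, so the stand-in model is not called again.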

🌟 Additional Methods

To increase the variety of dishes, we apply a classification model:

After testing and observing, we chose a simple and effective model: EfficientNet. EfficientNet was proposed by Google and is one of the state-of-the-art models for image classification, while also being computationally efficient. We use the EfficientNet source code from rwightman and retrain the EfficientNet-B4 version on the aggregated dataset. This model supplements YOLOv5: only when YOLOv5 detects a dish labeled "Other Foods" is EfficientNet applied to predict a new label for that dish.
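
The fallback step can be sketched as below. The names `refine_labels`, `crop_fn`, and `classify_fn` are hypothetical stand-ins for the project's cropping code and EfficientNet-B4 inference call, and the detection format (a dict with `label` and `box`) is an assumption made for the example.

```python
FALLBACK_LABELS = {"Other Foods"}


def refine_labels(detections, crop_fn, classify_fn, fallback_labels=FALLBACK_LABELS):
    """Re-classify only the detections that carry a generic label.

    detections:  list of dicts with at least "label" and "box" keys.
    crop_fn:     callable that cuts the box region out of the image.
    classify_fn: callable that returns a dish name for a cropped region.
    """
    refined = []
    for det in detections:
        det = dict(det)  # copy, so the input list is left untouched
        if det["label"] in fallback_labels:
            det["label"] = classify_fn(crop_fn(det["box"]))
        refined.append(det)
    return refined


# Stubs standing in for the real cropping and classification code.
detections = [
    {"label": "Pizza", "box": [0, 0, 10, 10]},
    {"label": "Other Foods", "box": [20, 20, 40, 40]},
]
result = refine_labels(
    detections,
    crop_fn=lambda box: box,
    classify_fn=lambda crop: "Banh Mi",
)
print([d["label"] for d in result])
```

Detections with a specific label pass through unchanged, so the classifier runs only on the generic ones.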

To increase the accuracy of the algorithm, we use the ensemble models technique:

For each image, models of different versions each make a prediction. The results are then aggregated with the "weighted box fusion" method to give the final result.
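
A simplified version of weighted box fusion for a single image is sketched below. It is written from the general description of the method (cluster overlapping same-label boxes, average their coordinates weighted by confidence, then down-weight clusters that few models agree on) and is not the code this project uses. It assumes boxes in `[x1, y1, x2, y2]` format, equal model weights, and strictly positive scores.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(members):
    """Confidence-weighted average box and mean confidence of a cluster."""
    total = sum(score for _, score in members)
    box = [sum(b[i] * s for b, s in members) / total for i in range(4)]
    return box, total / len(members)


def weighted_box_fusion(predictions, iou_thr=0.55):
    """predictions: one list per model of (box, score, label) tuples."""
    n_models = len(predictions)
    candidates = sorted(
        (p for model in predictions for p in model),
        key=lambda p: p[1],
        reverse=True,
    )
    clusters = []
    for box, score, label in candidates:
        # Find the best-overlapping fused box with the same label.
        best, best_iou = None, iou_thr
        for cluster in clusters:
            if cluster["label"] != label:
                continue
            overlap = iou(cluster["box"], box)
            if overlap > best_iou:
                best, best_iou = cluster, overlap
        if best is None:
            best = {"label": label, "members": []}
            clusters.append(best)
        best["members"].append((box, score))
        best["box"], best["score"] = fuse(best["members"])
    results = []
    for cluster in clusters:
        # Reduce confidence when only some of the models found the object.
        scale = min(len(cluster["members"]), n_models) / n_models
        results.append((cluster["box"], cluster["score"] * scale, cluster["label"]))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Two hypothetical models predicting on the same image.
model_a = [([10, 10, 50, 50], 0.9, "Pho")]
model_b = [([12, 12, 52, 52], 0.7, "Pho"), ([100, 100, 140, 140], 0.6, "Bread")]
fused = weighted_box_fusion([model_a, model_b])
for box, score, label in fused:
    print(label, round(score, 3), [round(v, 2) for v in box])
```

Unlike non-maximum suppression, which keeps one box and discards the rest, this approach uses every overlapping box to place the final one.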

To increase users' interactivity with the application:

When a dish is predicted, we show the user more information about its nutritional content. This information is queried from the application's database, which is periodically updated from the Edamam API, an API that returns the nutrition of a dish given its name. At prediction time, the nutrition information is saved along with the dish name in CSV format. The client then fetches the CSV file and draws nutrition statistics charts with the Chart.js library. There are 2 chart types, each shown when the user clicks on it.
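
Saving the per-dish nutrition rows can be sketched with the standard `csv` module. The column names and numbers below are placeholders chosen for the example; they are not the fields returned by the Edamam API or the project's actual CSV layout, and the API call itself is omitted.

```python
import csv
import io

# Hypothetical column layout for the nutrition file.
FIELDS = ["dish", "calories_kcal", "protein_g", "fat_g", "carbs_g"]


def write_nutrition_csv(rows, stream):
    """Write one CSV row per predicted dish to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Missing values become empty cells so every row has all columns.
        writer.writerow({field: row.get(field, "") for field in FIELDS})


# Placeholder numbers, not real nutrition data.
rows = [
    {"dish": "Pho", "calories_kcal": 100, "protein_g": 1, "fat_g": 2, "carbs_g": 3},
    {"dish": "Banh Mi", "calories_kcal": 200},
]
buffer = io.StringIO()
write_nutrition_csv(rows, buffer)
print(buffer.getvalue())
```

On a server, the stream would be a file opened with `newline=""` in a folder such as /static/csv, so that the client can fetch it for charting.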

🍱 Sample Results

📙 Credits
