LitleCarl / Showandtell

A Show And Tell implementation for iOS 11.0 based on CoreML

ShowAndTell

Show and Tell: A Neural Image Caption Generator

🎉🎉🎉 The Keras part is now public

Brief

Pull requests and issues: @litleCarl

A CoreML implementation of the image-to-text model described in the paper:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

IEEE transactions on pattern analysis and machine intelligence (2016).

Full text available at: http://arxiv.org/abs/1609.06647

Demo

Usage

Simple use

let showAndTell = ShowAndTell()
let results = showAndTell.predict(image: uiimage2predict, beamSize: 3, maxWordNumber: 30)
// Parameters:
//    image:         The image to generate a caption for.
//    beamSize:      Maximum number of candidate captions kept during beam search (strongly affects performance).
//    maxWordNumber: Maximum number of words in a predicted sentence.
class ShowAndTell {
  ...
  func predict(image: UIImage, beamSize: Int = 3, maxWordNumber: Int = 20) -> PriorityQueue<Caption>
  ...
}
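
To illustrate why beamSize has such a large effect, here is a minimal beam search sketch in Python. It is not the project's Swift implementation: the TRANSITIONS table is a made-up stand-in for the LSTM decoder, and beam_search, beam_size, and max_word_number are hypothetical names chosen to mirror the parameters above.

```python
import heapq
import math

# Made-up next-word probabilities standing in for the LSTM decoder (assumption).
TRANSITIONS = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def caption_text(words):
    """Join the words of a caption, dropping the start and end markers."""
    return " ".join(w for w in words if w not in ("<start>", "<end>"))


def beam_search(beam_size, max_word_number):
    """Return (caption, probability) pairs, most probable first."""
    beams = [(0.0, ["<start>"])]  # (log probability, words so far)
    finished = []
    for _ in range(max_word_number):  # one word (or end marker) per step
        candidates = []
        for log_prob, words in beams:
            # Every caption still in the beam is expanded at every step.
            for word, prob in TRANSITIONS[words[-1]].items():
                candidates.append((log_prob + math.log(prob), words + [word]))
        # Keep only the beam_size most probable candidates.
        best = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        finished += [c for c in best if c[1][-1] == "<end>"]
        beams = [c for c in best if c[1][-1] != "<end>"]
        if not beams:
            break
    # Fall back to unfinished captions if the word limit was reached first.
    results = sorted(finished or beams, key=lambda c: c[0], reverse=True)
    return [(caption_text(words), math.exp(log_prob)) for log_prob, words in results]


greedy = beam_search(beam_size=1, max_word_number=20)
wider = beam_search(beam_size=2, max_word_number=20)
print(greedy[0][0], "|", wider[0][0])
```

In this toy model a beam size of 1 (greedy search) returns "a dog", while a beam size of 2 also keeps the initially less likely "the" and ends up with the more probable "the dog". Each step expands every caption kept in the beam, so the work per step grows with the beam size.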

Benchmark (tested on iPhone 7+; PRs with results for more devices are welcome)

beamSize   Time (ms), maxWordNumber = 20   Time (ms), maxWordNumber = 30
1          480.12                          451.12
2          845.78                          1194.65
3          1443.82                         1965.27
4          2001.30                         2971.92
5          2648.48                         3798.28
6          3158.53                         4391.35
7          4179.14                         5714.87
8          4861.66                         6937.60
9          6003.65                         8482.03
10         7087.97                         10421.52
11         8134.95                         12460.80
12         9627.79                         13777.67

Line chart for Time vs Beam Size (When maxWordNumber = 30)

It is therefore recommended to set beamSize = 1 on mobile devices: it uses the least GPU/CPU time and so saves battery life.
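
To put the recommendation in numbers, the following Python sketch divides each benchmark time by the beamSize = 1 time. The figures are copied from the benchmark above, reading the first set of timings as maxWordNumber = 20 and the second as maxWordNumber = 30; the variable and function names are only illustrative.

```python
# Benchmark times in milliseconds for beamSize 1..12 (copied from the benchmark).
BEAM_SIZES = list(range(1, 13))
TIME_MS_20 = [480.12, 845.78, 1443.82, 2001.30, 2648.48, 3158.53,
              4179.14, 4861.66, 6003.65, 7087.97, 8134.95, 9627.79]
TIME_MS_30 = [451.12, 1194.65, 1965.27, 2971.92, 3798.28, 4391.35,
              5714.87, 6937.60, 8482.03, 10421.52, 12460.80, 13777.67]


def slowdown(times_ms):
    """Return each time divided by the beamSize = 1 time."""
    baseline = times_ms[0]
    return [t / baseline for t in times_ms]


for beam, s20, s30 in zip(BEAM_SIZES, slowdown(TIME_MS_20), slowdown(TIME_MS_30)):
    print("beamSize %2d: %5.1fx (20 words)  %5.1fx (30 words)" % (beam, s20, s30))
```

With these figures, beamSize = 12 takes roughly 20 times as long as beamSize = 1 at maxWordNumber = 20, and roughly 30 times as long at maxWordNumber = 30.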

Requirements

  • iOS 11.0+
  • Xcode 9.0+ (Swift 4.x)

Original Model

This Core ML model is exported from Keras and was trained on the MSCOCO dataset for about 40k steps. It is not yet state of the art, so it is not recommended for production use. I trained it on a single GeForce GTX 1080 Ti for about 48 hours and currently don't have more time to train it further. I hope the community will keep working on it.

Keras part

  • Train
        python ./train.py --weight_path WEIGHT_FILE_PATH_TO_CONTINUE_TRAINING  --TFRecord_pattern TFRECORD_FILE_PATTERN
    
    For example:
        python ./train.py --weight_path ./keras_weight/weights_full.h5  --TFRecord_pattern ./tfrecords/train-?????-of-00256
    
  • Test
        python ./inference.py --weight_path WEIGHT_FILE_PATH  --image_path TEST_IMAGE_PATH --max_sentence_length 20
    
    For example:
        python ./inference.py --weight_path ./keras_weight/weights_full.h5  --image_path ./test.jpg --max_sentence_length 20
    
  • Convert to CoreML Model
        python ./convert_coreml.py --export_lstm False
    
    export_lstm determines whether to export the Inception part or the LSTM part of the model. (The whole model is split into two parts: one encodes the image, the other decodes words.)
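
How convert_coreml.py parses `--export_lstm` is not shown here. As a minimal sketch, assuming an argparse-based command line, a flag that takes the literal strings True and False needs an explicit converter: with `type=bool` the string "False" would parse as True, because any non-empty string is truthy in Python. The helper str_to_bool, the function build_parser, and the mapping of True to the LSTM part are assumptions made for illustration.

```python
import argparse


def str_to_bool(value):
    """Hypothetical helper: convert 'True'/'False'-style strings to booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser with a single boolean-valued flag (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a converter command line")
    parser.add_argument("--export_lstm", type=str_to_bool, default=False,
                        help="which part of the two-part model to export")
    return parser


args = build_parser().parse_args(["--export_lstm", "False"])
# Assumption: True selects the LSTM (decoder) part, False the Inception (encoder) part.
part = "lstm" if args.export_lstm else "inception"
print(part)
```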

Pretrained Weights

The pretrained Keras weight file will be uploaded to Google Drive shortly.

Training dataset

We use the MS-COCO dataset. You can fetch the raw data and build it into TFRecords by following the original TensorFlow im2txt.
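
The `--TFRecord_pattern` value in the training example, `train-?????-of-00256`, is a glob pattern in which each `?` stands for exactly one character. The Python sketch below checks which file names such a pattern matches. It assumes the shards are named with a five-digit index, as the example pattern suggests; shard_names is a hypothetical helper, and how train.py itself resolves the pattern is not shown here.

```python
import fnmatch

PATTERN = "train-?????-of-00256"


def shard_names(prefix, num_shards):
    """Hypothetical helper: names like 'train-00000-of-00256'."""
    return ["%s-%05d-of-%05d" % (prefix, i, num_shards) for i in range(num_shards)]


def matching(names, pattern=PATTERN):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


train = shard_names("train", 256)
val = shard_names("val", 4)
print(len(matching(train + val)), "of", len(train + val), "files match")
```

Most shells expand an unquoted glob pattern themselves when matching files exist, so quoting the pattern on the command line may be necessary if the script expects to receive it as a single argument.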

TODO

  • Train on the dataset to 100k steps. (currently 40k)
  • Open-source the original Keras model that the Core ML model was trained with.
  • Support more languages (Chinese).

Thanks to the third-party libraries used in the demo

Contact

  • 曹佳鑫 (tsao): an iOS developer with deep learning experience, living in Shanghai.
  • Pull requests and issues are welcome.
  • Mail: [email protected]

License

ShowAndTell is available under the MIT license. See the LICENSE file for more info.