mv-lab / kuzushiji-recognition

Licence: other
Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to kuzushiji-recognition

kaggle
Kaggle solutions
Stars: ✭ 17 (+6.25%)
Mutual labels:  kaggle, deeplearning
YuzuMarker
🍋 [WIP] Manga Translation Tool
Stars: ✭ 76 (+375%)
Mutual labels:  ocr, japanese
google-retrieval-challenge-2019-fastai-starter
fast.ai starter kit for Google Landmark Retrieval 2019 challenge
Stars: ✭ 62 (+287.5%)
Mutual labels:  kaggle, fastai
Deep-Learning-Experiments-implemented-using-Google-Colab
Colab Compatible FastAI notebooks for NLP and Computer Vision Datasets
Stars: ✭ 16 (+0%)
Mutual labels:  kaggle, fastai
Tr
Free offline OCR: an offline Chinese text detection + recognition SDK
Stars: ✭ 598 (+3637.5%)
Mutual labels:  ocr, deeplearning
Data-Scientist-In-Python
This repository contains notes and projects of Data scientist track from dataquest course work.
Stars: ✭ 23 (+43.75%)
Mutual labels:  kaggle, deeplearning
Deeplearning
Introductory deep learning tutorials and selected articles (Deep Learning Tutorial)
Stars: ✭ 6,783 (+42293.75%)
Mutual labels:  kaggle, deeplearning
jp-ocr-prunned-cnn
Attempting feature map prunning on a CNN trained for Japanese OCR
Stars: ✭ 15 (-6.25%)
Mutual labels:  ocr, japanese
East
This is a PyTorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.
Stars: ✭ 478 (+2887.5%)
Mutual labels:  ocr, deeplearning
gazou
Japanese OCR for Linux & Windows
Stars: ✭ 32 (+100%)
Mutual labels:  ocr, japanese
kuzushiji-recognition
5th place solution for the Kaggle Kuzushiji Recognition Challenge
Stars: ✭ 41 (+156.25%)
Mutual labels:  ocr, centernet
Kaku
画 - Japanese OCR Dictionary
Stars: ✭ 160 (+900%)
Mutual labels:  ocr, japanese
Printed-Chinese-Character-OCR
A Chinese character OCR system based on deep learning (a VGG-like CNN). The repo includes training-set generation, image preprocessing, and NN model optimization with the Keras high-level NN framework.
Stars: ✭ 21 (+31.25%)
Mutual labels:  ocr, deeplearning
KerasR
DeepLearning using Keras with R
Stars: ✭ 26 (+62.5%)
Mutual labels:  deeplearning
rawr
Extract raw R code directly from webpages, including GitHub, Kaggle, Stack Overflow, and sites made using Blogdown.
Stars: ✭ 15 (-6.25%)
Mutual labels:  kaggle
Shadow
Computer science fundamentals, data structures, design patterns, and a Tomcat middleware implementation
Stars: ✭ 19 (+18.75%)
Mutual labels:  ocr
ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Stars: ✭ 40 (+150%)
Mutual labels:  ocr
CenterNetPerson
CenterNet used for pedestrian detection
Stars: ✭ 26 (+62.5%)
Mutual labels:  centernet
cl-skkserv
An SKK dictionary server and its extensions, written in Common Lisp
Stars: ✭ 22 (+37.5%)
Mutual labels:  japanese
stock-prediction-with-DL
Deep learning for stock analysis and prediction
Stars: ✭ 13 (-18.75%)
Mutual labels:  deeplearning

Kuzushiji Recognition

Opening the door to a thousand years of Japanese culture

Official Kaggle Competition


Japanese Culture and AI Symposium 2019

We were invited to present this solution at the Japanese Culture and AI Symposium 2019 in Tokyo, Japan, on November 11.





Build a model to transcribe ancient Kuzushiji into contemporary Japanese characters

Imagine the history contained in a thousand years of books. What stories are in those books? What knowledge can we learn from the world before our time? What was the weather like 500 years ago? What happened when Mt. Fuji erupted? How can one fold 100 cranes using only one piece of paper? The answers to these questions are in those books.

Japan has millions of books and over a billion historical documents such as personal letters or diaries preserved nationwide. Most of them cannot be read by the majority of Japanese people living today because they were written in “Kuzushiji”.

Even though Kuzushiji, a cursive writing style, had been used in Japan for over a thousand years, there are very few fluent readers of Kuzushiji today (only 0.01% of modern Japanese natives). Due to the lack of available human resources, there has been a great deal of interest in using Machine Learning to automatically recognize these historical texts and transcribe them into modern Japanese characters. Nevertheless, several challenges in Kuzushiji recognition have made the performance of existing systems extremely poor.

The hosts need help from machine learning experts to transcribe Kuzushiji into contemporary Japanese characters. With your help, the Center for Open Data in the Humanities (CODH) will be able to develop better algorithms for Kuzushiji recognition. The model is not only a great contribution to the machine learning community, but also a great help for making millions of documents more accessible, leading to new discoveries in Japanese history and culture.


Team


9th place Solution: Simple but efficient.

Please check the notebook: Kuzushiji Recognition Starter


From the beginning, @ollieperree used a 2-stage approach. Our approach to detection was directly inspired by K_mat's kernel; the main takeaway was the idea of predicting a heatmap showing the centers of characters. Initially, we used a U-Net with a resnet18 backbone to predict a heatmap consisting of ellipses placed at the centers of characters, with radii proportional to the width and height of the bounding box. The input to the model was a 1024x1024-pixel crop of the page resized to 256x256 pixels. Center predictions were then obtained by picking the local maxima of the heatmap (note that the width and height of the bounding box were not predicted). Performance improved when we changed the ellipses to circles of constant radius.
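A rough illustration of this target encoding and peak-picking (not our exact code; the helper names, the constant radius value, and the scipy-based maxima filter are assumptions for the sketch):

import numpy as np
from scipy.ndimage import maximum_filter

def make_center_heatmap(centers, out_size=256, in_size=1024, radius=4):
    # Render a constant-radius circle at each character center.
    # centers: (x, y) pairs in 1024x1024 crop coordinates.
    heat = np.zeros((out_size, out_size), dtype=np.float32)
    scale = out_size / in_size
    yy, xx = np.mgrid[0:out_size, 0:out_size]
    for x, y in centers:
        cx, cy = x * scale, y * scale
        heat[(xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2] = 1.0
    return heat

def extract_centers(heatmap, thresh=0.5):
    # Centers = local maxima of the predicted heatmap above a threshold.
    peaks = (heatmap == maximum_filter(heatmap, size=5)) & (heatmap > thresh)
    ys, xs = np.nonzero(peaks)
    return list(zip(xs, ys))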

We tried using focal loss and binary cross-entropy as loss functions, but using mean squared error resulted in the cleanest predictions for us (though more epochs were needed to get sensible-looking predictions).

One issue with using 1024x1024 crops of the page as the input was "artifacts" around the edges of the input. We tried a few things to counteract this, such as moving the sliding window over the page with a stride smaller than 1024 and then removing duplicate predictions by detecting when two predicted points of the same class were within a certain distance of each other. However, these did not improve the LB score; we think that tuning the parameters for these methods on the validation set, as well as the parameters for selecting maxima in the heatmap, might have caused us to "overfit".
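A minimal sketch of that distance-based de-duplication (a hypothetical function; the min_dist value and score-based ordering are assumptions):

def dedupe_centers(preds, min_dist=10):
    # preds: (x, y, class_id, score) tuples in page coordinates.
    # Greedily keep the highest-scoring centers; drop any later center
    # of the same class that falls within min_dist pixels of a kept one.
    kept = []
    for x, y, cls, score in sorted(preds, key=lambda p: -p[3]):
        if all(cls != kc or (x - kx) ** 2 + (y - ky) ** 2 >= min_dist ** 2
               for kx, ky, kc, _ in kept):
            kept.append((x, y, cls, score))
    return kept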

These artifacts were related to the drawings and annotations! (Look carefully at the red dots in the images.) How did we fix this? See the Ensemble section below.


We have a 2-stage model: detection and classification.

Detection

As starter code we used the great kernel CenterNet -Keypoint Detector- by @kmat2019. Then we realized that @seesee had his own keras-centernet. In the end we used an Hourglass network, and the output is boxes instead of only the centers (as in the original paper).

Model

  • Detection by an hourglass network (hourglassnet)
  • Output: center heatmaps + width/height maps (decoding sketched below)
  • generate_heatmap_crops_circular(crop1024, resize256)
  • Validation: GroupKFold, without outliers
  • resnet34
  • Losses: MSELoss, Dice and IoU (loss=0.00179, dice=0.6270, F1=0.9856, iou=0.8142)
  • Augmentations: aug(randombrightness0.2, scale0.5)
  • Learning rate: 1e-4 down to 1e-6
  • 20 epochs
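Since the detector outputs center heatmaps plus width/height maps, boxes can be decoded CenterNet-style from the heatmap peaks. A rough sketch (the stride of 4 matches the 1024→256 crop/resize above; the function name and scipy-based maxima filter are assumptions, not our exact code):

import numpy as np
from scipy.ndimage import maximum_filter

def decode_boxes(heat, wh, thresh=0.5, stride=4):
    # heat: (H, W) center heatmap; wh: (2, H, W) width/height maps.
    # Returns (x1, y1, x2, y2, score) boxes in input-image coordinates.
    peaks = (heat == maximum_filter(heat, size=5)) & (heat > thresh)
    ys, xs = np.nonzero(peaks)
    boxes = []
    for x, y in zip(xs, ys):
        w, h = wh[0, y, x], wh[1, y, x]
        cx, cy = x * stride, y * stride
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, heat[y, x]))
    return boxes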

Classification

The classification model was a resnet18, pretrained on ImageNet, with the input being a fixed 256x256-pixel area, scaled down to 128x128 and centered at the (predicted) center of the character. The training data included a background class, whose training examples were random 256x256 crops of the pages with no labelled characters. Training was done using the fastai library, with standard fastai transforms and MixUp. This model achieved a classification accuracy of 93.6% on a validation set (20% of the train data, group split by book).
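The crop-and-classify step looks roughly like this (a sketch using torchvision rather than our actual fastai pipeline; the class count is a placeholder for the competition's character vocabulary plus the background class):

import torch
import torchvision

# ImageNet-pretrained resnet18 with a new head for the character classes
# (+1 for the background class). num_classes here is a placeholder.
num_classes = 4000 + 1
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)

def crop_character(page, cx, cy, size=256, out=128):
    # Fixed 256x256 crop centered on a predicted character center,
    # scaled down to 128x128. page: PIL.Image (PIL pads the crop with
    # black where the box falls outside the page borders).
    half = size // 2
    box = (cx - half, cy - half, cx + half, cy + half)
    return page.crop(box).resize((out, out))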

Augmentations

We use standard augmentations from the https://github.com/albu/albumentations/ library, including color adjustments that help simulate different paper styles.

> Can you tell which one is the real one?

Code

import random

import albumentations

# RGBShift with zeroed limits; we override get_params below so the
# sampled shifts push the image towards yellowish paper tones.
colorize = albumentations.RGBShift(r_shift_limit=0, g_shift_limit=0, b_shift_limit=[-80, 0])

def color_get_params():
    a = random.uniform(-40, 0)    # same shift for the red and green channels
    b = random.uniform(-80, -30)  # stronger negative shift on blue
    return {"r_shift": a,
            "g_shift": a,
            "b_shift": b}

# Monkey-patch the sampler so this RGBShift instance uses our
# correlated parameters instead of its default independent ones.
colorize.get_params = color_get_params

aug = albumentations.Compose([albumentations.RandomBrightnessContrast(contrast_limit=0.2, brightness_limit=0.2),
                              albumentations.ToGray(),
                              albumentations.Blur(),
                              albumentations.Rotate(limit=5),
                              colorize
                             ])
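Applying the pipeline then follows the usual albumentations pattern:

augmented = aug(image=page)["image"]  # page: HxWx3 uint8 numpy array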

Ensemble

We had a problem... our centers weren't ordered! So, to improve accuracy and remove false positives, we came up with the following ensemble method. For an image IMG we take the outermost centers from each of 3 prediction runs:

(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)

In the picture these boxes are represented by 3 different colours (yellow, blue, red). Finally, we take the intersection of those 3 boxes, the black rectangle defined as (X, Y, Z, W), and we drop all the centers outside the black box! With this technique we could eliminate artifacts such as predictions at the edges.
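A minimal sketch of that filter (a hypothetical function; each run is that run's list of predicted (x, y) centers for the same page):

def intersection_filter(runs):
    # Bounding box of each run's centers, then the intersection box
    # (X, Y, Z, W); keep only the centers inside the intersection.
    x1 = max(min(x for x, _ in run) for run in runs)  # X
    y1 = max(min(y for _, y in run) for run in runs)  # Y
    x2 = min(max(x for x, _ in run) for run in runs)  # Z
    y2 = min(max(y for _, y in run) for run in runs)  # W
    return [[(x, y) for x, y in run
             if x1 <= x <= x2 and y1 <= y <= y2] for run in runs]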


Acknowledgements

We would like to thank the organizers:

and the Official Collaborators: Mikel Bober-Irizar (anokas), Kaggle Grandmaster, and Alex Lamb (MILA, Quebec Artificial Intelligence Institute).
