ichisadashioko / kanji-recognition

Licence: other
ichisadashioko.github.io/kanji-recognition/

Programming Languages

Jupyter Notebook
python

Projects that are alternatives of or similar to kanji-recognition

kanji-handwriting-swift
Kanji handwriting recognition for iOS using Zinnia.
Stars: ✭ 27 (+28.57%)
Mutual labels:  handwriting-recognition
PaperSynth
Handwritten text to synths!
Stars: ✭ 18 (-14.29%)
Mutual labels:  handwriting-recognition
neural net handwriting
neural network for handwriting recognition from scratch in C
Stars: ✭ 17 (-19.05%)
Mutual labels:  handwriting-recognition
tensorflow-example
Tensorflow-example: train a model on MNIST and recognize images of handwritten digits
Stars: ✭ 26 (+23.81%)
Mutual labels:  handwriting-recognition
iinkJS
✏️ ☁️ iinkJS is the fastest way to integrate rich handwriting recognition features in your webapp.
Stars: ✭ 65 (+209.52%)
Mutual labels:  handwriting-recognition
PyCasia
A python library to work with the CASIA Chinese handwriting database.
Stars: ✭ 38 (+80.95%)
Mutual labels:  handwriting-recognition
gestures-ml-js
[WIP] - Gesture recognition using hardware and Tensorflow.js
Stars: ✭ 69 (+228.57%)
Mutual labels:  tensorflow-js
handwriting-recognition
Handwriting Recognition Web API Proposal
Stars: ✭ 51 (+142.86%)
Mutual labels:  handwriting-recognition
Character-recognition-by-neural-network
Back Propagation, Python
Stars: ✭ 32 (+52.38%)
Mutual labels:  handwriting-recognition
hwrt
A toolset for handwriting recognition
Stars: ✭ 61 (+190.48%)
Mutual labels:  handwriting-recognition
form-segmentation
Let's explore how we can extract text from forms
Stars: ✭ 42 (+100%)
Mutual labels:  handwriting-recognition
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (+185.71%)
Mutual labels:  handwriting-recognition
handwriting.js
A simple API access to the handwriting recognition service of Google IME
Stars: ✭ 89 (+323.81%)
Mutual labels:  handwriting-recognition
Handwritten-Names-Recognition
The goal of this project is to solve the task of name transcription from handwriting images implementing a NN approach.
Stars: ✭ 54 (+157.14%)
Mutual labels:  handwriting-recognition
KoreanClassification Keras Coreml
Built a Korean (Hangul) handwriting classification model and applied it to an iOS application 📱
Stars: ✭ 29 (+38.1%)
Mutual labels:  handwriting-recognition
Nsfwjs
NSFW detection on the client-side via TensorFlow.js
Stars: ✭ 5,223 (+24771.43%)
Mutual labels:  tensorflow-js
air writing
Online Hand Writing Recognition using BLSTM
Stars: ✭ 26 (+23.81%)
Mutual labels:  handwriting-recognition
tensorflow-image-recognition-chrome-extension
Chrome browser extension for using TensorFlow image recognition on web pages
Stars: ✭ 88 (+319.05%)
Mutual labels:  tensorflow-js
recrossable
crossword game with simplistic handwriting recognition and automatic generation of crosswords
Stars: ✭ 36 (+71.43%)
Mutual labels:  handwriting-recognition
Handwriting-Recognition
Software to recognize handwriting
Stars: ✭ 46 (+119.05%)
Mutual labels:  handwriting-recognition

Kanji recognition

Introduction

This project was inspired by the TensorFlow tutorial on MNIST handwritten digits, which I followed while learning about convolutional neural networks.

This project demonstrates building a CNN to recognize Japanese kanji characters.

Write-up

Labels

Moving away from the MNIST example, my first problem was the labels. I was learning RTK at the time, so I thought its characters would be a good starting point: they were the ones I actually needed while learning Japanese. I spent a few days writing scrapers to collect those characters from a Memrise course and Wikipedia.

Data

While writing those scrapers, I realized that I had no dataset for training. So I built a drawing/note-taking app with Cordova to generate some (still unlabeled) data. That took a few weeks, and I was happy with it because it was one of my first experiences building a mobile app.

A few weeks later, I realized that I was generating nowhere near enough data for training. The MNIST example has roughly 7,000 records for each label. I had ~2000 labels and fewer than 10 records for 20% of those labels. I needed to find a way to create data, and fonts came to mind. While I was learning Japanese with Anki, the default font for rendering Japanese was pretty bad for learners: the characters were not rendered the way we are supposed to write them. I had gotten my hands on some Japanese fonts from the community that are suitable for learners, which gave me the idea of using Japanese fonts to generate image data. It took me one to two months to complete that project.
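To make the font idea concrete, here is a minimal sketch of rendering one character into a fixed-size training image with Pillow. It is not the code used in this project: the function name `render_char`, the 64x64 size, the padding, and the white-on-black convention are all assumptions chosen for illustration.

```python
from PIL import Image, ImageDraw, ImageFont


def render_char(ch, font, size=64, pad=4):
    """Render one character as a size x size grayscale image (white ink on black).

    Hypothetical helper for illustration; sizes and colors are assumptions.
    """
    # Draw on an oversized canvas so the glyph is unlikely to be clipped.
    canvas = Image.new("L", (size * 4, size * 4), 0)
    ImageDraw.Draw(canvas).text((size, size), ch, fill=255, font=font)

    # Bounding box of all non-black pixels; None means nothing was drawn.
    bbox = canvas.getbbox()
    if bbox is None:
        raise ValueError(f"font produced no visible glyph for {ch!r}")
    glyph = canvas.crop(bbox)

    # Scale the glyph to fit inside the padded target, keeping its aspect ratio.
    scale = (size - 2 * pad) / max(glyph.size)
    new_size = (
        max(1, round(glyph.width * scale)),
        max(1, round(glyph.height * scale)),
    )
    glyph = glyph.resize(new_size, Image.BILINEAR)

    # Paste the scaled glyph in the center of a blank target image.
    out = Image.new("L", (size, size), 0)
    out.paste(glyph, ((size - glyph.width) // 2, (size - glyph.height) // 2))
    return out
```

In practice the font would be loaded with `ImageFont.truetype(path, font_size)` from a Japanese font file, and the function would be called once per label and per font. A font that lacks a glyph may draw a placeholder box instead of failing, so generated images are worth spot-checking.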

Training

With the data ready, I was able to train reasonably good models within a few weeks. I spent the next few months building applications that use the model: a web app demo, an Android app, and a desktop app for labeling my painstakingly handwritten data.

New data

One day, I stumbled on the ETL Character Database, an image dataset that is perfect for my needs. It contains more data than I could write in the next 5 years, and all of it has already been labeled. It took me a few weeks to process one part of the dataset. It was the first time I had to research text encodings (ASCII, UTF-8, Shift-JIS, UTF-16, etc.). With the newly found dataset, the model performed significantly better than when it was trained on the fonts dataset.
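To show why encodings matter here, the sketch below looks at one kanji in several encodings using only the Python standard library. It assumes a label arrives as a 16-bit JIS X 0208 code, which is an assumption for illustration; this write-up does not state which encoding each part of the ETL dataset uses. The function name `jis0208_to_char` is hypothetical.

```python
def jis0208_to_char(code: int) -> str:
    """Convert a 16-bit JIS X 0208 code (e.g. 0x3021) to a Unicode character.

    Assumes the input really is a JIS X 0208 code; this is an illustrative
    helper, not part of the project.
    """
    hi, lo = code >> 8, code & 0xFF
    # EUC-JP represents a JIS X 0208 character by setting the high bit
    # of each of its two bytes, so Python's euc_jp codec can decode it.
    return bytes([hi | 0x80, lo | 0x80]).decode("euc_jp")


ch = jis0208_to_char(0x3021)
print(ch)                            # 亜
print(ch.encode("shift_jis").hex())  # 889f
print(ch.encode("utf-8").hex())      # e4ba9c
print(ch.encode("utf-16-be").hex())  # 4e9c
```

The same character has a different byte sequence in every encoding, so a label read with the wrong codec either fails to decode or silently maps to the wrong character.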

Implementations

  • TensorFlow - Python - Train model with Python and TensorFlow.

  • tfjs (on gh-pages branch) - Use trained model to recognize handwriting from HTML canvas with JavaScript.

  • TensorFlow Lite - Use trained model to create handwriting input app on Android device with Java/Kotlin.

