
croath / han

Licence: other
Using TensorFlow to train a model to detect miswritten Chinese characters.

Programming Languages

Python
Shell

Projects that are alternatives to or similar to han

chinese-tokenizer
Tokenizes Chinese texts into words.
Stars: ✭ 72 (+500%)
Mutual labels:  chinese
flask-docs-zh
Simplified Chinese translation of the Flask documentation
Stars: ✭ 93 (+675%)
Mutual labels:  chinese
Soft-CHS
A place for some small programs I localized into Chinese myself: 1. madvr
Stars: ✭ 97 (+708.33%)
Mutual labels:  chinese
predict Lottery ticket
AI prediction for the Double Color Ball and Super Lotto lotteries
Stars: ✭ 341 (+2741.67%)
Mutual labels:  chinese
pinyin data
🐼 Easy to use and portable pronunciation data for Hanzi characters.
Stars: ✭ 13 (+8.33%)
Mutual labels:  chinese
rasa bot
Compiled notes: a task-oriented ChatBot based on Rasa-NLU and Rasa-Core
Stars: ✭ 51 (+325%)
Mutual labels:  chinese
ime.vim
A Vim input method engine
Stars: ✭ 74 (+516.67%)
Mutual labels:  chinese
hsk-vocabulary
🇨🇳Open source Chinese HSK vocabulary list with example sentences
Stars: ✭ 27 (+125%)
Mutual labels:  chinese
PHP-Chinese
PHP Chinese Conversion (Traditional/Simplified Chinese conversion)
Stars: ✭ 37 (+208.33%)
Mutual labels:  chinese
react-flashcards
A simple React + Firebase flashcard application
Stars: ✭ 29 (+141.67%)
Mutual labels:  chinese
stable-baselines-zh
Chinese translation of the official Stable Baselines documentation
Stars: ✭ 75 (+525%)
Mutual labels:  chinese
ChineseNames
🀄 Chinese Name Database (1930-2008)
Stars: ✭ 99 (+725%)
Mutual labels:  chinese
iTop-CN
iTop in Chinese
Stars: ✭ 36 (+200%)
Mutual labels:  chinese
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (+666.67%)
Mutual labels:  chinese
shudu
Shudu is an open-source text-processing platform whose goal is to let readers comfortably read and write copy.
Stars: ✭ 25 (+108.33%)
Mutual labels:  chinese
Vanhiupun.github.io
🏖️ Vanhiupun's Awesome Site ==> another theme for elegant writers with modern flat style and beautiful night/dark mode.
Stars: ✭ 57 (+375%)
Mutual labels:  chinese
fuzzychinese
A small package to fuzzy match Chinese words
Stars: ✭ 50 (+316.67%)
Mutual labels:  chinese
weapp-poem
诗词墨客: the most comprehensive mini program for classical Chinese poetry
Stars: ✭ 409 (+3308.33%)
Mutual labels:  chinese
DataCLUE
DataCLUE: a data-centric NLP benchmark and toolkit
Stars: ✭ 133 (+1008.33%)
Mutual labels:  chinese
lwodf
The Chinese edition of Live Working or Die Fighting: How the Working Class Went Global (劳工的全球化, "The Globalization of Labor"), authored by Paul Mason, translated by the CNPolitics translation team.
Stars: ✭ 25 (+108.33%)
Mutual labels:  chinese

Han

Description

Han is a deep-learning project for detecting miswritten handwritten Chinese characters.

Its primary purpose is to find miswritten characters drawn by professional Chinese font designers, so that their work can be reviewed.

There are about 5,000 commonly used Chinese characters, and a regular Chinese font may contain more than 8,000, so there is a real chance that a font designer will miswrite some of them.

The project may not work well for Chinese OCR.

Technical details can be found in this PDF: https://drive.google.com/file/d/0B-noE_nG9ncQV2R2M2F0eW5kbk0/view?usp=sharing

How to use it

The project is written in Python 3.6.

First, prepare some training data. A good way to do this is to generate character images from existing font files.

The dump_ttf.sh script does most of this work.
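The README does not say which characters dump_ttf.sh renders. As a minimal sketch, assuming a GB2312-based character list, the Python below enumerates the 3,755 level-1 (most common) hanzi of the GB2312 character set using only the standard library's gb2312 codec. The function name gb2312_level1_chars is hypothetical, and the actual rendering of each character from a font file (for example with an imaging library) is not shown.

```python
# Hypothetical helper (not part of the han repo): list the GB2312 level-1
# hanzi, i.e. the most common characters, as a candidate set of characters
# to render from each font file when generating training images.


def gb2312_level1_chars():
    """Return GB2312 level-1 hanzi in code-point (pinyin) order."""
    chars = []
    for row in range(0xB0, 0xD8):        # lead bytes 0xB0-0xD7: level-1 rows
        for cell in range(0xA1, 0xFF):   # trail bytes 0xA1-0xFE: 94 cells per row
            try:
                ch = bytes([row, cell]).decode("gb2312")
            except UnicodeDecodeError:
                continue                 # unassigned cell (tail of the last row)
            if "\u4e00" <= ch <= "\u9fff":   # keep CJK Unified Ideographs only
                chars.append(ch)
    return chars


if __name__ == "__main__":
    chars = gb2312_level1_chars()
    print(len(chars), "characters, starting with", "".join(chars[:5]))
```

Level 2 of GB2312 (lead bytes 0xD8-0xF7) adds another 3,008 less common characters and could be enumerated the same way if a larger set is needed.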

Training is started with the run.sh script, and rerun.sh can be used to run new data on an existing checkpoint.

Benchmark

With 20 different fonts, and the 8,877 images generated from those fonts as one epoch, all 15 epochs finish within 7 hours on an Azure K80 machine.
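A quick sanity check of these figures, as a sketch: the per-epoch time bound follows directly from "15 epochs within 7 hours", while the throughput depends on how many images one epoch contains, which the sentence leaves open (8,877 in total, or 8,877 per font). That size is therefore a parameter, and both readings are shown. The helper names are hypothetical.

```python
# Back-of-the-envelope numbers for the benchmark: 15 epochs in at most 7 hours.
# The number of images per epoch is an assumption, so it is passed in.

TOTAL_HOURS = 7
EPOCHS = 15


def minutes_per_epoch(total_hours=TOTAL_HOURS, epochs=EPOCHS):
    """Upper bound on wall-clock minutes spent per epoch."""
    return total_hours * 60 / epochs


def images_per_second(images_per_epoch, total_hours=TOTAL_HOURS, epochs=EPOCHS):
    """Lower bound on training throughput for a given epoch size."""
    return images_per_epoch * epochs / (total_hours * 3600)


if __name__ == "__main__":
    print(minutes_per_epoch())                       # 28.0
    print(round(images_per_second(8_877), 2))        # 5.28 (one epoch = 8,877 images)
    print(round(images_per_second(20 * 8_877), 1))   # 105.7 (8,877 images per font)
```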

On an entirely new font, the accuracy of detecting wrong characters is higher than 99%.
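The README does not describe how this accuracy was measured. A minimal sketch, assuming each glyph of the held-out font has a binary label (1 = miswritten, 0 = correct) and a model prediction of the same form; detection_metrics is a hypothetical helper. Because miswritten glyphs are usually rare, precision and recall are computed alongside accuracy.

```python
# Hypothetical evaluation helper: binary labels and predictions per glyph,
# where 1 means "miswritten" and 0 means "correct".


def detection_metrics(labels, predictions):
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # miswritten, flagged
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # correct, not flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # correct, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # miswritten, missed
    accuracy = (tp + tn) / len(pairs)
    # With few miswritten glyphs, a model that never flags anything still
    # scores a high accuracy, so precision and recall are reported as well.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # 100 glyphs, 2 of them miswritten; the model catches only one.
    labels = [0] * 98 + [1, 1]
    predictions = [0] * 98 + [1, 0]
    print(detection_metrics(labels, predictions))
    # accuracy 0.99, precision 1.0, recall 0.5
```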

You can check the results with my model file: https://drive.google.com/file/d/0B-noE_nG9ncQY0ZZd29HWnNsUDQ/view?usp=sharing. It is a quantized model and can be loaded with the server.sh script.
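The README does not say how the model was quantized. As an illustration only, here is one common scheme, 8-bit linear (min/max) quantization, in plain Python: each float weight is mapped to an integer in [0, 255] and can later be approximately restored. quantize_uint8 and dequantize_uint8 are hypothetical names, and this is not necessarily the TensorFlow tooling used for the published model.

```python
# Sketch of 8-bit linear (min/max) quantization: floats are stored as
# integers in [0, 255] plus an offset (lo) and a step size (scale).


def quantize_uint8(values):
    """Map floats to integers in [0, 255]; return (codes, lo, scale)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0   # avoid division by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Approximately recover the original floats."""
    return [lo + q * scale for q in codes]


if __name__ == "__main__":
    weights = [-0.82, -0.1, 0.0, 0.37, 1.25]
    codes, lo, scale = quantize_uint8(weights)
    restored = dequantize_uint8(codes, lo, scale)
    print(codes)
    # The round-trip error is at most half a quantization step.
    print(max(abs(a - b) for a, b in zip(weights, restored)), "<=", scale / 2)
```

Storing one byte per weight instead of a 32-bit float cuts the model size roughly by four, at the cost of this bounded rounding error.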

Feel free to email me if anything is unclear.
