
EliasCai / idcard-ocr

Licence: other
End-to-end text recognition for ID cards

Programming Languages

python

Projects that are alternatives to or similar to idcard-ocr

blog
Day-to-day collection of technical materials (contributions welcome)
Stars: ✭ 59 (+168.18%)
Mutual labels:  ocr
ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Stars: ✭ 40 (+81.82%)
Mutual labels:  ocr
alfresco-simple-ocr
Simple OCR action for Alfresco
Stars: ✭ 40 (+81.82%)
Mutual labels:  ocr
dinglehopper
An OCR evaluation tool
Stars: ✭ 38 (+72.73%)
Mutual labels:  ocr
omynote
众山小笔记 - manage all your reading notes in one place
Stars: ✭ 154 (+600%)
Mutual labels:  ocr
ImageToText
OCR with Google's AI technology (Cloud Vision API)
Stars: ✭ 30 (+36.36%)
Mutual labels:  ocr
How-to-use-tesseract-ocr-4.0-with-csharp
How to use Tesseract OCR 4.0 with C#
Stars: ✭ 60 (+172.73%)
Mutual labels:  ocr
vrpdr
Deep Learning Applied To Vehicle Registration Plate Detection and Recognition in PyTorch.
Stars: ✭ 36 (+63.64%)
Mutual labels:  ocr
Php-Google-Vision-Api
Google Vision Api for PHP (https://cloud.google.com/vision/)
Stars: ✭ 61 (+177.27%)
Mutual labels:  ocr
fakemenot
Application to check authenticity of Twitter screenshots. Written in Python 🐍
Stars: ✭ 29 (+31.82%)
Mutual labels:  ocr
DocTr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Stars: ✭ 202 (+818.18%)
Mutual labels:  ocr
ruzzle-solver
A python script that solves ruzzle boards
Stars: ✭ 46 (+109.09%)
Mutual labels:  ocr
mirador-textoverlay
Text Overlay plugin for Mirador 3
Stars: ✭ 35 (+59.09%)
Mutual labels:  ocr
Printed-Chinese-Character-OCR
A Chinese-character OCR system based on deep learning (a VGG-like CNN). The repo includes training-set generation, image preprocessing, and NN model optimization built on the Keras high-level NN framework
Stars: ✭ 21 (-4.55%)
Mutual labels:  ocr
Table-Extractor-From-Image
This repository contains the code that extracts a table from an image and exports it to an Excel file.
Stars: ✭ 46 (+109.09%)
Mutual labels:  ocr
Tess4Android
A new fork based on tess-two and Tesseract 4.0.0
Stars: ✭ 31 (+40.91%)
Mutual labels:  ocr
Shadow
Computer science fundamentals, data structures, design patterns, and an implementation of Tomcat middleware
Stars: ✭ 19 (-13.64%)
Mutual labels:  ocr
tesseract-unity
Standalone OCR plugin for Unity using Tesseract
Stars: ✭ 35 (+59.09%)
Mutual labels:  ocr
extract-information-from-identity-card
Given an identity-card image, this repo detects the 4 corners, aligns the card with OpenCV, then detects words in the image and recognizes them with a Transformer OCR.
Stars: ✭ 81 (+268.18%)
Mutual labels:  ocr
kuzushiji-recognition
Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.
Stars: ✭ 16 (-27.27%)
Mutual labels:  ocr

idcard-ocr

Purpose:

Mainly OCR for the address text on ID cards. Other images containing text also work, but there is no text-detection capability.
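
For orientation, the sketch below shows the final step of such a pipeline: turning a trained recognition model's per-timestep softmax output into an address string with greedy CTC decoding. The model variable, the char_list character table (conceptually the role of keys.py), and the input image are illustrative assumptions, not the repository's exact API.

# Hypothetical sketch: greedy CTC decoding of a recognition model's output.
# char_list stands in for the character table (an assumption, not the repo's code).
import numpy as np
from keras import backend as K

def decode_prediction(y_pred, char_list):
    """y_pred: softmax output with shape (batch, time_steps, num_classes)."""
    seq_len = np.full((y_pred.shape[0],), y_pred.shape[1])
    decoded, _ = K.ctc_decode(y_pred, seq_len, greedy=True)
    ids = K.get_value(decoded[0])          # dense matrix padded with -1
    return [''.join(char_list[i] for i in row if i >= 0) for row in ids]

# Usage (hypothetical):
# y_pred = model.predict(address_image[np.newaxis, ...])
# print(decode_prediction(y_pred, char_list)[0])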

Directory layout

  1. background: background images onto which text is rendered
  2. code: the code base
  3. corpus: corpus (prepare your own; the file uploaded here is only a sample)
  4. font: font files
  5. log: model weights

Code overview

├── code
│   ├── comp_face++.py (compare results against the Face++ detector)
│   ├── data_generator.py (data generator)
│   ├── demo_new.py (store the model's outputs as JSON)
│   ├── densenet.py (network architecture of the model)
│   ├── detect_ocr.py (detect and recognize ID-card text)
│   ├── gen_real.py (generate realistic ID-card images and text)
│   ├── id_label_map.pbtxt (class map for text detection)
│   ├── keys.py (character set)
│   ├── model.png (diagram of the network architecture)
│   ├── model.py (compile the model and set the optimizer)
│   ├── test.py (output detection results on generated images)
│   ├── train.py (training code)
│   └── train_test.py (test of the training code)
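
As a rough illustration of how densenet.py, keys.py, and model.py fit together, the sketch below builds a small CNN-plus-CTC recognizer in Keras and compiles it with an optimizer. The simplified convolutional stack (a stand-in for the DenseNet), the layer sizes, and the character count are assumptions for illustration, not the repository's actual architecture.

# Minimal sketch of a CTC-trained text-line recognizer in Keras (illustrative only).
from keras import backend as K
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Permute, TimeDistributed, Flatten,
                          Dense, Lambda)
from keras.models import Model
from keras.optimizers import Adam

NUM_CLASSES = 6000 + 1  # assumed character-set size plus the CTC "blank" symbol

def ctc_loss(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

# Feature extractor over a fixed-height text-line image (32 x variable width x 1).
image = Input(shape=(32, None, 1), name='image')
x = Conv2D(64, 3, padding='same')(image)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)                  # halves height and width
x = Conv2D(128, 3, padding='same', activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 1))(x)                  # keep width resolution for the sequence
x = Permute((2, 1, 3))(x)                              # (height, width, ch) -> (width, height, ch)
x = TimeDistributed(Flatten())(x)                      # one feature vector per width step
y_pred = Dense(NUM_CLASSES, activation='softmax', name='out')(x)

# Extra inputs required by the CTC loss.
labels = Input(name='labels', shape=(None,), dtype='float32')
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')
loss_out = Lambda(ctc_loss, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])

model = Model(inputs=[image, labels, input_length, label_length], outputs=loss_out)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=Adam(lr=1e-3))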

Runtime environment

  1. keras == 2.2.0
  2. tensorflow == 1.6.0

Training:

  1. Build the corpus (corpus/address_lite.txt)
  2. Train the model (code/train.py); a sketch of the batch generator it relies on follows below
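
The training entry point is code/train.py and the corpus is corpus/address_lite.txt. The sketch below illustrates, with assumed names rather than the repository's exact interface, the batch layout a CTC-trained Keras model consumes from a corpus-driven generator such as code/data_generator.py; random arrays stand in for the rendered text-line images.

# Hedged sketch of a corpus-driven batch generator for CTC training (assumed names).
import numpy as np

def batch_generator(corpus_lines, char_to_id, batch_size=32,
                    img_h=32, img_w=280, downsample=2):
    """Yield (inputs, targets) in the four-input layout used with K.ctc_batch_cost."""
    while True:
        picks = np.random.randint(0, len(corpus_lines), size=batch_size)
        lines = [corpus_lines[i] for i in picks]
        max_len = max(len(s) for s in lines)
        images = np.random.rand(batch_size, img_h, img_w, 1)   # placeholder: real code renders text here
        labels = np.zeros((batch_size, max_len), dtype='float32')
        label_length = np.zeros((batch_size, 1), dtype='int64')
        # Time steps seen by the CTC loss: image width after CNN downsampling (assumption).
        input_length = np.full((batch_size, 1), img_w // downsample, dtype='int64')
        for i, text in enumerate(lines):
            ids = [char_to_id.get(c, 0) for c in text]
            labels[i, :len(ids)] = ids
            label_length[i, 0] = len(ids)
        inputs = {'image': images, 'labels': labels,
                  'input_length': input_length, 'label_length': label_length}
        yield inputs, {'ctc': np.zeros((batch_size,))}          # dummy target; the Lambda layer is the loss

# Usage with the model sketched above (hypothetical):
# corpus = open('corpus/address_lite.txt', encoding='utf-8').read().splitlines()
# model.fit_generator(batch_generator(corpus, char_to_id),
#                     steps_per_epoch=200, epochs=10)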

Improvements

The code is adapted from the referenced repository, with the following main improvements:

  1. Expanded Chinese character set (200+ characters added; it now covers more than 6,000 characters)
  2. End-to-end training: text images are generated directly from the corpus, which is more convenient
  3. Random augmentation of text and images (random text arrangement, randomly generated backgrounds and fonts, text angle, etc.), see the rendering sketch below
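
A rough sketch of the corpus-to-image idea behind improvements 2 and 3: each corpus line is rendered onto a randomly chosen background with a randomly chosen font and a small random rotation. The asset layout under background/ and font/ and all parameter values are assumptions; this is not the repository's data_generator.py.

# Hedged sketch: render a corpus line with a random background, font, and angle (assumed paths).
import glob
import random
from PIL import Image, ImageDraw, ImageFont

def render_line(text, background_dir='background', font_dir='font', height=32):
    """Render one line of text onto a random background with a random font and angle."""
    bg_path = random.choice(glob.glob(background_dir + '/*'))
    font_path = random.choice(glob.glob(font_dir + '/*.ttf'))
    font = ImageFont.truetype(font_path, size=height - 6)
    text_w, text_h = font.getsize(text)                 # Pillow < 10 API
    canvas = Image.open(bg_path).convert('L').resize((text_w + 10, height))
    draw = ImageDraw.Draw(canvas)
    draw.text((5, (height - text_h) // 2), text, fill=0, font=font)
    angle = random.uniform(-2, 2)                       # small random rotation of the whole line
    return canvas.rotate(angle, expand=False, fillcolor=255)

# Usage (hypothetical corpus line):
# img = render_line('广东省深圳市南山区')
# img.save('sample.png')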

Accuracy

The loss is 0.56 and the accuracy is 0.93.
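
It is not stated how the 0.93 figure is measured; a common convention for line-level OCR is exact-match sequence accuracy, sketched below purely for illustration.

# Illustrative only: exact-match sequence accuracy over decoded lines (not necessarily the repo's metric).
def sequence_accuracy(predictions, ground_truths):
    """Fraction of lines whose decoded text matches the label exactly."""
    pairs = list(zip(predictions, ground_truths))
    return sum(p == g for p, g in pairs) / max(len(pairs), 1)

# Example: one of two lines matches exactly, so the result is 0.5.
# sequence_accuracy(['深圳市南山区', '北京市朝阳区'], ['深圳市南山区', '北京市海淀区'])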

Todo List

  1. Separate the model definition from the training code
  2. Recognize fields other than the address (ID number, validity period, etc.)
  3. Add text detection (for ID cards only)
  4. More text augmentation (rotate the text without rotating the image, set text transparency)