
alexschultz / ReadToMe

License: Apache-2.0
No description or website provided.

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to ReadToMe

ScribeBot
A highly scriptable automation system full of cool features. Automate everything with a little bit of Lua.
Stars: ✭ 72 (+41.18%)
Mutual labels:  ocr, tesseract
pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
Stars: ✭ 53 (+3.92%)
Mutual labels:  ocr, tesseract
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (+211.76%)
Mutual labels:  ocr, tesseract
Tesseract4android
Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.
Stars: ✭ 148 (+190.2%)
Mutual labels:  ocr, tesseract
Tesseract
Bindings to Tesseract OCR engine for R
Stars: ✭ 192 (+276.47%)
Mutual labels:  ocr, tesseract
Tesseract Macos
Objective C wrapper for the open source OCR Engine Tesseract (macOS)
Stars: ✭ 154 (+201.96%)
Mutual labels:  ocr, tesseract
Ocr Table
Extract tables from scanned image PDFs using Optical Character Recognition.
Stars: ✭ 165 (+223.53%)
Mutual labels:  ocr, tesseract
Links Detector
📖 👆🏻 Links Detector makes printed links clickable via your smartphone camera. No need to type a link in, just scan and click on it.
Stars: ✭ 106 (+107.84%)
Mutual labels:  ocr, tesseract
Android Ocr
Experimental optical character recognition app
Stars: ✭ 2,177 (+4168.63%)
Mutual labels:  ocr, tesseract
Tesseract Ocr For Php
A wrapper to work with Tesseract OCR inside PHP.
Stars: ✭ 2,247 (+4305.88%)
Mutual labels:  ocr, tesseract
Tesseract Ocr For Windows
Visual Studio projects for Tesseract and dependencies
Stars: ✭ 122 (+139.22%)
Mutual labels:  ocr, tesseract
Image2text
📋 Python wrapper to grab text from images and save as text files using Tesseract Engine
Stars: ✭ 243 (+376.47%)
Mutual labels:  ocr, tesseract
Aadhaar Card Ocr
Extract text information from Aadhaar Card using tesseract-ocr 😎
Stars: ✭ 112 (+119.61%)
Mutual labels:  ocr, tesseract
Ocrtable
Recognize tables and text from scanned images that contain tables.
Stars: ✭ 155 (+203.92%)
Mutual labels:  ocr, tesseract
Tabulo
Table Detection and Extraction Using Deep Learning (It is built in Python, using Luminoth, TensorFlow<2.0 and Sonnet.)
Stars: ✭ 110 (+115.69%)
Mutual labels:  ocr, tesseract
Crnn Mxnet Chinese Text Recognition
An implementation of CRNN (CNN+LSTM+warpCTC) on MXNet for Chinese text recognition
Stars: ✭ 161 (+215.69%)
Mutual labels:  ocr, mxnet
Gosseract
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library
Stars: ✭ 1,622 (+3080.39%)
Mutual labels:  ocr, tesseract
Tesseract
This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.
Stars: ✭ 43,199 (+84603.92%)
Mutual labels:  ocr, tesseract
Swiftytesseract
A Swift wrapper around Tesseract for use in iOS, macOS, and Linux applications
Stars: ✭ 170 (+233.33%)
Mutual labels:  ocr, tesseract
Tessdata fast
Fast integer versions of trained LSTM models
Stars: ✭ 221 (+333.33%)
Mutual labels:  ocr, tesseract

Read To Me

Project submission for AWS DeepLens Challenge



Solution

For this project, I wanted to build an application that could read books to children. To achieve this, I designed a workflow that performs the following steps (a rough end-to-end sketch follows the list).

  • Determine when a page with text is in the camera frame
  • Clean up the image using OpenCV
  • Perform OCR (Optical Character Recognition)
  • Transform text into audio using AWS Polly
  • Play back the audio through speakers plugged into the DeepLens
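Under some simplifying assumptions (pytesseract as the Tesseract binding, AWS credentials already configured, and text-block detection skipped), the whole loop for a single captured image might look roughly like this; the real project splits these steps across the Lambda helpers described below.

# Rough end-to-end sketch for one captured image. Assumptions: pytesseract is
# used as the Tesseract binding, AWS credentials are configured, and the
# output path and voice are placeholders. Text-block detection is omitted here.
import boto3
import cv2
import pytesseract

def page_to_speech(image_path, out_path="page.mp3"):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Simple clean-up; the real pipeline applies more filters (see Architecture).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    text = pytesseract.image_to_string(binary)  # OCR
    if not text.strip():
        return None

    polly = boto3.client("polly")
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId="Joanna")
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())  # audio ready to play through the speakers
    return out_path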

Model Training

My dataset was made from hundreds of photos of my kids' books, as well as a number of library books, taken in various lighting conditions, orientations, and distances. I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.

The model was trained with MXNet, using a VGG-16 model as a base. The steps used for training are outlined in this notebook.
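As a rough illustration of the starting point only (the training itself, including the detection head and data pipeline, is covered in the notebook), a pre-trained VGG-16 base can be pulled from the MXNet/Gluon model zoo:

# Sketch: load an ImageNet-pretrained VGG-16 from the Gluon model zoo.
# The text-block detection head, anchors, and training loop from the notebook
# are not reproduced here.
import mxnet as mx
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()
vgg16 = vision.vgg16(pretrained=True, ctx=ctx)
features = vgg16.features  # trunk a single-class (text block) detector can build on

x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)
print(features(x).shape)   # sanity check on the feature extractor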

Architecture

This project is built using AWS Greengrass, Python, MXNet, OpenCV, Tesseract, and AWS Polly.

In order to get the text area cleaned up enough to perform OCR, the program needs to perform pre-processing on the image using a number of filters in OpenCV. This graphic shows an example of the steps that ReadToMe goes through with each image before trying to turn the image into text.
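The exact filter chain lives in imageProcessing.py; a representative (not identical) OpenCV sequence looks something like this:

import cv2

def preprocess_for_ocr(image_bgr):
    # Representative clean-up before OCR: grayscale, denoise, binarize.
    # The filters actually used by ReadToMe are in imageProcessing.py.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's method picks the binarization threshold automatically, which helps
    # across the varied lighting conditions the books are photographed in.
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary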

(Screenshot: IoT Console)

The Lambda function consists of three main files:

  • readToMeLambda.py
    • Contains main workflow for project (imports imageProcessing.py)
  • imageProcessing.py
    • Contains helper functions used for image and text cleanup
  • speak.py
    • Contains helper functions used to call AWS Polly and synthesize the audio

Because the user has no way to tell the DeepLens when a book is in front of the camera, we use the model to detect blocks of text on the page. When we find a text block, we isolate that region of the image using the getRoi() function inside of imageProcessing.py.
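An illustrative version of that crop (the box format and padding are assumptions; the repo's own implementation is getRoi()):

# Illustrative ROI crop; assumes the detector returns pixel-space
# (xmin, ymin, xmax, ymax) boxes. The repo's own version is getRoi()
# in imageProcessing.py.
def crop_text_block(frame, box, pad=10):
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in box]
    x1, y1 = max(x1 - pad, 0), max(y1 - pad, 0)
    x2, y2 = min(x2 + pad, w), min(y2 + pad, h)
    return frame[y1:y2, x1:x2]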

Another important step is correctSkew() in imageProcessing.py, which warps/rotates the text block to make the text horizontal. If the text is angled or skewed, OCR results suffer.
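A common way to implement this kind of deskew (not necessarily the exact approach correctSkew() takes) is to estimate the dominant angle of the foreground pixels and rotate the block back to horizontal:

# Common deskew technique, shown as an approximation of what correctSkew() does.
import cv2
import numpy as np

def deskew(binary_roi):
    # Foreground pixel coordinates of the thresholded text block.
    coords = np.column_stack(np.where(binary_roi > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # minAreaRect's angle convention varies across OpenCV versions; this
    # normalization follows the classic [-90, 0) convention.
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = binary_roi.shape[:2]
    m = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(binary_roi, m, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)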

Finally, we remove any non-UTF-8 characters after doing OCR, using RemoveNonUtf8BadChars() in imageProcessing.py. This step just attempts to clean up the text before turning it into speech.
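A minimal version of that clean-up (the project's own helper is RemoveNonUtf8BadChars() in imageProcessing.py) could be:

# Minimal clean-up sketch: drop anything that does not survive a round trip
# through UTF-8, so Polly receives clean text.
def remove_non_utf8_chars(ocr_output):
    if isinstance(ocr_output, bytes):
        return ocr_output.decode("utf-8", errors="ignore")
    return ocr_output.encode("utf-8", errors="ignore").decode("utf-8")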

Deploy Project to Device

To run this project on the DeepLens, you will first need to install a few packages from a terminal window on the device.

sudo apt-get update && sudo apt-get install tesseract-ocr && sudo apt-get install python-gi

Also, in order to get the audio to work on DeepLens, I had to perform the following steps:

  1. Log into the DeepLens, then double-click any .mp3 audio file.
  2. A prompt will open asking you to install some required packages, which are needed to play back audio on the device.
  3. You will need to enter an administrator password to proceed.

Instructions for creating a DeepLens project can be found in the online documentation.

The model files for this project are located here: https://github.com/alexschultz/ReadToMe/tree/master/mxnet-model

You will need to tar up the files and put them in S3 when you create the project for the DeepLens. See the official AWS documentation for detailed instructions.
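For example (the bucket name and paths below are placeholders, not the ones used for this project):

# Example only -- bucket name and paths are placeholders.
import tarfile
import boto3

with tarfile.open("readtome-model.tar.gz", "w:gz") as tar:
    tar.add("mxnet-model", arcname=".")  # directory containing the trained model files

boto3.client("s3").upload_file(
    "readtome-model.tar.gz",
    "your-deeplens-bucket",              # placeholder bucket
    "models/readtome-model.tar.gz",
)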

The Lambda function code is contained inside the lambda directory in this repo.

In order to build the Lambda function, you will need to install the project dependencies and package the Lambda up for deployment to AWS. This needs to be done from a Linux machine, as the dependencies are OS-specific.

To install the pip packages, cd into the lambda directory and run the following command:

pip install -r requirements.txt

Once you have the pip packages installed, you need to bundle up the files into a distributable so it can be uploaded to the Lambda console.

I have included a helper Python script to package up the lambda in the format that AWS requires. See the documentation referenced above for more details on how to create a custom DeepLens project.
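The helper script handles this for you; purely as an illustration, the bundling step amounts to zipping the lambda sources together with the installed dependencies:

# Illustration only -- the repo ships its own packaging script.
import os
import zipfile

def package_lambda(src_dir="lambda", out_file="readtome-lambda.zip"):
    # Zip the lambda directory (sources plus pip-installed dependencies) for upload.
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, src_dir))
    return out_file

package_lambda()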

If you have any questions or find any issues with this project, please open an issue. Thanks!
