
alexschultz / ReadToMe

License: Apache-2.0
No description or website provided.

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to ReadToMe

ScribeBot
A highly scriptable automation system full of cool features. Automate everything with a little bit of Lua.
Stars: ✭ 72 (+41.18%)
Mutual labels:  ocr, tesseract
pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
Stars: ✭ 53 (+3.92%)
Mutual labels:  ocr, tesseract
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (+211.76%)
Mutual labels:  ocr, tesseract
Tesseract4android
Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.
Stars: ✭ 148 (+190.2%)
Mutual labels:  ocr, tesseract
Tesseract
Bindings to Tesseract OCR engine for R
Stars: ✭ 192 (+276.47%)
Mutual labels:  ocr, tesseract
Tesseract Macos
Objective C wrapper for the open source OCR Engine Tesseract (macOS)
Stars: ✭ 154 (+201.96%)
Mutual labels:  ocr, tesseract
Ocr Table
Extract tables from scanned image PDFs using Optical Character Recognition.
Stars: ✭ 165 (+223.53%)
Mutual labels:  ocr, tesseract
Links Detector
📖 👆🏻 Links Detector makes printed links clickable via your smartphone camera. No need to type a link in, just scan and click on it.
Stars: ✭ 106 (+107.84%)
Mutual labels:  ocr, tesseract
Android Ocr
Experimental optical character recognition app
Stars: ✭ 2,177 (+4168.63%)
Mutual labels:  ocr, tesseract
Tesseract Ocr For Php
A wrapper to work with Tesseract OCR inside PHP.
Stars: ✭ 2,247 (+4305.88%)
Mutual labels:  ocr, tesseract
Tesseract Ocr For Windows
Visual Studio projects for Tesseract and dependencies
Stars: ✭ 122 (+139.22%)
Mutual labels:  ocr, tesseract
Image2text
📋 Python wrapper to grab text from images and save as text files using Tesseract Engine
Stars: ✭ 243 (+376.47%)
Mutual labels:  ocr, tesseract
Aadhaar Card Ocr
Extract text information from Aadhaar Card using tesseract-ocr 😎
Stars: ✭ 112 (+119.61%)
Mutual labels:  ocr, tesseract
Ocrtable
Recognize tables and text from scanned images that contain tables.
Stars: ✭ 155 (+203.92%)
Mutual labels:  ocr, tesseract
Tabulo
Table Detection and Extraction Using Deep Learning (It is built in Python, using Luminoth, TensorFlow<2.0 and Sonnet.)
Stars: ✭ 110 (+115.69%)
Mutual labels:  ocr, tesseract
Crnn Mxnet Chinese Text Recognition
An implementation of CRNN (CNN+LSTM+warpCTC) on MXNet for Chinese text recognition
Stars: ✭ 161 (+215.69%)
Mutual labels:  ocr, mxnet
Gosseract
Go package for OCR (Optical Character Recognition), by using Tesseract C++ library
Stars: ✭ 1,622 (+3080.39%)
Mutual labels:  ocr, tesseract
Tesseract
This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.
Stars: ✭ 43,199 (+84603.92%)
Mutual labels:  ocr, tesseract
Swiftytesseract
A Swift wrapper around Tesseract for use in iOS, macOS, and Linux applications
Stars: ✭ 170 (+233.33%)
Mutual labels:  ocr, tesseract
Tessdata fast
Fast integer versions of trained LSTM models
Stars: ✭ 221 (+333.33%)
Mutual labels:  ocr, tesseract

Read To Me

Project submission for AWS DeepLens Challenge



Solution

For this project, I wanted to build an application that could read books to children. To achieve this, I designed a workflow that performs the following steps (a rough end-to-end sketch follows the list).

  • Determine when a page with text is in the camera frame
  • Clean up the image using OpenCV
  • Perform OCR (Optical Character Recognition)
  • Transform text into audio using AWS Polly
  • Play back the audio through speakers plugged into the DeepLens
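Under some simplifying assumptions (pytesseract as the Tesseract binding, AWS credentials already configured, and text-block detection skipped), the whole loop for a single captured image might look roughly like this; the real project splits these steps across the Lambda helpers described below.

# Rough end-to-end sketch for one captured image. Assumptions: pytesseract is
# used as the Tesseract binding, AWS credentials are configured, and the
# output path and voice are placeholders. Text-block detection is omitted here.
import boto3
import cv2
import pytesseract

def page_to_speech(image_path, out_path="page.mp3"):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Simple clean-up; the real pipeline applies more filters (see Architecture).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    text = pytesseract.image_to_string(binary)  # OCR
    if not text.strip():
        return None

    polly = boto3.client("polly")
    response = polly.synthesize_speech(Text=text, OutputFormat="mp3", VoiceId="Joanna")
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())  # audio ready to play through the speakers
    return out_path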

Model Training

My dataset was made from hundreds of photos of my kids' books, as well as a number of library books, taken in various lighting conditions, orientations, and distances. I used labelImg to annotate my dataset with bounding boxes so I could train the model to identify Text Blocks on a page.

The model was trained with MXNet, using a VGG-16 model as a base. The steps used for training are outlined in this notebook.
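As a rough illustration of the starting point only (the training itself, including the detection head and data pipeline, is covered in the notebook), a pre-trained VGG-16 base can be pulled from the MXNet/Gluon model zoo:

# Sketch: load an ImageNet-pretrained VGG-16 from the Gluon model zoo.
# The text-block detection head, anchors, and training loop from the notebook
# are not reproduced here.
import mxnet as mx
from mxnet.gluon.model_zoo import vision

ctx = mx.cpu()
vgg16 = vision.vgg16(pretrained=True, ctx=ctx)
features = vgg16.features  # trunk a single-class (text block) detector can build on

x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)
print(features(x).shape)   # sanity check on the feature extractor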

Architecture

This project is built using AWS Greengrass, Python, MXNet, OpenCV, Tesseract, and AWS Polly.

In order to get the text area cleaned up enough to perform OCR, the program needs to perform pre-processing on the image using a number of filters in OpenCV. This graphic shows an example of the steps that ReadToMe goes through with each image before trying to turn the image into text.
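The exact filter chain lives in imageProcessing.py; a representative (not identical) OpenCV sequence looks something like this:

import cv2

def preprocess_for_ocr(image_bgr):
    # Representative clean-up before OCR: grayscale, denoise, binarize.
    # The filters actually used by ReadToMe are in imageProcessing.py.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu's method picks the binarization threshold automatically, which helps
    # across the varied lighting conditions the books are photographed in.
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary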

(Screenshot: IoT Console)

The Lambda function consists of three main files:

  • readToMeLambda.py
    • Contains main workflow for project (imports imageProcessing.py)
  • imageProcessing.py
    • Contains helper functions used for image and text cleanup
  • speak.py
    • Contains helper functions used to call AWS Polly and synthesize the audio

Because the user has no way to tell the DeepLens when a book is in front of the camera, we use the model to detect blocks of text on the page. When we find a text block, we isolate that region of the image using the getRoi() function inside of imageProcessing.py.
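An illustrative version of that crop (the box format and padding are assumptions; the repo's own implementation is getRoi()):

# Illustrative ROI crop; assumes the detector returns pixel-space
# (xmin, ymin, xmax, ymax) boxes. The repo's own version is getRoi()
# in imageProcessing.py.
def crop_text_block(frame, box, pad=10):
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in box]
    x1, y1 = max(x1 - pad, 0), max(y1 - pad, 0)
    x2, y2 = min(x2 + pad, w), min(y2 + pad, h)
    return frame[y1:y2, x1:x2]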

Another important step is correctSkew() in imageProcessing.py, which warps/rotates the text block to make the text horizontal. If the text is angled or skewed, OCR results suffer.
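A common way to implement this kind of deskew (not necessarily the exact approach correctSkew() takes) is to estimate the dominant angle of the foreground pixels and rotate the block back to horizontal:

# Common deskew technique, shown as an approximation of what correctSkew() does.
import cv2
import numpy as np

def deskew(binary_roi):
    # Foreground pixel coordinates of the thresholded text block.
    coords = np.column_stack(np.where(binary_roi > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # minAreaRect's angle convention varies across OpenCV versions; this
    # normalization follows the classic [-90, 0) convention.
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = binary_roi.shape[:2]
    m = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(binary_roi, m, (w, h),
                          flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)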

Finally, we remove any non-UTF-8 characters after doing OCR, using RemoveNonUtf8BadChars() in imageProcessing.py. This step just attempts to clean up the text before turning it into speech.
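A minimal version of that clean-up (the project's own helper is RemoveNonUtf8BadChars() in imageProcessing.py) could be:

# Minimal clean-up sketch: drop anything that does not survive a round trip
# through UTF-8, so Polly receives clean text.
def remove_non_utf8_chars(ocr_output):
    if isinstance(ocr_output, bytes):
        return ocr_output.decode("utf-8", errors="ignore")
    return ocr_output.encode("utf-8", errors="ignore").decode("utf-8")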

Deploy Project to Device

To run this project on the DeepLens, you will first need to install a few packages from a terminal window on the device.

sudo apt-get update && sudo apt-get install tesseract-ocr && sudo apt-get install python-gi

Also, in order to get the audio to work on DeepLens, I had to perform the following steps:

  1. Log into the DeepLens, then double-click any .mp3 audio file.
  2. A prompt will open asking you to install some required packages, which are needed to play back audio on the device.
  3. You will need to enter an administrator password to proceed.

Instructions for creating a DeepLens project can be found in the online documentation.

The model files for this project are located here: https://github.com/alexschultz/ReadToMe/tree/master/mxnet-model

You will need to tar up the files and put them in S3 when you create the project for the DeepLens. See the official AWS documentation for detailed instructions.
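For example (the bucket name and paths below are placeholders, not the ones used for this project):

# Example only -- bucket name and paths are placeholders.
import tarfile
import boto3

with tarfile.open("readtome-model.tar.gz", "w:gz") as tar:
    tar.add("mxnet-model", arcname=".")  # directory containing the trained model files

boto3.client("s3").upload_file(
    "readtome-model.tar.gz",
    "your-deeplens-bucket",              # placeholder bucket
    "models/readtome-model.tar.gz",
)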

The Lambda function code is contained inside the lambda directory in this repo.

In order to build the Lambda function, you will need to install the project dependencies and package the Lambda up for deployment to AWS. This needs to be done from a Linux machine, as the dependencies are OS-specific.

To install the pip packages, cd into the lambda directory and run the following command:

pip install -r requirements.txt

Once you have the pip packages installed, you need to bundle up the files into a distributable so it can be uploaded to the Lambda console.

I have included a helper Python script to package up the lambda in the format that AWS requires. See the documentation referenced above for more details on how to create a custom DeepLens project.
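The helper script handles this for you; purely as an illustration, the bundling step amounts to zipping the lambda sources together with the installed dependencies:

# Illustration only -- the repo ships its own packaging script.
import os
import zipfile

def package_lambda(src_dir="lambda", out_file="readtome-lambda.zip"):
    # Zip the lambda directory (sources plus pip-installed dependencies) for upload.
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, src_dir))
    return out_file

package_lambda()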

If you have any questions or find any issues with this project, please open an issue. Thanks!
