ropensci / Tesseract

Bindings to Tesseract OCR engine for R

Labels

rstats ocr r-package tesseract tesseract-ocr

Projects that are alternatives of or similar to Tesseract

How-to-use-tesseract-ocr-4.0-with-csharp

How to use Tesseract OCR 4.0 with C#

Stars: ✭ 60 (-68.75%)

Mutual labels: ocr, tesseract, tesseract-ocr

Ccextractor

CCExtractor - Official version maintained by the core team

Stars: ✭ 356 (+85.42%)

Mutual labels: ocr, tesseract, tesseract-ocr

Nkocr

🔎📝 This is a module to make specifics OCRs at food products and nutritional tables.

Stars: ✭ 15 (-92.19%)

Mutual labels: ocr, tesseract, tesseract-ocr

Image2text

📋 Python wrapper to grab text from images and save as text files using Tesseract Engine

Stars: ✭ 243 (+26.56%)

Mutual labels: ocr, tesseract, tesseract-ocr

Tesseract

This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition, but also still supports the legacy Tesseract OCR engine of Tesseract 3 which works by recognizing character patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0). It also needs traineddata files which support the legacy engine, for example those from the tessdata repository.

Stars: ✭ 43,199 (+22399.48%)

Mutual labels: ocr, tesseract, tesseract-ocr

TesseractStudio.Net

A free Windows graphical interface to the Tesseract 4.0 OCR engine.

Stars: ✭ 38 (-80.21%)

Mutual labels: ocr, tesseract, tesseract-ocr

breach-protocol-autosolver

Solve breach protocol minigame in second(s). Windows/Linux/GeForce Now/Google Stadia. Every language.

Stars: ✭ 28 (-85.42%)

Mutual labels: ocr, tesseract, tesseract-ocr

React Native Tesseract Ocr

Tesseract OCR wrapper for React Native

Stars: ✭ 384 (+100%)

Mutual labels: ocr, tesseract, tesseract-ocr

Gosseract

Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

Stars: ✭ 1,622 (+744.79%)

Mutual labels: ocr, tesseract, tesseract-ocr

Textshot

Python tool for grabbing text via screenshot

Stars: ✭ 1,163 (+505.73%)

Mutual labels: ocr, tesseract, tesseract-ocr

Tesseract Ocr for windows

Visual Studio Projects for Tesseract and dependencies

Stars: ✭ 122 (-36.46%)

Mutual labels: ocr, tesseract, tesseract-ocr

Aadhaar Card Ocr

Extract text information from Aadhaar Card using tesseract-ocr 😎

Stars: ✭ 112 (-41.67%)

Mutual labels: ocr, tesseract, tesseract-ocr

Tesseract4android

Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.

Stars: ✭ 148 (-22.92%)

Mutual labels: ocr, tesseract, tesseract-ocr

Rentrez

talk with NCBI entrez using R

Stars: ✭ 151 (-21.35%)

Mutual labels: r-package, rstats

Qualtrics

Download ⬇️ Qualtrics survey data directly into R!

Stars: ✭ 151 (-21.35%)

Mutual labels: r-package, rstats

Tesseract Macos

Objective C wrapper for the open source OCR Engine Tesseract (macOS)

Stars: ✭ 154 (-19.79%)

Mutual labels: ocr, tesseract

Tesseract Ocr For Php

A wrapper to work with Tesseract OCR inside PHP.

Stars: ✭ 2,247 (+1070.31%)

Mutual labels: ocr, tesseract

Gender

Predict Gender from Names Using Historical Data

Stars: ✭ 149 (-22.4%)

Mutual labels: r-package, rstats

Ocrtable

Recognize tables and text from scanned images that contain tables. 从包含表格的扫描图片中识别表格和文字

Stars: ✭ 155 (-19.27%)

Mutual labels: ocr, tesseract

Textreuse

Detect text reuse and document similarity

Stars: ✭ 156 (-18.75%)

Mutual labels: r-package, rstats

tesseract

Extract text from an image. Requires that you have training data for the language you are reading. Works best for images with high contrast, little noise and horizontal text.

Hello World

Simple example

# Simple example: load the package, then OCR a test image
library(tesseract)
text <- ocr("https://jeroen.github.io/images/testocr.png")
cat(text)

# Get XML HOCR output
xml <- ocr("https://jeroen.github.io/images/testocr.png", HOCR = TRUE)
cat(xml)

Roundtrip test: render PDF to image and OCR it back to text

# Full roundtrip test: render PDF to image and OCR it back to text
curl::curl_download("https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf", "R-intro.pdf")
orig <- pdftools::pdf_text("R-intro.pdf")[1]

# Render the first page of the PDF to a TIFF image
img_file <- pdftools::pdf_convert("R-intro.pdf", format = 'tiff', pages = 1, dpi = 400)

# Extract text from the rendered image
text <- ocr(img_file)
unlink(img_file)
cat(text)
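
The roundtrip above leaves two strings, orig (text read directly from the PDF) and text (text recovered by OCR). One rough way to compare them is a word-level overlap score. The sketch below is only an illustration: word_overlap() is a hypothetical helper written in base R, not part of the tesseract package, and the two strings are stand-ins for orig and text.

```r
# Hypothetical helper (not part of the tesseract package): fraction of the
# distinct words in `reference` that also appear in `candidate`.
word_overlap <- function(reference, candidate) {
  tokenize <- function(x) {
    # Lower-case, split on anything that is not a letter or digit,
    # then drop empty tokens and duplicates
    words <- unlist(strsplit(tolower(x), "[^[:alnum:]]+"))
    unique(words[nzchar(words)])
  }
  ref_words <- tokenize(reference)
  if (length(ref_words) == 0) return(NA_real_)
  mean(ref_words %in% tokenize(candidate))
}

# Stand-in strings; in the roundtrip above these would be `orig` and `text`.
# One of the four reference words is misread, so the score is 0.75.
word_overlap("An Introduction to R", "An lntroduction to R")
```

A score close to 1 suggests the OCR output contains most of the words in the original. The score ignores word order and repeated words, so it is a sanity check rather than a proper accuracy measure.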

Installation

On Windows and macOS the binary package can be installed from CRAN:

install.packages("tesseract")

Installation from source on Linux or macOS requires the Tesseract library (see below).

Install from source

On Debian or Ubuntu install libtesseract-dev and libleptonica-dev. Also install tesseract-ocr-eng to run examples.

sudo apt-get install -y libtesseract-dev libleptonica-dev tesseract-ocr-eng

On Ubuntu Xenial and Ubuntu Bionic you can use this PPA to get the latest version of Tesseract:

sudo add-apt-repository ppa:cran/tesseract
sudo apt-get install -y libtesseract-dev tesseract-ocr-eng

On Fedora we need tesseract-devel and leptonica-devel

sudo yum install tesseract-devel leptonica-devel

On RHEL and CentOS we need tesseract-devel and leptonica-devel from EPEL

sudo yum install epel-release
sudo yum install tesseract-devel leptonica-devel

On macOS use tesseract from Homebrew:

brew install tesseract

Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR results for other languages you can install the appropriate training data. On Windows and macOS you can do this in R using tesseract_download():

tesseract_download('fra')
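
Once the training data is installed, the language still has to be selected when running OCR. A minimal sketch of that step follows, assuming a platform where tesseract_download() applies. ocr_in_language() is a hypothetical wrapper introduced here for illustration, and "french-page.png" is a placeholder file name.

```r
# Hypothetical wrapper (not part of the tesseract package): OCR an image with
# a given language, downloading the training data first if it is missing.
ocr_in_language <- function(image, lang) {
  # tesseract_info()$available lists the languages with installed training data
  if (!lang %in% tesseract::tesseract_info()$available) {
    tesseract::tesseract_download(lang)
  }
  # Create an engine for the language and pass it to ocr()
  engine <- tesseract::tesseract(language = lang)
  tesseract::ocr(image, engine = engine)
}

# Usage ("french-page.png" is a placeholder file name):
# cat(ocr_in_language("french-page.png", "fra"))
```

The function calls the package through the tesseract:: prefix, so defining it does not require the package to be attached.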

On Linux you need to install the appropriate training data from your distribution. For example, to install the Spanish training data:

tesseract-ocr-spa (Debian, Ubuntu)
tesseract-langpack-spa (Fedora, EPEL)

Alternatively you can manually download training data from GitHub and store it in a path on disk that you pass in the datapath parameter, or set a default path via the TESSDATA_PREFIX environment variable. Note that Tesseract 4 and Tesseract 3 use different training data formats. Make sure to download training data from the branch that matches your libtesseract version.
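
As a rough sketch of the datapath route, the snippet below checks that a manually downloaded file is present before creating an engine from it. The directory "~/tessdata" and the helper has_traineddata() are assumptions made for illustration. The only convention relied on is that training data files are named <lang>.traineddata.

```r
# Hypothetical helper: is <lang>.traineddata present in directory `dir`?
has_traineddata <- function(dir, lang) {
  file.exists(file.path(path.expand(dir), paste0(lang, ".traineddata")))
}

# Placeholder; point this at your own download directory
tessdata_dir <- "~/tessdata"

if (has_traineddata(tessdata_dir, "spa") &&
    requireNamespace("tesseract", quietly = TRUE)) {
  # datapath tells the engine where to look for the training data
  spanish <- tesseract::tesseract(language = "spa", datapath = tessdata_dir)
  # text <- tesseract::ocr("spanish-page.png", engine = spanish)  # placeholder image
} else {
  message("spa.traineddata not found in ", tessdata_dir,
          " (or the tesseract package is not installed)")
}
```

If the file comes from a branch that does not match the installed libtesseract version, creating the engine can still fail even though the file exists.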
