prabhakar267 / Image2text

📋 Python wrapper to grab text from images and save as text files using Tesseract Engine

Image2Text

Image2Text is a Python wrapper that grabs text from images and saves it as text files using Google's Tesseract engine. Tesseract is an optical character recognition (OCR) engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and its development has been sponsored by Google since 2006. In 2006, Tesseract was considered one of the most accurate open-source OCR engines then available.
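
As a rough illustration of what a wrapper like this does, the sketch below shells out to the `tesseract` command-line program, which takes an image path and an output base name and writes the recognized text to `<base>.txt`. The function names `build_tesseract_command` and `image_to_text_file` are hypothetical and are not taken from Image2Text's source; the sketch also assumes `tesseract` is installed and on PATH.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_dir):
    """Return the tesseract CLI invocation for one image (hypothetical helper)."""
    image = Path(image_path)
    # tesseract appends ".txt" to the output base itself, so pass the stem only.
    output_base = Path(output_dir) / image.stem
    return ["tesseract", str(image), str(output_base)]


def image_to_text_file(image_path, output_dir):
    """Run tesseract on one image and return the path of the text file it writes."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_tesseract_command(image_path, output_dir)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True, capture_output=True)
    return Path(command[2] + ".txt")
```

Calling `image_to_text_file("sample/page.png", "output")` would, under these assumptions, leave the recognized text in `output/page.txt`.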

Usage

python main.py -i <input_path> -o <output_path>

usage: main.py [-h] -i INPUT [-o OUTPUT] [-d]

required arguments:
  -i INPUT, --input INPUT       Single image file path or images directory path

optional arguments:
  -o OUTPUT, --output OUTPUT    (Optional) Output directory for converted text
  -d, --debug                   Enable verbose DEBUG logging

Examples:

python main.py -i sample/

or

python main.py -i sample/ -o output/
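
The flags documented above could be parsed along the following lines. This is a minimal sketch, not the project's actual `main.py`: `build_parser`, `collect_images`, and the `IMAGE_EXTENSIONS` set are assumptions made for illustration, and the help layout produced by this parser will differ slightly from the usage text shown above.

```python
import argparse
from pathlib import Path

# Assumed set of extensions; the real project may accept a different list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def build_parser():
    """Build a parser mirroring the documented flags (-i, -o, -d)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-i", "--input", required=True,
                        help="Single image file path or images directory path")
    parser.add_argument("-o", "--output",
                        help="(Optional) Output directory for converted text")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="Enable verbose DEBUG logging")
    return parser


def collect_images(input_path):
    """Return a sorted list of image files for a file or directory input."""
    path = Path(input_path)
    if path.is_dir():
        # Keep only regular files whose extension looks like an image.
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)
    # A single file path is passed through unchanged.
    return [path]
```

Handling both cases in one helper keeps the rest of the program indifferent to whether `-i` pointed at one image or a whole directory.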

Running Tests

python -m unittest
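
Run without arguments, `python -m unittest` performs test discovery from the current directory and picks up files named `test*.py`. The hypothetical test file below shows the shape such a test could take; `output_path_for` is an illustrative helper and not a function from Image2Text.

```python
import unittest
from pathlib import Path


def output_path_for(image_path, output_dir):
    """Map an image path to the .txt path its text would be saved under."""
    return Path(output_dir) / (Path(image_path).stem + ".txt")


class OutputPathTest(unittest.TestCase):
    def test_extension_is_replaced(self):
        result = output_path_for("sample/google.png", "output")
        self.assertEqual(result, Path("output/google.txt"))

    def test_directory_of_input_is_ignored(self):
        result = output_path_for("a/b/c/page.jpg", "out")
        self.assertEqual(result.parent, Path("out"))
```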

Tesseract Installation

Linux (Debian/Ubuntu)

[sudo] apt-get install tesseract-ocr

Windows

  1. Install tesseract-ocr using the UB Mannheim installer: https://github.com/UB-Mannheim/tesseract/wiki
  2. Add the Tesseract-OCR installation directory to the PATH system variable
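
After installation on either platform, one way to confirm that the `tesseract` executable is reachable through PATH is a quick check with the standard library's `shutil.which`. The helper name `find_executable` is made up for this sketch.

```python
import shutil


def find_executable(name="tesseract"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    print("tesseract was not found on PATH; install it or update PATH.")
else:
    print("tesseract found at", location)
```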

Sample Results

Sample Image

(Wikipedia page for Google | Lang : Simple English)

Text output

A man signing in at Google’s main afice, Googleplex.

Google Inc. is an American multinational corporation
that is best known for running one of the largest search
engines on the World Wide Web (WWW). Every day,
200 million (200,000,000) people use it. Google’s main
office (“Googleplex”) is in Mountain View, California,
USA.

With Google Search, people can also search for pictures,
Usenet newsgroups, news, and things to buy online. By
June 2004, Google had 4.28 billion web pages on its
database, 880 million (880,000,000) pictures and 845
million (845,000,000) Usenet messages — six billion
things.

“To google,” as an action word (verb) means “to search
for something on Google”. Because Google is so popular
(more than half of people on the web use it) it has been
used to mean “to search the web”. Google dislikes this
use since the name of the company is a trademark.

As a public company, Google Inc. trades on the
NASDAQ under the tickers GOOG and GOOGL.

In August 2015, Google announced it was being restruc-
tured under a new holding company called Alphabet Inc.

1 History

Google was started in early 1996 by Larry Page and
Sergey Brin, two students at Stanford University, USA.
It used to be called Backrub. Later, they made it into a
company, Google Inc., on September 7, 1998 at a friend’s
garage in Menlo Park, California. In February 1999, the
company moved to 165 University Ave., Palo Alto, Cal-
ifornia. Later that year, it moved to another place, now

called the “Googleplex”.

In September 2001, Google’s rating system (“PageR-
ank”, for saying which information is more helpful) got a
US. Patent. The patent was to Stanford University, with
Lawrence (Larry) Page as the inventor (the person who
first had the idea).

Google makes an important, though shrinking, percent-
age of its money through its friends like America Online
and InterActiveCorp. It has a special group known as the
Partner Solutions Organization (PSO) which helps make
contracts, helps making accounts better, and gives engi-
neering help.

2 How Google makes money

Google makes money by advertising. People or compa-
nies who want people to buy their product, service, or
ideas give Google money, and Google shows an adver-
tisement to people Google thinks will click on the adver-
tisement. Google only gets money when people click on
the link, so it tries to know as much about people as pos-
sible to only show the advertisement to the “right people”.
It does this with Google Analytics, which sends data back
to Google whenever someone visits a web site. From this
and other data, Google makes a profile about the person,
which it then uses to figure out which advertisements to
show.

3 The name “Google”

The name “Google” is a misspelling of the word
g00g01.[7][8] Milton Sirotta, nephew of US. mathemati-
cian Edward Kasner, made this word in 1938, for the
number 1 followed by one hundred zeroes ( 10100 ). It
is said that the word “googol” was chosen as a name for
this number because it sounded like baby talk. Google
uses this word because the company wants to make lots
of stuff on the Web easy to find and use. Andy Bechtol-
sheim first thought of the name.

The name for Google’s main office, the “Googleplex,” is a
play on a different, even bigger number, the "googolpleX",
which is 1 followed by one googol of zeroes.
