
aryaminus / saram

Licence: MIT license
Extract OCR text as .txt files from images or PDFs, processing multiple files from a directory with pytesseract and automatically rotating wrongly oriented input. Available on PyPI.

Programming Languages

python
shell

Projects that are alternatives of or similar to saram

memento
Organize a cluster of meme images within a directory: sort them by the text that Tesseract OCR reads from each meme, and edit memes by segmenting them with OpenCV
Stars: ✭ 70 (+37.25%)
Mutual labels:  ocr, tesseract, pillow, character-recognition, orientation-detection
Nkocr
🔎 This is a module to make specifics OCRs at food products and nutritional tables.
Stars: ✭ 15 (-70.59%)
Mutual labels:  ocr, tesseract, pytesseract
ruzzle-solver
A python script that solves ruzzle boards
Stars: ✭ 46 (-9.8%)
Mutual labels:  ocr, tesseract, pytesseract
Tesseract Python
Examples of implementing OCR (Optical Character Recognition) with Tesseract in Python
Stars: ✭ 49 (-3.92%)
Mutual labels:  ocr, tesseract, pillow
Ocr Table
Extract tables from scanned image PDFs using Optical Character Recognition.
Stars: ✭ 165 (+223.53%)
Mutual labels:  ocr, tesseract
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (+211.76%)
Mutual labels:  ocr, tesseract
Swiftytesseract
A Swift wrapper around Tesseract for use in iOS, macOS, and Linux applications
Stars: ✭ 170 (+233.33%)
Mutual labels:  ocr, tesseract
Tesseract Ocr For Php
A wrapper to work with Tesseract OCR inside PHP.
Stars: ✭ 2,247 (+4305.88%)
Mutual labels:  ocr, tesseract
Android Ocr
Experimental optical character recognition app
Stars: ✭ 2,177 (+4168.63%)
Mutual labels:  ocr, tesseract
Tesseract
Bindings to Tesseract OCR engine for R
Stars: ✭ 192 (+276.47%)
Mutual labels:  ocr, tesseract
MouseTooltipTranslator
Chrome extension: when the mouse hovers over text, it shows a translated tooltip using Google Translate
Stars: ✭ 93 (+82.35%)
Mutual labels:  ocr, tesseract
Ocrtable
Recognize tables and text from scanned images that contain tables.
Stars: ✭ 155 (+203.92%)
Mutual labels:  ocr, tesseract
Tesseract Macos
Objective C wrapper for the open source OCR Engine Tesseract (macOS)
Stars: ✭ 154 (+201.96%)
Mutual labels:  ocr, tesseract
pmOCR
A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR conversion on file activity
Stars: ✭ 53 (+3.92%)
Mutual labels:  ocr, tesseract
Tesseract4android
Fork of tess-two rewritten from scratch to support latest version of Tesseract OCR.
Stars: ✭ 148 (+190.2%)
Mutual labels:  ocr, tesseract
Tesseract Ocr for windows
Visual Studio Projects for Tesseract and dependencies
Stars: ✭ 122 (+139.22%)
Mutual labels:  ocr, tesseract
Image2text
📋 Python wrapper to grab text from images and save as text files using Tesseract Engine
Stars: ✭ 243 (+376.47%)
Mutual labels:  ocr, tesseract
Tesstrain
Train Tesseract LSTM with make
Stars: ✭ 251 (+392.16%)
Mutual labels:  ocr, tesseract
ReadToMe
No description or website provided.
Stars: ✭ 51 (+0%)
Mutual labels:  ocr, tesseract
Tabulo
Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0 and Sonnet)
Stars: ✭ 110 (+115.69%)
Mutual labels:  ocr, tesseract

Saram - Image/PDF OCR detection system

Extract OCR text as .txt files from images or PDFs. Saram processes multiple files from a directory using pytesseract and rotates input that has the wrong orientation before recognition.

Currently in beta state

Follow: Demo run

Saram features

Note: Make sure you have an OCR tool such as tesseract installed, along with its language data for recognition (e.g. tesseract-data-eng). Pillow and Wand, used for image loading and conversion, are fetched during pip install.
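
To illustrate the rotation handling mentioned above, here is a minimal sketch, not Saram's actual code. It assumes pytesseract and Pillow are installed and that Tesseract's orientation data (osd) is available. The function names parse_osd_rotation and ocr_with_auto_rotation are hypothetical, and the sign of the rotation is an assumption worth checking against your own scans.

```python
import re


def parse_osd_rotation(osd_text):
    """Return the 'Rotate' angle in degrees from Tesseract OSD text, or 0 if absent."""
    match = re.search(r"Rotate:\s*(\d+)", osd_text)
    return int(match.group(1)) % 360 if match else 0


def ocr_with_auto_rotation(image_path, lang="eng"):
    """OCR one image, rotating it first if Tesseract reports a wrong orientation."""
    # Imported lazily so the parsing helper above works without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        try:
            angle = parse_osd_rotation(pytesseract.image_to_osd(image))
        except pytesseract.TesseractError:
            # OSD can fail on images with very little text; fall back to no rotation.
            angle = 0
        if angle:
            # Assumption: "Rotate" is the clockwise correction. Pillow rotates
            # counter-clockwise, hence the negative sign.
            image = image.rotate(-angle, expand=True)
        return pytesseract.image_to_string(image, lang=lang)
```

Calling ocr_with_auto_rotation("scan.png") would return the recognized text as a string; "scan.png" is a placeholder file name.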

To use it from Python, refer to the py-module branch.

Installation

Install using PIP:

$ pip install saram
$ saram <dirname>

Or clone the source locally:

$ git clone https://github.com/aryaminus/saram
$ cd saram
$ git checkout py-module
$ python main.py <dirname>
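
Both commands take a directory and handle every supported file in it. The sketch below shows one way such a batch pass could be structured. It is an illustration rather than Saram's implementation: the extension list, the output naming, and the names find_inputs and convert_directory are all assumptions, and extract_text stands in for whatever OCR function you plug in.

```python
from pathlib import Path

# Assumed set of extensions; the description only says "image or pdf".
SUPPORTED = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".pdf"}


def find_inputs(dirname):
    """Return supported files directly inside dirname, sorted by name."""
    return sorted(
        p for p in Path(dirname).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def convert_directory(dirname, extract_text):
    """Write '<file name>.txt' next to each supported file.

    extract_text is any callable taking a Path and returning a string.
    """
    written = []
    for path in find_inputs(dirname):
        # Appending ".txt" (rather than swapping the suffix) keeps scan.png
        # and scan.pdf from overwriting each other's output.
        target = path.with_name(path.name + ".txt")
        target.write_text(extract_text(path), encoding="utf-8")
        written.append(target)
    return written
```

Passing the OCR step in as a callable keeps the directory handling testable without Tesseract installed.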

Todo

  • Add support for PDF by PDF -> Image -> Txt with converted image deletion after processing
  • Double check for orientation in case of image and PDF
  • Make a PIP package
  • Add NLP to process the most frequently repeated characters to filter content
  • Add Cloud Vision support for effective character recognition
  • Support for GUI using tkinter
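
The first item in the list, PDF -> Image -> Txt with the converted images deleted afterwards, could look roughly like the sketch below. It assumes Wand (and therefore ImageMagick with PDF support) for rendering and is not taken from Saram's source. render_pdf_pages and pdf_to_text are hypothetical names, and the 300 dpi resolution is an arbitrary default.

```python
import tempfile
from pathlib import Path


def render_pdf_pages(pdf_path, out_dir, resolution=300):
    """Render each PDF page to a PNG in out_dir using Wand (needs ImageMagick)."""
    from wand.image import Image as WandImage  # lazy import: optional dependency

    pages = []
    with WandImage(filename=str(pdf_path), resolution=resolution) as pdf:
        for index, page in enumerate(pdf.sequence):
            with WandImage(image=page) as single:
                single.format = "png"
                target = Path(out_dir) / f"page-{index:04d}.png"
                single.save(filename=str(target))
                pages.append(target)
    return pages


def pdf_to_text(pdf_path, ocr_image, render=render_pdf_pages):
    """OCR a PDF page by page; the intermediate images are deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        pages = render(pdf_path, tmp)
        texts = [ocr_image(page) for page in pages]
    # Leaving the with-block removes tmp and every page image inside it.
    return "\n\n".join(texts)
```

Using a temporary directory means the page images are cleaned up even if OCR raises partway through, so no separate deletion step is needed.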

Reference

  1. pdf-to-txt
  2. ocr-convert-image-to-text
  3. fix-image-rotation
  4. python-packaging

Contributing

  1. Fork it (https://github.com/aryaminus/saram/fork)
  2. Create your feature branch (git checkout -b feature/fooBar)
  3. Commit your changes (git commit -am 'Add some fooBar')
  4. Push to the branch (git push origin feature/fooBar)
  5. Create a new Pull Request

Enjoy!
