Top 23 text-extraction open source projects

Datashare
Better analyze information, in all its forms
Srt
A simple library for parsing, modifying, and composing SRT files.
Breadability
A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the maintained alternative)
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Php Apache Tika
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Unipdf
Golang PDF library for creating and processing PDF files (pure Go)
Pdfio.jl
PDF reader library written in native Julia.
Tika Python
Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively from Python.
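Tika-Python is a wrapper around Tika's REST server. The sketch below shows the kind of HTTP call such a binding makes. It uses only the standard library, so it is not Tika-Python's own API. It sends the raw document bytes with PUT /tika and asks for plain text through an Accept: text/plain header. The sketch assumes a tika-server is already running locally on its default port 9998. The helper names build_extract_request and extract_text are hypothetical.

```python
import urllib.request

# Assumption: a tika-server instance listening on its default port
TIKA_URL = "http://localhost:9998"


def build_extract_request(data: bytes, base_url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT /tika request that asks the server for plain-text output."""
    return urllib.request.Request(
        url=f"{base_url}/tika",
        data=data,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(data: bytes, base_url: str = TIKA_URL, timeout: float = 30.0) -> str:
    """Send the document bytes to tika-server and return the extracted text."""
    req = build_extract_request(data, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage, assuming the server is up: extract_text(open("report.pdf", "rb").read()). Here report.pdf is a placeholder file name.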
Articleparse
Heuristic text extraction from news sites in Python 3
Image Text Localization Recognition
A general list of resources for image text localization and recognition, collecting papers and implementations on scene text localization and recognition
Unidoc
This repository has moved! https://github.com/unidoc/unipdf
Justext
Heuristic-based boilerplate removal tool
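jusText is documented as classifying each paragraph by its length, link density and stopword density. It then uses the neighbouring paragraphs to decide the borderline cases. The code below is a heavily simplified sketch of that idea and not jusText's implementation or API. The stoplist is a tiny illustrative one, and the threshold values are assumptions meant only to resemble the library's defaults. Link character counts are assumed to be precomputed from the HTML. The context step is also simplified: a near-good paragraph is kept only when a good paragraph sits next to it, and short paragraphs are always dropped.

```python
from dataclasses import dataclass

# Tiny illustrative stoplist; real tools use a full per-language list
STOPWORDS = frozenset(
    "the a an and or of to in is are was were it that this for on with as by at from be".split()
)


@dataclass
class Paragraph:
    text: str
    link_chars: int = 0  # characters inside <a> tags (assumed precomputed)


def classify(p: Paragraph, length_low=70, length_high=200,
             stop_low=0.30, stop_high=0.32, max_link_density=0.2) -> str:
    length = len(p.text)
    # Link-heavy blocks (menus, tag clouds) are treated as boilerplate first
    if length == 0 or p.link_chars / length > max_link_density:
        return "bad"
    words = p.text.lower().split()
    stop_density = sum(w.strip(".,;:!?") in STOPWORDS for w in words) / max(len(words), 1)
    if length < length_low:
        return "short"
    if stop_density >= stop_high:
        return "good" if length > length_high else "near-good"
    if stop_density >= stop_low:
        return "near-good"
    return "bad"


def remove_boilerplate(paragraphs: list[Paragraph]) -> list[str]:
    """Keep good paragraphs, plus near-good ones adjacent to a good one."""
    labels = [classify(p) for p in paragraphs]
    kept = []
    for i, (p, label) in enumerate(zip(paragraphs, labels)):
        neighbours = labels[max(i - 1, 0):i] + labels[i + 1:i + 2]
        if label == "good" or (label == "near-good" and "good" in neighbours):
            kept.append(p.text)
    return kept
```

The stopword signal works because running prose is dense in function words, while menus, footers and link lists are not.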
Nlp
[UNMAINTAINED] Extract values from strings and fill your structs with nlp.
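The Nlp project is a Go library that works with Go structs. The Python sketch below shows only the general idea of pulling values out of a string and filling a typed record with them. It is not that library's API or algorithm. Here a template such as "{name} is {age} years old" is compiled into a regular expression with one named group per field. Each captured value is then converted with the dataclass field's annotated type. The names fill_from_template and Person are hypothetical.

```python
import re
from dataclasses import dataclass, fields
from typing import Optional, Type, TypeVar

T = TypeVar("T")


def fill_from_template(template: str, text: str, cls: Type[T]) -> Optional[T]:
    """Match `text` against a template like '{name} is {age} years old' and
    build `cls` from the captured values, converting each to its field type."""
    # re.split with a capture group alternates: literal, field name, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{p}>.+?)" if i % 2 else re.escape(p) for i, p in enumerate(parts)
    )
    m = re.fullmatch(pattern, text.strip())
    if m is None:
        return None
    captured = m.groupdict()
    # Call each field's type on the captured string (works for str, int, float)
    kwargs = {f.name: f.type(captured[f.name]) for f in fields(cls) if f.name in captured}
    return cls(**kwargs)


@dataclass
class Person:  # hypothetical example record
    name: str
    age: int
```

For example, fill_from_template("{name} is {age} years old", "Alice is 30 years old", Person) yields Person(name="Alice", age=30). Calling a type on a string is naive for types such as bool, where bool("False") is True. A real tool would need explicit converters.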
Pdftools
Text Extraction, Rendering and Converting of PDF Documents
ocr
Simple app to extract text from pictures using Tesseract
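Apps like this one usually call the Tesseract command-line tool. If "stdout" is passed as the output base, Tesseract prints the recognized text to standard output and does not write a .txt file. The -l option selects the language model. The following is a minimal sketch of such a call, not the listed app's code. It assumes the tesseract binary and the requested language data are installed. The helper names are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    # "stdout" as the output base makes Tesseract print text instead of writing a file
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True, text=True, check=True,  # raise if tesseract fails
    )
    return result.stdout
```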
mobi
Python-based software to unpack KindleGen-generated ebooks
pd3f
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based