
SimpleApp / PDFParser

Licence: other
Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser.

Programming Languages

swift
15916 projects
objective c
16641 projects - #2 most used programming language

Projects that are alternatives of or similar to PDFParser

linkedin-pdf-resume-parser
Parse LinkedIn PDF Resume and extract out name, email, education and work experiences.
Stars: ✭ 22 (-12%)
Mutual labels:  pdf-parser
typedesigner
Unified Font Object Editor for macOS
Stars: ✭ 20 (-20%)
Mutual labels:  truetype
bitsnpicas
Bits'N'Picas - Bitmap & Emoji Font Creation & Conversion Tools
Stars: ✭ 171 (+584%)
Mutual labels:  truetype
ttf2hershey
Convert True Type Fonts (.ttf) to Hershey vector fonts
Stars: ✭ 29 (+16%)
Mutual labels:  truetype
sypht-golang-client
A Golang client for the Sypht API
Stars: ✭ 33 (+32%)
Mutual labels:  pdf-parser
ttf-explorer
A simple tool to explore a TrueType font content as a tree
Stars: ✭ 22 (-12%)
Mutual labels:  truetype
content-parser
Content data parser for Ridibooks services
Stars: ✭ 16 (-36%)
Mutual labels:  pdf-parser
scipdf parser
Python PDF parser for scientific publications
Stars: ✭ 76 (+204%)
Mutual labels:  pdf-parser
hpdft
tools to poke pdf using haskell
Stars: ✭ 36 (+44%)
Mutual labels:  pdf-parser
Docotic.Pdf.Samples
C# and VB.NET samples for Docotic.Pdf library
Stars: ✭ 52 (+108%)
Mutual labels:  pdf-parser
InupiaqNumbers
Font for displaying Inupiaq Numerals
Stars: ✭ 27 (+8%)
Mutual labels:  truetype
fonterator
Load fonts as vector graphics in pure Rust with advanced text layout.
Stars: ✭ 34 (+36%)
Mutual labels:  truetype
Php Font Lib
A library to read, parse, export and make subsets of different types of font files.
Stars: ✭ 1,530 (+6020%)
Mutual labels:  truetype
Font Spider
Smart webfont compression and format conversion tool
Stars: ✭ 4,550 (+18100%)
Mutual labels:  truetype
Opentype.js
Read and write OpenType fonts using JavaScript.
Stars: ✭ 3,393 (+13472%)
Mutual labels:  truetype
pixel font
All-in-one tool for creating TrueType outline fonts from bitmap glyph data, purely written in Elixir.
Stars: ✭ 16 (-36%)
Mutual labels:  truetype
pyxpdf
Fast and memory-efficient Python PDF Parser based on xpdf sources
Stars: ✭ 26 (+4%)
Mutual labels:  pdf-parser

PDFParser

A pure Swift library for extracting text information from PDF files, such as text blocks with their coordinates and font information. It also includes a TrueType font parser for computing glyph widths.

Parsing code based on PDFKitten: https://github.com/KurtCode/PDFKitten
TrueType parser based on: http://stevehanov.ca/blog/index.php?id=143
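As a rough illustration of what glyph width computation involves: TrueType fonts store per-glyph advance widths in font design units, and the font header gives the number of design units per em. The width of a run of text at a given font size is the sum of its advance widths, divided by units per em, times the font size. The sketch below is not this library's API. `SketchFontMetrics` and `textWidth` are hypothetical names, and the per-character lookup table is a simplification (a real font maps characters to glyph IDs first).

```swift
// Hypothetical container for the metrics a TrueType parser would extract.
// Names and layout are assumptions for illustration, not PDFParser's types.
struct SketchFontMetrics {
    let unitsPerEm: Int                    // design units per em (from the font header)
    let advanceWidths: [Character: Int]    // advance width per character, in design units
    let defaultAdvance: Int                // fallback for characters not in the table
}

/// Returns the width of `text` in the same unit as `fontSize`.
func textWidth(_ text: String, metrics: SketchFontMetrics, fontSize: Double) -> Double {
    // Sum the advance widths in design units.
    let totalUnits = text.reduce(0) { sum, character in
        sum + (metrics.advanceWidths[character] ?? metrics.defaultAdvance)
    }
    // Scale from design units to the requested font size.
    return Double(totalUnits) / Double(metrics.unitsPerEm) * fontSize
}
```

Because the result scales linearly with `fontSize`, the metrics only need to be parsed once per font and can be reused for every text block that uses it.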

Parsing is kept deliberately simple: the parser returns TextBlocks structs, which custom code can index later. A simple indexer is provided; it assumes a single-column layout and aggregates words.

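// Runs inside a type that acts as the parser's delegate (hence `self`).
var documentIndexer = SimpleDocumentIndexer()
guard let documentPath = Bundle.main.path(forResource: "Kurt the Cat", ofType: "pdf") else {
    fatalError("Kurt the Cat.pdf is missing from the app bundle")
}

// try! crashes if the document cannot be opened; acceptable for a demo only.
let parser = try! Parser(documentURL: URL(fileURLWithPath: documentPath), delegate: self, indexer: documentIndexer)
parser.parse()

// Dump the raw text blocks indexed for page 1.
print("All Text Blocks Raw dump : \n")
print(documentIndexer.pageIndexes[1]!.textBlocks)

// Print the words of page 1 grouped by line.
print("\nWords per lines : \n")
print(documentIndexer.pageIndexes[1]!.allLinesDescription())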
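To give an idea of what indexing under a single-column assumption can look like, here is a minimal sketch that groups text blocks into lines by baseline and then orders each line left to right. It is not the code of SimpleDocumentIndexer. `SketchBlock`, `groupIntoLines`, `lineStrings`, and the `tolerance` parameter are hypothetical, and the sketch assumes PDF-style coordinates with the origin at the bottom-left of the page.

```swift
// Hypothetical stand-in for a parsed text block; field names are assumptions.
struct SketchBlock {
    let text: String
    let x: Double   // left edge of the block
    let y: Double   // baseline; larger values are higher on the page
}

/// Groups blocks whose baselines lie within `tolerance` of each other into lines,
/// ordered top to bottom, with each line ordered left to right.
func groupIntoLines(_ blocks: [SketchBlock], tolerance: Double = 2.0) -> [[SketchBlock]] {
    // Visit blocks from the top of the page downward.
    let sorted = blocks.sorted { $0.y > $1.y }
    var lines: [[SketchBlock]] = []
    for block in sorted {
        // Compare against the first block of the current line to decide membership.
        if let anchor = lines.last?.first, abs(anchor.y - block.y) <= tolerance {
            lines[lines.count - 1].append(block)
        } else {
            lines.append([block])
        }
    }
    // Single-column assumption: reading order within a line is simply left to right.
    return lines.map { line in line.sorted { $0.x < $1.x } }
}

/// Joins the words of each line with single spaces.
func lineStrings(_ blocks: [SketchBlock]) -> [String] {
    return groupIntoLines(blocks).map { line in
        line.map { $0.text }.joined(separator: " ")
    }
}
```

A multi-column page would break this approach, since blocks from different columns that share a baseline would be merged into one line. That is the trade-off of the single-column assumption.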

The ViewController in the DemoApp displays a UILabel for each text block. This lets you check whether the frames returned by the parser are correct.

This code is not ready for production. Use at your own risk. This code is probably way too unoptimized to be used for anything latency-sensitive. It was meant to be easy to understand and correct first and foremost.
