Yuras / Pdf Toolbox

A collection of tools for processing PDF files in Haskell

Projects that are alternatives of or similar to Pdf Toolbox

Documents
A collection of books and documents related to software development, mostly in PDF format. Forks and stars are welcome.
Stars: ✭ 130 (-10.34%)
Mutual labels:  pdf
Cheat Sheets
🌟 All the cheat-sheets mentioned on my blog in pdf format
Stars: ✭ 136 (-6.21%)
Mutual labels:  pdf
Cs Books Pdf
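Programming e-books in PDF: a curated collection of commonly used computing e-books (high quality, with download links), covering Java, Python, Linux, Go, C, C++, data structures and algorithms, AI, computer fundamentals, interviews, design patterns, databases, front-end development, and more.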
Stars: ✭ 140 (-3.45%)
Mutual labels:  pdf
Rapipdf
PDF generation from OpenAPI / Swagger Spec
Stars: ✭ 132 (-8.97%)
Mutual labels:  pdf
Pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Stars: ✭ 2,261 (+1459.31%)
Mutual labels:  pdf
Educative.io Downloader
📖 This tool is to download course from educative.io for offline usage. It uses your login credentials and download the course.
Stars: ✭ 139 (-4.14%)
Mutual labels:  pdf
Pdfcreatorandroid
Simple library to generate and view PDF in Android
Stars: ✭ 128 (-11.72%)
Mutual labels:  pdf
Decktape
PDF exporter for HTML presentations
Stars: ✭ 1,847 (+1173.79%)
Mutual labels:  pdf
Easytable
Small table drawing library built upon Apache PDFBox
Stars: ✭ 136 (-6.21%)
Mutual labels:  pdf
Pdfcropmargins
pdfCropMargins -- a program to crop the margins of PDF files
Stars: ✭ 141 (-2.76%)
Mutual labels:  pdf
Net Core Docx Html To Pdf Converter
.NET Core library to create custom reports based on Word docx or HTML documents and convert to PDF
Stars: ✭ 133 (-8.28%)
Mutual labels:  pdf
Dvisvgm
A fast DVI, EPS, and PDF to SVG converter
Stars: ✭ 134 (-7.59%)
Mutual labels:  pdf
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+1161.38%)
Mutual labels:  pdf
Pdfview Android
Small Android library to show PDF files
Stars: ✭ 132 (-8.97%)
Mutual labels:  pdf
Pdf reports
📕 Python library and CSS theme to generate PDF reports from HTML/Pug
Stars: ✭ 142 (-2.07%)
Mutual labels:  pdf
Markdown Themeable Pdf
ARCHIVED. NOT MAINTAINED. Themeable Markdown Converter (Print to PDF, HTML, JPEG or PNG)
Stars: ✭ 130 (-10.34%)
Mutual labels:  pdf
Pdfinverter
darken (or lighten) a PDF
Stars: ✭ 139 (-4.14%)
Mutual labels:  pdf
Doctron
Docker-powered html convert to pdf(html2pdf), html to image(html2image like jpeg,png),which using chrome(golang) kernel, add watermarks to pdf, convert pdf to images etc.
Stars: ✭ 141 (-2.76%)
Mutual labels:  pdf
Pyecharts Snapshot
renders the output of pyecharts as png, jpeg, gif, svg, eps, pdf and raw base64
Stars: ✭ 142 (-2.07%)
Mutual labels:  pdf
Svglib
Read SVG files and convert them to other formats.
Stars: ✭ 139 (-4.14%)
Mutual labels:  pdf

pdf-toolbox

A collection of tools for processing PDF files

Stable and HEAD

See the "stable" branch for the Hackage version. The current "master" branch is in the middle of an API rewrite; see here for details.

Features

  • Written in Haskell
  • Parsing on demand. You don't need to parse or load into memory the entire PDF file just to extract one image
  • Different levels of abstraction. You can inspect the high-level (catalog, page tree, pages) or low-level (xref, trailer, object) structure of a PDF file. You can even switch between levels of detail on the fly.
  • Extremely fast and memory efficient when you need to inspect only part of the document
  • Reasonably fast and memory efficient in the general case
  • Text extraction with exact glyph positions (mostly works, but still in progress). It can be used, for example, to implement text selection and copying in a PDF viewer
  • Full support of xref streams and object streams
  • Supports editing of PDF files (incremental updates)
  • Basic support for PDF file generation
  • Encrypted PDF documents are partially supported
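
To make the "parsing on demand" idea concrete, here is a minimal, self-contained sketch that does not use pdf-toolbox at all. It relies only on the general PDF file layout, where the file ends with a `startxref` keyword followed by the byte offset of the cross-reference data, so a reader can seek to the end of the file and inspect a small tail instead of loading everything. The names `readTail`, `findStartXRef` and `sampleTail`, and the 1024-byte window, are assumptions made for this illustration; this is not how the library itself is implemented.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS
import Data.Char (isSpace)
import System.Environment (getArgs)
import System.IO

-- | Read at most the last @n@ bytes of a file without loading the rest.
-- (Hypothetical helper for this sketch.)
readTail :: Int -> FilePath -> IO BS.ByteString
readTail n path =
  withBinaryFile path ReadMode $ \h -> do
    size <- hFileSize h
    hSeek h AbsoluteSeek (max 0 (size - fromIntegral n))
    BS.hGet h n

-- | Return the number that follows the last "startxref" keyword, if any.
findStartXRef :: BS.ByteString -> Maybe Int
findStartXRef = go Nothing
  where
    keyword :: BS.ByteString
    keyword = "startxref"

    -- Keep searching after each match so that the last occurrence wins.
    go lastSeen bytes =
      let (_, rest) = BS.breakSubstring keyword bytes
      in if BS.null rest
           then lastSeen >>= parseOffset
           else let after = BS.drop (BS.length keyword) rest
                in go (Just after) after

    -- Skip the whitespace after the keyword, then read the decimal offset.
    parseOffset = fmap fst . BS.readInt . BS.dropWhile isSpace

-- | Made-up tail of a PDF file, used when no path is given.
sampleTail :: BS.ByteString
sampleTail = "trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n416\n%%EOF\n"

main :: IO ()
main = do
  args <- getArgs
  bytes <- case args of
    (path : _) -> readTail 1024 path
    []         -> pure sampleTail
  print (findStartXRef bytes)
```

Run without arguments, the program parses the built-in sample; given a file path, it reads only the last 1024 bytes of that file. Taking the last `startxref` matters for files with incremental updates, where each update appends a new trailer.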

Still on the TODO list

  • Linearized PDF files
  • Content stream tools: extract text, images, etc. (a basic implementation is already included)
  • Higher level API for incremental updates and PDF generation

Examples

(Also see examples and viewer directories)

Inspect high level structure:

import Control.Monad (unless, when)
import Pdf.Document

main =
  withPdfFile "input.pdf" $ \pdf -> do
    -- for encrypted files, try the default user password first
    encrypted <- isEncrypted pdf
    when encrypted $ do
      ok <- setUserPassword pdf defaultUserPassword
      unless ok $
        fail "need password"
    -- walk down from the document to the root of the page tree
    doc <- document pdf
    catalog <- documentCatalog doc
    rootNode <- catalogPageNode catalog
    count <- pageNodeNKids rootNode
    print count
    -- the first page of the document
    page <- pageNodePageByNum rootNode 0
    -- extract text
    txt <- pageExtractText page
    print txt
    ...