All Projects → J-F-Liu → Lopdf

J-F-Liu / Lopdf

Licence: mit
A Rust library for PDF document manipulation.

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to Lopdf

libjail-rs
Rust implementation of a FreeBSD jail library
Stars: ✭ 32 (-95.56%)
Mutual labels:  rust-library
Technical Ebooks
PDFs for programming tutorials.
Stars: ✭ 342 (-52.5%)
Mutual labels:  pdf-document
Ml Note
📙慢慢整理所学的机器学习算法,并根据自己所理解的样子叙述出来。(注重数学推导)
Stars: ✭ 537 (-25.42%)
Mutual labels:  pdf-document
xfg-rs
eXtensible Framegraph
Stars: ✭ 34 (-95.28%)
Mutual labels:  rust-library
Gba
A crate that helps you make GBA games
Stars: ✭ 286 (-60.28%)
Mutual labels:  rust-library
Pdfpig
Read and extract text and other content from PDFs in C# (port of PdfBox)
Stars: ✭ 391 (-45.69%)
Mutual labels:  pdf-document
spherical-cow
A high volume fraction sphere packing library
Stars: ✭ 24 (-96.67%)
Mutual labels:  rust-library
Redbpf
Rust library for building and running BPF/eBPF modules
Stars: ✭ 611 (-15.14%)
Mutual labels:  rust-library
Imageproc
Image processing operations
Stars: ✭ 340 (-52.78%)
Mutual labels:  rust-library
Printable Mockups
Create printable UI mockups & wireframes templates
Stars: ✭ 479 (-33.47%)
Mutual labels:  pdf-document
gui
A generic UI library/framework.
Stars: ✭ 16 (-97.78%)
Mutual labels:  rust-library
Lingua Rs
👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
Stars: ✭ 260 (-63.89%)
Mutual labels:  rust-library
Printpdf
An easy-to-use library for writing PDF in Rust
Stars: ✭ 404 (-43.89%)
Mutual labels:  rust-library
pdf2xml-viewer
A simple viewer and inspection tool for text boxes in PDF documents
Stars: ✭ 82 (-88.61%)
Mutual labels:  pdf-document
Ureq
Minimal request library in rust.
Stars: ✭ 545 (-24.31%)
Mutual labels:  rust-library
poem
A full-featured and easy-to-use web framework with the Rust programming language.
Stars: ✭ 1,167 (+62.08%)
Mutual labels:  rust-library
Juniper
GraphQL server library for Rust
Stars: ✭ 4,187 (+481.53%)
Mutual labels:  rust-library
Quicksilver
A simple framework for 2D games on desktop and web
Stars: ✭ 710 (-1.39%)
Mutual labels:  rust-library
Go Wkhtmltopdf
Golang commandline wrapper for wkhtmltopdf
Stars: ✭ 564 (-21.67%)
Mutual labels:  pdf-document
Tabulizer
Bindings for Tabula PDF Table Extractor Library
Stars: ✭ 413 (-42.64%)
Mutual labels:  pdf-document

lopdf

Crates.io Build Status Docs

A Rust library for PDF document manipulation.

Example Code

  • Create PDF document
#[macro_use]
extern crate lopdf;
use lopdf::{Document, Object, Stream};
use lopdf::content::{Content, Operation};

let mut doc = Document::with_version("1.5");
let pages_id = doc.new_object_id();
let font_id = doc.add_object(dictionary! {
    "Type" => "Font",
    "Subtype" => "Type1",
    "BaseFont" => "Courier",
});
let resources_id = doc.add_object(dictionary! {
    "Font" => dictionary! {
        "F1" => font_id,
    },
});
let content = Content {
    operations: vec![
        Operation::new("BT", vec![]),
        Operation::new("Tf", vec!["F1".into(), 48.into()]),
        Operation::new("Td", vec![100.into(), 600.into()]),
        Operation::new("Tj", vec![Object::string_literal("Hello World!")]),
        Operation::new("ET", vec![]),
    ],
};
let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
let page_id = doc.add_object(dictionary! {
    "Type" => "Page",
    "Parent" => pages_id,
    "Contents" => content_id,
});
let pages = dictionary! {
    "Type" => "Pages",
    "Kids" => vec![page_id.into()],
    "Count" => 1,
    "Resources" => resources_id,
    "MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
};
doc.objects.insert(pages_id, Object::Dictionary(pages));
let catalog_id = doc.add_object(dictionary! {
    "Type" => "Catalog",
    "Pages" => pages_id,
});
doc.trailer.set("Root", catalog_id);
doc.compress();
doc.save("example.pdf").unwrap();
  • Merge PDF documents
#[macro_use]
extern crate lopdf;

use std::collections::BTreeMap;

use lopdf::content::{Content, Operation};
use lopdf::{Document, Object, ObjectId, Stream, BookMark};

pub fn generate_fake_document() -> Document {
    let mut doc = Document::with_version("1.5");
    let pages_id = doc.new_object_id();
    let font_id = doc.add_object(dictionary! {
        "Type" => "Font",
        "Subtype" => "Type1",
        "BaseFont" => "Courier",
    });
    let resources_id = doc.add_object(dictionary! {
        "Font" => dictionary! {
            "F1" => font_id,
        },
    });
    let content = Content {
        operations: vec![
            Operation::new("BT", vec![]),
            Operation::new("Tf", vec!["F1".into(), 48.into()]),
            Operation::new("Td", vec![100.into(), 600.into()]),
            Operation::new("Tj", vec![Object::string_literal("Hello World!")]),
            Operation::new("ET", vec![]),
        ],
    };
    let content_id = doc.add_object(Stream::new(dictionary! {}, content.encode().unwrap()));
    let page_id = doc.add_object(dictionary! {
        "Type" => "Page",
        "Parent" => pages_id,
        "Contents" => content_id,
        "Resources" => resources_id,
        "MediaBox" => vec![0.into(), 0.into(), 595.into(), 842.into()],
    });
    let pages = dictionary! {
        "Type" => "Pages",
        "Kids" => vec![page_id.into()],
        "Count" => 1,
    };
    doc.objects.insert(pages_id, Object::Dictionary(pages));
    let catalog_id = doc.add_object(dictionary! {
        "Type" => "Catalog",
        "Pages" => pages_id,
    });
    doc.trailer.set("Root", catalog_id);

    doc
}

fn main() {
    // Generate a stack of Documents to merge
    let documents = vec![
        generate_fake_document(),
        generate_fake_document(),
        generate_fake_document(),
        generate_fake_document(),
    ];

    // Define a starting max_id (will be used as start index for object_ids)
    let mut max_id = 1;
    let mut pagenum = 1;
    // Collect all Documents Objects grouped by a map
    let mut documents_pages = BTreeMap::new();
    let mut documents_objects = BTreeMap::new();
    let mut document = Document::with_version("1.5");

    for mut doc in documents {
        let mut first = false;
        doc.renumber_objects_with(max_id);

        max_id = doc.max_id + 1;

        documents_pages.extend(
            doc
                    .get_pages()
                    .into_iter()
                    .map(|(_, object_id)| {
                        if !first {
                            let bookmark = BookMark::new(String::from(format!("Page_{}", pagenum)), [0.0, 0.0, 1.0], 0, object_id);
                            document.add_bookmark(bookmark, None);
                            first = true;
                            pagenum += 1;
                        }

                        (
                            object_id,
                            doc.get_object(object_id).unwrap().to_owned(),
                        )
                    })
                    .collect::<BTreeMap<ObjectId, Object>>(),
        );
        documents_objects.extend(doc.objects);
    }

    // Catalog and Pages are mandatory
    let mut catalog_object: Option<(ObjectId, Object)> = None;
    let mut pages_object: Option<(ObjectId, Object)> = None;

    // Process all objects except "Page" type
    for (object_id, object) in documents_objects.iter() {
        // We have to ignore "Page" (as are processed later), "Outlines" and "Outline" objects
        // All other objects should be collected and inserted into the main Document
        match object.type_name().unwrap_or("") {
            "Catalog" => {
                // Collect a first "Catalog" object and use it for the future "Pages"
                catalog_object = Some((
                    if let Some((id, _)) = catalog_object {
                        id
                    } else {
                        *object_id
                    },
                    object.clone(),
                ));
            }
            "Pages" => {
                // Collect and update a first "Pages" object and use it for the future "Catalog"
                // We have also to merge all dictionaries of the old and the new "Pages" object
                if let Ok(dictionary) = object.as_dict() {
                    let mut dictionary = dictionary.clone();
                    if let Some((_, ref object)) = pages_object {
                        if let Ok(old_dictionary) = object.as_dict() {
                            dictionary.extend(old_dictionary);
                        }
                    }

                    pages_object = Some((
                        if let Some((id, _)) = pages_object {
                            id
                        } else {
                            *object_id
                        },
                        Object::Dictionary(dictionary),
                    ));
                }
            }
            "Page" => {}     // Ignored, processed later and separately
            "Outlines" => {} // Ignored, not supported yet
            "Outline" => {}  // Ignored, not supported yet
            _ => {
                document.objects.insert(*object_id, object.clone());
            }
        }
    }

    // If no "Pages" found abort
    if pages_object.is_none() {
        println!("Pages root not found.");

        return;
    }

    // Iter over all "Page" and collect with the parent "Pages" created before
    for (object_id, object) in documents_pages.iter() {
        if let Ok(dictionary) = object.as_dict() {
            let mut dictionary = dictionary.clone();
            dictionary.set("Parent", pages_object.as_ref().unwrap().0);

            document
                    .objects
                    .insert(*object_id, Object::Dictionary(dictionary));
        }
    }

    // If no "Catalog" found abort
    if catalog_object.is_none() {
        println!("Catalog root not found.");

        return;
    }

    let catalog_object = catalog_object.unwrap();
    let pages_object = pages_object.unwrap();

    // Build a new "Pages" with updated fields
    if let Ok(dictionary) = pages_object.1.as_dict() {
        let mut dictionary = dictionary.clone();

        // Set new pages count
        dictionary.set("Count", documents_pages.len() as u32);

        // Set new "Kids" list (collected from documents pages) for "Pages"
        dictionary.set(
            "Kids",
            documents_pages
                    .into_iter()
                    .map(|(object_id, _)| Object::Reference(object_id))
                    .collect::<Vec<_>>(),
        );

        document
                .objects
                .insert(pages_object.0, Object::Dictionary(dictionary));
    }

    // Build a new "Catalog" with updated fields
    if let Ok(dictionary) = catalog_object.1.as_dict() {
        let mut dictionary = dictionary.clone();
        dictionary.set("Pages", pages_object.0);
        dictionary.remove(b"Outlines"); // Outlines not supported in merged PDFs

        document
                .objects
                .insert(catalog_object.0, Object::Dictionary(dictionary));
    }

    document.trailer.set("Root", catalog_object.0);

    // Update the max internal ID as wasn't updated before due to direct objects insertion
    document.max_id = document.objects.len() as u32;

    // Reorder all new Document objects
    document.renumber_objects();

     //Set any Bookmarks to the First child if they are not set to a page
    document.adjust_zero_pages();

    //Set all bookmarks to the PDF Object tree then set the Outlines to the Bookmark content map.
    if let Some(n) = document.build_outline() {
        if let Ok(x) = document.get_object_mut(catalog_object.0) {
            if let Object::Dictionary(ref mut dict) = x {
                dict.set("Outlines", Object::Reference(n));
            }
        }
    }

    document.compress();

    // Save the merged PDF
    document.save("merged.pdf").unwrap();
}
  • Modify PDF document
let mut doc = Document::load("example.pdf")?;
doc.version = "1.4".to_string();
doc.replace_text(1, "Hello World!", "Modified text!");
doc.save("modified.pdf")?;

FAQ

  • Why keeping everything in memory as high-level objects until finally serializing the entire document?

    Normally a PDF document won't be very large, ranging form tens of KB to hundreds of MB. Memory size is not a bottle neck for today's computer. By keep the whole document in memory, stream length can be pre-calculated, no need to use a reference object for the Length entry, the resulting PDF file is smaller for distribution and faster for PDF consumers to process.

    Producing is a one-time effort, while consuming is many more.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].