jonmagic / Grim

License: MIT
Tool for extracting pages from pdf as images and text as strings.

Programming Languages

ruby

Projects that are alternatives of or similar to Grim

Libvips
A fast image processing library with low memory needs.
Stars: ✭ 6,094 (+2829.81%)
Mutual labels:  pdf, imagemagick
Govips
A lightning fast image processing and resizing library for Go
Stars: ✭ 442 (+112.5%)
Mutual labels:  pdf, imagemagick
Node Webkitgtk
webkitgtk bindings for 🚀 Node.js
Stars: ✭ 185 (-11.06%)
Mutual labels:  pdf
Pdfgen
Simple C PDF Writer/Generation library
Stars: ✭ 200 (-3.85%)
Mutual labels:  pdf
Pdfcpu
A PDF processor written in Go.
Stars: ✭ 2,852 (+1271.15%)
Mutual labels:  pdf
Awesome Cv
📄 Awesome CV is LaTeX template for your outstanding job application
Stars: ✭ 14,957 (+7090.87%)
Mutual labels:  pdf
Pdfxkit
A drop-in replacement for Apple PDFKit powered by our PSPDFKit framework under the hood.
Stars: ✭ 195 (-6.25%)
Mutual labels:  pdf
Clawpdf
Open Source virtual PDF printer for Windows // Print to PDF, PDF/A, PDF/X, PNG, JPEG, TIF and text
Stars: ✭ 183 (-12.02%)
Mutual labels:  pdf
Fe Books
📖 📖 前端书籍pdf整理
Stars: ✭ 206 (-0.96%)
Mutual labels:  pdf
Html To Pdfmake
This module permits to convert HTML to the PDFMake format
Stars: ✭ 190 (-8.65%)
Mutual labels:  pdf
Ibm Z Zos
The helpful and handy location for finding and sharing z/OS files, which are not included in the product.
Stars: ✭ 198 (-4.81%)
Mutual labels:  pdf
P2.
📄 p2. - Simple and secure PDF to PNG server.
Stars: ✭ 191 (-8.17%)
Mutual labels:  pdf
Algorithms
This repository is for learning and understanding how algorithms work.
Stars: ✭ 189 (-9.13%)
Mutual labels:  pdf
Paper
Hassle-free HTML to PDF conversion abstraction library.
Stars: ✭ 196 (-5.77%)
Mutual labels:  pdf
Elispcheatsheet
Quick reference to the core language of Emacs ---Editor MACroS.
Stars: ✭ 186 (-10.58%)
Mutual labels:  pdf
Googliser
a fast BASH multiple-image downloader
Stars: ✭ 202 (-2.88%)
Mutual labels:  imagemagick
Dspdfviewer
Dual-Screen PDF Viewer for latex-beamer
Stars: ✭ 184 (-11.54%)
Mutual labels:  pdf
Androidpdf
Stars: ✭ 190 (-8.65%)
Mutual labels:  pdf
Markdown Pdf
📄 Markdown to PDF converter
Stars: ✭ 2,365 (+1037.02%)
Mutual labels:  pdf
Squid
A Ruby library to plot charts in PDF files
Stars: ✭ 205 (-1.44%)
Mutual labels:  pdf
                    ,____
                    |---.\
            ___     |    `
           / .-\  ./=)
          |  |"|_/\/|
          ;  |-;| /_|
         / \_| |/ \ |
        /      \/\( |
        |   /  |` ) |
        /   \ _/    |
       /--._/  \    |
       `/|)    |    /
         /     |   |
       .'      |   |
      /         \  |
     (_.-.__.__./  /

Grim

Grim is a simple gem for extracting (reaping) a page from a PDF and converting it to an image, as well as extracting the text from the page as a string. It gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext tailored to this use case.

Prerequisites

You will need Ghostscript, ImageMagick, and xpdf (which provides pdftotext) installed. On macOS, I highly recommend using Homebrew to install them.

$ brew install ghostscript imagemagick xpdf
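
To confirm the prerequisites are reachable before using Grim, a quick check along these lines may help. This is a minimal sketch, not part of Grim: `find_executable` and `missing_tools` are hypothetical helper names, and the executable names `gs`, `convert`, and `pdftotext` are assumptions taken from the paths and tools mentioned in this README (on Windows the names carry an `.exe` suffix, and some ImageMagick installations ship `magick` instead of `convert`).

```ruby
# Look for an executable called `name` in the directories listed in `path_env`.
# Returns the full path of the first match, or nil when nothing is found.
def find_executable(name, path_env = ENV.fetch("PATH", ""))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the names from `tools` that could not be found in `path_env`.
def missing_tools(tools = %w[gs convert pdftotext], path_env = ENV.fetch("PATH", ""))
  tools.reject { |tool| find_executable(tool, path_env) }
end

missing = missing_tools
if missing.empty?
  puts "All prerequisites found."
else
  puts "Missing from PATH: #{missing.join(', ')}"
end
```

If something is reported missing, either install it or point Grim at an explicit path using the processor options described under Usage.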

Installation

$ gem install grim

Usage

pdf   = Grim.reap("/path/to/pdf")         # returns Grim::Pdf instance for pdf
count = pdf.count                         # returns the number of pages in the pdf
png   = pdf[3].save('/path/to/image.png') # will return true if page was saved or false if not
text  = pdf[3].text                       # returns text as a String

pdf.each do |page|
  puts page.text
end
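
Putting those calls together, here is a minimal sketch for saving every page of a PDF as a PNG. `save_all_pages` is a hypothetical helper, not part of Grim; it relies only on `count`, `[]`, and `save` as shown above, and it assumes page indexes start at 0 (as `pdf[0]` elsewhere in this README suggests).

```ruby
# Saves every page of `pdf` as a PNG inside `out_dir`.
# `pdf` is expected to behave like the object returned by Grim.reap:
# it responds to `count` and `[]`, and each page responds to `save`.
# Returns the indexes of the pages whose `save` call returned false.
def save_all_pages(pdf, out_dir, basename = "page")
  failed = []
  pdf.count.times do |index|
    # File names are numbered from 1: page-001.png, page-002.png, ...
    path = File.join(out_dir, format("%s-%03d.png", basename, index + 1))
    failed << index unless pdf[index].save(path)
  end
  failed
end

# With the gem installed, usage would look something like:
#   require "grim"
#   pdf    = Grim.reap("/path/to/pdf")
#   failed = save_all_pages(pdf, "/path/to/output")
#   warn "Could not save pages: #{failed.inspect}" unless failed.empty?
```

Collecting the failed indexes instead of stopping at the first `false` lets you retry just those pages later, for example with different save options.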

We also support using other processors. The default processor uses whichever versions of ImageMagick and Ghostscript are found on your PATH.

# specifying one processor with specific ImageMagick and Ghostscript paths
Grim.processor = Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/convert", :ghostscript_path => "/path/to/gs"})

# multiple processors with fallback if first fails, useful if you need multiple versions of convert/gs
Grim.processor = Grim::MultiProcessor.new([
  Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.7/convert", :ghostscript_path => "/path/to/9.04/gs"}),
  Grim::ImageMagickProcessor.new({:imagemagick_path => "/path/to/6.6/convert", :ghostscript_path => "/path/to/9.02/gs"})
])

pdf = Grim.reap('/path/to/pdf')
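
The fallback idea behind a multi-processor setup can be sketched in plain Ruby. This is only an illustration of the pattern, not Grim's actual implementation: `save_with_fallback` is a hypothetical helper, and treating a raised error as a failure is a choice made for this sketch.

```ruby
# Try each processor in order and stop at the first one whose `save`
# returns a truthy value. Returns true on success, false if all fail.
# A processor that raises StandardError is counted as a failure here.
def save_with_fallback(processors, *args)
  processors.any? do |processor|
    begin
      processor.save(*args)
    rescue StandardError
      false
    end
  end
end
```

Because `any?` stops at the first truthy result, later processors are only invoked when the earlier ones fail, which is the behaviour you want when the first convert/gs pair handles most documents.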

You can even specify a Windows executable ⚡️

# specifying another ghostscript executable, win64 in this example
# the ghostscript/bin folder still has to be in the PATH for this to work
Grim.processor = Grim::ImageMagickProcessor.new({:ghostscript_path => "gswin64c.exe"})

pdf = Grim.reap('/path/to/pdf')

Grim::ImageMagickProcessor#save supports several options as well:

pdf = Grim.reap("/path/to/pdf")
pdf[0].save('/path/to/image.png', {
  :width => 600,         # defaults to 1024
  :density => 72,        # defaults to 300
  :quality => 60,        # defaults to 90
  :colorspace => "CMYK", # defaults to "RGB"
  :alpha => "Activate"   # not used when not set
})
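
If you save with the same settings in several places, it can help to build the options hash once. The sketch below is an assumption-laden convenience, not part of Grim: `documented_defaults` simply mirrors the default values stated in the comments above so the effective settings can be inspected, and `save_options` is a hypothetical helper that layers overrides on top of them.

```ruby
# Default values as stated in the README comments above.
# They are repeated here only so the merged result can be inspected.
def documented_defaults
  { width: 1024, density: 300, quality: 90, colorspace: "RGB" }
end

# Returns the documented defaults with `overrides` applied on top.
def save_options(overrides = {})
  documented_defaults.merge(overrides)
end

# A small, low-resolution preset for thumbnails.
thumbnail = save_options({ width: 200, density: 72, quality: 60 })
puts thumbnail.inspect

# With the gem installed, the preset could be passed straight to save:
#   pdf = Grim.reap("/path/to/pdf")
#   pdf[0].save("/path/to/thumbnail.png", thumbnail)
```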

Grim has limited logging support. The default logger is Grim::NullLogger, but you can also set your own logger.

require "logger"
Grim.logger = Logger.new($stdout).tap { |logger| logger.progname = 'grim' }
Grim.processor = Grim::ImageMagickProcessor.new({:ghostscript_path => "/path/to/bin/gs"})
pdf = Grim.reap("/path/to/pdf")
pdf[3].save('/path/to/image.png')
# D, [2016-06-09T22:43:07.046532 #69344] DEBUG -- grim: Running imagemagick command
# D, [2016-06-09T22:43:07.046626 #69344] DEBUG -- grim: PATH=/path/to/bin:/usr/local/bin:/usr/bin
# D, [2016-06-09T22:43:07.046787 #69344] DEBUG -- grim: convert -resize 1024 -antialias -render -quality 90 -colorspace RGB -interlace none -density 300 /path/to/pdf /path/to/image.png
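
Since the lines above are logged at DEBUG level, a standard Ruby `Logger` with a higher level would filter them out. The following is a minimal sketch using only Ruby's standard library; `build_grim_logger` is a hypothetical helper name, and assigning its result to `Grim.logger` is shown in a comment so the block runs without the gem.

```ruby
require "logger"

# Builds a logger suitable for Grim.logger.
# With verbose: true, DEBUG lines like the ones shown above are written.
# With verbose: false, the level is raised to INFO, so DEBUG lines are dropped.
def build_grim_logger(io = $stdout, verbose: true)
  logger = Logger.new(io)
  logger.progname = "grim"
  logger.level = verbose ? Logger::DEBUG : Logger::INFO
  logger
end

# With the gem loaded, usage would look something like:
#   Grim.logger = build_grim_logger($stderr, verbose: false)
```

Passing a file path or an open `File` instead of `$stdout` works the same way, because `Logger.new` accepts either.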

Reference

Contributors

License

See LICENSE for details.
