spatie / Pdf To Text

License: MIT
Extract text from a pdf

Projects that are alternatives of or similar to Pdf To Text

Email To Pdf Converter
Converts email files (eml, msg) to pdf
Stars: ✭ 110 (-76.19%)
Mutual labels:  pdf, pdf-converter
Pdfcropmargins
pdfCropMargins -- a program to crop the margins of PDF files
Stars: ✭ 141 (-69.48%)
Mutual labels:  pdf, pdf-converter
Ptext Release
pText is a library for reading, creating and manipulating PDF files in python.
Stars: ✭ 124 (-73.16%)
Mutual labels:  pdf, pdf-converter
Gotenberg Php Client
PHP client for the Gotenberg API
Stars: ✭ 80 (-82.68%)
Mutual labels:  pdf, pdf-converter
Pdf Flipbook
Browse PDF document like a book turning its pages
Stars: ✭ 279 (-39.61%)
Mutual labels:  pdf, pdf-converter
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-79.65%)
Mutual labels:  pdf, pdf-converter
Net Core Docx Html To Pdf Converter
.NET Core library to create custom reports based on Word docx or HTML documents and convert to PDF
Stars: ✭ 133 (-71.21%)
Mutual labels:  pdf, pdf-converter
Mybox
Easy tools of document, image, file, network, location, color, and media.
Stars: ✭ 45 (-90.26%)
Mutual labels:  pdf, text
node-poppler
Asynchronous node.js wrapper for the Poppler PDF rendering library
Stars: ✭ 97 (-79%)
Mutual labels:  text, pdf-converter
Stapler
A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk
Stars: ✭ 238 (-48.48%)
Mutual labels:  pdf, pdf-converter
Laravel Pdf
A Simple package for easily generating PDF documents from HTML. This package is specially for laravel but you can use this without laravel.
Stars: ✭ 79 (-82.9%)
Mutual labels:  pdf, pdf-converter
Node Html Pdf
📄 Html to pdf converter in nodejs. It spawns a phantomjs process and passes the pdf as buffer or as filename.
Stars: ✭ 3,364 (+628.14%)
Mutual labels:  pdf, pdf-converter
Automator
Various Automator and AppleScript workflow and scripts for simplifying life
Stars: ✭ 68 (-85.28%)
Mutual labels:  pdf, text
Pdflayouttextstripper
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
Stars: ✭ 1,369 (+196.32%)
Mutual labels:  pdf, text
Pdfsave
Convert websites into readable PDFs
Stars: ✭ 46 (-90.04%)
Mutual labels:  pdf, pdf-converter
Docnet
DocNET is a fast PDF editing and reading library for modern .NET applications
Stars: ✭ 128 (-72.29%)
Mutual labels:  pdf, pdf-converter
Booktype
Booktype is a free, open source platform that produces beautiful, engaging books formatted for print, Amazon, iBooks and almost any ereader within minutes.
Stars: ✭ 810 (+75.32%)
Mutual labels:  pdf, pdf-converter
Gotenberg Go Client
Go client for the Gotenberg API
Stars: ✭ 35 (-92.42%)
Mutual labels:  pdf, pdf-converter
Doctron
Docker-powered HTML to PDF (html2pdf) and HTML to image (html2image, e.g. JPEG, PNG) conversion using the Chrome kernel (Golang); also adds watermarks to PDFs, converts PDFs to images, etc.
Stars: ✭ 141 (-69.48%)
Mutual labels:  pdf, pdf-converter
Python Automation Scripts
Simple yet powerful automation stuffs.
Stars: ✭ 292 (-36.8%)
Mutual labels:  pdf, pdf-converter

Extract text from a pdf

This package provides a class to extract text from a pdf.

use Spatie\PdfToText\Pdf;

echo Pdf::getText('book.pdf'); // prints the text extracted from the pdf

Spatie is a webdesign agency based in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.

Support us

We invest a lot of resources into creating best-in-class open source packages. You can support us by buying one of our paid products.

We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on our contact page. We publish all received postcards on our virtual postcard wall.

Requirements

Behind the scenes this package leverages pdftotext. You can verify whether the binary is installed on your system by issuing this command:

which pdftotext

If it is installed it will return the path to the binary.

To install the binary you can use this command on Ubuntu or Debian:

apt-get install poppler-utils

On a Mac you can install the binary using Homebrew:

brew install poppler

If you're on RedHat or CentOS use this:

yum install poppler-utils

Installation

You can install the package via composer:

composer require spatie/pdf-to-text

Usage

Extracting text from a pdf is easy.

$text = (new Pdf())
    ->setPdf('book.pdf')
    ->text();

Or easier:

echo Pdf::getText('book.pdf');

By default the package assumes that the pdftotext command is located at /usr/bin/pdftotext. If it is located elsewhere, pass its path to the constructor:

$text = (new Pdf('/custom/path/to/pdftotext'))
    ->setPdf('book.pdf')
    ->text();

or as the second parameter to the getText static method:

echo Pdf::getText('book.pdf', '/custom/path/to/pdftotext');
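
If the location of the binary differs between machines, the path can be looked up at runtime instead of being hard-coded. The following is a minimal sketch and not part of the package: findBinary() is a hypothetical helper, and the list of candidate directories is an assumption made for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of spatie/pdf-to-text): return the first
 * executable file named $name found in $directories, or null if none exists.
 *
 * @param string[] $directories Directories to search, in order of preference.
 */
function findBinary(string $name, array $directories): ?string
{
    foreach ($directories as $directory) {
        $candidate = rtrim($directory, '/') . '/' . $name;

        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Assumed search order: the package's default directory first, then some
// common install locations, then every directory listed in PATH.
$directories = array_merge(
    ['/usr/bin', '/usr/local/bin', '/opt/homebrew/bin'],
    array_filter(explode(PATH_SEPARATOR, (string) getenv('PATH')))
);

$binary = findBinary('pdftotext', $directories);

echo $binary ?? 'pdftotext was not found', PHP_EOL;
```

When a path is found, it can be passed to the constructor or to getText in the same way as '/custom/path/to/pdftotext' in the examples above.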

Sometimes you may want to use pdftotext options. To do so you can set them up using the setOptions method.

$text = (new Pdf())
    ->setPdf('table.pdf')
    ->setOptions(['layout', 'r 96'])
    ->text()
;

or as the third parameter to the getText static method:

echo Pdf::getText('book.pdf', null, ['layout', 'opw myP1$$Word']);
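
The option strings in these examples read like pdftotext flags written without the leading dash: 'layout' corresponds to -layout and 'r 96' to -r 96. The sketch below illustrates that mapping under this assumption. toCommandLineArguments() is a hypothetical function written for illustration; it is not the package's internal implementation.

```php
<?php

/**
 * Hypothetical illustration: turn option strings such as 'layout' or 'r 96'
 * into command line arguments such as ['-layout', '-r', '96'].
 *
 * @param string[] $options
 * @return string[]
 */
function toCommandLineArguments(array $options): array
{
    $arguments = [];

    foreach ($options as $option) {
        $option = trim($option);

        // Skip empty entries so they do not produce a lone dash.
        if ($option === '') {
            continue;
        }

        // Split the flag name from its value at the first space only,
        // so that a value containing spaces stays in one piece.
        $parts = explode(' ', $option, 2);

        // Add the leading dash unless the caller already supplied it.
        $arguments[] = $parts[0][0] === '-' ? $parts[0] : '-' . $parts[0];

        if (isset($parts[1])) {
            $arguments[] = $parts[1];
        }
    }

    return $arguments;
}

print_r(toCommandLineArguments(['layout', 'r 96']));
```

Keeping each flag and its value as separate array elements means a value such as the password in 'opw myP1$$Word' needs no shell quoting, provided the arguments are handed to the process as an array rather than as one command string.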

Please note that successive calls to setOptions() will overwrite options passed in during previous calls.

If you need to make multiple calls to add options (for example if you need to pass in default options when creating the Pdf object from a container, and then add context-specific options elsewhere), you can use the addOptions() method:

$text = (new Pdf())
    ->setPdf('table.pdf')
    ->setOptions(['layout', 'r 96'])
    ->addOptions(['f 1'])
    ->text()
;
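
The difference between overwriting and appending can be seen in isolation with a small stand-in. This is only a sketch: OptionList is a hypothetical class that mimics the behaviour described above, not the package's Pdf class, and it extracts no text.

```php
<?php

/**
 * Hypothetical stand-in for the option handling described above;
 * it is not the package's Pdf class.
 */
final class OptionList
{
    /** @var string[] */
    private array $options = [];

    /** Replace every previously stored option. */
    public function setOptions(array $options): self
    {
        $this->options = array_values($options);

        return $this;
    }

    /** Append to the options that are already stored. */
    public function addOptions(array $options): self
    {
        $this->options = array_merge($this->options, array_values($options));

        return $this;
    }

    /** @return string[] */
    public function all(): array
    {
        return $this->options;
    }
}

$overwritten = (new OptionList())
    ->setOptions(['layout', 'r 96'])
    ->setOptions(['f 1'])
    ->all(); // ['f 1']: the second setOptions() call discarded the first list

$combined = (new OptionList())
    ->setOptions(['layout', 'r 96'])
    ->addOptions(['f 1'])
    ->all(); // ['layout', 'r 96', 'f 1']

var_dump($overwritten === ['f 1'], $combined === ['layout', 'r 96', 'f 1']);
```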

Change log

Please see CHANGELOG for more information about what has changed recently.

Testing

composer test

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security related issues, please email [email protected] instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.
