apache / Pdfbox

Licence: apache-2.0
Mirror of Apache PDFBox

Projects that are alternatives of or similar to Pdfbox

Poi
Mirror of Apache POI
Stars: ✭ 1,136 (-17.92%)
Mutual labels:  content, library
Ahk Rare
My collection of rare and maybe very useful functions
Stars: ✭ 101 (-92.7%)
Mutual labels:  library
Gowebdav
A golang WebDAV client library and command line tool.
Stars: ✭ 97 (-92.99%)
Mutual labels:  library
Digitalkeyboard
A simple, hand-built numeric keyboard for entering ID card numbers
Stars: ✭ 99 (-92.85%)
Mutual labels:  library
Npclib
(Minecraft) NPCLib – Basic non-player character library.
Stars: ✭ 98 (-92.92%)
Mutual labels:  library
Eventsource
The Hoa\Eventsource library.
Stars: ✭ 99 (-92.85%)
Mutual labels:  library
Octo
A fuzzing library in JavaScript. ✨
Stars: ✭ 96 (-93.06%)
Mutual labels:  library
Charcoal Ios
A modern way to filter things in your iOS apps
Stars: ✭ 102 (-92.63%)
Mutual labels:  library
Mime
The Hoa\Mime library.
Stars: ✭ 100 (-92.77%)
Mutual labels:  library
Sunset.css
This library offers a collection of different CSS-powered transitions.
Stars: ✭ 99 (-92.85%)
Mutual labels:  library
Colored
🎨 Mirror of colored library repository
Stars: ✭ 98 (-92.92%)
Mutual labels:  library
Predicateflow
Write amazing, strong-typed and easy-to-read NSPredicate.
Stars: ✭ 98 (-92.92%)
Mutual labels:  library
Gl Catmull Clark
A javascript implementation of the Catmull-Clark subdivision surface algorithm
Stars: ✭ 100 (-92.77%)
Mutual labels:  library
Transit Map
Generate a schematic map (“metro map”) for a given (transit) network graph using Mixed Integer Programming.
Stars: ✭ 98 (-92.92%)
Mutual labels:  library
Gifdec
small C GIF decoder
Stars: ✭ 100 (-92.77%)
Mutual labels:  library
Go Daemon
A library for writing system daemons in golang.
Stars: ✭ 1,341 (-3.11%)
Mutual labels:  library
Geotic
Entity Component System library for javascript
Stars: ✭ 97 (-92.99%)
Mutual labels:  library
React Native Create Library
📓 Command line tool to create a React Native library with a single command
Stars: ✭ 1,362 (-1.59%)
Mutual labels:  library
Coq Ext Lib
A library of Coq definitions, theorems, and tactics. [[email protected],@liyishuai]
Stars: ✭ 102 (-92.63%)
Mutual labels:  library
Protobuf
Python implementation of Protocol Buffers data types with dataclasses support
Stars: ✭ 101 (-92.7%)
Mutual labels:  library

Apache PDFBox https://pdfbox.apache.org/

The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities. PDFBox is published under the Apache License, Version 2.0.

PDFBox is a project of the Apache Software Foundation https://www.apache.org/.

Binary Downloads

You can download binary versions for releases currently under development or older releases from our Download Page.

Build

You need Java 8 (or higher) and Maven 3 https://maven.apache.org/ to build PDFBox. The recommended build command is:

mvn clean install

The default build will compile the Java sources and package the binary classes into jar packages. See the Maven documentation for all the other available build options.

Contribute

There are various ways to help us improve PDFBox.

Support

Please follow the guidelines at our Support Page.

If you have questions about how to use PDFBox, ask on the Users Mailing List. This will get you help from the entire community.

The PDFBox examples and the test code in the sources will also provide additional information.

Additional resources are also available on sites such as Stack Overflow.

If you are sure you have found a bug, then please report the issue in our Issue Tracker.

Known Limitations and Problems

See the issue tracker at https://issues.apache.org/jira/browse/PDFBOX for the full list of known issues and requested features. Some of the more common issues are:

  1. You get text like "G38G43G36G51G5" instead of what you expect when you are extracting text. This is because the characters are a meaningless internal encoding that points to glyphs embedded in the PDF document. The only way to access the text is to use OCR. This may be a future enhancement.

  2. You get an error message like "java.io.IOException: Can't handle font width". This MIGHT be because the org/apache/pdfbox/resources directory is not in your classpath. The easiest solution is to include the apache-pdfbox-x.x.x.jar in your classpath.

  3. You get text that has the correct characters, but in the wrong order. This might be because you have not enabled sorting. The text in PDF files is stored in chunks, and the chunks do not need to be stored in the order in which they are displayed on a page. By default, PDFBox does not sort the text.
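
To make the ordering issue in item 3 concrete, here is a minimal, self-contained sketch of why stored order and reading order can differ. It does not use PDFBox and is not PDFBox's implementation: the Chunk type, its coordinates, and both helper methods are hypothetical names introduced for illustration only. In PDFBox itself, sorting is usually enabled by calling setSortByPosition(true) on a PDFTextStripper before extracting text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: none of these types belong to PDFBox.
final class ChunkSortSketch {

    /** A piece of text plus the page position where it is drawn (hypothetical type). */
    static final class Chunk {
        final String text;
        final float x; // distance from the left edge of the page
        final float y; // distance from the top edge (assumption: y grows downward)

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Concatenates the chunks exactly in the order they were given. */
    static String inStoredOrder(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            sb.append(c.text);
        }
        return sb.toString();
    }

    /** Concatenates the chunks top to bottom, then left to right. */
    static String inReadingOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks); // leave the input untouched
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y)
                .thenComparingDouble(c -> c.x));
        return inStoredOrder(sorted);
    }

    public static void main(String[] args) {
        // The file stores "world" before "Hello ", although it is drawn further right.
        List<Chunk> stored = Arrays.asList(
                new Chunk("world", 60f, 10f),
                new Chunk("Hello ", 10f, 10f));

        System.out.println(inStoredOrder(stored));  // prints "worldHello "
        System.out.println(inReadingOrder(stored)); // prints "Hello world"
    }
}
```

The sketch compares y values exactly, which only works for this toy input. Text on a real page rarely shares identical baselines, so a position-based sort generally needs some tolerance when deciding whether two chunks are on the same line.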

License (see also LICENSE.txt)

Collective work: Copyright 2015 The Apache Software Foundation.

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Export control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache PDFBox uses the Java Cryptography Architecture (JCA) and the Bouncy Castle libraries for handling encryption in PDF documents.
