houking-can / PDFConverter

Licence: Apache-2.0 license
Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...

PDFConverter

  1. Convert PDF to other formats with the Adobe Acrobat DC SDK, for example txt, xml, doc, docx, jpg, ps and rtf. Notice: to run this project you need to install Adobe Acrobat Pro DC; other versions of Adobe Acrobat have not been tested. Acrobat Pro DC can be downloaded from Adobe and tried for free for a few days.
  2. Convert with grobid, which suits the text of scientific papers (images and tables are not supported).

Environment and Dependency

  • Operating system: Windows 10 Professional
  • IDE: Visual Studio 2017 Community
  • Development framework: .NET Framework 4.6.1
  • Dependency:
    • Acrobat
    • Adobe Acrobat 10.0 Type Library
    • System.Windows.Forms
  • The first two are COM type libraries. After installing Adobe Acrobat DC, you can add them as references in Visual Studio through the Reference Manager.

How to build and run it

Build PDFConvert

  1. Create a C# Console project and choose the .NET Framework version.
  2. Add references: in the Reference Manager, click COM and select Acrobat and Adobe Acrobat 10.0 Type Library.
  3. To run the project, add command-line arguments in the project settings: the full path of the input file and, optionally, the output directory (if it is not specified, the output file is saved in the directory of the executable). You can also run the executable from a console as follows:
    PDFConvert.exe -i inputfile -o outputdir -r true -t 20000
  4. If you run this repository directly, you can skip steps 1 and 2.
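
The console invocation above can also be assembled from a script. The following is a minimal Python sketch and not part of the repository: `build_convert_command` and `run_convert` are hypothetical helpers, and because this README does not document what `-r` and `-t` mean, they are passed through unchanged as opaque extra arguments.

```python
import subprocess


def build_convert_command(exe, input_file, output_dir=None, extra_args=()):
    """Assemble the argument list for PDFConvert.exe (hypothetical helper)."""
    cmd = [str(exe), "-i", str(input_file)]
    if output_dir is not None:
        # -o is optional; without it the output lands next to the executable.
        cmd += ["-o", str(output_dir)]
    # Flags such as -r and -t are forwarded as given, without interpretation.
    cmd += [str(arg) for arg in extra_args]
    return cmd


def run_convert(cmd):
    """Run the converter and return its exit code (needs Windows and Acrobat)."""
    return subprocess.run(cmd, check=False).returncode


# Mirrors the console example: placeholders are kept exactly as in the README.
example = build_convert_command(
    "PDFConvert.exe", "inputfile", "outputdir", ["-r", "true", "-t", "20000"]
)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.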

Batch process with Python (Controller)

 python BatRun.py -e  C:\Users\~\PDFConvert.exe -i C:\Users\~ -o C:\Users\~ -f xml -t 60
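
As a rough illustration of what a batch controller does, the sketch below walks an input directory and calls the converter once per PDF with a per-file timeout. It is not the code of BatRun.py: the function names are hypothetical, only the `-i` and `-o` flags of PDFConvert.exe are used, treating exit code 0 as success is an assumption, and the `runner` parameter exists only so the loop can be exercised without Acrobat installed.

```python
import subprocess
from pathlib import Path


def find_pdfs(input_dir):
    """Return all .pdf files under input_dir, sorted for a stable order."""
    return sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_all(exe, input_dir, output_dir, timeout_s, runner=subprocess.run):
    """Call the converter once per PDF.

    Returns a dict mapping each PDF path to 'ok', 'failed' or 'timeout'.
    """
    results = {}
    for pdf in find_pdfs(input_dir):
        cmd = [str(exe), "-i", str(pdf), "-o", str(output_dir)]
        try:
            done = runner(cmd, timeout=timeout_s)
            # Assumption: the converter signals success with exit code 0.
            results[pdf] = "ok" if done.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising this.
            results[pdf] = "timeout"
    return results
```

A per-file timeout keeps one problematic PDF from blocking the rest of the batch.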

Dependency

  • Python 3.5+
  • pypiwin32==223 (pip install pypiwin32==223)
  • BeautifulSoup4
  • docx
  • xlrd
  • lxml
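
Before starting a long batch run it can help to confirm that these packages are importable. The sketch below is an illustration only; the import names in `REQUIRED_MODULES` are assumptions about which module each listed package provides (for example, `bs4` for BeautifulSoup4 and `win32com` for pypiwin32), and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# Package name from the list above -> module name assumed to be imported.
REQUIRED_MODULES = {
    "pypiwin32": "win32com",
    "BeautifulSoup4": "bs4",
    "docx": "docx",
    "xlrd": "xlrd",
    "lxml": "lxml",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]
```

Calling `missing_packages()` returns an empty list when everything is installed.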

Notice
If you run this Python command in the cmd console or PowerShell on Windows 10, it is best to disable 'QuickEdit Mode' and 'Insert Mode' each time, so that the process does not get stuck in a suspended state. Right-click the title bar of the console window, open Properties, and you will find 'QuickEdit Mode' under Edit Options.
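
The same setting can in principle be changed from the script itself through the Windows console API. This is a sketch under that assumption and not something the repository does: it clears the QuickEdit and Insert flags with `GetConsoleMode`/`SetConsoleMode` for the current console only, and the helper names are hypothetical.

```python
import sys

# Console input mode flags and handle id from the Windows console API.
ENABLE_INSERT_MODE = 0x0020
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080
STD_INPUT_HANDLE = -10


def without_quick_edit(mode):
    """Clear the QuickEdit and Insert bits; keep every other flag."""
    mode &= ~(ENABLE_QUICK_EDIT_MODE | ENABLE_INSERT_MODE)
    # ENABLE_EXTENDED_FLAGS must be set for the QuickEdit change to apply.
    return mode | ENABLE_EXTENDED_FLAGS


def disable_quick_edit():
    """Apply the change to the current console; False if not possible."""
    if sys.platform != "win32":
        return False
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.GetConsoleMode.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(ctypes.c_uint32)
    ]
    kernel32.SetConsoleMode.argtypes = [ctypes.c_void_p, ctypes.c_uint32]

    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = ctypes.c_uint32()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # stdin is not a console, for example when redirected
    new_mode = without_quick_edit(mode.value)
    return bool(kernel32.SetConsoleMode(handle, new_mode))
```

The change lasts only for the current console session, so it would have to be called at the start of every run.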

Architecture

Extension

To convert PDF to other formats, use one of the cConvIDs supported by the Acrobat library. The supported cConvIDs are as follows:

| cConvID | Extension | Comment |
|---|---|---|
| com.adobe.acrobat.eps | eps | Not tested |
| com.adobe.acrobat.html-3-20 | html, htm | Recommended |
| com.adobe.acrobat.html-4-01-css-1-00 | html, htm | Works well |
| com.adobe.acrobat.jpeg | jpeg, jpg, jpe | Not tested |
| com.adobe.acrobat.jp2k | jpf, jpx, jp2, j2k, j2c, jpc | Not tested |
| com.adobe.acrobat.doc | doc | Works well |
| com.adobe.acrobat.docx | docx | Works well |
| com.callas.preflight.pdfa | pdf | Not tested |
| com.callas.preflight.pdfx | pdf | Not tested |
| com.adobe.acrobat.png | png | Not tested |
| com.adobe.acrobat.ps | ps | Not tested |
| com.adobe.acrobat.rtf | rtf | Not tested |
| com.adobe.acrobat.accesstext | txt | Works well |
| com.adobe.acrobat.plain-text | txt | Works well |
| com.adobe.acrobat.tiff | tiff, tif | Not tested |
| com.adobe.acrobat.xml-1-00 | xml | Recommended |
| com.adobe.acrobat.xlsx | xlsx | Works well |
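
For scripting, the list above can be kept as a lookup table. The sketch below is one possible representation, not code from the repository; `CONV_IDS` and `conv_ids_for` are hypothetical names. Some extensions (html, txt, pdf) map to more than one cConvID, so the lookup returns a list.

```python
# cConvID -> file extensions, transcribed from the list above.
CONV_IDS = {
    "com.adobe.acrobat.eps": ["eps"],
    "com.adobe.acrobat.html-3-20": ["html", "htm"],
    "com.adobe.acrobat.html-4-01-css-1-00": ["html", "htm"],
    "com.adobe.acrobat.jpeg": ["jpeg", "jpg", "jpe"],
    "com.adobe.acrobat.jp2k": ["jpf", "jpx", "jp2", "j2k", "j2c", "jpc"],
    "com.adobe.acrobat.doc": ["doc"],
    "com.adobe.acrobat.docx": ["docx"],
    "com.callas.preflight.pdfa": ["pdf"],
    "com.callas.preflight.pdfx": ["pdf"],
    "com.adobe.acrobat.png": ["png"],
    "com.adobe.acrobat.ps": ["ps"],
    "com.adobe.acrobat.rtf": ["rtf"],
    "com.adobe.acrobat.accesstext": ["txt"],
    "com.adobe.acrobat.plain-text": ["txt"],
    "com.adobe.acrobat.tiff": ["tiff", "tif"],
    "com.adobe.acrobat.xml-1-00": ["xml"],
    "com.adobe.acrobat.xlsx": ["xlsx"],
}


def conv_ids_for(extension):
    """Return every cConvID that produces the given extension, in list order."""
    ext = extension.lower().lstrip(".")
    return [conv_id for conv_id, exts in CONV_IDS.items() if ext in exts]
```

For example, `conv_ids_for("txt")` yields both the accesstext and the plain-text IDs, leaving the choice to the caller.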

In Acrobat 10.0

| Deprecated cConvID | Equivalent valid cConvID |
|---|---|
| com.adobe.acrobat.html-3-20 | com.adobe.acrobat.html |
| com.adobe.acrobat.html-4-01-css-1-00 | com.adobe.acrobat.html |
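
A script that accepts either form can normalize deprecated IDs before use. This is a small sketch based only on the two pairs listed for Acrobat 10.0; `DEPRECATED_CONV_IDS` and `current_conv_id` are hypothetical names.

```python
# Deprecated cConvID -> equivalent valid cConvID (Acrobat 10.0).
DEPRECATED_CONV_IDS = {
    "com.adobe.acrobat.html-3-20": "com.adobe.acrobat.html",
    "com.adobe.acrobat.html-4-01-css-1-00": "com.adobe.acrobat.html",
}


def current_conv_id(conv_id):
    """Map a deprecated cConvID to its replacement; pass others through."""
    return DEPRECATED_CONV_IDS.get(conv_id, conv_id)
```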

Refer to the Acrobat SDK and its documentation to learn more. You can download the SDK package and build applications based on its samples.

Comparison

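| Format | Conversion speed | Table extraction | Completeness | Ease of analysis |
|---|---|---|---|---|
| XML | Fast | Yes | Good | Easy |
| Word | Slow | Yes | Good | Average |
| Excel | Average | Yes | Great | Hard |
| TXT | Fastest | No | Average | Hardest |
| HTML | Fast | Yes | Best | Easy |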

Statement

The source code is for learning and communication only. Please contact Adobe for commercial use.

License

Apache License 2.0
