
dothinking / Pdf2docx

Licence: gpl-3.0
Parse PDF file with PyMuPDF and generate docx with python-docx

Programming Languages

python

Projects that are alternatives of or similar to Pdf2docx

Net Core Docx Html To Pdf Converter
.NET Core library to create custom reports based on Word docx or HTML documents and convert to PDF
Stars: ✭ 133 (+16.67%)
Mutual labels:  docx, pdf-converter
Hrconvert2
A self-hosted, drag-and-drop, & nosql file conversion server that supports 62x file formats.
Stars: ✭ 132 (+15.79%)
Mutual labels:  docx, pdf-converter
Docconv
Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text
Stars: ✭ 735 (+544.74%)
Mutual labels:  docx, pdf-converter
Marigold.openxhtml
MariGold.OpenXHTML is a wrapper library for Open XML SDK to convert HTML documents into Open XML word documents.
Stars: ✭ 44 (-61.4%)
Mutual labels:  docx
Thiefmd
The markdown editor worth stealing. Inspired by Ulysses, based on code from Quilter
Stars: ✭ 48 (-57.89%)
Mutual labels:  docx
Js Word
✒️ Word Processing Document Library
Stars: ✭ 1,203 (+955.26%)
Mutual labels:  docx
Sharpdocx
C# based template engine for generating Word documents
Stars: ✭ 100 (-12.28%)
Mutual labels:  docx
Gotenberg Go Client
Go client for the Gotenberg API
Stars: ✭ 35 (-69.3%)
Mutual labels:  pdf-converter
Docx
Fast and easy to use .NET library that creates or modifies Microsoft Word files without installing Word.
Stars: ✭ 1,288 (+1029.82%)
Mutual labels:  docx
Md2pdf
Convert Markdown documents to PDF
Stars: ✭ 63 (-44.74%)
Mutual labels:  pdf-converter
Superfileview
File viewer built on the Tencent Browsing Service (TBS) with the X5 WebKit core; supports displaying many file formats
Stars: ✭ 1,115 (+878.07%)
Mutual labels:  docx
Serverless Html Pdf
Convert HTML to PDF through a lambda function using PhantomJS.
Stars: ✭ 51 (-55.26%)
Mutual labels:  pdf-converter
Laravel Pdf
A simple package for easily generating PDF documents from HTML. It is built for Laravel, but it can also be used without Laravel.
Stars: ✭ 79 (-30.7%)
Mutual labels:  pdf-converter
Pdfsave
Convert websites into readable PDFs
Stars: ✭ 46 (-59.65%)
Mutual labels:  pdf-converter
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-17.54%)
Mutual labels:  pdf-converter
Desktopeditors
An office suite that combines text, spreadsheet and presentation editors for creating, viewing and editing local documents
Stars: ✭ 1,008 (+784.21%)
Mutual labels:  docx
Word2pdf Tools
📝 Several approaches to converting Word documents to PDF via LibreOffice, WPS, Microsoft Office, or third-party libraries
Stars: ✭ 82 (-28.07%)
Mutual labels:  docx
Documentbuilder
ONLYOFFICE Document Builder is a powerful text, spreadsheet, presentation and PDF generating tool
Stars: ✭ 61 (-46.49%)
Mutual labels:  docx
Academic Pandoc Template
Write beautiful academic texts with the distraction-free Pandoc Markdown and typademic.
Stars: ✭ 60 (-47.37%)
Mutual labels:  docx
Foliant
Comprehensive markdown-based documentation toolkit
Stars: ✭ 74 (-35.09%)
Mutual labels:  docx

pdf2docx


  • Parse layout (text, image and table) from PDF file with PyMuPDF
  • Generate docx with python-docx
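The two steps above can be sketched as a single conversion call. This is a minimal sketch, assuming the `Converter` class exposed by the `pdf2docx` package; the helper names `docx_path_for` and `convert_pdf` are hypothetical and only used for illustration.

```python
from pathlib import Path
from typing import Optional


def docx_path_for(pdf_path: str) -> str:
    """Return the output path: same location and stem, with a .docx suffix."""
    return str(Path(pdf_path).with_suffix(".docx"))


def convert_pdf(pdf_path: str, docx_path: Optional[str] = None) -> str:
    """Convert all pages of pdf_path and return the path of the docx written."""
    # Imported lazily so the path helper works without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = docx_path or docx_path_for(pdf_path)
    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path)  # parses the layout, then writes the docx
    finally:
        cv.close()  # always release the underlying PDF handle
    return docx_path
```

Calling `convert_pdf("sample.pdf")` would write `sample.docx` next to the input file, where `sample.pdf` is a placeholder name.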

Features

  • [x] Parse and re-create paragraph

    • [x] text in horizontal/vertical direction: from left to right, from bottom to top
    • [x] font style, e.g. font name, size, weight, italic and color
    • [x] text format, e.g. highlight, underline, strike-through
    • [x] text alignment, e.g. left/right/center/justify
    • [x] external hyperlink
    • [x] paragraph layout: horizontal alignment and vertical spacing
    • [ ] list style
  • [x] Parse and re-create image

    • [x] in-line image
    • [x] image in Gray/RGB/CMYK mode
    • [x] transparent image
    • [x] floating image, i.e. picture behind text
  • [x] Parse and re-create table

    • [x] border style, e.g. width, color
    • [x] shading style, i.e. background color
    • [x] merged cells
    • [x] vertical direction cell
    • [x] table with partly hidden borders
    • [x] nested tables
  • [x] Parsing pages with multi-processing
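As an illustration of the multi-processing feature, the sketch below passes the `multi_processing` and `cpu_count` settings to `Converter.convert`. Treat both keyword names as assumptions to check against the installed version; `pick_cpu_count` and `convert_in_parallel` are hypothetical helpers.

```python
import os


def pick_cpu_count(requested: int = 0) -> int:
    """Clamp the requested worker count to the CPUs available (0 means all)."""
    available = os.cpu_count() or 1
    if requested <= 0:
        return available
    return min(requested, available)


def convert_in_parallel(pdf_path: str, docx_path: str, cpu_count: int = 0) -> None:
    """Convert a PDF, parsing its pages in several worker processes."""
    # Imported lazily so pick_cpu_count works without pdf2docx installed.
    from pdf2docx import Converter

    cv = Converter(pdf_path)
    try:
        cv.convert(
            docx_path,
            multi_processing=True,  # assumed setting name
            cpu_count=pick_cpu_count(cpu_count),  # assumed setting name
        )
    finally:
        cv.close()
```

On platforms that spawn worker processes (for example Windows), the call to `convert_in_parallel` should sit under an `if __name__ == "__main__":` guard.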

It can also be used as a tool to extract table contents, since both the table content and its format/style are parsed.
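The table-extraction use could look roughly like the sketch below. It assumes `Converter.extract_tables` returns a list of tables, each a list of rows of cell text, with `None` for cells that have no text of their own; `table_to_csv` and `extract_first_page_tables` are hypothetical helpers.

```python
import csv
import io
from typing import List, Optional

Table = List[List[Optional[str]]]


def table_to_csv(table: Table) -> str:
    """Serialise one extracted table as CSV text; None cells become empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        writer.writerow(row)
    return buffer.getvalue()


def extract_first_page_tables(pdf_path: str) -> List[Table]:
    """Return the tables found on the first page of pdf_path."""
    # Imported lazily so table_to_csv works without pdf2docx installed.
    from pdf2docx import Converter

    cv = Converter(pdf_path)
    try:
        # start/end are zero-based page indexes; end is exclusive.
        return cv.extract_tables(start=0, end=1)
    finally:
        cv.close()
```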

Limitations

  • Text-based PDF file only
  • Normal reading direction only
    • horizontal/vertical paragraph/line/word
    • no word transformation, e.g. rotation
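Because only text-based PDFs are supported, it can help to check for a text layer before converting. The following is a rough heuristic, not part of pdf2docx: it assumes PyMuPDF is importable as `fitz` and that `page.get_text()` returns the plain text of a page. The function names are hypothetical.

```python
from typing import Iterable


def has_text_layer(page_texts: Iterable[str]) -> bool:
    """Return True if any page yields non-whitespace text."""
    return any(text.strip() for text in page_texts)


def is_text_based_pdf(pdf_path: str) -> bool:
    """Heuristic: a scanned (image-only) PDF usually has no extractable text."""
    # Imported lazily so has_text_layer works without PyMuPDF installed.
    import fitz  # PyMuPDF

    doc = fitz.open(pdf_path)
    try:
        return has_text_layer(page.get_text() for page in doc)
    finally:
        doc.close()
```

A scanned document that fails this check would need OCR first, which is outside the scope described here.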

Documentation

