dothinking / Pdf2docx
Licence: gpl-3.0
Parse PDF file with PyMuPDF and generate docx with python-docx
Stars: ✭ 114
Programming Languages
python
Projects that are alternatives of or similar to Pdf2docx
Net Core Docx Html To Pdf Converter
.NET Core library to create custom reports based on Word docx or HTML documents and convert to PDF
Stars: ✭ 133 (+16.67%)
Mutual labels: docx, pdf-converter
Hrconvert2
A self-hosted, drag-and-drop, & nosql file conversion server that supports 62x file formats.
Stars: ✭ 132 (+15.79%)
Mutual labels: docx, pdf-converter
Docconv
Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text
Stars: ✭ 735 (+544.74%)
Mutual labels: docx, pdf-converter
Marigold.openxhtml
MariGold.OpenXHTML is a wrapper library for Open XML SDK to convert HTML documents into Open XML word documents.
Stars: ✭ 44 (-61.4%)
Mutual labels: docx
Thiefmd
The markdown editor worth stealing. Inspired by Ulysses, based on code from Quilter
Stars: ✭ 48 (-57.89%)
Mutual labels: docx
Sharpdocx
C# based template engine for generating Word documents
Stars: ✭ 100 (-12.28%)
Mutual labels: docx
Docx
Fast and easy to use .NET library that creates or modifies Microsoft Word files without installing Word.
Stars: ✭ 1,288 (+1029.82%)
Mutual labels: docx
Superfileview
Displays files in many formats, built on Tencent Browsing Service (TBS) and the X5 WebKit core
Stars: ✭ 1,115 (+878.07%)
Mutual labels: docx
Serverless Html Pdf
Convert HTML to PDF through a lambda function using PhantomJS.
Stars: ✭ 51 (-55.26%)
Mutual labels: pdf-converter
Laravel Pdf
A simple package for easily generating PDF documents from HTML. It is designed for Laravel, but it can also be used without Laravel.
Stars: ✭ 79 (-30.7%)
Mutual labels: pdf-converter
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-17.54%)
Mutual labels: pdf-converter
Desktopeditors
An office suite that combines text, spreadsheet and presentation editors allowing to create, view and edit local documents
Stars: ✭ 1,008 (+784.21%)
Mutual labels: docx
Word2pdf Tools
📝 Several approaches to converting Word documents to PDF via LibreOffice / WPS / Microsoft Office / third-party libraries
Stars: ✭ 82 (-28.07%)
Mutual labels: docx
Documentbuilder
ONLYOFFICE Document Builder is powerful text, spreadsheet, presentation and PDF generating tool
Stars: ✭ 61 (-46.49%)
Mutual labels: docx
Academic Pandoc Template
Write beautiful academic texts with the distraction-free Pandoc Markdown and typademic.
Stars: ✭ 60 (-47.37%)
Mutual labels: docx
pdf2docx
- Parse layout (text, image and table) from PDF file with PyMuPDF
- Generate docx with python-docx
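The two steps above are typically driven through a single converter object. The following is a minimal sketch, assuming the `Converter` class and its `convert`/`close` methods as exposed by recent pdf2docx releases; `default_docx_path` and `convert_pdf` are hypothetical helper names introduced here for illustration, so check the project documentation for the version you have installed.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Return the PDF path with its suffix swapped for .docx (hypothetical helper)."""
    return str(Path(pdf_path).with_suffix(".docx"))


def convert_pdf(pdf_path, docx_path=None, start=0, end=None):
    """Convert pages [start, end) of a PDF to docx and return the output path."""
    # Imported lazily so the path helper works without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = docx_path or default_docx_path(pdf_path)
    cv = Converter(pdf_path)
    try:
        # Page indexes are zero-based; end=None means "through the last page".
        cv.convert(docx_path, start=start, end=end)
    finally:
        # Always release the underlying PDF handle, even if conversion fails.
        cv.close()
    return docx_path
```

Calling `convert_pdf("report.pdf")` would write `report.docx` next to the input, provided the PDF is text-based.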
Features
- [x] Parse and re-create paragraph
  - [x] text in horizontal/vertical direction: from left to right, from bottom to top
  - [x] font style, e.g. font name, size, weight, italic and color
  - [x] text format, e.g. highlight, underline, strike-through
  - [x] text alignment, e.g. left/right/center/justify
  - [x] external hyper link
  - [x] paragraph layout: horizontal alignment and vertical spacing
  - [ ] list style
- [x] Parse and re-create image
  - [x] in-line image
  - [x] image in Gray/RGB/CMYK mode
  - [x] transparent image
  - [x] floating image, i.e. picture behind text
- [x] Parse and re-create table
  - [x] border style, e.g. width, color
  - [x] shading style, i.e. background color
  - [x] merged cells
  - [x] vertical direction cell
  - [x] table with partly hidden borders
  - [x] nested tables
- [x] Parsing pages with multi-processing
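Multi-processing pays off because pages can be parsed independently of one another. The sketch below assumes the `multi_processing` and `cpu_count` keyword arguments accepted by `Converter.convert` in recent pdf2docx releases. `split_pages` is a hypothetical helper that only illustrates one way page ranges could be divided among workers; it is not the library's actual scheduling code.

```python
import os


def split_pages(n_pages, n_workers):
    """Split page indexes 0..n_pages-1 into contiguous (start, end) ranges.

    Illustrative only: one range per worker, sizes differing by at most one.
    """
    if n_pages <= 0:
        return []
    n_workers = max(1, min(n_workers, n_pages))
    size, extra = divmod(n_pages, n_workers)
    ranges, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers take one additional page each.
        end = start + size + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def convert_in_parallel(pdf_path, docx_path, cpu_count=None):
    """Convert a PDF while letting pdf2docx parse pages in several processes."""
    from pdf2docx import Converter  # lazy import; requires pdf2docx

    workers = cpu_count or os.cpu_count() or 1
    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path, multi_processing=True, cpu_count=workers)
    finally:
        cv.close()
```

On platforms that start worker processes by spawning (Windows, for example), a call such as `convert_in_parallel(...)` should sit under an `if __name__ == "__main__":` guard.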
It can also be used as a tool to extract table contents, since both table content and format/style are parsed.
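As a rough sketch of that table-extraction use, the snippet below assumes `Converter.extract_tables`, which in recent pdf2docx releases returns a list of tables, each a list of rows of cell text. The helper names `table_to_csv` and `extract_tables` are introduced here for illustration, and the handling of `None` cells is an assumption about how empty or merged cells may be reported.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table (a list of rows of cell text) to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        # Assumption: empty or merged-away cells may arrive as None.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buf.getvalue()


def extract_tables(pdf_path, start=0, end=None):
    """Return the tables found on pages [start, end) of a PDF."""
    from pdf2docx import Converter  # lazy import; requires pdf2docx

    cv = Converter(pdf_path)
    try:
        return cv.extract_tables(start=start, end=end)
    finally:
        cv.close()
```

Each table returned by `extract_tables` can then be passed to `table_to_csv` and written to its own file.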
Limitations
- Text-based PDF file only
- Normal reading direction only
  - horizontal/vertical paragraph/line/word
  - no word transformation, e.g. rotation
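Because only text-based PDFs are supported, it can help to check for a text layer before converting. The following is a heuristic sketch, assuming PyMuPDF's `fitz.open` and `page.get_text()` (older PyMuPDF versions spell the latter `getText()`). The function names and the `min_chars` threshold are hypothetical, and a scanned PDF carrying an OCR text layer would still pass this check.

```python
def has_text_layer(page_texts, min_chars=1):
    """Return True if any page yields at least min_chars non-whitespace characters."""
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def pdf_is_text_based(pdf_path, min_chars=1):
    """Heuristically decide whether a PDF contains extractable text."""
    import fitz  # PyMuPDF; imported lazily so has_text_layer works without it

    doc = fitz.open(pdf_path)
    try:
        # Pages are read one at a time and scanning stops at the first hit.
        return has_text_layer((page.get_text() for page in doc), min_chars)
    finally:
        doc.close()
```

A scanned, image-only file would return `False` here, which suggests running OCR first rather than attempting a direct conversion.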
Documentation
Sample