💖 DocumentSpark

Simple secure document viewing server. Used by Viewfinder™

Converts a document to a picture of its pages. View a document from the internet without downloading or running it on your machine, and without needing a word processor, spreadsheet app, or PDF viewer installed. This provides content disarm and reconstruction, or CDR. Also known as p2., this code is deployed commercially by Dosyago in their ViewFinder cloud browser product.

From the comments

This is a very simple Node.js server that accepts a document upload (or a URL) and converts that document (using ImageMagick, LibreOffice and Ghostscript) into a series of images, one for each page of the document.
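As a rough illustration of that pipeline, here is a minimal sketch that plans the external commands for one document. It is not the project's actual code: the function names, the list of office extensions, the 150 DPI default and the output file naming are all assumptions made for the example. Only the `soffice --headless --convert-to pdf` and ImageMagick `convert -density` invocations are standard CLI usage.

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Assumed list of formats that need a LibreOffice pass before rasterising.
const OFFICE_EXTENSIONS = new Set([
  '.doc', '.docx', '.ppt', '.pptx', '.xls', '.xlsx', '.rtf', '.odt',
]);

// Hypothetical helper: returns the commands to run, in order, without running them.
function planConversion(inputPath, outDir, dpi = 150) {
  const ext = path.extname(inputPath).toLowerCase();
  const base = path.basename(inputPath, path.extname(inputPath));
  const steps = [];
  let pdfPath = inputPath;

  if (OFFICE_EXTENSIONS.has(ext)) {
    // LibreOffice in headless mode writes <base>.pdf into outDir.
    steps.push({
      cmd: 'soffice',
      args: ['--headless', '--convert-to', 'pdf', '--outdir', outDir, inputPath],
    });
    pdfPath = path.join(outDir, `${base}.pdf`);
  } else if (ext !== '.pdf') {
    // HTML, TXT and anything else unknown are rejected in this sketch.
    throw new Error(`Unsupported document type: ${ext || '(none)'}`);
  }

  // ImageMagick rasterises the PDF (via its Ghostscript delegate) into one
  // numbered PNG per page. On ImageMagick 7 the binary may be `magick` instead.
  steps.push({
    cmd: 'convert',
    args: ['-density', String(dpi), pdfPath, path.join(outDir, `${base}-%04d.png`)],
  });
  return steps;
}

// Hypothetical runner: executes each planned step with no shell involved.
function runPlan(steps) {
  for (const { cmd, args } of steps) {
    execFileSync(cmd, args, { stdio: 'inherit' });
  }
}
```

Passing arguments as an array to `execFileSync` keeps user-supplied file names out of a shell command line, which matters for a server that converts untrusted uploads.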

The point was originally to let people view documents (such as email attachments) securely, without needing to run or download the document on their own devices. It succeeded at that, but its use grew into ad-hoc document hosting, where people were attracted by the ability to access a single page of a document without downloading the entire document.

The code is shared as something you can build upon and adapt to your uses in the open. It's not meant as a finished solution; it's meant as a starting point, something to give you ideas for how to implement your own version, or something to plug in to your own open-source work. The project was originally called "p2." for "PDF to ...", but it works on a wide range of source documents, including DOCX and (often but not always) XLSX, and so on. It doesn't work on HTML or TXT.

Use it

$ npm i documentspark@latest
$ cd node_modules/documentspark
$ ./setup.sh 
$ ./restart.sh

If you have SSL certs in $HOME/sslcerts/, these will be used; if not, the server will run on HTTP. It runs under pm2 and defaults to port 443. You can supply a custom port with npm start <PORT>.
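The certificate fallback and port default described above could look roughly like the sketch below. The certificate file names (`privkey.pem`, `fullchain.pem`) and all function names are assumptions for illustration; check the project's server code for the names it actually expects.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const http = require('http');
const https = require('https');

const DEFAULT_PORT = 443;

// Hypothetical helper: parse the optional <PORT> argument, falling back to 443.
function resolvePort(arg) {
  if (arg === undefined) return DEFAULT_PORT;
  const port = Number(arg);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid port: ${arg}`);
  }
  return port;
}

// Hypothetical helper: returns TLS options if both files exist, otherwise null.
// The file names are assumed, not taken from the project.
function loadTlsOptions(certDir = path.join(os.homedir(), 'sslcerts')) {
  const keyPath = path.join(certDir, 'privkey.pem');
  const certPath = path.join(certDir, 'fullchain.pem');
  if (!fs.existsSync(keyPath) || !fs.existsSync(certPath)) return null;
  return { key: fs.readFileSync(keyPath), cert: fs.readFileSync(certPath) };
}

// HTTPS when certs were found, plain HTTP otherwise.
function createServer(handler, tlsOptions) {
  return tlsOptions
    ? https.createServer(tlsOptions, handler)
    : http.createServer(handler);
}
```

A caller would then do something like `createServer(handler, loadTlsOptions()).listen(resolvePort(process.argv[2]))`.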

Navigate to yourserver:your_port/secretpage-canneverbefound.html to convert a document. You can input either a file, or a URL. It may not always be possible to obtain a document from the URL.

Document view pages are not protected by any authentication; their URLs are simply chosen pseudo-randomly. You can modify the code to give document viewing pages longer, more securely random URLs.
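One way to get longer, more securely random URLs is to derive the page identifier from Node's cryptographic random source. This is a minimal sketch, not the project's current scheme; `newViewId` and the 32-byte default are assumptions.

```javascript
const crypto = require('crypto');

// Hypothetical helper: 32 random bytes rendered as 64 hex characters.
// crypto.randomBytes draws from the OS CSPRNG, unlike Math.random().
function newViewId(bytes = 32) {
  return crypto.randomBytes(bytes).toString('hex');
}
```

An identifier like this is still a bearer secret: anyone who learns the URL can view the document, so it complements authentication rather than replacing it.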

By default, converted documents are cleaned out after 3 days. You can change this in /public/uploads/clean.sh, which runs every few minutes and deletes any documents older than 4319 minutes (roughly 3 days).
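The project does this cleanup in a shell script. For readers who would rather keep it inside the Node process, the same age-based rule can be sketched as follows. The function names and the `keep` list are assumptions; only the 4319-minute threshold comes from the text above.

```javascript
const fs = require('fs');
const path = require('path');

const MAX_AGE_MINUTES = 4319; // threshold quoted above, roughly 3 days

// True when a file last modified at mtimeMs is older than the threshold.
function isExpired(mtimeMs, nowMs = Date.now(), maxAgeMinutes = MAX_AGE_MINUTES) {
  return nowMs - mtimeMs > maxAgeMinutes * 60 * 1000;
}

// Hypothetical helper: delete expired regular files in dir and return their paths.
// `keep` protects files that live alongside the uploads, such as clean.sh itself.
function cleanUploads(dir, nowMs = Date.now(), keep = new Set(['clean.sh'])) {
  const removed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || keep.has(entry.name)) continue;
    const file = path.join(dir, entry.name);
    if (isExpired(fs.statSync(file).mtimeMs, nowMs)) {
      fs.unlinkSync(file);
      removed.push(file);
    }
  }
  return removed;
}
```

Running `cleanUploads` from a `setInterval` every few minutes would approximate the schedule described above.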

Make it an API

There's a very simple "master key" secret parameter sent with the POST request. You can call this POST endpoint as an API over HTTPS (using multipart/form-data encoding) and pass your custom secret= as a parameter to authorize the conversion.
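A client call might look like the sketch below, which assumes Node 18 or later for the built-in `fetch`, `FormData` and `Blob`. Only the `secret` parameter name comes from the text above. The endpoint URL is left to the caller, and the `url` and `pdf` field names are hypothetical, so check the upload form on your own server for the real ones.

```javascript
// Hypothetical helper: build the multipart/form-data body for a conversion request.
function buildConversionForm({ secret, url, file, filename }) {
  const form = new FormData();
  form.append('secret', secret); // the "master key" parameter
  if (url) form.append('url', url); // assumed field name for a remote document
  if (file) form.append('pdf', new Blob([file]), filename); // assumed field name for an upload
  return form;
}

// Hypothetical helper: POST the form to an endpoint supplied by the caller.
// fetch sets the multipart Content-Type header and boundary automatically.
async function requestConversion(endpoint, fields) {
  const res = await fetch(endpoint, {
    method: 'POST',
    body: buildConversionForm(fields),
  });
  if (!res.ok) {
    throw new Error(`Conversion request failed: HTTP ${res.status}`);
  }
  return res;
}
```

Since the secret travels in the request body, only send it over HTTPS.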

System Requirements

You need a beefy machine: 4 cores and 8 GB RAM for most documents, though more is better. Smaller machines will routinely run out of memory or take a long time when running the libreoffice, imagemagick and gs jobs.

Improving perf

You can try recompiling ImageMagick to have multicore support. I found this significantly improves performance.

Thanks to*

*No affiliation

License

Licensed under AGPL-3.0.

If you'd like to deploy this in your org without going open-source, or in a for-profit project where you'd rather not release the source under AGPL-3.0 as well, write me ([email protected]) about a license exemption.