TomasHubelbauer / modern-office-git-diff

Licence: MIT license
An experiment in tracking and diffing versions of modern Microsoft Office files in Git.

Programming Languages: JavaScript, PowerShell, Shell

Modern Office Git Diff

An experiment in tracking and diffing versions of modern Microsoft Office files in Git.

Modern Office file formats are ZIP archives containing XML files. The ZIP archives are binary, so Git (and, furthermore, GitHub and GitLab, where the diff cannot be tweaked) won't display a useful diff for them. The XML files are not binary, so to display a diff for them, this project unpacks the ZIP files into directories that are tracked in Git. Tracking generated files is pretty dumb, but so is tracking binary files, and when forced to have one, it's not a leap to have the other as well if it brings something useful to the table.

This is achieved using a PowerShell script which unpacks the ZIP file to a tracked directory, formats the XML files for a readable diff, and tracks the formatted files as well.
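To illustrate why the formatting step matters: the XML parts inside an Office file typically contain few or no line breaks, so a line-based diff would report nearly the whole file as changed. The following is a minimal JavaScript sketch of such a formatter, not the project's actual PowerShell code. `formatXml` is a hypothetical helper, and it assumes that attribute values contain no literal `>` and that the input has no CDATA sections.

```javascript
// Hypothetical helper (not from this repository): puts every tag and text run
// on its own indented line so that a line-based diff can isolate a change.
// Assumes no literal ">" inside attribute values and no CDATA sections.
function formatXml(xml, indent = '  ') {
  // Tokenize into tags (<...>) and the text runs between them.
  const tokens = xml.match(/<[^>]+>|[^<]+/g) || [];
  const lines = [];
  let depth = 0;

  for (const token of tokens) {
    const isTag = token.startsWith('<');

    // Drop whitespace-only runs that span lines (existing inter-tag formatting).
    if (!isTag && token.trim() === '' && token.includes('\n')) {
      continue;
    }

    if (token.startsWith('</')) {
      // Closing tag: step out one level before printing.
      depth = Math.max(depth - 1, 0);
      lines.push(indent.repeat(depth) + token);
    } else if (isTag && !token.startsWith('<?') && !token.startsWith('<!') && !token.endsWith('/>')) {
      // Opening tag: print, then step in one level.
      lines.push(indent.repeat(depth) + token);
      depth += 1;
    } else {
      // Declaration, comment, self-closing tag or text: print at current depth.
      lines.push(indent.repeat(depth) + token);
    }
  }

  return lines.join('\n');
}

console.log(formatXml('<a><b>hi</b><c/></a>'));
```

With one element per line, changing the text `hi` would touch a single line of the formatted output instead of the whole part. The formatted copy is for reading only; the Office file itself remains the source of truth.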

Looking for OpenOffice format support? Check out Tim Wiel's version

Examples:

The XML diff captures the exact change whereas the TXT diff captures text-only change for quick content inspection.
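As a rough illustration of the text-only view, the sketch below drops all markup and keeps one non-empty text run per line. It is written in JavaScript under stated assumptions and is not the project's PowerShell implementation: `extractText` is a hypothetical name, only the five predefined XML entities are decoded, and tags are assumed to contain no literal `>` in attribute values.

```javascript
// Hypothetical helper (not from this repository): lossy, text-only view of an
// XML part. Only the five predefined XML entities are decoded.
const ENTITIES = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&apos;': "'",
  '&amp;': '&',
};

function extractText(xml) {
  return xml
    .replace(/<[^>]+>/g, '\n')            // every tag becomes a line break
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)    // keep only non-empty text runs
    .map((line) =>
      // Single pass, so "&amp;lt;" decodes to "&lt;" and is not decoded twice.
      line.replace(/&(lt|gt|quot|apos|amp);/g, (match) => ENTITIES[match])
    )
    .join('\n');
}

const sample =
  '<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>A &amp; B</w:t></w:r></w:p>';
console.log(extractText(sample));
// Prints:
// Hello
// A & B
```

Because all structure is discarded, a formatting-only change (for example, making a run bold) would leave this output unchanged, which is what makes the view useful for quick content inspection.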

Features:

  • Every Office file (DOCX, XLSX, PPTX) has a complementary .git directory with XML and TXT files for diffing
  • Formatting XML files for nicer diffing
  • Generating TXT files from just text nodes for lossy text-only diffing
  • Warning in extracted and generated content that the data is read-only
  • Skipping processing unchanged files for fast operation even in repos with many Office files
  • Removing associated generated content automatically for Office files that have been removed from the repo
  • Ability to run as a Git hook for worry-free tracking
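The cleanup of generated content for removed Office files can be pictured as a comparison between two lists. The JavaScript sketch below is illustrative only and is not taken from the project's PowerShell script. All names are hypothetical, and the naming scheme, in which the companion directory is the Office file path with `.git` appended, is an assumption made for the example.

```javascript
// Illustrative sketch, not the project's code. Assumption: the companion
// directory of "report.docx" is named "report.docx.git".
const SUPPORTED_EXTENSIONS = ['.docx', '.xlsx', '.pptx'];

// True for the XML-based Office formats only (no DOC, XLS or PPT).
function isSupported(path) {
  const lower = path.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Hypothetical naming scheme for the generated directory.
function companionDir(path) {
  return path + '.git';
}

// Returns the companion directories that no longer have a matching Office file.
function findOrphans(trackedFiles, companionDirs) {
  const expected = new Set(trackedFiles.filter(isSupported).map(companionDir));
  return companionDirs.filter((dir) => !expected.has(dir));
}

const files = ['a.docx', 'b.xls', 'c.PPTX'];
const dirs = ['a.docx.git', 'old.xlsx.git', 'c.PPTX.git'];
console.log(findOrphans(files, dirs)); // [ 'old.xlsx.git' ]
```

Keeping the comparison free of file-system calls makes it easy to test; a real script would feed it the actual file and directory listings and then delete whatever it returns.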

Limitations:

  • Stores compressed and uncompressed versions in Git - by design, for plain text diffing and binary source of truth
  • No support for DOC, XLS and PPT, only XLSX, DOCX and PPTX (XML based formats) - by design, no use diffing binary formats
  • Risk of getting generated files out of sync if hook is not run or a manual edit is made to the generated files
  • Won't process files uploaded to repository through GitHub/GitLab online UI (no pre-commit hook)

Support:

  • Windows: 10.0.16299+ (cmd /c ver)
  • Ubuntu: 16.0.0+ (lsb_release -r)

Running

Run PowerShell scripts using VS Code PowerShell Integrated Console to avoid security blocks. Open it by clicking on any .ps1 file with integrated terminal open or running the PowerShell: Show Integrated Console VS Code command (F1+(p+s+c+i)).

  • Run cmd/version-office-files.ps1 from the command line
  • Run cmd/edit-in-powrshell-ise.ps1 to open in PowerShell ISE (Integrated Scripting Environment)
  • Add a Git pre-commit hook:
cp .git/hooks/pre-commit.sample .git/hooks/pre-commit
code .git/hooks/pre-commit

Observe commit diffs to see Office file changes in the XML and TXT files.

Testing

As described in the Running section, run PowerShell scripts using the VS Code PowerShell Integrated Console to avoid security blocks.

Run cmd/run-tests.ps1, which runs the Node.js tests in test/ (prerequisites).

In this repository, the tests run together with the main script in a pre-commit hook in order to catch any bugs as soon as possible during development. When using this script as a tool in a repository other than this one, only the main script would be run, as shown in the Git pre-commit hook setup code.

Portability

Use WSL (Ubuntu) to test portability of the PowerShell script. Use lsb_release -a to find the WSL Ubuntu version and follow the PowerShell installation instructions for Linux.

To-Do

Document how GitHub Codespaces change the game for running this script even when editing online

Of course the basic web editor UI still won't…

Configure Git to shut up about line endings in CI

Upgrade Node to latest and drop MJS in favor of ESM

See if VS Code SCM UI could be made to run the hook in PowerShell

The privileges security thing currently makes committing through VS Code fail.

Contributing

Use hook/pre-commit-development.sh when contributing to this repository to also run tests.

Related Works

Derived works based on this project:

Some notable prior art:

All of these focus on on-demand (non-tracked) generation of text-only versions of the files and do not capture structural changes. This project aims to explore the other, potentially less useful but nonetheless interesting, route of versioning both the compressed and the uncompressed forms of a file in parallel. See the Features and Limitations sections for pros and cons.
