
costerwi / Rezip

Licence: other
Git clean filter to output uncompressed zip files for better packing

Programming Languages

java


ReZip

For more efficient Git packing of ZIP based files.

Motivation

Many popular applications, such as Microsoft Office and OpenOffice, save their documents as XML in compressed ZIP containers. Small changes to these documents' contents may result in big changes to the compressed binary container file. When compressed files are stored in a Git repository, these big differences make delta compression inefficient or impossible, and the repository size is roughly the sum of the sizes of its revisions.

This small program acts as a Git clean filter driver. It reads a ZIP file from stdin and outputs the same ZIP content to stdout, but without compression.

pros
  • human-readable, plain-text diffs of ZIP-based archives (if they contain plain-text files)
  • smaller overall repository size if the archive contents change frequently
cons
  • slower git add/git commit process
  • (optional) slower checkout process

How it works

On every git add operation, the files assigned to the ZIP based file type in .gitattributes are piped through this filter to remove their compression. Git internally uses zlib compression to store the resulting blob, so the final size of the loose object in the repository is usually comparable to the size of the original compressed ZIP document.
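The clean step described above can be sketched roughly as follows. This is a minimal illustration and not the ReZip source: the class name StoreZipSketch and the method store are hypothetical, and the sketch assumes every entry fits in memory. It relies only on java.util.zip, where a STORED entry must have its size and CRC set before its data is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of a "store only" ZIP rewriter; not the ReZip source. */
class StoreZipSketch {

    /** Copies every entry of the ZIP read from in to out using the STORED method. */
    static void store(InputStream in, OutputStream out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            byte[] chunk = new byte[8192];
            ZipEntry src;
            while ((src = zin.getNextEntry()) != null) {
                // STORED entries need size and CRC up front, so buffer the
                // decompressed entry in memory before writing it.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                byte[] data = buf.toByteArray();

                CRC32 crc = new CRC32();
                crc.update(data);

                ZipEntry dst = new ZipEntry(src.getName());
                dst.setMethod(ZipEntry.STORED);
                dst.setSize(data.length);
                dst.setCompressedSize(data.length);
                dst.setCrc(crc.getValue());
                // Keep the original timestamp so identical input yields identical output.
                if (src.getTime() != -1) {
                    dst.setTime(src.getTime());
                }
                zout.putNextEntry(dst);
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    /** Filter usage: ZIP on stdin, uncompressed ZIP on stdout. */
    public static void main(String[] args) throws IOException {
        store(System.in, System.out);
    }
}
```

Copying the entry timestamp matters for a clean filter: without it, ZipOutputStream stamps each entry with the current time, and the same input would produce a different blob on every run.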

The advantage of passing uncompressed data to Git is that during garbage collection, when Git merges loose objects into packfiles, its delta compression can pack the common data shared by these uncompressed revisions more efficiently. This can reduce the repository size by up to 50%, depending on the data.

The smudge filter re-compresses the ZIP documents when they are checked out. The rezipped file may differ in size from the original because of the compression level used by the filter. Using this filter at checkout saves disk space in the working directory, at the expense of checkout performance. I have not yet found any application that refuses to read an uncompressed ZIP document, so the smudge filter is optional. This also means that repositories may be downloaded and used immediately, without any special burden on the recipients to install this filter driver.
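The re-compression at checkout could look something like the sketch below. It is an illustration under assumptions, not the ReZip implementation: the class name DeflateZipSketch and the method deflate are hypothetical, and the compression level shown is simply the java.util.zip default. It also assumes the input entries carry their sizes in the local header, as entries written with the STORED method normally do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of a re-compressing ZIP rewriter; not the ReZip source. */
class DeflateZipSketch {

    /** Copies every entry of the ZIP read from in to out using the DEFLATED method. */
    static void deflate(InputStream in, OutputStream out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            // The level chosen here is why the result may differ in size
            // from the file the application originally wrote.
            zout.setLevel(Deflater.DEFAULT_COMPRESSION);
            byte[] chunk = new byte[8192];
            ZipEntry src;
            while ((src = zin.getNextEntry()) != null) {
                ZipEntry dst = new ZipEntry(src.getName());
                if (src.getTime() != -1) {
                    dst.setTime(src.getTime());
                }
                // DEFLATED is the default method, so size and CRC are
                // computed by ZipOutputStream while streaming.
                zout.putNextEntry(dst);
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    zout.write(chunk, 0, n);
                }
                zout.closeEntry();
            }
        }
    }

    /** Filter usage: ZIP on stdin, compressed ZIP on stdout. */
    public static void main(String[] args) throws IOException {
        deflate(System.in, System.out);
    }
}
```

Unlike the STORED case, nothing has to be buffered here, because a DEFLATED entry does not need its size or CRC before the data is written.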

If other contributors add compressed ZIP documents to the repository without using the clean filter (the one applied during add/commit), the only harm will be the usual loss of packing efficiency for compressed documents during garbage collection, and non-verbose diffs.

Inspiration and similar projects

The idea to commit ZIP documents to the repository in uncompressed form was based on concepts demonstrated in the Mercurial Zipdoc extension by Andreas Gobell.

OoXmlUnpack is a similar program for Mercurial, written in C#, which also pretty-prints the XML files and adds some file handling features specific to Excel.

callegar/Rezip should be compatible with this Git filter, but is written as a bash script to drive Info-ZIP zip/unzip executables.

Zippey is a similar method available for Git, written in Python, but it stores uncompressed data as custom records within the Git repository. This format is not directly usable without the smudge filter, so it is a less portable option.

Human readable diffing

This filter is only concerned with the efficient storage of ZIP data within Git. For human-readable diffs between revisions, you will need to add a Git textconv program that can convert your format into text. Direct merges are not possible, since they would corrupt the ZIP CRC checksum. If the data within the ZIP is plain text, you could visualize differences with a textconv program like zipdoc. For more complex documents there are domain-specific options, for example for word processing documents, Excel, and Simulink.
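To make the textconv idea concrete, here is a minimal sketch of such a converter. It is not zipdoc and not part of ReZip: the class name ZipTextSketch is hypothetical, and the sketch assumes every entry is UTF-8 text (binary entries would come out as unreadable characters). Git calls a textconv program with the path of a file as its single argument and reads the text from stdout.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical textconv sketch: prints each ZIP entry name followed by its text. */
class ZipTextSketch {

    static void dump(InputStream in, PrintStream out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in)) {
            byte[] chunk = new byte[8192];
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                out.println("=== " + e.getName());
                if (e.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: entry content is UTF-8 text.
                out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipTextSketch <file.zip>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            dump(in, System.out);
        }
    }
}
```

A converter like this would be registered under a diff driver, using Git's diff.<driver>.textconv configuration key together with a diff=<driver> attribute in .gitattributes. The driver name is your choice.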

Installation

This program requires Java JRE 8 or newer. Store ReZip.class somewhere in your home directory, for example ~/bin, or in your repository.

Define the filter drivers in ~/.gitconfig:

git config --global --replace-all filter.rezip.clean "java -cp ~/bin ReZip --store"
# optionally add smudge filter:
git config --global --add filter.rezip.smudge "java -cp ~/bin ReZip"

Assign filter attributes to paths in <repo-root>/.gitattributes:

# MS Office
*.docx  filter=rezip
*.xlsx  filter=rezip
*.pptx  filter=rezip
# OpenOffice
*.odt   filter=rezip
*.ods   filter=rezip
*.odp   filter=rezip
# Misc
*.mcdx  filter=rezip
*.slx   filter=rezip

As described in gitattributes, you may see unnecessary merge conflicts when you add attributes to a file that cause the repository format for that file to change. To prevent this, Git can be told to run a virtual check-out and check-in of all three stages of a file when resolving a three-way merge:

git config --add --bool merge.renormalize true

Observations

The following are based on my experience in real-world cases. Use at your own risk. Your mileage may vary.

Simulink

  • One packed repository with rezip was 54% of the size of the packed repository storing compressed ZIPs.
  • Another repository with 280 *.slx files and over 3000 commits was originally 281 MB and was reduced to 156 MB using this technique (55% of baseline).

PowerPoint

I found that the loose objects stored without this filter were about 5% smaller than the original file size (zlib on top of ZIP compression). When using the rezip filter, the loose objects were about 10% smaller than the original files, since zlib could work more efficiently on uncompressed data. The packed repository with rezip was only 10% smaller than the packed repository storing compressed ZIPs. I think this unremarkable efficiency improvement is due to a large number of *.png files in the presentation which were already stored without compression in the original *.pptx.
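One way to check how much of a document is already stored without compression is to count its entries by compression method. The sketch below is only an illustration: the class name ZipMethodReport and the method countByMethod are hypothetical, and every method other than STORED is counted as "deflated" for simplicity.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical diagnostic: counts ZIP entries by compression method. */
class ZipMethodReport {

    static Map<String, Integer> countByMethod(File zip) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        // ZipFile reads the central directory, so the method of every
        // entry is known without decompressing any data.
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                String key = (e.getMethod() == ZipEntry.STORED) ? "stored" : "deflated";
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipMethodReport <file.zip>");
            return;
        }
        System.out.println(countByMethod(new File(args[0])));
    }
}
```

A document in which most of the bytes sit in already-stored entries leaves little for the clean filter to uncompress, which would be consistent with the modest gain observed here.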
