
SnowflakePowered / vcdiff

Licence: Apache-2.0 License
Heavily optimized .NET Core vcdiff library

Programming Languages

C#

Projects that are alternatives to or similar to vcdiff

xdelta-sharp
Decompressor for delta encoding VCDIFF (RFC-3284) -- xdelta3 compatible.
Stars: ✭ 27 (+68.75%)
Mutual labels:  xdelta3, vcdiff
xdelta3-python
Fast delta encoding in python using xdelta3
Stars: ✭ 30 (+87.5%)
Mutual labels:  xdelta3, vcdiff
deltaq
Fast and portable delta encoding for .NET in 100% safe, managed code.
Stars: ✭ 26 (+62.5%)
Mutual labels:  diff, vcdiff
composer-diff
Compares composer.lock changes and generates Markdown report so you can use it in PR description.
Stars: ✭ 51 (+218.75%)
Mutual labels:  diff
preact-source-learn
Preact + hooks source code analysis
Stars: ✭ 16 (+0%)
Mutual labels:  diff
avro ex
An Avro Library that emphasizes testability and ease of use.
Stars: ✭ 47 (+193.75%)
Mutual labels:  encoding
diff-check
Incremental code analysis tools based on checkstyle, pmd and jacoco
Stars: ✭ 48 (+200%)
Mutual labels:  diff
readtext
an R package for reading text files
Stars: ✭ 102 (+537.5%)
Mutual labels:  encoding
npmfs
javascript package inspector
Stars: ✭ 90 (+462.5%)
Mutual labels:  diff
base58
Fast implementation of base58 encoding on golang.
Stars: ✭ 121 (+656.25%)
Mutual labels:  encoding
deen
Generic data DEcoding/ENcoding application built with PyQt5.
Stars: ✭ 45 (+181.25%)
Mutual labels:  encoding
urdu-characters
📄 Complete collection of Urdu language characters & unicode code points.
Stars: ✭ 24 (+50%)
Mutual labels:  encoding
iconv
Fast encoding conversion library for Erlang / Elixir
Stars: ✭ 45 (+181.25%)
Mutual labels:  encoding
FastDiff
General purpose diffing library with parent/children n-level diffing
Stars: ✭ 36 (+125%)
Mutual labels:  diff
TyStrings
strings file tool for iOS / macOS developers
Stars: ✭ 15 (-6.25%)
Mutual labels:  diff
dotfiles
my dot files with git and docker extension for windows and linux
Stars: ✭ 13 (-18.75%)
Mutual labels:  diff
Lingo
Text encoding for modern C++
Stars: ✭ 28 (+75%)
Mutual labels:  encoding
go-gitdiff
Go library for parsing and applying patches created by Git
Stars: ✭ 41 (+156.25%)
Mutual labels:  diff
euv
Writing a fairly capable Vue that supports a virtual DOM, diff-based updates, and the basic API. 'vue'.split('').sort().join('') === 'euv'
Stars: ✭ 18 (+12.5%)
Mutual labels:  diff
dark-lord-obama
AV-evading Pythonic Reverse Shell with Dynamic Adaption Capabilities
Stars: ✭ 61 (+281.25%)
Mutual labels:  encoding

vcdiff


This is a hard fork of VCDiff (originally written by Metric), made primarily for use in Snowflake.

Large chunks have been rewritten and heavily optimized for speed, using vector intrinsics, the Memory<byte> and Span<byte> APIs, and a sprinkling of unsafe pointer access to eke out every bit of performance possible. Non-scientific preliminary testing showed a 30x to 50x speedup over the original library when diffing a 2MB file.

Support for xdelta3 checksums has also been included. Testing was done with xdelta 3.1; support for xdelta 3.0 patch files has not been tested. Only patch files without external compression (-S none) are supported.

| Format | Encoding | Decoding |
|---|---|---|
| RFC3284-compliant VCDIFF | ✔️ | ✔️ |
| SDCH with Adler32 Checksum | ✔️ | ✔️ |
| SDCH Interleaved (with and without Adler32 Checksum) | ✔️ | ✔️ |
| xdelta3 with Adler32 Checksum (without compression) | ✔️ | ✔️ |
| xdelta3 with Adler32 Checksum and VCD_APPHEADER (without compression) |  | ✔️ |
| xdelta3 with external compression |  |  |
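Several rows above rely on an Adler-32 checksum of each target window. As a rough illustration of what that checksum is, here is a minimal sketch of standard Adler-32 as defined in RFC 1950 (initial value 1). The class name `Adler32Sketch` is hypothetical and this is not the library's implementation; exactly which bytes each container format checksums, and with what seed, is not shown here.

```csharp
using System;

// Minimal Adler-32 sketch (RFC 1950). Hypothetical helper, not the library's code.
public static class Adler32Sketch
{
    // Largest prime smaller than 2^16, as specified by RFC 1950.
    private const uint Mod = 65521;

    public static uint Compute(ReadOnlySpan<byte> data)
    {
        uint a = 1; // running sum of the bytes, seeded with 1
        uint b = 0; // running sum of the successive a values
        foreach (byte value in data)
        {
            a = (a + value) % Mod;
            b = (b + a) % Mod;
        }
        // b goes in the high 16 bits, a in the low 16 bits.
        return (b << 16) | a;
    }
}
```

For the ASCII string "Wikipedia" this yields 0x11E60398, and for empty input it yields 1.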

Wherever possible, SSE3 or AVX2 extensions are used on supported systems. Speeds are comparable to, albeit slightly slower than, the native xdelta3, depending on the chosen block size. A lot of work has gone into eliminating garbage collection overhead, reducing memory access overhead through Memory<T>, and parallelizing computational work with SIMD extensions.
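To give a feel for how SIMD helps in a delta encoder, the sketch below measures how far two byte spans match, comparing a whole vector of bytes per step and finishing with a scalar loop. It uses the portable `System.Numerics.Vector<T>` type, whose width follows the hardware (for example 32 bytes when AVX2 is available), rather than explicit SSE3/AVX2 intrinsics. `MatchLengthSketch` is a hypothetical name and this is not the library's code.

```csharp
using System;
using System.Numerics;

public static class MatchLengthSketch
{
    // Returns how many leading bytes of a and b are equal.
    public static int CommonPrefixLength(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        int length = Math.Min(a.Length, b.Length);
        int i = 0;
        int width = Vector<byte>.Count; // bytes compared per vector step

        if (Vector.IsHardwareAccelerated)
        {
            // Compare one full vector at a time until a block differs.
            while (i + width <= length)
            {
                var va = new Vector<byte>(a.Slice(i, width));
                var vb = new Vector<byte>(b.Slice(i, width));
                if (!Vector.EqualsAll(va, vb))
                    break; // the scalar loop below pins down the exact byte
                i += width;
            }
        }

        // Scalar tail: handles the remainder or locates the first mismatch.
        while (i < length && a[i] == b[i])
            i++;
        return i;
    }
}
```

The scalar fallback keeps the result identical whether or not vector hardware is available; only the speed changes.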

The original README follows, with some changes to the API usage examples.

This is a full C# port of Google's open-vcdiff, written entirely in C# with no external C++ libraries required. It includes proper SDCH support with interleaving and checksums. The only thing it does not currently support is encoding with a custom CodeTable; this will be added later if requested, or feel free to add it and send a pull request.

It is fully compatible with Google's open-vcdiff for encoding and decoding. If you find any bugs, please let me know. I tried to test as thoroughly as possible against Google's GitHub version. The largest file I tested with was 10MB. It should be able to support files of up to 2-4GB, depending on your system.

Requirements

Vector intrinsics and the Span<T> and Memory<T> APIs require .NET Standard 2.1.

Encoding Data

The dictionary must be a file or data that is already in memory. It must be fully read in first in order to encode properly; this is simply how the VCDIFF algorithm works. The encode function is blocking.

using System.IO;
using VCDiff.Include;
using VCDiff.Encoders;
using VCDiff.Shared;

void DoEncode() {
    using(FileStream output = new FileStream("...some output path", FileMode.Create, FileAccess.Write))
    using(FileStream dict = new FileStream("..dictionary / old file path", FileMode.Open, FileAccess.Read))
    using(FileStream target = new FileStream("..target data / new data path", FileMode.Open, FileAccess.Read)) {
        VcEncoder coder = new VcEncoder(dict, target, output);
        VCDiffResult result = coder.Encode(); //encodes with no checksum and not interleaved
        if(result != VCDiffResult.SUCCESS) {
            //error was not able to encode properly
        }
    }
}
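Because the dictionary must be fully readable before encoding, one option when it comes from a non-seekable source such as a network stream is to buffer it into memory first and pass the resulting `MemoryStream` as `dict`. This is a minimal sketch, assuming the encoder accepts any readable, seekable `Stream` (the example above only shows `FileStream`); `DictionaryBuffering.ReadFully` is a hypothetical helper, not part of the library.

```csharp
using System.IO;

public static class DictionaryBuffering
{
    // Hypothetical helper: copies everything remaining in 'source' into memory
    // and rewinds the copy, so it can be read from the start and seeked freely.
    public static MemoryStream ReadFully(Stream source)
    {
        var buffer = new MemoryStream();
        source.CopyTo(buffer); // reads until the source reports end of stream
        buffer.Position = 0;   // rewind so the consumer starts at the first byte
        return buffer;
    }
}
```

Usage would then look like `new VcEncoder(DictionaryBuffering.ReadFully(networkStream), target, output)`, where `networkStream` stands for your own source. The trade-off is memory: the whole dictionary is held in RAM.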

Encoding with a checksum, interleaving, or both

coder.Encode(interleaved: true, checksum: false);
coder.Encode(interleaved: true, checksum: true);
coder.Encode(interleaved: false, checksum: true);

Modifying the default window chunk size

int windowSize = 2; // in megabytes; the default is 1MB window chunks

VcEncoder coder = new VcEncoder(dict, target, output, windowSize);
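As a rough guide to what `windowSize` controls, the arithmetic below counts how many windows a target of a given length would be split into. It assumes consecutive windows of `windowSize` megabytes (taken here as 1024 * 1024 bytes) with a possibly partial final window; the exact splitting rule the library uses is an assumption, and `WindowMath` is a hypothetical name.

```csharp
using System;

public static class WindowMath
{
    // Hypothetical helper. Assumes the target is cut into consecutive windows of
    // windowSizeMb megabytes (1 MB = 1024 * 1024 bytes), the last one possibly partial.
    public static long WindowCount(long targetLength, int windowSizeMb)
    {
        if (targetLength < 0) throw new ArgumentOutOfRangeException(nameof(targetLength));
        if (windowSizeMb <= 0) throw new ArgumentOutOfRangeException(nameof(windowSizeMb));
        long windowBytes = windowSizeMb * 1024L * 1024L;
        // Ceiling division: a partial final window still counts as one window.
        return (targetLength + windowBytes - 1) / windowBytes;
    }
}
```

Under this assumption, a 5MB target with `windowSize = 2` would be split into 3 windows.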

Modifying the default minimum copy encode size: a match must be at least this many bytes long to qualify as a copy from the dictionary file.

// chunkSize is the minimum copy encode size.
// Default is 32 bytes. Lowering this can improve the delta compression for small files. 
// It must be a power of 2. 
VcEncoder coder = new VcEncoder(dict, target, output, blockSize: 8, chunkSize: 16);
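Since `chunkSize` must be a power of 2, a caller could validate it before constructing the encoder. A minimal sketch using the standard bit trick: `x & (x - 1)` clears the lowest set bit, so the result is zero exactly when a single bit is set. `ChunkSizeCheck` is a hypothetical helper; whether the library itself rejects other values is not stated here.

```csharp
public static class ChunkSizeCheck
{
    // Hypothetical pre-check for the "must be a power of 2" rule.
    // x & (x - 1) clears the lowest set bit, leaving 0 only if one bit was set.
    public static bool IsPowerOfTwo(int chunkSize) =>
        chunkSize > 0 && (chunkSize & (chunkSize - 1)) == 0;
}
```

For example, 16 and 32 pass, while 24, 0, and negative values fail.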

Modifying the default BlockSize for hashing

// Increasing blockSize for large files with similar data can improve results.
VcEncoder coder = new VcEncoder(dict, target, output, blockSize: 32);
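Block-based delta encoders in the open-vcdiff family index the dictionary in fixed-size blocks and slide a rolling hash across the target one byte at a time to find candidate matches, which is why the block size affects results. The sketch below is a Rabin-Karp style polynomial rolling hash over `blockSize` bytes. The base, the modulus, and the class name `RollingHashSketch` are arbitrary assumptions, not the library's actual hash.

```csharp
using System;

// Rabin-Karp style rolling hash over a fixed-size block (illustrative only).
public sealed class RollingHashSketch
{
    private const ulong Base = 257;            // arbitrary polynomial base
    private const ulong Mod = 1_000_000_007;   // arbitrary prime modulus
    private readonly ulong _removeFactor;      // Base^(BlockSize - 1) mod Mod

    public int BlockSize { get; }

    public RollingHashSketch(int blockSize)
    {
        if (blockSize <= 0) throw new ArgumentOutOfRangeException(nameof(blockSize));
        BlockSize = blockSize;
        ulong f = 1;
        for (int i = 1; i < blockSize; i++) f = f * Base % Mod;
        _removeFactor = f;
    }

    // Hash of exactly BlockSize bytes, computed from scratch.
    public ulong Hash(ReadOnlySpan<byte> block)
    {
        if (block.Length != BlockSize)
            throw new ArgumentException("Block length must equal BlockSize.", nameof(block));
        ulong h = 0;
        foreach (byte b in block) h = (h * Base + b) % Mod;
        return h;
    }

    // Slides the window by one byte in O(1): drops 'outgoing', appends 'incoming'.
    public ulong Roll(ulong hash, byte outgoing, byte incoming)
    {
        ulong withoutOld = (hash + Mod - outgoing * _removeFactor % Mod) % Mod;
        return (withoutOld * Base + incoming) % Mod;
    }
}
```

Rolling gives the same value as rehashing each window from scratch, but costs constant time per target position instead of time proportional to the block size.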

Decoding Data

The dictionary must be a file or data that is already in memory. The file must be fully read in first in order to decode properly.

Do note that the interleaved version of a delta file is meant for streaming, and the decoder already supports it. The non-interleaved version, however, expects to be able to read the full delta file at one time. The delta file is still streamed, but it must be readable fully in sequential order.

using System.IO;
using VCDiff.Include;
using VCDiff.Decoders;
using VCDiff.Shared;

void DoDecode() {
    using (FileStream output = new FileStream("...some output path", FileMode.Create, FileAccess.Write))
    using (FileStream dict = new FileStream("..dictionary / old file path", FileMode.Open, FileAccess.Read))
    using (FileStream target = new FileStream("..delta encoded part", FileMode.Open, FileAccess.Read)) {
        VcDecoder decoder = new VcDecoder(dict, target, output);

        // The header of the delta file must be available before the first call to decoder.Decode().
        long bytesWritten = 0;
        VCDiffResult result = decoder.Decode(out bytesWritten);

        if(result != VCDiffResult.SUCCESS) {
            //error decoding
        }

        // if success bytesWritten will contain the number of bytes that were decoded
    }
}

Streaming the interleaved format uses the same setup, except that you keep calling Decode until you know you have received everything, so you need to keep track of that yourself. On every pass through the loop, make sure the buffer holds enough data to decode at least the next VCDIFF window header (which can be up to 22 bytes or so). After that, the decode function handles waiting for the rest of the interleaved data for that VCDIFF window. The decode function is blocking.

while (bytesWritten < someSizeThatYouAreExpecting) {
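    // make sure enough unread data is buffered to at least try to decode the next window header;
    // otherwise we will probably receive an error.
    if (myStream.Length - myStream.Position < 22) {
        Thread.Sleep(1); // wait for more data instead of busy-spinning (System.Threading)
        continue;
    }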

    long thisChunk = 0;
    VCDiffResult result = decoder.Decode(out thisChunk);

    bytesWritten += thisChunk;

    if (result == VCDiffResult.ERROR) {
        // it failed to decode something
        // could be an issue that the window failed to parse
        // or actual data failed to decode properly
        break;
    }

    // Otherwise keep going on SUCCESS or EOD (end of data):
    // only you know when all of the data has finished loading.
    // The decoder does not give up when nothing is available; it keeps trying until more arrives.
}
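The window header fields the loop waits for (source segment size and position, length of the delta encoding, target window length, and so on) are stored as RFC 3284 variable-length integers, so a header cut off in the middle of an integer cannot be parsed yet. Below is a minimal sketch of decoding one such integer that reports "need more data" instead of failing; `Rfc3284Integer.TryRead` is a hypothetical helper, not the decoder's own parser.

```csharp
using System;

public static class Rfc3284Integer
{
    private const int MaxBytes = 10; // enough 7-bit groups to cover 64 bits

    // Decodes one RFC 3284 variable-length integer starting at 'offset'.
    // Each byte carries 7 bits, most significant group first, and the high bit
    // is set on every byte except the last. Returns false when the buffer ends
    // before the final byte, meaning more data has to arrive first.
    public static bool TryRead(ReadOnlySpan<byte> buffer, int offset, out ulong value, out int bytesRead)
    {
        value = 0;
        bytesRead = 0;
        for (int i = offset; i < buffer.Length; i++)
        {
            if (bytesRead == MaxBytes)
                throw new FormatException("Integer encoding is longer than 10 bytes.");
            byte b = buffer[i];
            value = (value << 7) | (ulong)(b & 0x7F);
            bytesRead++;
            if ((b & 0x80) == 0)
                return true; // high bit clear: this was the last byte
        }
        // Incomplete integer: report nothing consumed.
        value = 0;
        bytesRead = 0;
        return false;
    }
}
```

Using the example from RFC 3284 itself, the bytes 0xBA 0xEF 0x9A 0x15 decode to 123456789, while only the first two of those bytes yield `false`.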

License

vcdiff is a derivative work of open-vcdiff and xdelta3, and is therefore also licensed under the Apache License 2.0.
