oll3 / Bita

Licence: MIT
Differential file synchronization over http

Programming Languages

rust

Projects that are alternatives of or similar to Bita

Casync
Content-Addressable Data Synchronization Tool
Stars: ✭ 890 (+513.79%)
Mutual labels:  synchronization, download
Download Manager
A Chrome extension for managing downloads
Stars: ✭ 141 (-2.76%)
Mutual labels:  download
Skill Share Crawler Dl
Download Videos Skill Share per ID or per Class
Stars: ✭ 122 (-15.86%)
Mutual labels:  download
Aria2 With Webui
docker of aria2 & webui
Stars: ✭ 132 (-8.97%)
Mutual labels:  download
Youtubeexplode
The ultimate dirty YouTube library
Stars: ✭ 1,775 (+1124.14%)
Mutual labels:  download
Videodownloadhelper
Chrome Extension to Help Download Video for Some Video Sites.
Stars: ✭ 136 (-6.21%)
Mutual labels:  download
Alltube
Web GUI for youtube-dl
Stars: ✭ 1,925 (+1227.59%)
Mutual labels:  download
Fgdownloader
For resumable downloads, task queues, upload progress, and download progress
Stars: ✭ 143 (-1.38%)
Mutual labels:  download
Magnetx
Resource search software for macOS (OS X), magnet links
Stars: ✭ 1,819 (+1154.48%)
Mutual labels:  download
Magnetw
Aggregated magnet link search
Stars: ✭ 12,621 (+8604.14%)
Mutual labels:  download
Releases
dahliaOS ISO releases
Stars: ✭ 125 (-13.79%)
Mutual labels:  download
The Economist Ebooks
The Economist (with audio), The New Yorker, Nature, New Scientist, The Guardian, Scientific American, Wired, The Atlantic, Newsweek, National Geographic and other English-language magazines: free download and subscription (Kindle delivery) in epub, mobi, and pdf formats, updated weekly.
Stars: ✭ 3,471 (+2293.79%)
Mutual labels:  download
4chan Downloader
Python3 script to continuously download all images/webms of multiple 4chan thread simultaneously - without installation
Stars: ✭ 136 (-6.21%)
Mutual labels:  download
Vs Deploy
Visual Studio Code extension that provides commands to deploy files of a workspace to a destination.
Stars: ✭ 123 (-15.17%)
Mutual labels:  download
Frontend Download Sample
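🎄 A collection of upload and download demos gathered from my own projects, offered as a reference to help you avoid pitfalls; plug and play. Forks and stars 🌟 are welcome, as are contributions to this repository. (P.S. I think uploads and downloads are actually quite troublesome if you have never written them before.)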
Stars: ✭ 142 (-2.07%)
Mutual labels:  download
Syncthing Android
Wrapper of syncthing for Android.
Stars: ✭ 1,812 (+1149.66%)
Mutual labels:  synchronization
Downloadsearch
search for any kinds of files to download
Stars: ✭ 124 (-14.48%)
Mutual labels:  download
Realm Studio
Realm Studio
Stars: ✭ 134 (-7.59%)
Mutual labels:  synchronization
M3u8downloader
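M3U8 download library: downloads M3U8 videos, supports M3U8 redirects, and also downloads files in other formats such as MP4. Currently available in three language versions: Kotlin, Java, and Python 3.x.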
Stars: ✭ 145 (+0%)
Mutual labels:  download
Desync
Alternative casync implementation
Stars: ✭ 140 (-3.45%)
Mutual labels:  synchronization

bita

bita is an HTTP-based file synchronization tool that strives for low bandwidth usage through data reuse.

  • Clone from remote while reusing data from any local file or device 📁
  • Clone using a file or block device as output 💾
  • Host archives using any regular HTTP/HTTPS server or service 🔗
  • Include in your own project with the bitar library 💫
  • Written in Rust for fun, performance and quality 🚀♥

Software updates

bita is a generic file synchronization tool, but it has been developed with software updates of embedded/IoT systems in mind.

Software updates are a typical case where bita can significantly reduce bandwidth, since a new software image usually contains a lot of data already present on the system being updated. bita identifies the parts (chunks) already present on the system and fetches only the missing ones from the remote, while still outputting an exact clone of the archived source file.

There is no need to pre-build patch files for moving between release versions, and no need to run any special file server. Just run bita compress on the release image and upload the archive to any HTTP file hosting site. Then run bita clone on the archive, using whatever local data is available on the system.

Compressing

On compression the input file is scanned for chunk boundaries using a rolling hash. With the default settings a suitable boundary should be found every ~64 KiB. A chunk is defined as the data contained between two boundaries. For each chunk a strong hash is generated (using blake2). The chunk location (offset and size) in the input file and the strong hash are then stored in the dictionary. If a chunk's strong hash has not been seen before, the chunk data is also compressed (using brotli) and inserted into the output archive.

The final archive will contain a dictionary describing the order of chunks in the input file and the compressed chunks necessary to rebuild the input file. The archive will also contain the configuration used when scanning input for chunks.
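The dictionary and deduplication steps described above can be sketched roughly as follows. This is a minimal illustration, not bita's actual code or archive format: the names Entry, Archive and build_archive are hypothetical, the chunk boundaries are assumed to be given, std's SipHash stands in for blake2, and the brotli compression step is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for a strong hash. bita uses blake2; std's SipHash is used
/// here only so the sketch has no external dependencies.
pub fn strong_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// One dictionary entry: where a chunk lives in the input, and its hash.
pub struct Entry {
    pub offset: usize,
    pub size: usize,
    pub hash: u64,
}

/// Hypothetical in-memory archive: the dictionary plus each unique chunk once.
pub struct Archive {
    pub dictionary: Vec<Entry>,
    pub chunks: Vec<(u64, Vec<u8>)>,
}

/// `boundaries` holds the exclusive end offset of every chunk, in order.
pub fn build_archive(input: &[u8], boundaries: &[usize]) -> Archive {
    let mut dictionary = Vec::new();
    let mut chunks = Vec::new();
    let mut seen = HashSet::new();
    let mut start = 0;
    for &end in boundaries {
        let data = &input[start..end];
        let hash = strong_hash(data);
        // Every chunk gets a dictionary entry, even if it is a repeat.
        dictionary.push(Entry { offset: start, size: data.len(), hash });
        // The chunk data is stored only the first time its hash is seen.
        // (A real archive would also compress the data at this point.)
        if seen.insert(hash) {
            chunks.push((hash, data.to_vec()));
        }
        start = end;
    }
    Archive { dictionary, chunks }
}
```

For an input where the same chunk occurs twice, the dictionary gets two entries with the same hash while the chunk data is stored once, which is where the space saving comes from.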

Cloning

On clone the dictionary and chunker configuration are first fetched from the remote archive. Then the given seed file(s) are scanned for chunks present in the dictionary. Scanning is done using the same configuration as when building the archive. Any chunk found in a seed file is copied into the output file at the location(s) specified by the dictionary. When all seeds have been consumed, the chunks still missing, if any, are fetched from the remote archive, decompressed and inserted into the output file.
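As a rough sketch of the two phases above (seeds first, then the remote archive), assuming everything fits in memory: the function clone_archive and the two chunk maps are hypothetical stand-ins for scanned seed files and HTTP fetches, and chunk ids are plain u64 values rather than blake2 digests.

```rust
use std::collections::{HashMap, HashSet};

/// Dictionary entry: which chunk goes where in the output.
pub struct Entry {
    pub id: u64,
    pub offset: usize,
    pub size: usize,
}

/// Rebuilds the output and returns it with the number of bytes that had to
/// come from the remote. Returns None if a needed chunk is unavailable.
pub fn clone_archive(
    dictionary: &[Entry],
    seed_chunks: &HashMap<u64, Vec<u8>>,
    remote_chunks: &HashMap<u64, Vec<u8>>,
) -> Option<(Vec<u8>, usize)> {
    let total = dictionary.iter().map(|e| e.offset + e.size).max().unwrap_or(0);
    let mut output = vec![0u8; total];

    // Phase 1: copy every chunk that a seed can provide.
    let mut missing: Vec<&Entry> = Vec::new();
    for entry in dictionary {
        match seed_chunks.get(&entry.id) {
            Some(data) if data.len() == entry.size => {
                output[entry.offset..entry.offset + entry.size].copy_from_slice(data);
            }
            _ => missing.push(entry),
        }
    }

    // Phase 2: take the rest from the remote, counting each chunk once.
    let mut fetched_ids = HashSet::new();
    let mut fetched_bytes = 0;
    for entry in missing {
        let data = remote_chunks.get(&entry.id)?;
        if data.len() != entry.size {
            return None;
        }
        if fetched_ids.insert(entry.id) {
            fetched_bytes += data.len();
        }
        output[entry.offset..entry.offset + entry.size].copy_from_slice(data);
    }
    Some((output, fetched_bytes))
}
```

The returned byte count makes the bandwidth saving visible: only chunks that no seed could provide are counted.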

To keep the HTTP overhead low while cloning, all adjacent chunks are fetched with a single request. If possible, the same connection is used for the whole clone operation.
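One way to picture the request merging is to coalesce the byte ranges of the missing chunks before building Range headers. The sketch below is an assumption about how this could be done, not bita's implementation; ByteRange, coalesce and range_header are illustrative names, and every range is assumed to have a size greater than zero.

```rust
/// A byte range in the archive as (offset, size), with size > 0.
pub type ByteRange = (u64, u64);

/// Sorts the ranges and merges those that directly follow each other.
pub fn coalesce(mut ranges: Vec<ByteRange>) -> Vec<ByteRange> {
    ranges.sort();
    let mut merged: Vec<ByteRange> = Vec::new();
    for (offset, size) in ranges {
        let mut extended = false;
        if let Some(last) = merged.last_mut() {
            // Adjacent: the previous range ends exactly where this one starts.
            if last.0 + last.1 == offset {
                last.1 += size;
                extended = true;
            }
        }
        if !extended {
            merged.push((offset, size));
        }
    }
    merged
}

/// Formats the value of an HTTP Range header. The end position of a
/// byte range is inclusive, hence the `- 1`.
pub fn range_header(range: ByteRange) -> String {
    let (offset, size) = range;
    format!("bytes={}-{}", offset, offset + size - 1)
}
```

With this, four missing chunks at offsets 0, 10, 20 and 30, of which the first two are adjacent, would need three requests instead of four.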

bita can also use the output file as a seed and reorganize chunks in place. Besides saving bandwidth, this lets bita avoid rewriting chunks that are already in place in the output file. This may be useful if writing to storage is slow or if we want to limit wear on the storage device.

Each chunk, whether taken from a seed or fetched from the archive, is verified against its strong hash before being written to the output. bita does not use any extra storage space while cloning; the only file written to is the given output file.
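The verify-then-write rule, together with skipping chunks that are already in place, could look something like the sketch below. It is only an illustration under simplifying assumptions: the output is an in-memory buffer, std's SipHash stands in for blake2, and write_verified and WriteOutcome are hypothetical names. The function panics if the chunk does not fit inside the buffer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a strong hash (bita uses blake2).
pub fn strong_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
pub enum WriteOutcome {
    Written,
    AlreadyInPlace,
}

/// Writes `chunk` at `offset` only if it matches `expected`, and skips the
/// write entirely when the output already holds the same bytes there.
pub fn write_verified(
    output: &mut [u8],
    offset: usize,
    chunk: &[u8],
    expected: u64,
) -> Result<WriteOutcome, String> {
    if strong_hash(chunk) != expected {
        return Err(format!("chunk at offset {} failed verification", offset));
    }
    let target = &mut output[offset..offset + chunk.len()];
    if *target == *chunk {
        // Nothing to do: avoids a needless write to slow or wear-prone storage.
        return Ok(WriteOutcome::AlreadyInPlace);
    }
    target.copy_from_slice(chunk);
    Ok(WriteOutcome::Written)
}
```

A corrupted chunk is rejected before it touches the output, so a failed verification leaves the output unchanged.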

Scanning for chunks

The process of splitting a file into chunks is heavily inspired by the one used in rsync: a window of bytes (64 by default for RollSum and 20 for BuzHash) slides through the input data, one byte at a time.

For every position of the window a short checksum is generated. Assuming the checksum is evenly distributed, it falls within a given range of values with a fixed probability at each position, and so on average once every n bytes, where n is the target average chunk size.

When the checksum is within this range a chunk boundary has been found. A strong hash (blake2) is then generated for the data between the last boundary and this one. The strong hash is the one used to identify this chunk while the weaker rolling hash is never stored but only used for finding chunk boundaries.

The target average chunk size and the upper and lower limits of a chunk's size are configurable at runtime.
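To make the boundary search more concrete, here is a minimal content-defined chunker using an rsync-style two-part rolling checksum. It is a sketch only: it is not bita's RollSum or BuzHash implementation, and ChunkerConfig, its fields and the "all mask bits set" boundary test are assumptions chosen for illustration. With an evenly distributed checksum, a mask of 0x3f would give a boundary roughly every 64 bytes.

```rust
/// Hypothetical chunker settings; the names are illustrative.
pub struct ChunkerConfig {
    pub window: usize,   // rolling window size in bytes
    pub mask: u32,       // boundary when (checksum & mask) == mask
    pub min_size: usize, // lower limit of a chunk's size
    pub max_size: usize, // upper limit of a chunk's size
}

/// Returns the exclusive end offset of every chunk in `data`.
pub fn chunk_boundaries(data: &[u8], cfg: &ChunkerConfig) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut start = 0usize;
    // `a` is the sum of the bytes in the window, `b` the sum of the
    // successive values of `a` over the window (as in rsync's checksum).
    let (mut a, mut b) = (0u32, 0u32);
    for (i, &byte) in data.iter().enumerate() {
        let incoming = byte as u32;
        // Byte leaving the window; zero while the window is still filling.
        let outgoing = if i >= cfg.window { data[i - cfg.window] as u32 } else { 0 };
        a = a.wrapping_add(incoming).wrapping_sub(outgoing);
        b = b
            .wrapping_add(a)
            .wrapping_sub((cfg.window as u32).wrapping_mul(outgoing));
        let checksum = (b << 16) | (a & 0xffff);

        let len = i + 1 - start;
        let hit = (checksum & cfg.mask) == cfg.mask;
        // Cut on a checksum hit (once the chunk is large enough), or
        // unconditionally when the chunk reaches its upper size limit.
        if (hit && len >= cfg.min_size) || len >= cfg.max_size {
            boundaries.push(i + 1);
            start = i + 1;
        }
    }
    // Whatever remains after the last boundary forms the final chunk.
    if start < data.len() {
        boundaries.push(data.len());
    }
    boundaries
}
```

Because the checksum depends only on the bytes inside the window and is not reset at chunk boundaries, the same data tends to produce the same boundaries even when it sits at a different offset in another file, which is what makes chunks from a seed reusable.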

Server requirements

The server serving bita archives can be any HTTP/HTTPS server that supports range requests, which most do.

Install from crates.io

Install bita using cargo:

$ cargo install bita

Build from source

$ cargo build

Build in release mode with rustls TLS backend:

$ cargo build --release --no-default-features --features rustls-tls

Example usage

Create a compressed archive release_v1.1.ext4.cba from file release_v1.1.ext4:

$ bita compress -i release_v1.1.ext4 release_v1.1.ext4.cba

Clone using block device /dev/mmcblk0p1 as seed and /dev/mmcblk0p2 as target:

$ bita clone --seed /dev/mmcblk0p1 https://host/release_v1.1.ext4.cba /dev/mmcblk0p2

Clone and use output (/dev/mmcblk0p1) as seed while cloning:

$ bita clone --seed-output https://host/release_v1.1.ext4.cba /dev/mmcblk0p1

Local archives can also be cloned:

$ bita clone --seed-output local.cba local_output.file

Clone file at https://host/new.tar.cba using stdin (-) and block device /dev/sda1 as seed:

$ gunzip -c old.tar.gz | bita clone --seed /dev/sda1 --seed - https://host/new.tar.cba new.tar

Compare two filesystem images to see how much content they share with different chunking parameters:

$ bita diff release_v1.0.ext4 release_v1.1.ext4
$ bita diff --hash-chunking BuzHash --avg-chunk-size 8KiB release_v1.0.ext4 release_v1.1.ext4

Similar tools and inspiration
