green-coder / cdc

Licence: MIT license
A library for performing Content-Defined Chunking (CDC) on data streams.

Programming Languages

rust
11053 projects

Projects that are alternatives to or similar to cdc

tornado
The Tornado 🌪️ framework, designed and implemented for adaptive online learning and data stream mining in Python.
Stars: ✭ 110 (+511.11%)
Mutual labels:  data-stream
HadoopDedup
🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
Stars: ✭ 27 (+50%)
Mutual labels:  cdc
azure-sql-db-change-stream-debezium
SQL Server Change Stream sample using Debezium
Stars: ✭ 74 (+311.11%)
Mutual labels:  cdc
imgref
A trivial Rust struct for interchange of pixel buffers with width, height & stride
Stars: ✭ 45 (+150%)
Mutual labels:  rust-library
prettysize-rs
Pretty-print file sizes and more
Stars: ✭ 29 (+61.11%)
Mutual labels:  rust-library
actix-derive
[ARCHIVED] development moved into main actix repo
Stars: ✭ 38 (+111.11%)
Mutual labels:  rust-library
rsmorphy
Morphological analyzer / inflection engine for Russian and Ukrainian languages rewritten in Rust
Stars: ✭ 27 (+50%)
Mutual labels:  rust-library
southpaw
⚾ Streaming left joins in Kafka for change data capture
Stars: ✭ 48 (+166.67%)
Mutual labels:  cdc
pg-logical-replication
PostgreSQL Logical Replication client for node.js
Stars: ✭ 56 (+211.11%)
Mutual labels:  cdc
rust-ipfs-api
Rust language IPFS API implementation
Stars: ✭ 20 (+11.11%)
Mutual labels:  rust-library
httper
An asynchronous HTTP(S) client built on top of hyper.
Stars: ✭ 16 (-11.11%)
Mutual labels:  rust-library
finny.rs
Finite State Machines for Rust
Stars: ✭ 48 (+166.67%)
Mutual labels:  rust-library
e621 downloader
E621 and E926 downloader made in the Rust programming language.
Stars: ✭ 39 (+116.67%)
Mutual labels:  rust-library
rust-lcms2
ICC color profiles in Rust
Stars: ✭ 25 (+38.89%)
Mutual labels:  rust-library
rdp
A library providing FFI access to fast Ramer–Douglas–Peucker and Visvalingam-Whyatt line simplification algorithms
Stars: ✭ 20 (+11.11%)
Mutual labels:  rust-library
type-metadata
Rust type metadata reflection library
Stars: ✭ 27 (+50%)
Mutual labels:  rust-library
analyzing-reddit-sentiment-with-aws
Learn how to use Kinesis Firehose, AWS Glue, S3, and Amazon Athena by streaming and analyzing reddit comments in realtime. 100-200 level tutorial.
Stars: ✭ 40 (+122.22%)
Mutual labels:  data-stream
webbrowser-rs
Rust library to open URLs in the web browsers available on a platform
Stars: ✭ 150 (+733.33%)
Mutual labels:  rust-library
mailparse
Rust library to parse mail files
Stars: ✭ 148 (+722.22%)
Mutual labels:  rust-library
rabe
rabe is an Attribute Based Encryption library, written in Rust
Stars: ✭ 52 (+188.89%)
Mutual labels:  rust-library

cdc

A library for performing Content-Defined Chunking (CDC) on data streams. It is implemented using generic iterators and is easy to use.

Example

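  use std::fs::File;
  use std::io::{BufReader, Read};
  use cdc::SeparatorIter;

  // Open the input file and expose it as an iterator over its bytes.
  let file = File::open("myLargeFile.bin").unwrap();
  let reader: BufReader<File> = BufReader::new(file);
  let byte_iter = reader.bytes().map(|b| b.unwrap());

  // Finds and iterates on the separators.
  for separator in SeparatorIter::new(byte_iter) {
    println!("Index: {}, hash: {:016x}", separator.index, separator.hash);
  }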

Each module is documented via an example which you can find in the examples/ folder.

To run them, use a command like:

cargo run --example separator --release

Note: Some examples look for a file named myLargeFile.bin, which I didn't upload to GitHub. Please use your own files for testing.

What's in the crate

From low level to high level:

  • A RollingHash64 trait, for rolling hashes with a 64-bit hash value.

  • Rabin64, an implementation of the Rabin Fingerprint rolling hash with a 64-bit hash value.

  • Separator, a struct which describes a place in a data stream identified as a separator.

  • SeparatorIter, an adaptor which takes an Iterator<Item=u8> as input and which enumerates all the separators found.

  • Chunk, a struct which describes a piece of the data stream (index and size).

  • ChunkIter, an adaptor which takes an Iterator<Item=Separator> as input and which enumerates chunks.
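To illustrate the last two items, here is a minimal, self-contained sketch of how a sequence of separator positions can be turned into chunk descriptions (index and size). It does not use the crate: the names Piece and pieces_from_cuts are hypothetical, and the rule that each cut position marks the start of the next piece is an assumption made for this example, not a statement about how ChunkIter works internally.

```rust
/// A piece of the stream, described only by where it starts and how long it is.
/// (Hypothetical type for this sketch, not the crate's `Chunk`.)
#[derive(Debug, PartialEq)]
struct Piece {
    index: u64,
    size: u64,
}

/// Turns sorted cut positions into consecutive pieces covering `0..stream_len`.
/// Assumption: a cut at position `p` means "a new piece starts at byte `p`".
fn pieces_from_cuts<I>(cuts: I, stream_len: u64) -> Vec<Piece>
where
    I: IntoIterator<Item = u64>,
{
    let mut pieces = Vec::new();
    let mut start = 0u64;
    for cut in cuts {
        // Ignore cuts that would create an empty or out-of-range piece.
        if cut <= start || cut >= stream_len {
            continue;
        }
        pieces.push(Piece { index: start, size: cut - start });
        start = cut;
    }
    // Whatever remains after the last cut is the final piece.
    if start < stream_len {
        pieces.push(Piece { index: start, size: stream_len - start });
    }
    pieces
}

fn main() {
    // Two cuts in a 16-byte stream give three pieces.
    for piece in pieces_from_cuts(vec![4, 10], 16) {
        println!("index: {}, size: {}", piece.index, piece.size);
    }
}
```

Nothing is copied or written here: only positions are computed, and the caller decides what to do with each (index, size) pair.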

Implementation details

  • The library does not cut any files; it only provides information on how to do it.

  • You can change the default window size used by Rabin64, and how SeparatorIter chooses the separators.

  • The design of this crate may change in the future. I am waiting for some Rust features to mature, especially impl Trait.
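To make the two tunable parts concrete (the window size and the rule that decides what counts as a separator), here is a rough, self-contained sketch. It is not the crate's code: ToyRollingHash is a simple polynomial rolling hash standing in for Rabin64, and BASE, find_separators and the bit-mask predicate are all hypothetical choices made for illustration.

```rust
use std::collections::VecDeque;

// Arbitrary odd multiplier (hypothetical choice for this sketch).
const BASE: u64 = 1_099_511_628_211;

/// Toy polynomial rolling hash over a fixed-size window (NOT a Rabin fingerprint).
struct ToyRollingHash {
    window: VecDeque<u8>,
    hash: u64,
    top_power: u64, // BASE^(window_size - 1), used to remove the outgoing byte
}

impl ToyRollingHash {
    fn new(window_size: usize) -> Self {
        assert!(window_size > 0);
        let mut top_power = 1u64;
        for _ in 1..window_size {
            top_power = top_power.wrapping_mul(BASE);
        }
        ToyRollingHash {
            // The window starts out filled with zero bytes.
            window: std::iter::repeat(0u8).take(window_size).collect(),
            hash: 0,
            top_power,
        }
    }

    /// Slides the window forward by one byte and returns the updated hash.
    fn slide(&mut self, incoming: u8) -> u64 {
        let outgoing = self.window.pop_front().unwrap_or(0);
        self.window.push_back(incoming);
        self.hash = self
            .hash
            .wrapping_sub((outgoing as u64).wrapping_mul(self.top_power))
            .wrapping_mul(BASE)
            .wrapping_add(incoming as u64);
        self.hash
    }
}

/// Returns the position just past each byte at which `is_separator(hash)` holds.
fn find_separators<F>(data: &[u8], window_size: usize, is_separator: F) -> Vec<usize>
where
    F: Fn(u64) -> bool,
{
    let mut hasher = ToyRollingHash::new(window_size);
    let mut found = Vec::new();
    for (i, &byte) in data.iter().enumerate() {
        if is_separator(hasher.slide(byte)) {
            found.push(i + 1);
        }
    }
    found
}

fn main() {
    // Deterministic pseudo-random test data.
    let data: Vec<u8> = (0..4096u32)
        .map(|i| (i.wrapping_mul(2_654_435_761) >> 24) as u8)
        .collect();
    // Cut wherever the low 6 bits of the hash are zero.
    let seps = find_separators(&data, 16, |h| h & 0x3f == 0);
    println!("{} separators found", seps.len());
}
```

Because the hash depends only on the bytes currently in the window, the same run of bytes yields the same separator decision wherever it appears in the stream, which is what makes the chunking content-defined. A larger window takes more context into account, and a stricter predicate produces fewer, larger chunks.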

Performance

There is a huge difference in performance between the debug build and the release build. When you test the lib, remember to use cargo run --release.

I may try to improve the performance of the lib at some point, but for now it is good enough for most uses.

License

Coded with ❤️, licensed under the terms of the MIT license.
