10XGenomics / rust-shardio

Licence: MIT license
Out-of-memory sorting of large datasets for map / reduce style processing

rust-shardio

Library for out-of-memory sorting of large datasets which need to be processed in multiple map / sort / reduce passes.

You write a stream of items of type T, which must implement Serialize and Deserialize, to a ShardWriter. The items are buffered, sorted according to a customizable sort key, and then serialized to disk in chunks with serde + lz4, while an index records the position and key range of each chunk. You then use a ShardReader to stream through the items in a selected interval of the key space, in sorted order.
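The chunk-and-index idea described above can be illustrated with a minimal, standard-library-only sketch. This is not the shardio API: `ChunkedSorter`, `Chunk`, `push`, `flush`, `range`, and `example` are hypothetical names invented for illustration, chunks are kept in memory rather than serialized with serde + lz4, and keys are assumed to be `u64`. It only shows how sorted chunks plus a per-chunk key range allow a sorted read of a selected key interval.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// One sorted chunk plus its index entry (the key range it covers).
/// Assumption: kept in memory here; a real implementation would write it to disk.
struct Chunk<T> {
    min_key: u64,
    max_key: u64,
    items: Vec<T>,
}

/// Hypothetical stand-in for the write side: buffers items and emits sorted chunks.
struct ChunkedSorter<T, F: Fn(&T) -> u64> {
    key_fn: F,
    buffer: Vec<T>,
    chunk_size: usize,
    chunks: Vec<Chunk<T>>,
}

impl<T: Clone, F: Fn(&T) -> u64> ChunkedSorter<T, F> {
    fn new(chunk_size: usize, key_fn: F) -> Self {
        ChunkedSorter { key_fn, buffer: Vec::new(), chunk_size, chunks: Vec::new() }
    }

    /// Buffer an item; once the buffer is full, sort it and emit a chunk.
    fn push(&mut self, item: T) {
        self.buffer.push(item);
        if self.buffer.len() >= self.chunk_size {
            self.flush();
        }
    }

    /// Sort the buffered items by key and store them as a chunk with its key range.
    /// Call this once after the last `push` so no items are left in the buffer.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let mut items = std::mem::take(&mut self.buffer);
        items.sort_by_key(|x| (self.key_fn)(x));
        let min_key = (self.key_fn)(&items[0]);
        let max_key = (self.key_fn)(&items[items.len() - 1]);
        self.chunks.push(Chunk { min_key, max_key, items });
    }

    /// Return all flushed items with lo <= key < hi, in sorted key order.
    /// The index is used to skip chunks outside the interval; the rest are
    /// combined with a k-way merge driven by a min-heap.
    fn range(&self, lo: u64, hi: u64) -> Vec<T> {
        // Heap entries: (key, chunk index, position within chunk).
        let mut heap = BinaryHeap::new();
        for (ci, c) in self.chunks.iter().enumerate() {
            if c.max_key < lo || c.min_key >= hi {
                continue; // chunk cannot contain keys in [lo, hi)
            }
            // Binary search for the first item with key >= lo.
            let start = c.items.partition_point(|x| (self.key_fn)(x) < lo);
            if start < c.items.len() {
                heap.push(Reverse(((self.key_fn)(&c.items[start]), ci, start)));
            }
        }
        let mut out = Vec::new();
        while let Some(Reverse((key, ci, pos))) = heap.pop() {
            if key >= hi {
                break; // smallest remaining key is already past the interval
            }
            let c = &self.chunks[ci];
            out.push(c.items[pos].clone());
            if pos + 1 < c.items.len() {
                heap.push(Reverse(((self.key_fn)(&c.items[pos + 1]), ci, pos + 1)));
            }
        }
        out
    }
}

/// Usage: write five items in arbitrary order, then read keys in [2, 5).
fn example() -> Vec<(u64, &'static str)> {
    let mut sorter = ChunkedSorter::new(2, |item: &(u64, &'static str)| item.0);
    for item in vec![(5u64, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")] {
        sorter.push(item);
    }
    sorter.flush();
    sorter.range(2, 5)
}
```

Because each chunk is sorted and carries its own key range, a range read only needs to touch the chunks that overlap the requested interval, and the heap keeps the merged output in key order without loading everything at once.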

See Docs for API and examples.

Note: Enable the 'full-test' feature in Release mode to turn on some long-running stress tests.
