All Projects → ParkMyCar → compact_str

ParkMyCar / compact_str

Licence: MIT license
A memory efficient string type that can store up to 24* bytes on the stack

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to compact str

string-combinations
A simple, low-memory footprint function to generate all string combinations from a series of characters.
Stars: ✭ 25 (-92.24%)
Mutual labels:  string, memory
Superstring
A fast and memory-optimized string library for C++
Stars: ✭ 252 (-21.74%)
Mutual labels:  string, memory
Competitive Programming
Programming👨‍💻 Questions on BinarySearch💻, LeetCode💻, CodeChef💻, Codeforces💻,DSA 450
Stars: ✭ 188 (-41.61%)
Mutual labels:  string
DigitText
The module allows to translate numbers into a text equivalent. This is important in the billing.
Stars: ✭ 22 (-93.17%)
Mutual labels:  string
semi-memory
Tensorflow Implementation on Paper [ECCV2018]Semi-Supervised Deep Learning with Memory
Stars: ✭ 49 (-84.78%)
Mutual labels:  memory
lessram
Pure PHP implementation of array data structures that use less memory.
Stars: ✭ 20 (-93.79%)
Mutual labels:  memory
diepindepth
Collection of protocol, memory, and other information for the browser game diepio
Stars: ✭ 39 (-87.89%)
Mutual labels:  memory
is-primitive
Is the typeof value a javascript primitive?
Stars: ✭ 35 (-89.13%)
Mutual labels:  string
mh
A memory editor for iOS/macOS with JavaScript support
Stars: ✭ 35 (-89.13%)
Mutual labels:  memory
node-inline-assets
Node API, CLI and Grunt Task for inlining external assets of HTML/CSS files
Stars: ✭ 18 (-94.41%)
Mutual labels:  inline
as-string-sink
An efficient dynamically sized string buffer (aka String Builder) for AssemblyScript
Stars: ✭ 23 (-92.86%)
Mutual labels:  string
comment-mark
Interpolate strings with HTML comment markers!
Stars: ✭ 21 (-93.48%)
Mutual labels:  string
String.prototype.trim
ES5 spec-compliant shim for String.prototype.trim
Stars: ✭ 13 (-95.96%)
Mutual labels:  string
jest-serializer-html-string
A better Jest snapshot serializer for plain html strings
Stars: ✭ 17 (-94.72%)
Mutual labels:  string
obj-to-table
Create a table from an array of objects
Stars: ✭ 15 (-95.34%)
Mutual labels:  string
goin
`in` operator for go
Stars: ✭ 17 (-94.72%)
Mutual labels:  string
cache-bench
Explore the impact of virtual memory settings on caching efficiency on Linux systems under memory pressure
Stars: ✭ 25 (-92.24%)
Mutual labels:  memory
nova-inline-morph-to
A Laravel Nova field for displaying morphTo relationship inline.
Stars: ✭ 32 (-90.06%)
Mutual labels:  inline
cpu-memory-monitor
CPU & Memory Monitor, auto dump.
Stars: ✭ 26 (-91.93%)
Mutual labels:  memory
string-math
Evaluates a math expression from a string. Supports variables and custom operators.
Stars: ✭ 14 (-95.65%)
Mutual labels:  string

compact_str

A memory efficient string type that can store up to 24* bytes on the stack.

version on crates.io Minimum supported Rust Version: 1.57 mit license
Continuous Integration Status Cross Platform Status Minimum Supported Rust Version Status Clippy Status

* 12 bytes for 32-bit architectures


About

A CompactString is a more memory efficient string type, that can store smaller strings on the stack, and transparently stores longer strings on the heap (aka a small string optimization). They can mostly be used as a drop in replacement for String and are particularly useful in parsing, deserializing, or any other application where you may have smaller strings.

Properties

A CompactString specifically has the following properties:

  • size_of::<CompactString>() == size_of::<String>()
  • Stores up to 24 bytes on the stack
    • 12 bytes if running on a 32 bit architecture
  • Strings longer than 24 bytes are stored on the heap
  • Clone is O(n)
  • Conversion From<String> or From<Box<str>> is O(1)
  • Heap based string grows at a rate of 1.5x
    • The std library String grows at a rate of 2x
  • Space optimized for Option<_>
    • size_of::<CompactString>() == size_of::<Option<CompactString>>()

Traits

This crate exposes two traits, ToCompactString and CompactStringExt.

ToCompactString

Provides the to_compact_string(&self) method for converting types into a CompactString. This trait is automatically implemented for all types that are std::fmt::Display, with specialized higher performance impls for:

  • u8, u16, u32, u64, usize, u128
  • i8, i16, i32, i64, isize, i128
  • f32, f64
  • bool, char
  • NonZeroU*, NonZeroI*
  • String, CompactString

CompactStringExt

Provides two methods join_compact(seperator: impl AsRef<str>) and concat_compact(). This trait is automatically implemented for all types that can be converted into an iterator and yield types that impl AsRef<str>. This allows you to join Vec's, slices, and any other collection to form CompactStrings.

Macros

This crate exposes one macro format_compact! that can be used to create CompactStrings from arguments, like you can Strings with the std::format! macro.

Features

compact_str has the following optional features:

How it works

Note: this explanation assumes a 64-bit architecture, for 32-bit architectures generally divide any number by 2.

Normally strings are stored on the heap since they're dynamically sized. In Rust a String consists of three fields, each of which are the size of a usize. e.g. its layout is something like the following:

String: [ ptr<8> | len<8> | cap<8> ]

  1. ptr is a pointer to a location on the heap that stores the string
  2. len is the length of the string
  3. cap is the total capacity of the buffer being pointed to

This results in 24 bytes being stored on the stack, 8 bytes for each field. Then the actual string is stored on the heap, usually with additional memory allocated to prevent re-allocating if the string is mutated.

The idea of CompactString is instead of storing metadata on the stack, just store the string itself. This way for smaller strings we save a bit of memory, and we don't have to heap allocate so it's more performant. A CompactString is limited to 24 bytes (aka size_of::<String>()) so it won't ever use more memory than a String would.

The memory layout of a CompactString looks something like:

CompactString: [ buffer<23> | len<1> ]

Memory Layout

Internally a CompactString has two variants:

  1. Inline, a string <= 24 bytes long
  2. Heap allocated, a string > 24 bytes long

To maximize memory usage, we use a union instead of an enum. In Rust an enum requires at least 1 byte for the discriminant (tracking what variant we are), instead we use a union which allows us to manually define the discriminant. CompactString defines the discriminant within the last byte, using any extra bits for metadata. Specifically the discriminant has two variants:

  1. 0b11111110 - All 1s with a trailing 0, indicates heap allocated
  2. 0b11XXXXXX - Two leading 1s, indicates inline, with the trailing 6 bits used to store the length

and specifically the overall memory layout of a CompactString is:

  1. heap: { ptr: NonNull<u8>, len: usize, cap: Capacity }
  2. inline: { buffer: [u8; 24] }

Both variants are 24 bytes long

For heap allocated strings we use a custom BoxString which normally stores the capacity of the string on the stack, but also optionally allows us to store it on the heap. Since we use the last byte to track our discriminant, we only have 7 bytes to store the capacity, or 3 bytes on a 32-bit architecture. 7 bytes allows us to store a value up to 2^56, aka 64 petabytes, while 3 bytes only allows us to store a value up to 2^24, aka 16 megabytes.

For 64-bit architectures we always inline the capacity, because we can safely assume our strings will never be larger than 64 petabytes, but on 32-bit architectures, when creating or growing a CompactString, if the text is larger than 16MB then we move the capacity onto the heap.

We handle the capacity in this way for two reaons:

  1. Users shouldn't have to pay for what they don't use. Meaning, in the majority of cases the capacity of the buffer could easily fit into 7 or 3 bytes, so the user shouldn't have to pay the memory cost of storing the capacity on the heap, if they don't need to.
  2. Allows us to convert From<String> in O(1) time, by taking the parts of a String (e.g. ptr, len, and cap) and using those to create a CompactString, without having to do any heap allocations. This is important when using CompactString in large codebases where you might have CompactString working alongside of String.

For inline strings we only have a 24 byte buffer on the stack. This might make you wonder how can we store a 24 byte long string, inline? Don't we also need to store the length somewhere?

To do this, we utilize the fact that the last byte of our string could only ever have a value in the range [0, 192). We know this because all strings in Rust are valid UTF-8, and the only valid byte pattern for the last byte of a UTF-8 character (and thus the possible last byte of a string) is 0b0XXXXXXX aka [0, 128) or 0b10XXXXXX aka [128, 192). This leaves all values in [192, 255] as unused in our last byte. Therefore, we can use values in the range of [192, 215] to represent a length in the range of [0, 23], and if our last byte has a value < 192, we know that's a UTF-8 character, and can interpret the length of our string as 24.

Specifically, the last byte on the stack for a CompactString has the following uses:

  • [0, 192) - Is the last byte of a UTF-8 char, the CompactString is stored on the stack and implicitly has a length of 24
  • [192, 215] - Denotes a length in the range of [0, 23], this CompactString is stored on the stack.
  • [215, 254) - Unused
  • 254 - Denotes this CompactString is stored on the heap
  • 255 - Denotes the None variant for an Option<CompactString>

Testing

Strings and unicode can be quite messy, even further, we're working with things at the bit level. compact_str has an extensive test suite comprised of unit testing, property testing, and fuzz testing, to ensure our invariants are upheld. We test across all major OSes (Windows, macOS, and Linux), architectures (64-bit and 32-bit), and endian-ness (big endian and little endian).

Fuzz testing is run with libFuzzer and AFL++ with AFL++ running on both x86_64 and ARMv7 architectures. We test with miri to catch cases of undefined behavior, and run all tests on every rust compiler since v1.49 to ensure support for our minimum supported Rust version (MSRV).

unsafe code

CompactString uses a bit of unsafe code because accessing fields from a union is inherently unsafe, the compiler can't guarantee what value is actually stored. We also have some manually implemented heap data structures, i.e. BoxString, and mess with bytes at a bit level. That being said, uses of unsafe code in this library are quite limited and constrained to only where absolutely necessary, and always documented with // SAFETY: <reason>.

Similar Crates

Storing strings on the stack is not a new idea, in fact there are a few other crates in the Rust ecosystem that do similar things, an incomplete list:

  1. smol_str - Can inline 22 bytes, Clone is O(1), doesn't adjust for 32-bit archs
  2. smartstring - Can inline 23 bytes, Clone is O(n), is mutable
  3. kstring - Can inline 15 or 22 bytes dependent on crate features, Clone is O(1), can also store &'static strs
  4. flexstr - Can inline 22 bytes, Clone is O(1), can also store &'static strs

Thanks for readingme!
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].