All Projects → BurntSushi → memchr

BurntSushi / memchr

Licence: Unlicense and 2 other licenses found Licenses found Unlicense UNLICENSE Unknown COPYING MIT LICENSE-MIT
Optimized string search routines for Rust.

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to memchr

Str
A SIMD optimized fixed-length string class along with an adaptive hash table for fast searching
Stars: ✭ 60 (-87.34%)
Mutual labels:  string, simd
Despacer
C library to remove white space from strings as fast as possible
Stars: ✭ 90 (-81.01%)
Mutual labels:  bytes, simd
Pupa
Simple micro templating
Stars: ✭ 231 (-51.27%)
Mutual labels:  string
aes-gcm-siv
.NET Core 3.0 implementation of AES-GCM-SIV nonce misuse-resistant authenticated encryption
Stars: ✭ 22 (-95.36%)
Mutual labels:  simd
hermes
A Haskell library for fast, memory-efficient decoding of JSON documents using the simdjson C++ library
Stars: ✭ 37 (-92.19%)
Mutual labels:  simd
Case
String case utitility: convert, identify, flip, extend
Stars: ✭ 237 (-50%)
Mutual labels:  string
Data-Structure-Algorithm-Programs
This Repo consists of Data structures and Algorithms
Stars: ✭ 464 (-2.11%)
Mutual labels:  string
Php To String
Cast any php value into a string
Stars: ✭ 219 (-53.8%)
Mutual labels:  string
Turbo-Transpose
Transpose: SIMD Integer+Floating Point Compression Filter
Stars: ✭ 50 (-89.45%)
Mutual labels:  simd
string-combinations
A simple, low-memory footprint function to generate all string combinations from a series of characters.
Stars: ✭ 25 (-94.73%)
Mutual labels:  string
cs string
Header-only library providing unicode aware string support for C++
Stars: ✭ 91 (-80.8%)
Mutual labels:  string
lsp-dsp-lib
DSP library for signal processing
Stars: ✭ 37 (-92.19%)
Mutual labels:  simd
Stringsimilarity.net
A .NET port of java-string-similarity
Stars: ✭ 242 (-48.95%)
Mutual labels:  string
Guided Missile Simulation
Guided Missile, Radar and Infrared EOS Simulation Framework written in Fortran.
Stars: ✭ 33 (-93.04%)
Mutual labels:  simd
Number To Words
Number to string standalone PHP library with i18n. Drivers for numbers and currency included.
Stars: ✭ 234 (-50.63%)
Mutual labels:  string
NString
A collection of utilities for working with strings in .NET.
Stars: ✭ 34 (-92.83%)
Mutual labels:  string
Superstring.py
A fast and memory-optimized string library for heavy-text manipulation in Python
Stars: ✭ 231 (-51.27%)
Mutual labels:  string
ternary-logic
Support for ternary logic in SSE, XOP, AVX2 and x86 programs
Stars: ✭ 21 (-95.57%)
Mutual labels:  simd
quick-adc
Quick ADC
Stars: ✭ 20 (-95.78%)
Mutual labels:  simd
uwu
fastest text uwuifier in the west
Stars: ✭ 1,193 (+151.69%)
Mutual labels:  simd

memchr

This library provides heavily optimized routines for string search primitives.

Build status Crates.io

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/memchr

Overview

  • The top-level module provides routines for searching for 1, 2 or 3 bytes in the forward or reverse direction. When searching for more than one byte, positions are considered a match if the byte at that position matches any of the bytes.
  • The memmem sub-module provides forward and reverse substring search routines.

In all such cases, routines operate on &[u8] without regard to encoding. This is exactly what you want when searching either UTF-8 or arbitrary bytes.

Compiling without the standard library

memchr links to the standard library by default, but you can disable the std feature if you want to use it in a #![no_std] crate:

[dependencies]
memchr = { version = "2", default-features = false }

On x86 platforms, when the std feature is disabled, the SSE2 accelerated implementations will be used. When std is enabled, AVX accelerated implementations will be used if the CPU is determined to support it at runtime.

Using libc

memchr is a routine that is part of libc, although this crate does not use libc by default. Instead, it uses its own routines, which are either vectorized or generic fallback routines. In general, these should be competitive with what's in libc, although this has not been tested for all architectures. If using memchr from libc is desirable and a vectorized routine is not otherwise available in this crate, then enabling the libc feature will use libc's version of memchr.

The rest of the functions in this crate, e.g., memchr2 or memrchr3 and the substring search routines, will always use the implementations in this crate. One exception to this is memrchr, which is an extension in libc found on Linux. On Linux, memrchr is used in precisely the same scenario as memchr, as described above.

Minimum Rust version policy

This crate's minimum supported rustc version is 1.41.1.

The current policy is that the minimum Rust version required to use this crate can be increased in minor version updates. For example, if crate 1.0 requires Rust 1.20.0, then crate 1.0.z for all values of z will also require Rust 1.20.0 or newer. However, crate 1.y for y > 0 may require a newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum supported version of Rust.

Testing strategy

Given the complexity of the code in this crate, along with the pervasive use of unsafe, this crate has an extensive testing strategy. It combines multiple approaches:

  • Hand-written tests.
  • Exhaustive-style testing meant to exercise all possible branching and offset calculations.
  • Property based testing through quickcheck.
  • Fuzz testing through cargo fuzz.
  • A huge suite of benchmarks that are also run as tests. Benchmarks always confirm that the expected result occurs.

Improvements to the testing infrastructure are very welcome.

Algorithms used

At time of writing, this crate's implementation of substring search actually has a few different algorithms to choose from depending on the situation.

  • For very small haystacks, Rabin-Karp is used to reduce latency. Rabin-Karp has very small overhead and can often complete before other searchers have even been constructed.
  • For small needles, a variant of the "Generic SIMD" algorithm is used. Instead of using the first and last bytes, a heuristic is used to select bytes based on a background distribution of byte frequencies.
  • In all other cases, Two-Way is used. If possible, a prefilter based on the "Generic SIMD" algorithm linked above is used to find candidates quickly. A dynamic heuristic is used to detect if the prefilter is ineffective, and if so, disables it.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].