All Projects → Turnerj → Quickenshtein

Turnerj / Quickenshtein

Licence: MIT license
Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support

Programming Languages

C#
18002 projects

Projects that are alternatives of or similar to Quickenshtein

Symspell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+868.63%)
Mutual labels:  edit-distance, levenshtein, levenshtein-distance
LinSpell
Fast approximate strings search & spelling correction
Stars: ✭ 52 (-74.51%)
Mutual labels:  edit-distance, levenshtein, levenshtein-distance
levenshtein.c
Levenshtein algorithm in C
Stars: ✭ 77 (-62.25%)
Mutual labels:  edit-distance, levenshtein, levenshtein-distance
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (-44.12%)
Mutual labels:  levenshtein, string-distance
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more..
Stars: ✭ 60 (-70.59%)
Mutual labels:  levenshtein, levenshtein-distance
affinegap
📐 A Cython implementation of the affine gap string distance
Stars: ✭ 57 (-72.06%)
Mutual labels:  levenshtein-distance, string-distance
spellchecker-wasm
SpellcheckerWasm is an extrememly fast spellchecker for WebAssembly based on SymSpell
Stars: ✭ 46 (-77.45%)
Mutual labels:  levenshtein, levenshtein-distance
Textdistance
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+1162.25%)
Mutual labels:  levenshtein, levenshtein-distance
Levenshtein
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Stars: ✭ 38 (-81.37%)
Mutual labels:  levenshtein, levenshtein-distance
edit-distance-papers
A curated list of papers dedicated to edit-distance as objective function
Stars: ✭ 49 (-75.98%)
Mutual labels:  edit-distance, levenshtein
edits.cr
Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-92.16%)
Mutual labels:  edit-distance, levenshtein
FastFuzzyStringMatcherDotNet
A BK tree implementation for fast fuzzy string matching
Stars: ✭ 23 (-88.73%)
Mutual labels:  edit-distance, levenshtein-distance
Java String Similarity
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+1077.94%)
Mutual labels:  levenshtein-distance, string-distance
stringosim
String similarity functions, String distance's, Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS - Longest Common Subsequence, Cosine similarity...
Stars: ✭ 47 (-76.96%)
Mutual labels:  levenshtein, string-distance
levenshtein finder
Similar string search in Levenshtein distance
Stars: ✭ 19 (-90.69%)
Mutual labels:  levenshtein, string-distance
eddie
No description or website provided.
Stars: ✭ 18 (-91.18%)
Mutual labels:  edit-distance, levenshtein
OpenCV Raspberry pi TBB
Latest pre-compiled binary of Pre-released & Stable OpenCV (4.0.0) along with TBB (2018-Update 6) for Raspberry Pi.
Stars: ✭ 46 (-77.45%)
Mutual labels:  threading
SCNMathExtensions
Math extensions for SCNVector3, SCNQuaternion, SCNMatrix4
Stars: ✭ 32 (-84.31%)
Mutual labels:  simd
matrix-multiplication-threading
Matrix multiplication using c++11 threads
Stars: ✭ 31 (-84.8%)
Mutual labels:  threading
hpc
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc. )
Stars: ✭ 39 (-80.88%)
Mutual labels:  simd

Icon

Quickenshtein

A quick and memory efficient Levenshtein Distance calculator for .NET

AppVeyor Codecov NuGet

Performance

Quickenshtein gets its speed and memory efficiency from a number of different optimizations. To get the most performance out of the library, you will need .NET Core 3 or higher as this has support for hardware intrinsics.

Quickenshtein takes advantage of the following hardware intrinsics. On any recent x86 system, you will likely have these available.

If your computer doesn't have one of the hardware intrinsics available, Quickenshtein will still work - just slower than optimal.

Multi-Threading

By default, Quickenshtein performs in single-threaded mode as this mode performs best for small to medium size strings while having no memory allocations. When dealing with huge strings of 8000 characters or more, it may be useful to switch to multi-threaded mode. In this mode, the calculation is broken up and shared between multiple cores in a system.

Multi-threading is especially useful for systems without hardware intrinsics or for .NET Framework as shown in the table below where it provided a 3x performance improvement.

Method Runtime NumberOfCharacters Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Quickenshtein .NET 4.7.2 8000 110.686 ms 10.118 ms 0.554 ms - - - -
Quickenshtein_Threaded .NET 4.7.2 8000 36.601 ms 16.121 ms 0.883 ms - - - 1260 B

To enable threading, you can pass in CalculationOptions.DefaultWithThreading to Levenshtein.GetDistance() or configure your own CalculationOptions with settings that work best for you.

Note: Multi-threading is not allocation free (unlike single-threading mode) and will allocate a small amount depending on the number of threads used.

Benchmarking

There are a number of benchmarks in the repository that you can run on your system to see how well Quickenshtein performs.

Most of these benchmarks...

  • Run .NET Framework and .NET Core so you can see how the performance changes between them
  • Compare against a simple baseline Levenshtein Distance implementation with no specific optimizations
  • Compare against Fastenshtein, one of the other fast .NET Levenshtein Distance implementations

You can view results to these benchmarks at the links below:

Example Usage

using Quickenshtein;

// Common usage (uses default CalculationOptions with threading disabled)
var distance1 = Levenshtein.GetDistance("Saturday", "Sunday");

// Enable threading
var distance2 = Levenshtein.GetDistance("Saturday", "Sunday", CalculationOptions.DefaultWithThreading);

// Custom calculation options (helps with tuning for your specific workload and environment - you should benchmark your configurations on your system)
var distance3 = Levenshtein.GetDistance("Saturday", "Sunday", new CalculationOptions {
    EnableThreadingAfterXCharacters = 10000,
    MinimumCharactersPerThread = 25000
});

Learning Resources

I've written quite a bit about Levenshtein Distance and various ways you can extract performance from it:

If you prefer video:

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].