
hbakhtiyor / strsim

License: MIT
String similarity based on Dice's coefficient, in Go.



strsim

Finds degree of similarity between two strings, based on Dice's Coefficient.


Usage

Install using:

go get -u github.com/hbakhtiyor/strsim

In your code:

import "github.com/hbakhtiyor/strsim"

similarity := strsim.Compare("healed", "sealed")

matches := strsim.FindBestMatch("healed", []string{"edward", "sealed", "theatre"})

API

The package exports two functions:

Compare(a, b string) float64

Returns a fraction between 0 and 1, which indicates the degree of similarity between the two strings. 0 indicates completely different strings, 1 indicates identical strings. The comparison is case-sensitive.

Arguments
  1. a (string): The first string
  2. b (string): The second string

The order of the two arguments does not affect the result.

Returns

(float64): A fraction from 0 to 1, both inclusive. Higher number indicates more similarity.

Examples
strsim.Compare("healed", "sealed")
// → 0.8

strsim.Compare("Olive-green table for sale, in extremely good condition.", 
  "For sale: table in very good  condition, olive green in colour.")
// → 0.6060606060606061

strsim.Compare("Olive-green table for sale, in extremely good condition.", 
  "For sale: green Subaru Impreza, 210,000 miles")
// → 0.2558139534883721

strsim.Compare("Olive-green table for sale, in extremely good condition.", 
  "Wanted: mountain bike with at least 21 gears.")
// → 0.1411764705882353
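
To make the score concrete, here is a minimal sketch of Dice's coefficient computed over pairs of adjacent characters: twice the number of shared pairs, divided by the total number of pairs in both strings. It is not the strsim source. The names `bigrams` and `diceCoefficient` are hypothetical, and the sketch assumes no preprocessing of the input (for example, no whitespace handling), so it is only claimed to reproduce the short "healed"/"sealed" example: the two words share 4 of their 5 pairs each, giving 2·4 / (5 + 5) = 0.8.

```go
package main

import "fmt"

// bigrams counts every pair of adjacent characters (runes) in s.
func bigrams(s string) map[string]int {
	r := []rune(s)
	counts := make(map[string]int)
	for i := 0; i+1 < len(r); i++ {
		counts[string(r[i:i+2])]++
	}
	return counts
}

// diceCoefficient is a hypothetical helper, not part of strsim.
// It returns 2*shared / (pairs in a + pairs in b), a value in [0, 1].
func diceCoefficient(a, b string) float64 {
	if a == b {
		return 1
	}
	ra, rb := []rune(a), []rune(b)
	if len(ra) < 2 || len(rb) < 2 {
		return 0 // assumption: strings too short to form a pair never match
	}
	left := bigrams(a)
	shared := 0
	for i := 0; i+1 < len(rb); i++ {
		pair := string(rb[i : i+2])
		if left[pair] > 0 {
			left[pair]-- // each pair in a can be matched only once
			shared++
		}
	}
	total := (len(ra) - 1) + (len(rb) - 1)
	return 2 * float64(shared) / float64(total)
}

func main() {
	fmt.Println(diceCoefficient("healed", "sealed")) // 0.8
}
```

Because the shared-pair count is a multiset intersection, swapping the arguments yields the same score.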

FindBestMatch(s string, targets []string) *MatchResult

Compares s against each string in targets.

Arguments
  1. s (string): The string to match each target string against.
  2. targets ([]string): Each string in this slice will be matched against the main string.

Returns

(*MatchResult): A struct with a Matches field, which gives a similarity score for each target string, a BestMatch field, which specifies which target string was most similar to the main string, and a BestMatchIndex field, which specifies the index of the BestMatch in the targets slice.

Examples
strsim.FindBestMatch("Olive-green table for sale, in extremely good condition.", []string{
  "For sale: green Subaru Impreza, 210,000 miles", 
  "For sale: table in very good condition, olive green in colour.", 
  "Wanted: mountain bike with at least 21 gears.",
})
// → 
MatchResult {
  Matches: []Match {
    { Target: "For sale: green Subaru Impreza, 210,000 miles",
      Score: 0.2558139534883721 },
    { Target: "For sale: table in very good condition, olive green in colour.",
      Score: 0.6060606060606061 },
    { Target: "Wanted: mountain bike with at least 21 gears.",
      Score: 0.1411764705882353 } },
  BestMatch: Match
    { Target: "For sale: table in very good condition, olive green in colour.",
      Score: 0.6060606060606061 },
  BestMatchIndex: 1 
}
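
The shape of the result can be sketched as follows. This is an illustrative stand-in, not the strsim implementation: the `Match` and `MatchResult` types only mirror the fields described above, while `findBestMatch`, its `score` parameter, and the toy scorer `samePositions` are hypothetical names introduced here. Returning a BestMatchIndex of -1 for an empty target list is an assumption of the sketch, not documented library behaviour.

```go
package main

import "fmt"

// Match and MatchResult mirror the fields described in the text above.
type Match struct {
	Target string
	Score  float64
}

type MatchResult struct {
	Matches        []Match
	BestMatch      Match
	BestMatchIndex int
}

// findBestMatch is a hypothetical stand-in: it scores s against every target
// with the supplied function and remembers the highest-scoring one.
func findBestMatch(s string, targets []string, score func(a, b string) float64) *MatchResult {
	res := &MatchResult{Matches: make([]Match, 0, len(targets)), BestMatchIndex: -1}
	for i, t := range targets {
		m := Match{Target: t, Score: score(s, t)}
		res.Matches = append(res.Matches, m)
		if res.BestMatchIndex < 0 || m.Score > res.BestMatch.Score {
			res.BestMatch = m
			res.BestMatchIndex = i
		}
	}
	return res
}

// samePositions is a toy scorer used only for this demo: the fraction of
// positions (relative to the longer string) that hold the same character.
func samePositions(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 1
	}
	same := 0
	for i := 0; i < len(ra) && i < len(rb); i++ {
		if ra[i] == rb[i] {
			same++
		}
	}
	return float64(same) / float64(longest)
}

func main() {
	res := findBestMatch("healed", []string{"edward", "sealed", "theatre"}, samePositions)
	fmt.Println(res.BestMatch.Target, res.BestMatchIndex) // sealed 1
}
```

Matches keeps the targets in their input order, so BestMatchIndex can be used to index back into the original slice.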

Benchmark

BenchmarkCompare-4         	   20000	     82479 ns/op	   15921 B/op	      51 allocs/op
BenchmarkFindBestMatch-4   	   30000	     60800 ns/op	   11707 B/op	      41 allocs/op
BenchmarkSortedByScore-4   	 2000000	       638 ns/op	     128 B/op	       4 allocs/op

Hardware used
  • Intel® Core™ i3-2310M CPU @ 2.10GHz × 4
  • 4 GB RAM

Version
  • Go 1.11.2
  • Ubuntu 18.04.1 LTS x86_64 OS
  • 4.15.0-39-generic kernel

Credit

https://github.com/aceakash/string-similarity
