Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → umbertogriffo → Trie

umbertogriffo / Trie

A Mixed Trie and Levenshtein distance implementation in Java for extremely fast prefix string searching and string similarity.

Programming Languages

68154 projects - #9 most used programming language

Labels

data-structures stream trie

Projects that are alternatives of or similar to Trie

An idiomatic and fast QP-trie implementation in pure Rust.

Stars: ✭ 47 (+88%)

Mutual labels: data-structures, trie

Data Structures

Data-Structures using C++.

Stars: ✭ 121 (+384%)

Mutual labels: data-structures, trie

libgenerics is a minimalistic and generic library for C basic data structures.

Stars: ✭ 42 (+68%)

Mutual labels: data-structures, trie

.NET Implementations of Trie Data Structures for Substring Search, Auto-completion and Intelli-sense. Includes: patricia trie, suffix trie and a trie implementation using Ukkonen's algorithm.

Stars: ✭ 122 (+388%)

Mutual labels: data-structures, trie

Data Structures

A collection of powerful data structures

Stars: ✭ 2,534 (+10036%)

Mutual labels: data-structures, trie

Data Structures

Go datastructures.

Stars: ✭ 336 (+1244%)

Mutual labels: data-structures, trie

Adaptive Radix Tree

A fast and space efficient Radix tree in Java

Stars: ✭ 76 (+204%)

Mutual labels: data-structures, trie

A collection of common algorithms and data structures implemented in java, c++, and python.

Stars: ✭ 142 (+468%)

Mutual labels: data-structures, trie

Template Library of Data Structures in C++17

Stars: ✭ 151 (+504%)

Mutual labels: data-structures, trie

Merkle Patricia Tree

Project is in active development and has been moved to the EthereumJS VM monorepo.

Stars: ✭ 277 (+1008%)

Mutual labels: data-structures, trie

C++ implementation of a fast and memory efficient HAT-trie

Stars: ✭ 565 (+2160%)

Mutual labels: data-structures, trie

sane nested dictionaries and lists for python

Stars: ✭ 19 (-24%)

Mutual labels: data-structures

A d-dimensional aabb-tree implementation for MATLAB.

Stars: ✭ 5 (-80%)

Mutual labels: data-structures

A Stream Multiplexing Library for golang with least memory usage

Stars: ✭ 818 (+3172%)

Mutual labels: stream

YAML templating tool that works on YAML structure instead of text

Stars: ✭ 816 (+3164%)

Mutual labels: data-structures

Autosuggest Trie

Minimalistic trie implementation for autosuggest and autocomplete components

Stars: ✭ 22 (-12%)

Mutual labels: trie

⚡️ Lightning fast URL routing in Python (trie router)

Stars: ✭ 18 (-28%)

Mutual labels: trie

Springcloud Learning

Spring Cloud基础教程，持续连载更新中

Stars: ✭ 6,839 (+27256%)

Mutual labels: stream

Algorithm is a library of tools that is used to create intelligent applications.

Stars: ✭ 787 (+3048%)

Mutual labels: data-structures

The bidirectional mapping library for Python.

Stars: ✭ 779 (+3016%)

Mutual labels: data-structures

View All Similar Projects ➔

A Mixed Trie and Levenshtein distance implementation in Java for extremely fast prefix string searching and string similarity.

Author: Umberto Griffo
Twitter: @UmbertoGriffo

Contents

Trie definition
Levenshtein distance definition
Mixing Trie and the Levenshtein distance in order to calculate String Similarity Faster
Performance
Complexity
Use Case
Example of usage
References

Trie definition

Trie[1] is an ordered tree data structure that uses strings as keys. It's an efficient information retrieval data structure that we can use to search a word in O(M) time, where M is maximum string length. However the penalty is on trie storage requirements. A common application of a trie is storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries.

The following picture shows a trie with the keys "Joe", "John", "Johnny", "Johnny", "Jane", and "Jack"

Levenshtein distance definition

The Levenshtein distance[5] is a string metric for measuring the difference between two words. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

Mixing Trie and the Levenshtein distance in order to calculate String Similarity Faster

In the Iterative with full matrix version of Levenshtein distance[5] we can avoid a lot of work if we can process the words in order, so we never need to repeat a row for the same prefix of letters[4]. The trie data structure is perfect for this.

If I have a trie with the words "Joe", "John", "Johnny", "Johnny", "Jane", and "Jack" the following table shows the resulting matrix containing words that are less than the given maximum distance (equal to 2) from the target word "Jane":

		J	a	n	e
	0	1	2	3	4
J	1	0	1	2	3
a	2	1	0	1	2
c	3	2	1	1	2
*k()**	4	3	2	2	2
n	3	2	1	0	1
*e()**	4	3	2	1	0
o	2	1	1	2	3
*e()**	3	2	2	2	2
h	3	2	2	2	3
n(*)	4	3	3	2	2
n	5	4	4	3	3
y(*)	5	4	4	3	4

Each right element of the array contains the distance.

Performance

The tests have been carried out on a Laptop (Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz, 8GB RAM). I've added 483.805 distinct words into Trie and I've performed the following tests using the word "rap":

Operation	Time (milliseconds)	Results
Loading words into Trie	970	483.805 words loaded
Count words starts with prefix 'rap'	0,03	147
Get words starts with prefix 'rap'	1,69	a List containing 147 words
String Similarity (Maximum Distance 1) from word 'rap'	22,98	a Map containing 48 similar words
Remove word rap	0,04

Complexity

Average: |Access|Search|Insertion|Deletion|String Similarity| |----|----|----|----|----| |O(k)|O(k)|O(k)|O(k)|O(k*n)|

where k is maximum string length and n is number of nodes in the trie

Use Case

Dictionary Suggestions OR Autocomplete dictionary

Retrieving data stored in Trie data structure is very fast, so it is most suited for application where retrieval are more frequently performed like Phone directory where contact searching operation is used frequently [1-3].

Searching Contact from Mobile Contact list OR Phone Directory

Auto suggestion of words while searching for anything in dictionary is very common [2-3]. If we search for word "tiny", then it auto suggest words starting with same characters like "tine", "tin", "tinny" etc.

String Similarity

Trie can be used to Fast and Easy calculate Levenshtein distance [4].

Example of usage

With this implementation you can perform the follows operations:

Add a word.
Remove a word.
Check if there is any word in the trie that starts with the given prefix.
Check if a word is in the trie.
Count words starts with a partial name to search the application for.
Get words starts with a partial name to search the application for.
Show the Trie
Calculate string similarity using Levenshtein distance.

import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

public class Demo {
	public static void main(String[] args) {

		// This is a case sensitive trie
		Trie trie = new Trie(true, StandardCharsets.UTF_8);

		// Add words.
		trie.add("Joe");
		trie.add("John");
		trie.add("Johny");
		trie.add("Johnny");
		trie.add("Jane");
		trie.add("Jack");
		
		System.out.println("Number Of Words: " + trie.getNumberOfWords());

		trie.show();
		
		// Return true if the word is in the trie.
		System.out.println(trie.search("Jane"));
		// Return true if there is any word in the trie that starts with the given prefix.
		System.out.println(trie.startsWith("Ja"));
		// Remove a word
		trie.remove("Johnny");

		// Count words starts with a partial name to search the application for
		System.out.println("Number Of Words Starts with 'John': " + trie.countWordStartsWith("John"));
		System.out.println("Number Of Words Starts with 'Ja': " + trie.countWordStartsWith("Ja"));
		// Get words starts with a partial name to search the application for
		Stream<String> words = trie.getWordStartsWith("Jo");
		words.forEach(System.out::println);
		
		// String Similarity using Levenshtein distance		
		// Get words that are less than the given maximum distance from the target word
		for (Map.Entry<String, Integer> entry : trie.getSimilarityMap("Jane", 1).entrySet()) {
			System.out.println(entry.getKey() + " - " + entry.getValue());
		}
		// Get word that is less than the given maximum distance from the target word
		System.out.println(trie.similarity("Jane", 1));
	}
}

References

[1] Trie https://en.wikipedia.org/wiki/Trie
[2] Hackerrank challenge contacts https://www.hackerrank.com/challenges/contacts
[3] What is a trie? What is its importance and how it is implemented? https://www.quora.com/What-is-a-trie-What-is-its-importance-and-how-it-is-implemented
[4] Fast and Easy Levenshtein distance using a Trie http://stevehanov.ca/blog/index.php?id=114
[5] Levenshtein distance https://en.wikipedia.org/wiki/Levenshtein_distance

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 25

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (1) 🔗