All Projects → rohansuri → Adaptive Radix Tree

rohansuri / Adaptive Radix Tree

Licence: mit
A fast and space efficient Radix tree in Java

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Adaptive Radix Tree

Proalgos Cpp
C++ implementations of well-known (and some rare) algorithms, while following good software development practices
Stars: ✭ 369 (+385.53%)
Mutual labels:  data-structures, datastructures
Algorithms And Data Structures In Java
Algorithms and Data Structures in Java
Stars: ✭ 498 (+555.26%)
Mutual labels:  data-structures, datastructures
Atomic queue
C++ lockless queue.
Stars: ✭ 373 (+390.79%)
Mutual labels:  data-structures, datastructures
Data Structures Algorithms
My implementation of 85+ popular data structures and algorithms and interview questions in Python 3 and C++
Stars: ✭ 273 (+259.21%)
Mutual labels:  data-structures, datastructures
Fundamentals Of Python Data Structures
《数据结构(Python语言描述)》"Fundamentals of Python:Data Structures" 电子书和配套代码
Stars: ✭ 30 (-60.53%)
Mutual labels:  data-structures, datastructures
Merkle Patricia Tree
Project is in active development and has been moved to the EthereumJS VM monorepo.
Stars: ✭ 277 (+264.47%)
Mutual labels:  data-structures, trie
Complete Placement Preparation
This repository consists of all the material required for cracking the coding rounds and technical interviews during placements.
Stars: ✭ 1,114 (+1365.79%)
Mutual labels:  data-structures, datastructures
treebitmap
Fast IP lookup table for IPv4/IPv6 prefixes
Stars: ✭ 81 (+6.58%)
Mutual labels:  datastructures, trie
Trie
A Mixed Trie and Levenshtein distance implementation in Java for extremely fast prefix string searching and string similarity.
Stars: ✭ 25 (-67.11%)
Mutual labels:  data-structures, trie
Go
Algorithms Implemented in GoLang
Stars: ✭ 7,385 (+9617.11%)
Mutual labels:  data-structures, datastructures
Data-structures
Data Structures in Java
Stars: ✭ 13 (-82.89%)
Mutual labels:  datastructures, data-structures
Qp Trie Rs
An idiomatic and fast QP-trie implementation in pure Rust.
Stars: ✭ 47 (-38.16%)
Mutual labels:  data-structures, trie
swift-algorithms-data-structs
📒 Algorithms and Data Structures in Swift. The used approach attempts to fully utilize the Swift Standard Library and Protocol-Oriented paradigm.
Stars: ✭ 42 (-44.74%)
Mutual labels:  datastructures, data-structures
Data Structures
Go datastructures.
Stars: ✭ 336 (+342.11%)
Mutual labels:  data-structures, trie
Algorithms
Data Structures & Algorithms. Includes solutions for Cracking the Coding Interview 6th Edition
Stars: ✭ 89 (+17.11%)
Mutual labels:  datastructures, trie
Algorithms and data structures
180+ Algorithm & Data Structure Problems using C++
Stars: ✭ 4,667 (+6040.79%)
Mutual labels:  data-structures, datastructures
Data Structures
A collection of powerful data structures
Stars: ✭ 2,534 (+3234.21%)
Mutual labels:  data-structures, trie
Staticvec
Implements a fixed-capacity stack-allocated Vec alternative backed by an array, using const generics.
Stars: ✭ 236 (+210.53%)
Mutual labels:  data-structures, datastructures
Hat Trie
C++ implementation of a fast and memory efficient HAT-trie
Stars: ✭ 565 (+643.42%)
Mutual labels:  data-structures, trie
Libgenerics
libgenerics is a minimalistic and generic library for C basic data structures.
Stars: ✭ 42 (-44.74%)
Mutual labels:  data-structures, trie

Adaptive Radix Tree implemented as a Java NavigableMap

Build Status codecov Bintray javadoc

This library provides an implementation of Adaptive Radix Tree (ART) as a Java NavigableMap based on the ICDE 2013 paper "The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases" by Viktor Leis.

In the space of ordered data structures, Radix trees are particularly interesting since their height and time complexity depends on key length (k) rather than number of keys (n) already present in the tree. In the era of extremely large data sets, when n is growing faster than k, having a time complexity independent of n is very attractive.

However, they have always been plagued by the problem of excessive space consumption due to having a fixed number of children that every internal node has.

ART solves this problem by adaptively changing the size of internal nodes depending on the number of children it actually has. As the number of children of an internal node increase/decrease, ART will grow/shrink the internal node into one of 4, 16, 48, 256 sizes.

For example, here's the result of storing some sample strings in both structures.

Where Radix Tree will have allocated space for 1024 pointers (4 nodes having 256 each), ART only requires 28 pointers (3 Node4 and 1 Node16). At 8-bytes per pointer, that's about 8K bytes saved.

Radix Tree Adaptive Radix Tree
alt text alt text

Table of contents

Overview

  • O(k) put/get/remove time complexity where k is key length.
  • Implements NavigableMap and hence is a drop-in replacement for TreeMap.
  • Cache friendly because:
    • Uses path compression and lazy leaf expansion to avoid single child paths thereby reducing pointer indirections which cause cache misses. These two techniques also reduce tree height.
    • Compact nodes that are array backed and hence exhibit spatial locality, utilising cache lines better.

Use case

Adaptive Radix Trees make Radix trees favourable again for cases where they simply aren't used because of excessive space consumption.

In general Radix Trees are preferred over Binary Search Trees when the dataset is such that the height of Radix tree O(k) turns out to be lesser than the number of comparisons done in Binary Search Trees O(k logn).

Binary comparable keys

For using ART, the keys need to be transformed into binary comparable keys which are the byte array representation of your keys such that the result of doing lexicographic comparison over them is the same as doing the key comparison.

Signed integers

Signed integers are stored in two's complement notation. This means that negative integers always have their MSB set and hence are bitwise lexicographically greater than positive integers.

For example -1 in 2's complement form is 1111 1111 1111 1111 1111 1111 1111 1111, whereas +1 is 0000 0000 0000 0000 0000 0000 0000 0001.

This is not the correct binary comparable transformation since +1 > -1 but the above transformation lexicographically orders +1 before -1.

In this case, the right transformation is obtained by flipping the sign bit.

Therefore -1 will be 0111 1111 1111 1111 1111 1111 1111 1111 and +1 as 1000 0000 0000 0000 0000 0000 0000 0001.

ASCII encoded character strings

Naturally yield the expected order as 'a' < 'b' and their respective byte values 97 < 98 obey the order.

IPv4 addresses

Naturally yield the expected order since each octet is an unsigned byte and unsigned types in binary have the expected lexicographic ordering.

For example, 12.10.192.0 < 12.10.199.255 and their respective binary representation 00001100.00001010.11000000.00000000 is lexicographically smaller than 00001100.00001010.11000111.11111111.

Further reading

Section IV of the paper.

Usage

An implementation of the following interface is required for the map key:

public interface BinaryComparable<K> {
	byte[] get(K key);
}

Simple keys based on primitives and String

BinaryComparables already provides the key transformations for primitives and Strings.

NavigableMap<String, Value> arts = new AdaptiveRadixTree(BinaryComparables.forString());
NavigableMap<Integer, Value> arti = new AdaptiveRadixTree(BinaryComparables.forInteger());

This example shows the transformation for 32-bit InetAddress which is internally stored as a integer.

Compound keys

With only fixed length attributes

Transform each attribute separately and concatenate the results.

This example shows the transformation for a compound key made up of two integers.

With variable length attributes

Transformation of a variable length attribute that is succeeded by another attribute is required to end with a byte 0 for the right transformation. Without it, compound key ("a", "bc") and ("ab", "c") would be incorrectly treated equal. Note this only works if byte 0 is not part of the variable length attribute's key space, otherwise ("a\0", "b") would be incorrectly ordered before ("a", "b").

If byte 0 is part of the key space then the key transformation requires remapping every byte 0 as byte 0 followed by byte 1 and ending with two byte 0s. This is described in section IV.B (e).

Examples

Since this library implements NavigableMap interface, the usage is just as any other Java map.

NavigableMap<String, String> art = new AdaptiveRadixTree<>(BinaryComparables.forString());
art.put("key1", "value");
art.put("key2", "value");
art.get("key1"); // value
art.containsKey("somekey"); // false
art.floorKey("key2"); // key1
art.remove("key1");

For more, check the examples directory.

YCSB Benchmarks

Configuration Value
Processor Intel i7-8750H CPU @ 2.20GHz
JVM Oracle JDK build 12.0.1+12
OS MacOS Mojave 10.14.4
Memory 16 GB
L1, L2, L3 cache sizes 32K, 262K, 9MB

Benchmarks are done against TreeMap (standard library implementation of Red black tree) with three datasets in order of increasing key length:

  • 64-bit random integers (8 bytes)
  • Strings with average length of 24 bytes
  • Strings with average length of 55 bytes

Refer YCSB for workload characteristics.

Load (100% insert)

alt text alt text alt text

Conclusion:

As key length increases, height of Adaptive Radix Tree increases whereas that of TreeMap stays low (log n) and hence is faster to insert into.

C (100% lookup)

alt text alt text alt text

Conclusion:

As key length increases, height of Adaptive Radix Tree increases whereas that of TreeMap stays low (log n) and hence is faster to search from.

E (95% range scan, 5% insert)

alt text

Memory consumption

Comparison against a Radix tree where every node has space for 256 children.

alt text

Tests

Compliance of ART to SortedMap interface has been tested by running it against Apache Common Collection's suite of tests.

Additionally this project extends that suite by including NavigableMap specific tests (something that can be contributed upstream).

Running tests

gradle test testJUnit4

Download

Use JCenter as the repository or download directly from Bintray.

repositories {
    jcenter()
}

compile 'com.github.rohansuri:adaptive-radix-tree:1.0.0-beta'
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].