All Projects → andylamp → Bplustree

andylamp / Bplustree

Licence: apache-2.0
An efficient, conscise, and simple implementation of a purely on-disk B+ Tree data structure

Programming Languages

java
68154 projects - #9 most used programming language

Labels

Projects that are alternatives of or similar to Bplustree

Kitdb
KitDB是一个内嵌式持久型的 高速NoSQL存储 lib
Stars: ✭ 439 (+377.17%)
Mutual labels:  disk
Ruby Stats
Fetch statistics about your machine using Ruby
Stars: ✭ 5 (-94.57%)
Mutual labels:  disk
Write
Write data to the file system, creating any intermediate directories if they don't already exist. Used by flat-cache and many others!
Stars: ✭ 68 (-26.09%)
Mutual labels:  disk
Testdisk
TestDisk & PhotoRec
Stars: ✭ 511 (+455.43%)
Mutual labels:  disk
Zpan
A self-hosted cloud disk base on the cloud storage./ 一个基于云存储的网盘系统,用于自建私人网盘或企业网盘。
Stars: ✭ 765 (+731.52%)
Mutual labels:  disk
Node Vhd
Microsoft's Virtual Hard Disk (VHD) Format
Stars: ✭ 7 (-92.39%)
Mutual labels:  disk
Webdavmailrucloud
WebDAV cloud.mail.ru ...& Yandex.Disk | WebDAV Облако Mail.Ru Сетевой Диск
Stars: ✭ 386 (+319.57%)
Mutual labels:  disk
Ezio
BT-based Disk Deployment tool
Stars: ✭ 77 (-16.3%)
Mutual labels:  disk
Cleanmywechat
自动删除 PC 端微信缓存数据,包括从所有聊天中自动下载的大量文件、视频、图片等数据内容,解放你的空间。
Stars: ✭ 816 (+786.96%)
Mutual labels:  disk
Diskqueue
A thread-safe, multi-process(ish) persistent queue library for .Net and Mono
Stars: ✭ 66 (-28.26%)
Mutual labels:  disk
Heim
Cross-platform async library for system information fetching 🦀
Stars: ✭ 572 (+521.74%)
Mutual labels:  disk
Dua Cli
View disk space usage and delete unwanted data, fast.
Stars: ✭ 744 (+708.7%)
Mutual labels:  disk
Multibootusb
Create multiboot live Linux on a USB disk...
Stars: ✭ 1,042 (+1032.61%)
Mutual labels:  disk
Binaryprefs
Rapidly fast and lightweight re-implementation of SharedPreferences which stores each preference in files separately, performs disk operations via NIO with memory mapped byte buffers and works IPC (between processes). Written from scratch.
Stars: ✭ 484 (+426.09%)
Mutual labels:  disk
Ruby Vmstat
A focused and fast library to gather memory, cpu, network, load avg and disk information
Stars: ✭ 68 (-26.09%)
Mutual labels:  disk
Discutils
Utility libraries to interact with discs, filesystem formats and more
Stars: ✭ 417 (+353.26%)
Mutual labels:  disk
Node Ntfs
Windows NT File System (NTFS) file system driver
Stars: ✭ 18 (-80.43%)
Mutual labels:  disk
Yandex
PHP SDK для работы с Яндекс Диском yandex disk
Stars: ✭ 82 (-10.87%)
Mutual labels:  disk
Xfce4 Genmon Scripts
🐭 XFCE panel generic monitor scripts
Stars: ✭ 69 (-25%)
Mutual labels:  disk
Ansible Manage Lvm
Stars: ✭ 55 (-40.22%)
Mutual labels:  disk

build Maven

A Purely On-Disk Implementation of a B+ Tree

After quite a few hours that included developing the thing as well as testing it here is a (fully) functional implementation of a B+ Tree data structure purely on the disk. This was developed mostly for educational reasons that stemmed from the fact that I could not find a B+ Tree implementation that met the following:

  • Purely Disk based implementation
  • Used strict paging which the user could tune (256k, 1K, 4K etc)
  • Was a (Key, Value) storage not just for Keys (B-Tree)
  • Implemented deleting from the data structure
  • Supported duplicate entries (if required)
  • was adequately tested (so that I know it would work)

I came about needing to implement a B+ Tree for a side project of mine... and I was really surprised to see that there were no implementations that met the above requirements; not only that but I was not able to find a reference code/pseudocode that had a working delete implementation and support for duplicate keys. To my dismay even my trusty (and beloved) CLRS didn't include an implementation of a B+ Tree but only for the B-Tree with no duplicates and delete pseudocode (well, one could say that they gave the steps...but that's not the point).

It has to be noted that I could find some B+ Tree implementations that had delete and duplicate key support, mainly in open-source database projects. This unfortunately meant that these implementations were deeply integrated into their parent projects and as a result they were optimized for their particular use-cases. Finally the code bases where significantly larger (hence making the code reading/understanding much harder than it should be!).

So I went about to implement mine (cool stuff, many hours of head scratching were involved!) while also putting effort in creating this "tuned" down version which mainly cuts down on features for simplicity, enhanced code clarity and clutter reduction.

Ease of use features

As I said above this project was done mainly to create a working example of a B+ Tree data structure purely on the disk so the code is well commented (I think) and can be understood easily; that said... we have some "delicacies" that make working and testing this project a bit easier, which are outlined below

  • it uses maven, so most modern IDE's can import it without hassle...
  • it requires jdk v8 (for some parts, change them to have backwards support)
  • it uses a dedicated tester as well as JUnit tests
  • has an interactive menu that the user can individually perform the operations

Implementation details

Insertions

For insertions we use a modified version of the method provided by CLRS with modifications to support duplicate keys (which will be covered below). Complexity is not altered from the usual case and we require one pass down the tree as well.

Searching

This is assumed to be the most common operation on the B+ Tree; which support two modes of searching:

  • Singular Key searches
  • Range Queries

Here two distinct functions are used to cover these two cases:

  • searchKey is used for singular value searches (unique or not)
  • rangeSearch is used for performing range queries

Both of these methods require only one pass over the tree to find the results (if any). Additionally, since we store the keys in a sorted order we can exploit binary search to reduce the total node lookup time significantly. This is done along with a slight modification to the search algorithm to return the lower bound instead of failure.

Deletes

We again use one pass down the tree deletes but this is a bit more complex than the other two operations... it again requires only one pass to delete the key (if found) but as it descends down the tree it performs any redistribution/merging that needs to happen in order to keep our data structure valid.

Handling of duplicate keys

In order to avoid hurting the search performance functionality which is (assumed to be) the most common operation in a B+ Tree data structure the following scheme was implemented to support duplicate keys (at the cost of a bit more page reads).

The tree only actually indexes singular keys but each key has an overflow pointer which leads to its overflow page that has all of the duplicate entries stored as a linked list. If needed, multiple trailing overflow pages per key are created to accommodate for these extra insertions should they exceed the capacity of one page. The downside is that we use a bit more space per page as well as reads in order to read these overflow pages.

Page lookup table

To avoid moving around things too much we keep each page into a free page pool that has all of the available pages so far; this in turn let's us create an index very fast without having to pay costly reads if we wanted to have a clustered tree (although we again use more space, usually).

License

This work, at its current version, is licensed under the Apache 2.0 license.

Final words

Hopefully I'll create a GitHub page for this... where I explain my implementation a bit more but until then this will suffice! Oh and I really hope this implementation is clear and concise enough so that it can make the notions of B+ Trees crystal clear!

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].