All Projects → wangziqi2013 → Bwtree

wangziqi2013 / Bwtree

Licence: apache-2.0
An open sourced implementation of Bw-Tree in SQL Server Hekaton

Projects that are alternatives of or similar to Bwtree

Kangaroo
SQL client and admin tool for popular databases
Stars: ✭ 127 (-38.94%)
Mutual labels:  sqlserver
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-26.44%)
Mutual labels:  sqlserver
Qxorm
QxOrm library - C++ Qt ORM (Object Relational Mapping) and ODM (Object Document Mapper) library - Official repository
Stars: ✭ 176 (-15.38%)
Mutual labels:  sqlserver
L4
L4 (Lock-Free on Read) Hashtable is a C++ library that implements hash table with arbitray byte stream keys/values.
Stars: ✭ 136 (-34.62%)
Mutual labels:  lock-free
Efcore.bulkextensions
Entity Framework Core Bulk Batch Extensions for Insert Update Delete Read (CRUD), Truncate and SaveChanges operations on SQL Server, PostgreSQL, SQLite
Stars: ✭ 2,295 (+1003.37%)
Mutual labels:  sqlserver
Modernarchitectureshop
The Microservices Online Shop is an application with a modern software architecture that is cleanly designed and based on.NET lightweight technologies. The shop has two build variations. The first variant is the classic Microservices Architectural Style. The second one is with Dapr. Dapr has a comprehensive infrastructure for building highly decoupled Microservices; for this reason, I am using Dapr to achieve the noble goal of building a highly scalable application with clean architecture and clean code.
Stars: ✭ 154 (-25.96%)
Mutual labels:  sqlserver
Bobsql
demos, scripts, samples, and code from the two bobs who work at Microsoft on SQL Server
Stars: ✭ 121 (-41.83%)
Mutual labels:  sqlserver
Koolreport
This is an Open Source PHP Reporting Framework which you can use to write perfect data reports or to construct awesome dashboards using PHP
Stars: ✭ 204 (-1.92%)
Mutual labels:  sqlserver
Libcds
A C++ library of Concurrent Data Structures
Stars: ✭ 2,006 (+864.42%)
Mutual labels:  lock-free
Nanodbc
A small C++ wrapper for the native C ODBC API | Requires C++14 since v2.12
Stars: ✭ 175 (-15.87%)
Mutual labels:  sqlserver
Onefile
The world's first wait-free Software Transactional Memory
Stars: ✭ 139 (-33.17%)
Mutual labels:  lock-free
Sql Server Maintenance Solution
SQL Server Maintenance Solution
Stars: ✭ 2,003 (+862.98%)
Mutual labels:  sqlserver
Yuniql
Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (-25%)
Mutual labels:  sqlserver
Gosora
Gosora is an ultra-fast and secure forum software written in Go that balances usability with functionality.
Stars: ✭ 131 (-37.02%)
Mutual labels:  sqlserver
Fsharp.data.sqlclient
A set of F# Type Providers for statically typed access to MS SQL database
Stars: ✭ 187 (-10.1%)
Mutual labels:  sqlserver
Sio.core
✔ [ SIOC ] Swastika I/O Core is an all in one platform (e.g CMS, eCommerce, Forum, Q&A, CRM...) ASP.NET Core / Dotnet Core System based on SIOH Framework.
Stars: ✭ 121 (-41.83%)
Mutual labels:  sqlserver
Workloadtools
A collection of tools to collect, analyze and replay SQL Server workloads, on premises and in the cloud
Stars: ✭ 154 (-25.96%)
Mutual labels:  sqlserver
Dbtool
数据库工具,根据表结构文档生成创建表sql,根据数据库表信息导出Model和表结构文档,根据文档生成数据库表,根据已有Model文件生成创建数据库表sql
Stars: ✭ 206 (-0.96%)
Mutual labels:  sqlserver
Sharding Method
分表分库的新思路——服务层Sharding框架,全SQL、全数据库兼容,ACID特性与原生数据库一致,能实现RR级别读写分离,无SQL解析性能更高
Stars: ✭ 188 (-9.62%)
Mutual labels:  sqlserver
Multiqueue
A fast mpmc queue with broadcast capabilities
Stars: ✭ 167 (-19.71%)
Mutual labels:  lock-free

Open BwTree Build Status Coverage Status

This is an implementation of BwTree, based on a design from Microsoft Research[1]. The official implementation of BwTree by Microsoft is currently deployed in SQL Server Hekaton, Azure DocumentDB, Bing and other Microsoft products.

BwTree is a general purpose, concurrent and lock-free B+-Tree index. It allows for multiple threads modifying and/or querying the tree concurrently without corrupting the tree or giving inconsistent results. However, BwTree only guarantees atomicity of operations, and is not a concurrency control agent. If both atomicity and isolation is required, an extra concurrency control layer must be implemented.

This project is developed together with Peloton, a self-adaptive database system prototype by Carnegie Mellon University. Though we strive to maintain BwTree as a standalong module with maximum portability and easiness to use, the special properties of lock-free data structure and of BwTree itself renders some standarized interfaces difficult or impossible to implement (for sake of atomicity), or leads to very inefficient implementations. But still, switching from a C++ STL compliant container to BwTree usually requires only few lines of code change in most common cases, and we are working to simplify things for not-so-common uses.

Benchmark

(Since currently this repo is undergoing radical design changes, benchmark will be unavailable for few weeks)

Improvements

ABORT Delta

In the official BwTree paper, removing a node is described as posting a node remove delta on the removed node, finishing it by posting an index term delete delta on top of the parent node delta chain. This procedure has a potential problem when the parent is undergoing a split and the split key is unfortunately chosen to be pointing to the removed node. In this case, after parent node split, its left most child is logically merged into the left sibling of the parent node, which implicitly changes both the low key of the parent and the high key of its left sibling.

Our approach to avoid this problem is to post a special ABORT node on the parent before we post remove node. This ABORT node blocks all further access of the parent node, and in the meanwhile it prevents any thread that has already taken the snapshot before the ABORT node is posted posting another SMO. After remove node delta is posted, we remove this ABORT node, and jump to the left sibling to finish the remove SMO. Also when posting a node split delta, we always check whether the chosen key is mapped to a removed child. If it is the case then we do not split and continue.

Posting InnerDeleteNode

Besides that, when finishing a node split delta by posting index term insert node on the parent node delta chain, the paper suggests that the parent node snapshot we have taken while traversing down needs to be checked against the current most up-to-date (i.e. at least the most up-to-date parent node after we have taken the snapshot of the child split delta node), to avoid very delicate problems that would only happen under a super high insert-delete contention (as in mixed-test environment in this repo). But actually such check is unnecessary in the case of split delta (though it is a must-do for merge delta), since even if we have missed few inner data nodes on the out-of-date parent node delta chain, the worst result is observing inconsistent Key-NodeID pair inside the parent node delta chain, which is a hint that the current snapshot is quite old, and that an abort is necessary.

But on the other hand, when dealing with node merge delta, it is mandatory that we check for the status of the parent node after taking the snapshot of the merge delta node. Because for merge delta node's help-along protocol, if we go back to the parent node, and could not find the separator key that is going to be deleted, we assume that it has already been finished by other threads, and thus will proceed with the current operation that might overwrite the merge delta before it is finished. Imagine the case where the snapshot was taken before an InnerInsertNode was posted. In this case, if we use the old parent node snapshot, then a wrong conslusion that the merge delta has already been finished will be reached.

Known Problem

Wait-Freedom-less

The ABORT mechanism stated above renders BwTree non-wait-free, since it essentially locks the parent node of the node being removed, and one thread will spin on that ABORT node if it intends to post on the locked parent. Even worse, if the node posting ABORT node dies before it removes the ABORT, no remaining thread could succeed posting on the locked parent, and the data structure will become non-responsive (i.e. starvation to the extreme).

Mapping Table Size

Till now the size of the mapping table is fixed, and could not be extended in the case of an overflow. This problem could be alleviated by allocating a "large enough" mapping table that covers 99% of the cases, but no one could guarantee how large is "large enough", and when the size of the table will grow to the limit. For running benchmarks on data base systems this is fine, but for real world use cases it needs to be fixed, and I am open to any suggestion related to this issue.

Non-Scalable Randomness

Random number generator was incorrect, in a sense that the default generator provided by C++11 is a bottleneck under multicore workload, with a stable throughput of 1.00 M op/second in our configuration. This has already been corrected recently, but we are still investigating into this issue trying to find out a better solution than the current patch.

Coverity

Another small issue is that coveralls data is not accurate, in a sense that lines inside header files are not detected at all. This should be attributed to gcov rather than BwTree

Compile and Run

Command Description
make prepare Prepares build directory. Must be used first to set up the environment after each fresh clone
make Compiles and links, with debugging flag, and with lower optimization level (-O2)
make full-speed Compiles without debugging flag. Better performance (-O3)
make small-size Compiles without debugging flag, but also optimized for smaller code size (-Os)
make benchmark-bwtree Compiles and runs benchmark (seq. and random) for bwtree
make benchmark-all This compiles and runs benchmark for bwtree, std::map, std::unordered_map, stx::btree and stx::bwtree_multimap
make stress-test Runs stress test on BwTree
make test Runs insert-read-delete test for multiple times with different patterns
make epoch-test Runs epoch manager test
make infinite-insert-test Runs random insert test on a random pattern
make email-test Runs email test. This requires a special email input file that we will not provide for some reason
make mixed-test Runs insert-delete extremely high contention test. This test is the one that fails most implementations
make benchmark-btree-full Run the same benchmark as those in 'benchmark-bwtree-full' for stx::btree_multimap
make benchmark-bwtree-full Runs insert-seq read-rand read-zipf read workload for BwTree on 30 Milltion keys. Use THREAD_NUM=xxx before make command to specify the number of threads used for testing

Releases

You could either use git checkout to view these releases, or download them through Github web interface.

Release Name Description
baseline A stable working version of BwTree used as testing baseline against design changes
fixed-forward-iterator Preallocation + fixed forward iterator
inner-search-hint Search hint inside InnerNode traversal

Misc

Under stl_test folder there are C++ source files illustrating how STL and our private container libraries could be used, as well as some testing / benchmarking / demo code. Using make command to compile and link these STL test cases.

For email_test, a text file with email addresses generated in some way is required, which is not provided here for some reason. However, generating your own email file should not be difficult.

References

[1] Levandoski, Justin J., David B. Lomet, and Sudipta Sengupta. "The Bw-Tree: A B-tree for new hardware platforms." In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pp. 302-313. IEEE, 2013.

License

Personally I would like to open source it without any license. But in order to stay complianct with Peloton's licene, the licence for OpenBwTree is Apache License (https://www.apache.org/licenses/LICENSE-2.0)

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].