erisata / paxoid

License: Apache-2.0
Paxos-based masterless ID/Sequence generator.

Paxoid -- Paxos based masterless sequence.

This application implements a Paxos-based masterless ID / Sequence generator. It was built to assign short identifiers to the Erlang/OTP nodes in a cluster. The assigned node identifiers were then used to generate object identifiers of the form {NodeId, LocalCounter} locally.
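
The {NodeId, LocalCounter} scheme can be illustrated with a minimal sketch. The module object_id and its functions are hypothetical and not part of paxoid; the sketch assumes the node ID has already been obtained once from the cluster-wide sequence and is passed in as a plain term.

```erlang
-module(object_id).
-export([new/1, next/1]).

%% Hypothetical helper, not part of paxoid.
%% NodeId is the short identifier assigned to this node, for example
%% a value obtained once from the cluster-wide sequence.

%% Create a generator: the node ID plus a local atomic counter.
new(NodeId) ->
    {NodeId, atomics:new(1, [{signed, false}])}.

%% Return the next {NodeId, LocalCounter} pair. The increment is atomic
%% and purely local, so no network traffic is needed per object ID.
next({NodeId, Ref}) ->
    {NodeId, atomics:add_get(Ref, 1, 1)}.
```

Since only the node ID comes from the distributed sequence, the cost of consensus is paid once per node rather than once per object.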

The design was based on the following main properties:

  • The system should act in the AP mode (from the CAP theorem). It is better to have duplicated sequence values in the case of a network partition than to have the system stalled. A merge procedure is defined to repair partitions on merge by renumbering the duplicated IDs.

  • The system must preserve consistency in a single partition. I.e. the AP choice should be no excuse for giving up consistency where it can actually be achieved.

  • Performance was not the primary concern, as the primary use is expected to be generating a small number of IDs that are then used by all the nodes as prefixes for locally incrementing counters.

  • The IDs should be small, so that they can be displayed in a GUI, etc. We chose the ID to be a number.

There are other techniques for generating (almost) unique IDs, including GUIDs, Twitter's Snowflake, etc. An overview can be found in "Generating unique IDs in a distributed environment at high scale". All of them either rely on a master or have some probability of generating duplicate IDs. Here we want to have defined semantics for the duplication of IDs and for recovery from it.

Check it out

Start several nodes:

rebar3 shell --name [email protected]
rebar3 shell --name [email protected]
rebar3 shell --name [email protected]

Start a paxoid process on each node:

paxoid:start_link(test).

Then join them by running the following on any of the nodes:

paxoid:join(test, ['[email protected]', '[email protected]', '[email protected]']).
paxoid:info(test). % To get some details on the runtime.

Now you can call paxoid:next_id(test) on any of the nodes to get a new ID from the sequence.

To check whether IDs can be retrieved in parallel, run the following on each of the started nodes:

erlang:register(sh, self()),
receive start -> rp([paxoid:next_id(test) || _ <- lists:seq(1, 100)]) end.

and then start the parallel generation of IDs by running the following from a separate node (rebar3 shell --name [email protected]):

[ erlang:send({sh, N}, start) || N <- ['[email protected]', '[email protected]', '[email protected]']].
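
To compare the lists printed by the nodes, a small helper along the following lines could be used. It is a hypothetical sketch, not part of paxoid, and it assumes the IDs are plain numbers as described above. Within a single partition the expected result is an empty list.

```erlang
-module(id_check).
-export([duplicates/1]).

%% Hypothetical helper, not part of paxoid.
%% Takes the ID lists collected on the nodes and returns the sorted
%% list of IDs that occur more than once across all of them.
duplicates(IdLists) ->
    Sorted = lists:sort(lists:append(IdLists)),
    dups(Sorted, []).

dups([X, X | Rest], Acc) ->
    %% Skip any further copies of X so each duplicate is reported once.
    dups(lists:dropwhile(fun(Y) -> Y =:= X end, Rest), [X | Acc]);
dups([_ | Rest], Acc) ->
    dups(Rest, Acc);
dups([], Acc) ->
    lists:reverse(Acc).
```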

Using it in an application

There are two ways to start the paxoid peers:

  • Supervised by the user application. In this case one can get a supervisor's child specification by calling paxoid:start_spec/2 and then pass it to the corresponding application supervisor. Most likely this is the preferred way.

  • Supervised by the paxoid application. For this case one should call paxoid:start_sup/2. The application can also use predefined paxoid peers. They can be configured via the predefined environment variable of the paxoid application.

The paxoid processes can be started with several options passed as a map with the following keys:

  • join => [node()] -- a list of nodes we should synchronize with. This is only an initial list; more nodes can be discovered later. It can be used to join a new node to an existing cluster.

  • callback => module() | {module(), Args :: term()} -- a callback module implementing the paxoid behaviour. It can be used to implement custom persistence as well as to get notifications on various events (like a new mapping for a duplicated ID). You can look at paxoid_cb_mem for an example of such a callback module; it is the one used by default.
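
As a rough illustration of the first approach, an application supervisor might embed a paxoid peer as sketched below. The names my_app_sup and my_seq are hypothetical, and the sketch assumes that paxoid:start_spec/2 takes the sequence name and the options map, in that order, and returns a child specification; check the module documentation before relying on this.

```erlang
-module(my_app_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Options map with the two keys described above.
    Opts = #{
        join     => [],            % fill in the nodes of an existing cluster
        callback => paxoid_cb_mem  % the default in-memory callback module
    },
    %% ASSUMPTION: start_spec/2 takes (Name, Opts) and returns a child spec.
    PaxoidSpec = paxoid:start_spec(my_seq, Opts),
    {ok, {#{strategy => one_for_one}, [PaxoidSpec]}}.
```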

Design choices

  • Sequences are named using atoms only. This allows peers to be registered in the local registry and accessed via {Name, Node}. We wanted to avoid additional dependencies. Maybe pg2 can be used instead. Atoms were considered good enough for naming, because the distributed sequence is designed to be used as a basis for local counters, so the number of sequences will be low.

  • A single peer is implemented as a single process. This process combines several FSMs (startup phases, a list of consensus attempts, a list of merge attempts) that share a lot of common state. Splitting them into separate processes could increase the complexity by requiring coordination between them. Another reason is related to the process registry. If we are not using a registry like gproc, we need to maintain the relations between processes, which again adds complexity.
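
The atom naming and {Name, Node} addressing from the first point can be illustrated with a generic gen_server. The module named_counter is a hypothetical stand-in: it is not paxoid and runs no consensus. It only shows how a locally registered, atom-named process can be reached on a given node without any extra registry.

```erlang
-module(named_counter).
-behaviour(gen_server).
-export([start_link/1, next/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical stand-in for a peer; it is not paxoid and runs no consensus.
%% It only shows atom-named local registration and {Name, Node} addressing.

%% Register the process under Name in the node-local registry.
start_link(Name) when is_atom(Name) ->
    gen_server:start_link({local, Name}, ?MODULE, 0, []).

%% Call the process registered as Name on Node.
next(Name, Node) ->
    gen_server:call({Name, Node}, next).

init(Count) ->
    {ok, Count}.

handle_call(next, _From, Count) ->
    {reply, Count + 1, Count + 1}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.
```

For example, after named_counter:start_link(demo), a call to named_counter:next(demo, node()) reaches the local process, and passing another connected node's name reaches the process registered as demo there.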

Formal verification

TBD: TLA+ specification.

Modules

paxoid
paxoid_cb_file
paxoid_cb_mem