All Projects → diennea → blobit

diennea / blobit

Licence: Apache-2.0 license
BlobIt - a Distributed Large Object Storage

Programming Languages

java
68154 projects - #9 most used programming language
shell
77523 projects
CSS
56736 projects

Projects that are alternatives of or similar to blobit

Infinit
The Infinit policy-based software-defined storage platform.
Stars: ✭ 363 (+1151.72%)
Mutual labels:  storage, distributed
DBMSology
The Paper List on Design and Implmentation of System Software
Stars: ✭ 67 (+131.03%)
Mutual labels:  storage, distributed
oxen-storage-server
Storage server for Oxen Service Nodes
Stars: ✭ 19 (-34.48%)
Mutual labels:  storage, distributed
storage-abstraction
Provides an abstraction layer for interacting with a storage; the storage can be local or in the cloud.
Stars: ✭ 36 (+24.14%)
Mutual labels:  storage, blob-storage
Sia
Blockchain-based marketplace for file storage. Project has moved to GitLab: https://gitlab.com/NebulousLabs/Sia
Stars: ✭ 2,731 (+9317.24%)
Mutual labels:  storage, distributed
Turms
The world's most advanced open source instant messaging engine for 100K~10M concurrent users https://turms-im.github.io/docs
Stars: ✭ 97 (+234.48%)
Mutual labels:  storage, distributed
Storj
Ongoing Storj v3 development. Decentralized cloud object storage that is affordable, easy to use, private, and secure.
Stars: ✭ 1,278 (+4306.9%)
Mutual labels:  storage, distributed
BlobHelper
BlobHelper is a common, consistent storage interface for Microsoft Azure, Amazon S3, Komodo, Kvpbase, and local filesystem written in C#.
Stars: ✭ 23 (-20.69%)
Mutual labels:  storage, blob-storage
Bojack
🐴 The unreliable key-value store
Stars: ✭ 101 (+248.28%)
Mutual labels:  storage, distributed
Foundatio
Pluggable foundation blocks for building distributed apps.
Stars: ✭ 1,365 (+4606.9%)
Mutual labels:  storage, distributed
majordodo
Distributed Operations and Data Organizer built on Apache BookKeeper
Stars: ✭ 25 (-13.79%)
Mutual labels:  distributed, bookkeeper
Cherry-Node
Cherry Network's node implemented in Rust
Stars: ✭ 72 (+148.28%)
Mutual labels:  storage, distributed
storage
Storage Standard
Stars: ✭ 92 (+217.24%)
Mutual labels:  storage
storage-box
Intuitive and easy-to-use storage box.
Stars: ✭ 26 (-10.34%)
Mutual labels:  storage
mconfig
a lightweight distributed configuration center
Stars: ✭ 13 (-55.17%)
Mutual labels:  storage
ImplicitGlobalGrid.jl
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
Stars: ✭ 88 (+203.45%)
Mutual labels:  distributed
SimpleStorage
💾 Simplify Android Storage Access Framework for file management across API levels.
Stars: ✭ 498 (+1617.24%)
Mutual labels:  storage
laravel-ovh
Wrapper for OVH Object Storage integration with laravel
Stars: ✭ 30 (+3.45%)
Mutual labels:  storage
money
Dapper Style Distributed Tracing Instrumentation Libraries
Stars: ✭ 65 (+124.14%)
Mutual labels:  distributed
nova
Web framework for Erlang.
Stars: ✭ 175 (+503.45%)
Mutual labels:  distributed

BlobIt

BlobIt is a ditributed binary large objects (BLOBs) storage built upon Apache BookKeeper

Overview

BlobIt stores BLOBS (binary large objects) in buckets, a bucket is like a namespace. Multitenanty is fundamental in BlobIt architecture and it is expected that each tenant uses its own bucket.

Data is stored on a Apache BookKeeper cluster, and this automagically enables BlobIt to scale horizontally, the more Bookies you have the more amount of data you will be able to store.

BlobIt needs a metadata service in order to store refecences to the data, it ships by default with HerdDB, which is also built upon BookKeeper.

Architectural overview

BlobIt is designed for performance and expecially low latency in this scenario: the writer stores one BLOB and readers immediately read such BLOB (usually from different machines). This is the most common path in EmailSuccess, as BlobIt is the core datastore for it. Blobs are supposed to be retained for a couple of weeks, not for very long term, but there is nothing in the design of BlobIt that prevents you for storing data for years.

BlobIt clients talk directly to Bookies both for reads and writes, this way we are exploiting directly all of the BookKeeper optimizations on the write and read path. This architecture is totally decentralized, there is no BlobIt server. You can use the convenience binaries BlobIt service that is simply a pre-package bundle able to run ZooKeeper, BookKeeper, HerdDB and a REST API.

You can see BlobIt as simple extension to BookKeeper, with a metadata layer which makes it simple to:

  • reference Data using a user-supplied name (in form of bucketId/name)
  • organize efficently data in BookKeeper, an allow deletion of BLOBs.

Write path

Writes

Batches of Blobs are stored in BookKeeper ledgers (using the WriteHandleAdv API), We are storing more then one BLOB inside one BookKeeper ledger. BlobIt will collect unused ledgers and delete them.

When a Writer stores a BLOB it receives immedialy an unique ID of the blob, this ID is unique in the whole cluster, not only in the scope of the bucket. Such ID is a "smart id" and it contains all of the information needed to retrieve data without using the metadata service.

Such ID contains information like:

  • the ID of the ledger (64bit)
  • fist entry id (64bit)
  • number of entries (32bit)
  • size of the entry (64bit)

With such information it is possible to read the whole BLOB or even only parts. An object is immutable and it cannot be modified.

The client can assign a custom name, unique inside the context of the Bucket, Readers will be able to access the object using this key. You can assign the same key to another object, this way

If you are using custom keys the writer and the reader have to perform an additional RPC to the metadata service.

Write path

BookKeeper client stores data in immutable ledgers, and performs writes to a quorum of Bookies, which are only dta storage nodes. Each ledger will be written to several bookies, and all the information needed for data retrival is stored on ZooKeeper. ZooKeeper also stores data for Bookie discovery.

So the normal write flow is:

  • create a new ledger:
    • choose a set of available Bookies using ZooKeeper
    • write new ledger metadata to ZooKeeper
  • write each part of the BLOB directly to the Bookies
  • record on the metadata service (HerdDB) which entries of the ledger contains the data
  • perform an RPC to the HerdDB tablespace ledger for the bucket to write metadata
  • the database will perform a write on BookKeeper (still to a quorum of bookies)

The metadata service is decentralized: each bucket will have a dedicated tablespace on HerdDB, this leader will be indipendent from the ledgers of other tablespaces of otherbuckets, this way the system will scale horizonally with the number of Buckets.

Reads

BlobIt clients read data directly from Bookies. Because the objectId contains all of the information to access the data. In case of lookup by custom key a lookup on the metadata service is needed.

The reader can read the full Blob of parts of it. BlobIT supports a Streaming API for reads, suitable for very large objects: as soon as data comes from BookKeeper it is streamed to the application: Think about an HTTP service which retrieves an object and serves it directly to the client.

Buckets and data locality

You can use Buckets in order to make it possible to store data nearby the writer or the reader. BlobIt is able to use an HerdDB tablespace for each bucket, this way all of the metadata of the bucket will be handled using the placement policies configured in the system.

This is very important, because each Bucket will be able to survive and work indipendently from the others.

A typical scenario is to move readers, writers and the primary copy metadata and data next to each other, and have replicas on other machines/racks. Both for the metadata service (HerdDB) and the data service (BookKeeper) replicas will be activated immediately as soon as the reference machines are no more available without any service outage.

Deleting data

Data deletion is the most tricky part, because BlobIt is storing more than one BLOB inside the same ledeger, so you can delete a ledeger only when there is no live BLOB stored in it. We have a garbage collector system which makes maintenance of the Bucket and deleted data from BookKeeper when it is no more needed. Bookies in turn will do their own Garbage Collection, depending on the configuration. So disk space won't be reclaimed as soon as a BLOB is deleted.

BlobIt garbage collection is totally decentralized, any client can run the procedure, and it runs per bucket. Even in this case it is expected that services which operate on a bucket are co-located and take care of running the garbage collection in a timely manner. Usually it makes sense to run the GC of a bucket after deleting a batch of BLOBs of the same bucket.

Java Client example

A tipical writer looks like this:

String BUCKET_ID = "test";
byte[] TEST_DATA = "foo".getBytes();
HerdDBDataSource datasource = new HerdDBDataSource();
datasource.setUrl("jdbc:herddb:localhost");
Configuration configuration
                = new Configuration()
                    .setType(Configuration.TYPE_BOOKKEEPER)
                    .setConcurrentWriters(10)
                    .setUseTablespaces(true)
                    .setZookeeperUrl(env.getAddress());
try (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource);) {      
      manager.createBucket(BUCKET_ID, BUCKET_ID, BucketConfiguration.DEFAULT).get();

      BucketHandle bucket = manager.getBucket(BUCKET_ID);
      String id = bucket.put(null, TEST_DATA).get();
}

A typical reader looks like this:

String BUCKET_ID = "test";
byte[] TEST_DATA = "foo".getBytes();
HerdDBDataSource datasource = new HerdDBDataSource();
datasource.setUrl("jdbc:herddb:localhost");
Configuration configuration
                = new Configuration()
                    .setType(Configuration.TYPE_BOOKKEEPER)
                    .setConcurrentWriters(10)
                    .setUseTablespaces(true)
                    .setZookeeperUrl(env.getAddress());
try (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource);) {      
      manager.createBucket(BUCKET_ID, BUCKET_ID, BucketConfiguration.DEFAULT).get();

      BucketHandle bucket = manager.getBucket(BUCKET_ID);
      
      byte[] data = bucketReaders.get(it).get();
}

Most of the APIs are async and they are based on CompletableFuture.

REST API

We are delivering a REST service which is (almost) compatible with Open Stack Swift API. This service is still in ALPHA phase and it is currently used in order to perform benchmarks and comparisons with other products.

Security

As BlobIt is mostly a layer on top of BookKeeper, ZooKeeper and HerdDB all of the security aspects are handled directly but low level clients. Our suggestion is that you enable SASL/Kerberos authentication on all of such services. There is no support for other security features, because we expecte the client application to be in charge for handling semantics of buckets/blobs.

Getting in touch

Feel free to create issues in order to interact with the community.

Documentation will come soon, start with the examples inside the test cases in order to understand better how it works.

Please let us know if you are trying out this project, we will be happy to hear about your case and help you.

License

BlobIt is under Apache 2 license.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].