Licence: Apache-2.0 license

HDocDB - HBase as a JSON Document Database

HDocDB is a client layer for using HBase as a store for JSON documents. It implements many of the interfaces in the OJAI framework.

Installing

Releases of HDocDB are deployed to Maven Central.

<dependency>
    <groupId>io.hdocdb</groupId>
    <artifactId>hdocdb</artifactId>
    <version>1.0.1</version>
</dependency>

Building

You can also choose to build HDocDB manually. Prerequisites for building:

  • git
  • Maven
  • Java 8

git clone https://github.com/rayokota/hdocdb.git
cd hdocdb
mvn clean package -DskipTests

Deployment

HDocDB does not currently use coprocessors, but it does use server-side filters, so its jar must be available on the region servers. To deploy HDocDB:

  • Add the built jar (for example, target/hdocdb-1.0.0.jar; the file name carries the version you built) to the classpath of all HBase region servers.
  • Restart the HBase region servers.

Setup

To initialize HDocDB, an HBase connection is required. For example:

...
Configuration config = HBaseConfiguration.create();
Connection conn = ConnectionFactory.createConnection(config);
HDocumentDB hdocdb = new HDocumentDB(conn);
...

Next, obtain a document collection.

...
HDocumentCollection coll = hdocdb.getCollection("mycollection");
...

Each document collection is backed by an HBase table.

Creating Documents

Once a document collection is in hand, creating documents is straightforward.

Document doc = new HDocument()
    .setId("jdoe")
    .set("firstName", "John")
    .set("lastName", "Doe")
    .set("dateOfBirth", ODate.parse("1970-10-10"));
coll.insert(doc);

You can also use the insertOrReplace() method, which will replace the document with the same ID if it already exists.

coll.insertOrReplace(doc);

Retrieving Documents

To retrieve all documents in a collection, use the find() method.

DocumentStream docs = coll.find();

To retrieve a single document by ID, use the findById() method.

Document doc = coll.findById("jdoe");

You can also pass a condition to the find() method.

QueryCondition condition = new HQueryCondition()
    .and()
    .is("lastName", QueryCondition.Op.EQUAL, "Doe")
    .is("dateOfBirth", QueryCondition.Op.LESS, ODate.parse("1981-01-01"))
    .close()
    .build();
DocumentStream docs = coll.find(condition);

Updating Documents

To update a document, first create a document mutation.

DocumentMutation mutation = new HDocumentMutation()
    .setOrReplace("firstName", "Jim")
    .setOrReplace("dateOfBirth", ODate.parse("1970-10-09"));
coll.update("jdoe", mutation);

HDocumentMutation supports the following methods.

  • setOrReplace - update or replace a field with the given value
  • set - perform an update if a field either doesn't exist or has the same type as the given value
  • delete - delete a field
  • increment - increment a numeric field with the given value
  • append - append the given array (or string) to an existing array (or string)
  • merge - merge the given subdocument with an existing subdocument

All of the methods other than setOrReplace() perform a read-modify-write on the client side.
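
The reason most mutations need a read first can be modeled on a plain in-memory map. The sketch below only illustrates the semantics listed above; it is not HDocDB's implementation. The class MutationSketch and its methods are hypothetical names, and skipping the update on a type mismatch is an assumption (the real library may signal that case differently).

```java
import java.util.HashMap;
import java.util.Map;

public class MutationSketch {

    // Unconditional write: the current value never has to be read.
    public static void setOrReplace(Map<String, Object> doc, String field, Object value) {
        doc.put(field, value);
    }

    // Conditional write: the current value must be read to compare types.
    // Returns false and leaves the document unchanged on a type mismatch
    // (assumption made for this sketch only).
    public static boolean set(Map<String, Object> doc, String field, Object value) {
        Object current = doc.get(field);                 // read
        if (current != null && !current.getClass().equals(value.getClass())) {
            return false;
        }
        doc.put(field, value);                           // modify and write
        return true;
    }

    // Increment depends on the stored value, so it also reads before writing.
    // A missing field is treated as 0 (assumption made for this sketch only).
    public static void increment(Map<String, Object> doc, String field, long delta) {
        Object current = doc.get(field);
        long base = (current == null) ? 0L : (Long) current;
        doc.put(field, base + delta);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("firstName", "John");
        doc.put("visits", 1L);

        System.out.println(set(doc, "firstName", "Jim"));  // same type: applied
        System.out.println(set(doc, "visits", "many"));    // type differs: skipped
        setOrReplace(doc, "visits", "many");                // always applied
        System.out.println(doc);
    }
}
```

Because the read and the write are separate steps in this model, two clients mutating the same field at the same time could interfere with each other, which is why the unconditional setOrReplace is the cheaper operation.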

Deleting Documents

To delete a document:

coll.delete("jdoe");

Saving and Retrieving Objects

Since OJAI has Jackson integration, HDocDB can treat HBase as an object store. Assuming your Java class is annotated as follows:

public class User {

    private String id;
    private String firstName;
    private String lastName;

    @JsonCreator
    public User(@JsonProperty("_id")       String id,
                @JsonProperty("firstName") String firstName,
                @JsonProperty("lastName")  String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @JsonProperty("_id")
    public String getId() { return id; }

    public String getFirstName() { return firstName; }

    public String getLastName() { return lastName; }
}

Then instances of your class can be saved and retrieved using HDocDB.

User user = new User("jsmith", "John", "Smith");
Document doc = Json.newDocument(user);
coll.insert(doc);
...
user = coll.findById("jsmith").toJavaBean(User.class);

Global Secondary Indexes

HDocDB also has basic support for global secondary indexes. For more sophisticated indexing support, an engine that can perform full text searches, such as ElasticSearch or Solr, is recommended.

Index management is performed mostly on the client side, so it is not as performant as a coprocessor-based solution such as the one provided by Apache Phoenix. Also, covered indexes are not supported, so each index lookup requires a join. However, the current index implementation should still help speed up some reads (at the cost of slightly slower writes).
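
To make this trade-off concrete, the sketch below models a non-covering secondary index with two in-memory maps: one holding documents by ID and one holding index entries by field value. It is an illustration only; IndexSketch and its layout are hypothetical and do not reflect HDocDB's actual table format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexSketch {

    // Primary "table": document ID -> document fields.
    private final Map<String, Map<String, String>> docs = new HashMap<>();
    // Index "table": lastName -> sorted IDs of documents with that lastName.
    private final TreeMap<String, TreeSet<String>> byLastName = new TreeMap<>();

    // Every insert costs two writes: the document and its index entry.
    // Updates and deletes (which would also have to fix the index) are omitted.
    public void insert(String id, String firstName, String lastName) {
        Map<String, String> doc = new HashMap<>();
        doc.put("_id", id);
        doc.put("firstName", firstName);
        doc.put("lastName", lastName);
        docs.put(id, doc);
        byLastName.computeIfAbsent(lastName, k -> new TreeSet<>()).add(id);
    }

    // The index stores only IDs (it is not covered), so each hit needs a
    // second lookup in the primary table to fetch the document itself.
    public List<Map<String, String>> findByLastName(String lastName) {
        List<Map<String, String>> result = new ArrayList<>();
        TreeSet<String> ids = byLastName.get(lastName);
        if (ids != null) {
            for (String id : ids) {
                result.add(docs.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IndexSketch coll = new IndexSketch();
        coll.insert("jdoe", "John", "Doe");
        coll.insert("jsmith", "John", "Smith");
        System.out.println(coll.findByLastName("Doe"));
    }
}
```

The read avoids scanning every document, but pays one extra lookup per match; the write pays for maintaining the second map. A covered index would store the needed fields in the index entry itself and skip the second lookup.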

To create a secondary index on the lastName field:

coll.createIndex("myindex", "lastName", Value.Type.STRING);

If the index is created after documents have already been added to the database, then the index will be populated in the background asynchronously. Since the indexing is performed on the client, this may take some time for a large collection.

Now, when performing a query such as the following, the index above will be used.

QueryCondition condition = new HQueryCondition()
    .and()
    .is("lastName", QueryCondition.Op.EQUAL, "Doe")
    .is("dateOfBirth", QueryCondition.Op.LESS, ODate.parse("1981-01-01"))
    .close()
    .build();
DocumentStream docs = coll.find(condition);

A query will use at most one index. We can verify which index was used as follows.

System.out.println(((HDocumentStream)docs).explain().asDocument());

which should print the following.

{
    "plan": "index scan",
    "indexName": "myindex",
    "indexBounds": {"lastName": "[Doe‥Doe]"},
    "staleIndexesRunningCount": 0
}

We can also specify which index to use.

DocumentStream docs = coll.findWithIndex("myindex", condition);

Or that no index should be used.

DocumentStream docs = coll.findWithIndex(Index.NONE, condition);

You can also create compound indexes.

IndexBuilder builder = coll.newIndexBuilder("myindex2")
    .add("lastName", Value.Type.STRING)
    .add("firstName", Value.Type.STRING)
    .build();
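
In sorted stores, a compound index generally orders its entries by the indexed fields in sequence, so a query on the leading field alone can still be answered with a prefix range scan. The sketch below shows that idea with a TreeMap, assuming HDocDB's compound indexes follow this usual scheme; CompoundIndexSketch, the separator character, and the key layout are hypothetical and not taken from HDocDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class CompoundIndexSketch {

    // Separator that sorts below any printable character, so all keys that
    // share a lastName form one contiguous range in the sorted map.
    private static final char SEP = '\u0000';

    // Index entries: lastName + SEP + firstName + SEP + id -> id.
    private final TreeMap<String, String> entries = new TreeMap<>();

    public void add(String lastName, String firstName, String id) {
        entries.put(lastName + SEP + firstName + SEP + id, id);
    }

    // Prefix scan on the leading field only: [lastName SEP, lastName SEP+1).
    public List<String> idsByLastName(String lastName) {
        String from = lastName + SEP;
        String to = lastName + (char) (SEP + 1);
        return new ArrayList<>(entries.subMap(from, to).values());
    }

    // Narrower prefix scan that uses both indexed fields.
    public List<String> idsByFullName(String lastName, String firstName) {
        String from = lastName + SEP + firstName + SEP;
        String to = lastName + SEP + firstName + (char) (SEP + 1);
        return new ArrayList<>(entries.subMap(from, to).values());
    }

    public static void main(String[] args) {
        CompoundIndexSketch index = new CompoundIndexSketch();
        index.add("Doe", "John", "jdoe");
        index.add("Doe", "Ann", "adoe");
        index.add("Doerr", "John", "jdoerr");
        System.out.println(index.idsByLastName("Doe"));
        System.out.println(index.idsByFullName("Doe", "John"));
    }
}
```

With this layout, a query on firstName alone cannot use a prefix scan, because entries with the same firstName are scattered across the key space. That is why the order of the add() calls in an index definition usually matters.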

HDocDB Shell with Nashorn Integration

The HDocDB shell is a command-line shell with Nashorn integration, so that MongoDB-like queries can be specified interactively or in a Nashorn script.

To start the HDocDB shell, use the jrunscript tool that ships with Java (typically found in $JAVA_HOME/bin).

$ jrunscript -cp <hbase-conf-dir>:target/hdocdb-1.0.0.jar -f target/classes/shell/hdocdb.js -f - 

Here is a sample run.

nashorn> db.mycoll.insert( { _id: "jdoe", first_name: "John", last_name: "Doe" } )

nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]

nashorn> print(doc)
{"_id":"jdoe","first_name":"John","last_name":"Doe"}

nashorn> db.mycoll.update( { last_name: "Doe" }, { $set: { first_name: "Jim" } } )

nashorn> var doc = db.mycoll.find( { last_name: "Doe" } )[0]

nashorn> print(doc)
{"_id":"jdoe","first_name":"Jim","last_name":"Doe"}

nashorn> db.mycoll.delete( "jdoe" )

To run a script:

$ jrunscript -cp <hbase-conf-dir>:target/hdocdb-1.0.0.jar -f target/classes/shell/hdocdb.js -f <script>

Implementation Notes

Each document is stored as a separate row in HBase. This allows multiple operations on a document to be performed together atomically. The document is essentially "shredded" using a technique called key-flattening, as described in the Argo paper. That technique was developed for use with a relational database, but in HDocDB it has been adapted for HBase.
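
The general idea of key-flattening can be sketched in a few lines: each scalar leaf of a nested document becomes one entry whose key is the full path to that leaf. This is a minimal illustration under assumed conventions (dots for nested fields, bracketed positions for array elements); KeyFlatteningSketch is a hypothetical name, and HDocDB's actual encoding of paths and values in HBase is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyFlatteningSketch {

    // Flattens nested maps and lists into path -> scalar pairs.
    public static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    private static void flattenInto(String prefix, Object value, Map<String, Object> out) {
        if (value instanceof Map) {
            // Nested document: extend the path with each field name.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String path = prefix.isEmpty() ? e.getKey().toString() : prefix + "." + e.getKey();
                flattenInto(path, e.getValue(), out);
            }
        } else if (value instanceof List) {
            // Array: extend the path with each element's position.
            List<?> list = (List<?>) value;
            for (int i = 0; i < list.size(); i++) {
                flattenInto(prefix + "[" + i + "]", list.get(i), out);
            }
        } else {
            // Scalar leaf: one path, one value.
            out.put(prefix, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", "John");
        name.put("last", "Doe");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "jdoe");
        doc.put("name", name);
        doc.put("tags", Arrays.asList("a", "b"));

        flatten(doc).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Since all flattened entries of one document would live in the same row, a change touching several fields can be applied as a single row-level operation, which is what makes the atomicity described above possible.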

The implementation of global secondary indexes is based on blogs by Hofhansl and Yates.
