
YANG-DB / yang-db

Licence: Apache-2.0 license
YANGDB Open-source, Scalable, Non-native Graph database (Powered by Elasticsearch)



From Graph to open-search - A tale of a Yang-DB...


Introduction

A post introducing our new open-source initiative for building a scalable, distributed graph DB over Elasticsearch: https://www.linkedin.com/pulse/making-db-lior-perry/

Another use of Elasticsearch as a graph DB: https://medium.com/@imriqwe/elasticsearch-as-a-graph-database-bc0eee7f7622

The world of graph databases has had a tremendous impact during the last few years, particularly in relation to social networks and their effect on our everyday activity.

The once mighty (and lonely) RDBMS is now obliged to make room for an emerging and increasingly important partner in the data center: the graph database.

Twitter’s using it, Facebook’s using it, even online dating sites are using it; they are all using relationship graphs. After all, social is social, and ultimately, it’s all about relationships.

There are two main elements that distinguish graph technology: storage and processing.

Graph DB - Storage

Graph storage commonly refers to the structure of the database that contains graph data.

Such graph storage is optimized for graphs in many aspects, ensuring that data is stored efficiently, keeping nodes and relationships close to each other in the actual physical layer.

Graph storage is classified as non-native when the storage comes from an outside source, such as a relational, columnar or any other type of database (in most cases a NoSQL store is preferable).

Non-native graph databases usually consist of existing relational, document and key-value stores, adapted to the graph data model and its query scenarios.

Graph DB - Processing

Graph Processing includes accessing the graph, traversing the vertices & edges and collecting the results.

A traversal is how you query a graph, navigating from starting nodes to related nodes, following relationships according to some rules.

For example, a traversal can find answers to questions like "what music do my friends like that I don’t yet own?"
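As a rough illustration of such a traversal, here is a minimal in-memory sketch in Python. The toy graph, the relationship names (`FRIEND`, `LIKES`, `OWNS`) and the function names are all hypothetical and only serve the example; this is not how YANG-DB executes traversals.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, relationship type, target) triples.
EDGES = [
    ("me", "FRIEND", "ann"),
    ("me", "FRIEND", "bob"),
    ("me", "OWNS", "album_a"),
    ("ann", "LIKES", "album_a"),
    ("ann", "LIKES", "album_b"),
    ("bob", "LIKES", "album_c"),
]


def build_index(edges):
    """Index outgoing edges by (source, type) for quick neighbor lookup."""
    index = defaultdict(set)
    for source, rel_type, target in edges:
        index[(source, rel_type)].add(target)
    return index


def music_friends_like_that_i_dont_own(index, person):
    """Follow person -FRIEND-> friend -LIKES-> album, then drop owned albums."""
    owned = index[(person, "OWNS")]
    liked_by_friends = set()
    for friend in index[(person, "FRIEND")]:
        liked_by_friends |= index[(friend, "LIKES")]
    return sorted(liked_by_friends - owned)


index = build_index(EDGES)
print(music_friends_like_that_i_dont_own(index, "me"))  # ['album_b', 'album_c']
```

The traversal starts at one node, follows relationships by type, and collects the results, which mirrors the three parts of graph processing described above.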

Graph Models

One of the more popular models for representing a graph is the Property Model.

Property model

This model contains connected entities (the nodes) which can hold any number of attributes (key-value pairs).

Nodes

Nodes have a unique id and a list of attributes that represent their features and content.

Nodes can be marked with labels representing their different roles in your domain. In addition to relationship properties, labels can also serve as metadata over graph elements.

Nodes are often used to represent entities but, depending on the domain, relationships may be used for that purpose as well.

Relationships

A relationship is represented by the source and target nodes it connects and, in the case of multiple connections between the same vertices, by an additional label or property that distinguishes them (the type of relationship).

Relationships organize nodes into arbitrary structures, allowing a graph to resemble a list, a tree, a map, or a compound entity — any of which may be combined into yet more complex structures.

Very much like foreign keys between tables in the relational DB model, relationships in the graph model describe the relations between the vertices.

One major difference in this model (compared to the strict relational schema) is that this schema-less structure enables adding and removing relationships between vertices without any constraints.

An additional graph model is the Resource Description Framework (RDF) model.
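To make the property model more concrete, the following is a minimal sketch in Python. The class and method names (`Node`, `Relationship`, `PropertyGraph`, `relate`, `unrelate`) are assumptions made for this illustration and do not come from the YANG-DB code base.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str                                           # unique id
    labels: set = field(default_factory=set)          # roles in the domain
    properties: dict = field(default_factory=dict)    # key-value attributes


@dataclass
class Relationship:
    source: str                                       # id of the source node
    target: str                                       # id of the target node
    type: str                                         # distinguishes parallel edges
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node):
        self.nodes[node.id] = node

    def relate(self, source, target, rel_type, **properties):
        # Only the endpoints must exist; there is no schema constraint on
        # which types or properties a relationship may carry.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist in the graph")
        rel = Relationship(source, target, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def unrelate(self, source, target, rel_type):
        self.relationships = [
            r for r in self.relationships
            if (r.source, r.target, r.type) != (source, target, rel_type)
        ]


graph = PropertyGraph()
graph.add_node(Node("p1", {"Person"}, {"name": "Ann"}))
graph.add_node(Node("p2", {"Person"}, {"name": "Bob"}))
graph.relate("p1", "p2", "knows", since=2019)
graph.relate("p1", "p2", "follows")   # second edge between the same vertices
graph.unrelate("p1", "p2", "follows")
```

Two relationships between the same pair of vertices are told apart by their type, and either one can be added or removed without touching a schema.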

Why Elastic

Our use case is in the domain of social networks. It involves a very large social graph that must be frequently updated and available for both:

  • simple (mostly textual) search

  • graph based queries.

All reads and writes are performed concurrently, with reasonable response times and ever-growing throughput.

The first requirement was fulfilled using Elasticsearch, a well-known and established NoSQL document search and storage engine capable of containing very large volumes of data.

For the second requirement we decided that our best solution would be to use Elasticsearch as the non-native graph-DB storage layer.

As mentioned before, a graph-DB storage layer can be implemented using non-native storage such as a NoSQL store.

In a future discussion I’ll get into the details of why the most popular community alternative for a graph DB, Neo4j, could not fit our needs.

Modeling data as graph

The first issue on our plate is to design the data model representing the graph, as a set of vertices and edges.

With Elasticsearch we can use its powerful search abilities to efficiently fetch node and relationship documents according to the query filters.

Elastic Index

In Elasticsearch each index can be described as a table for a specific schema. The index itself is partitioned into shards, which allow scaling and redundancy (with replicas) across the cluster.

A document is routed to a particular shard in an index using the following formula:

shard_num = hash(_routing) % num_primary_shards
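The formula can be tried out with a small Python sketch. The function name `shard_for` and the use of `zlib.crc32` are assumptions for illustration only: CRC32 is a deterministic stand-in, not the hash function Elasticsearch uses internally, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Apply shard_num = hash(_routing) % num_primary_shards with a stand-in hash."""
    # zlib.crc32 is an assumption used for a deterministic demo; it is not
    # the hash function Elasticsearch uses internally.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = ["person-1", "person-2", "person-3", "person-4"]
for doc_id in doc_ids:
    print(doc_id, "-> shard", shard_for(doc_id, 5))
```

The same routing value always maps to the same shard, and because the result depends on `num_primary_shards`, changing the shard count would send existing documents to different shards.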

Each index has a schema (called a type in Elasticsearch) which defines the structure of its documents (called a mapping in Elasticsearch). Since Elasticsearch 6, each index can hold only a single mapping type.

The vertices index will contain the vertex documents with their properties, and the edges index will contain the edge documents with their properties.
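As a sketch of what this could look like, the Python snippet below shows one vertex document, one edge document, and a search body that fetches the edges leaving a vertex. The index names (`vertices`, `edges`) and every field name are hypothetical; the actual YANG-DB schema may differ. The search body uses the standard Elasticsearch `bool` / `filter` / `term` query structure and is only built here, not sent to a cluster.

```python
# Hypothetical index and field names, chosen only for illustration.
VERTEX_INDEX = "vertices"
EDGE_INDEX = "edges"

vertex_doc = {
    "id": "p1",
    "label": "Person",
    "firstName": "Ann",
}

edge_doc = {
    "id": "e1",
    "label": "knows",
    # Each edge stores both endpoints so it can be found from either side.
    "source": {"id": "p1", "label": "Person"},
    "target": {"id": "p2", "label": "Person"},
    "since": "2019-04-01",
}


def outgoing_edges_query(vertex_id, edge_label=None):
    """Build a search body that fetches the edges leaving one vertex."""
    # term filters assume the fields are mapped as exact-match (keyword) fields.
    filters = [{"term": {"source.id": vertex_id}}]
    if edge_label is not None:
        filters.append({"term": {"label": edge_label}})
    return {"query": {"bool": {"filter": filters}}}


body = outgoing_edges_query("p1", "knows")
print(EDGE_INDEX, body)
```

One traversal step then becomes a filtered search against the edges index, followed by a lookup of the target ids in the vertices index.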

Query Language

A query language is the way we describe how to traverse the graph (the data source).

There are a few graph-oriented query languages. Some are more pattern based and declarative, others are more imperative; they all describe the logical way of traversing the data.

Cypher

Let’s consider Cypher, a declarative, SQL-inspired language for describing patterns in graphs visually using an ASCII-art syntax.

It allows us to state what we want to select, insert, update or delete from our graph data without requiring us to describe exactly how to do it.


From logical to physical

Once a logical query is given, we need to translate it to the physical layer of the data storage, which is Elasticsearch.

Elasticsearch has a query DSL that is focused on search and aggregations, not on traversal, so we need an additional translation phase that takes into account the schematic structure of the graph (and the underlying indices).

Logical to physical query translation is a process that involves a few steps:

  • validating the query against the schema

  • translating the labels into real schema entities (indices)

  • creating the physical elastic query
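The three steps above can be sketched in Python as follows. The schema layout, the shape of the logical step (`label` plus `filters`) and all function names are assumptions made for this example, not the YANG-DB implementation; the resulting body follows the standard Elasticsearch `bool` / `filter` / `term` structure.

```python
# Hypothetical schema: graph label -> backing index and known properties.
SCHEMA = {
    "Person": {"index": "person", "properties": {"firstName", "lastName"}},
    "Album": {"index": "album", "properties": {"title"}},
}


def validate(step, schema):
    """Step 1: check the logical step against the schema."""
    label = step["label"]
    if label not in schema:
        raise ValueError(f"unknown label: {label}")
    unknown = set(step.get("filters", {})) - schema[label]["properties"]
    if unknown:
        raise ValueError(f"unknown properties for {label}: {sorted(unknown)}")


def resolve_index(step, schema):
    """Step 2: translate the label into a real schema entity (an index)."""
    return schema[step["label"]]["index"]


def build_query(step):
    """Step 3: create the physical search body."""
    filters = [
        {"term": {name: value}}
        for name, value in step.get("filters", {}).items()
    ]
    return {"query": {"bool": {"filter": filters}}}


def translate(step, schema=SCHEMA):
    validate(step, schema)
    return resolve_index(step, schema), build_query(step)


index, body = translate({"label": "Person", "filters": {"firstName": "Ann"}})
print(index, body)
```

Keeping validation, index resolution and query building as separate functions makes it easier to slot optimization stages in between them.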

This is a high-level view of the process. In practice, there will be more stages that optimize the logical query; in some cases it is possible to create multiple physical plans (execution plans) and rank them according to some efficiency (cost) strategy, such as the count of elements that need to be fetched.
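A cost-based ranking of this kind could look roughly like the Python sketch below. The `Plan` fields, the cost formula and the numbers are all invented for illustration; a real planner would derive its estimates from index statistics and would use a richer cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    start_count: int   # estimated elements matched by the first step
    fan_out: float     # estimated related elements per start element


def estimated_cost(plan):
    """Estimated number of elements fetched: start set plus one expansion."""
    return plan.start_count + plan.start_count * plan.fan_out


def cheapest(plans):
    return min(plans, key=estimated_cost)


# Two hypothetical plans for the same pattern, (Person)-[likes]->(Album):
plans = [
    Plan("start_from_person", start_count=10, fan_out=50),
    Plan("start_from_album", start_count=100_000, fan_out=20),
]
best = cheapest(plans)
print(best.name, estimated_cost(best))  # start_from_person 510
```

Both plans return the same result; starting from the more selective side simply fetches far fewer elements along the way.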

Conclusion

We started by discussing the purpose of graph DBs in today’s business use cases and reviewed different models for representing a graph. We covered the fundamental logical building blocks that a potential graph DB should consist of and discussed an existing NoSQL candidate to fulfill the storage layer requirements.

Once we selected Elasticsearch as the storage layer, we took the LDBC Social Network Benchmark graph model and simplified it to be optimized for that specific storage. We discussed the actual storage schema with the redundant properties and reviewed the Cypher language for querying the storage in an SQL-like graph pattern language.

We continued to see the actual transformation of the Cypher query into a physical execution query that will be run by Elasticsearch.

In the last section we took a simple graph query and drilled down into the details of the execution strategies and the bulking mechanism.

Start Using

Please review the following tutorial pages:

Installation tutorial:

Schema creation tutorial:

Data Loading tutorial:

Query the Graph tutorial:

Projection materialization & count tutorial:
