sirixdb / brackit

Licence: other
Query processor with proven optimizations, ready to use for your document store to query semi-structured data with a JSONiq-like extension of XQuery. Can also be used as an ad-hoc in-memory query processor.

Programming Languages

java
XQuery

Brackit - a retargetable JSONiq query engine

Brackit is a flexible JSONiq and XQuery query processor developed during Dr. Sebastian Bächle's time as a PhD student at the TU Kaiserslautern, in the context of his research in the field of query processing for semi-structured data. The system features a fast runtime and a flexible compiler backend, which can, for example, rewrite queries for optimized join processing and efficient aggregation operations. It can be used either as an in-memory ad-hoc query engine or as the query engine of a data store. The data store itself can add sophisticated optimizations in different stages of the query processor. Thus, Brackit already bundles common optimizations, and a data store can add further ones, for instance for index matching.

Lately, Johannes Lichtenberger has added many optional temporal enhancements for temporal data stores such as SirixDB. Furthermore, JSON is now a first-class citizen. Brackit supports a slightly different syntax but the same data model as JSONiq and all update primitives described in the JSONiq specification. Brackit also supports Python-like array slices.

Main features

  • Retargetable: optimizations that are common to different data stores are shared, and physical optimizations and index rewrite rules can simply be added in further stages.
  • JSONiq, a language that specifically targets querying JSON, with support for user-defined functions, easy tree traversals, and FLWOR expressions to iterate, filter, sort and project item sequences.
  • Set-oriented processing, meaning pipelined execution of FLWOR clauses through operators that work on arrays of tuples and thus support known optimizations from relational database querying for implicit joins and aggregates.
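
To make the "known optimizations from relational database querying" more concrete, the following is a minimal Java sketch of a hash join, the kind of set-oriented operator the list above refers to. It is not Brackit's implementation: the class, the records and the sample data are hypothetical and only illustrate the build and probe phases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of Brackit's API.
final class HashJoinSketch {
    record Customer(int id, String name) {}
    record Order(int customerId, String item) {}

    // Joins orders to customers on the customer id and returns "name:item" strings.
    static List<String> join(List<Customer> customers, List<Order> orders) {
        // Build phase: index the customers by join key once.
        Map<Integer, List<Customer>> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.computeIfAbsent(c.id(), k -> new ArrayList<>()).add(c);
        }
        // Probe phase: look up each order's key instead of scanning all customers.
        List<String> result = new ArrayList<>();
        for (Order o : orders) {
            for (Customer c : byId.getOrDefault(o.customerId(), List.of())) {
                result.add(c.name() + ":" + o.item());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(new Customer(1, "Ann"), new Customer(2, "Bob"));
        List<Order> orders = List.of(new Order(2, "pen"), new Order(1, "ink"), new Order(3, "cap"));
        System.out.println(join(customers, orders)); // prints [Bob:pen, Ann:ink]
    }
}
```

Compared with a nested loop over both inputs, the join key is hashed once per customer and looked up once per order, which is why a compiler that can detect an implicit join in a FLWOR expression benefits from rewriting it this way.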

We're currently working on a Jupyter Notebook / Tutorial.

Here's a more detailed document about the vision and overall mission of Brackit.

Syntax differences in relation to JSONiq

  • array indexes start at position 0
  • object projections via a special syntax ($object{field1,field2,field3} instead of a function)
  • Python-like array slices

Community

We have a Discord server, where we'd welcome everyone who's interested in the project.

Publications

As the project started at a university (TU Kaiserslautern, under the supervision of Dr. Dr. Theo Härder), we'd be happy if it were used in research projects again, as there's a wide field of topics for future research and improvements.

Getting started

If you simply want to use Brackit as a standalone query processor, use the JAR provided with the release.

Otherwise, to contribute, download the ZIP or clone the repository:

git clone https://github.com/sirixdb/brackit.git

Alternatively, use the following dependencies in your Maven or Gradle project, for instance if you want to run queries from your Java or Kotlin code, or if you want to implement some interfaces and add custom rewrite rules to query your own data store.

Brackit uses Java 17, thus you need an up-to-date Gradle (if you want to work on Brackit) and an IDE (for instance IntelliJ or Eclipse).

Maven / Gradle

At this stage of development, you should use the latest SNAPSHOT artifacts from the OSS snapshot repository to get the most recent changes. You should use the most recent Maven/Gradle versions as we'll update to the newest Java versions.

Just add the following repository section to your POM or build.gradle file:

<repository>
  <id>sonatype-nexus-snapshots</id>
  <name>Sonatype Nexus Snapshots</name>
  <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>

repositories {
    maven {
        url "https://oss.sonatype.org/content/repositories/snapshots/"
        mavenContent {
            snapshotsOnly()
        }
    }
}

<dependency>
  <groupId>io.sirix</groupId>
  <artifactId>brackit</artifactId>
  <version>0.1.10-SNAPSHOT</version>
</dependency>

implementation group: 'io.sirix', name: 'brackit', version: '0.3-SNAPSHOT'

What's Brackit?

Brackit is a query engine that can be used by different storage/database backends, which share common optimizations such as set-oriented processing and hash joins of FLWOR clauses. In-memory stores for processing both XML and JSON are also supported, so Brackit can simply be used as an in-memory query processor for ad-hoc analysis.

At the moment we support XQuery 1.0 including library module support, the XQuery Update Facility 1.0 and some features of XQuery 3.0 like the FLWOR clauses group by and count.

As a speciality, Brackit comes with extensions to work natively with JSON-style arrays and objects mostly as in JSONiq, also supporting all the update statements of JSONiq. Furthermore array index slices as in Python are supported. Another extension allows you to use a special statement syntax for writing query programs in a script-like style.

Jupyter Notebook / Tutorial

We're currently working on a tutorial, where you can execute interactive queries on Brackit's in-memory store.

Installation

Compiling from source

To build and package, change into the root directory of the project and run Maven:

mvn package

To skip the unit tests, run instead:

mvn -DskipTests package

That's all. You'll find the ready-to-use JAR file(s) in the subdirectory ./target.

Dependency

If you want to use Brackit in your other Maven- or Gradle-based projects, please have a look at the "Maven / Gradle" section.

First Steps

Running from the command line

Brackit ships with a rudimentary command line interface to run ad-hoc queries. Invoke it with

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar

where x.y.z is the version number of brackit.

Simple queries

The simplest way to run a query is by passing it via STDIN:

echo "1+1" | java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar

=> 2

If the query is stored in a separate file, let's say test.xq, type:

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -qf test.xq

or use the file redirection of your shell:

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar < test.xq

You can also use an interactive shell and enter a bunch of queries terminated with an "END" on the last line:

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -iq

Querying documents

Querying documents is as simple as running any other query.

The default "storage" module resolves any referred documents accessed by the XQuery functions fn:doc() and fn:collection() at query runtime (XML).

To query a document in your local filesystem, simply use the path to this document in the fn:doc() function:

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -q "doc('products.xml')//product[@prodno = '4711']"

For JSON there's the function json-doc(). Let's assume we have the following simple JSON structure:

{
  "products": [
    { "productno": 4711, "product": "Product number 4711" },
    { "productno": 5982, "product": "Product number 5982" }
  ]
}

We can query this by first dereferencing the "products" object field with ".", then unboxing the array value via "[]", and finally adding a filter in which "$$" denotes the current context item. The trailing "{fieldName}" projects the resulting object into a new object, which is returned.

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -q "json-doc('products.json').products[][$$.productno eq 4711]{product}"

Query result
{"product":"Product number 4711"}

Of course, you can also directly query documents via http(s), or ftp. For example:

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -q "count(doc('http://example.org/foo.xml')//bar)"

or

java -jar brackit-x.y.z-SNAPSHOT-with-dependencies.jar -q "count(json-doc('http://example.org/foo.json').bar[])"

Coding with Brackit

Running a query embedded in a Java program requires only a few lines of code:

String query = """
    for $i in (1 to 4)
    let $d := <no>{$i}</no>
    return $d
    """;

// initialize a query context
QueryContext ctx = new QueryContext();

// compile the query
XQuery xq = new XQuery(query);

// enable formatted output
xq.setPrettyPrint(true);

// run the query and write the result to System.out
xq.serialize(ctx, System.out);

JSON

Brackit features a seamless integration of JSON-like objects and arrays directly at the language level.

You can easily mix arbitrary XML and JSON data in a single query or simply use brackit to convert data from one format into the other. This allows you to get the most out of your data.

The language extension allows you to construct and operate JSON data directly; additional utility functions help you to perform typical tasks.

Everything is designed to simplify joint processing of XDM and JSON and to maximize the freedom of developers. Thus, our extension effectively supports a kind of superset of XDM and JSON. That means it is possible to create arrays and objects which do not strictly conform to the JSON RFC. It's up to you to decide how you want your data to look!

Arrays

Arrays can be created using an extended version of the standard JSON array syntax:

(: statically create an array with 3 elements of different types: 1, 2.0, "3" :)
[ 1, 2.0, "3" ]

(: for compliance with the JSON syntax we have to use functions to create the values 'true', 'false', and 'null'. They are translated into the XDM values xs:boolean('true'), xs:boolean('false') and an atomic null value.
:)
[ true(), false(), jn:null() ]

(: as that's cumbersome per default Brackit will parse the tokens 'true', 'false' to the XDM boolean values and 'null' to the new type js:null. :)
[ true, false, null ]

(: is different to :)
[ (./true), (./false), (./null) ]
(: where each field is initialized as the result of a path expression
   starting from the current context item, e.g., './true'.
:)

(: dynamically create an array by evaluating some expressions: :)
[ 1+1, substring("banana", 3, 5), () ] (: yields the array [ 2, "nana", () ] :)

(: arrays can be nested and fields can be arbitrary sequences :)
[ (1 to 5) ] (: yields an array of length 1: [(1,2,3,4,5)] :)
[ <x>some text</x> ] (: yields an array of length 1 with an XML fragment as field value :)
[ 'x', [ 'y' ], 'z' ] (: yields an array of length 3: [ 'x' , ['y'], 'z' ] :)

(: a preceding '=' distributes the items of a sequence to individual array positions :)
[ =(1 to 5) ] (: yields an array of length 5: [ 1, 2, 3, 4, 5 ] :)

(: array fields can be accessed by the '[[ ]]' postfix operator: :)
let $a := [ "Jim", "John", "Joe" ] return $a[[1]] (: yields the string "John" :)

(: the function bit:len() returns the length of an array :)
bit:len([ 1, 2 ]) (: yields 2 :)

(: array slices are supported, as in Python :)
let $a := ["Jim", "John", "Joe" ] return $a[[0:2]] (: yields ["Jim", "John"] :)

(: array slices with a step operator :)
let $a := ["Jim", "John", "Joe" ] return $a[[0:2:-1]] (: yields ["John", "Jim"] :)

let $a := [{"foo": 0}, "bar", {"baz":true}] return $a[[::2]] (: yields [{"foo":0},{"baz":true}] :)

(: array unboxing :)
let $a := ["Jim", "John", "Joe"] return $a[] (: yields the sequence "Jim" "John" "Joe" :)

(: the unboxing is made implicitly in for-loops :)
let $a := ["Jim", "John", "Joe"]
for $value in $a
return $value (: yields the same as above :)

(: negative array index :)
let $a := ["Jim", "John", "Joe"] return $a[[-1]] (: yields "Joe" :)
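
The index and slice rules shown above (0-based positions, negative indexes counted from the end, and start:end:step slices) can be sketched in plain Java. This is only an illustration of the semantics for positive steps, not Brackit code. The class and method names are hypothetical, and clamping out-of-range bounds follows the Python convention, which is an assumption about Brackit's behaviour. Negative steps are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helper, not part of Brackit's API.
final class SliceSketch {
    // Element access with 0-based indexes; a negative index counts from the end.
    static <T> T at(List<T> array, int index) {
        return array.get(index < 0 ? array.size() + index : index);
    }

    // Slice [start:end:step]; null means the bound is omitted. Positive steps only.
    static <T> List<T> slice(List<T> array, Integer start, Integer end, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("this sketch only handles positive steps");
        }
        int size = array.size();
        int from = normalize(start == null ? 0 : start, size);
        int to = normalize(end == null ? size : end, size);
        List<T> result = new ArrayList<>();
        for (int i = from; i < to; i += step) {
            result.add(array.get(i));
        }
        return result;
    }

    // Resolves a negative bound relative to the end and clamps it to [0, size]
    // (assumption: same convention as Python).
    private static int normalize(int bound, int size) {
        int resolved = bound < 0 ? size + bound : bound;
        return Math.max(0, Math.min(size, resolved));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Jim", "John", "Joe");
        System.out.println(at(names, 1));                // John
        System.out.println(at(names, -1));               // Joe
        System.out.println(slice(names, 0, 2, 1));       // [Jim, John]
        System.out.println(slice(names, null, null, 2)); // [Jim, Joe]
    }
}
```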

Objects

Objects provide an alternative to XML to represent structured data. Like with arrays we support an extended version of the standard JSON object syntax:

(: statically create a record with three fields named 'a', 'b' and 'c' :)
{ "a": 1, "b" : 2, "c" : 3 }

(: 'null' is a new atomic type and jn:null() creates this type, true and false are translated into the XDM values xs:boolean('true'), xs:boolean('false').
:)
{ "a": true(), "b" : false(), "c" : jn:null()}

or simply

{ "a": true, "b": false, "c": null}

(: field values may be arbitrary expressions:)
{ "a" : concat('f', 'oo') , "b" : 1+1, "c" : [1,2,3] } (: yields {"a":"foo","b":2,"c":[1,2,3]} :)

(: field values are defined by key-value pairs or by an expression
   that evaluates to an object
:)
let $r := { "x":1, "y":2 } return { $r, "z":3} (: yields {"x":1,"y":2,"z":3} :)

(: fields may be selectively projected into a new object :)
{"x": 1, "y": 2, "z": 3}{z,y} (: yields {"z":3,"y":2} :)

(: values of object fields can be accessed using the deref operator '.' :)
{ "a": "hello", "b": "world" }.b (: yields the string "world" :)

(: the deref operator can be used to navigate into deeply nested object structures :)
let $n := <x><y>yval</y></x> let $r := {"e" : {"m":'mvalue', "n":$n}} return $r.e.n/y (: yields the XML fragment <y>yval</y> :)

(: the deref operator can be used to navigate into deeply nested object structures in combination with the array unboxing operator for instance :)
(: note, that here the expression "[]" is unboxing the array and a sequence of items is evaluated for the next deref operator :)
(: the deref operator thus gets either a sequence or an object as the left operand :)
let $r := {"e": {"m": [{"n":"o"}, true, null, {"n": "bar"}] }, "n":"m"} return $r.e.m[].n (: yields "o" "bar" :)

(: to only retrieve the first item/value in the array you can use an index :)
let $r := {"e": {"m": [{"n":"o"}, true, null, {"n": "bar"}] }, "n":"m"} return $r.e.m[[0]].n (: yields "o" :)

(: the function bit:fields() returns the field names of an object :)
let $r := {"x": 1, "y": 2, "z": 3} return bit:fields($r) (: yields the xs:QName array [x,y,z ] :)

(: the function bit:values() returns the field values of an object :)
let $r := {"x": 1, "y": 2, "z": (3, 4) } return bit:values($r) (: yields the array [1,2,(3,4)] :)
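
As a rough mental model of the projection and deref operators shown above, the following Java sketch mimics them on ordinary maps and lists. It is not how Brackit implements them: the class and method names are hypothetical, and skipping absent fields during projection is an assumption. The deref over a sequence reproduces the behaviour from the example, where items that are not objects (true, null) contribute nothing to the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical helper, not part of Brackit's API.
final class ObjectOpsSketch {
    // Projection, as in $object{z,y}: copy the named fields in the requested order.
    static Map<String, Object> project(Map<String, Object> object, String... fields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : fields) {
            if (object.containsKey(field)) { // assumption: absent fields are skipped
                result.put(field, object.get(field));
            }
        }
        return result;
    }

    // Deref over a sequence, as in $r.e.m[].n: items that are not objects,
    // or that lack the field, contribute nothing to the result.
    static List<Object> deref(List<?> sequence, String field) {
        List<Object> result = new ArrayList<>();
        for (Object item : sequence) {
            if (item instanceof Map<?, ?> object && object.containsKey(field)) {
                result.add(object.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> xyz = new LinkedHashMap<>();
        xyz.put("x", 1);
        xyz.put("y", 2);
        xyz.put("z", 3);
        System.out.println(project(xyz, "z", "y")); // {z=3, y=2}

        // Arrays.asList is used because the sample array contains null.
        List<Object> m = Arrays.asList(Map.of("n", "o"), true, null, Map.of("n", "bar"));
        System.out.println(deref(m, "n")); // [o, bar]
    }
}
```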

JSONiq update expressions

Brackit supports all defined update statements in the JSONiq specification. It makes sense to implement these in a data store backend as for instance in SirixDB.

(: rename a field in an object :)
let $object := {"foo": 0}
return rename json $object.foo as "bar"  (: renames the field foo of the object to bar :)

(: append values into an array :)
append json (1, 2, 3) into ["foo", true, false, null]  (: appends the sequence (1,2,3) into the array (["foo",true,false,null,[1,2,3]]) :)

(: insert at a specific position :)
insert json (1, 2, 3) into ["foo", true, false, null] at position 2  (: inserts the sequence (1,2,3) at index 2, that is, as the third item, since indexes start at 0 (["foo",true,[1,2,3],false,null]) :)

(: insert a json object and merge the field/values into an existing object :)
insert json {"foo": not(true), "baz": null} into {"bar": false}   (: inserts/appends the two field/value pairs into the object ({"bar":false,"foo":false,"baz":null}) :)

(: delete a field/value from an object :)
delete json {"foo": not(true), "baz": null}.foo    (: removes the field "foo" from the object :)

(: delete an array item at position 1 in the array :)
delete json ["foo", 0, 1][[1]]  (: removes the 0 (["foo",1]) :)

(: replace a JSON value of a field with another value :)
replace json value of {"foo": not(true), "baz": null}.foo with 1     (: thus, the object is adapted to {"foo":1,"baz":null} :)

(: replace the array item at index 2, that is, the third item :)
replace json value of ["foo", 0, 1][[2]] with "bar"   (: thus, the array is adapted to ["foo",0,"bar"] :)
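
To double-check the array results stated in the comments above, here is a small Java sketch that applies the same four operations to plain lists. It only models the resulting shapes and returns modified copies; in a real deployment the updates are applied by the data store backend. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical helper, not part of Brackit's API.
final class ArrayUpdateSketch {
    // append json $value into $array: the value becomes one new last item.
    static List<Object> append(List<Object> array, Object value) {
        List<Object> copy = new ArrayList<>(array);
        copy.add(value);
        return copy;
    }

    // insert json $value into $array at position $position (0-based).
    static List<Object> insertAt(List<Object> array, int position, Object value) {
        List<Object> copy = new ArrayList<>(array);
        copy.add(position, value);
        return copy;
    }

    // delete json $array[[$position]]
    static List<Object> deleteAt(List<Object> array, int position) {
        List<Object> copy = new ArrayList<>(array);
        copy.remove(position); // int argument: removes by index, not by value
        return copy;
    }

    // replace json value of $array[[$position]] with $value
    static List<Object> replaceAt(List<Object> array, int position, Object value) {
        List<Object> copy = new ArrayList<>(array);
        copy.set(position, value);
        return copy;
    }

    public static void main(String[] args) {
        List<Object> array = Arrays.asList("foo", true, false, null);
        System.out.println(append(array, List.of(1, 2, 3)));      // [foo, true, false, null, [1, 2, 3]]
        System.out.println(insertAt(array, 2, List.of(1, 2, 3))); // [foo, true, [1, 2, 3], false, null]
        System.out.println(deleteAt(Arrays.asList("foo", 0, 1), 1));         // [foo, 1]
        System.out.println(replaceAt(Arrays.asList("foo", 0, 1), 2, "bar")); // [foo, 0, bar]
    }
}
```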

Parsing JSON

(: the utility function json:parse() can be used to parse JSON data dynamically
   from a given xs:string
:)
let $s := io:read('/data/sample.json') return json:parse($s)

Statement Syntax Extension (Beta)

IMPORTANT NOTE:

This extension is only a syntax extension to simplify a programmer's life when writing XQuery. It is neither a subset of nor an equivalent to the XQuery Scripting Extension 1.0.

Almost any non-trivial data processing task consists of a series of consecutive steps. Unfortunately, the functional style of XQuery makes it a bit cumbersome to write code in a convenient, script-like fashion. Instead, the standard way to express a linear multi-step process (with access to intermediate results) is to write a FLWOR expression with a series of let-clauses.

As a shorthand, Brackit allows you to write such processes as a sequence of ';'-terminated statements, which most developers are familiar with:

(: declare external input :)
declare variable $file external;

(: read input data :)
$events := fn:collection('events');

(: join the two inputs :)
$incidents := for $e in $events
              where $e/@severity = 'critical'
              let $ip := x/system/@ip
              group by $ip
              order by count($e)
              return {$ip} count($e) ;

(: store report to file :)
$report := {$incidents};
$output := bit:serialize($report);
io:write($file, $output);

(: return a short message as result :)
Generated '{count($incidents)}' incident entries to report '{$file}'

Internally, the compiler treats this as a FLWOR expression with let-bindings. The result, i.e., the return expression, is the result of the last statement. Accordingly, the previous example is equivalent to:

(: declare external input :)
declare variable $file external;

(: read input data :)
let $events := fn:collection('events')

(: join the two inputs :)
let $incidents := for $e in $events
                  where $e/@severity = 'critical'
                  let $ip := x/system/@ip
                  group by $ip
                  order by count($e)
                  return {$ip} count($e)

(: store report to file :)
let $report := {$incidents}
let $output := bit:serialize($report)
let $written := io:write($file, $output)

(: return a short message as result :)
return Generated '{count($incidents)}' incident entries to report '{$file}'

The statement syntax is especially helpful to improve readability of user-defined functions.

The following example shows an - admittedly rather slow - implementation of the quicksort algorithm:

declare function local:qsort($values) {
    $len := count($values);
    if ($len <= 1) then (
        $values
    ) else (
        $pivot := $values[$len idiv 2];
        $less := $values[. < $pivot];
        $equal := $values[. = $pivot];
        $greater := $values[. > $pivot];
        (local:qsort($less), $equal, local:qsort($greater))
    )
};

local:qsort((7,8,4,5,6,9,3,2,0,1))
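
For readers more at home in Java, the same filter-based quicksort can be sketched as follows. It mirrors the structure of the function above (pivot taken from the middle, partitions built by filtering) and is equally unoptimized. Values equal to the pivot are collected in their own list so that duplicates are preserved. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a Java rendering of the filter-based quicksort.
final class QuickSortSketch {
    static List<Integer> qsort(List<Integer> values) {
        int len = values.size();
        if (len <= 1) {
            return values;
        }
        // XQuery positions are 1-based, so $values[$len idiv 2] is index len / 2 - 1 here.
        int pivot = values.get(len / 2 - 1);
        List<Integer> less = new ArrayList<>();
        List<Integer> equal = new ArrayList<>();
        List<Integer> greater = new ArrayList<>();
        for (int v : values) {
            if (v < pivot) {
                less.add(v);
            } else if (v > pivot) {
                greater.add(v);
            } else {
                equal.add(v);
            }
        }
        List<Integer> sorted = new ArrayList<>(qsort(less));
        sorted.addAll(equal);
        sorted.addAll(qsort(greater));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(qsort(List.of(7, 8, 4, 5, 6, 9, 3, 2, 0, 1))); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```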