ExpediaGroup / jasvorno

License: Apache-2.0
A library for strong, schema-based conversion between 'natural' JSON documents and Avro

JASVORNO

Start using

You can obtain Jasvorno from Maven Central.

Overview

A library for serializing/deserializing arbitrary JSON to the Avro format using a Schema. Although Avro already has some inbuilt JSON support, it has a number of limitations that cannot be addressed in a backwards-compatible way. Jasvorno offers supplementary classes that allow any JSON document structure to be used with Avro, with robust checks for conformity to a schema. See the 'Avro limitations' section for more information.

Usage

JSON to Avro

JsonNode datum = new ObjectMapper().readTree(jsonString);
Object avro = JasvornoConverter.convertToAvro(GenericData.get(), datum, schema);

Avro to JSON

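// Write the Avro datum (assumed here to be a GenericData.Record) as natural JSON into a buffer.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder encoder = new JasvornoEncoder(schema, outputStream);
new GenericDatumWriter<Record>(schema).write(avro, encoder);
encoder.flush();
// Decode with an explicit charset (java.nio.charset.StandardCharsets), not the platform default.
String json = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);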

Comparison

Given the following schema, compare compliant JSON document structures used by the standard Avro JSON encoding and Jasvorno's encoding. Note that the Jasvorno documents are not polluted with union type indexes.

{"type": "record", "name": "myrecord",
 "fields" : [
   {"name": "id",  "type": "long"},
   {"name": "val", "type": ["string", "long", "null"]}
 ]
}

Avro encoding examples

{"id": 1, "val": {"string": "hello"}}
{"id": 1, "val": {"long": 2}}
{"id": 1, "val": null}

Jasvorno encoding examples

{"id": 1, "val": "hello"}
{"id": 1, "val": 2}
{"id": 1, "val": null}
{"id": 1}
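
To make the difference concrete, here is a minimal, self-contained sketch that renders a single union value in both styles. It uses neither Avro nor Jasvorno: the class UnionEncodingSketch and its methods are hypothetical names invented for this illustration, and it only handles the string, long, and null branches of the example schema.

```java
/** Illustrative only: renders one union value in the two JSON styles compared above. */
class UnionEncodingSketch {

    /** Renders a scalar as a JSON literal; only the types in the example schema are handled. */
    static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return "\"" + value + "\""; // no escaping: this is a sketch, not a JSON writer
        }
        if (value instanceof Long) {
            return value.toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    /** Avro-style: a non-null union value is wrapped in an object keyed by its branch type. */
    static String avroStyle(Object value) {
        if (value == null) {
            return "null";
        }
        String branch = (value instanceof String) ? "string" : "long";
        return "{\"" + branch + "\": " + literal(value) + "}";
    }

    /** Natural style: the value is written directly, with no branch wrapper. */
    static String naturalStyle(Object value) {
        return literal(value);
    }

    public static void main(String[] args) {
        for (Object v : new Object[] {"hello", 2L, null}) {
            System.out.println("{\"id\": 1, \"val\": " + avroStyle(v) + "}");
            System.out.println("{\"id\": 1, \"val\": " + naturalStyle(v) + "}");
        }
    }
}
```

The wrapper object in the Avro style is what tells a decoder which union branch to use; the natural style drops it, so the branch has to be inferred from the value itself.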

Avro limitations

Avro already has inbuilt JSON coders, JsonDecoder and JsonEncoder, but these require Avro-specific document structures, primarily for handling union types. Avro also offers support for free-form JSON document structures with the org.apache.avro.data.Json type, but this only checks for general JSON compliance (i.e. that it is a valid JSON document) rather than the structure of the node tree that the document declares. These limitations prevent the direct use of arbitrary JSON document structures in a strict, schema-enforced manner.

This is a problem because it either requires users to transform their JSON into Avro-compatible forms, or allows Avro-specific implementation details to leak into users' JSON models. Jasvorno solves this.

Implementation details

Jasvorno does not currently follow Avro's encoder/decoder symmetry. Instead we use the JasvornoConverter in place of a Decoder implementation. This is because, in the absence of the union indexes present in Avro's own JSON document structures, we must preemptively explore the document tree to determine the best Schema match. It is therefore easier to read the entire JSON document up front and then check it for Schema compliance, converting to Avro along the way. A potential problem is that this approach might be expensive for schemas containing many deep union types. Additionally, there are type constructs that are impossible to disambiguate by referencing JSON node values alone; concretely, any union containing both bytes and string. In this event we favour the string type, but we also provide the com.hotels.jasvorno.schema.SchemaValidator should you wish to defensively detect these ambiguous constructs in your schemas.
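
The idea of inferring a union branch from the JSON value alone can be sketched in plain Java. This is not Jasvorno's implementation: UnionResolutionSketch, its Type enum, and the first-match strategy are assumptions made for illustration, and it covers scalar branches only. It shows why a union of bytes and string is ambiguous (both arrive as JSON strings) and how favouring string resolves it.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: picks a union branch for a parsed JSON scalar without a type index. */
class UnionResolutionSketch {

    /** A tiny stand-in for Avro type names; this is not Avro's Schema.Type. */
    enum Type { NULL, LONG, STRING, BYTES }

    /** True if a parsed JSON scalar could be read as the given type. */
    static boolean matches(Type type, Object jsonValue) {
        switch (type) {
            case NULL:
                return jsonValue == null;
            case LONG:
                return jsonValue instanceof Long || jsonValue instanceof Integer;
            case STRING:
            case BYTES:
                return jsonValue instanceof String; // both are JSON strings on the wire
            default:
                return false;
        }
    }

    /** Returns the first matching branch, favouring STRING over BYTES for string values. */
    static Type resolve(List<Type> union, Object jsonValue) {
        if (jsonValue instanceof String && union.contains(Type.STRING)) {
            return Type.STRING;
        }
        for (Type candidate : union) {
            if (matches(candidate, jsonValue)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException("No union branch matches value: " + jsonValue);
    }

    /** A union is ambiguous in this sense when it declares both STRING and BYTES. */
    static boolean isAmbiguous(List<Type> union) {
        return union.contains(Type.STRING) && union.contains(Type.BYTES);
    }

    public static void main(String[] args) {
        List<Type> union = Arrays.asList(Type.BYTES, Type.STRING, Type.NULL);
        System.out.println(resolve(union, "abc"));
        System.out.println(isAmbiguous(union));
    }
}
```

For nested records and unions the same matching would have to recurse through every candidate branch, which is where the cost mentioned above comes from.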

Schema compatibility

Although Jasvorno does not directly deal with schema evolution and compatibility, these concepts are common in systems that use Avro. This is of particular concern when encountering fields that are present in a JSON document but not declared in the schema. Under some schema compatibility modes such fields are erroneous, yet under others they are expected. To model these different situations appropriately, Jasvorno allows you to specify an UndeclaredFieldBehaviour when constructing a JasvornoConverter.
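
The two situations can be illustrated with a small stand-alone sketch. It does not use Jasvorno: the Behaviour enum and its IGNORE and FAIL constants are hypothetical stand-ins, and the actual UndeclaredFieldBehaviour constants and how they are passed to JasvornoConverter may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only: two ways to treat document fields that the schema does not declare. */
class UndeclaredFieldSketch {

    /** Hypothetical modes; the real UndeclaredFieldBehaviour constants may differ. */
    enum Behaviour { IGNORE, FAIL }

    /** Returns the declared fields of the document, applying the chosen behaviour to the rest. */
    static Map<String, Object> retainDeclared(
            Map<String, Object> document, Set<String> declaredFields, Behaviour behaviour) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            if (declaredFields.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            } else if (behaviour == Behaviour.FAIL) {
                throw new IllegalArgumentException("Undeclared field: " + entry.getKey());
            }
            // Behaviour.IGNORE: the undeclared field is silently dropped.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("id", 1L);
        document.put("extra", "not in schema");
        Set<String> declared = new HashSet<>(Arrays.asList("id", "val"));
        System.out.println(retainDeclared(document, declared, Behaviour.IGNORE));
    }
}
```

A strict mode suits pipelines where an unknown field signals a producer bug; a lenient mode suits consumers reading data written with a newer schema.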

Prior art

Jasvorno is based on the JsonUtil class from the Kite project.

Author

Created by Elliot West, with thanks to Adrian Woodhead, Dave Maughan, and James Grant.

Legal

This project is available under the Apache 2.0 License.

Copyright 2016-2019 Expedia, Inc.
