avrow

Avrow is a pure Rust implementation of the Avro specification with Serde support.



Table of Contents

  • Overview
  • Features
  • Getting started
  • Examples
  • Writer customization
  • Supported Codecs
  • Using avrow-cli tool
  • Rust native types to Avro value mapping (via Serde)
  • Todo
  • Changelog
  • Contributions
  • Support
  • MSRV
  • License

Overview

Avrow is a pure Rust implementation of the Avro specification (https://avro.apache.org/docs/current/spec.html): a row-based data serialization system. The Avro serialization format is widely used in big-data streaming systems such as Kafka and Spark. In Avro terminology, an Avro-encoded file or byte stream is called a "data file". To write data in the Avro format, one needs a schema, which is provided in JSON. Here's an example of an Avro schema:

{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields" : [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}

The above schema is of type record and represents a linked list of 64-bit integers. In most implementations, this schema is fed to a Writer instance along with a buffer to write encoded data to. One can then call one of the write methods on the writer. A distinguishing aspect of Avro is that the schema for the encoded data is written into the header of the data file. This means that for reading data you don't need to provide a schema to a Reader instance. The spec also allows providing a separate reader schema to filter data when reading.
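
For instance, here's a minimal round trip using the crate's APIs shown in the examples below, where only the writer is given a schema and the reader recovers it from the data file header:

use anyhow::Error;
use avrow::{Reader, Schema, Writer};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    // The writer needs a schema...
    let schema = Schema::from_str(r##""long""##)?;
    let mut writer = Writer::new(&schema, vec![])?;
    writer.write(27i64)?;
    let buf = writer.into_inner()?;

    // ...but the reader can be constructed without one, since the
    // schema is embedded in the data file header.
    let reader = Reader::new(buf.as_slice())?;
    for value in reader {
        dbg!(value?);
    }
    Ok(())
}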

The Avro specification provides two kinds of encoding:

  • Binary encoding - Efficient and takes less space on disk.
  • JSON encoding - When you want a readable version of Avro-encoded data; also useful for debugging.

This crate implements only the binary encoding, as that is the format used in practice for performance and storage reasons.

Features

  • Full support for recursive self-referential schemas with Serde serialization/deserialization.
  • All compression codecs (deflate, bzip2, snappy, xz, zstd) supported as per spec.
  • Simple and intuitive API - Since the underlying structures are Read and Write types, avrow mimics the APIs of Rust's standard library for minimal learning overhead. Writing Avro values is simply calling write (or serialize with serde), and reading Avro values is simply using iterators.
  • Less bloat / lightweight - Compile times in Rust are costly, so avrow keeps third-party dependencies to a minimum. Compression codecs and schema fingerprinting are feature gated; to use them, compile with the respective feature flags (e.g. --features zstd).
  • Schema evolution - One can configure the avrow Reader with a reader schema and read only the data relevant to their use case.
  • Schemas in avrow support querying their canonical form and fingerprinting (rabin64, sha256, md5); see the sketch below.
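
Here's a sketch of querying a schema's canonical form. Note: the method name canonical_form below is an assumption for illustration; consult the crate's docs on docs.rs for the exact canonical-form and fingerprinting API and the feature flags they require.

use avrow::Schema;
use std::str::FromStr;

fn main() -> Result<(), anyhow::Error> {
    let schema = Schema::from_str(
        r##"{"type":"record","name":"LongList","fields":[{"name":"value","type":"long"}]}"##,
    )?;
    // Assumed API: returns the Parsing Canonical Form defined by the
    // Avro spec, which is the input to fingerprinting (rabin64, sha256, md5).
    let canonical = schema.canonical_form();
    println!("{}", canonical);
    Ok(())
}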

Note: This is not yet a complete implementation of the spec; the remaining features are listed under the Todo section.

Getting started:

Add avrow as a dependency to Cargo.toml:

[dependencies]
avrow = "0.2.0"

Examples:

Writing avro data

use anyhow::Error;
use avrow::{Schema, Writer};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    // Create a schema from a json string
    let schema = Schema::from_str(r##"{"type":"string"}"##)?;
    // or from a path
    let _schema2 = Schema::from_path("./string_schema.avsc")?;
    // Create an output stream
    let stream = Vec::new();
    // Create a writer (any Write type works as the output)
    let mut writer = Writer::new(&schema, stream)?;
    // Write your data!
    writer.write("Hey")?;
    // or use the serialize method for serde-derived types.
    writer.serialize("there!")?;

    Ok(())
}

For simple native Rust types, avrow provides From impls to convert to Avro value types. For compound or user-defined types (structs or enums), one can use the serialize method, which relies on serde. Alternatively, one can construct avrow::Value instances directly; this is a more verbose way to write Avro values and should be a last resort.
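
The From impl path applies to maps as well. A minimal sketch (the map schema and values here are arbitrary; see the mapping tables later in this README):

use anyhow::Error;
use avrow::{Schema, Writer};
use std::collections::HashMap;
use std::str::FromStr;

fn main() -> Result<(), Error> {
    let schema = Schema::from_str(r##"{"type": "map", "values": "int"}"##)?;
    let mut writer = Writer::new(&schema, vec![])?;

    let mut counts = HashMap::new();
    counts.insert("apples".to_string(), 3i32);
    counts.insert("oranges".to_string(), 5i32);

    // HashMap<String, T> where T: Into<Value> converts to an Avro map.
    writer.write(counts)?;
    writer.into_inner()?;
    Ok(())
}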

Reading avro data

use anyhow::Error;
use avrow::{Reader, Schema};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    // The writer schema embedded in the header of this data is
    // {"type":"bytes"} with the deflate codec, so the reader schema must
    // resolve against "bytes" and the deflate feature must be enabled.
    let schema = Schema::from_str(r##""bytes""##)?;
    let data = vec![
        79, 98, 106, 1, 4, 22, 97, 118, 114, 111, 46, 115, 99, 104, 101,
        109, 97, 32, 123, 34, 116, 121, 112, 101, 34, 58, 34, 98, 121, 116,
        101, 115, 34, 125, 20, 97, 118, 114, 111, 46, 99, 111, 100, 101,
        99, 14, 100, 101, 102, 108, 97, 116, 101, 0, 145, 85, 112, 15, 87,
        201, 208, 26, 183, 148, 48, 236, 212, 250, 38, 208, 2, 18, 227, 97,
        96, 100, 98, 102, 97, 5, 0, 145, 85, 112, 15, 87, 201, 208, 26,
        183, 148, 48, 236, 212, 250, 38, 208,
    ];
    // Create a Reader
    let reader = Reader::with_schema(data.as_slice(), &schema)?;
    for i in reader {
        dbg!(&i);
    }

    Ok(())
}

Self-referential recursive schema example

use anyhow::Error;
use avrow::{from_value, Codec, Reader, Schema, Writer};
use serde::{Deserialize, Serialize};
use std::str::FromStr;

#[derive(Debug, Serialize, Deserialize)]
struct LongList {
    value: i64,
    next: Option<Box<LongList>>,
}

fn main() -> Result<(), Error> {
    let schema = r##"
        {
            "type": "record",
            "name": "LongList",
            "aliases": ["LinkedLongs"],
            "fields" : [
              {"name": "value", "type": "long"},
              {"name": "next", "type": ["null", "LongList"]}
            ]
          }
        "##;

    let schema = Schema::from_str(schema)?;
    let mut writer = Writer::with_codec(&schema, vec![], Codec::Null)?;

    let value = LongList {
        value: 1i64,
        next: Some(Box::new(LongList {
            value: 2i64,
            next: Some(Box::new(LongList {
                value: 3i64,
                next: Some(Box::new(LongList {
                    value: 4i64,
                    next: Some(Box::new(LongList {
                        value: 5i64,
                        next: None,
                    })),
                })),
            })),
        })),
    };

    writer.serialize(value)?;

    // Calling into_inner performs flush internally. Alternatively, one can call flush explicitly.
    let buf = writer.into_inner()?;

    // read
    let reader = Reader::with_schema(buf.as_slice(), &schema)?;
    for i in reader {
        let a: LongList = from_value(&i)?;
        dbg!(a);
    }

    Ok(())
}

An example of writing a JSON object against a conforming schema. The JSON object maps to the avrow::Record type.

use anyhow::Error;
use avrow::{from_value, Reader, Record, Schema, Writer};
use serde::{Deserialize, Serialize};
use std::str::FromStr;

#[derive(Debug, Serialize, Deserialize)]
struct Mentees {
    id: i32,
    username: String,
}

#[derive(Debug, Serialize, Deserialize)]
struct RustMentors {
    name: String,
    github_handle: String,
    active: bool,
    mentees: Mentees,
}

fn main() -> Result<(), Error> {
    let schema = Schema::from_str(
        r##"
            {
            "name": "rust_mentors",
            "type": "record",
            "fields": [
                {
                "name": "name",
                "type": "string"
                },
                {
                "name": "github_handle",
                "type": "string"
                },
                {
                "name": "active",
                "type": "boolean"
                },
                {
                    "name":"mentees",
                    "type": {
                        "name":"mentees",
                        "type": "record",
                        "fields": [
                            {"name":"id", "type": "int"},
                            {"name":"username", "type": "string"}
                        ]
                    }
                }
            ]
            }
"##,
    )?;

    let json_data = serde_json::from_str(
        r##"
    { "name": "bob",
        "github_handle":"ghbob",
        "active": true,
        "mentees":{"id":1, "username":"alice"} }"##,
    )?;
    let rec = Record::from_json(json_data, &schema)?;
    let mut writer = Writer::new(&schema, vec![])?;
    writer.write(rec)?;

    let avro_data = writer.into_inner()?;
    let reader = Reader::new(avro_data.as_slice())?;
    for value in reader {
        let mentors: RustMentors = from_value(&value)?;
        dbg!(mentors);
    }
    Ok(())
}

Writer customization

If you want to have more control over the parameters of Writer, consider using WriterBuilder as shown below:

use anyhow::Error;
use avrow::{Codec, Reader, Schema, WriterBuilder};
use std::str::FromStr;

fn main() -> Result<(), Error> {
    let schema = Schema::from_str(r##""null""##)?;
    let v = vec![];
    let mut writer = WriterBuilder::new()
        .set_codec(Codec::Null)
        .set_schema(&schema)
        .set_datafile(v)
        // set any custom metadata in the header
        .set_metadata("hello", "world")
        // set the number of bytes after which the writer should flush
        .set_flush_interval(128_000)
        .build()?;
    writer.serialize(())?;
    let v = writer.into_inner()?;

    let reader = Reader::with_schema(v.as_slice(), &schema)?;
    for i in reader {
        dbg!(i?);
    }

    Ok(())
}

Refer to the examples directory for more code examples.

Supported Codecs

To facilitate efficient storage, the Avro spec also defines compression codecs to use when serializing data.

Avrow supports all compression codecs listed in the spec:

  • null (no compression)
  • deflate
  • snappy
  • bzip2
  • xz
  • zstd

These are feature-gated behind their respective flags. Check the features section of Cargo.toml for more details.
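
For example, to enable specific codecs, turn on the matching feature flags in Cargo.toml (feature names follow the codec names, per the --features zstd example above; check the crate's Cargo.toml for the exact list):

[dependencies]
avrow = { version = "0.2.0", features = ["snappy", "zstd"] }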

Using avrow-cli tool:

Quite often you will need a quick way to examine an Avro data file for debugging. For that, this repository also ships the avrow-cli tool (av), which lets you examine Avro data files from the command line.

See avrow-cli repository for more details.

Installing avrow-cli:

cd avrow-cli
cargo install avrow-cli

Using avrow-cli (binary name is av):

av read -d data.avro

The read subcommand prints all rows in data.avro to standard output in debug format.

Rust native types to Avro value mapping (via Serde)

Primitives

Rust native types (primitive types)    Avro (Value)
(), Option::None                       null
bool                                   boolean
i8, u8, i16, u16, i32, u32             int
i64, u64                               long
f32                                    float
f64                                    double
&[u8], Vec<u8>                         bytes
&str, String                           string

Complex

Rust native types (complex types)                        Avro
struct Foo {..}                                          record
enum Foo {A,B} (variants cannot carry data)              enum
Vec<T> where T: Into<Value>                              array
HashMap<String, T> where T: Into<Value>                  map
T where T: Into<Value>                                   union
Vec<u8> with length equal to the size in the schema      fixed
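
For instance, a data-less Rust enum serializes to an Avro enum. A minimal sketch (the Suit type and its schema are made up for illustration):

use anyhow::Error;
use avrow::{Schema, Writer};
use serde::Serialize;
use std::str::FromStr;

#[derive(Serialize)]
enum Suit {
    Hearts,
    Spades,
}

fn main() -> Result<(), Error> {
    let schema = Schema::from_str(
        r##"{"type": "enum", "name": "Suit", "symbols": ["Hearts", "Spades"]}"##,
    )?;
    let mut writer = Writer::new(&schema, vec![])?;
    // Unit variants (no embedded data) map to Avro enum symbols via serde.
    writer.serialize(Suit::Spades)?;
    writer.into_inner()?;
    Ok(())
}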

Todo

  • Logical types support.
  • Sorted reads.
  • Single object encoding.
  • Schema Registry as a trait - would allow avrow to read from and write to remote schema registries.
  • AsyncRead + AsyncWrite Reader and Writers.
  • Avro protocol message and RPC support.
  • Benchmarks and optimizations.

Changelog

Please see the CHANGELOG for a release history.

Contributions

All kinds of contributions are welcome.

Head over to CONTRIBUTING.md for contribution guidelines.

Support

You can support development via Buy Me A Coffee or ko-fi.

MSRV

Avrow works on stable Rust, version 1.37 and above. It does not use any nightly features.

License

Dual licensed under either of Apache License, Version 2.0 or MIT license at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
