All Projects → GeethanadhP → xml-avro

GeethanadhP / xml-avro

Licence: Apache-2.0 license
Convert XSD -> AVSC and XML -> AVRO

Programming Languages

scala
5932 projects
python
139335 projects - #7 most used programming language
Dockerfile
14818 projects

Projects that are alternatives of or similar to xml-avro

avro ex
An Avro Library that emphasizes testability and ease of use.
Stars: ✭ 47 (+46.88%)
Mutual labels:  avro, avro-schema
avro-typescript
TypeScript Code Generator for Apache Avro Schema Types
Stars: ✭ 19 (-40.62%)
Mutual labels:  avro, avro-schema
registryless-avro-converter
An avro converter for Kafka Connect without a Schema Registry
Stars: ✭ 45 (+40.63%)
Mutual labels:  avro, avro-schema
avro-schema-generator
Library for generating avro schema files (.avsc) based on DB tables structure
Stars: ✭ 38 (+18.75%)
Mutual labels:  avro, avro-schema
avro-serde-php
Avro Serialisation/Deserialisation (SerDe) library for PHP 7.3+ & 8.0 with a Symfony Serializer integration
Stars: ✭ 43 (+34.38%)
Mutual labels:  avro, avro-schema
darwin
Avro Schema Evolution made easy
Stars: ✭ 26 (-18.75%)
Mutual labels:  avro, avro-schema
avrora
A convenient Elixir library to work with Avro schemas and Confluent® Schema Registry
Stars: ✭ 59 (+84.38%)
Mutual labels:  avro, avro-schema
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+5046.88%)
Mutual labels:  avro, avro-schema
schema-registry-php-client
A PHP 7.3+ API client for the Confluent Schema Registry REST API based on Guzzle 6 - http://docs.confluent.io/current/schema-registry/docs/index.html
Stars: ✭ 40 (+25%)
Mutual labels:  avro, avro-schema
avrow
Avrow is a pure Rust implementation of the avro specification https://avro.apache.org/docs/current/spec.html with Serde support.
Stars: ✭ 27 (-15.62%)
Mutual labels:  avro, avro-schema
avro-to-typescript
Compile Apache Avro schema files to TypeScript classes
Stars: ✭ 31 (-3.12%)
Mutual labels:  converter, avro
osx-callhistory-decryptor
macOS (incl big sur) call history decryptor/converter to CSV format.
Stars: ✭ 19 (-40.62%)
Mutual labels:  converter
avrocount
Count records in Avro files efficiently
Stars: ✭ 16 (-50%)
Mutual labels:  avro
unicode-formatter
Convert portions of text to fancy text using unicode fonts for use on Twitter and other sites that don't support rich text
Stars: ✭ 31 (-3.12%)
Mutual labels:  converter
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-25%)
Mutual labels:  avro
to-json-schema
Converts JS objects to JSON Schema
Stars: ✭ 83 (+159.38%)
Mutual labels:  converter
convertron3000
Convertron3000 commodore 64 graphics converter
Stars: ✭ 24 (-25%)
Mutual labels:  converter
express-to-koa
Use express middlewares in Koa2, the one that really works.
Stars: ✭ 18 (-43.75%)
Mutual labels:  converter
can2mqtt
Bidirectional CAN-Bus to MQTT-Bridge
Stars: ✭ 39 (+21.88%)
Mutual labels:  converter
schema-registry
📙 json & avro http schema registry backed by Kafka
Stars: ✭ 23 (-28.12%)
Mutual labels:  avro

XSD => Avsc & XML => Avro

No longer maintained actively


This project was initially a fork of xml-avro-elodina. Later evolved to separate project with lotsss of bug fixes, memory & performance improvements, options, re-coded in Scala


  • Converts any XSD to a proper usable Avro schema (Avsc)
  • Converts any XML to avro using the provided schema. What can it do? See the list below.
    • Handle any large size XML (even in GigaBytes), as it streams the xml
    • Read xml from stdin and output to stdout
    • Validate the XML with XSD
    • Split the data at any specified element (can have any no.of splits)
    • Handle multiple documents in single file (useful when streaming continuous data)
    • Write out failed documents without killing the whole process
    • Completely configurable

Running Project

  1. git clone to clone the repository to local
  2. gradle build to generate the jar file
  3. java -jar ./build/libs/xml-avro-all-<VERSION>.jar -c <CONFIG_FILE> to run the code (options as below)

Check ./example/config.yml for sample configuration file

Config File

Create yml config file as per the below format

debug: false                    # Enable printing of debug messages
baseDir: "files"                # Base directory where most files are relative to
namespaces: true                # Enable/Disable usage of namespaces in schema/conversion - Optional (default: true)

XML:                            # Convert XML
  xmlInput: stdin               # Source of XML [ stdin | "somefile.xml" ]
  avscFile: "books.avsc"        # Avsc file to use for conversion - (If not using splits)
  avroOutput: stdout             # Traget location [ stdout | "somefile.avro" ] - Optional (Uses the xmlInput to assume the output) (If not using splits)
  documentRootTag: "books"      # Root tag of the XML (without namespace)
  validationXSD: "books.xsd"    # Enable validation with specified xsd
  ignoreMissing: true           # Incase you use a smaller version of avsc (to take only required tags),
                                # tags in the xml may not exist in the trimmed avsc.. 
                                # This option enables to ignore the missing tags instead of failing
  suppressWarnings: true        # In case of a lot of missing fields don't print them as warnings
  split:                        # Split the avro records based on specifed list
    -
      by: "bookName"            # Split tag name
      avscFile: "name.avsc"     # Avsc File for the split part
      avroFile: "name.avro"     # Avro file name to save to
    -
      by: "bookPublisher"
      avscFile: "publisher.avsc"
      avroFile: "publisher.avro"
  qaDir: "some path"            # Writes some count details 
  caseSensitive: true           # Tags matching xml & avsc are case sensitive - Optional (default: true) 
  ignoreCaseFor:                # Ignore case senitivity for the below list
    - "SomeTag"
  docErrorLevel: "WARNING"      # Use this level to log in case of error in a document 
  errorFile: "failures.xml"     # Writes the failed documents to this file
  useAvroInput: true            # Read xml data from inside an avro file
  inputAvroMappings:            # Set of mappings from source field name to target, use "xmlInput" as target to mark it as the xml data, use "unique_id" as target to mark the value as unique key
      "headers" : "avroHeader"
      "body" : "xmlInput"
      "headers.unique_id" : "unique_id"
XSD:
  xsdFile: "somefile.xsd"       # Source of XSD
  avscFile: "books.avsc"        # Avsc file to save as - Optional (Uses the xsdFile to assume the output)
  stringTimestamp: true         # Represent timestamp as string instead of long. Defaults to false. Setting this value to "true" overrides XSD.logicalTypes.xsDateTime to "string".
  attributePrefix: "_"          # Optional, will assign the specified prefix for attributes in the avsc schema

  ignoreHiveKeywords: true      # Do not suffix field name with `_value` when matching Hive keywords. Default value is false.
  rootElementQName: "{ns}name"  # Only generate schema for root element matching this QName
  logicalTypes:
    xsDateTime: "long"          # Configures the Avro mapping of xs:dateTime XML types. [ long | string | timestamp-micros | timestamp-millis ]
                                #   "long" (the default) maps xs:dateTime types to regular Avro "long". Same as the default mapping for xs:dateTime in older xml-avro versions. 
                                #   "string" maps xs:dateTime types to Avro "string"
                                #   "timestamp-micros" maps xs:dateTime types to Avro "timestamp-micros" logical type annotating a "long". 
                                #   "timestamp-millis" maps xs:dateTime types to Avro "timestamp-millis" logical type annotating a "long".
                                #   Note: Setting the stringTimestamp will override this config value to "string" for backward compatibility reasons. 
    xsDate: "string"            # Configures the Avro mapping of xs:date XML types. [ string | date ].
                                #   "string" (the default) maps xs:date types to Avro "string"
                                #   "date" maps xs:date types to Avro "date" logical type annotating an "int".  
    xsTime: "string"            # Configures the Avro mapping of xs:time XML types. [ string | time-micros | time-millis ]
                                #   "string" (the default) maps xs:time types to Avro "string".
                                #   "time-micros" maps xs:time types to Avro "time-micros" logical type annotating a "long". 
                                #   "time-millis" maps xs:time types to Avro "time-millis" logical type annotating a "long".
                                #
    xsDecimal:                  # Configurations controlling the mapping of xs:decimal XML types
                                #
     avroType: "decimal"        #   Configures the Avro type mapping of xs:Decimal derived xml types.
                                #   Possible values are: [ double | string | decimal ]
                                #   - "double" (the default) maps xs:decimal types to Avro "double".
                                #   - "string" maps xs:decimal types to Avro "string".
                                #   - "decimal" maps xs:decimal types to Avro "decimal" logical types annotating "bytes".
                                #   When using the "decimal" option, the mandatory precision and scale properties of the Avro
                                #   decimal type are picked up from any xs:totalDigits and xs:fractionDigits restriction facets, if any.
                                #   In the absense of these restriction facets, the mapping will instead fall back to using a backup strategy defined
                                #   by a combination of the fallbackType, fallbackPrecision and fallbackScale configurations. 
                                #     
     fallbackType: "string"     # Configures a fallback type mapping for xs:decimal types with unrestricted precision and scale. (i.e. types without
                                #   declared xs:totalDigits and xs:fractionDigits restriction facets). This configuration is ignored, unless the
                                #   avroType setting is configured to "decimal".
                                #   The possible values are: [ string | double | decimal ]
                                #   All options are identical to those described under the avroType configuration, the only exception being 
                                #   "decimal" that uses the fallbackPrecision and fallbackScale configurations as a defaults for missing
                                #   precision and scale information.
                                #   
     fallbackPrecision: 5       #  Configures the fallback precision for decimal types without declared xs:totalDigits
                                #    and restriction. Required when fallbackType is set to "decimal".
                                #
     fallbackScale: 3           #  Configures the fallback scale for decimal types without declared xs:fractionDigits restriction.
                                #   Required when fallbackType is set to "decimal".

Docker

Build docker image

docker build -t xml-avro:v1.8.2 --build-arg VERSION=1.8.2 .

Run with docker

docker run --rm -v $(pwd)/example:/app/example -v $(pwd)/example/config.yml:/app/config.yml xml-avro:latest
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].