tubular / confluent-spark-avro

License: Apache-2.0
Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Programming Languages

scala

Projects that are alternatives to or similar to confluent-spark-avro

schema-registry-php-client
A PHP 7.3+ API client for the Confluent Schema Registry REST API based on Guzzle 6 - http://docs.confluent.io/current/schema-registry/docs/index.html
Stars: ✭ 40 (+122.22%)
Mutual labels:  avro, schema-registry, confluent
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+9050%)
Mutual labels:  avro, schema-registry, confluent
avrora
A convenient Elixir library to work with Avro schemas and Confluent® Schema Registry
Stars: ✭ 59 (+227.78%)
Mutual labels:  avro, schema-registry, confluent
kafka-scala-examples
Examples of Avro, Kafka, Schema Registry, Kafka Streams, Interactive Queries, KSQL, Kafka Connect in Scala
Stars: ✭ 53 (+194.44%)
Mutual labels:  avro, schema-registry
Abris
Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (+622.22%)
Mutual labels:  spark, avro
avro-serde-php
Avro Serialisation/Deserialisation (SerDe) library for PHP 7.3+ & 8.0 with a Symfony Serializer integration
Stars: ✭ 43 (+138.89%)
Mutual labels:  avro, confluent
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+3944.44%)
Mutual labels:  spark, avro
tamer
Standalone alternatives to Kafka Connect Connectors
Stars: ✭ 42 (+133.33%)
Mutual labels:  avro, schema-registry
puppet-confluent
Puppet Module for installing and configuring the Confluent Platform
Stars: ✭ 14 (-22.22%)
Mutual labels:  schema-registry, confluent
kafka-avro-confluent
Kafka De/Serializer using avro and Confluent's Schema Registry
Stars: ✭ 18 (+0%)
Mutual labels:  avro, confluent
schema-registry-gitops
Manage Confluent Schema Registry subjects through Infrastructure as code
Stars: ✭ 36 (+100%)
Mutual labels:  schema-registry, confluent
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+438.89%)
Mutual labels:  spark, avro
schema-registry
📙 json & avro http schema registry backed by Kafka
Stars: ✭ 23 (+27.78%)
Mutual labels:  avro, schema-registry
spring-cloud-stream-event-sourcing-testcontainers
Goal: create a Spring Boot application that handles users using Event Sourcing. So, whenever a user is created, updated, or deleted, an event informing this change is sent to Kafka. Also, we will implement another application that listens to those events and saves them in Cassandra. Finally, we will use Testcontainers for integration testing.
Stars: ✭ 16 (-11.11%)
Mutual labels:  avro, schema-registry
sbt-avro
sbt plugin to generate Scala classes from Apache Avro schemas hosted on a remote Confluent Schema Registry.
Stars: ✭ 15 (-16.67%)
Mutual labels:  avro, schema-registry
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+222.22%)
Mutual labels:  spark, avro
srclient
Golang Client for Schema Registry
Stars: ✭ 188 (+944.44%)
Mutual labels:  avro, confluent
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+2083.33%)
Mutual labels:  spark, avro
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+2155.56%)
Mutual labels:  spark, avro
avro turf
A library that makes it easier to use the Avro serialization format from Ruby.
Stars: ✭ 130 (+622.22%)
Mutual labels:  avro, schema-registry

Confluent Spark Avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry. See the official Confluent website for more details about Schema Registry.

Usage

The UDFs are meant to be used together with Spark's native Kafka reader.

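import org.apache.spark.sql.functions.col
// ConfluentSparkAvroUtils is provided by this library; import it from the assembled jar.

// Read raw key/value bytes from Kafka with Spark's built-in Kafka source.
val df = spark
    .read
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "topic1")
    .load()

// Build one deserializer UDF per Schema Registry subject.
val utils = ConfluentSparkAvroUtils("http://schema-registry.my-company.com:8081")
val keyDeserializer = utils.deserializerForSubject("topic1-key")
val valueDeserializer = utils.deserializerForSubject("topic1-value")

// Apply the UDFs to the raw columns, then name the resulting columns.
df.select(
    keyDeserializer(col("key")).alias("key"),
    valueDeserializer(col("value")).alias("value")
).show(10)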
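
For context, messages written by Confluent serializers normally carry a small header in front of the Avro payload: one magic byte with value 0, followed by a 4-byte big-endian schema ID. The sketch below shows how such a header could be split off in plain Scala. `ConfluentHeader` and `WireFormat` are hypothetical names used only for illustration; they are not part of this library, and the library's own UDFs may handle the header differently.

```scala
import java.nio.ByteBuffer

// Hypothetical helper types for illustration; not part of confluent-spark-avro.
final case class ConfluentHeader(schemaId: Int, payload: Array[Byte])

object WireFormat {
  // Splits a Confluent-framed message into its schema ID and Avro payload.
  // Assumes the standard framing: [magic byte 0 | 4-byte schema ID | Avro data].
  def parse(message: Array[Byte]): Either[String, ConfluentHeader] =
    if (message == null || message.length < 5)
      Left("message is shorter than the 5-byte header")
    else if (message(0) != 0)
      Left(s"unexpected magic byte: ${message(0)}")
    else {
      // ByteBuffer reads big-endian by default, which matches the framing.
      val schemaId = ByteBuffer.wrap(message, 1, 4).getInt
      Right(ConfluentHeader(schemaId, message.drop(5)))
    }
}
```

The schema ID recovered this way is what a client would look up in Schema Registry to obtain the writer schema before decoding the payload.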

Data decryption

The sample code above can also read data encrypted with AES-256 via KMS, provided that each encrypted message uses this specific format: [magic byte (value 2 or 3) | encrypted aes256 key | encrypted avro data]
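
As a rough illustration of the layout above, the first byte alone is enough to tell an encrypted message from a plain one. The sketch below classifies a message by its magic byte. The names `MessageKind` and `MagicByte` are hypothetical and not part of this library, and treating magic byte 0 as plain Avro is an assumption based on the standard Confluent framing. The length of the encrypted key and the difference between values 2 and 3 are not specified here, so the sketch does not try to split the rest of the message.

```scala
// Hypothetical types for illustration; not part of confluent-spark-avro.
sealed trait MessageKind
case object PlainAvro extends MessageKind
case object EncryptedAvro extends MessageKind
case object Unknown extends MessageKind

object MagicByte {
  // Classifies a raw Kafka message by its first byte.
  // Assumption: 0 marks the standard Confluent framing; 2 or 3 mark the
  // encrypted format [magic byte | encrypted aes256 key | encrypted avro data].
  def classify(message: Array[Byte]): MessageKind =
    if (message == null || message.isEmpty) Unknown
    else message(0).toInt match {
      case 0     => PlainAvro
      case 2 | 3 => EncryptedAvro
      case _     => Unknown
    }
}
```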

Build

The tool is designed to be used with Spark >= 2.0.2.

sbt assembly
ls -l target/scala-2.11/confluent-spark-avro-assembly-1.0.jar

Testing

We haven't added unit tests yet, but you can test the UDFs with the following command:

sbt "project confluent-spark-avro" "run kafka.host:9092 http://schema-registry.host:8081 kafka.topic"

TODO

[ ] Spark UDFs to serialize messages.

License

The project is licensed under the Apache License 2.0.
