Relational Database Loader
Introduction
This project contains applications required to load Snowplow data into relational databases.
RDB Shredder
RDB Shredder is a Spark job which:
- Reads Snowplow enriched events from S3
- Extracts any unstructured event JSONs and context JSONs found
- Validates that these JSONs conform to their schemas
- Adds metadata to these JSONs to track their origins
- Writes these JSONs out to nested folders dependent on their schema
It is designed to be run downstream of the Enrich job.
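The steps above can be illustrated with a minimal, single-process sketch. It is not the Spark job itself and makes several assumptions: events are simplified to dictionaries with `event_id`, `unstruct_event` and `contexts` keys, validation is reduced to a required-keys check against a hypothetical in-memory `SCHEMAS` registry, and both the `origin` metadata field and the folder layout returned by `output_folder` are invented for illustration.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for a schema registry: schema URI -> required keys.
SCHEMAS = {
    "iglu:com.acme/link_click/jsonschema/1-0-0": {"target_url"},
    "iglu:com.acme/user/jsonschema/1-0-0": {"user_id"},
}


def extract_entities(event):
    """Return the self-describing JSONs (unstructured event + contexts) of one event."""
    entities = []
    if event.get("unstruct_event"):
        entities.append(json.loads(event["unstruct_event"])["data"])
    if event.get("contexts"):
        entities.extend(json.loads(event["contexts"])["data"])
    return entities


def is_valid(entity):
    """Simplified validation: the schema is known and all required keys are present."""
    required = SCHEMAS.get(entity["schema"])
    return required is not None and required.issubset(entity["data"])


def output_folder(schema_uri):
    """Illustrative layout only: 'iglu:vendor/name/format/1-0-0' -> 'vendor/name/format/1'."""
    vendor, name, fmt, version = schema_uri[len("iglu:"):].split("/")
    return f"{vendor}/{name}/{fmt}/{version.split('-')[0]}"


def shred(events):
    """Group valid JSONs by schema-dependent folder; collect invalid ones separately."""
    good = defaultdict(list)
    bad = []
    for event in events:
        for entity in extract_entities(event):
            if not is_valid(entity):
                bad.append(entity)
                continue
            # Attach metadata so each JSON can be traced back to its parent event.
            tagged = {**entity, "origin": {"event_id": event["event_id"]}}
            good[output_folder(entity["schema"])].append(tagged)
    return dict(good), bad
```

Calling `shred` on a list of such dictionaries returns a mapping from folder name to the JSONs that would be written there, plus a list of JSONs that failed validation. How the real job handles invalid JSONs is not covered here.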
RDB Loader
RDB Loader (previously known as StorageLoader) is a Scala application that runs in the background, discovers data produced by RDB Shredder by reading messages from an SQS queue, and loads that data into one of the supported storage targets.
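The discover-then-load behaviour described above can be sketched in Python as a simple polling loop. This is an illustration under assumptions, not the actual Scala implementation: a standard-library `queue.Queue` stands in for SQS, the message body format (a JSON object with a `folder` key) is hypothetical, and `load` is any caller-supplied function that writes one folder into a storage target. The real application runs continuously, whereas this sketch stops after a number of empty polls so that it terminates.

```python
import json
import queue


def run_loader(messages, load, max_idle_polls=1):
    """Poll `messages`, call `load(folder)` for each message, return the loaded folders."""
    loaded = []
    idle = 0
    while idle < max_idle_polls:
        try:
            body = messages.get(timeout=0.1)
        except queue.Empty:
            # Nothing new was announced; count the empty poll and try again.
            idle += 1
            continue
        idle = 0
        # Hypothetical message format: {"folder": "<location of shredded data>"}
        folder = json.loads(body)["folder"]
        load(folder)
        loaded.append(folder)
    return loaded
```

For example, putting one message on the queue and passing `target.append` as `load` records the announced folder in the list `target`.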
RDB Stream Shredder (experimental)
An application similar to RDB Shredder, but one that works without Apache Spark or EMR and reads directly from a Kinesis stream. Use either RDB Shredder or RDB Stream Shredder, not both.
Find out more
Technical Docs | Setup Guide | Roadmap & Contributing |
---|---|---|
Technical Docs | Setup Guide | Roadmap |
Copyright and License
Snowplow Relational Database Loader is copyright 2012-2021 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.