AWS DataPipeline DSL for Scala

A Scala domain-specific language and toolkit to help you build and maintain AWS DataPipeline definitions.

This tool aims to ease the burden of maintaining a large suite of AWS Data Pipelines. At Shazam, we use it to define our data pipelines in Scala code, avoiding the boilerplate and maintenance headache of managing tens or hundreds of JSON pipeline configuration files.

Benefits:

  • Write and maintain Scala code instead of JSON configuration
  • Use the DSL's >> syntax to clearly express dependencies between your pipeline's activities
  • Share code/configuration between your pipeline definitions
  • Never write dependsOn or precondition again: this library manages all ids and object references for you
  • Add your own wrapper around this library to predefine your most commonly used data pipeline objects
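
To make the >> syntax and the automatic id management more concrete, here is a minimal, self-contained sketch of how such an operator could record dependencies and derive object references. It is a toy model only: Activity, flatten, idOf and render are hypothetical names invented for illustration and are not this library's API, and the rendered objects are heavily simplified.

```scala
// Toy model only: these types are hypothetical and are NOT the library's API.
object DependencySketch {

  // An activity knows its name and the activities it must wait for.
  final case class Activity(name: String, dependsOn: List[Activity] = Nil) {
    // `a >> b` reads "run a, then b": it returns b with a recorded as a prerequisite.
    def >>(next: Activity): Activity = next.copy(dependsOn = next.dependsOn :+ this)
  }

  // Collect an activity and everything upstream of it, prerequisites first.
  def flatten(a: Activity): List[Activity] =
    (a.dependsOn.flatMap(flatten) :+ a).distinct

  // Derive ids from names so that references never have to be written by hand.
  def idOf(a: Activity): String = a.name.replaceAll("[^A-Za-z0-9]+", "_")

  // Render each activity as a (very) simplified pipeline object.
  def render(last: Activity): List[String] =
    flatten(last).map { a =>
      val refs = a.dependsOn.map(d => s"""{"ref": "${idOf(d)}"}""").mkString("[", ", ", "]")
      s"""{"id": "${idOf(a)}", "name": "${a.name}", "dependsOn": $refs}"""
    }

  def main(args: Array[String]): Unit = {
    // Chain three activities; no id or dependsOn field is written by hand.
    val last = Activity("Extract logs") >> Activity("Transform") >> Activity("Load")
    render(last).foreach(println)
  }
}
```

The point of the design is that the dependency graph lives in ordinary Scala values, so ids and references can be generated in one place rather than repeated across JSON files. The real DSL may model this quite differently.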

Tutorial

Build the compiler using sbt:

$ sbt assembly

Create a "Hello World" AWS Data Pipeline definition Scala file:

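object HelloWorldPipeline {

  // Brings AwsDataPipeline, ShellCommandActivity, Command and Daily into scope
  import datapipeline.dsl._

  // A pipeline that runs once a day from the given start time
  val pipeline =
    AwsDataPipeline(name = "HelloWorldPipeline")
      .withSchedule(
        frequency = Daily,
        startDateTimeIso = "2018-01-01T00:00:00"
      )
      .withActivities(
        // A single activity that echoes a greeting on the "my-task-runner" worker group
        ShellCommandActivity(
          name = "Echo Hello World",
          workerGroup = "my-task-runner",
          Command("echo 'Hello AWS Data Pipeline World!'")
        )
      )

}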

Use the compiler to produce JSON from our Scala definition:

$ java -jar target/scala-2.12/datapipeline-compiler.jar HelloWorldPipeline HelloWorldPipeline.scala
Writing pipeline definition to: ./HelloWorldPipeline.json

The output JSON file contains your pipeline definition ready to deploy to AWS.
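
This project does not prescribe a deployment step, so the following is only a sketch of one possible approach: driving the AWS CLI from Scala to create a pipeline, upload the generated definition and activate it. DeploySketch and its methods are hypothetical helpers, not part of this tool, and the pipeline name and unique id are placeholders. It assumes the aws command is installed and configured, and that the generated JSON is in the format the CLI expects.

```scala
import scala.sys.process._

// Hypothetical helper, not part of scala-datapipeline-dsl: it only wraps the AWS CLI.
object DeploySketch {

  // Creates an empty pipeline; --query/--output make the CLI print just the pipeline id.
  def createCommand(name: String, uniqueId: String): Seq[String] =
    Seq("aws", "datapipeline", "create-pipeline",
        "--name", name, "--unique-id", uniqueId,
        "--query", "pipelineId", "--output", "text")

  // Uploads a pipeline definition file to an existing pipeline.
  def putDefinitionCommand(pipelineId: String, jsonPath: String): Seq[String] =
    Seq("aws", "datapipeline", "put-pipeline-definition",
        "--pipeline-id", pipelineId,
        "--pipeline-definition", s"file://$jsonPath")

  // Activates the pipeline so that its schedule starts running.
  def activateCommand(pipelineId: String): Seq[String] =
    Seq("aws", "datapipeline", "activate-pipeline", "--pipeline-id", pipelineId)

  def main(args: Array[String]): Unit = {
    // Placeholder name and unique id; `!!` returns the command's standard output.
    val id = createCommand("HelloWorldPipeline", "hello-world-1").!!.trim
    // `!` runs a command and returns its exit code.
    require(putDefinitionCommand(id, "HelloWorldPipeline.json").! == 0, "upload failed")
    require(activateCommand(id).! == 0, "activation failed")
  }
}
```

Building each command as a Seq[String] keeps the arguments free of shell quoting problems and lets the commands be inspected without contacting AWS.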

Supported AWS DataPipeline Objects

For details see Supported Objects.

License

This tool is licensed under Apache License 2.0.
