elsevier-core-engineering / Replicator

Licence: MIT
Automated Cluster and Job Scaling For HashiCorp Nomad

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Replicator

Hashi Up
Bootstrap HashiCorp Consul, Nomad, or Vault over SSH in under 1 minute
Stars: ✭ 113 (-31.93%)
Mutual labels:  hashicorp, nomad, consul
Sherpa
Sherpa is a highly available, fast, and flexible horizontal job scaling tool for HashiCorp Nomad. It is capable of running in a number of different modes to suit different requirements, and can scale based on Nomad resource metrics or external sources.
Stars: ✭ 165 (-0.6%)
Mutual labels:  hashicorp, nomad, autoscaling
libra
A Nomad auto scaler
Stars: ✭ 72 (-56.63%)
Mutual labels:  hashicorp, nomad, autoscaling
Nomad Firehose
Firehose all nomad job, allocation, nodes and evaluations changes to rabbitmq, kinesis or stdout
Stars: ✭ 96 (-42.17%)
Mutual labels:  hashicorp, nomad, consul
nomad-consult-ansible-centos
Deploy Nomad & Consul on CentOS with Ansible
Stars: ✭ 17 (-89.76%)
Mutual labels:  consul, hashicorp, nomad
nomad-droplets-autoscaler
DigitalOcean Droplets target plugin for HashiCorp Nomad Autoscaler
Stars: ✭ 42 (-74.7%)
Mutual labels:  hashicorp, nomad, autoscaling
hashicorp-labs
Deploy a HashiCorp cluster (Vault, Consul and Nomad) locally on VMs, ready for deploying and testing your apps.
Stars: ✭ 32 (-80.72%)
Mutual labels:  consul, hashicorp, nomad
local-hashicorp-stack
Local Hashicorp Stack for DevOps Development without Hypervisor or Cloud
Stars: ✭ 23 (-86.14%)
Mutual labels:  consul, hashicorp, nomad
vim-hcl
Syntax highlighting for HashiCorp Configuration Language (HCL)
Stars: ✭ 83 (-50%)
Mutual labels:  consul, hashicorp, nomad
nomad-box
Nomad Box - Simple Terraform-powered setup on Azure of clustered Consul, Nomad and a Traefik load balancer that runs Docker/GoLang/Java workloads. NOTE: Only suitable for dev environments at the moment until I learn more Terraform, Consul, Nomad, Vault :P
Stars: ✭ 18 (-89.16%)
Mutual labels:  consul, hashicorp, nomad
Hashi Ui
A modern user interface for @hashicorp Consul & Nomad
Stars: ✭ 1,119 (+574.1%)
Mutual labels:  hashicorp, nomad, consul
Escalator
Escalator is a batch- or job-optimized horizontal autoscaler for Kubernetes
Stars: ✭ 539 (+224.7%)
Mutual labels:  aws, autoscaling, scaling
Terraform Modules
Reusable Terraform modules
Stars: ✭ 63 (-62.05%)
Mutual labels:  aws, nomad, consul
Kube Aws Autoscaler
Simple, elastic Kubernetes cluster autoscaler for AWS Auto Scaling Groups
Stars: ✭ 94 (-43.37%)
Mutual labels:  aws, autoscaling
Python Nomad
Client library for HashiCorp Nomad
Stars: ✭ 90 (-45.78%)
Mutual labels:  hashicorp, nomad
Vaultron
🤖 Vault clusters Terraformed onto Docker for great fun and learning!
Stars: ✭ 96 (-42.17%)
Mutual labels:  hashicorp, consul
Nomadfiles
A collection of Nomad job files for deploying applications to a cluster
Stars: ✭ 89 (-46.39%)
Mutual labels:  hashicorp, nomad
Nomad Helper
Useful tools for working with @hashicorp Nomad at scale
Stars: ✭ 96 (-42.17%)
Mutual labels:  hashicorp, nomad
Aws Auto Scaling Custom Resource
Libraries, samples, and tools to help AWS customers onboard with custom resource auto scaling.
Stars: ✭ 78 (-53.01%)
Mutual labels:  aws, autoscaling
Ecs Formation
Tool to build Docker cluster composition for Amazon EC2 Container Service (ECS)
Stars: ✭ 114 (-31.33%)
Mutual labels:  aws, autoscaling

Replicator

Badges: Build Status · Go Report Card · GoDoc · Gitter chat: https://gitter.im/els-core-engineering/replicator/Lobby

Replicator is a fast and highly concurrent Go daemon that provides dynamic scaling of Nomad jobs and worker nodes.

  • Replicator job scaling policies are configured as meta parameters within the job specification. A job scaling policy allows scaling constraints to be defined per task-group. Currently supported scaling metrics are CPU and Memory; there are plans for additional metrics as well as different metric backends in the future. Details of configuring job scaling and other important information can be found on the Replicator Job Scaling wiki page.

  • Replicator supports dynamic scaling of multiple, distinct pools of cluster worker nodes, each backed by an AWS autoscaling group. Worker pool autoscaling is configured through Nomad client meta parameters. Details of configuring worker pool scaling and other important information can be found on the Replicator Cluster Scaling wiki page.

At present, worker pool autoscaling is only supported on AWS; however, future support for GCE and Azure is planned using the Go factory/provider pattern.

Download

Pre-compiled releases for a number of platforms are available on the GitHub release page. Docker images are also available from the elsce Docker Hub page.

Running

Replicator can be run in a number of ways; the recommended approach is as a Nomad service job using either the Docker driver or the exec driver. There are example Nomad job specification files available as a starting point.
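For illustration only (not an official example), a minimal service job along the following lines could run Replicator under the Docker driver. The elsce/replicator image reference, the assumption that the image's entrypoint is the replicator binary, and the count and resource values are all placeholders; prefer the maintained example job specification files for real deployments.

job "replicator" {
  datacenters = ["dc1"]
  type        = "service"

  group "replicator" {
    # Running two instances is optional; Consul sessions (described below)
    # ensure only one agent holds leadership at a time.
    count = 2

    task "replicator" {
      driver = "docker"

      config {
        # Placeholder image reference; adjust to the published Docker image.
        image = "elsce/replicator:latest"
        # Assumes the image entrypoint is the replicator binary.
        args  = ["agent"]
      }

      resources {
        # Placeholder sizing; tune for your environment.
        cpu    = 250 # 250 MHz
        memory = 128 # 128MB
      }
    }
  }
}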

It's recommended to review the agent configuration options so that Replicator is tuned to run best in your environment.

Replicator is fully capable of running as a distributed service, using Consul sessions to provide leadership locking and exclusion. State is also written by Replicator to the Consul KV store, allowing Replicator failures to be handled quickly and efficiently.

An example Nomad client configuration that can be used to enable autoscaling on the worker pool:

bind_addr = "0.0.0.0"
client {
  enabled = true
  meta {
    "replicator_cooldown"             = 400
    "replicator_enabled"              = true
    "replicator_node_fault_tolerance" = 1
    "replicator_notification_uid"     = "REP2"
    "replicator_provider"             = "aws"
    "replicator_region"               = "us-east-1"
    "replicator_retry_threshold"      = 3
    "replicator_scaling_threshold"    = 3
    "replicator_worker_pool"          = "container-node-public-prod"
  }
}

An example job which has autoscaling enabled:

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel = 1
    stagger      = "10s"
  }

  group "cache" {
    count = 3

    meta {
      "replicator_max"               = 10
      "replicator_cooldown"          = 50
      "replicator_enabled"           = true
      "replicator_min"               = 1
      "replicator_retry_threshold"   = 1
      "replicator_scalein_mem"       = 30
      "replicator_scalein_cpu"       = 30
      "replicator_scaleout_mem"      = 80
      "replicator_scaleout_cpu"      = 80
      "replicator_notification_uid"  = "REP1"
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
        network {
          mbits = 10
          port "db" {}
        }
      }

      service {
        name = "global-redis-check"
        tags = ["global", "cache"]
        port = "db"
        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

Permissions

Replicator requires access to the Consul and AWS APIs (AWS being the only currently supported cloud provider) in order to function correctly. The Consul ACL token is passed as a configuration parameter, and AWS API access should be granted using an EC2 instance IAM role. Vault support is planned for the near future, which will change the way in which permissions are managed and provide a much more secure method of delivering them.

Consul ACL Token Permissions

If the Consul cluster being used is running ACLs, the following ACL policy will allow Replicator the required access to perform all functions based on its default configuration:

key "" {
  policy = "read"
}
key "replicator/config" {
  policy = "write"
}
node "" {
  policy = "read"
}
node "" {
  policy = "write"
}
session "" {
  policy = "read"
}
session "" {
  policy = "write"
}

AWS IAM Permissions

Until Vault integration is added, the instance pool capable of running the Replicator daemon requires the following IAM permissions in order to perform worker pool scaling:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AuthorizeAutoScalingActions",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DetachInstances",
                "autoscaling:UpdateAutoScalingGroup"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Sid": "AuthorizeEC2Actions",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeRegions",
                "ec2:TerminateInstances",
                "ec2:DescribeInstanceStatus"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}

Commands

Replicator supports a number of CLI commands which allow for easy control and manipulation of the replicator binary. In-depth documentation about each command can be found on the Replicator commands wiki page.

Command: agent

The agent command is the main entry point into Replicator. A subset of the available replicator agent configuration can optionally be passed in via CLI arguments, and configuration parameters passed via CLI flags will always take precedence over parameters specified in configuration files.

Detailed information regarding the available CLI flags can be found in the Replicator agent command wiki page.

Command: failsafe

The failsafe command is used to toggle failsafe mode across the pool of Replicator agents. Failsafe mode prevents any Replicator agent from taking any scaling actions on the resource placed into failsafe mode.

Detailed information about failsafe mode operations and the available CLI options can be found in the Replicator failsafe command wiki page.

Command: init

The init command creates example job scaling and worker pool scaling meta documents in the current directory. These files provide a starting example for configuring both scaling functionalities.

Command: version

The version command displays build information about the running binary, including the release version.

Frequently Asked Questions

When does Replicator adjust the size of the worker pool?

Replicator will dynamically scale-in the worker pool when:

  • Resource utilization falls below the capacity required to run all current jobs while sustaining the configured node fault-tolerance. When calculating required capacity, Replicator includes scaling overhead required to increase the count of all running jobs by one.
  • Before removing a worker node, Replicator simulates the capacity thresholds that would result from removing it. If the new required capacity would be within 10% of the current utilization, Replicator declines to remove the node to prevent thrashing.

Replicator will dynamically scale-out the worker pool when:

  • Resource utilization exceeds or closely approaches the capacity required to run all current jobs while sustaining the configured node fault-tolerance. When calculating required capacity, Replicator includes scaling overhead required to increase the count of all running jobs by one.

When does Replicator perform scaling actions against running jobs?

Replicator will dynamically scale a job when:

  • A valid scaling policy for the job task-group is present within the job specification meta parameters and has the enabled flag set to true.
  • A job specification can consist of multiple groups, and each group can contain multiple tasks. Resource allocations and count are specified at the group level.
  • Replicator evaluates scaling thresholds against the resource requirements defined within a group task. If any task within a group is found to violate the scaling thresholds, the group count will be adjusted accordingly.

Contributing

Contributions to Replicator are very welcome! Please refer to our contribution guide for details about hacking on Replicator.

For questions, please check out the Elsevier Core Engineering/replicator room in Gitter.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].