jet / nomad-service-alerter

Licence: other
Alerting for Nomad Jobs

NOTICE: SUPPORT FOR THIS PROJECT ENDED ON 18 November 2020

This project was owned and maintained by Jet.com (Walmart). It has reached its end of life and Walmart no longer supports it.

We will no longer be monitoring the issues for this project or reviewing pull requests. You are free to continue using this project under the license terms or forks of this project at your own risk. This project is no longer subject to Jet.com/Walmart's bug bounty program or other security monitoring.

Actions you can take

We recommend you take the following actions:

  • Review any configuration files used for build automation and make appropriate updates to remove or replace this project
  • Notify other members of your team and/or organization of this change
  • Notify your security team to help you evaluate alternative options

Forking and transition of ownership

For security reasons, Walmart does not transfer the ownership of our primary repos on Github or other platforms to other individuals/organizations. Further, we do not transfer ownership of packages for public package management systems.

If you would like to fork this package and continue development, you should choose a new name for the project and create your own packages, build automation, etc.

Please review the licensing terms of this project, which continue to be in effect even after decommission.

ORIGINAL README BELOW


Nomad Service Alerter

Nomad Service Alerter is a tool written in Go whose primary goal is to provide alerting for your services running on Nomad (https://www.nomadproject.io/). It offers configurable opt-in alerting options, which you specify in your Nomad Job manifest (JSON file) as Environment Variables. Nomad Service Alerter mainly covers Consul Health-Check Alerts and Service Restart-Loops Alerts.

Alerts

Nomad Service Alerter supports the following alerts:

Consul Health-Check Alerts

This alert monitors your service and alerts on allocations and versions that are failing their defined Consul health checks. You can set the duration threshold for which the service must remain unhealthy before an alert fires. The alert includes the details of all allocations of the service that are failing the Consul health check.

Service Restart-Loops Alerts

This alert monitors jobs (and all of their allocations) and alerts on services that enter an unending restart loop. A restart loop indicates an error in the service that prevents it from reaching a successful Running state (the allocations are created but remain in the pending state). This is a more accurate way to alert on Nomad jobs than monitoring the Dead state, which may be a valid state if you set count to 0.

Queued Instances Alerts

You can configure Nomad Service Alerter to opt in to Queued Instances Alerts, which trigger an alert when the service has had unallocated instances for at least 3 minutes.

Orphaned Instances Alerts

You can configure Nomad Service Alerter to opt in to Orphaned Instances Alerts, which trigger an alert when the service has more allocations running than it asked for. (In this case one or more rogue allocations are running on some machine without any parent Nomad process, hence the name.) As with the Queued Instances Alert, this alert is triggered when the service remains in the described state for at least 3 minutes.

Build and Test

To run the tool on your local machine, you will have to:

  • Set the following environment variables:

"nomad_server" --> your nomad server address
"env" --> the environment in which the tool would be running
"region" --> the region in which your tool would be running
"consul_server" --> your consul server address
"consul_datacenter" --> datacenter of your consul server

You can use the script loadenv.sh after adding appropriate values to load all the above variables.

  • Run go build
  • Execute the binary. (Or you can skip the go build step and run go run main.go instead)
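Because the binary depends on the five environment variables listed above, a startup check that names every missing one can save debugging time. The following is a minimal sketch of such a check, not the project's actual startup code. loadEnv and requiredEnv are hypothetical names, and only the variable names come from this README.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the variable names given in the README.
var requiredEnv = []string{"nomad_server", "env", "region", "consul_server", "consul_datacenter"}

// loadEnv reads every required variable through lookup (os.LookupEnv in
// normal use) and returns an error naming all that are missing or empty.
func loadEnv(lookup func(string) (string, bool)) (map[string]string, error) {
	values := make(map[string]string, len(requiredEnv))
	var missing []string
	for _, name := range requiredEnv {
		v, ok := lookup(name)
		if !ok || v == "" {
			missing = append(missing, name)
			continue
		}
		values[name] = v
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing environment variables: %s", strings.Join(missing, ", "))
	}
	return values, nil
}

func main() {
	cfg, err := loadEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("nomad server:", cfg["nomad_server"])
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real process environment.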

Configuring a nomad service to be alerted on by Nomad Service Alerter upon being unhealthy

You can configure your service by adding the following key-value pairs to the Meta section of your Nomad Job.

  • consul_service_healthcheck_enabled --> true/false (to enable/disable consul healthcheck alerts)
  • consul_service_healthcheck_threshold --> Time duration for which the service can remain in an unhealthy state before an alert is sent (e.g. 2m0s)
  • pd_service_key --> 32-character PagerDuty Service integration key (all the alerts will be sent here)
  • restart_loop_alerting_enabled --> true/false (to enable/disable restart loop alerts)
  • orphaned_instances_alert_enabled --> true/false (to enable/disable orphaned allocations alert)
  • queued_instances_alert_enabled --> true/false (to enable/disable queued allocations alert)

The following is an example of the key-value pairs described above that your Job Meta section (Job level) should have:

consul_service_healthcheck_enabled: true
consul_service_healthcheck_threshold: 3m0s
restart_loop_alerting_enabled: true
orphaned_instances_alert_enabled: true
queued_instances_alert_enabled: true
pd_service_key: 22221234567890123456789000000000
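To illustrate how these Meta values could be interpreted, here is a minimal sketch that turns a Meta map into a typed configuration. The struct and function names are hypothetical. It assumes the true/false values parse with strconv.ParseBool and the threshold uses Go's duration syntax (time.ParseDuration), which matches the 2m0s and 3m0s examples, but the tool's own parsing may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// alertConfig is a hypothetical struct holding the Meta keys listed above.
type alertConfig struct {
	HealthcheckEnabled   bool
	HealthcheckThreshold time.Duration
	RestartLoopEnabled   bool
	OrphanedEnabled      bool
	QueuedEnabled        bool
	PagerDutyKey         string
}

// parseMeta converts the string values of a job's Meta section into alertConfig.
func parseMeta(meta map[string]string) (alertConfig, error) {
	var cfg alertConfig
	flags := map[string]*bool{
		"consul_service_healthcheck_enabled": &cfg.HealthcheckEnabled,
		"restart_loop_alerting_enabled":      &cfg.RestartLoopEnabled,
		"orphaned_instances_alert_enabled":   &cfg.OrphanedEnabled,
		"queued_instances_alert_enabled":     &cfg.QueuedEnabled,
	}
	for key, dst := range flags {
		raw, ok := meta[key]
		if !ok {
			continue // assumption: a missing key means the alert is not opted in
		}
		v, err := strconv.ParseBool(raw)
		if err != nil {
			return alertConfig{}, fmt.Errorf("%s: %w", key, err)
		}
		*dst = v
	}
	if raw, ok := meta["consul_service_healthcheck_threshold"]; ok {
		d, err := time.ParseDuration(raw)
		if err != nil {
			return alertConfig{}, fmt.Errorf("consul_service_healthcheck_threshold: %w", err)
		}
		cfg.HealthcheckThreshold = d
	}
	cfg.PagerDutyKey = meta["pd_service_key"]
	return cfg, nil
}

func main() {
	cfg, err := parseMeta(map[string]string{
		"consul_service_healthcheck_enabled":   "true",
		"consul_service_healthcheck_threshold": "3m0s",
	})
	if err != nil {
		fmt.Println("invalid meta:", err)
		return
	}
	fmt.Println(cfg.HealthcheckEnabled, cfg.HealthcheckThreshold) // true 3m0s
}
```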

Running Nomad Service Alerter on Nomad

If you want to run Nomad Service Alerter on Nomad, you need to set the Environment Variables (the ones described in the 'Build and Test' section) with appropriate values in your job manifest (JSON file):


"nomad_server" --> your nomad server address
"env" --> the environment in which the tool would be running
"region" --> the region in which your tool would be running
"consul_server" --> your consul server address
"consul_datacenter" --> datacenter of your consul server

Once your Job file is ready, use the standard method of submitting the job to nomad (https://www.nomadproject.io/docs/operating-a-job/submitting-jobs.html).

Alert Integrations

As of now, Nomad Service Alerter only supports integration with PagerDuty.

Maintainers
