fourTheorem / slic-watch

Licence: Apache-2.0 license
Easy alarms and dashboards for Lambda, DynamoDB, API Gateway, Kinesis, Step Functions and more

slic-watch

SLIC Watch provides a CloudWatch Dashboard and Alarms for:

  1. AWS Lambda
  2. API Gateway
  3. DynamoDB
  4. Kinesis Data Streams
  5. SQS Queues
  6. Step Functions
  7. ECS (Fargate or EC2)
  8. SNS
  9. EventBridge

Currently, SLIC Watch is available as a Serverless Framework plugin. Serverless Framework v2 and v3 are supported.

Getting Started

  1. 📦 Install the plugin:
npm install serverless-slic-watch-plugin --save-dev
  2. 🖋️ Add the plugin to the plugins section of serverless.yml:
plugins:
  - serverless-slic-watch-plugin
  3. 🪛 Optionally, add some configuration for the plugin to the custom -> slicWatch section of serverless.yml. Here, you can specify a reference to the SNS topic for alarms. This is optional, but it's usually something you want so you can receive alarm notifications via email, Slack, etc.
custom:
  slicWatch:
    topicArn: {Ref: myTopic}

See the Configuration section below for more detailed instructions on fine-tuning SLIC Watch to your needs.

  4. 🚢 Deploy your application in the usual way, for example:
sls deploy
  5. 👀 Head to the CloudWatch section of the AWS Console to check out your new dashboards 📊 and alarms!
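
If the SNS topic from step 3 is defined in the same stack, the configuration might look like the sketch below. The logical ID myTopic matches the snippet in step 3; the email subscription and its address are placeholders chosen for illustration and are not part of SLIC Watch.

```yaml
# serverless.yml (sketch; topic name and endpoint are placeholders)
plugins:
  - serverless-slic-watch-plugin

custom:
  slicWatch:
    topicArn:
      Ref: myTopic            # Ref on an AWS::SNS::Topic resolves to the topic ARN

resources:
  Resources:
    myTopic:
      Type: AWS::SNS::Topic
      Properties:
        Subscription:
          - Protocol: email
            Endpoint: alerts@example.com   # hypothetical address
```

An email subscription has to be confirmed by the recipient before notifications are delivered.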

Features

CloudWatch Alarms and Dashboard widgets are created for all supported resources in the CloudFormation stack generated by the Serverless Framework. This includes generated resources as well as resources specified explicitly in the resources section. Any feature can be configured or disabled completely - see the Configuration section for details.

Lambda Functions

Lambda Function alarms are created for:

  1. Errors
  2. Throttles, as a percentage of the number of invocations
  3. Duration, as a percentage of the function's configured timeout
  4. Invocations, disabled by default
  5. IteratorAge, for functions triggered by an Event Source Mapping

Lambda dashboard widgets show:

  • Errors
  • Throttles
  • Duration (Average, P95 and Maximum)
  • Invocations
  • Concurrent Executions
  • Iterator Age

API Gateway

API Gateway alarms are created for:

  1. 5XX Errors
  2. 4XX Errors
  3. Latency
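
As a sketch of how these alarms might be tuned, the fragment below overrides two API Gateway thresholds using the keys listed in the Configuration section. The values are illustrative assumptions, not recommendations.

```yaml
custom:
  slicWatch:
    alarms:
      ApiGateway:
        4XXError:
          Statistic: Average
          Threshold: 0.1           # Average of 4XXError is an error rate, so 0.1 means 10% of requests
        Latency:
          ExtendedStatistic: p95   # alarm on the 95th percentile instead of the default p99
          Threshold: 3000          # API Gateway reports Latency in milliseconds
```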

API Gateway dashboard widgets show:

  • 5XX Errors
  • 4XX Errors
  • Latency
  • Count

DynamoDB

DynamoDB alarms are created for:

  1. Read Throttle Events (Table and GSI)
  2. Write Throttle Events (Table and GSI)
  3. UserErrors
  4. SystemErrors

Dashboard widgets are created for tables and GSIs:

  • ReadThrottleEvents (Table)
  • WriteThrottleEvents (Table)
  • ReadThrottleEvents (GSI)
  • WriteThrottleEvents (GSI)

Kinesis Data Streams

Kinesis data stream alarms are created for:

  1. Iterator Age
  2. Read Provisioned Throughput Exceeded
  3. Write Provisioned Throughput Exceeded
  4. PutRecord.Success
  5. PutRecords.Success
  6. GetRecords.Success

Kinesis data stream dashboard widgets show:

  • Iterator Age
  • Read Provisioned Throughput Exceeded
  • Write Provisioned Throughput Exceeded

SQS Queues

SQS Queue alarms are created for:

  1. Age Of Oldest Message (disabled by default). If enabled, a threshold in seconds should be specified.
  2. In Flight Messages Percentage. This is a percentage of the AWS hard limits (20,000 messages for FIFO queues and 120,000 for standard queues).
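
A minimal sketch of enabling both SQS alarms is shown below. It assumes the SQS block sits directly under alarms, alongside the other services; the threshold values are arbitrary examples.

```yaml
custom:
  slicWatch:
    alarms:
      SQS:
        AgeOfOldestMessage:
          enabled: true    # disabled by default; needs both enabled: true and a Threshold
          Threshold: 60    # seconds
        InFlightMessagesPc:
          Threshold: 50    # 50% of 120000 = 60000 messages (standard) or 50% of 20000 = 10000 (FIFO)
```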

SQS queue dashboard widgets show:

  • Messages Sent, Received and Deleted
  • Messages Visible
  • Age of Oldest Message

Step Functions

Step Function alarms are created for:

  1. Execution Throttled
  2. Executions Failed
  3. Executions Timed Out

The dashboard contains one widget per Step Function, showing ExecutionsFailed, ExecutionThrottled and ExecutionsTimedOut.

ECS / Fargate

ECS alarms are created for Fargate or EC2 clusters:

  1. Memory Utilization
  2. CPU Utilization

SNS

SNS alarms are created for:

  1. Number of Notifications Filtered Out due to Invalid Attributes
  2. Number of Notifications Failed

SNS Topic dashboard widgets show:

  • Messages Filtered Out - Invalid Attributes
  • Notifications Failed

EventBridge

EventBridge alarms are created for:

  1. Failed Invocations
  2. Throttled Rules

EventBridge Rule dashboard widgets show:

  • Failed Invocations
  • Invocations

Configuration

Configuration is entirely optional - SLIC Watch provides defaults that work out of the box.

Note: Alarm configuration is cascading. This means that configuration properties are automatically propagated from parent nodes to child nodes (unless an override is present at the given node).

You can customize the configuration:

  • at the top level, for all resources in each service, and/or
  • at the level of individual functions.
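
The cascading behaviour described in the note can be pictured with the sketch below. The values are illustrative, and placing Period and EvaluationPeriods at the service or alarm level is an assumption based on the cascading rule rather than something the defaults show explicitly.

```yaml
custom:
  slicWatch:
    alarms:
      Period: 300              # set at the top: inherited by every service and alarm below
      Lambda:
        EvaluationPeriods: 3   # set at the service level: inherited by all Lambda alarms
        Errors:
          Period: 60           # set on one alarm: overrides the inherited 300 here only
```

Under these assumptions, the Lambda Errors alarm would be evaluated over three 60-second periods, while other Lambda alarms would use three 300-second periods.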

Plugin configuration

Top-level plugin configuration can be specified in the custom -> slicWatch section of serverless.yml.

  • The topicArn may be optionally provided as an SNS Topic destination for all alarms. If you omit the topic, alarms are still created but are not sent to any destination.
  • Alarms or dashboards can be disabled at any level in the configuration by adding enabled: false. You can even disable all plugin functionality by specifying enabled: false at the top-level plugin configuration.
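
For instance, a configuration along the following lines would keep the default alarms but switch off the dashboard and all DynamoDB alarms. This is a sketch of the enabled: false option described above; which parts to disable is an arbitrary choice for illustration.

```yaml
custom:
  slicWatch:
    dashboard:
      enabled: false     # no dashboard is created
    alarms:
      DynamoDB:
        enabled: false   # no DynamoDB alarms; alarms for other services are unaffected
```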

Supported options along with their defaults are shown below.

# ...

custom:
  slicWatch:
    topicArn: SNS_TOPIC_ARN  # This is optional but recommended so you can receive alarms via email, Slack, etc.
    enabled: true

    alarms:
      enabled: true
      Period: 60
      EvaluationPeriods: 1
      TreatMissingData: notBreaching
      ComparisonOperator: GreaterThanThreshold
      Lambda: # Lambda Functions
        Errors:
          Threshold: 0
          Statistic: Sum
        ThrottlesPc: # Throttles are evaluated as a percentage of invocations
          Threshold: 0
        DurationPc: # Duration is evaluated as a percentage of the function timeout
          Threshold: 95
          Statistic: Maximum
        Invocations: # No invocation alarms are created by default. Override threshold to create alarms
          enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
          Threshold: null
          Statistic: Sum
        IteratorAge:
          Threshold: 10000
          Statistic: Maximum
      ApiGateway: # API Gateway REST APIs
        5XXError:
          Statistic: Average
          Threshold: 0
        4XXError:
          Statistic: Average
          Threshold: 0.05
        Latency:
          ExtendedStatistic: p99
          Threshold: 5000
      States: # Step Functions
        Statistic: Sum
        ExecutionsThrottled:
          Threshold: 0
        ExecutionsFailed:
          Threshold: 0
        ExecutionsTimedOut:
          Threshold: 0
      DynamoDB:
        # Consumed read/write capacity units are not alarmed. These should either
        # be part of an auto-scaling configuration for provisioned mode or should be automatically
        # avoided for on-demand mode. Instead, we rely on persistent throttling
        # to alert failures in these scenarios.
        # Throttles can occur in normal operation and are handled with retries. Threshold should
        # therefore be configured to provide meaningful alarms based on higher than average throttling.
        Statistic: Sum
        ReadThrottleEvents:
          Threshold: 10
        WriteThrottleEvents:
          Threshold: 10
        UserErrors:
          Threshold: 0
        SystemErrors:
          Threshold: 0
      Kinesis:
        GetRecords.IteratorAgeMilliseconds:
          Statistic: Maximum
          Threshold: 10000
        ReadProvisionedThroughputExceeded:
          Statistic: Maximum
          Threshold: 0
        WriteProvisionedThroughputExceeded:
          Statistic: Maximum
          Threshold: 0
        PutRecord.Success:
          ComparisonOperator: LessThanThreshold
          Statistic: Average
          Threshold: 1
        PutRecords.Success:
          ComparisonOperator: LessThanThreshold
          Statistic: Average
          Threshold: 1
        GetRecords.Success:
          ComparisonOperator: LessThanThreshold
          Statistic: Average
          Threshold: 1
      SQS:
        # approximate age of the oldest message in the queue above threshold: messages aren't processed fast enough
        AgeOfOldestMessage:
          Statistic: Maximum
          enabled: false # Note: this one requires both `enabled: true` and `Threshold: someValue` to be effectively enabled
          Threshold: null
        # approximate number of messages in flight above threshold (in percentage of hard limit: 120000 for regular queues and 20000 for FIFO queues)
        InFlightMessagesPc:
          Statistic: Maximum
          Threshold: 80 # 80% of 120000 for regular queues or 80% of 20000 for FIFO queues
      ECS:
        MemoryUtilization:
          Statistic: Average
          Threshold: 90
        CPUUtilization:
          Statistic: Average
          Threshold: 90
      SNS:
        NumberOfNotificationsFilteredOut-InvalidAttributes:
          Statistic: Sum
          Threshold: 1
        NumberOfNotificationsFailed:
          Statistic: Sum
          Threshold: 1
      Events:
        # EventBridge
        FailedInvocations:
          Statistic: Sum
          Threshold: 1
        ThrottledRules:
          Statistic: Sum
          Threshold: 1

    dashboard:
      enabled: true
      timeRange:
        # For possible 'start' and 'end' values, see
        # https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html
        start: -PT3H
      metricPeriod: 300
      widgets:
        metricPeriod: 300
        width: 8
        height: 6
        Lambda:
          # Metrics per Lambda Function
          Errors:
            Statistic: ['Sum']
          Throttles:
            Statistic: ['Sum']
          Duration:
            Statistic: ['Average', 'p95', 'Maximum']
          Invocations:
            Statistic: ['Sum']
          ConcurrentExecutions:
            Statistic: ['Maximum']
          IteratorAge:
            Statistic: ['Maximum']
        ApiGateway:
          5XXError:
            Statistic: ['Sum']
          4XXError:
            Statistic: ['Sum']
          Latency:
            Statistic: ['Average', 'p95']
          Count:
            Statistic: ['Sum']
        States:
          # Step Functions
          ExecutionsFailed:
            Statistic: ['Sum']
          ExecutionsThrottled:
            Statistic: ['Sum']
          ExecutionsTimedOut:
            Statistic: ['Sum']
        DynamoDB:
          # Tables and GSIs
          ReadThrottleEvents:
            Statistic: ['Sum']
          WriteThrottleEvents:
            Statistic: ['Sum']
        Kinesis:
          # Kinesis Data Streams
          GetRecords.IteratorAgeMilliseconds:
            Statistic: ['Maximum']
          ReadProvisionedThroughputExceeded:
            Statistic: ['Sum']
          WriteProvisionedThroughputExceeded:
            Statistic: ['Sum']
          PutRecord.Success:
            Statistic: ['Average']
          PutRecords.Success:
            Statistic: ['Average']
          GetRecords.Success:
            Statistic: ['Average']
        SQS:
          # SQS Queues
          NumberOfMessagesSent:
            Statistic: ["Sum"]
          NumberOfMessagesReceived:
            Statistic: ["Sum"]
          NumberOfMessagesDeleted:
            Statistic: ["Sum"]
          ApproximateAgeOfOldestMessage:
            Statistic: ["Maximum"]
          ApproximateNumberOfMessagesVisible:
            Statistic: ["Maximum"]
        ECS:
          MemoryUtilization:
            Statistic: ["Average"]
          CPUUtilization:
            Statistic: ["Average"]
        SNS:
          NumberOfNotificationsFilteredOut-InvalidAttributes:
            Statistic: ["Sum"]
          NumberOfNotificationsFailed:
            Statistic: ["Sum"]
        Events:
          #EventBridge
          FailedInvocations:
            Statistic: ["Sum"]
          ThrottledRules:
            Statistic: ["Sum"]
          Invocations: 
            Statistic: ["Sum"] 

An example project is provided for reference: serverless-test-project
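
As a small illustration of the dashboard options in the reference above, the sketch below widens the time range and the widgets and changes the statistics plotted for Lambda duration. All values are example choices, not defaults.

```yaml
custom:
  slicWatch:
    dashboard:
      timeRange:
        start: -PT24H          # show the last 24 hours instead of the last 3
      widgets:
        width: 12              # CloudWatch dashboards use a 24-column grid, so two widgets per row
        Lambda:
          Duration:
            Statistic: ['Average', 'p99']
```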

Function-level configuration

For each function, add the slicWatch property to configure specific overrides for alarms and dashboards relating to the AWS Lambda Function resource.

functions:
  hello:
    handler: basic-handler.hello
    slicWatch:
      dashboard:
        enabled: false    # No Lambda widgets will be created for this function
      alarms:
        Lambda:
          Invocations:
            Threshold: 2  # The invocation threshold is specific to
                          # this function's expected invocation count

To disable all alarms for any given function, use:

functions:
  hello:
    handler: basic-handler.hello
    slicWatch:
      alarms:
        Lambda:
          enabled: false
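
Overrides can also target individual alarms for a single function. In the sketch below, the function name, handler and threshold are hypothetical; the alarm keys are those listed under Lambda in the plugin configuration.

```yaml
functions:
  processStream:                       # hypothetical function
    handler: stream-handler.process    # hypothetical handler
    slicWatch:
      alarms:
        Lambda:
          IteratorAge:
            Threshold: 60000           # tolerate a higher iterator age for this function only
          Errors:
            enabled: false             # no Errors alarm for this function
```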

A note on CloudWatch cost

This plugin creates additional CloudWatch resources that, apart from a limited free tier, have an associated cost. Depending on what you enable, SLIC Watch creates one dashboard and multiple alarms per stack. The total depends on the number of resources in each stack and the number of stacks you have.

Check out the AWS CloudWatch Pricing page to understand the cost impact of creating CloudWatch resources.

References

Other Projects

  1. serverless-plugin-aws-alerts
  2. Real World Serverless Application - Serverless Operations
  3. CDK Watchful
  4. CDK Patterns - The CloudWatch Dashboard

Reading

  1. AWS Well Architected Serverless Applications Lens
  2. How to Monitor Lambda with CloudWatch Metrics - Yan Cui

LICENSE

Apache-2.0 - see LICENSE
