alexcasalboni / Kinesis Streams Fan Out Kinesis Analytics

Licence: apache-2.0
Amazon Kinesis Streams fan-out via Kinesis Analytics (powered by the Serverless Framework)

Programming Languages

javascript

Amazon Kinesis Streams fan-out via Kinesis Analytics - made with serverless

Amazon Kinesis Analytics can fan-out your Kinesis Streams and avoid read throttling.

Each Kinesis Streams shard supports a maximum total read rate of 2 MBps (up to 5 read transactions per second) and a maximum total write rate of 1 MBps (up to 1,000 records per second). Even if you provision enough write capacity, you are not free to connect as many consumers as you'd like, especially with AWS Lambda, because you'll quickly hit the read limit.

For example, if you have 10 shards and you push 8,000 events per second with an average size of 1KB each, you will be at around 80% of your write capacity (8MBps out of 10MBps). If you connect three consumers, though, they will try to read around 24MBps in total, which is above your maximum read capacity of 20MBps.
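The arithmetic above can be reproduced with a small sketch. The helper name `estimateThroughput` and its parameter names are hypothetical (they are not part of this repository), and 1 MB is treated as 1,000 KB to match the rounded figures in the example.

```javascript
// Per-shard limits for Kinesis Streams, as stated above.
const WRITE_MBPS_PER_SHARD = 1;
const READ_MBPS_PER_SHARD = 2;

// Hypothetical helper: estimates the write and read load on a stream.
function estimateThroughput({ shards, eventsPerSecond, avgEventKB, consumers }) {
  const writeMBps = (eventsPerSecond * avgEventKB) / 1000; // 1 MB treated as 1,000 KB
  const readMBps = writeMBps * consumers; // every consumer reads the full stream
  const maxWriteMBps = shards * WRITE_MBPS_PER_SHARD;
  const maxReadMBps = shards * READ_MBPS_PER_SHARD;
  return {
    writeMBps,
    readMBps,
    maxWriteMBps,
    maxReadMBps,
    readThrottled: readMBps > maxReadMBps,
  };
}

// The scenario from the paragraph above: 10 shards, 8,000 events/s, 1KB each, 3 consumers.
const estimate = estimateThroughput({
  shards: 10,
  eventsPerSecond: 8000,
  avgEventKB: 1,
  consumers: 3,
});
console.log(estimate);
```

With these inputs the write load stays below the 10MBps limit, while the combined read load exceeds the 20MBps limit, which is the situation the fan-out is meant to avoid.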

You could implement the fan-out with AWS Lambda (great resource here), but you'd have to deal with API calls and retry issues yourself to avoid duplicated events across output channels.

This repository is a reference architecture to solve the fan-out problem with Kinesis Analytics, which can stream data from an input Stream to multiple output Streams (or Firehose delivery streams).

Kinesis Analytics architecture
(image source: AWS Documentation)

What's the scenario?

I have chosen a sample use case based on financial transactions.

We have an input Stream where we'll be pushing fake transaction records (id, username, amount), and we want two Lambda consumers that read from two independent Kinesis Streams.

One stream will contain only positive transactions (amount >= 0), and the second stream will contain only negative transactions (amount < 0).

You may want to do this to solve the read throughput problem, to further split the input stream into three or four streams, or just to simplify the processing logic and flow of each individual case.

For now, these sample Lambda Functions don't do much besides logging the batch of 100 records into CloudWatch logs.
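As an illustration of what such a consumer can look like, here is a minimal sketch of a Lambda handler that decodes and logs a Kinesis batch. It is not the code shipped in this repository: the function names are hypothetical, and it assumes each record carries a JSON payload with the `id`, `username`, and `amount` fields described above. Kinesis delivers record payloads to Lambda base64-encoded under `Records[].kinesis.data`.

```javascript
// Decodes the base64-encoded JSON payload of each Kinesis record in a Lambda event.
// Assumption: every payload is a JSON object such as { id, username, amount }.
function decodeKinesisRecords(event) {
  return event.Records.map((record) => {
    const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
    return JSON.parse(json);
  });
}

// Lambda entry point: logs the decoded batch to CloudWatch Logs.
const handler = async (event) => {
  const transactions = decodeKinesisRecords(event);
  console.log(`Received ${transactions.length} records`, JSON.stringify(transactions));
  return { processed: transactions.length };
};

module.exports = { handler, decodeKinesisRecords };
```

Keeping the decoding step in its own function makes it easy to test locally with a hand-built event, without deploying anything.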

How to deploy the Kinesis Analytics Application

First, install the Serverless Framework and configure your AWS credentials:

$ npm install serverless -g
$ serverless config credentials --provider aws --key XXX --secret YYY

Now, you can quickly install this service as follows:

$ serverless install -u https://github.com/alexcasalboni/kinesis-streams-fan-out-kinesis-analytics

The Serverless Framework will download and unzip the repository, but it won't install dependencies. Don't forget to install npm dependencies before proceeding:

$ cd kinesis-streams-fan-out-kinesis-analytics
$ npm install

Then you can deploy all the resources defined in the serverless.yml file. Unfortunately, Kinesis Analytics is not supported by CloudFormation yet, so the Kinesis Analytics Application itself will be created via the API as a second step.

$ sls deploy

The CloudFormation Stack will provide the following Outputs:

  • TransactionsStreamARN: the input Kinesis Stream
  • PositiveTransactionsStreamARN: the first output Kinesis Stream
  • NegativeTransactionsStreamARN: the second output Kinesis Stream
  • KinesisAnalyticsIAMRoleARN: the IAM Role required for Kinesis Analytics to read from and write into the three Kinesis Streams

You can find these values by running the following serverless command:

$ sls info --verbose

You should take note of these ARNs and use them in the following commands (ideally it shouldn't be a manual process, but this was just a 4h side project).

How to Create the Kinesis Analytics Application

This command will create the Kinesis Analytics Application:

$ npm run create -- -N kinesis-fanout-app -P {KinesisAnalyticsIAMRoleARN} -I {TransactionsStreamARN} -O {PositiveTransactionsStreamARN} -U {NegativeTransactionsStreamARN}

Don't forget to replace the placeholders in curly braces with the corresponding ARNs (see CloudFormation Outputs above).
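To give an idea of what creating the application involves, here is a hedged sketch of a request body for the Kinesis Analytics CreateApplication API. The helper `buildCreateApplicationParams` is hypothetical and the repository's own script may build the request differently. The column mappings (`$.id`, `$.username`, `$.amount`) assume JSON records with those three fields, and the in-application stream names are taken from the SQL shown later in this README.

```javascript
// Hypothetical helper: builds a CreateApplication request body
// for a Kinesis Analytics SQL application with one input and two outputs.
function buildCreateApplicationParams({ name, roleArn, inputArn, positiveArn, negativeArn, sql }) {
  // Each output connects an in-application stream to a destination Kinesis Stream.
  const output = (streamName, resourceArn) => ({
    Name: streamName, // must match a stream created by the SQL code
    DestinationSchema: { RecordFormatType: 'JSON' },
    KinesisStreamsOutput: { ResourceARN: resourceArn, RoleARN: roleArn },
  });

  return {
    ApplicationName: name,
    ApplicationCode: sql,
    Inputs: [{
      NamePrefix: 'SOURCE_SQL_STREAM', // exposed to SQL as SOURCE_SQL_STREAM_001
      KinesisStreamsInput: { ResourceARN: inputArn, RoleARN: roleArn },
      InputSchema: {
        RecordFormat: {
          RecordFormatType: 'JSON',
          MappingParameters: { JSONMappingParameters: { RecordRowPath: '$' } },
        },
        // Assumed data model: { id, username, amount }
        RecordColumns: [
          { Name: 'ID', SqlType: 'INTEGER', Mapping: '$.id' },
          { Name: 'USERNAME', SqlType: 'VARCHAR(200)', Mapping: '$.username' },
          { Name: 'AMOUNT', SqlType: 'DECIMAL(5,2)', Mapping: '$.amount' },
        ],
      },
    }],
    Outputs: [
      output('POSITIVE_TRANSACTIONS', positiveArn),
      output('NEGATIVE_TRANSACTIONS', negativeArn),
    ],
  };
}

// Placeholder values only: replace them with the ARNs from the CloudFormation Outputs.
const createParams = buildCreateApplicationParams({
  name: 'kinesis-fanout-app',
  roleArn: '{KinesisAnalyticsIAMRoleARN}',
  inputArn: '{TransactionsStreamARN}',
  positiveArn: '{PositiveTransactionsStreamARN}',
  negativeArn: '{NegativeTransactionsStreamARN}',
  sql: '-- SQL code goes here',
});
console.log(JSON.stringify(createParams, null, 2));
```

The resulting object could then be passed to the `createApplication` call of an AWS SDK Kinesis Analytics client.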

How to Start/Stop the Kinesis Analytics Application

This command will start or stop the Kinesis Analytics Application, based on its current status:

$ npm run toggle -- -N kinesis-fanout-app

Hint: some intermediate statuses (e.g. "STARTING") don't allow explicit transitions, and you'll just have to wait :)
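The toggle decision can be sketched as a simple mapping from the application status to an action. The function `nextAction` is hypothetical and only illustrates the idea; the status values are the ones reported by the Kinesis Analytics API.

```javascript
// Hypothetical helper: decides what a toggle script could do for a given status.
function nextAction(status) {
  switch (status) {
    case 'READY':
      return 'start'; // the application is stopped and can be started
    case 'RUNNING':
      return 'stop'; // the application is running and can be stopped
    default:
      return 'wait'; // STARTING, STOPPING, UPDATING, DELETING: no explicit transition
  }
}

console.log(nextAction('READY'));
console.log(nextAction('STARTING'));
```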

How to Delete the Kinesis Analytics Application

This command will delete the Kinesis Analytics Application:

$ npm run delete -- -N kinesis-fanout-app

Hint: do this only when you're done with all the other commands (Kinesis Analytics is billed per hour).

How to Put Records into the main Kinesis Stream

This command will put a few records (up to 500) into the input Kinesis Stream:

$ npm run putrecords -- -I Transactions -N 100

Hint: this endpoint does not require a full ARN (the Stream name is enough).
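For reference, here is a minimal sketch of how a batch of fake transactions could be prepared for the Kinesis PutRecords API, which accepts up to 500 records per call. The helpers `fakeTransaction` and `buildPutRecordsParams`, the generated usernames, and the choice of the username as partition key are all assumptions made for illustration; the repository's script may differ.

```javascript
const MAX_RECORDS_PER_CALL = 500; // PutRecords limit per request

// Hypothetical generator: builds a fake transaction with an amount
// between -100.00 and 100.00, which fits the DECIMAL(5,2) column.
function fakeTransaction(id) {
  const amount = Math.round((Math.random() * 200 - 100) * 100) / 100;
  return { id, username: `user${id % 10}`, amount };
}

// Hypothetical helper: builds the request body for a single PutRecords call.
function buildPutRecordsParams(streamName, count) {
  if (!Number.isInteger(count) || count < 1 || count > MAX_RECORDS_PER_CALL) {
    throw new RangeError(`count must be an integer between 1 and ${MAX_RECORDS_PER_CALL}`);
  }
  const Records = Array.from({ length: count }, (_, i) => {
    const transaction = fakeTransaction(i + 1);
    return {
      Data: JSON.stringify(transaction), // payload, serialized as JSON
      PartitionKey: transaction.username, // assumption: shard by username
    };
  });
  return { StreamName: streamName, Records };
}

const putParams = buildPutRecordsParams('Transactions', 100);
console.log(`${putParams.Records.length} records ready for ${putParams.StreamName}`);
```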

Where does the magic happen?

In order to stream your records from the main Kinesis Stream to the other two, we will need a SQL query that runs on real-time data and filters/groups it based on the transaction amount.

Also, Kinesis Analytics will need to know the exact data model of the input stream. I have defined it in this file, but you can easily change the records schema and then invoke the DiscoverInputSchema API to generate the exact mapping required by Kinesis Analytics.

In case you decide to change the records structure, here are the two commands you'll need to run:

$ npm run putrecords -- -I Transactions -N 100
$ npm run discover -- -I {TransactionsStreamARN} -P {KinesisAnalyticsIAMRoleARN}

Here is the SQL query, adapted to the data model of this scenario (source file here):

CREATE OR REPLACE STREAM "POSITIVE_TRANSACTIONS" (
    ID INTEGER,
    USERNAME VARCHAR(200),
    AMOUNT DECIMAL(5,2)
);
CREATE OR REPLACE STREAM "NEGATIVE_TRANSACTIONS" (
    ID INTEGER,
    USERNAME VARCHAR(200),
    AMOUNT DECIMAL(5,2)
);

CREATE OR REPLACE PUMP "STREAM_PUMP1" 
    AS INSERT INTO "POSITIVE_TRANSACTIONS"
        SELECT STREAM ID, USERNAME, AMOUNT
        FROM "SOURCE_SQL_STREAM_001"
        WHERE AMOUNT >= 0;

CREATE OR REPLACE PUMP "STREAM_PUMP2"
    AS INSERT INTO "NEGATIVE_TRANSACTIONS"
        SELECT STREAM ID, USERNAME, AMOUNT
        FROM "SOURCE_SQL_STREAM_001"
        WHERE AMOUNT < 0;

Ok, but what's going on?

  • We define two output destinations with the very same structure, named POSITIVE_TRANSACTIONS and NEGATIVE_TRANSACTIONS, and we connect them to the corresponding Kinesis Streams (see source code here).
  • We also define two "pumps" (with arbitrary names): continuously running INSERT queries that move data from the source stream into each output stream.
  • For each pump, we define the filtering query that will write positive transactions to the first stream, and negative transactions to the second one.
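For readers less familiar with streaming SQL, the effect of the two pumps on a batch of records is roughly equivalent to the following plain JavaScript sketch. The function `routeTransactions` is hypothetical and operates on an in-memory array, whereas Kinesis Analytics applies the same conditions continuously to the stream.

```javascript
// Hypothetical in-memory equivalent of the two pumps.
function routeTransactions(transactions) {
  return {
    // STREAM_PUMP1: WHERE AMOUNT >= 0
    positive: transactions.filter((t) => t.amount >= 0),
    // STREAM_PUMP2: WHERE AMOUNT < 0
    negative: transactions.filter((t) => t.amount < 0),
  };
}

const routed = routeTransactions([
  { id: 1, username: 'alice', amount: 25.5 },
  { id: 2, username: 'bob', amount: -10 },
  { id: 3, username: 'carol', amount: 0 },
]);
console.log(routed.positive.map((t) => t.id)); // [ 1, 3 ]
console.log(routed.negative.map((t) => t.id)); // [ 2 ]
```

Note that a zero amount ends up in the positive stream, because the first condition is `>= 0`.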

Note: this SQL query is executed in real time on the incoming data, without any windowing or buffering. In my tests, it took a bit less than 2 seconds for a batch of 500 records to go through the secondary streams and reach my Lambda Functions.

Contributing

Contributors and PRs are always welcome!

Tests and coverage

Install dev dependencies with npm install --dev.
