
alexcasalboni / serverless-data-pipeline-sam

Licence: Apache-2.0 License
Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena

Programming Languages

python
shell

Projects that are alternatives to or similar to serverless-data-pipeline-sam

Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch, AWS Lambda, and a Python script built on Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-73.08%)
Mutual labels:  cloudformation, aws-lambda, amazon-web-services
Ssm Cache Python
AWS System Manager Parameter Store caching client for Python
Stars: ✭ 177 (+126.92%)
Mutual labels:  aws-lambda, aws-s3, amazon-web-services
Aws Csa Notes 2018
My AWS Certified Solutions Architect Associate Study Notes!
Stars: ✭ 167 (+114.1%)
Mutual labels:  aws-lambda, aws-s3, amazon-web-services
Autospotting
Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.
Stars: ✭ 2,014 (+2482.05%)
Mutual labels:  cloudformation, aws-lambda, amazon-web-services
ob bulkstash
Bulk Stash is a Docker rclone service to sync or copy files between different storage services. For example, you can copy files from one remote storage service to another (such as Amazon S3 to Google Cloud Storage), or from your local laptop to remote storage.
Stars: ✭ 113 (+44.87%)
Mutual labels:  amazon-web-services, data-pipeline
cfn-cheapest-nat
Cheapest AWS VPC NAT.
Stars: ✭ 38 (-51.28%)
Mutual labels:  cloudformation, amazon-web-services
webpack-aws-lambda
AWS Lambda that runs webpack and outputs the bundle.js file
Stars: ✭ 12 (-84.62%)
Mutual labels:  aws-lambda, amazon-web-services
aws-maven-plugin
Deploys resources to AWS using maven
Stars: ✭ 25 (-67.95%)
Mutual labels:  cloudformation, aws-s3
Docs
Rapid CloudFormation: Modular, production ready, open source.
Stars: ✭ 209 (+167.95%)
Mutual labels:  cloudformation, amazon-web-services
eks-deep-dive-2019
Amazon EKS Deep Dive 2019
Stars: ✭ 61 (-21.79%)
Mutual labels:  cloudformation, amazon-web-services
xilution-react-todomvc
An implementation of TodoMVC featuring AWS Serverless Application Model (SAM) and Xilution SaaS.
Stars: ✭ 24 (-69.23%)
Mutual labels:  aws-s3, aws-sam
serverless-discord-bot
A serverless Discord Bot template built for AWS Lambda based on Discord's slash commands and the slash-create library.
Stars: ✭ 37 (-52.56%)
Mutual labels:  cloudformation, aws-sam
aws-pdf-textract-pipeline
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript
Stars: ✭ 141 (+80.77%)
Mutual labels:  cloudformation, data-pipeline
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-76.92%)
Mutual labels:  amazon-web-services, data-pipeline
Aws Toolkit Eclipse
AWS Toolkit for Eclipse – an open-source plugin for developing, deploying, and managing AWS applications.
Stars: ✭ 252 (+223.08%)
Mutual labels:  cloudformation, aws-lambda
aws-cfn-custom-resource-lambda-edge
🏗 AWS CloudFormation custom resource that allows deploying Lambda@Edge from any region
Stars: ✭ 19 (-75.64%)
Mutual labels:  cloudformation, amazon-web-services
monitoring-jump-start
Monitor AWS resources with ease
Stars: ✭ 67 (-14.1%)
Mutual labels:  cloudformation, amazon-web-services
AutoSpotting
Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.
Stars: ✭ 2,058 (+2538.46%)
Mutual labels:  cloudformation, amazon-web-services
serverless data pipeline example
Build and Deploy A Serverless Data Pipeline on AWS
Stars: ✭ 24 (-69.23%)
Mutual labels:  aws-lambda, aws-s3
whats-your-name
Sample app for AWS Serverless Repository - uses Amazon Rekognition to recognize the person in a photo
Stars: ✭ 17 (-78.21%)
Mutual labels:  cloudformation, aws-lambda

Serverless Data Pipeline - Powered by AWS SAM

Serverless Data Pipeline built with Amazon API Gateway, AWS Lambda, Amazon Kinesis Firehose, Amazon S3, and Amazon Athena.

How to deploy the stack

See scripts/deploy.sh (customize your deployment bucket and stack name).
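For readers who prefer Python, a rough package-and-deploy flow can be sketched as below. This is not a transcription of scripts/deploy.sh: it assumes the template file is named `template.yml` (hypothetical) and that deployment goes through the AWS CLI's `aws cloudformation package` and `aws cloudformation deploy` commands, so check the script for the real steps.

```python
import subprocess
from typing import List


def build_deploy_commands(bucket: str, stack_name: str,
                          template: str = "template.yml",   # hypothetical file name
                          packaged: str = "packaged.yml") -> List[List[str]]:
    """Return the two AWS CLI invocations for a package + deploy of a SAM template."""
    package = [
        "aws", "cloudformation", "package",
        "--template-file", template,
        "--s3-bucket", bucket,                 # deployment bucket for code artifacts
        "--output-template-file", packaged,
    ]
    deploy = [
        "aws", "cloudformation", "deploy",
        "--template-file", packaged,
        "--stack-name", stack_name,
        # The stack creates IAM roles; assumed unnamed (else use CAPABILITY_NAMED_IAM).
        "--capabilities", "CAPABILITY_IAM",
    ]
    return [package, deploy]


def deploy(bucket: str, stack_name: str) -> None:
    """Run both commands in order, stopping at the first failure."""
    for cmd in build_deploy_commands(bucket, stack_name):
        subprocess.run(cmd, check=True)
```

For example, `deploy("my-deployment-bucket", "data-pipeline")`, where both names are placeholders for your own bucket and stack.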

How to ingest new records via API

See scripts/track.sh (customize your stack name).
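The same ingestion can be sketched with Python's standard library: one JSON record sent via HTTP POST to the stack's TrackURL. This is an illustration rather than what scripts/track.sh does; it assumes the endpoint accepts a raw JSON body with `Content-Type: application/json`, and the URL used below is a hypothetical placeholder.

```python
import json
import urllib.request
from typing import Any, Dict


def build_track_request(track_url: str, record: Dict[str, Any]) -> urllib.request.Request:
    """Build an HTTP POST request that submits one JSON record to the TrackURL."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        track_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def track(track_url: str, record: Dict[str, Any]) -> int:
    """Send the record and return the HTTP status code."""
    with urllib.request.urlopen(build_track_request(track_url, record)) as resp:
        return resp.status
```

For example, `track("https://<api-id>.execute-api.<region>.amazonaws.com/dev/...", {"name": "John", "action": "charge", "value": 100})`, with the URL replaced by your stack's TrackURL output.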

What kind of queries can I run on the dataset?

It depends on the data that you collect and on the virtual tables that you define on Athena and Glue.

The file queries.sql contains a few sample queries that you can run with the default schema (e.g. {"name": "John", "action": "charge", "value": 100}).
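As an illustration of querying the default schema from code, the sketch below starts an Athena query through a boto3 Athena client's `start_query_execution`. The aggregation is a hypothetical example, not one of the queries in queries.sql. The database, table, and S3 results location are placeholders for your AthenaDatabaseName, AthenaTableName, and a bucket you control. The client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any

# Hypothetical query over the default schema ({"name", "action", "value"}).
# "value" is double-quoted defensively as an identifier.
TOTALS_BY_ACTION = (
    'SELECT action, COUNT(*) AS events, SUM("value") AS total_value '
    "FROM {table} GROUP BY action ORDER BY total_value DESC"
)


def start_query(athena_client: Any, database: str, table: str, output_location: str) -> str:
    """Start the query and return its QueryExecutionId.

    `athena_client` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        QueryString=TOTALS_BY_ACTION.format(table=table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # s3://... for result files
    )
    return response["QueryExecutionId"]
```

The query runs asynchronously, so the returned ID is what you would poll before fetching results.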

Resources list

This stack will create the following resources:

  • An API Gateway endpoint that you can use to track events by submitting any JSON data via the HTTP POST method
  • A Kinesis Firehose Delivery Stream that will buffer, optionally compress, and write each record into S3
  • A Lambda Function to process, transform, clean, or drop records before they are written into S3
  • An S3 Bucket that will contain all the collected data
  • Three Athena Named Queries to get started quickly with serverless queries
  • An IAM Role and Policy for API Gateway
  • An IAM Role and Policy for Kinesis Firehose
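The Lambda Function in the list above sits in Kinesis Firehose's data-transformation step. Below is a minimal sketch of such a handler, not the repository's own function. It follows the Firehose transformation record format: each input record has `recordId` and base64 `data`, and each output record echoes `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The validation rule (keep only the default-schema fields) and the trailing newline per record are assumptions chosen to suit Athena's JSON parsing.

```python
import base64
import json
from typing import Any, Dict, List

# Hypothetical validation rule: records must carry the default-schema fields.
REQUIRED_FIELDS = ("name", "action", "value")


def _encode(payload: Dict[str, Any]) -> str:
    # Newline-delimited JSON so that each record lands on its own line in S3.
    return base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("ascii")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, Any]]]:
    output = []
    for record in event["records"]:
        rid = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Bad base64 or invalid JSON: report it as a processing failure.
            output.append({"recordId": rid, "result": "ProcessingFailed", "data": record["data"]})
            continue
        if not isinstance(payload, dict) or any(f not in payload for f in REQUIRED_FIELDS):
            # Skip records that do not match the expected schema.
            output.append({"recordId": rid, "result": "Dropped", "data": record["data"]})
            continue
        cleaned = {f: payload[f] for f in REQUIRED_FIELDS}  # discard unexpected keys
        output.append({"recordId": rid, "result": "Ok", "data": _encode(cleaned)})
    return {"records": output}
```

Every input record is returned with its original `recordId`, which the transformation contract requires even for dropped records.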

Parameters

  • ApiStageName: The API Gateway Stage name (e.g. dev, prod, etc.)
  • FirehoseS3Prefix: The S3 Key prefix for Kinesis Firehose
  • FirehoseCompressionFormat: The compression format used by Kinesis Firehose
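  • FirehoseBufferingInterval: How long (in seconds) Kinesis Firehose waits before writing a new batch into S3; a batch is written as soon as either this interval or FirehoseBufferingSize is reached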
  • FirehoseBufferingSize: The maximum batch size in MB
  • LambdaTimeout: Lambda's max execution time in seconds
  • LambdaMemorySize: The memory allocated to the Lambda function, in MB
  • AthenaDatabaseName: The Athena database name
  • AthenaTableName: The Athena table name
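When deploying with `aws cloudformation deploy`, the parameters above can be overridden with its `--parameter-overrides Key=Value ...` option. The helper below formats a dict into those arguments. The values are hypothetical: the template may constrain allowed values (for example, which compression formats it accepts), so check it before reusing them.

```python
from typing import Dict, List

# Hypothetical values for a few of the template parameters listed above.
EXAMPLE_PARAMETERS = {
    "ApiStageName": "dev",
    "FirehoseCompressionFormat": "GZIP",
    "FirehoseBufferingInterval": "300",  # assumed to be seconds
    "FirehoseBufferingSize": "5",        # MB
}


def parameter_overrides(params: Dict[str, str]) -> List[str]:
    """Format parameters as `aws cloudformation deploy --parameter-overrides` arguments."""
    return ["--parameter-overrides"] + [f"{key}={value}" for key, value in params.items()]
```

The resulting list can be appended to the argument list of a `subprocess.run` call that invokes the deploy command.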

Outputs

  • TrackURL: The public URL to submit new records
  • BucketName: The bucket that will store your data
  • FunctionName: The Lambda Function that will process/validate records
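The outputs above can be read programmatically with CloudFormation's `describe_stacks` call, which returns each output as an `OutputKey`/`OutputValue` pair. A small sketch follows, assuming a boto3 CloudFormation client is passed in. The stack name is a placeholder.

```python
from typing import Any, Dict


def stack_outputs(cfn_client: Any, stack_name: str) -> Dict[str, str]:
    """Return the stack's Outputs as a {OutputKey: OutputValue} dict.

    `cfn_client` is expected to be a boto3 CloudFormation client,
    e.g. boto3.client("cloudformation").
    """
    response = cfn_client.describe_stacks(StackName=stack_name)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}
```

For example, `stack_outputs(boto3.client("cloudformation"), "data-pipeline")["TrackURL"]` would give the URL for submitting records.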

Gotchas

  • The architecture is 100% serverless (no hourly costs, no servers to manage)
  • The API Gateway endpoint is publicly accessible (i.e. any browser or anonymous website user can potentially submit new records/events)
  • You can customize the template to enable encryption at rest for Kinesis Firehose
  • You can configure Kinesis Firehose's buffering (see Parameters above)
  • Athena's Named Queries cannot be updated (you need to create a new query with a different logical name)
  • CloudFormation cannot delete a non-empty S3 bucket, so make sure the bucket is empty before you delete the stack
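Emptying the bucket before deleting the stack can be done with `aws s3 rm s3://<bucket> --recursive`, or from Python as sketched below with a boto3 S3 client's `list_objects_v2` paginator and `delete_objects`. This is a destructive operation. The sketch assumes an unversioned bucket (the bucket name would come from the BucketName output); a versioned bucket also needs its object versions and delete markers removed.

```python
from typing import Any


def empty_bucket(s3_client: Any, bucket: str) -> int:
    """Delete every current object in `bucket`; return how many deletions succeeded.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A list_objects_v2 page holds at most 1,000 keys, which matches
        # the per-request limit of delete_objects.
        response = s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
        deleted += len(keys) - len(response.get("Errors", []))
    return deleted
```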