All Projects → ghdna → Athena Express

ghdna / Athena Express

Licence: mit
athena-express makes it easier to execute SQL queries on Amazon Athena by chaining together a bunch of methods in the AWS SDK. This allows you to execute SQL queries AND fetch JSON results in the same synchronous call - well suited for web applications.

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to Athena Express

Dialetus Service
API to Informal dictionary for the idiomatic expressions that each Brazilian region It has
Stars: ✭ 202 (+81.98%)
Mutual labels:  aws, aws-lambda, json
Ssm Cache Python
AWS System Manager Parameter Store caching client for Python
Stars: ✭ 177 (+59.46%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Aws Appsync Rds Aurora Sample
An AWS AppSync Serverless resolver for the Amazon Aurora relational database.
Stars: ✭ 122 (+9.91%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Serverless Es Logs
A Serverless plugin to transport logs to ElasticSearch
Stars: ✭ 51 (-54.05%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Amazon Rekognition Video Analyzer
A working prototype for capturing frames off of a live MJPEG video stream, identifying objects in near real-time using deep learning, and triggering actions based on an objects watch list.
Stars: ✭ 309 (+178.38%)
Mutual labels:  aws, aws-lambda, amazon-web-services
webpack-aws-lambda
AWS Lambda that runs webpack and output the bundle.js file
Stars: ✭ 12 (-89.19%)
Mutual labels:  aws-lambda, node-js, amazon-web-services
Autospotting
Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.
Stars: ✭ 2,014 (+1714.41%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Aws Auto Cleanup
Open-source application to programmatically clean your AWS resources based on a whitelist and time to live (TTL) settings
Stars: ✭ 276 (+148.65%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS based solution using AWS CloudWatch and AWS Lambda using a Python script that is using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-81.08%)
Mutual labels:  aws, aws-lambda, amazon-web-services
Perun
A command-line validation tool for AWS Cloud Formation that allows to conquer the cloud faster!
Stars: ✭ 82 (-26.13%)
Mutual labels:  aws, amazon-web-services, json
Lambcycle
🐑🛵 A declarative lambda middleware with life cycle hooks 🐑🛵
Stars: ✭ 88 (-20.72%)
Mutual labels:  aws, aws-lambda
Aws Serverless Airline Booking
Airline Booking is a sample web application that provides Flight Search, Flight Payment, Flight Booking and Loyalty points including end-to-end testing, GraphQL and CI/CD. This web application was the theme of Build on Serverless Season 2 on AWS Twitch running from April 24th until end of August in 2019.
Stars: ✭ 1,290 (+1062.16%)
Mutual labels:  aws, aws-lambda
Lambda Monitoring
Logging and metrics libraries for AWS Lambda
Stars: ✭ 85 (-23.42%)
Mutual labels:  aws, aws-lambda
Python Lambdarest
Flask like web framework for AWS Lambda
Stars: ✭ 84 (-24.32%)
Mutual labels:  aws, aws-lambda
Aws Sdk Arduino
An experimental SDK for working with AWS Services on Arduino-compatible devices. Currently has support for DynamoDB and Kinesis.
Stars: ✭ 90 (-18.92%)
Mutual labels:  aws, amazon-web-services
Cloudquery
cloudquery transforms your cloud infrastructure into SQL or Graph database for easy monitoring, governance and security.
Stars: ✭ 1,300 (+1071.17%)
Mutual labels:  aws, sql
Content Lambda Boto3
Automating AWS with Lambda, Python, and Boto3
Stars: ✭ 91 (-18.02%)
Mutual labels:  aws, aws-lambda
Lambroll
lambroll is a minimal deployment tool for AWS Lambda.
Stars: ✭ 97 (-12.61%)
Mutual labels:  aws, aws-lambda
Node Athena
a nodejs simple aws athena client
Stars: ✭ 97 (-12.61%)
Mutual labels:  aws, aws-lambda
Dynamodb Stream Elasticsearch
Missing blueprint for AWS Lambda. Reads stream from AWS DynamoDB and writes it to ElasticSearch.
Stars: ✭ 83 (-25.23%)
Mutual labels:  aws, aws-lambda

Athena-Express: Simplifying SQL queries on Amazon Athena

NPM

Build Status Package Quality Code Climate Coverage Status Downloads

As published on the official AWS Partner Network Blog

Synopsis

Athena-Express can simplify executing SQL queries in Amazon Athena AND fetching cleaned-up JSON results in the same synchronous call - well suited for web applications.

Example:
Athena-Express Example

Amazon Athena Background

Amazon Athena, launched at AWS re:Invent 2016, made it easier to analyze data in Amazon S3 using standard SQL. Under the covers, it uses Presto, which is an opensource SQL engine developed by Facebook in 2012 to query their 300 Petabyte data warehouse. It's incredibly powerful!

Amazon Athena combines the strength of Presto with serverless & self-managed capabilities of AWS. By simply pointing Athena to your data in Amazon S3, one could start querying using standard SQL. Most results are delivered within seconds and there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

How athena-express simplifies using Amazon Athena

athena-express simplifies integrating Amazon Athena with any Node.JS application - running as a standalone application or as a Lambda function. As a wrapper on AWS SDK, Athena-Express bundles the following steps listed on the official AWS Documentation:

  1. Initiates a query execution
  2. Keeps checking until the query has finished executing
  3. Fetches the results of the query execution from Amazon S3

And as added features

  1. Formats the results into a clean, user-friendly JSON array
  2. Handles specific Athena errors by recursively retrying for ThrottlingException, NetworkingError, and TooManyRequestsException
  3. Provides optional helpful stats including cost per query in USD

Integrating with Amazon Athena without athena-express would require you to identify the appropriate API methods in the AWS SDK, stich them together sequentially, and then build out an error handling & retry mechanism for each of those methods.

athena-express can help you save time & effort in setting up this integration so that you can focus on core application development.

How is athena-express being used?

The most common use case is integrating a web front-end with Amazon Athena using athena-express as a backend. This backend could be any Node.JS application including AWS Lambda.

Here is an example using AWS Lambda: athena-express architecture

This architecture has a web front-end that invokes an API endpoint hosted on Amazon API Gateway by passing a query request. The query request can be as simple as SELECT * FROM movies LIMIT 3

This API Gateway then triggers a Lambda function that has the athena-express library imported.

Setup

Prerequisites

  • You will need either an IAM Role (if you're running athena-express on AWS Lambda or AWS EC2) OR an IAM User with accessKeyId and secretAccessKey (if you're running athena-express on a standalone NodeJS application)
  • This IAM role/user must have AmazonAthenaFullAccess and AmazonS3FullAccess policies attached
    • Note: As an alternative to granting AmazonS3FullAccess you could granularize and limit write access to a specific bucket. Just specify this bucket name during athena-express initialization

Configuration

  • athena-express needs an AWS SDK object created with relevant permissions as mentioned in the prerequisites above.
  • This AWS object is passed within the constructor so that it can invoke Amazon Athena SDK. It's up to you how you create this aws object. Here are 4 options:
    1. Create an aws object by explicitly passing in the accessKeyId and secretAccessKey generated in prerequisites
     const aws = require("aws-sdk");
     const awsCredentials = {
         region: "YOUR_AWS_REGION",
     	accessKeyId: "YOUR_AWS_ACCESS_KEY_ID",
     	secretAccessKey: "YOUR_AWS_SECRET_ACCESS_KEY"
     };
     aws.config.update(awsCredentials);
    
     const athenaExpressConfig = { aws }; //configuring athena-express with aws sdk object
     const athenaExpress = new AthenaExpress(athenaExpressConfig);
    
    1. OR if using Lambda, provide an IAM execution role with AmazonAthenaFullAccess and AmazonS3FullAccess policies attached
     const aws = require("aws-sdk");
     const athenaExpressConfig = { aws }; //configuring athena-express with aws sdk object
     const athenaExpress = new AthenaExpress(athenaExpressConfig);
    
    1. OR Use instance profiles when using EC2s
    2. OR Use environment variables

Option 1: Simple configuration

  • Simple configuration requires only the AWS SDK object to be passed as a parameter to initialize athena-express
  • Default values are assumed for all parameter options and athena-express creates a new S3 bucket in your AWS account for Amazon Athena to store the query results in.
const aws = require("aws-sdk");

const athenaExpressConfig = { aws }; //simple configuration with just an aws sdk object

//Initializing athena-express
const athenaExpress = new AthenaExpress(athenaExpressConfig);

Option 2: Advance configuration

  • Besides the aws sdk paramater that is required, you can add any of the following optional parameters below
const aws = require("aws-sdk");

//Advance configuration with all options
const athenaExpressConfig = {
	aws, /* required */
	s3: "STRING_VALUE", /* optional format 's3://bucketname'*/
	db: "STRING_VALUE", /* optional */
	workgroup: "STRING_VALUE", /* optional */
	formatJson: BOOLEAN, /* optional default=true */
	retry: Integer, /* optional default=200 */
	getStats: BOOLEAN, /* optional default=false */
	ignoreEmpty: BOOLEAN, /* optional default=true */
	encryption: OBJECT, /* optional */
	skipResults: BOOLEAN, /* optional default=false */
	waitForResults: BOOLEAN /* optional default=true */
};

//Initializing AthenaExpress
const athenaExpress = new AthenaExpress(athenaExpressConfig);
Advance config Parameters:
Parameter Format Default Value Description
s3 string athena-express creates a new bucket for you The location in Amazon S3 where your query results are stored, such as s3://path/to/query/bucket/.
athena-express will create a new bucket for you if you don't provide a value for this param but sometimes that could cause an issue if you had recently deleted a bucket with the same name. (something to do with cache). When that happens, just specify you own bucket name. Alternatively you can also use workgroup.
db string default Athena database name that the SQL queries should be executed in. When a db name is specified in the config, you can execute SQL queries without needing to explicitly mention DB name. e.g.
athenaExpress.query("SELECT * FROM movies LIMIT 3")
as opposed to
athenaExpress.query({sql: "SELECT * FROM movies LIMIT 3", db: "moviedb"});
workgroup string primary The name of the workgroup in which the query is being started.
Note: athena-express cannot create workgroups (as it includes a lot of configuration) so you will need to create one beforehand IFF you intend to use a non default workgroup. Learn More here. Setting up Workgroups
formatJson boolean true Override as false if you rather get the raw unformatted output from S3.
retry integer 200 milliseconds Wait interval between re-checking if the specific Athena query has finished executing
getStats boolean false Set getStats: true to capture additional metadata for your query, such as:
  • EngineExecutionTimeInMillis
  • DataScannedInBytes
  • TotalExecutionTimeInMillis
  • QueryQueueTimeInMillis
  • QueryPlanningTimeInMillis
  • ServiceProcessingTimeInMillis
  • DataScannedInMB
  • QueryCostInUSD
  • Count
  • QueryExecutionId
  • S3Location
ignoreEmpty boolean true Ignore fields with empty values from the final JSON response.
encryption object -- Encryption configuation example usage:
{ EncryptionOption: "SSE_KMS", KmsKey: process.env.kmskey}
skipResults boolean false For a unique requirement where a user may only want to execute the query in Athena and store the results in S3 but NOT fetch those results in that moment.
Perhaps to be retrieved later or simply stored in S3 for auditing/logging purposes.
To retrieve the results, you can simply pass the QueryExecutionId into athena-express as such: athenaExpress.query("ab493e66-138f-4b78-a187-51f43fd5f0eb")
waitForResults boolean true When low latency is the objective, you can skip waiting for a query to be completed in Athena. Returns QueryExecutionId, which you can pass into athena-express later as such: athenaExpress.query("ab493e66-138f-4b78-a187-51f43fd5f0eb")
Not to be confused with skipResults, which actually waits for the query to be completed before returning QueryExecutionId and other stats. waitForResults is meant for fire-and-forget kind of operations.

Usage: Invoking athena-express

Using Promises to query Athena:
/*Option 1: object notation*/ 
let myQuery = {
	sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3" /* required */,
	db: "sampledb" /* optional. You could specify a database here or in the advance configuration option mentioned above*/
};

/*OR Option 2: string notation*/ 
let myQuery = "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3"

athenaExpress
	.query(myQuery)
	.then(results => {
		console.log(results);
	})
	.catch(error => {
		console.log(error);
	});
Using Async/Await to query Athena:
(async () => {
/*Option 1: object notation*/ 
	let myQuery = {
		sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3" /* required */,
		db: "sampledb" /* optional. You could specify a database here or in the configuration constructor*/
	};
    
/*OR Option 2: string notation*/ 
let myQuery = "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3"

	try {
		let results = await athenaExpress.query(myQuery);
		console.log(results);
	} catch (error) {
		console.log(error);
	}
})();
Using QueryExecutionID (optional):

Applicable only if you already have the QueryExecutionID from an earlier execution. See skipResults or waitForResults in the advance config params above to learn more.

const myQueryExecutionId = "bf6ffb5f-6c36-4a66-8735-3be6275960ae";
let results = await athenaExpress.query(myQueryExecutionId);
console.log(results);

Full Examples

Using a standalone NodeJS application
"use strict";

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk"),
	awsCredentials = {
		region: "YOUR_AWS_REGION",
		accessKeyId: "YOUR_AWS_ACCESS_KEY_ID",
		secretAccessKey: "YOUR_AWS_SECRET_ACCESS_KEY"
	};

aws.config.update(awsCredentials);

const athenaExpressConfig = {
	aws,
	s3: "s3://my-bucket-for-storing-athena-results-us-east-1",
	getStats: true
};

const athenaExpress = new AthenaExpress(athenaExpressConfig);

//Invoking a query on Amazon Athena
(async () => {
	let myQuery = {
		sql: "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3",
		db: "sampledb"
	};

	try {
		let results = await athenaExpress.query(myQuery);
		console.log(results);
	} catch (error) {
		console.log(error);
	}
})();
Using AWS Lambda
"use strict";

const AthenaExpress = require("athena-express"),
	aws = require("aws-sdk");

	/* AWS Credentials are not required here 
    /* Make sure the IAM Execution Role used by this Lambda 
    /* has the necessary permission to execute Athena queries 
    /* and store the result in Amazon S3 bucket
    /* See configuration section above under Setup for more info */

const athenaExpressConfig = {
	aws,
	db: "sampledb",
	getStats: true
};
const athenaExpress = new AthenaExpress(athenaExpressConfig);

exports.handler = async event => {
	const sqlQuery = "SELECT elb_name, request_port, request_ip FROM elb_logs LIMIT 3";

	try {
		let results = await athenaExpress.query(sqlQuery);
		return results;
	} catch (error) {
		return error;
	}
};
Results:
Athena-Express result

More Examples

UTILITY queries - Added in v3.0
Show Tables (single column result)
const results = await athenaExpress.query("SHOW TABLES");
console.log(results);

//Output:
{ Items:
  [ { row: 'default' },
    { row: 'sampledb' } ] }
Describe Table (dual column result)
const results = await athenaExpress.query("DESCRIBE elb_logs");
console.log(results);

//Output:
{ Items:
  [ { request_timestamp: 'string' },
    { elb_name: 'string' },
    { request_ip: 'string' },
    { request_port: 'int' },
    { backend_ip: 'string' },
    { backend_port: 'int' },
    { request_processing_time: 'double' },
    { backend_processing_time: 'double' },
    { client_response_time: 'double' },
    { elb_response_code: 'string' },
    { backend_response_code: 'string' },
    { received_bytes: 'bigint' },
    { sent_bytes: 'bigint' },
    { request_verb: 'string' },
    { url: 'string' },
    { protocol: 'string' },
    { user_agent: 'string' },
    { ssl_cipher: 'string' },
    { ssl_protocol: 'string' } ] }

Contributors

Gary Arora

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].