Scout24 / emr-autoscaling

License: Apache-2.0

Description

Scale your Amazon EMR (Elastic MapReduce) cluster by automatically adding or removing task instances. Every 5 minutes, an Amazon CloudWatch rule triggers an AWS Lambda function, which checks CloudWatch metrics to decide whether to scale up or down.

Scaling Rules

Scaling is only initiated when no scaling is currently in progress. In addition, downscaling is not performed during office hours. Beyond that, the following rules decide whether to scale:

  • scaling up
    • at least 1 YARN container has been pending during the past 5 minutes
    • at least 1 task instance group is not running its maximum of configured instances
  • scaling down
    • average memory consumption by YARN is below a given threshold for the last hour
    • at least 1 task instance group is running above its minimum of configured instances
    • the current time is not within office hours on a weekday
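
The rules above can be sketched as a pure decision function. This is a minimal illustration, not the project's actual code: the ClusterState fields and the decide function are hypothetical names, and it assumes that all conditions of a rule must hold together, that scaling up is checked before scaling down, and that the end of office hours is exclusive. The default values are taken from the Parameters section.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClusterState:
    """Hypothetical snapshot of the inputs the scaling rules refer to."""
    scaling_in_progress: bool
    pending_containers: int       # YARN containers pending in the past 5 minutes
    avg_memory_allocation: float  # average YARN memory consumption, last hour (0.0-1.0)
    group_sizes: list             # running instance count per task instance group


def in_office_hours(now, start=8, end=18):
    """True on Monday to Friday when start <= hour < end (end exclusive: an assumption)."""
    return now.weekday() < 5 and start <= now.hour < end


def decide(state, now, min_instances=0, max_instances=20, threshold=0.6,
           office_start=8, office_end=18):
    """Return 'up', 'down' or 'none' for the given snapshot."""
    # No new scaling while a previous scaling activity is still running.
    if state.scaling_in_progress:
        return "none"

    can_grow = any(n < max_instances for n in state.group_sizes)
    can_shrink = any(n > min_instances for n in state.group_sizes)

    # Scale up: pending containers and at least one group below its maximum.
    if state.pending_containers >= 1 and can_grow:
        return "up"

    # Scale down: low memory use, a group above its minimum, outside office hours.
    if (state.avg_memory_allocation < threshold and can_shrink
            and not in_office_hours(now, office_start, office_end)):
        return "down"

    return "none"
```

For example, a snapshot with low memory consumption yields "none" on a Wednesday at noon but "down" on the same day at 22:00, because downscaling is suppressed during office hours.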

Instance Group Selection

Currently, only task instance groups with a spot bid price are eligible for scaling. If the cluster has more than one task instance group, the function sorts all groups by bid price in descending order and selects the first eligible group for scaling.
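
The selection described above might look like the following sketch. The dictionary keys (type, bid_price, running) and the select_group function are hypothetical and do not mirror the EMR API response format. The sketch also assumes that "eligible" means the group can still move in the requested direction without crossing the configured minimum or maximum.

```python
def select_group(groups, direction, min_instances=0, max_instances=20):
    """Pick the task instance group to resize, or None if no group qualifies."""
    # Only task groups that carry a spot bid price are considered at all.
    candidates = [
        g for g in groups
        if g["type"] == "TASK" and g.get("bid_price") is not None
    ]
    # Highest bid price first.
    candidates.sort(key=lambda g: float(g["bid_price"]), reverse=True)

    for group in candidates:
        if direction == "up" and group["running"] < max_instances:
            return group
        if direction == "down" and group["running"] > min_instances:
            return group
    return None


groups = [
    {"name": "core", "type": "CORE", "bid_price": None, "running": 2},
    {"name": "task-a", "type": "TASK", "bid_price": "0.10", "running": 20},
    {"name": "task-b", "type": "TASK", "bid_price": "0.30", "running": 20},
    {"name": "task-c", "type": "TASK", "bid_price": "0.20", "running": 3},
    {"name": "task-on-demand", "type": "TASK", "bid_price": None, "running": 1},
]
print(select_group(groups, "up")["name"])
print(select_group(groups, "down")["name"])
```

In this made-up data, task-b has the highest bid but already runs the maximum of 20 instances, so it is skipped when scaling up and chosen when scaling down.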

Build

This project is built using Make. To set up your build environment, run:

make setup-environment

To run the tests, execute:

make test

To perform a build, i.e. run the unit tests and package the zip file for AWS Lambda, execute:

make package

If the build fails with an AWS region error like this:

autoscaling/venv/lib/python3.9/site-packages/botocore/regions.py", line 148, in _endpoint_for_partition
    raise NoRegionError()
botocore.exceptions.NoRegionError: You must specify a region.

configure a default region for your AWS CLI by running aws configure and following the prompts. Currently, the default region is eu-west-1.

Deployment to AWS

Committing changes triggers a Jenkins build

Link to the Jenkins build

Parameters

The function takes two sets of parameters:

Mandatory Parameters

  • emrJobFlowId
    • ID of the EMR cluster which is to be scaled

Optional Parameters

  • emrDownScalingMemoryAllocationThreshold
    • when the average memory consumption by YARN drops below this value, downscaling is triggered
    • floating point in range [0.0, 1.0]
    • defaults to 0.6
  • emrScalingMinInstances
    • minimum number of instances that must be kept in each task instance group
    • integer >= 0
    • defaults to 0
  • emrScalingMaxInstances
    • maximum number of instances that is allowed for each task instance group
    • integer >= 0
    • defaults to 20
  • officeHoursStart
    • start of the office-hours range during which no downscaling is initiated
    • integer between 0 and 24
    • defaults to 8
  • officeHoursEnd
    • end of the office-hours range during which no downscaling is initiated
    • integer between 0 and 24
    • defaults to 18
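
As an illustration of the parameters and defaults listed above, the following sketch reads them from a dictionary and validates the documented ranges. How the parameters actually reach the function is not stated here, so passing them as a dictionary (for instance the Lambda event) is an assumption, and parse_parameters is a hypothetical helper, not part of the project.

```python
# Documented defaults for the optional parameters.
DEFAULTS = {
    "emrDownScalingMemoryAllocationThreshold": 0.6,
    "emrScalingMinInstances": 0,
    "emrScalingMaxInstances": 20,
    "officeHoursStart": 8,
    "officeHoursEnd": 18,
}


def parse_parameters(event):
    """Return a validated parameter dict; raise ValueError on bad input."""
    if not event.get("emrJobFlowId"):
        raise ValueError("emrJobFlowId is mandatory")
    params = {"emrJobFlowId": event["emrJobFlowId"]}

    # The threshold is a float, every other optional parameter an integer.
    for key, default in DEFAULTS.items():
        convert = float if isinstance(default, float) else int
        params[key] = convert(event.get(key, default))

    threshold = params["emrDownScalingMemoryAllocationThreshold"]
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in the range [0.0, 1.0]")
    for key in ("emrScalingMinInstances", "emrScalingMaxInstances"):
        if params[key] < 0:
            raise ValueError(key + " must be >= 0")
    for key in ("officeHoursStart", "officeHoursEnd"):
        if not 0 <= params[key] <= 24:
            raise ValueError(key + " must be between 0 and 24")
    return params


# "j-EXAMPLE" is a placeholder cluster ID, not a real one.
print(parse_parameters({"emrJobFlowId": "j-EXAMPLE", "officeHoursStart": "9"}))
```

Only emrJobFlowId has to be supplied; every omitted optional parameter falls back to its documented default.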