chadgeary / nifi

Licence: other
Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Reference

NiFi secure+autoscaling cluster built automatically in AWS via Terraform+Ansible.

Options

Two designs are provided:

  • NiFi on EC2 with ZooKeeper running within the same EC2 instances, or
  • NiFi on EC2 with ZooKeeper running separately in ECS Fargate.

Side note: for considerations about using RHEL as opposed to Ubuntu as the base EC2 OS, see rhel.md.

Requirements

  • An AWS account
  • Follow the Step-by-Step guide below (compatible with Windows and Ubuntu)

Media

  • Video Guide - a bit outdated, but still useful. Follow along with me as I deploy using the step-by-step guide below.
  • Discord - for questions, ideas, comments, or troubleshooting assistance.

Step-by-Step Terraform Deployment

Windows users: install WSL (Windows Subsystem for Linux)

#############################
## Windows Subsystem Linux ##
#############################
# Launch an ELEVATED Powershell prompt (right click -> Run as Administrator)

# Enable Windows Subsystem Linux
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart

# Reboot your Windows PC
shutdown /r /t 5

# After reboot, launch a REGULAR Powershell prompt (left click).
# Do NOT proceed with an ELEVATED Powershell prompt.

# Download the Ubuntu 2004 package from Microsoft
curl.exe -L -o ubuntu-2004.appx https://aka.ms/wsl-ubuntu-2004
 
# Rename the package
Rename-Item ubuntu-2004.appx ubuntu-2004.zip
 
# Expand the zip
Expand-Archive ubuntu-2004.zip ubuntu-2004
 
# Change to the extracted directory
cd ubuntu-2004
 
# Execute the ubuntu 2004 installer
.\ubuntu2004.exe
 
# Create a username and password when prompted

Install Terraform, Git, and create an SSH key pair

#############################
##  Terraform + Git + SSH  ##
#############################
# Add terraform's apt key (enter previously created password at prompt)
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
 
# Add terraform's apt repository
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
 
# Install terraform and git
sudo apt-get update && sudo apt-get -y install terraform git
 
# Clone the project
git clone https://github.com/chadgeary/nifi

# Create SSH key pair (RETURN for defaults)
ssh-keygen

Install the AWS CLI and create a non-root AWS user. An AWS account is required to continue.

#############################
##          AWS            ##
#############################
# Open powershell and start WSL
wsl

# Change to home directory
cd ~

# Install python3 pip
sudo apt update && sudo DEBIAN_FRONTEND=noninteractive apt-get -q -y install python3-pip

# Install awscli via pip
pip3 install --user --upgrade awscli

# Create a non-root AWS user with admin permissions in the AWS web console.
# This must be the same user that runs terraform apply.
# In the AWS web console: IAM -> Users -> Add user -> check programmatic access and AWS Management Console access -> Attach existing policies -> AdministratorAccess -> copy the Access key ID and Secret access key
# For more information, see: https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html#getting-started_create-admin-group-console

# Set admin user credentials
~/.local/bin/aws configure

# Validate configuration
~/.local/bin/aws sts get-caller-identity 

# For troubleshooting EC2 instances, use the SSM Session Manager plugin
curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/ubuntu_64bit/session-manager-plugin.deb" -o ~/session-manager-plugin.deb
sudo dpkg -i ~/session-manager-plugin.deb

# and set the SSH helper configuration for SSM Session Manager
tee -a ~/.ssh/config << EOM
host i-* mi-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
EOM

Customize the deployment (see the Variables section below)

# In PowerShell's WSL window, change to the project's deployment directory
cd ~/nifi/zks-on-ec2/

# Open File Explorer in a separate window
# Navigate to the same directory inside the Ubuntu filesystem - change \chad\ to your WSL username
%HOMEPATH%\ubuntu-2004\rootfs\home\chad\nifi\zks-on-ec2

# Edit the nifi.tfvars file using Notepad and save

Deploy

# In PowerShell's WSL window, change to the project's deployment directory
cd ~/nifi/zks-on-ec2/

# Initialize Terraform and apply the configuration
terraform init
terraform apply -var-file="nifi.tfvars"

# If permissions errors appear, fix with the below command and re-run the terraform apply.
sudo chown $USER nifi.tfvars && chmod 600 nifi.tfvars

# Note the outputs from terraform after the apply completes

# Wait for the virtual machine to become ready (Ansible will set up the services for us). NiFi can take 15+ minutes to initialize.
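
The wait can be scripted instead of watched by hand. The following is a minimal sketch of a generic retry loop and is not part of the project: wait_until is a hypothetical helper name, and the readiness check passed to it is left to you (check_nifi_ready in the last comment is a placeholder, not an existing command).

```shell
# wait_until TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of the project): re-runs COMMAND until it
# succeeds, at most TRIES times, sleeping DELAY_SECONDS between attempts.
wait_until() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # Sleep only if another attempt will follow.
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Stand-in check that always succeeds, to show the calling convention.
wait_until 3 0 true

# ASSUMPTION: with a readiness check of your own (check_nifi_ready is a
# placeholder), 40 tries 30 seconds apart would cover roughly 20 minutes:
# wait_until 40 30 check_nifi_ready
```

A fixed number of tries keeps the loop from running forever if the deployment never becomes healthy; the non-zero return code lets a calling script stop at that point.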

Variables

# See nifi.tfvars

Post-Deployment

Review the Terraform output for quick links to State Manager (Ansible) status, load balancer health, CloudWatch logs, and the admin certificate in S3, which must be added to a browser for web access.
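
The outputs can also be re-read and the certificate fetched from the command line. This is a minimal sketch, assuming the AWS CLI configured earlier: terraform output re-prints the outputs of the last apply, and aws s3 cp downloads an object from S3. The bucket name and object key are placeholders, not values from the project; take the real ones from your own Terraform output. By default the script only prints the commands (DRY_RUN=1).

```shell
# Sketch only: print (or, with DRY_RUN=0, run) the post-deployment commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# fetch_admin_cert BUCKET KEY (hypothetical helper)
# Downloads s3://BUCKET/KEY into the current directory, keeping the file name.
fetch_admin_cert() {
  run aws s3 cp "s3://$1/$2" "./$(basename "$2")"
}

# Re-print the outputs of the last terraform apply
# (run this from the directory where terraform apply was run).
run terraform output

# ASSUMPTION: placeholder bucket and key, replace with the values from your output.
fetch_admin_cert example-nifi-bucket path/to/admin-certificate
```

Set DRY_RUN=0 once the printed commands look right.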

Maintenance

If modifying nifi.properties:

  1. Change the nifi.properties file in playbooks/zookeepers/ and playbooks/nodes/
  2. Re-run terraform apply -var-file="nifi.tfvars"
  3. Re-apply the SSM associations mentioned in terraform output
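
Steps 2 and 3 can also be run from the shell. This is a minimal sketch, assuming you prefer the CLI to the console links in the Terraform output: aws ssm start-associations-once asks State Manager to run the given associations immediately. The association ID below is a placeholder; the real IDs can be listed with aws ssm list-associations. By default the script only prints the commands (DRY_RUN=1).

```shell
# Sketch only: print (or, with DRY_RUN=0, run) the re-apply commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reapply_config ASSOCIATION_ID [ASSOCIATION_ID...] (hypothetical helper)
# Pushes the edited playbook files with terraform, then re-runs the
# SSM associations so the instances pick up the new nifi.properties.
reapply_config() {
  run terraform apply -var-file="nifi.tfvars"
  run aws ssm start-associations-once --association-ids "$@"
}

# ASSUMPTION: placeholder association ID, replace with your own.
reapply_config 00000000-0000-0000-0000-000000000000
```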

If re-sizing instances or otherwise modifying autoscaling group(s):

  1. Change the instance type in nifi.tfvars
  2. Re-run terraform apply -var-file="nifi.tfvars"
  3. Scale the node autoscaling group down, either all at once (min 0 / max 0) or incrementally to replace instances of the old size/AMI.
  4. Scale the ZooKeeper autoscaling groups down one at a time. Always leave at least one ZooKeeper running, preferably two, e.g.:
       • If zk1, zk2, and zk3 are running, scale down zk3. Once complete, scale zk3 back up.
       • Repeat for zk2, then zk1.
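
The rolling ZooKeeper replacement can be sketched as a loop over the autoscaling groups. This is an illustration under stated assumptions, not the project's own tooling: the group names are placeholders, each group is assumed to hold a single instance, and a fixed pause stands in for a real health check. aws autoscaling update-auto-scaling-group is the standard CLI call for changing a group's size. By default the script only prints the commands (DRY_RUN=1).

```shell
# Sketch only: print (or, with DRY_RUN=0, run) a one-at-a-time ZooKeeper roll.
DRY_RUN="${DRY_RUN:-1}"
# ASSUMPTION: a fixed pause replaces a proper health check between steps.
PAUSE_SECONDS="${PAUSE_SECONDS:-300}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# set_asg_size GROUP SIZE: set min, max, and desired capacity to SIZE.
set_asg_size() {
  run aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --min-size "$2" --max-size "$2" --desired-capacity "$2"
}

# Pause only when commands are really executed.
pause() {
  if [ "$DRY_RUN" != "1" ]; then
    sleep "$PAUSE_SECONDS"
  fi
}

# roll_zookeepers GROUP [GROUP...]
# Scales each group to 0 and back to 1 before touching the next one,
# so the remaining ZooKeeper nodes stay up throughout.
roll_zookeepers() {
  for asg in "$@"; do
    set_asg_size "$asg" 0
    pause
    set_asg_size "$asg" 1
    pause
  done
}

# ASSUMPTION: placeholder group names, replace with your own.
roll_zookeepers example-zk3 example-zk2 example-zk1
```

Processing one group at a time keeps two of the three ZooKeeper nodes running at every step, which matches the guidance in step 4.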