Azure-Samples / Functions Python Data Cleaning Pipeline

License: MIT
Uses Python Azure Functions and pandas to clean and preprocess data through a Blob Storage and Event Grid messaging pipeline.

page_type: sample
description: "This sample demonstrates a data cleaning pipeline with Azure Functions written in Python."
languages:
  • python
products:
  • azure-functions
  • azure-storage

Data Cleaning Pipeline

This sample demonstrates a data cleaning pipeline built with Azure Functions written in Python. The functions are triggered by an HTTP event from Event Grid and use pandas to clean and reconcile CSV files. The sample reflects a real use case in which such a pipeline performs cleaning tasks.
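To make the "clean, then reconcile" idea concrete, here is a minimal pandas sketch. It is not the sample's actual code: the column names (id, name, amount), the cleaning rules, and the function names clean_csv and reconcile are all assumptions chosen for illustration.

```python
import io

import pandas as pd


def clean_csv(raw_csv: str) -> pd.DataFrame:
    """Hypothetical cleaning step; the real sample's rules may differ."""
    # Read every column as text so ids are not silently converted to floats.
    df = pd.read_csv(io.StringIO(raw_csv), dtype=str)
    # Normalize header names so both sources share the same column labels.
    df.columns = [str(c).strip().lower() for c in df.columns]
    # Trim surrounding whitespace in every text cell; leave missing values alone.
    for col in df.columns:
        df[col] = df[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Drop rows without an id (assumed key column), then exact duplicates.
    df = df.dropna(subset=["id"]).drop_duplicates()
    return df.reset_index(drop=True)


def reconcile(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconciliation: outer-join on id and flag where each row came from."""
    # indicator=True adds a _merge column: left_only, right_only, or both.
    return left.merge(right, on="id", how="outer", indicator=True)
```

With two small in-memory CSV strings, clean_csv removes the duplicate and id-less rows, and reconcile marks which ids appear in one file, the other, or both. In the deployed pipeline the CSV text would come from the blobs rather than from strings.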

Getting Started

Deploy to Azure

Prerequisites

  • Install Python 3.6+
  • Install Functions Core Tools
  • Install Docker
  • Note: On Windows, use Ubuntu on WSL to run the deploy script

Steps

  • Deploy through Azure CLI

    • Open the Azure CLI and run az group create -l [region] -n [resourceGroupName] to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.)
    • Run az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json to deploy the template into that resource group (newer Azure CLI versions replace this command with az deployment group create)
  • Deploy Function App
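The two Azure CLI commands in the steps above can also be scripted. The sketch below only assembles the same commands as argument lists and prints them by default. The helper names build_deploy_commands and run_commands, and the example values, are hypothetical. Running with dry_run=False assumes the az CLI is installed and logged in.

```python
import shlex
import subprocess


def build_deploy_commands(region, resource_group, deployment_name,
                          template_file="azuredeploy.json"):
    """Return the two az commands from the steps above as argument lists."""
    return [
        ["az", "group", "create", "-l", region, "-n", resource_group],
        ["az", "group", "deployment", "create",
         "--name", deployment_name,
         "--resource-group", resource_group,
         "--template-file", template_file],
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if az exits with an error.
            subprocess.run(cmd, check=True)
```

Calling run_commands(build_deploy_commands("westus2", "my-rg", "my-deployment")) prints the commands without touching the subscription, which is a low-risk way to review them first.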

Test

  • Upload the s1.csv file into the c1raw container
  • Watch Event Grid trigger the CleanTrigger1 function and produce "cleaned_s1_raw.csv"
  • Repeat the same for s2.csv in the c2raw container
  • Now send an HTTP request with the following JSON body to the Reconcile function to merge the two cleaned files
{
	"file_1_url" : "https://{storagename}.blob.core.windows.net/c1raw/cleaned_s1_raw.csv",
	"file_2_url" : "https://{storagename}.blob.core.windows.net/c2raw/cleaned_s2_raw.csv",
	"batchId" : "1122"
}

  • Watch it produce the final.csv file
  • Optionally, use a Logic App to call the Reconcile function with batch IDs
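The Reconcile request from the steps above could be sent from Python roughly as follows. This sketch makes several assumptions: the function accepts a POST with a JSON body, it is reachable at a URL you supply (the helper names and any URL you pass in are hypothetical), and any function key your app requires is already part of that URL.

```python
import json
import urllib.request


def build_reconcile_request(function_url, storage_name, batch_id):
    """Build the request with the same JSON body shown in the test steps."""
    base = f"https://{storage_name}.blob.core.windows.net"
    payload = {
        "file_1_url": f"{base}/c1raw/cleaned_s1_raw.csv",
        "file_2_url": f"{base}/c2raw/cleaned_s2_raw.csv",
        "batchId": batch_id,
    }
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the source only says "HTTP request"
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the request separately from sending it makes it easy to inspect the body before calling the deployed function.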
