Azure-Samples / Functions Python Data Cleaning Pipeline
Licence: MIT
Using Python on Azure Functions to clean and preprocess data with pandas through a Blob Storage and Event Grid messaging pipeline
page_type: sample
description: "This sample demonstrates a data cleaning pipeline with Azure Functions written in Python."
languages:
- python
products:
- azure-functions
- azure-storage
Data Cleaning Pipeline
This sample demonstrates a data cleaning pipeline built with Azure Functions written in Python. An HTTP event from Event Grid triggers the functions, which use pandas to clean and reconcile CSV files. The sample reflects a real use case in which this pipeline performs cleaning tasks.
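The source does not show the cleaning and reconciliation logic itself, so the following is only a minimal sketch of what such steps might look like with pandas. The function names `clean_csv` and `reconcile`, the specific cleaning rules (trimming whitespace, dropping empty and duplicate rows), and the join key are all assumptions made for illustration, not the sample's actual implementation.

```python
import io

import pandas as pd


def clean_csv(raw_csv: str) -> pd.DataFrame:
    """Hypothetical cleaning step: tidy headers, trim text, drop empty/duplicate rows."""
    df = pd.read_csv(io.StringIO(raw_csv))
    # Normalize header names so later joins do not depend on spacing or case.
    df.columns = [str(c).strip().lower() for c in df.columns]
    # Trim surrounding whitespace from every string cell; leave other values alone.
    df = df.apply(lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v))
    # Remove rows that are entirely empty, then exact duplicates.
    return df.dropna(how="all").drop_duplicates().reset_index(drop=True)


def reconcile(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical reconciliation step: keep rows whose key appears in both files."""
    return left.merge(right, on=key, how="inner")


if __name__ == "__main__":
    s1 = clean_csv("ID , Name \n1, alice \n1, alice \n2,bob\n")
    s2 = clean_csv("id,score\n2,9\n3,7\n")
    print(reconcile(s1, s2, key="id"))
```

An inner join is used here only as one plausible reading of "reconciliation"; a real pipeline might prefer an outer join to surface rows that appear in one file but not the other.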
Getting Started
Deploy to Azure
Prerequisites
- Install Python 3.6+
- Install Functions Core Tools
- Install Docker
- Note: On Windows, use Ubuntu on WSL to run the deploy script
Steps
- Deploy through Azure CLI
  - Open AZ CLI and run
    az group create -l [region] -n [resourceGroupName]
    to create a resource group in your Azure subscription (e.g. [region] could be westus2, eastus, etc.)
  - Run
    az group deployment create --name [deploymentName] --resource-group [resourceGroupName] --template-file azuredeploy.json
- Deploy Function App
  - Create/Activate virtual environment
  - Run
    func azure functionapp publish [functionAppName] --build-native-deps
Test
- Upload the s1.csv file into the c1raw container
- Watch Event Grid trigger the CleanTrigger1 function and produce "cleaned_s1_raw.csv"
- Repeat the same for s2.csv in the c2raw container
- Now send an HTTP request with the following body to the Reconcile function to merge the two cleaned files
{
"file_1_url" : "https://{storagename}.blob.core.windows.net/c1raw/cleaned_s1_raw.csv",
"file_2_url" : "https://{storagename}.blob.core.windows.net/c2raw/cleaned_s2_raw.csv",
"batchId" : "1122"
}
- Watch it produce the final.csv file
- Optionally, use a Logic App to call the Reconcile function with batch IDs
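The request to the Reconcile function can also be built from Python. The sketch below uses only the standard library and reproduces the JSON body shown above. The helper names `build_reconcile_payload` and `build_reconcile_request` are hypothetical, and the function URL is a placeholder you would need to supply, since the source does not give the endpoint or say whether a function key is required.

```python
import json
import urllib.request


def build_reconcile_payload(storage_name: str, batch_id: str) -> dict:
    """Build the JSON body from the Test steps for a given storage account."""
    base = f"https://{storage_name}.blob.core.windows.net"
    return {
        "file_1_url": f"{base}/c1raw/cleaned_s1_raw.csv",
        "file_2_url": f"{base}/c2raw/cleaned_s2_raw.csv",
        "batchId": batch_id,
    }


def build_reconcile_request(function_url: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST request; function_url is supplied by the caller."""
    return urllib.request.Request(
        function_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_reconcile_payload("mystorageaccount", "1122")  # placeholder name
    print(json.dumps(payload, indent=2))
    # To actually send it, pass your own Reconcile endpoint (assumed, not from the source):
    # req = build_reconcile_request("<your Reconcile function URL>", payload)
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Keeping payload construction separate from sending makes it easy to reuse the same body from a Logic App or a script that loops over batch IDs.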