
Azure-Samples / Serverless-File-Validation

License: MIT
Using Azure Serverless products to perform file validation on a per-batch basis


page_type: sample
languages: csharp, python
products: azure, azure-blob-storage, azure-event-grid, azure-functions, azure-logic-apps, azure-storage, azure-table-storage, dotnet, dotnet-core, dotnet-standard
description: This sample outlines ways to accomplish validation across files received in a batch format using Azure Serverless technologies.

File processing and validation using Azure Functions, Logic Apps, and Durable Functions

This sample outlines multiple ways to accomplish the following set of requirements using Azure Serverless technologies: one uses the "traditional" serverless approach, another uses Logic Apps, and a third uses Azure Functions' Durable Functions feature.

Problem statement

Given a set of customers, assume each customer uploads data to our backend for historical record keeping and analysis. This data arrives as a set of .csv files, each containing different data. Think of them almost as SQL table dumps in CSV format.

When the customer uploads the files, we have two primary objectives:

  1. Ensure that all the files required for the customer are present for a particular "set" (aka "batch") of data
  2. Only when we have all the files for a set, continue on to validate the structure of each file, ensuring a handful of requirements (a validation sketch follows this list):
    • Each file must be UTF-8 encoded
    • Depending on the file type (type1, type2, etc.), ensure the correct number of columns is present in the CSV file
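A minimal sketch of what that per-file validation could look like in C#. The expected column counts, the file-type names, and the ValidateFile helper are illustrative assumptions, not the repo's actual implementation:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class FileValidator
{
    // Illustrative column counts per file type (assumption, not from the repo)
    private static readonly Dictionary<string, int> ExpectedColumns = new()
    {
        ["type1"] = 5,
        ["type2"] = 8,
    };

    public static bool ValidateFile(string fileType, Stream contents)
    {
        // Throw on invalid bytes so non-UTF-8 input fails validation
        // instead of being silently repaired by the decoder.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            using var reader = new StreamReader(contents, strictUtf8);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Naive split; a production validator should handle quoted fields
                if (line.Split(',').Length != ExpectedColumns[fileType])
                    return false;
            }
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false; // not valid UTF-8
        }
    }
}
```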

Setup

To run this sample, you'll need to set up a few things:

  1. Azure General Purpose Storage
    • For the Functions SDK to store its dashboard info, and the Durable Functions to store their state data
  2. Azure Blob Storage
    • For the customer files to be uploaded into
  3. Azure Event Grid (with Storage Events)
  4. ngrok to enable local Azure Function triggering from Event Grid (see this blog post for more)
  5. Visual Studio 2019
  6. Azure Storage Explorer (makes testing easier)

For the Python version of this sample (folder AzureFunctions.Python), follow the instructions in its dedicated readme.

Execution

Pull down the code.

Copy sample.local.settings.json in the AzureFunctions.v3 project to a new file called local.settings.json.

This file will be used across the functions, durable or otherwise.
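The exact keys come from sample.local.settings.json in the repo; as a rough assumption, a local.settings.json for this kind of Functions app typically looks like the following, where the connection strings are placeholders and CustomerBlobStorage is a hypothetical key name:

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<general-purpose storage connection string>",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "CustomerBlobStorage": "<blob storage connection string>"
  }
}
```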

Next, run any of the Function apps in this solution. You can use either the v1 (.NET Framework) or the v3 (.NET Core) version; at this point it's only needed to answer Event Grid's subscription validation handshake. With the function running, add an Event Grid subscription to the Blob Storage account (from step 2), pointing at the ngrok-piped endpoint you created in step 4. The URL should look something like this:

  • Normal Functions: https://b3252cc3.ngrok.io/api/EnsureAllFiles
  • Durable Functions: https://b3252cc3.ngrok.io/api/Orchestrator
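If you prefer scripting the subscription over using the portal, an Azure CLI equivalent looks roughly like this; the CLI route is an assumption on top of the original steps, and every placeholder is yours to fill in:

```bash
az eventgrid event-subscription create \
  --name file-validation-local \
  --source-resource-id "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<blob-account>" \
  --endpoint "https://<your-ngrok-id>.ngrok.io/api/EnsureAllFiles" \
  --included-event-types Microsoft.Storage.BlobCreated
```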

(Screenshot: an Event Grid subscription set up to target an ngrok endpoint)

Upon saving this subscription, you'll see your locally running Function get hit with a validation request and return HTTP OK; the subscription then goes green in Azure and you're set.

Now, open Azure Storage Explorer and connect to the Blob Storage Account you've created. In here, create a container named cust1. Inside the container, create a new folder called inbound.

Take one of the .csv files from the sampledata folder of this repo and drop it into the inbound folder.
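The container setup and upload can also be done with the Azure CLI; the filename below is a placeholder for whichever sampledata file you pick:

```bash
az storage container create --account-name <blob-account> --name cust1
az storage blob upload \
  --account-name <blob-account> \
  --container-name cust1 \
  --name "inbound/<sampledata-file>.csv" \
  --file "sampledata/<sampledata-file>.csv"
```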

You'll see the endpoint you defined as your Event Grid webhook subscription get hit.

Durable Function Execution

  1. Determine the "batch prefix" of the file that was dropped. This consists of the customer name (cust1) and a datetime stamp in the format YYYYMMDD_HHMM, so the batch prefix for the first batch in sampledata is cust1_20171010_1112.
  2. Check whether a sub-orchestration for this batch already exists.
  3. If not, spin one up and pass along the Event Grid data that triggered this execution.
  4. If so, use RaiseEvent to pass the filename along to that instance (a sketch of this trigger logic follows the list).
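A minimal sketch of that trigger logic using the Durable Functions 2.x client API; the Orchestrator and EnsureAllFiles names and the newfile event come from this walkthrough, while the prefix parsing is an illustrative assumption:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.EventGrid.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class Orchestrator
{
    [FunctionName("Orchestrator")]
    public static async Task Run(
        [EventGridTrigger] EventGridEvent eventGridEvent,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        string blobUrl = ((dynamic)eventGridEvent.Data).url.ToString();

        // e.g. ".../cust1/inbound/cust1_20171010_1112_type1.csv"
        //   -> "cust1_20171010_1112" (assumes this naming convention)
        string fileName = blobUrl.Substring(blobUrl.LastIndexOf('/') + 1);
        string batchPrefix = string.Join("_", fileName.Split('_')[..3]);

        // Use the batch prefix as the instance id so each batch behaves
        // like a singleton orchestration.
        var status = await starter.GetStatusAsync(batchPrefix);
        if (status == null
            || status.RuntimeStatus == OrchestrationRuntimeStatus.Completed
            || status.RuntimeStatus == OrchestrationRuntimeStatus.Failed)
        {
            await starter.StartNewAsync("EnsureAllFiles", batchPrefix, blobUrl);
        }
        else
        {
            // Already tracking this batch; hand the new file to the
            // running instance via the external "newfile" event.
            await starter.RaiseEventAsync(batchPrefix, "newfile", blobUrl);
        }
    }
}
```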

In the EnsureAllFiles sub-orchestration, we look up which files we need for this customer (cust1) and check which files have come through thus far. As long as we do not have all the files we need, we loop within the orchestration, each time waiting for an external newfile event to be thrown, letting us know a new file has come through and should be processed.
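Sketched as an orchestrator, that loop might look like this; GetExpectedFilesForCustomer is a hypothetical activity name, and normalizing blob URLs to file names is elided:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class EnsureAllFiles
{
    [FunctionName("EnsureAllFiles")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The first file arrives as the orchestration input (assumes
        // inputs/events carry normalized file names).
        var received = new HashSet<string> { context.GetInput<string>() };

        // Hypothetical activity returning the file list required for a batch
        string[] required = await context.CallActivityAsync<string[]>(
            "GetExpectedFilesForCustomer", "cust1");

        // Loop until every required file has arrived, waking on each
        // "newfile" event raised by the trigger function.
        while (required.Any(f => !received.Contains(f)))
        {
            string newFile = await context.WaitForExternalEvent<string>("newfile");
            received.Add(newFile);
        }

        await context.CallActivityAsync("ValidateFileSet", received.ToArray());
    }
}
```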

When we find we have all the files that constitute a "batch" for the customer, we call the ValidateFileSet activity function to process each file in the set and validate their structure according to our rules.

When validation completes successfully, all files from the batch are moved to a valid-set subfolder in the blob storage container. If validation fails (try removing a column from one of the lines in one of the files), the whole set gets moved to invalid-set.

Resetting Durable Execution

Because Durable Functions persist their state, resetting the execution after something goes wrong isn't as simple as just re-running the function. To do this properly, you must:

  • Delete the DurableFunctionsHubHistory Table in the "General Purpose" Storage Account you created in Step 1 above.
  • Delete any files you uploaded to the /inbound directory of the blob storage container triggering the Functions.
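If you'd rather script the reset, a rough Azure CLI equivalent looks like this (account and container names are placeholders):

```bash
az storage table delete --account-name <general-purpose-account> --name DurableFunctionsHubHistory
az storage blob delete-batch --account-name <blob-account> --source cust1 --pattern "inbound/*"
```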

Note: after doing these steps you'll have to wait a minute or so before re-running either of the Durable Function implementations, as the storage table creation will error with 409 CONFLICT while the deletion takes place.

"Classic" Function execution

  1. Determine the "batch prefix" of the file that was dropped. This consists of the customer name (cust1) and a datetime stamp in the format YYYYMMDD_HHMM, so the batch prefix for the first batch in sampledata is cust1_20171010_1112.
  2. Check whether all the necessary files with this prefix are present in blob storage.
  3. If they are, check whether a lock entry containing this prefix exists in the FileProcessingLocks table of the General Purpose Storage Account. If so, bail; if not, create one, then call the ValidateFunctionUrl endpoint with the batch prefix as the payload.
  4. The Validate function receives the request and checks whether the lock is marked as in progress. If so, bail; if not, mark it as such and continue validating the files in the Blob Storage account that match the prefix passed in (a sketch of the lock check follows the list).
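A sketch of that lock check, using the newer Azure.Data.Tables SDK rather than whatever the sample itself uses; the FileProcessingLocks table name and overall flow come from the steps above, everything else is an assumption:

```csharp
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;

public static class BatchLock
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task TryStartValidationAsync(
        string connectionString, string validateFunctionUrl, string batchPrefix)
    {
        var table = new TableClient(connectionString, "FileProcessingLocks");
        await table.CreateIfNotExistsAsync();
        try
        {
            // AddEntityAsync fails with 409 Conflict if the lock row already
            // exists, so only one caller proceeds per batch prefix.
            await table.AddEntityAsync(new TableEntity("locks", batchPrefix));
            await Http.PostAsJsonAsync(validateFunctionUrl, batchPrefix);
        }
        catch (RequestFailedException ex) when (ex.Status == 409)
        {
            // Another invocation already owns this batch; bail.
        }
    }
}
```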

When validation completes successfully, all files from the batch are moved to a valid-set subfolder in the blob storage container. If validation fails (try removing a column from one of the lines in one of the files), the whole set gets moved to invalid-set.

Resetting Classic Execution

  • Delete the FileProcessingLocks table from the General Purpose Storage Account.
  • Delete any files you uploaded to the /inbound directory of the blob storage container triggering the Functions.

Note: after doing these steps you'll have to wait a minute or so before re-running the Functions, as the storage table creation will error with 409 CONFLICT while the deletion takes place.

Logic Apps

While not identical in behavior, this repo also contains deployment scripts for two Logic App instances that perform roughly the same flow.

Batch Processor

This Logic App receives Storage events from Event Grid, pulls out the file's full prefix (which also contains the URL), and sends it on to...

Batch Receiver

This receives events from the Processor and waits for 3 events containing the same prefix to arrive before sending the batch on to the next step (you can change this threshold to whatever you want after deployment).

Known issues

Durable Functions

  • If you drop all the files in at once, there is a race condition when the events fired from Event Grid hit the top-level Orchestrator endpoint: it doesn't execute StartNewAsync fast enough, and instead of one instance per batch you'll end up with multiple instances for the same prefix (even though we want one instance per batch, acting like a singleton).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
