Appending data to Azure Storage Blob concurrently

Append Blob is a fairly common feature of Azure Storage, which makes all kinds of logs or data aggregations a piece of cake. While the whole concept is super easy (at least from the SDK client's point of view), using it in a real scenario can give you headaches. Why, you may ask? Well, the nature of appends is not so obvious, and at first glance our perception can be deceived.

A quick look at the documentation reveals how limited our options are when it comes to appending data to a blob concurrently.

Four Horsemen of the Parallelypse

There are five methods in total available on the CloudAppendBlob type:

  • AppendBlock
  • AppendFromStream
  • AppendText
  • AppendFromByteArray
  • AppendFromFile

As you can see, I grouped them a little, so we have two categories:

  • methods for a multiple writers scenario - AppendBlock
  • methods for a single writer scenario - the remaining four (hence the "horsemen")

Now the question is - how do we know that a given method is designed for a specific scenario? Well, the easiest option is to read the documentation. This is the description of AppendText taken from the API reference:

Appends a string of text to an append blob.
This API should be used strictly in a single writer scenario because the API internally uses the append-offset conditional header to avoid duplicate blocks which does not work in a multiple writer scenario.

So what happens if you try to use such a method in, for example, an Azure Function, which can scale out to multiple instances writing concurrently?

The remote server returned an error: (412) The append position condition specified was not met 

Not the most helpful error message, is it?
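To make it concrete, here's a minimal sketch of the kind of code that produces this error - the function name, queue, and blob paths are made up for the example. Once the Function App scales out, multiple instances race on the internal append offset:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class AppendLogFunction
{
    // Hypothetical queue-triggered function; when scaled out, many instances
    // may execute this code against the same blob at the same time.
    [FunctionName("AppendLog")]
    public static async Task Run([QueueTrigger("log-entries")] string entry)
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var blob = account.CreateCloudBlobClient()
            .GetContainerReference("logs")
            .GetAppendBlobReference("app.log");

        // AppendTextAsync relies on the append-offset conditional header,
        // so concurrent writers clash and the call fails with 412.
        await blob.AppendTextAsync(entry + Environment.NewLine);
    }
}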

Gimme a code snippet!

The easiest way to fix this issue is to switch to the AppendBlock method and transform the string into a stream:

using (var ms = new MemoryStream())
using (var sw = new StreamWriter(ms))
{
    // Write the payload into the in-memory buffer...
    await sw.WriteLineAsync("Serialized_data");
    await sw.FlushAsync();

    // ...rewind it so AppendBlock reads from the beginning...
    ms.Position = 0;

    // ...and append it as a whole block, which is safe for multiple writers.
    await blob.AppendBlockAsync(ms);
}
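One caveat worth adding: AppendBlock expects the append blob to already exist, otherwise you'll get a 404 instead. A rough sketch of a guard before the first append (the exact bootstrap strategy depends on your scenario):

// Create the append blob on first use. CreateOrReplaceAsync would wipe
// existing data, so only call it when the blob is missing. Note that this
// check itself can race between writers - an access condition such as
// AccessCondition.GenerateIfNotExistsCondition() can harden it.
if (!await blob.ExistsAsync())
{
    await blob.CreateOrReplaceAsync();
}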

Summary

Personally, I found this issue not so obvious - I involuntarily used the AppendText method, which looked like the best match for my code, and only after some time did I notice those 412 error codes. The one thing you have to remember when using AppendBlock is that each block cannot exceed 4 MB in size. This - along with the limit of 50,000 append operations per blob - allows for building a blob with a maximum size of roughly 195 GB (4 MB × 50,000), which should be fine for most projects.
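If a single payload can exceed that limit, you have to split it yourself before appending. A sketch of such a helper (the method and constant names are mine, not part of the SDK):

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class AppendBlobExtensions
{
    private const int MaxBlockSize = 4 * 1024 * 1024; // 4 MB limit per block

    // Splits an arbitrary payload into chunks the service accepts and
    // appends them sequentially, so their order is preserved.
    public static async Task AppendLargePayloadAsync(
        this CloudAppendBlob blob, byte[] payload)
    {
        for (var offset = 0; offset < payload.Length; offset += MaxBlockSize)
        {
            var count = Math.Min(MaxBlockSize, payload.Length - offset);
            using (var chunk = new MemoryStream(payload, offset, count))
            {
                await blob.AppendBlockAsync(chunk);
            }
        }
    }
}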

Accessing an orchestration history in Durable Functions

One of the important aspects of Durable Functions is the fact that you're able to access each orchestration's history and inspect and analyze it whenever you need to. This is a great addition to the whole framework since you're able to both:

  • leverage current features like Application Insights integration
  • write a custom solution, which will extend current capabilities and easily integrate with your systems

But what options are we given? Let's take a look!

Application Insights

There's an extensive guide available here, which explains in detail how one can access and use the Application Insights integration and query the data. One important caveat is listed there:

By default, Application Insights telemetry is sampled by the Azure Functions runtime to avoid emitting data too frequently. 
This can cause tracking information to be lost when many lifecycle events occur in a short period of time. 

So it's important to be aware of this flaw and always design your solution so it doesn't rely on information that may have been lost.
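If sampling bites you, it can also be tuned in host.json - this is a sketch based on the documented applicationInsights section of the Functions 1.x host.json; double-check the exact shape against the runtime version you're running:

{
    "applicationInsights": {
        "sampling": {
            "isEnabled": true,
            "maxTelemetryItemsPerSecond": 5
        }
    }
}

Raising maxTelemetryItemsPerSecond (or disabling sampling entirely) trades quota for completeness of the lifecycle events.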

Another important piece of configuration is the logger category filter:

{
    "logger": {
        "categoryFilter": {
            "categoryLevels": {
                "Host.Triggers.DurableTask": "Information"
            }
        }
    }
}

Sometimes it's important to adjust logging levels, since in Functions there are many events being constantly logged - if you're using an Azure subscription with limits (like MSDN subscriptions), you can easily run out of free quota, which will prevent your functions from logging.

Storage

It's nothing surprising that Durable Functions also use a Storage Account to store their data. You can get the same results as with Application Insights by going to the Storage Account attached to the Function App instance and accessing the history table. Its name derives from the hub name listed in host.json:

{
  "durableTask": {
    "HubName": "SampleHubVS"
  }
}

So when you access your Storage Account, the history table should be there - with the configuration above it will be named SampleHubVSHistory.
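If you prefer code over the portal, here's a minimal sketch of reading an orchestration's history with the storage SDK's table client - instanceId is assumed to be the id of the orchestration you're inspecting, and EventType is one of the columns the Durable Task Framework writes:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// ...

var account = CloudStorageAccount.Parse(
    Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
var table = account.CreateCloudTableClient()
    .GetTableReference("SampleHubVSHistory");

// History rows are partitioned by the orchestration instance id.
var query = new TableQuery<DynamicTableEntity>().Where(
    TableQuery.GenerateFilterCondition(
        "PartitionKey", QueryComparisons.Equal, instanceId));

foreach (var row in table.ExecuteQuery(query))
{
    Console.WriteLine($"{row.Timestamp}: {row.Properties["EventType"].StringValue}");
}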