Event Hub Capture as a service

For some reason I've had a hard time finding proper use cases for the Event Hub Capture feature. At first glance it looks like reasonable functionality that can be used for many different purposes:

  • built-in archive of events
  • seamless integration with Event Grid
  • input for batch processing of events

On the other hand, I haven't seen Capture used in a real scenario (I mean, besides the documentation). What's more, the price of this feature (roughly $70 per throughput unit, TU) could be a real blocker in some projects (for 10 TUs you pay ~$200 monthly for processing events, now add an additional ~$700; my heart bleeds...). So what is Capture really for?
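To make the arithmetic concrete, here is a minimal sketch of the monthly estimate. The class and member names are made up for illustration, and the two rates are simply the rough figures quoted above (~$200 for 10 TUs, ~$70 per TU for Capture), not values from an official price list.

```csharp
using System;

public static class CaptureCostEstimate
{
    // Rough monthly figures taken from the text above (assumptions, not official prices):
    // ~$200 for 10 TUs of throughput, ~$70 per TU for Capture.
    public const decimal ThroughputPerUnit = 20m;
    public const decimal CapturePerUnit = 70m;

    // Estimated monthly cost in USD for the given number of throughput units.
    public static decimal MonthlyCost(int throughputUnits, bool captureEnabled)
    {
        if (throughputUnits < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(throughputUnits));
        }

        var perUnit = ThroughputPerUnit + (captureEnabled ? CapturePerUnit : 0m);
        return throughputUnits * perUnit;
    }
}
```

With these numbers, 10 TUs come to about $200 without Capture and about $900 with it.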

Compare

Experience shows that you should never disqualify a service until you have compared it with other cloud components. Recently my team has been struggling to choose the right tool for processing messages in our data ingestion platform. To be honest, the choice is not as obvious as it looks. You could choose:

  • EH Capture with Azure Data Lake Analytics
  • Direct processing by Azure Functions
  • Azure Batch with event processors
  • Stream Analytics
  • VM with event processors
  • EH Capture + Azure Batch

Really, we can easily imagine several solutions, each with its pros and cons. Some solutions seem better suited to real-time processing (like Stream Analytics or a VM with event processors), some require delicate precision when designing (Functions), and some seem fun, but when you calculate the cost it turns out to be 10x higher than for the other choices (ADLA). All are more or less justified. But what about functionality?

IT'S JUST NOT THERE

Now imagine you'd like to distribute your events between different directories (e.g. in Data Lake Store) using a dynamic parameter (e.g. a parameter from an event). This simple requirement easily kills some of the solutions listed above:

  • ADLA has this feature in private preview
  • Stream Analytics doesn't have this even in the backlog

On the other hand, even if this feature were available, I'd consider a different path. If in the end I expect to have my events catalogued, Azure Batch + EH Capture seems like a good idea (especially since it allows batch processing). This doubles the amount of storage needed but greatly simplifies the solution (and gives me plenty of flexibility).

There's one big flaw in such a design, however: if we're considering dropping events directly into a Data Lake Store instance, it has to be in the same resource group as the Event Hub (which isn't always possible). In such a scenario you have to use Blob Storage as a staging area (which could be an advantage given the recent addition of soft delete for blobs).
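The routing requirement from this section (distributing events between directories based on a parameter carried by the event) could be sketched in a batch step roughly as follows. Everything here is hypothetical: the `CapturedEvent` type, the `Tenant` property used as the routing parameter, and the `events/<tenant>/<yyyy>/<MM>/<dd>` layout are assumptions for illustration, and reading the captured files and writing to the target store are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical shape of an event after it has been read from a captured file.
public sealed class CapturedEvent
{
    public CapturedEvent(string tenant, DateTime enqueuedUtc, string body)
    {
        Tenant = tenant;
        EnqueuedUtc = enqueuedUtc;
        Body = body;
    }

    // Assumed routing parameter carried by each event.
    public string Tenant { get; }
    public DateTime EnqueuedUtc { get; }
    public string Body { get; }
}

public static class EventRouter
{
    // Builds the target directory for a single event, e.g. "events/contoso/2018/05/17".
    public static string TargetDirectory(CapturedEvent e)
    {
        var datePart = e.EnqueuedUtc.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);
        return "events/" + e.Tenant + "/" + datePart;
    }

    // Groups a batch of events by target directory so that each group
    // can be written to its own location in one go.
    public static Dictionary<string, List<CapturedEvent>> GroupByDirectory(IEnumerable<CapturedEvent> events)
    {
        return events
            .GroupBy(e => TargetDirectory(e))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Because the grouping is plain code running over already captured files, the routing rule can be as dynamic as needed, which is the flexibility mentioned above.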

What about money?

Still, Capture is more expensive than a simple solution using Azure Functions. But is that always the case? I find the pricing of Azure Functions better if you're able to process events in batches. If for some reason you're unable to do that (or the batches are really small), the price goes up and up. That's why I said that Functions require delicate precision when designing: if you think about all the problems upfront, you'll be able to use the easiest and simplest solution.
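Part of the bill on the Functions consumption plan is charged per execution, so the batch size directly drives that part of the cost. A minimal sketch of the relationship (the helper name is made up, and actual rates are deliberately left out):

```csharp
using System;

public static class FunctionInvocations
{
    // Number of function executions needed to process eventCount events
    // when each execution receives at most batchSize events.
    public static long Executions(long eventCount, int batchSize)
    {
        if (eventCount < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(eventCount));
        }

        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        // Integer ceiling of eventCount / batchSize.
        return (eventCount + batchSize - 1) / batchSize;
    }
}
```

For one million events, batches of 100 need 10,000 executions, while processing events one at a time needs 1,000,000.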

Conclusion

I find Capture useful in some of the listed scenarios, but the competition is strong here. It's hard to compete with well-designed serverless services, which offer better pricing and often deliver comparable results. Remember to always choose what you need, not what you're told to. Each architecture has different requirements, and in most cases the available guides won't cover your exact solution.

Is Event Grid faster than Azure Functions? #2

In the previous post I presented the results of a basic smoke test using the Blob Trigger in Azure Functions and the same functionality in Event Grid. The outcome was not surprising: Event Grid seems to be a faster and more reliable way of notifying other services about new blobs. What if we perform a stress test? Is anything going to change? Let's check!

Publisher

For the current episode I used the following producer:

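using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Newtonsoft.Json;

internal static class Program
{
	private static void Main()
	{
		// Block on the async loop; the producer runs until the process is stopped.
		MainAsync().GetAwaiter().GetResult();
	}

	private static async Task MainAsync()
	{
		while (true)
		{
			// Put your storage account connection string here.
			var storageAccount = CloudStorageAccount.Parse("");
			var blobClient = storageAccount.CreateCloudBlobClient();
			var container = blobClient.GetContainerReference("functionsgrid");
			container.CreateIfNotExists();

			// Every blob gets a unique name, so each upload creates a new blob.
			var blockBlob = container.GetBlockBlobReference(Guid.NewGuid().ToString());

			blockBlob.UploadText(JsonConvert.SerializeObject(new Blob()));
			Console.WriteLine($"[{DateTime.Now}] Blob uploaded!");

			// Short pause between uploads.
			await Task.Delay(10);
		}
	}
}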

public class Blob
{
	public Blob()
	{
		Id = Guid.NewGuid();
		Created = DateTime.Now;
	}

	public Guid Id { get; set; }

	public DateTime Created { get; set; }

	public string Text { get; set; }
}

What is more, I ran 5 producers at the same time.

Results

Here are the results:

How to interpret this chart? On the y-axis we have the total execution time (in milliseconds). You can clearly see the difference between Functions and Event Grid (in fact, the maximum execution time for Functions was greater than 30 minutes!). What is more, the median for Functions lies between 50 and 100 seconds.
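For readers who want to compute the same kind of summary from their own runs, here is a minimal sketch. It assumes the execution time of a single blob is the delay between the `Created` timestamp stored in the blob and the moment the handler saw it; the post does not spell out the exact measurement, so treat that as an assumption. The `LatencyStats` helper is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatencyStats
{
    // Delay in milliseconds between blob creation and handling (assumed measurement).
    public static double DelayMs(DateTime created, DateTime handled)
    {
        return (handled - created).TotalMilliseconds;
    }

    // Median of the samples; for an even count, the mean of the two middle values.
    public static double Median(IEnumerable<double> samplesMs)
    {
        var sorted = samplesMs.OrderBy(v => v).ToArray();
        if (sorted.Length == 0)
        {
            throw new ArgumentException("At least one sample is required.", nameof(samplesMs));
        }

        var mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

The maximum can be taken with LINQ's `Max()` on the same collection of delays.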

Conclusions

It seems that there's a clear improvement in processing times when switching from Azure Functions to Event Grid. You may ask why anyone would use Azure Functions when the difference in processing time is so obvious. As always, you have to ask yourself what your requirements are and what the current recommendations are. The rule of thumb would be: if I need predictable delivery time, I'd go for Event Grid. If I don't mind whether a function is called after 1 second or 1 minute, Azure Functions is still a viable option.