Azure Table Storage good practices - log tail pattern

Although Azure Table Storage is a pretty simple and straightforward solution, designing it so you can get the most out of it is not an easy task. I've seen projects where bad decisions about table structure, poorly chosen partition key values or "who cares" row keys led to steadily degrading performance and rising maintenance costs. Many of those problems could have been avoided by introducing smart yet easy-to-apply patterns, of which my favorite is the log tail pattern.

The problem

Let's consider the following table:

PK | RK | Timestamp | Event

As you may know, rows in an Azure Storage table are always stored in ascending order by row key within a partition. This allows you to go from the start to the end of a segment quickly and with predictable performance. If we assume that we have the following rows in our table:

PK  | RK    | Timestamp  | Event
foo | 1     | 01/01/2000 | { "Event": "Some_event" }
foo | 2     | 01/01/2000 | { "Event": "Some_event" }
foo | 3     | 01/01/2000 | { "Event": "Some_event" }
(...)
foo | 9999  | 01/01/2000 | { "Event": "Some_event" }
foo | 10000 | 01/01/2000 | { "Event": "Some_event" }

we can fetch a particular row really quickly, because we can query it directly using PK and RK. We know that 1 will be stored before 10 and 100 before 6578 (keep in mind that row keys are compared as strings, so numeric keys like these only sort numerically when they are zero-padded to the same width). The problem appears when we cannot quickly determine which RK is bigger, mostly because we used a custom identifier such as a combination of a GUID and a timestamp. This often forces us to query large portions of a table just to find the most recent records. It'd be possible to use a condition like WHERE RowKey > $some_value in our query, but it still introduces some overhead, which could be critical in some scenarios. How can we store our data in Table Storage and retrieve the most recent records quickly and efficiently?
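To make the cost of this problem a bit more concrete, here is a minimal sketch that uses an in-memory list as a stand-in for a single partition. The GUID-only row key and the row count are made up for illustration, and ordinal string sorting only approximates how the service orders row keys; nothing here talks to a real table.

```csharp
using System;
using System.Linq;

// In-memory stand-in for one partition: (RowKey, Timestamp) pairs.
// The GUID-based key is a made-up example of a key that carries no usable order.
var start = new DateTime(2000, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var rows = Enumerable.Range(0, 1000)
    .Select(i => (RowKey: Guid.NewGuid().ToString("N"), Timestamp: start.AddMinutes(i)))
    // Rows come back ordered by row key, compared as strings.
    .OrderBy(r => r.RowKey, StringComparer.Ordinal)
    .ToList();

// The newest row can sit anywhere in that order, so every row has to be inspected.
int inspected = 0;
var newest = rows[0];
foreach (var row in rows)
{
    inspected++;
    if (row.Timestamp > newest.Timestamp) newest = row;
}

Console.WriteLine($"Newest: {newest.Timestamp:o}, rows inspected: {inspected} of {rows.Count}");
```

With keys like these, the storage order says nothing about recency, so finding the latest record means walking the whole partition.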

Log tail pattern

Fortunately, the solution is really easy here, and if the decision is made early, it doesn't require much effort to introduce. The idea is simple: find a row key that "reorders" the table so that the newest rows are also the first ones in it. The concept may seem tricky initially, but soon it will be the first thing you think about when you hear "Azure Table Storage" ;)

Let's consider the following solution (taken from here):

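// Row key: ticks remaining until DateTime.MaxValue, zero-padded to 19 digits, so newer entries sort first.
string invertedTicks = string.Format("{0:D19}", DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks);
// Reverse the calculation to get the original UTC timestamp back from a row key.
DateTime dt = new DateTime(DateTime.MaxValue.Ticks - Int64.Parse(invertedTicks), DateTimeKind.Utc);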

This reverses the order in which rows are stored in a table and allows you to quickly fetch the ones you're most interested in. It's especially useful for all kinds of append-only logs, where you usually care mostly about the most recent records.
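The effect of the inverted key can be checked locally. The sketch below is only an illustration: ToRowKey and FromRowKey are hypothetical helper names, the timestamps are made up, and ordinal string sorting stands in for the order in which the table keeps row keys.

```csharp
using System;
using System.Linq;

// Hypothetical helpers wrapping the inverted-ticks calculation.
static string ToRowKey(DateTime utc) =>
    string.Format("{0:D19}", DateTime.MaxValue.Ticks - utc.Ticks);

static DateTime FromRowKey(string rowKey) =>
    new DateTime(DateTime.MaxValue.Ticks - long.Parse(rowKey), DateTimeKind.Utc);

// Three made-up log timestamps, deliberately listed oldest first.
DateTime[] written =
{
    new DateTime(2000, 1, 1, 8, 0, 0, DateTimeKind.Utc),
    new DateTime(2000, 1, 1, 9, 0, 0, DateTimeKind.Utc),
    new DateTime(2000, 1, 1, 10, 0, 0, DateTimeKind.Utc),
};

// Ordinal string sorting stands in for the table's row key order.
string[] stored = written
    .Select(t => ToRowKey(t))
    .OrderBy(k => k, StringComparer.Ordinal)
    .ToArray();

// The "tail" of the log is now simply the first N rows of the partition.
DateTime[] latestTwo = stored.Take(2).Select(k => FromRowKey(k)).ToArray();

foreach (DateTime t in latestTwo)
    Console.WriteLine($"{t.Hour}:00 UTC"); // 10:00 UTC, then 9:00 UTC
```

Against a real table the same idea means reading only the first few rows of a partition instead of filtering or scanning it.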

Summary

I rely heavily on this pattern in many projects I've created (both private and commercial), and it really helps in creating an efficient table structure that can be easily queried and is optimized for a particular use. Of course it cannot be used in all scenarios, but for some it's a must-have.

One thing you have to keep in mind is that you must pad the reversed tick value with leading zeroes to ensure the string value sorts as expected (this is what string.Format() takes care of in the example). Without it you can end up with incorrectly ordered rows. Nonetheless, it's a small price to pay for a proper design and good performance.
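Why the padding matters is easiest to see with numbers that have different digit counts. This is a minimal sketch with small made-up values standing in for tick counts; ordinal string sorting again plays the role of the table's row key order.

```csharp
using System;
using System.Linq;

// Small made-up values standing in for tick counts with different digit counts.
long[] values = { 999, 1000, 25 };

// Unpadded: compared as strings, "1000" sorts before "25" and "999".
string[] unpadded = values
    .Select(v => v.ToString())
    .OrderBy(k => k, StringComparer.Ordinal)
    .ToArray();

// Padded to 19 digits with the same "D19" format as in the row key example.
string[] padded = values
    .Select(v => string.Format("{0:D19}", v))
    .OrderBy(k => k, StringComparer.Ordinal)
    .ToArray();

Console.WriteLine(string.Join(", ", unpadded));                             // 1000, 25, 999
Console.WriteLine(string.Join(", ", padded.Select(k => long.Parse(k))));    // 25, 999, 1000
```

Once every key has the same length, string order and numeric order agree.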

Real cost of developing a project in Azure

Recently I've moved one of my side projects fully to the cloud. In this short post I'd like to show you what the real cost of developing a fairly small yet multi-dimensional project is, and how Azure gives me and my client the flexibility to select what is really needed at a particular moment.

The project

I won't go into details here. To make a long story short: we're gathering data from many electronic devices and then providing reports to the clients, based on different time intervals, places and custom properties. There's also a need to handle the old legacy system and migrate data from the old database to the new one.

Azure components

The project is built using the following elements:

  • Storage account
  • Function App
  • 2x Web App
  • SQL database
  • Azure B2C
  • Application Insights

It's worth mentioning that by default we're using free tiers for the Web Apps and the SQL database.

Taking all of the above into consideration, our monthly cost of developing this project is 1,50 EUR on average.

Caveats

Because we're using free tiers, we have to be aware of their limits (like available CPU minutes per day), but on the other hand, there's no problem with scaling up when needed. This is what really sold us on the cloud: if only small features are being developed, the free tier is more than enough. If we're in an intensive period of developing new features, we can just go to the portal and change the tier.

Additionally, we have to be aware that in production we won't be able to use the free versions of Azure components, because of missing features and much higher traffic. On the other hand, saving money this way instead of paying much more for resources we wouldn't be able to utilize is a much smarter decision.

Detailed cost

Two resources make up most of our cost: the Storage account and the Function App. This is because they're the "hot path" in the system: one of the functions fetches data from FTP and pushes each record to a queue. Other functions take data from the queues, perform some transformations, store the data and push it further. When I check my Billing page, the cost looks like this:

  • Storage account 0,62 EUR
  • Function App 0,55 EUR
  • B2C 0,09 EUR
  • The rest of resources 0 EUR

That adds up to 1,26 EUR. We didn't need more power this month, so we could keep the cost as low as possible.

Summary

A carefully designed cloud solution can really lower the monthly cost of developing a project. On the other hand, I've seen many examples where developing a product in the cloud was much more expensive than on an old-fashioned VM (or even than a production environment also hosted somewhere in Azure!). Pay attention to the resources used, the selected tiers and their utilization, and you won't be surprised when you see the bill at the end of the month.