Don't be afraid to store it twice

In most cases we try to follow DRY approach - Don't Repeat Yourself. This simple yet powerful statement, which - if executed correctly - has the power of changing meh projects to good ones, doesn't work well in all scenarios and use cases. As one once said "One size doesn't fit all", you shouldn't always follow each pattern or approach blindly. You can easily get burnt.

Problem

Imagine following scenario(or rather requirement):

We have following concepts in the application: Company, Group and User. User can have access to multiple Companies(from which one is his main). Group is a logical thing, which exists within a Company

The question is - how to store data in Table Storage, so we can easily query both all Groups within a Company and a single Group. 

Let's skip User for a second and consider following implementation:

/
[FunctionName("GroupList")]
public static Task<HttpResponseMessage> GroupList(
	[HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "group")] HttpRequestMessage req,
	[Table(TableName, "AzureWebJobsStorage")] IQueryable<GroupEntity> groups,
	[Table(Company.TableName, "AzureWebJobsStorage")] IQueryable<Company.CompanyEntity> companies,
	TraceWriter log)
{
	var groupsQuery = groups.Take(100).ToList().Select(_ => new
	{
		Id = _.RowKey,
		Name = _.Name,
		Description = _.Description,
		// ReSharper disable once ReplaceWithSingleCallToFirst
		CompanyName = companies.Where(company => company.RowKey == _.CompanyId.ToString()).FirstOrDefault()?.Name
	}).ToList();

	var response = req.CreateResponse(HttpStatusCode.OK,
		groupsQuery);

	return Task.FromResult(response);
}

[FunctionName("GroupGet")]
public static Task<HttpResponseMessage> GroupGet(
	[HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "group/{id}")] HttpRequestMessage req,
	[Table(TableName, "{partitionKey}", "{id}", Connection = "AzureWebJobsStorage")] GroupEntity group,
	TraceWriter log)
{
	var response = req.CreateResponse(HttpStatusCode.OK, group);
	return Task.FromResult(response);
}

public class GroupEntity : TableEntity
{
	public GroupEntity(Guid companyId)
	{
		PartitionKey = companyId.ToString();
		RowKey = Guid.NewGuid().ToString();
	}

	public string Name { get; set; }
	public Guid CompanyId { get; set; }
	public string Description { get; set; }
}

All right, initially it all looks fine. If we want to get a list of User specific Groups, we'll just add some context(e.g. store all user's companies somewhere and just use their identifiers as a parameter in a query). Now let's request a single Group. We have a Group identifier, we can get the whole row from a database...

NO WE CAN'T!

Since GroupEntity's PK is equal to Company's identifier, we'd have to make one query per each company a user has access to. Not very smart, not very clean. What to do? Change PK in GroupEntity to a generic one? We'll lost the possibility to make fast queries for all groups within a company. Make a combined identifier and user it as a PK? We still have to perform multiple queries. Go for SQL and perform proper JOINs? This is definitely a possibility - but we don't need other features of a relative database. Is it a dead end? 

Solution

One thing in Azure Storage is really cheap - it's the storage itself. How can we remodel our tables so we can improve both performance and lower transactions amount? Well, we can store our data twice!

Before you start throwing tomatoes at me, consider following example:

/
[FunctionName("GroupCreate")]
public static async Task<HttpResponseMessage> GroupCreate(
	[HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "group")] HttpRequestMessage req,
	[Table(TableName, "AzureWebJobsStorage")] IAsyncCollector<GroupEntity> groups,
	TraceWriter log)
{
	var data = JsonConvert.DeserializeObject<GroupEntity>(await req.Content.ReadAsStringAsync());

	var group = new GroupEntity(data.CompanyId)
	{
		Name = data.Name,
		CompanyId = data.CompanyId,
		Description = data.Description
	};

	await groups.AddAsync(group);
	await groups.AddAsync(new GroupEntity
	{
		RowKey = group.RowKey,
		Name = data.Name,
		CompanyId = data.CompanyId,
		Description = data.Description
	});

	var response = req.CreateResponse(HttpStatusCode.OK, group);
	return response;
}

Here we're storing a row twice but with slight changes. The first insert stores data with no changes to the previous version. The second one is the crucial one - it performs one change, which is changing PK to a generic one named "group". Thanks to this solution we can have two almost the same rows, one for being displayed as a part of a list, one for storing all info regarding a row. 

Now you may ask - how it secures a row from being displayed if a User doesn't have access to a given Company? That's why we're storing company identifier along with a row in CompanyId column. This is much quickier and cleaner solution than performing several requests to Table Storage - we can cache the data locally and just check whether identifiers match.

Summary

Modeling Table Storage is both challenging and rewarding - you can easily hit performance problems if tables are not designed carefully, on the other hand wise design allow you to really push the limit. Those redesigns are important also because of one more thing - they save time. And in a cloud time = money. Make sure you pay only as much as needed.

Analyze Table Storage queries locally

In general Table Storage has this awesome feature to check what kind of heresy we're creating by turning on diagnostics logs in the portal. It allows us to analyze queries and operations, so it's possible to find potentially problematic places and tweak them with ease. This functionality has one major drawback - we have to actually deploy our application and run it in the cloud. 

For most of the time you'll be developing using Table Storage Emulator. It'd great to check all our queries right after developing a feature. What options do we have? In fact Emulator has its own logs - they won't be as good and smooth as with full diagnostics, but for most scenarios they should be sufficient.

In the portal we have plenty of options we can use to analyze Table Storage performance

Enabling logging locally

To enable logging, you have to rutn this feature on in the Emulator config. You can find it located here:

C:\Users\{USER}\AppData\Local\AzureStorageEmulator

My config looks like this:

/
<?xml version="1.0"?>
<StorageEmulator xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <SQLInstance>(localdb)\MSSQLLocalDB</SQLInstance>
  <PageBlobRoot>C:\Users\kamz\AppData\Local\AzureStorageEmulator\PageBlobRoot</PageBlobRoot>
  <BlockBlobRoot>C:\Users\kamz\AppData\Local\AzureStorageEmulator\BlockBlobRoot</BlockBlobRoot>
  <LogPath>C:\Users\kamz\AppData\Local\AzureStorageEmulator\Logs</LogPath>
  <LoggingEnabled>false</LoggingEnabled>
</StorageEmulator>

To turn logging on you have to change LoggingEnabled property to true. If you have Emulator running, you'll have to restart it.

Reading logs

Once logging is enabled you'll see, that in the following location:

C:\Users\{USER}\AppData\Local\AzureStorageEmulator\Logs

Some files are starting to appear. There're logs with different levels of severity + some performance related information. If everything's running smoothly most of those files won't be helpful for you. In my scenario I find Verbose_debug file most interesting. 

This file contains information about requests which are handled by our Emulator. Consider following log:

/
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('User') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('User')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=c7a80f0d-4c22-4cdf-bfc2-89b983d5a363] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('UserXCompany') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('UserXCompany')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=2131ae3e-cee7-4bbd-a2b9-2ab795dfb60e] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Got request GET http://127.0.0.1:10002/devstoreaccount1/Tables('Company') 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Tables('Company')
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=93d3b264-e4b1-4233-950f-225c7756ee14] Leaving concurrent request: 0
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Entering concurrent request: 1
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Got request GET http://127.0.0.1:10002/devstoreaccount1/User?$top=20 2015-12-11
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/User?$top=20
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Setting status code to 200
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] In the process end callback
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Leaving concurrent request: 0
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Entering concurrent request: 1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Got request GET http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27 2015-12-11
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Setting status code to 200
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] In the process end callback
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Leaving concurrent request: 0
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Entering concurrent request: 1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Got request GET http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1 2015-12-11
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Host-style URI resolution failed. Allowing path-style for uri http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] StorageAccount.GetPropertiesImpl(PropertyNames=0x0020, ServiceMetadataPropertyNames=0x000000007807C000,,00:00:30)
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Setting status code to 200
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] In the process end callback
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Entering BasicHttpRestProcessor.EndProcess
16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Leaving concurrent request: 0

There're many information which we can get from it:

  • How many operations were handled during our queries
  • Which operations were handled concurrently
  • Which tables were queries
  • What queries were used to obtain information

Let's check how it related to the real function. Above log was generated for the following function:

/
[FunctionName("UserList")]
public static Task<HttpResponseMessage> UserList(
	[HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "user")] HttpRequestMessage req,
	[Table("User", "AzureWebJobsStorage")] IQueryable<UserEntity> users,
	[Table("UserXCompany", "AzureWebJobsStorage")] IQueryable<UserXCompanyEntity> usersXCompanies,
	[Table("Company", "AzureWebJobsStorage")] IQueryable<Company.CompanyEntity> companies,
	TraceWriter log)
{
	var usersQuery = users.Take(20).ToList().Select(_ => new
	{
		Login = _.Login,
		Name = _.Name,
		Surname = _.Surname,
		Company = FindUserCompanies(_, usersXCompanies, companies)
	});

	var response = req.CreateResponse(HttpStatusCode.OK,
		usersQuery);

	return Task.FromResult(response);
}

private static List<string> FindUserCompanies(UserEntity user, IQueryable<UserXCompanyEntity> usersXCompanies,
	IQueryable<Company.CompanyEntity> companies)
{
	var values = usersXCompanies.Where(_ => _.PartitionKey == user.RowKey).ToList();
	return values.Select(value => companies.Where(_ => _.RowKey == value.RowKey).First().Name).ToList();
}

I was expecting to see 3 queries(per each User loaded). Note that those queries were created dynamically thanks to using IQueryable<>:

  • GET 'User', TOP 20
  • GET 'UserXCompany', PartitionKey = User.RowKey
  • GET 'Company', TOP 1, RowKey = UserXCompany.RowKey

Did I get them?

16.02.2018 22:13:16 [Verbose_debug] [ActivityId=eba444dc-2752-4d92-a993-a96f4b8ab771] Got request GET http://127.0.0.1:10002/devstoreaccount1/User?$top=20 2015-12-11

16.02.2018 22:13:17 [Verbose_debug] [ActivityId=e08f0361-1901-493f-b1cc-885d8a3fd6d8] Got request GET http://127.0.0.1:10002/devstoreaccount1/UserXCompany?$filter=PartitionKey%20eq%20%27b55cf6d6-0012-480d-a792-9d4bb0813431%27 2015-12-11

16.02.2018 22:13:17 [Verbose_debug] [ActivityId=11f9e8e6-f345-4a65-8221-0be27c9edbaa] Got request GET http://127.0.0.1:10002/devstoreaccount1/Company?$filter=RowKey%20eq%20%272de37ce9-b012-43f9-9c09-ba1107bddaaf%27&$top=1 2015-12-11

What now?

Now when I'm able to see how it looks under the hood I'm both confident, that the code I wrote works as expected and I'm able to tweak queries if I must. Maybe I should store Company information with a User to save one query? Maybe I should limit UserXCompany rows fetched? There many possibilities but once you're ready for changes, things should be easier in the future.