Analyzing Table Storage bindings locally in Azure Functions

In one of my current projects I heavily rely on Table Storage bindings, which are used in many of my functions. In fact, I have several dozen API functions, which are the base of the whole system. Because the codebase grows each day, I needed a tool that would allow me to easily validate whether I query Table Storage correctly - that is, whether I follow some basic principles like (see the sketch after the list):

  • using both PartitionKey and RowKey in a query
  • if RowKey is unavailable - using PartitionKey so I won't have to read the whole table
  • using $top whenever possible so I won't load the whole partition
  • using query projection - leveraging $select to fetch only a subset of columns in a row
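
To illustrate, a query following these rules could look as follows - a minimal sketch using the TableQuery API from the WindowsAzure.Storage package (the table, column and key values are made up):

/
// A sketch of a query respecting the rules above (table, column and key values are illustrative).
// Requires the WindowsAzure.Storage package: using Microsoft.WindowsAzure.Storage.Table;
var query = new TableQuery<DynamicTableEntity>()
	// always filter on PartitionKey; add a RowKey condition too when you know it -
	// that makes it a point query returning a single row
	.Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "device"))
	.Take(100)                                    // translates to $top=100
	.Select(new[] { "Name", "SerialNumber" });    // translates to $select=Name,SerialNumber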

In fact I knew two ways of doing that:

  1. Checking the logs of Storage Emulator, which I described in this blog post. The disadvantage of that solution is that it logs nearly each and every request, so it is hard to find the particular one you're interested in
  2. Using SQL Server Profiler to check what kind of queries are materialized

As you can see above, logs from Storage Emulator are quite detailed, yet painful to work with.

I needed a tool which would combine the features of both solutions.

Reading SQL Server Profiler

The idea was to somehow read what SQL Server Profiler outputs when queries are sent to Storage Emulator. Fortunately, it is really simple using the following classes:

  • SqlConnectionInfo
  • TraceServer

Both are easily accessible in the SQL Server directory:

  • C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.SqlServer.ConnectionInfo.dll
  • C:\Program Files (x86)\Microsoft SQL Server\140\SDK\Assemblies\Microsoft.SqlServer.ConnectionInfoExtended.dll

There is, however, a small gotcha. Since SQL Server Profiler is a 32-bit application, you cannot use the above classes in a 64-bit one. Additionally, those assemblies are SQL Server version sensitive - locally I have an instance of SQL Server 2017; if you have another version, you'll have to change the path to point to the correct one.
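
A minimal reader built on top of those classes could look like the sketch below (compiled as x86; the LocalDB instance name and the Standard.tdf template path are assumptions based on my local setup):

/
// Minimal trace reader sketch - compile as x86 (32-bit) and reference
// Microsoft.SqlServer.ConnectionInfo.dll and Microsoft.SqlServer.ConnectionInfoExtended.dll.
using System;
using Microsoft.SqlServer.Management.Common;
using Microsoft.SqlServer.Management.Trace;

class Program
{
	static void Main()
	{
		// assumption: Storage Emulator 5.x keeps its database in the default LocalDB instance
		var connectionInfo = new SqlConnectionInfo(@"(localdb)\MSSQLLocalDB")
		{
			UseIntegratedSecurity = true
		};

		var trace = new TraceServer();

		// Standard.tdf is the default Profiler template; adjust the path to your SQL Server version
		trace.InitializeAsReader(connectionInfo,
			@"C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Profiler\Templates\Microsoft SQL Server\140\Standard.tdf");

		try
		{
			while (trace.Read())
			{
				// TextData holds the statement the emulator sent to its database
				var textData = trace["TextData"]?.ToString();
				if (string.IsNullOrEmpty(textData) == false)
				{
					Console.WriteLine(textData);
				}
			}
		}
		finally
		{
			trace.Stop();
			trace.Close();
		}
	}
}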

Does it work?

After some initial testing it seems it works. Let's assume you have the following code:

/
[FunctionName("DeviceList")]
public static Task<HttpResponseMessage> DeviceList(
	[HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "device")] HttpRequestMessage req,
	[Table(TableName, Connection = Constants.TableStorageConnectionName)] IQueryable<DeviceEntity> devices,
	[Table(Firmware.Firmware.TableName, Connection = Constants.TableStorageConnectionName)] IQueryable<Firmware.Firmware.FirmwareEntity> firmware,
	[Table(CounterType.CounterType.TableName, Connection = Constants.TableStorageConnectionName)] IQueryable<CounterType.CounterType.CounterTypeEntity> counterTypes,
	[Table(Location.Location.TableName, Connection = Constants.TableStorageConnectionName)] IQueryable<Location.Location.LocationEntity> locations,
	[Identity] UserIdentity identity,
	TraceWriter log)
{
	if (identity.IsAuthenticated() == false) return identity.CreateUnauthorizedResponse();

	var firmwareVersionsCached = firmware.Take(50).ToList();
	var counterTypesCached = counterTypes.Take(50).ToList();
	var locationsCached = locations.Take(50).ToList();

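	// NOTE: a "not equal" filter on PartitionKey cannot be served as a point query,
	// so the statement below scans the whole table - one of the flaws described later in this post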
	var query = devices.Where(_ => _.PartitionKey != "device").Take(100).ToList().Select(_ => new
	{
		Id = _.RowKey,
		Name = _.Name,
		SerialNumber = _.SerialNumber,
		Firmware = firmwareVersionsCached.First(f => f.RowKey == _.FirmwareId.ToString()).Version,
		CounterType = counterTypesCached.First(ct => ct.RowKey == _.CounterTypeId.ToString()).Name,
		Location = locationsCached.First(l => l.RowKey == _.LocationId.ToString()).Name
	});

	var response = req.CreateResponse(HttpStatusCode.OK,
		query);

	return Task.FromResult(response);
}


Here you can find a part of the diagnostic logs from executing the above function:

You can find the whole project on GitHub: https://github.com/kamil-mrzyglod/StorageEmulatorTracer. After some time spent with this tool, I found plenty of issues in my code, like:

  • not using PartitionKey
  • reading the same table twice
  • materializing all rows from a table when I needed only a subset

I guess I will find even more flaws in the coming days.

Is Event Grid faster than Azure Functions? #1

It's been over two months since the last blog post, so it's time for a big comeback!

In the very first post of 2018, I'd like to test whether Event Grid brings us improvements over an old-fashioned Blob Trigger in Azure Functions when it comes to calling registered subscribers. I decided to start with this topic mostly because it's been no more than a few days since Event Grid's GA was announced. Let's start!

Setting up the architecture

To perform a basic test (in fact, I divided this post into two parts), we'll need a few things:

  • Function App with 3 functions (publisher, blob trigger and HTTP endpoint)
  • Event Grid instance
  • General-purpose Storage Account v2

Why do we need a General-purpose v2 (GPv2) account? Well - the new Event Grid storage trigger requires this updated version of an account, and there's nothing we can do about it. The good thing is that you can upgrade your existing account to GPv2 using e.g. this command:

/
Set-AzureRmStorageAccount -ResourceGroupName <resource-group> -AccountName <storage-account> -UpgradeToStorageV2

HTTP endpoint and Event Grid subscription

To create a subscription in Event Grid from a storage account to a function, we have to use one of the following methods:

  • Azure CLI
  • Powershell
  • REST API
  • SDK

Unfortunately, for now it's not possible to subscribe to storage events using the Azure Portal. For the purpose of this test I decided to use Azure CLI in the portal:

/
az eventgrid event-subscription create --resource-id "/subscriptions/55f3dcd4-cac7-43b4-990b-a139d62a1eb2/resourceGroups/kalstest/providers/Microsoft.Storage/storageaccounts/kalsegblob" --name es3 --endpoint https://contoso.azurewebsites.net/api/f1?code=code

You can find the full reference to the command here.

You can easily use Cloud Shell here, which is available within the portal

If you run the command now, you'll be surprised - it's not possible to create a subscription because your endpoint is not authenticated. What the heck, you may ask? Well, this is all described in the Event Grid documentation. To make a long story short - each time you try to add a new endpoint to which events will be sent, it has to be validated. To validate your endpoint, Event Grid sends a message similar to this:

/
[{
  "id": "2d1781af-3a4c-4d7c-bd0c-e34b19da4e66",
  "topic": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "subject": "",
  "data": {
    "validationCode": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6"
  },
  "eventType": "Microsoft.EventGrid.SubscriptionValidationEvent",
  "eventTime": "2018-01-25T22:12:19.4556811Z",
  "metadataVersion": "1",
  "dataVersion": "1"
}]

What you have to do is respond to such a request using the validationCode it sent:

/
{
  "validationResponse": "512d38b6-c7b8-40c8-89fe-f46f9e9622b6"
}

How do we achieve this in our test? We'll develop our HTTP function and perform a quick deserialization just to have our endpoint validated. Once it's done, we can replace the function's content with the proper logic:

/
[FunctionName("Http")]
public static async Task<HttpResponseMessage> Run(
	[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequestMessage req,
	TraceWriter log)
{
	var data = await req.Content.ReadAsStringAsync();
	var @event = JsonConvert.DeserializeObject<Event[]>(data)[0];

	return req.CreateResponse(HttpStatusCode.OK, new { validationResponse = @event.Data.ValidationCode});
}

public class Event
{
	public string Topic { get; set; }
	public string Subject { get; set; }
	public string EventType { get; set; }
	public DateTime EventTime { get; set; }
	public Guid Id { get; set; }

	public ValidationRequest Data { get; set; }
}

public class ValidationRequest
{
	public string ValidationCode { get; set; }
}

Once you publish this function, you can run the command mentioned above to register a new event subscription.

Functionality

These are the functions I used to perform the first part of the test:

PUBLISHER

/
[FunctionName("Publisher")]
public static async Task Run([TimerTrigger("*/10 * * * * *")]TimerInfo myTimer,
	[Blob("functionsgrid/blob", FileAccess.Write, Connection = "FunctionsGrid")] Stream blob,
	TraceWriter log)
{
	log.Info($"C# Timer trigger function executed at: {DateTime.Now}");

	using (var sw = new StreamWriter(blob))
	{
		await sw.WriteAsync(JsonConvert.SerializeObject(new Blob()));
		log.Info("Blob created!");
	}
}

[FunctionName("Publisher2")]
public static void Run2([TimerTrigger("*/10 * * * * *")]TimerInfo myTimer,
	[Blob("functionsgrid/blob2", FileAccess.Write, Connection = "FunctionsGrid")] out string blob,
	TraceWriter log)
{
	log.Info($"C# Timer trigger function 2 executed at: {DateTime.Now}");

	var o = new Blob { Text = File.ReadAllText("file.txt") };
	blob = JsonConvert.SerializeObject(o);
}

public class Blob
{
	public Blob()
	{
		Id = Guid.NewGuid();
		Created = DateTime.Now;
	}

	public Guid Id { get; set; }

	public DateTime Created { get; set; }

	public string Text { get; set; }
}

HTTP

/
[FunctionName("Http")]
[return: Table("Log", Connection = "FunctionsGrid")]
public static async Task<Blob.LogEntity> Run(
	[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequestMessage req,
	TraceWriter log)
{
	var dateTriggered = DateTime.Now;
	var data = await req.Content.ReadAsStringAsync();

	var @event = JsonConvert.DeserializeObject<Event[]>(data)[0];
	log.Info($"Processing {@event.Id} event.");

	// read the connection string from app settings (the same "FunctionsGrid" setting
	// the bindings use) instead of hardcoding the account key
	var storageAccount = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("FunctionsGrid"));
	var blobClient = storageAccount.CreateCloudBlobClient();
	var blob = blobClient.GetBlobReferenceFromServer(new Uri(@event.Data.Url));

	using (var sr = new StreamReader(blob.OpenRead()))
	{
		var readToEnd = sr.ReadToEnd();
		log.Info(readToEnd);
		var fileBlob = JsonConvert.DeserializeObject<Publisher.Blob>(readToEnd);
		log.Info("Text: " + fileBlob.Text);
		if (string.IsNullOrEmpty(fileBlob.Text) == false)
		{
			return new Blob.LogEntity("eventgrid_big")
			{
				BlobCreated = fileBlob.Created,
				BlobProcessed = dateTriggered
			};
		}

		return new Blob.LogEntity("eventgrid")
		{
			BlobCreated = fileBlob.Created,
			BlobProcessed = dateTriggered
		};
	}
}

public class Event
{
	public string Topic { get; set; }
	public string Subject { get; set; }
	public string EventType { get; set; }
	public DateTime EventTime { get; set; }
	public Guid Id { get; set; }

	public EventData Data { get; set; }
}

public class EventData
{
	public string Url { get; set; }
}
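
For reference, the Data.Url property is taken from the Microsoft.Storage.BlobCreated event delivered by Event Grid. Trimmed down and with illustrative values, such an event looks roughly like this:

/
[{
  "topic": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/kalstest/providers/Microsoft.Storage/storageAccounts/functionsgrid",
  "subject": "/blobServices/default/containers/functionsgrid/blobs/blob",
  "eventType": "Microsoft.Storage.BlobCreated",
  "eventTime": "2018-01-30T12:00:00.0000000Z",
  "id": "831e1650-001e-001b-66ab-eeb76e000000",
  "data": {
    "api": "PutBlob",
    "blobType": "BlockBlob",
    "contentLength": 524288,
    "url": "https://functionsgrid.blob.core.windows.net/functionsgrid/blob"
  },
  "dataVersion": "",
  "metadataVersion": "1"
}]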

BLOB

/
[FunctionName("Blob")]
[return: Table("Log", Connection = "FunctionsGrid")]
public static LogEntity Run([BlobTrigger("functionsgrid/{name}", Connection = "FunctionsGrid")]Stream myBlob, string name, TraceWriter log)
{
	var dateTriggered = DateTime.Now;
	log.Info($"C# Blob trigger function Processed blob\n Name:{name} \n Size: {myBlob.Length} Bytes");

	using (var sr = new StreamReader(myBlob))
	{
		var content = sr.ReadToEnd();
		var blob = JsonConvert.DeserializeObject<Publisher.Blob>(content);

		if (blob != null && string.IsNullOrEmpty(blob.Text) == false)
		{
			return new Blob.LogEntity("function_big")
			{
				BlobCreated = blob.Created,
				BlobProcessed = dateTriggered
			};
		}

		// assumption: the small-file scenario is logged under "function",
		// mirroring the "function_big" partition used above
		return new LogEntity("function")
		{
			BlobCreated = blob.Created,
			BlobProcessed = dateTriggered
		};
	}
}
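
The LogEntity class is not shown above - a minimal sketch of what both functions return could look like this (an assumption: the constructor argument becomes the PartitionKey tagging the scenario):

/
// Hypothetical sketch of the LogEntity used above - it derives from TableEntity
// (Microsoft.WindowsAzure.Storage.Table); the constructor argument becomes
// the PartitionKey ("function", "function_big", "eventgrid", "eventgrid_big")
// and a random RowKey keeps the entities unique.
public class LogEntity : TableEntity
{
	public LogEntity()
	{
	}

	public LogEntity(string scenario)
	{
		PartitionKey = scenario;
		RowKey = Guid.NewGuid().ToString();
	}

	public DateTime BlobCreated { get; set; }

	public DateTime BlobProcessed { get; set; }
}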

Results

All results were saved to a table in Table Storage. I measured the exact time when a function started its execution - the results don't take into account how long a function needed to perform all its tasks. There were two scenarios:

  • Upload a file with simple JSON content, once per 10 seconds
  • Upload a 2.5 MB file, once per 10 seconds

Here are the results (for 1653 executions):

What can you say about this chart? Here are some initial conclusions:

  • although the difference was a matter of milliseconds, Event Grid seems to notify the subscriber almost with no delay, while Azure Functions needs to poll storage and wait for new files
  • a bigger file means more notification delay, which is true for both Event Grid and Functions
  • this was a simple smoke test - when we start to push more files, it's possible that today's results won't hold
  • there's one interesting observation - a bigger file seems to be processed much more slowly by the HTTP function, which has to download the file after being notified about its existence

In the next episode we'll try to stress this solution a little bit to check how it behaves when it comes to handling many small and bigger files. Stay tuned!