Monday, September 27, 2021

Cosmos DB Request Unit Logging

This post replaces an earlier one titled Catch Cosmos Client Statistics where I briefly explained how you can use a request handler to intercept statistics about each database query. The statistics include the Request Units (RUs) expended by each query, which correlate with the actual cost of running the database.

A request handler class is provided with more information than I realised a couple of weeks ago. You can fully reconstruct each query and retrieve matching statistics such as RUs, elapsed time, database size and record count. All of this information can be combined to create concise logging output which can help you pinpoint performance bottlenecks or expensive queries.

Construct a CosmosClientBuilder class with the endpoint and key to the Cosmos account. Use AddCustomHandlers to add an instance of a class derived from RequestHandler. Override SendAsync and extract the following information:

request.Headers.ActivityId (a Guid string)
request.Method
request.RequestUri
request.Content (see note #1 below)
request.Headers.ContinuationToken
response.Headers["x-ms-request-charge"]
response.Headers["x-ms-resource-usage"] (see note #2 below)
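The steps above can be sketched roughly as follows. This is a minimal outline rather than the exact code from Hoarder: the names RequestUnitHandler and CosmosClientFactory and the Action<string> callback are made up for illustration, and it assumes the Microsoft.Azure.Cosmos V3 NuGet package is referenced.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Fluent;

// Hypothetical handler name; forwards one summary line per request to a callback.
public class RequestUnitHandler : RequestHandler
{
    private readonly Action<string> _log;

    public RequestUnitHandler(Action<string> log)
    {
        _log = log;
    }

    public override async Task<ResponseMessage> SendAsync(
        RequestMessage request, CancellationToken cancellationToken)
    {
        // Let the rest of the pipeline run the request first.
        ResponseMessage response = await base.SendAsync(request, cancellationToken);

        // Header keys vary between requests, so probe rather than index.
        response.Headers.TryGetValue("x-ms-request-charge", out string charge);
        response.Headers.TryGetValue("x-ms-resource-usage", out string usage);
        bool isContinuation = !string.IsNullOrEmpty(request.Headers.ContinuationToken);

        string ru = charge ?? "?";
        string db = usage ?? "?";
        _log($"{request.Method} {request.RequestUri} RU={ru} continuation={isContinuation} usage={db}");
        return response;
    }
}

public static class CosmosClientFactory
{
    // endpoint and key are the Cosmos account values; log receives each summary line.
    public static CosmosClient Create(string endpoint, string key, Action<string> log)
    {
        return new CosmosClientBuilder(endpoint, key)
            .AddCustomHandlers(new RequestUnitHandler(log))
            .Build();
    }
}
```

A call such as CosmosClientFactory.Create(endpoint, key, Console.WriteLine) would then print one line per request. Passing a callback keeps the handler free of any particular logging library.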

Note #1 - Content

The Content Stream is null for GET requests which are generated by simple read-by-id queries, but is present for POST requests generated by more complex queries. I found the Content to actually be a MemoryStream containing the full query text. It can be extracted like this:

byte[] buff = new byte[request.Content.Length];
request.Content.Read(buff, 0, buff.Length);
request.Content.Position = 0; // rewind (MemoryStream is seekable) so the body can still be sent
string query = Encoding.UTF8.GetString(buff);
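A slightly more defensive variant is sketched below. It copes with the null Content of GET requests and puts the stream position back where it was. The class and method names (RequestContent.ReadAll) are hypothetical, and it assumes the body is UTF-8 text, which is what I observed.

```csharp
using System.IO;
using System.Text;

public static class RequestContent
{
    // Returns the text from the stream's current position to its end,
    // or null when there is no body (for example a GET read-by-id).
    public static string ReadAll(Stream content)
    {
        if (content == null) return null;

        // Remember where the stream was so the body can still be sent afterwards.
        long start = content.CanSeek ? content.Position : 0;
        using (var reader = new StreamReader(
            content, Encoding.UTF8, detectEncodingFromByteOrderMarks: false,
            bufferSize: 1024, leaveOpen: true))
        {
            string text = reader.ReadToEnd();
            if (content.CanSeek) content.Position = start;
            return text;
        }
    }
}
```

Inside the handler this would be called as RequestContent.ReadAll(request.Content) before awaiting the response.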

Note #2 - Usage

The database usage information is unfortunately embedded in a joined string of key=value pairs, but an easy way to extract the values is like this:

string s = response.Headers["x-ms-resource-usage"];
Match m = Regex.Match(s, @"documentsCount=(\d+)");
int docCount = int.Parse(m.Groups[1].Value);

Take care when referencing the Headers collection values, as the keys present in different requests are a little bit unpredictable. It's safest to use Headers.TryGetValue("key", out string value).
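If you want all of the usage values rather than just one, the whole header can be split into a dictionary. The sketch below assumes the pairs are separated by semicolons (something like documentsSize=1;documentsCount=3;), which is how the header has looked in my experience but is not guaranteed. The name ResourceUsage.Parse is made up.

```csharp
using System;
using System.Collections.Generic;

public static class ResourceUsage
{
    // Splits "key=value;key=value" into a dictionary; pairs without a
    // numeric value are skipped, and a null or empty header gives an empty result.
    public static Dictionary<string, long> Parse(string header)
    {
        var result = new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(header)) return result;

        foreach (string pair in header.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && long.TryParse(parts[1], out long value))
            {
                result[parts[0].Trim()] = value;
            }
        }
        return result;
    }
}
```

Combined with Headers.TryGetValue, a missing header or a missing key simply produces no entry instead of throwing.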

While I was experimenting with this code in my own hobby suite called Hoarder, it revealed subtle bugs which were causing bursts of duplicate queries. One bug was a race from different places to read some important records which were supposed to be cached, but multiple reads were running before any value was cached. It's interesting to watch the raw database query activity as your app runs. Bugs and performance problems can be quickly identified.
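For the cache race, one common remedy is to cache the in-flight Task instead of the finished value, so that concurrent callers share a single read. The sketch below shows that general technique and is not necessarily the fix used in Hoarder. SingleReadCache is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical cache: concurrent callers for the same key share one in-flight read.
public class SingleReadCache<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _entries =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();

    // read is the expensive database call; it is started at most once per key.
    public Task<T> GetAsync(string key, Func<string, Task<T>> read)
    {
        // GetOrAdd may construct a spare Lazy under contention, but only the
        // stored one is ever evaluated, so read runs once.
        Lazy<Task<T>> lazy = _entries.GetOrAdd(key, k => new Lazy<Task<T>>(() => read(k)));
        return lazy.Value;
    }
}
```

Note that a failed read is cached as well, so a real implementation would probably remove the entry when the task faults.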

At one point I saw a set of 6 duplicate queries and I thought it was a bug. It turns out my query was over all records, and what I was seeing was one initial query followed by 5 continuation queries. The request.Headers.ContinuationToken value tells you if a continuation query is running.

You can aggregate the useful information I've mentioned above and send it back to parent apps via a callback, or send it to your preferred logging library.
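As a rough idea of what the aggregation might look like, the sketch below groups captured entries by query text and totals the RUs, counting continuation requests separately so they are not mistaken for duplicates. RequestLogEntry and RequestLogSummary are invented names, and the output format is just one possibility.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical log record; the property names are illustrative.
public class RequestLogEntry
{
    public string Query { get; set; }          // null for GET read-by-id requests
    public double RequestCharge { get; set; }  // RUs from x-ms-request-charge
    public bool IsContinuation { get; set; }   // true when a continuation token was sent
}

public static class RequestLogSummary
{
    // Returns one line per distinct query, most expensive first.
    public static List<string> Summarise(IEnumerable<RequestLogEntry> entries)
    {
        return entries
            .GroupBy(e => e.Query ?? "(read by id)")
            .Select(g => new
            {
                Query = g.Key,
                Requests = g.Count(),
                Continuations = g.Count(e => e.IsContinuation),
                TotalRu = g.Sum(e => e.RequestCharge)
            })
            .OrderByDescending(s => s.TotalRu)
            .Select(s => s.Query + ": " + s.Requests + " requests (" + s.Continuations +
                         " continuations), " +
                         s.TotalRu.ToString("F2", CultureInfo.InvariantCulture) + " RU")
            .ToList();
    }
}
```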

Addendum: When the handler is hosted in a web app live in Azure I found that the request.Headers.ActivityId was sometimes null and other information was missing. I think it's safe to just skip logging in these cases, as an identical request will occur soon after with all the expected data available.


Sunday, September 12, 2021

Cosmos DB Client V3

The latest Cosmos DB V3 client library makes coding easier for .NET developers. The class model is simpler and more sensible, and the LINQ extensions let you write concise and type-safe queries.

In the old library, to get the Id and Name of every comedy CD in my library I would need to construct a query string like the following and pass it to a long complicated database query call:

SELECT c.Id, c.Name FROM c
  WHERE c.Type=2 AND
      c.Media="CD" AND
      ARRAY_CONTAINS(c.Genres, {Name:"Comedy"}, true)

I was using nameof() to help strongly-type the resulting string (not shown), but it was rather clumsy and verbose. Now I can code a strongly-typed LINQ statement with intellisense:

var query = container.GetItemLinqQueryable<Title>()
    .Where(t => t.Type == DocumentType.Title &&
        t.Media == "CD" &&
        t.Genres.Any(g => g.Name == "Comedy"))
    .Select(t => new { t.Id, t.Name });

The LINQ query expression tree is converted into an actual database query, and the results are returned, by:

var feed = query.ToFeedIterator();
return IterateFeedResults(feed);

The last method is a helper I made up, and it's a great practical example of how you can turn a loop of async calls into a return value which is a convenient async enumerable sequence:

async IAsyncEnumerable<T> IterateFeedResults<T>(FeedIterator<T> feed)
{
  while (feed.HasMoreResults)
  {
    foreach (var item in await feed.ReadNextAsync())
    {
      yield return item;
    }
  }
}
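The same loop-to-async-stream pattern works for any paged source, not just a FeedIterator. The sketch below uses a stand-in page fetcher (a delegate that returns null when no pages remain) so it runs without a Cosmos account, and shows the consuming side with await foreach. PagedSequence and its method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PagedSequence
{
    // fetchPage stands in for feed.ReadNextAsync(); it returns null when no pages remain.
    public static async IAsyncEnumerable<T> Flatten<T>(Func<Task<IReadOnlyList<T>>> fetchPage)
    {
        IReadOnlyList<T> page;
        while ((page = await fetchPage()) != null)
        {
            foreach (T item in page)
            {
                yield return item;
            }
        }
    }

    // Consumer side: await foreach pulls items one at a time as pages arrive.
    public static async Task<List<T>> CollectAsync<T>(IAsyncEnumerable<T> source)
    {
        var list = new List<T>();
        await foreach (T item in source)
        {
            list.Add(item);
        }
        return list;
    }
}
```

The caller never sees the paging. Each page is only fetched when the consumer has worked through the previous one.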

The logical structure of a Cosmos database is now more closely represented by the classes. The preamble to running queries is simply this:

var client = new CosmosClient(uri, dbkey);
var database = client.GetDatabase("MyDatabase");
var container = database.GetContainer("MyContainer");
var query = container.GetItemLinqQueryable(...);

For more information about how I enjoy using document databases like Cosmos DB, see my old blog post titled Collections Database History.

Monday, September 6, 2021

Catch Cosmos Client statistics

Skip this article and read the more detailed replacement above titled Cosmos DB Request Unit Logging.

This is another reminder to myself, but others may find it useful. You can add a custom handler to the CosmosClient to catch performance information and other statistics in the request headers, then save them for reporting. The technique is identical to adding a chain of handlers to the HttpClient class. You catch important information at a single function point and keep your code clean.

Construct a CosmosClientBuilder class with the endpoint and key to the Cosmos account. Use AddCustomHandlers to add an instance of a class derived from RequestHandler.

The derived class will override SendAsync, await the response and extract important information from the headers. Here is a dump of some GetItemAsync request headers from LINQPad.


Use a callback from the handler or some similar technique to extract the interesting values.

Thursday, September 2, 2021

Web API arbitrary response data

This is about .NET Framework, not .NET Core.

A couple of times a year I have to return a Web API response with a specific status code and serialized object in the body. Sounds easy, but every time this happens I spend 30-60 minutes searching until my fingers bleed for the simplest answer. I always find lots of stupid complex answers in forums, and the MSDN documentation leads you down all sorts of complicated options that involve formatters that assume a specific restrictive response type.

All you need to do is something like this example in a controller:

if (password != "XYZZY")
{
  return Content(
     HttpStatusCode.Forbidden,
     new MyErrorData() { Code = 123, Message = "Password failure" }
  );
}

You can specify any response status you like and put any object of your choice into the body and it will be serialized back to the client using the active formatter. So the object will return as XML or JSON or whatever.
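For context, here is roughly how that fragment might sit inside a complete controller. It is only a sketch: LoginController and the shape of MyErrorData are made up, and it assumes the .NET Framework Web API 2 package (System.Web.Http), where the action returns IHttpActionResult.

```csharp
using System.Net;
using System.Web.Http;

// Hypothetical error payload, for illustration only.
public class MyErrorData
{
    public int Code { get; set; }
    public string Message { get; set; }
}

// Hypothetical controller showing Content() with two different body types.
public class LoginController : ApiController
{
    [HttpPost]
    public IHttpActionResult Post([FromBody] string password)
    {
        if (password != "XYZZY")
        {
            // Any status code plus any serializable object.
            return Content(
                HttpStatusCode.Forbidden,
                new MyErrorData { Code = 123, Message = "Password failure" });
        }

        // The success path can use Content() as well, with a different body type.
        return Content(HttpStatusCode.OK, new { Welcome = true });
    }
}
```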

There are so many confusing pre-baked responses like Ok, BadRequest, etc, along with weird variations and overloads that you can't choose the one you need. It turns out that Content() is the simplest general-purpose way of sending a response.

Web API in .NET Core uses different techniques for what I've discussed here.