Handling Business Central Telemetry like a boss: my Azure Data Explorer dashboard – Pt. 2

If you didn’t read my first blog post from last year about this, you can find it here: Handling Business Central Telemetry like a boss: my Azure Data Explorer dashboard

This post is a follow-up.  Because – you know – sometimes – just sometimes – I put my money where my mouth is.  In my day-to-day job, the content of the above blogpost is pretty important.  I use my dashboard in Azure Data Explorer pretty much every day…

So you can imagine that the dashboard I talked about in the first post has evolved quite a lot.  You can still find the updated version in the same place, but just to be sure – this is the one: https://raw.githubusercontent.com/waldo1001/waldo.BCTelemetry/master/Azure%20Data%20Explorer/dashboard-waldo.BCTelemetry.json

Be careful: there is no way for you to “update” the dashboard.  If you import it in Azure Data Explorer, it will overwrite your current one, and you’ll lose it – the only option you get when importing is to replace the existing dashboard.

So – I’d suggest that you simply create a new dashboard, and go from there.

All instructions from the previous blogpost are still valid :-).  Use it.  Or at least use it as a reference to a bunch of KQL that you can copy from or get inspired by for your own dashboard!

In this blogpost, I just wanted to elaborate on a few things.

Page: Base Tables

On the left of the dashboard, you’ll find some pages – I use these to try to categorize the different tiles a bit.

The first page “Base Tables” is basically just an overview of some basic things:

  • What signals do I have, and how many of each signal?
  • Which customers (tenants & companies) are sending signals to this endpoint?
  • As many properties per company as I can get:
    • Is it SaaS?
    • How many signals per company?
    • What version are they on?
  • What Extensions from which publishers are sending signals to this Application Insights endpoint?

So, nothing more than the basics of the signals I have at hand.
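
To give you an idea (the tiles themselves are screenshots in the original post), here is a minimal KQL sketch of that kind of overview – assuming the standard Business Central custom dimensions eventId, aadTenantId and companyName:

    // Count the signals per event ID - swap the "by" clause for aadTenantId or
    // companyName to get the per-tenant / per-company variants of this tile.
    traces
    | extend eventId     = tostring(customDimensions.eventId),
             aadTenantId = tostring(customDimensions.aadTenantId),
             companyName = tostring(customDimensions.companyName)
    | summarize signals = count() by eventId
    | order by signals desc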

But at the end, you also see my “Sanity Check”.

Well – if you have had a look at the Base Query “allTraces“, then you’ve probably noticed it’s quite significant: lots of extends, lots of filters by lots of parameters in different ways.  The “Sanity Check” tile is merely for me to see whether I messed up the Base Query.  Or in other words: does my Base Query (allTraces) return as many rows as the plain “traces” table does?  In the screenshot you see it pretty much does!
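
As a rough sketch of what such a sanity check could look like (assuming allTraces is available as a base query or function in your dashboard):

    // Compare the row count of the Base Query with the raw traces table -
    // if the numbers are (roughly) equal, the Base Query didn't lose any rows.
    let baseQueryRows = toscalar(allTraces | count);
    let rawRows       = toscalar(traces | count);
    print baseQueryRows = baseQueryRows,
          rawRows       = rawRows,
          difference    = rawRows - baseQueryRows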

Makes sense?

Page: Analyze Usages

That’s another page I tend to look at a lot.

The intent of this page is merely to have an overview of the number of events I get per tenant.

For example, the very first graph shows the count of the signals per day.

You can see it’s quite stable – slower during the weekends, and at most 5M signals per day.
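
The query behind such a graph doesn’t have to be complicated – a minimal sketch (using the raw traces table instead of my allTraces base query):

    // Signals per day, rendered as a time chart.
    traces
    | summarize signals = count() by bin(timestamp, 1d)
    | render timechart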

You want this to be stable – and you want to notice when there are spikes at some point.

I mean .. once I had a customer who built a daily report in a reporting tool that, on its own, caused about 30 million signals 😱, which meant I got a multitude of signals every day, and costs rose significantly.  We don’t want that – we need to watch costs – and this is a graph that displays the stability of it ;-).  Tip: we took this query and made a Power BI metric of it, and this way we are tracking the telemetry usage for our company.

Another nice thing about this dashboard page is the ability to compare customers.  Like: which are the busiest customers (most signals in Application Insights), and so on.  A prerequisite, of course: you need to send the telemetry of all customers to one Application Insights endpoint.  Honestly, I wouldn’t have it any other way .. the ability to compare customers is invaluable!
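
A sketch of that kind of comparison – a plain count per tenant (in the dashboard, the aadTenantId is enriched with a readable customer name, see the TenantDescription parameter further down):

    // Which tenants send the most signals to this Application Insights resource?
    traces
    | extend aadTenantId = tostring(customDimensions.aadTenantId)
    | summarize signals = count() by aadTenantId
    | top 25 by signals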

Page: Job Queues

Yep – I’m skipping some pages because I just want to highlight the ones I think are most interesting (for me) – please feel free to explore the rest ;-).

“Job Queues” is a very recent page.  Who cares about Job Queues?  Well – I do!  Because Job Queues have a major impact on the performance of your environment.  The more frequently they run, or the longer the processes they run, the slower your database gets.  That’s simple physics ;-).

What I really like about this page is that it highlights failed (errored) Job Queues.  You don’t want too many of those!  Even better: when looking into the failures, there is a tile that shows you which Job Queue fails most, and why (stacktrace) 🤷‍♂️.
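
To sketch the idea behind that tile – note that the event ID and the dimension names below are assumptions on my part, so verify them against the job queue events in your own telemetry:

    // Which job queue object fails most, and why? The event ID "AL0000HE7"
    // ("job queue entry errored") and the dimension names alJobQueueObjectId and
    // alJobQueueStacktrace are assumptions - check the actual events in your data.
    traces
    | extend eventId = tostring(customDimensions.eventId)
    | where eventId == "AL0000HE7"
    | extend objectId   = tostring(customDimensions.alJobQueueObjectId),
             stacktrace = tostring(customDimensions.alJobQueueStacktrace)
    | summarize failures = count() by objectId, stacktrace
    | order by failures desc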

Another thing: I managed to get a glimpse of how long certain jobs were running, and at what time.

You can clearly see that there are indeed long running jobs, but they mostly run overnight.  The consultants did a nice job here 😉.

By hovering over the graph, you get more details, so getting to the actual job details is as easy as drinking milk.

Page: Perf: Slow SQL

Probably the one page I spend most time in.  Not that I want to – but, well, you know, sometimes there’s simply some stuff in BC that runs slow. Just sometimes.. /s

What you need to know about slow running SQL queries is that you get lots of metadata.  Most importantly:

  • How long did it take to run the query?
  • The AL callstack: which method caused the SQL query?

And when you take a close look at the callstack, you’ll realize that there are 2 very interesting things in there:

  • The “Source Process“: the bottom of the callstack represents the object / method that started the stack.
  • The “Slow Object“: the top of the callstack is where it all ended – which is the part that executed the SQL Query that seems to be slow.

So, by some “simple” string parsing, you can make summaries like:

  • What source process caused “slow running SQL” the most?
  • What objects are slowest?

In my case, about half of the slow queries start from the same process – and thanks to the stacktrace, it’s pinpointed as well.
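
To give you an idea of that string parsing, here’s a rough sketch (the dimension name alStackTrace is an assumption – check the long running SQL events, eventId RT0005, in your own data):

    // Take the top and the bottom line of the AL callstack of long running SQL
    // events (RT0005): the top is the "Slow Object", the bottom the "Source Process".
    traces
    | extend eventId = tostring(customDimensions.eventId)
    | where eventId == "RT0005"
    | extend stackLines = split(tostring(customDimensions.alStackTrace), "\n")
    | extend SlowObject    = tostring(stackLines[0]),
             SourceProcess = tostring(stackLines[array_length(stackLines) - 1])
    | summarize slowQueries = count() by SourceProcess, SlowObject
    | order by slowQueries desc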

Page: Lock Timeouts

The last page I’ll go into is Lock Timeouts.  Not that the others are not interesting – I just don’t want to turn this into a book ;-).

By default, Business Central handles long-lasting locks by showing a message along the lines of:

“The operation could not complete because a record in the [table name] table was locked by another user. Please retry the activity.”

So – because the process was locked for too long, it was automatically timed out – hence, the “Lock Timeout”.  In the mind of the user, sure, it’s an error – but it is not a bug ;-).  Locking is normal – we just have to make sure processes don’t lock too often nor too long.

So: what we need to look into is once again:

  • What “Source Process” is causing the lock
  • What “Locking Object” is doing the actual locking

And you get all that!  Both:

  • The process that had the error message and is rolling back
  • But more importantly: information about the process that was holding the lock!  In fact, event “RT0013” is sent to telemetry when a Lock Timeout occurs, giving information about the process that was holding the lock(s).

I created a simple table with an overview of which process and which object cause the most lock timeouts:
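
Since the screenshot doesn’t translate well here, a rough sketch of the query behind such a table (the dimension names alObjectName and alStackTrace are assumptions – check the RT0013 events in your own data):

    // Which process and which object cause the most lock timeouts, based on the
    // RT0013 "lock holder" information.
    traces
    | extend eventId = tostring(customDimensions.eventId)
    | where eventId == "RT0013"
    | extend stackLines = split(tostring(customDimensions.alStackTrace), "\n")
    | extend LockingObject = tostring(customDimensions.alObjectName),
             SourceProcess = tostring(stackLines[array_length(stackLines) - 1])
    | summarize lockTimeouts = count() by SourceProcess, LockingObject
    | order by lockTimeouts desc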

Pinpointing locks in a matter of seconds.  So so so valuable 🙂 🦚

The Pages with custom telemetry

You also see a bunch of “custom telemetry” pages:

They won’t do much for you .. yet.  Yet?  Yes indeed – I’m trying to make our company “Telemetry” app publicly available.  Because – there really isn’t much to it, but it’s super duper cool and useful for everyone.  Or at least in my opinion.  Let’s see ;-). 

Parameter: TenantDescription

Indeed – let’s talk parameters.  Parameters are the variables on top of the page that let you filter the tiles on the page.

The “TenantDescription” parameter is my solution to translate TenantIds into actual customer names.  I simply have a json file, which I read from within my telemetry queries, that contains a mapping like this:

All you need to do is create your own mapping file, place it somewhere public (Azure Blob Storage is possible as well, as explained by Kasper in this comment (thanks! ;-)), refer to it from the Base Query “entraTenantIdDescriptions“, and change the url in the KQL.

The main Base Query “allTraces” takes this into account like you see here:
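
Roughly, the two pieces look like this – a sketch, not the exact queries from the dashboard, with a placeholder URL and assumed column names:

    // The Base Query "entraTenantIdDescriptions": read the public mapping file.
    let entraTenantIdDescriptions =
        externaldata (aadTenantId: string, tenantDescription: string)
        [ "https://yourstorageaccount.blob.core.windows.net/public/tenantmapping.json" ]
        with (format = "multijson");
    // In "allTraces", join the mapping onto the telemetry.
    traces
    | extend aadTenantId = tostring(customDimensions.aadTenantId)
    | lookup kind=leftouter entraTenantIdDescriptions on aadTenantId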

So –  all you need to do is enjoy your queries which are now enriched with customer names!  Plus – you can filter on it!  Very easy to focus on one (or more) customers!

Parameter: CustomDimensionsContains

The last one I’ll explain is the parameter “CustomDimensionsContains“, which is exactly what it says: a filter on the Custom Dimensions.

When you, for example, see that you have some issues with the Item List, it’s easy to simply focus on any event that has “Item List” anywhere in the Custom Dimensions.  Or an object Id.  Or a certain error message.  Virtually anything in custom dimensions.

This has become my most used parameter on my dashboard, because it works for virtually any kind of filter. I’m actually thinking of adding a way to have multiple of these parameters.
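
The filter itself is as simple as it sounds – a sketch of the idea, with the parameter simulated by a let statement:

    // Keep only events whose custom dimensions contain the given text.
    // The isempty() check keeps the tiles working when the parameter is blank.
    let _customDimensionsContains = "Item List";   // in the dashboard: the parameter value
    traces
    | where isempty(_customDimensionsContains)
         or tostring(customDimensions) contains _customDimensionsContains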

More to come

I’ll leave it with this – because I need some sleep :-).  I can imagine there will be a “part 3”, as there is much more to tell :-).  I already started a new notebook – and who knows – I might be able to make our telemetry app available.  Can’t wait to share more about that!

Also – I’m doing …

Workshops!

And you can already register for them.

One workshop I’ll be doing in Bangkok.

More info here: Training: Business Central Telemetry

The other I’ll be doing in Antwerp.

More info here: https://www.bctechdays.com/

Original Post https://www.waldo.be/2025/02/28/handling-business-central-telemetry-like-a-boss-my-azure-data-explorer-dashboard-pt-2/
