Dynamics 365 workflows failure monitoring and alert with Microsoft Flow.

I’ve got a confession to make: I’m a bit in love with Microsoft Power Platform. Today it took me just 30 minutes to solve the issue that has been bugging one of our Dynamics 365 environments for quite some time.

For those of you that aren’t familiar with it, Power Platform is an app platform made up of three Microsoft BI services – Power BI, PowerApps and Flow that runs on top of Office 365 and Dynamics 365 (D365). The idea is to allow all users, no matter how much technical knowledge they have, to be able to get insight from their data.

But back to my problem. Some of our workflows were failing or being stalled due to an unknown reason (or reasons) on a very limited amount of runs.

It was having a pretty serious business impact:

  • Important email notifications were not being sent, but the business was not aware of this.
  • Certain records were not being updated or assigned to the right team members.
  • Work items were being lost and needed to be monitored manually. We couldn’t trust routing automation.

One solution to this problem would be to review the workflows and ensure they always work, but this was an imperfect fix. The same scenarios or service degradation could appear again months later without any warning.

So we needed to find a way to move from a manual review of the workflow issues to an automated one.

The answer was Dynamics 365 workflows failure monitoring and alert with Microsoft Flow.

How does this work?

We used Microsoft’s Common Data Service (CDS) to expose workflow logs – or so-called system jobs – which provided us with a super easy to get a list of records.

Here’s how you can easily do this, assuming you are working with Flow and CDS for the first time.

1. On the Office.com homepage select the Flow app.

2. On the Flow website go to the “My Flow” section and start by creating a new Flow.

3. Select a Flow type. I prefer to choose the “Scheduled” workflow to continually track the status.

4. Once Flow editor opens you can define frequency parameters.

I’ve set the timing for once a minute so I can instantly see results. After this initial period, it’s a good idea to save some computing power and to set it for once an hour, or even less frequent.

5. To define the first step for your flow, you simply need to click on “New step” button, and you can choose from hundreds of options.

I would highly recommend that anyone working with D365/PowerApps/Power Platform should learn Flow and know what kind of actions, functions, operations, and connectors they include.

For this particular job, I want to monitor the status of system jobs in my D365 production environment.

Here is how it looks in D365.

What I wanted to achieve here was to expose the records through CDS and make them accessible for Flow.

To add a CDS operation I need to filter content. To do this, I type “common data service” and the system then shows me the actions available through CDS. In this case, I want to get a List of Records, so I simply select “List records”.

6. Setting properties. Now I need to add the environment I want to connect, for example my D365 productions environment or even the dev environment it I want to test the whole idea and create a couple of failing workflows.

From the Entity Name I then select System Jobs.

7. I don’t want to get all the system job runs as there are thousands of them, and I only want to track failed ones. To do this, I set the Status Reason to “Waiting” and “Failed”.

I then open advanced settings to access the parameters.

While it is pretty intuitive to use “Filter Query” to get just the records we need, i you’ve never used the ODATA expression, you might want to read up on it before you start.

The expression I needed was “statuscode eq 31 or statuscode eq 10”.

Statuscode is the name of the field in Dynamics 365 that carries the status “Waiting” and “Failed”.

eq– is the expression for Equal.

31 and 10 – this one is a bit tricky if you have never worked with D365 APIs. This is an set of options in the D365 “Status Reason” that you have seen above, however D365 stores internal ID and labels for option set values, and through the API, it operates with option set IDs.

After another Google search I found out that the entity name is “asyncoperation” and from the asyncoperation EntityType documentation page I was able to get the details of “Status Reason” field, which is called “statuscode”.

The beauty of Flow is that it is perfect for building things and debugging.

As an example I’ve put single quotes for options set values ‘10’ instead of 10. This leads to flow failure, but instead of a long process of analysis and debugging, it only took a minute to see the response from D365 API and then Google the issue.

You can see each run of the flow, the status, where exactly it failed and the  parameters of the flow during each step. It’s really easy.

After that you get this scary message that might make you think this has all just been a useless waste of time.

{

“status”: 400,

“message”: “A binary operator with incompatible types was detected. Found operand types ‘Edm.Int32’ and ‘Edm.String’ for operator kind ‘Equal’.”,

“source”: “dynamicalex001.crm.dynamics.com”,

“errors”: [],

“debugInfo”: “clientRequestId: 1e3953ce-bb90-4627-a62e-dbb82d70c7eb”

}


But that’s why we have Google.

The very first result gives the solution – ODATA V4 does not require single quote for its parameters.

So, changing “statuscode eq ‘31’ or statuscode eq ‘10’”  to “statuscode eq 31 or statuscode eq 10” fixes the problem.

steps

Now we need to decide what to do with the data. As I like receiving email alerts between 3 a.m. and 4 a.m. that business critical software is down, I decided to send an email if the list is not empty, and also send the list itself.

To do this, I added a step to send an email. There are plenty of alternatives that allow you to get text messages on mobile or other devices.

Then I had an idea that I want to see the list of records. You can get this as a list in the email body, which I discarded as it was too big and practically useless, but the CSV file attached to this email notification provided an excellent snapshot into the issues with the workflows.

To get this snapshot, I had to add a simple step to create a CSV table, set everything to automatic and then use headers.

The CSV provides all the details I need.

9. The last step in testing if this works is to check there are no failed system jobs.

To do that I deleted all the “failed” and “waiting” resource system jobs from the instance I was checking. My Flow failed because it couldn’t generate and attach the CSV file, which is probably better than receiving email alerts on failures if there are no stalled workflows.

I then added a condition check after retrieving a list of system jobs. To do this, all you have to do is type “condition” or “control” into the filter box.

And here we need to check if the list is empty.

Click“Choose a value” in the first field then in the pop-up go to the Expression menu and start building your function. Click on “Empty” then open, and then go back to the dynamic value and insert the value of our “List of items”.

Then check if it is not equal to true.

Here we are asking the system if it is not empty. The answers will be yes (yes, I’m not empty = contains at least one failed system job) or no (No, I’m empty).

10. If we don’t have any failed or waiting system jobs we can just exit or cancel the flow and not send any messages. And if there is at least one record with a failed or waiting system job, we will receive an email alert with a CSV file attached containing all the details.

Leave a Reply

avatar

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of