Notifying when a server is offline based on when agents last added data to OMS
[Update 11/29/2017: This blog post series has been superseded by a solution built to visualize server and client information which is available at: http://blogs.catapultsystems.com/cfuller/archive/2017/11/28/updating-the-server-and-client-performance-solution-to-the-new-query-language/. Please note that query examples in this deprecated blog post are for the old query language and will not work in the current query language.]
The first part of this blog series showed how Microsoft has built a notification of which OMS agents are offline into its pre-built reports. This blog post will show how to use a query to find OMS agents which have not reported data into OMS for the last hour. These steps include:
- Querying OMS for agents that have not reported data
- Enabling the alerting preview
- Creating a dashboard for the query
- What does the email notification look like?
Querying for OMS agents that have not reported data in the last hour:
The concept here is that we are going to develop a query for computers which haven't reported data in the appropriate timeframe. We can build this starting from the UI, where this information is available under Settings / Connected Sources, by clicking on the number of servers connected:
This gives us the following query in my environment:
MG:"00000000-0000-0000-0000-000000000001" or MG:"00000000-0000-0000-0000-000000000002" | Measure Max(TimeGenerated) as LastData by Computer | Sort Computer
To make this generic we can replace the management groups with a wildcard as shown below.
MG=* | Measure Max(TimeGenerated) as LastData by Computer | Sort Computer
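To make the aggregation concrete, here is a minimal Python sketch of what "Measure Max(TimeGenerated) as LastData by Computer | Sort Computer" computes. It assumes each record is reduced to a hypothetical (computer, time_generated) pair with made-up sample data; it illustrates the grouping logic only and is not how OMS evaluates queries.

```python
from datetime import datetime

# Hypothetical sample records: (Computer, TimeGenerated).
# Real OMS records carry many more fields; only these two matter here.
records = [
    ("web01", datetime(2016, 1, 13, 9, 0)),
    ("sql01", datetime(2016, 1, 13, 8, 15)),
    ("web01", datetime(2016, 1, 13, 9, 45)),
    ("", datetime(2016, 1, 13, 9, 30)),  # a record with a blank Computer value
]


def last_data_by_computer(records):
    """Mimic 'Measure Max(TimeGenerated) as LastData by Computer | Sort Computer'."""
    last_data = {}
    for computer, time_generated in records:
        # Keep the most recent TimeGenerated seen for each computer.
        if computer not in last_data or time_generated > last_data[computer]:
            last_data[computer] = time_generated
    # Sort Computer: order the result rows by computer name.
    return sorted(last_data.items())


for computer, last in last_data_by_computer(records):
    print(repr(computer), last.isoformat())
```

Note that the blank computer name shows up as its own row, which is why the query is later filtered with Computer != "".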
Next we can restrict the query above to only show systems which haven’t reported data in the last hour.
MG=* | Measure Max(TimeGenerated) as LastData by Computer | where LastData<NOW-1HOURS
This shows the computers in each of the management groups which have not reported data into OMS in the past hour.
Finally, we remove the record with a blank computer name:
MG=* | Measure Max(TimeGenerated) as LastData by Computer | where LastData<NOW-1HOURS AND Computer != ""
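The two conditions in the where clause can be sketched the same way. This minimal Python version assumes a hypothetical last_data dictionary (computer name to most recent TimeGenerated), like the output of the Measure step, and a fixed "now" so the result is reproducible.

```python
from datetime import datetime, timedelta


def offline_computers(last_data, now, max_age=timedelta(hours=1)):
    """Mimic '| where LastData<NOW-1HOURS AND Computer != ""'.

    last_data maps computer name -> most recent TimeGenerated.
    """
    cutoff = now - max_age  # NOW-1HOURS
    return sorted(  # sorted only to make the output stable
        computer
        for computer, last in last_data.items()
        # Stale: the newest record is older than the cutoff.
        # Non-blank: drop the row with an empty Computer value.
        if last < cutoff and computer != ""
    )


# Hypothetical sample data.
last_data = {
    "web01": datetime(2016, 1, 13, 9, 45),
    "sql01": datetime(2016, 1, 13, 8, 15),
    "": datetime(2016, 1, 13, 7, 0),
}
now = datetime(2016, 1, 13, 10, 0)
print(offline_computers(last_data, now))  # ['sql01']
```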
Enable the alerting preview:
We can also alert on this condition by using the alerting preview. As of 1/13/2016 the alerting functionality is in preview state. To enable Alerting (if it's not already enabled), go to Settings / Preview Features (details are available at: http://blogs.technet.com/b/momteam/archive/2015/12/02/announcing-the-oms-alerting-public-preview.aspx).
When creating the alert, you need to provide at least a name for the alert, the query it uses, the schedule, a time window and the threshold that activates the alert. The highlighted example below shows whether the alert would actually fire (it would not, because the value was 0 when this test was done). To verify that the alert rule fires, the query results must meet the threshold criteria.
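The threshold check itself is simple. The following hypothetical Python sketch assumes the threshold is a comparison against the number of results the query returns (for example, greater than 0); the comparison names are illustrative and not taken from the OMS interface.

```python
import operator

# Comparisons an alert threshold might use (assumed for illustration).
COMPARATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
}


def alert_fires(result_count, comparison, threshold):
    """Return True when the query's result count meets the threshold criteria."""
    return COMPARATORS[comparison](result_count, threshold)


# With no offline agents the query returns 0 rows, so a "greater than 0"
# threshold does not fire, as in the test described above.
print(alert_fires(0, "greater than", 0))  # False
print(alert_fires(2, "greater than", 0))  # True
```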
On the Alerts tab in Settings we can see the alerts which were created:
Creating a dashboard item for the query:
It's simple to create a dashboard for this item by using the query provided earlier in this blog post (MG=* | Measure Max(TimeGenerated) as LastData by Computer | where LastData<NOW-1HOURS) and choosing the left-hand one of the two visualizations:
The dashboard shown below uses the query defined above to provide an easy-to-read visualization on your OMS dashboard. An environment with no agents having issues is shown below.
This is followed by an example where a single agent has not reported data for an hour.
What does the email notification look like?
Samples of the email notification are shown below. Note that the email notification includes at most 10 systems. A single-system result is shown below:
An email with two systems offline is shown below:
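The practical consequence of the 10-system limit is that the email cannot be treated as a complete list when many agents go offline at once. A hypothetical Python sketch of the split follows; the names and sample data are made up, and which ten systems a real email lists depends on result ordering, which is not modeled here.

```python
EMAIL_RESULT_LIMIT = 10  # the restriction described above


def split_for_email(offline, limit=EMAIL_RESULT_LIMIT):
    """Return (listed, omitted): computers that fit in the email and those that don't."""
    return offline[:limit], offline[limit:]


offline = [f"srv{i:02d}" for i in range(1, 13)]  # 12 hypothetical offline servers
listed, omitted = split_for_email(offline)
print(len(listed), omitted)  # 10 ['srv11', 'srv12']
```

When omitted is non-empty, the dashboard built on the same query is the place to see the full list.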
Please note, this alert will only fire if the computer has stopped responding within the last hour. If the server is already offline when the alert is created, an alert will not be generated for it at that point in time.
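One way to see why, assuming the alert rule only searches records inside its time window (an assumption about how the alert is evaluated, not something stated in this post): a computer whose last record is older than the window produces no row at all, so it can never match LastData<NOW-1HOURS. A minimal Python sketch with hypothetical data:

```python
from datetime import datetime, timedelta


def stale_in_window(records, now, window, max_age=timedelta(hours=1)):
    """Offline computers visible to a search limited to [now - window, now].

    Hypothetical model: the alert only sees records inside its time window.
    """
    start = now - window
    last_data = {}
    for computer, time_generated in records:
        # Records outside the time window are never searched.
        if not (start <= time_generated <= now):
            continue
        if computer not in last_data or time_generated > last_data[computer]:
            last_data[computer] = time_generated
    cutoff = now - max_age
    return sorted(c for c, last in last_data.items() if last < cutoff and c != "")


now = datetime(2016, 1, 13, 10, 0)
records = [
    ("web01", now - timedelta(minutes=10)),  # still reporting
    ("sql01", now - timedelta(minutes=90)),  # stopped 90 minutes ago
    ("old01", now - timedelta(hours=5)),     # offline long before the window
]
print(stale_in_window(records, now, window=timedelta(hours=2)))  # ['sql01']
print(stale_in_window(records, now, window=timedelta(hours=6)))  # ['old01', 'sql01']
```

Under the same assumption, a one-hour time window would never return anything, because every record inside it is at most an hour old.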
Summary: The approach in this blog post shows how to create a quick query which will alert you when any agents in the OMS subscription go offline. Once it is in place, it will provide a single notification every time there is a change in which systems are no longer reporting to OMS. This closely mirrors what Operations Manager does with heartbeats: it sends you a single notification that a list of agents is no longer reporting.
In the next blog post we will look into a method to use a query for a specific agent which is not reporting in OMS.