OpsMgr: Never close an alert for a monitor – the exception to the “Rule of the monitor” (#SCOM)
For background, Operations Manager generates alerts from two different types of monitoring: rules and monitors. A rule does not impact the health state of an entity; a monitor does. Alerts generated by monitors can automatically close when the issue identified by the alert has been resolved (there are types of alerts that do not auto-resolve, but we’ll leave those out of the discussion for now). As an example, when a low disk space situation occurs, the low disk space monitor changes state and that change of state generates an alert. If disk space is freed up, the alert is automatically closed by the monitor. In the case of a rule, if an alert is raised it will not close itself. For rules, the key is to look at the repeat count and last modified date to see if they are still occurring.
This brings us to what I am calling the “Rule of the monitor”:
- If an alert is generated by a rule and is not repeating, it can often be closed.
- If an alert is generated by a monitor, it should NOT be closed.
The reasoning is pretty straightforward: if the condition behind a rule-generated alert is still occurring, the rule will re-generate the alert when the condition is re-detected. A monitor, however, will NOT re-generate the alert after the alert has been manually closed. It will only generate a new alert when it changes to another state that raises an alert. As an example:
- Low disk space condition at a warning threshold – generates an alert
- If we were to close this alert, we would still have a low disk space condition, but we could only see it by reviewing the health state of the disk and/or server
- If the low disk space condition goes to a critical threshold – it generates an alert
- Or if the low disk space condition were to be resolved it would return to a healthy state. That does not generate an alert but it brings the monitor back to a state where it would generate an alert if it were to go to either warning or critical levels again.
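The sequence above can be sketched as a tiny state machine. This is only a simplified model of the behavior described here, not how OpsMgr is implemented: the `Monitor` and `Alert` classes, the `observe` method, and the state names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical state names for this sketch.
HEALTHY, WARNING, CRITICAL = "Healthy", "Warning", "Critical"


@dataclass
class Alert:
    state: str            # monitor state that raised this alert
    is_open: bool = True  # False once closed (manually or automatically)


@dataclass
class Monitor:
    state: str = HEALTHY
    alerts: List[Alert] = field(default_factory=list)

    def observe(self, new_state: str) -> Optional[Alert]:
        """Feed the monitor a detected state; return the alert raised, if any."""
        if new_state == self.state:
            # No state change, so no new alert, even if the
            # previous alert was closed by hand.
            return None
        self.state = new_state
        if new_state == HEALTHY:
            # Returning to healthy auto-closes whatever is still open.
            for alert in self.alerts:
                alert.is_open = False
            return None
        alert = Alert(state=new_state)
        self.alerts.append(alert)
        return alert


disk = Monitor()
first = disk.observe(WARNING)       # warning threshold: alert raised
first.is_open = False               # operator closes the alert manually
silent = disk.observe(WARNING)      # still low on space, but nothing is raised
escalated = disk.observe(CRITICAL)  # state change: a new alert is raised
disk.observe(HEALTHY)               # condition resolved, monitor is re-armed
rearmed = disk.observe(WARNING)     # alerts again on the next state change

print(silent)           # None
print(escalated.state)  # Critical
print(rearmed.is_open)  # True
```

The interesting line is `silent`: between the manual close and the next state change, the low disk space condition is only visible through the health state, not through an alert.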
So the reason not to close an alert raised by a monitor is that OpsMgr will not notify you again about the situation unless something changes the health state of that monitor (for example, the condition is resolved, the monitor is manually reset, or the entity goes into maintenance mode). Now that we’ve established the “Rule of the monitor”, let’s talk about the exception to it.
Exception to the Rule of the monitor
There are times when an alert which was generated by a monitor does not auto-close itself. I have seen this, for example, when agents were re-installed or when the agent’s cache was cleared. If you find an alert which is from a monitor, right-click on it and go to Health Explorer. If Health Explorer shows the monitor as green, the alert can be closed, because it does not represent the current health state of the monitor.
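Putting the rule and its exception together, the decision can be written down as a small helper. This is a sketch under assumptions: `AlertInfo` and `can_close` are hypothetical names, and the three fields stand in for what you would read from the alert (whether it came from a monitor, whether the repeat count and last modified date are still moving) and from Health Explorer (whether the monitor is currently green).

```python
from dataclasses import dataclass


@dataclass
class AlertInfo:
    is_monitor_alert: bool            # True if a monitor raised it, False for a rule
    still_repeating: bool = False     # rules: repeat count / last modified still changing
    monitor_is_healthy: bool = False  # monitors: Health Explorer shows green


def can_close(alert: AlertInfo) -> bool:
    """Return True if closing the alert is consistent with the 'Rule of the monitor'."""
    if not alert.is_monitor_alert:
        # Rule alerts: a non-repeating alert can often be closed, since the
        # rule raises it again if the condition is re-detected.
        return not alert.still_repeating
    # Monitor alerts: leave them alone, unless the monitor is already healthy
    # and the alert simply failed to auto-close (the exception).
    return alert.monitor_is_healthy


print(can_close(AlertInfo(is_monitor_alert=False)))                          # True
print(can_close(AlertInfo(is_monitor_alert=True)))                           # False
print(can_close(AlertInfo(is_monitor_alert=True, monitor_is_healthy=True)))  # True
```

A `True` for a rule alert only means closing is usually safe; it is still a judgment call for the operator.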
Summary: Never close an alert for a monitor, as it will not regenerate unless its state changes. The only exception to this rule is for alerts created by monitors which should have auto-closed but did not do so due to a technical issue. If the alert is not relevant in your environment, the better option is to tune the monitor: either disable it for the systems for which it is not relevant, or adjust its thresholds to better match your environment.