Using rules to generate alarms

In addition to generating real-time reports, rules can detect critical situations and generate alarms. In this example, the measures stored in the ReqStats table may reveal a critical situation: for example, the number of requests handled exceeds a critical limit, or the average response time suddenly degrades beyond an acceptable threshold. Let us assume that the average response time for requests should never exceed three hours. To generate an alarm when the average response time exceeds three hours, upgrade the monitoring rule in the following two steps. First, add a check statement to the infopad definition in the initialization section:

initialize {
    ReqStats := new infopad<cell{avgtime:int}>[4][3]("ReqStats");
    ReqStats.addColLabels("Request Origin", list{"Asia", "US", "Europe"});
    ReqStats.addRowLabels("Request Type", list{"ProductOrder", "Support",
        "ServiceOrder", "Training"});
    ReqStats.check("alarm_1", "avgtime", "GT", 3, "transition");
}

The rule associates an alarm condition with all cells of the Support row of the ReqStats infopad. The event generated by an alarm condition is defined by the check statement added to the infopad definition in the initialization section above. It tells BPM Events to trigger an internal event of type "BPEVENT_INFOPAD_ALARM" and of value "alarm_1" each time an update makes the average value ("avgtime") greater than ("GT") the threshold of three hours (3). No alarm is sent if the value was already above 3 before the update, a distinction specified by the keyword "transition".
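The "transition" behaviour described above can be illustrated outside BPM Events with a minimal Python sketch. The class TransitionCheck and its update method are hypothetical names introduced only for this illustration; they are not part of the BPM Events rule language. The sketch assumes that a cell starts in the "not above threshold" state, so a first value above 3 counts as a transition.

```python
from dataclasses import dataclass


@dataclass
class TransitionCheck:
    """Hypothetical model of a "GT" check with "transition" semantics."""

    threshold: float
    above: bool = False  # assumed initial state: not above the threshold

    def update(self, value: float) -> bool:
        """Return True only when the value moves from <= threshold to > threshold."""
        now_above = value > self.threshold
        fired = now_above and not self.above
        self.above = now_above
        return fired


check = TransitionCheck(threshold=3)
averages = [2.5, 3.2, 3.6, 2.9, 3.1]  # successive values of avgtime, in hours
fired = [check.update(v) for v in averages]
print(fired)  # [False, True, False, False, True]
```

The alarm fires at 3.2 and again at 3.1, because each follows a value at or below the threshold. It does not fire at 3.6, because the average was already above 3.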

Add a rule, named alarm_timeReq, to handle the generated internal alarm event, and transform it into an external event with appropriate data, which in this example consists of sending an e-mail.

Note: In the alarm_timeReq e-mailing rule, the region identification is obtained from the corresponding column of the infopad, as reported by the alarm event, and is concatenated (operator "+") to the message in the second argument to form the body of the e-mail.

However, you can further improve this monitoring set-up by handling the following scenario: what if the very first response time used to calculate the Support average response time in the ReqStats table is 4 hours? An alarm would be triggered because of a single odd value. Because the alarm condition is associated with an average, it should not fire for a single odd measure, as can happen at the beginning of a monitoring session when very few values contribute to the average. The alarm should instead consider only average values calculated from a significant number of support requests. Let us say that at least 100 requests of type Support must be logged for the average response time to be meaningful. We improve the alarm_timeReq rule by adding this condition:

rule alarm_timeReq
activated by evt1 of BPEVENT_INFOPAD_ALARM::alarm_1
if (evt1.count >= 100)
then {
    sendMail("john@acme.com",
        "response time too long for support in region:" + evt1.column,
        evt1);
}

The event generated by an alarm condition automatically contains all the attributes of the faulty element of the infopad (here, the attributes of the element whose average time exceeded the three-hour limit). One of these attributes is the request count (evt1.count). Another is the column ID of the offending cell (evt1.column), which represents the region where the request originates.
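The filtering performed by the alarm_timeReq rule can be sketched in Python as follows. This is an illustration under stated assumptions, not BPM Events code: AlarmEvent, handle_alarm and MIN_REQUESTS are hypothetical names, and the column attribute is assumed to carry a region label, whereas the actual event may report a column ID in another form.

```python
from dataclasses import dataclass
from typing import Optional

MIN_REQUESTS = 100  # minimum number of requests for the average to be meaningful


@dataclass
class AlarmEvent:
    """Hypothetical stand-in for the alarm event and the cell attributes it carries."""

    column: str     # assumed to be the region label of the offending cell
    count: int      # number of requests behind the average
    avgtime: float  # average response time, in hours


def handle_alarm(evt: AlarmEvent) -> Optional[str]:
    """Return the e-mail body if the average is representative, otherwise None."""
    if evt.count >= MIN_REQUESTS:
        return "response time too long for support in region:" + evt.column
    return None


early = AlarmEvent(column="Asia", count=1, avgtime=4.0)
late = AlarmEvent(column="Europe", count=250, avgtime=3.4)
print(handle_alarm(early))  # None
print(handle_alarm(late))   # response time too long for support in region:Europe
```

The first event is ignored because a single 4-hour request is not representative. The second produces a message because 250 requests contribute to the average.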