Test databases

The use of a test database is another common method for testing rules (or any kind of business logic, for that matter). The idea is to build a large number of test cases, with carefully chosen data, and determine what the correct system response should be for each case.

Then, the test cases are processed by the logical system and output is generated. Finally, the expected output is compared to the actual output, and any differences are investigated as possible logical bugs.
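The compare step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Ruletest facility in Studio: the `Case` record, the `find_mismatches` helper, and the toy `classify` function standing in for the logical system are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Case:
    """One row of a test database: inputs plus the outcome we expect."""
    name: str
    inputs: dict
    expected: Any

def find_mismatches(system: Callable[[dict], Any], cases: list) -> list:
    """Run every case through the system and collect (name, expected, actual)
    for each case whose actual output differs from its expected output."""
    mismatches = []
    for case in cases:
        actual = system(case.inputs)
        if actual != case.expected:
            mismatches.append((case.name, case.expected, actual))
    return mismatches

# Hypothetical stand-in for the logical system under test.
def classify(inputs: dict) -> str:
    return "even" if inputs["n"] % 2 == 0 else "odd"

cases = [
    Case("case 1", {"n": 2}, "even"),
    Case("case 2", {"n": 3}, "odd"),
    Case("case 3", {"n": 4}, "odd"),  # deliberately wrong expectation
]

mismatches = find_mismatches(classify, cases)
for name, expected, actual in mismatches:
    print(f"{name}: expected {expected!r}, got {actual!r}")
```

Each reported difference is a candidate bug in the logic or in the expectation itself, so it still has to be investigated by hand.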

Let's construct a very small test database with only a few test cases, determine our expected outcomes, then run the tests and compare the results. We want to ensure that our rules execute properly for all cases that might be encountered in a real-life production system. To do this, we must create a set of cases that includes all such possibilities.

In our simple example of two rules, this is a relatively straightforward task:

| Condition | Smoker (smoker = true) | Non-Smoker (smoker = false) |
|-----------|------------------------|-----------------------------|
| Age <= 55 |                        |                             |
| Age > 55  |                        |                             |

In this table, we have assembled a matrix using the Values sets from each of the Conditions in our rules. By arranging one set of values in rows, and the other set in columns, we create the Cross Product (also known as the direct product or Cartesian product) of the two Values sets, which means that every member of one set is paired with every member of the other set. Since each Values set has only two members, the Cross Product yields 4 distinct combinations of members (2 multiplied by 2). Each combination is represented by the intersection of a row and a column in the table above. Now let's fill in the table using the expected outcomes from our rules.
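The same pairing can be generated mechanically. The sketch below uses Python's standard `itertools.product`. The variable names are illustrative, and the numbering of the two middle cases is an assumption; only cases #1 and #4 are tied to specific cells in the text.

```python
from itertools import product

# Values sets taken from the two Conditions in the table above.
age_values = ["age <= 55", "age > 55"]
smoker_values = [True, False]

# Every member of one set paired with every member of the other.
combinations = list(product(age_values, smoker_values))

for number, (age, smoker) in enumerate(combinations, start=1):
    print(f"case {number}: {age}, smoker = {smoker}")

print(len(combinations))  # 2 * 2 = 4
```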

Rule 1, the age rule, is represented by row 1 in the table above. Recall that rule 1 deals exclusively with the age of the applicant and is not impacted by the applicant's smoker value. To put it another way, the rule produces the same outcome regardless of whether the applicant's smoker value is true or false. Therefore, the action taken when rule 1 fires (riskRating is assigned the value of low) should be entered into both cells of row 1 in the table, as shown:

Figure 218. Rule 1 Expected Outcome

Likewise, rule 2, the smoker rule, is represented by column 1 in the table above (All Combinations of Conditions in Table Form). The action taken if rule 2 fires (riskRating is assigned the value of high) should be entered into both cells of column 1, as shown:

Figure 219. Rule 2 Expected Outcome

The table format illustrates the fact that a complete set of test data should contain four distinct cases (each cell corresponds to a case). Rearranging, our test cases and expected results can be summarized as follows:

Figure 220. Test Cases Extracted from Cross Product

The table format also highlights two problems we encountered earlier with flowcharts. In the figure Rule 2 Expected Outcome, row 1 and column 1 intersect in the upper left cell (this cell corresponds to test case #1 in the figure above). As a result, each rule tries to assert its own action – one rule assigns a low value, and the other rule assigns a high value. Which rule is correct?

Logically speaking, they both are. But if the rule analyst had a business preference, it was certainly lost in the implementation. As before, we simply can't tell by the way the two rules are expressed. Logical conflict reveals itself once more.

Also notice the lower right cell (corresponding to test case #4) – it is empty. The combination of age>55 AND non-smoker (smoker=false) produces no outcome because neither rule deals with this case – the logical incompleteness in our business rules reveals itself once more.
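Both problems can be found by evaluating every rule against every cell of the table. The following is a minimal sketch of that check, not how Studio performs its analysis. The rule encoding, the helper names, and the representative ages 40 and 60 are assumptions made for illustration.

```python
from itertools import product

# The two rules as described in the text: (name, condition, riskRating).
RULES = [
    ("age rule", lambda age, smoker: age <= 55, "low"),
    ("smoker rule", lambda age, smoker: smoker, "high"),
]

# One representative age per table row (the specific ages are assumptions).
REPRESENTATIVE_AGES = {"age <= 55": 40, "age > 55": 60}

def outcomes(age, smoker):
    """Return the set of riskRating values asserted by all rules that fire."""
    return {rating for _, fires, rating in RULES if fires(age, smoker)}

report = {}
for (label, age), smoker in product(REPRESENTATIVE_AGES.items(), [True, False]):
    ratings = outcomes(age, smoker)
    if len(ratings) > 1:
        status = "conflict"      # two rules assert different values
    elif not ratings:
        status = "incomplete"    # no rule covers this combination
    else:
        status = "ok"
    report[(label, smoker)] = (status, sorted(ratings))
    print(f"{label}, smoker = {smoker}: {status} {sorted(ratings)}")
```

The upper left cell is reported as a conflict and the lower right cell as incomplete, matching the two observations above.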

Before we deal with the logical problems discovered here, let's build a Ruletest in Studio that includes all four test cases in the figure above.

Figure 221. Inputs and Outputs of the 4 Test Cases

Let's look at the test case results in the figure above. Are they consistent with our expectations? With a minor exception in case #1, the answer is yes. In case #1, riskRating has been assigned the value of high. But also notice the rule statements posted: case #1 has produced two messages which indicate that both the age rule and the smoker rule fired as expected. But since riskRating can hold only one value, the system non-deterministically (at least from our perspective) assigned it the value of high.

So if using test cases works, what is wrong with using them as part of our Analysis methodology? Let's look at the assumptions and simplifications made in the previous example:

1. We are working with just two rules with two Conditions. Imagine a rule pattern comprising three Conditions – our simple 2-dimensional table expands into three dimensions. This may still not be too difficult to work with as some people are comfortable visualizing in three dimensions. But what about four or more? It is true that large, multi-dimensional tables can be flattened and represented in a 2-D table, but these become very large and awkward very quickly.

2. Each of our rules contains only a single Conditional parameter limited to only two values. Each also assigns, as its Action, a single parameter which is also limited to just two values.

When the number of rules and/or values becomes very large, as is typical with real-world business decisions, the size of the Cross Product rapidly becomes unmanageable. For example, a set of only six Conditions, each choosing from only ten values, produces a Cross Product of 10^6, or one million, combinations. Manually analyzing a million combinations for conflict and incompleteness is tedious and time-consuming, and still prone to human error.
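The growth is simple multiplication: the size of the Cross Product is the product of the sizes of the Values sets. A short sketch, where `cross_product_size` is a hypothetical helper name:

```python
from math import prod

def cross_product_size(values_per_condition):
    """Multiply the number of values available to each Condition."""
    return prod(values_per_condition)

print(cross_product_size([2, 2]))    # the two-rule example: 4
print(cross_product_size([10] * 6))  # six Conditions of ten values: 1000000
```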

In many cases, the potential set of cases is so large that few project teams take the time to rigorously define all possibilities for testing. Instead, they often pull test cases from an actual database populated with real data. If this occurs, conflict and incompleteness may never be discovered during testing because it is unlikely that every possible combination will be covered by the test data.
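One way to see the coverage gap is to compare the combinations present in the sampled data against the full Cross Product. The sketch below assumes the production rows have already been bucketed into the same condition values used in the table. The `uncovered` helper and the sample rows are hypothetical.

```python
from itertools import product

def uncovered(value_sets, test_rows):
    """Return the combinations in the Cross Product that no test row exercises."""
    all_combinations = set(product(*value_sets))
    return sorted(all_combinations - set(test_rows))

value_sets = [["age <= 55", "age > 55"], [True, False]]

# Hypothetical rows pulled from production data, bucketed by condition value.
sampled_rows = [
    ("age <= 55", False),
    ("age > 55", True),
    ("age <= 55", False),
]

missing = uncovered(value_sets, sampled_rows)
for age, smoker in missing:
    print(f"not covered: {age}, smoker = {smoker}")
```

In this sample the two uncovered combinations are exactly the conflicting cell and the empty cell from the two-rule example, so a test run on these rows would pass without revealing either problem.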