SQL Development

SQL use of index statistics

SQL uses new index statistics whenever they exist; statistics are not created automatically. Index statistics are used to estimate costs for "=" predicates and for joins on "=" predicates. In these situations, the column selectivity statistics are no longer used. The statistics counts are used to estimate the number of rows that will be read through an index for a given set of key component values, as shown:
#rows = table cardinality / index count of unique values
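This formula can be sketched directly; the table cardinality and unique-value count below are hypothetical illustration values, not from any real database:

```python
# Sketch of the row-count estimate SQL derives from index statistics.
# All figures here are invented for illustration.

def estimate_rows(table_cardinality: int, unique_key_values: int) -> float:
    """#rows = table cardinality / index count of unique values."""
    return table_cardinality / unique_key_values

# A 1,000,000-row table whose index holds 50,000 distinct key values:
# an "=" predicate on that key is estimated to read about 20 rows.
print(estimate_rows(1_000_000, 50_000))  # 20.0
```

The smaller the estimate, the cheaper the optimizer considers the index access, which is why an accurate unique-value count matters.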
The statistics count used depends on the number of key components that have corresponding predicates. If an index has more than three key components and the query needs a count for a key subset that has no explicit count, SQL uses interpolation to produce an estimated count. For instance, suppose an index has six components and there are predicates for the first three components only. SQL then uses the new index counts to interpolate an estimate for three components. SQL uses linear interpolation, which in effect draws a straight line through the stored statistics counts and estimates where the count would fall for only three components. This is, of course, an estimate, and subject to some inaccuracy.
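The linear interpolation described above can be sketched as follows. The stored counts, and the assumption that explicit counts exist only at certain component depths, are hypothetical illustration values, not OpenEdge internals:

```python
# Hypothetical sketch of interpolating a unique-value count for a key
# subset with no explicitly stored count. Assume, for illustration only,
# that counts are stored for 1 leading component and for all 6.

def interpolate_count(stored: dict[int, int], n_components: int) -> float:
    """Linearly interpolate a unique-value count between the two
    stored counts that bracket n_components."""
    lo = max(k for k in stored if k <= n_components)
    hi = min(k for k in stored if k >= n_components)
    if lo == hi:                       # an explicit count exists
        return float(stored[lo])
    frac = (n_components - lo) / (hi - lo)
    return stored[lo] + frac * (stored[hi] - stored[lo])

# 100 distinct values over 1 component, 600 over all 6 components:
stored = {1: 100, 6: 600}
# Estimated distinct values for predicates on the first 3 components:
print(interpolate_count(stored, 3))  # 300.0
```

The straight-line assumption is exactly why the result is only an estimate: real unique-value counts rarely grow linearly with the number of key components.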
In the majority of cases, the new index statistics outperform the older statistics and their associated rules.
This means queries are much more likely to use the best index. Therefore, whenever there is a problem with index performance, or with the optimizer's choice of an index, create new index statistics. Note that the algorithms used for cost estimating with new index statistics assume a relatively uniform data distribution. If the data distribution is highly skewed, so that some key values have many more instances than others, cost estimates will be less accurate.
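To see why skew makes the estimate less accurate, consider a hypothetical indexed column (the data below is invented for illustration):

```python
from collections import Counter

# Two hypothetical 1,000-row columns, each with 10 distinct values.
uniform = [i % 10 for i in range(1000)]          # every value in ~100 rows
skewed = [0] * 910 + list(range(1, 10)) * 10     # value 0 dominates

for name, data in [("uniform", uniform), ("skewed", skewed)]:
    counts = Counter(data)
    # The statistics-based estimate: cardinality / distinct values.
    estimate = len(data) / len(counts)
    actual_for_0 = counts[0]
    print(f"{name}: estimated {estimate} rows per key, "
          f"actual rows for key 0 = {actual_for_0}")
```

Both columns yield the same estimate of 100 rows per key value, but in the skewed column an "=" predicate on the dominant value actually reads 910 rows, so the optimizer may badly underestimate the cost of that index access.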