jump to navigation

Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) October 6, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.

As many have identified, the first thing to point out is that the two queries are not exactly equivalent.

The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <‘ predicate. The additional equal conditions at each end is significant.

The selectivity of the original query is basically costed as  (max condition – min condition)/(max column value – min column value). As this equates to 1 day/2000 days, the selectivity of the first query is therefore 0.0005.

1M rows multiplied by 0.0005 = 500 rows, the CBO cardinality estimate.

Note that 0.0005 is also the selectivity of 1 day.

The  calculation for the BETWEEN clause is different. It’s the above formula PLUS the selectivity of 1 day for the each of the two equal conditions (as the CBO assumes you not only want the values within the range but the specific values on each side of the range).

So that’s a selectivity of 0.0005 for the date range as above plus 2 x 0.0005 for each of the equal conditions = 0.0005 + 0.001 = 0.0015.

1M muliplied by 0.0015 = 1500, the CBO cardinality estimate of the second query.

As many have mentioned in the comments, the Clustering Factor of this index is awful, as each of the distinct 500 occurences of each day is spread evenly throughout the whole table. It’s therefore very expensive to use this index to retrieve a “larger” number of rows.

The cost of the first query that used the index is 505, about as bad as it gets when retrieving 500 rows. If we were to retrieve not 500 but 1500 rows, then the cost would effectively triple and be approximately 1500.

However, the cost of the FTS as highlighted in the second query is 933. This is less than 1500 and so the FTS is prefered by the CBO in the second query.

It all comes down to the relative costs and these all come down to the estimated selectivity of the query and the associated segment (and system) statistics, of which the Clustering Factor of the index is one of the most significant factors of all (no pun intended of course). If we effectively triple the estimated costs of a query as we do with the second query, then this can obviously have an impact on the CBO calculations and the resultant execution plan.

If we were to rewrite the first query to be equivalent to using the BETWEEN:

SQL> select * from bowie where hist_date >= '01-JAN-2011' and hist_date <= '02-JAN-2011';
500 rows selected.

Execution Plan
Plan hash value: 1845943507
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
Predicate Information (identified by operation id):
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

then we now get the same FTS costings and execution plan.

The moral of this story is that using a BETWEEN for a small range can significantly increase the cost of the query. Something to be aware of if perhaps close enough is good enough.