## Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) October 6, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.

As many have identified, the first thing to point out is that the two queries are not exactly equivalent.

The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <‘ predicate. The additional equal conditions at each end is significant.

The selectivity of the original query is basically costed as  (max condition – min condition)/(max column value – min column value). As this equates to 1 day/2000 days, the selectivity of the first query is therefore 0.0005.

1M rows multiplied by 0.0005 = 500 rows, the CBO cardinality estimate.

Note that 0.0005 is also the selectivity of 1 day.

The  calculation for the BETWEEN clause is different. It’s the above formula PLUS the selectivity of 1 day for the each of the two equal conditions (as the CBO assumes you not only want the values within the range but the specific values on each side of the range).

So that’s a selectivity of 0.0005 for the date range as above plus 2 x 0.0005 for each of the equal conditions = 0.0005 + 0.001 = 0.0015.

1M muliplied by 0.0015 = 1500, the CBO cardinality estimate of the second query.

As many have mentioned in the comments, the Clustering Factor of this index is awful, as each of the distinct 500 occurences of each day is spread evenly throughout the whole table. It’s therefore very expensive to use this index to retrieve a “larger” number of rows.

The cost of the first query that used the index is 505, about as bad as it gets when retrieving 500 rows. If we were to retrieve not 500 but 1500 rows, then the cost would effectively triple and be approximately 1500.

However, the cost of the FTS as highlighted in the second query is 933. This is less than 1500 and so the FTS is prefered by the CBO in the second query.

It all comes down to the relative costs and these all come down to the estimated selectivity of the query and the associated segment (and system) statistics, of which the Clustering Factor of the index is one of the most significant factors of all (no pun intended of course). If we effectively triple the estimated costs of a query as we do with the second query, then this can obviously have an impact on the CBO calculations and the resultant execution plan.

If we were to rewrite the first query to be equivalent to using the BETWEEN:

```SQL> select * from bowie where hist_date >= '01-JAN-2011' and hist_date <= '02-JAN-2011';

500 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
3344  consistent gets
0  redo size
5563  bytes sent via SQL*Net to client
395  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
500  rows processed```

then we now get the same FTS costings and execution plan.

The moral of this story is that using a BETWEEN for a small range can significantly increase the cost of the query. Something to be aware of if perhaps close enough is good enough.

1. Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) « Ukrainian Oracle User Group - October 7, 2011

[…] Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) October 6, 2011 […]

Like

2. Rick - February 1, 2012

Maybe I missed something but why do you say:
—–
The moral of this story is that using a BETWEEN for a small range can significantly increase the cost of the query. Something to be aware of if perhaps close enough is good enough.
—–
Didn’t you actually prove that when the queries are the same that you ‘now get the same FTS costings and execution plan’?

How does that show any issue related to the use of BETWEEN? I agree that many uses doen’t realize that the two queries actually are different because between includes the endpoints but that isn’t because BETWEEN was used it is because the user didn’t use BETWEEN to create the same query.

Most of the time I see users using >= and = TRUNC(SYSDATE – 1) and < TRUNC(SYSDATE)

and the other does

myDate BETWEEN TRUNC(SYSDATE – 1) and TRUNC(SYSDATE)

Then the next night the process is repeated and the BETWEEN user gets any data that has a time of 00:00:00 included twice. Causes a nasty, hard to find, data error that seems to be random. It is in that the time has to be exactly 0 and that doesn't happen often.

Good article, though.

Like

Richard Foote - February 8, 2012

Hi Rick

I might not have been clear (nothing unusual there) but the point I was trying to make is that the use of a between will increase the cardinality of the query equivalent to 2 additional values which might not be in true proportion to the overall increase of the range of possible values selected with a BETWEEN.

If “close enough is good enough” in terms of what data you might be after (for example, a range of possible values might not necessarily have to be inclusive in terms of what data gets viewed), then it could well be significantly less expensive to avoid a between clause from the CBO point of view such that it chooses a different execution plan.

That’s all 🙂

Like