jump to navigation

Why Is My Index Not Being Used No. 2 Quiz (Quicksand) October 3, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
trackback

I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column.
 

SQL> create table bowie (id number, hist_date date, text varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, sysdate-mod(rownum, 2000), 'BOWIE' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_date_i on bowie(hist_date);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
 
 
OK, I now select 1 day’s worth of data:

 
 

SQL> select * from bowie where hist_date > '01-JAN-2011' and hist_date < '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 690852991
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   500 |  9500 |   505   (0)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   500 |  9500 |   505   (0)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        506  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
 
Everything is perfect. The index is used and the cardinality estimate is spot on with the CBO correctly predicting that 500 rows will be returned.
 
 
OK, I now re-write the query with a BETWEEN clause:
 
 

SQL> select * from bowie where hist_date between '01-JAN-2011' and '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
And now everything has gone wrong. I’m still getting the same 500 rows but the CBO is choosing an inefficient FTS. The estimates are now way way off, with the CBO expecting 1500, not 500 rows to be returned.
 
 
QUESTION: Why has everything now gone so terribly wrong ?

Comments»

1. Dom Brooks - October 3, 2011

We know that the sql statements are not the same.

“BETWEEN … AND …” is not semantically equivalent to “> AND = AND <=".

Now that difference is enough to adjust the selectivity from 1/1999 to 3/1999 even though we know that that's not really the case.

Like

Dom Brooks - October 3, 2011

Sorry – stuffed up the formatting.
Let me try again.
Meant to say:
“BETWEEN … AND …” is not semantically equivalent to “> … AND < …” but to “>= …. AND <= ….”.

Like

Dom Brooks - October 4, 2011

Of course, the data is so “evenly distributed” that the clustering factor of the index is dreadful.

Like

Dom Brooks - October 4, 2011

…”evenly distributed” as in the date for a particular day only repeats every 2000 rows.

Like

2. Craig Simpson - October 3, 2011

Ok, so using the between operator instead of switches the query from using bounded open predicates to bounded closed predicates.

This has the effect of altering the estimated cardinality from 500 to 1500 = (500 for the range + 500 for the bounded start predicate + 500 for the bounded end predicate)

this raises the cost of the statement from ~500 to ~1500
meaning that it is costed to be more efficient to run a full table scan at a cost of 933 rather than choose the index at a cost ~ 1500

Like

3. Saurabh Manroy - October 3, 2011

The cardinality estimation is different because between is re-written as = .. which means end points are included in range. Due to this, we need to include the enpoint selectivity (1/2000) as well in estimation of overall cardinality.

Like

Saurabh Manroy - October 3, 2011

read “=” in above as “>= and <=" (formatting problems).

Like

4. Asif Momen - October 3, 2011

I am on 11.2.0.2 and Optimizer seems to be smart enough to use the index.

 
SQL> select * from bowie where hist_date between '01-JAN-2011' and '02-JAN-2011';

500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 690852991

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   600 | 11400 |   605   (0)| 00:00:08 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   600 | 11400 |   605   (0)| 00:00:08 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |   600 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"
SQL>

What version are you on ?

Like

5. Tony Sleight - October 3, 2011

The NLS_DATE_FORMAT plays a part in this example. The example works if you are using the default NLS_DATE_FORMAT for the implicit date conversion.
I agree with the above example using 11.2.0.2 if an

alter session set NLS_DATE_FORMAT = ‘DD-MON-YYYY’;

is issued.

TESTDB01 USER1> alter session set nls_date_format = 'DD-MON-YYYY';

Session altered.

TESTDB01 USER1> SELECT value FROM v$nls_parameters WHERE parameter ='NLS_DATE_FORMAT';

VALUE
----------------------------------------------------------------
DD-MON-YYYY

TESTDB01 USER1> set autotrace traceonly
TESTDB01 USER1> select * from bowie where hist_date > '01-JAN-2011' and hist_date < '02-JAN-2011';

500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 690852991

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   500 |  9500 |   512   (2)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   500 |  9500 |   512   (2)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
        129  recursive calls
          0  db block gets
        554  consistent gets
          0  physical reads
          0  redo size
      16852  bytes sent via SQL*Net to client
        779  bytes received via SQL*Net from client
         35  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
        500  rows processed

TESTDB01 USER1> select * from bowie where hist_date between '01-JAN-2011' and '02-JAN-2011';

500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |  1321  (34)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |  1321  (34)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
         87  recursive calls
          0  db block gets
       3363  consistent gets
       3314  physical reads
          0  redo size
       9773  bytes sent via SQL*Net to client
        779  bytes received via SQL*Net from client
         35  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        500  rows processed

TESTDB01 USER1> set autotrace off

Like

Tony Sleight - October 3, 2011

Sorry, my mistake, my laptop is on 11.2.0.1!

However, the NLS_DATE_FORMAT does need to be the same as the string you are passing in order for the example to work.

I’m just trying to work out why the cardinality estimates are so different and make such a plan dramatic change. The problem I have is that for the full table scan, 1500/1000000 is 0.15% of the records, whilst, for the index 500/1000000 is 0.05%. I would not have thought that was sufficient to change from index to FTS. It is as if the index is just not being considered at all.

I’m going to create a 10053 trace to see what it says..

Like

Tony Sleight - October 4, 2011

As Dom says above, the clustering factor of this index is bad (1000000 = number of rows in table), so it is not a good candidate to start with.

The selectivity changes by the addition of 1/num_distinct from the open ranges to the closed ranges.

num_distinct = 2000 for this table.

So, in moving from open bounded ranges to closed bounded ranges the selectivity changes by the addition of 1/2000 + 1/2000. For the 1000000 row table this increases cardinality by 1000000 * 2/2000 = 1000. This is where the estimated cardinality of 1500 comes from.

The optimiser therefore predicts a cost of 1507 which is just enough to put it over the full table scan cost of 1344 and hence the full tables scan is estimated as the best plan.

My theory is if the clustering factor was reduced from a value of NUM_ROWS to NUM_BLOCKS this would make the index more attractive.

I shall test this later!

Like

6. Gary Colbran - October 3, 2011

Since the date is truncated, you have dates rounded (all dates in the table). Oracle decides that the volume of the table to be scanned has exceeded the threshold, and a full table scan is more efficient than the index scan and the full table scan (5.26%).
You have three times the data since one is greater than and less than, the other is between or greater than or equal to and less than or equal to.
You should be able to raise or lower this with the index optimizer parameter to see when it hits the index scan rather than the full table scan.
It would be more interesting if you can give the other row counts included, being from 500 rows to 1500 rows, perhaps to find the number of rows where the full table scan occurs.

Like

7. vishal Desai - October 4, 2011

Difference is due to how cardinality is measured.

EXPLAIN PLAN SET STATEMENT_ID ‘c1’ FOR select * from bowie where id > 6000; –unbounded open
EXPLAIN PLAN SET STATEMENT_ID ‘c2’ FOR select * from bowie where id >= 6000; –unbounded closed
EXPLAIN PLAN SET STATEMENT_ID ‘c3’ FOR select * from bowie where id < 6000; — unbounded open
EXPLAIN PLAN SET STATEMENT_ID 'c4' FOR select * from bowie where id = 6000 and id = 6000 and id 6000 and id 6000 and id < 7000; — Bounded open open

unbounded – limitless
open – value is excluded
close – value is included

STATEMENT_ID Selectivity & Cardinality CARDINALITY
———————————————————- ———–
c1 S=(1000000-6000)/(1000000-1)=994000/999999
C=S*1000000 = 994000.9 994001
c2 S=994000/999999+1/1000000
C=S*1000000 = 994001.9 994002
c3 5999
c4 6000
c5 S=(7000-6000)/(1000000-1) +2(1/1000000)
C=S*1000000=1002.0 1002
c6 1002
c7 1001
c8 1001
c9 S=(7000-6000)/(1000000-1)
C=X*1000000=1000.0 1000

Like

8. vishal Desai - October 4, 2011

Formatting is messed up. Append following statments.

EXPLAIN PLAN SET STATEMENT_ID ‘c5’ FOR select * from bowie where id between 6000 and 7000; — Bounded closed closed
EXPLAIN PLAN SET STATEMENT_ID ‘c6’ FOR select * from bowie where id >= 6000 and id = 6000 and id 6000 and id 6000 and id < 7000; — Bounded open open

Like

vishal Desai - October 4, 2011

EXPLAIN PLAN SET STATEMENT_ID ‘c7’ FOR select * from bowie where id >= 6000 and id 6000 and id 6000 and id < 7000; — Bounded open open

Like

9. vishal Desai - October 4, 2011

EXPLAIN PLAN SET STATEMENT_ID ‘c8’ FOR select * from bowie where id > 6000 and id 6000 and id < 7000; — Bounded open open

Like

10. vishal Desai - October 4, 2011

EXPLAIN PLAN SET STATEMENT_ID ‘c9’ FOR select * from bowie where id > 6000 and id < 7000; — Bounded open open

Like

11. vishal Desai - October 4, 2011

Sorry for multiple posts. Need to figure out how to format it.

Like

12. Why Is My Index Not Being Used No. 2 Quiz (Quicksand) « Ukrainian Oracle User Group - October 4, 2011

[…] Why Is My Index Not Being Used No. 2 Quiz (Quicksand) October 3, 2011 […]

Like

13. Mark V. - October 4, 2011

Generate histogram for the indexed column. Cardinality will then be correct and execution plan will not change.

Like

Tony Sleight - October 4, 2011

From Jonathan Lewis’ Cost Based Oracle Fundamentals, page 69,

cost = blevel +
ceiling(leaf_blocks * effective_index_selectivity) +
ceiling(clustering_factor * effective_table_selectivity)

for the closed bounded example,

cost = 2 +
ceiling(2646 * 0.0015) +
ceiling(1000000 * 0.0015)
cost = 2 + 4 + 1500 = 1506

if the clustering factor was reduced to 3394 (number of table blocks).

cost = 2 + 4 + 6 = 12.

This is definitely a more attractive index.

Like

14. Tony Sleight - October 4, 2011

As Mark says, the addition of a histogram on the indexed column will reduce the cost sufficiently to choose the index over the full table scan.

Nevertheless, the clustering factor still makes the index an unattractive offer. If you have a system where the overhead of a histogram is prohibitive. The changing of the clustering factor would be a cheap and efficient solution.

Like

Dom Brooks - October 4, 2011

Surely changing the clustering factor just hides the fact that the order of the data in the table is not conducive to efficient index access? And by changing it, aren’t we encouraging the use of an index that we shouldn’t? (Even though the slight issue with the cardinality estimate shows the tipping point).

If it is the nature of the application/data that it arrives in this order and more often that not we want date ranges of this data, then might this not be a candidate for an IOT? We might have to mess with the PK to do it but it will at least give us nicely ordered data.

E.g. assuming id would normally be a PK even though it’s not in the original definition

SQL> create table bowie 
  2  (id        number
  3  ,hist_date date
  4  ,text      varchar2(30)
  5  ,constraint pk_bowie primary key (hist_date, id)
  6  ,constraint uk_bowie unique (id))
  7  organization index;

Table created.

SQL> insert into bowie select rownum, sysdate-mod(rownum, 2000), 'BOWIE' from dual connect by level 
<= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL>  
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_per
cent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL>  SQL> select * 
  2  from   bowie 
  3  where  hist_date > to_date('01-JAN-2011','DD-MON-YYYY') 
  4  and    hist_date < to_date('02-JAN-2011','DD-MON-YYYY');

500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4146518033

-----------------------------------------------------------------------------
| Id  | Operation        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          |   500 |  9500 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PK_BOWIE |   500 |  9500 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         39  consistent gets
          0  physical reads
          0  redo size
      12289  bytes sent via SQL*Net to client
        697  bytes received via SQL*Net from client
         35  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

SQL> 
SQL> 
SQL> select * 
  2  from   bowie 
  3  where  hist_date >= to_date('01-JAN-2011','DD-MON-YYYY') 
  4  and    hist_date <= to_date('02-JAN-2011','DD-MON-YYYY');

500 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4146518033

-----------------------------------------------------------------------------
| Id  | Operation        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          |  1500 | 28500 |     9   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| PK_BOWIE |  1500 | 28500 |     9   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         39  consistent gets
          0  physical reads
          0  redo size
      12289  bytes sent via SQL*Net to client
        697  bytes received via SQL*Net from client
         35  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

SQL> 

The relatively minor inaccuracy of cardinality of 500 vs 1500 ceases to be an issue.

Like

Tony Sleight - October 4, 2011

That’s a good point.

The answer depends upon expectations and true organisation of data. Histograms, clustering factor and re-organising to IOT all have their place.

It all boils down to what works best for your particular circumstances.

Like

15. Houri Mohamed - October 4, 2011

(1) I don’t get full table scan in my 10.2.0.4.0 oracle release
(2) I am wondering why are you comparing string with date
(3) When I use dynamic_sampling I can get accurate estimation on table but not on the index
(4) as Dom has already pointed that out it’s clear that the index clustering_factor (100000) has a dramatic value that might have damaged the desirability of the index by the CBO. The index is such that the sys_op_lbid function when applied on it shows that 377 keys per leaf block are scattered into 2652 blokcs where the rest of keys is located into one leaf block.

That is said, I am not sure that the difference between E-Rows and A-Rows comes from this CF dramatic value but from the fact that available statistics do not reflect the real scattering of hist_date column making the CBO unable to have accurate estimations

mhouri >  select * from v$version;

BANNER                                                                          
----------------------------------------------------------------                
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi                
PL/SQL Release 10.2.0.4.0 - Production                                          
CORE	10.2.0.4.0	Production                                                      
TNS for Solaris: Version 10.2.0.4.0 - Production                                
NLSRTL Version 10.2.0.4.0 - Production                                          

mhouri >  select index_name, leaf_blocks, num_rows, clustering_factor
  2  from user_indexes
  3  where index_name = 'BOWIE_DATE_I';

INDEX_NAME   EAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR         
------------------------------ ------------------  
BOWIE_DATE_I   2653    1000000     1000000         

mhouri > select
  2      keys_per_leaf, count(*) blocks
  3  FROM (
  4        select   sys_op_lbid (517668, 'L', bo.rowid) block_id,
  5               COUNT (*)             KEYS_PER_LEAF
  6        FROM BOWIE BO
  7         where hist_date is not null
  8        group by sys_op_lbid (517668, 'L', bo.rowid)
  9       )
 10  group by keys_per_leaf
 11  order by keys_per_leaf;

KEYS_PER_LEAF  BLOCKS                                                                                                
------------- ----------                                                                                                
 196          1                                                                                                
 377         2652         
		  
mhouri >  select /*+ gather_plan_statistics */
  2        * from
  3    BOWIE
  4  where HIST_DATE > '01-JAN-2011'
  5  and hist_date < '02-JAN-2011';

500 rows selected.
-------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Starts | E-Rows | A-Rows |                                               
-------------------------------------------------------------------------------
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |      1 |    500 |    500 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |      1 |    500 |    500 |
-------------------------------------------------------------------------------
                                                                               
Predicate Information (identified by operation id):                             
---------------------------------------------------                             
   2 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
            AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                                                            
                                                                                
20 rows selected.

mhouri >  select /*+ gather_plan_statistics */
  2     * from
  3  BOWIE
  4  where hist_date between '01-JAN-2011' and '02-JAN-2011';

500 rows selected.

-------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |      1 |   1500 |    500 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |      1 |   1500 |    500 |
-------------------------------------------------------------------------------
                                                                                                                        
Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     
                                                                                                                        
   2 - access("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                                 
            AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                                   
                                                                                                                        

20 rows selected.

mhouri >  select /*+ dynamic_sampling (bo 5) */
  2     * from
  3  BOWIE BO
  4  where bo.hist_date between to_date('2011-01-01','yyyy-mm-dd') 
    and to_date('2011-01-02','yyyy-mm-dd');
                                                               
500 rows selected.
                                                                
-------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |      1 |    530 |    500 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |      1 |   1500 |    500 |
-------------------------------------------------------------------------------
                                                                                                                        
Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     
                                                                                                                        
   2 - access("BO"."HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')                           
             AND "BO"."HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))                              
                                                                                                                        
Note                                                                                                                    
-----                                                                                                                   
   - dynamic sampling used for this statement                                                                           
                                                                                                                       

24 rows selected.


Like

albertofrosi - October 5, 2011

Hi Richard,
i don’t be able to reproduce this issue because in my system, both statements use index.
I tryed to change CF doing:
create table bowie (id number, hist_date date, text varchar2(30));

insert into bowie select rownum, sysdate-mod(rownum, 2000), ‘BOWIE’ from dual connect by level null, tabname=>’BOWIE’, cascade=>true, estimate_percent=>null, method_opt=>’FOR ALL COLUMNS SIZE 1′);

select blocks from user_tables where table_name=’BOWIE’;

BLOCKS
———————-
3400

select clustering_factor from user_indexes where index_name = ‘BOWIE_DATE_I’;

CLUSTERING_FACTOR
———————-
3801

and here Explain plain for the right query

EXPLAIN PLANS SECTION
——————————————————————————-

1- Original
———–
Plan hash value: 1539877125

———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 500 | 9000 | 6 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | 500 | 9000 | 6 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BOWIE_DATE_I | 500 | | 4 (0)| 00:00:01 |
———————————————————————————————

Query Block Name / Object Alias (identified by operation id):
————————————————————-

1 – SEL$1
2 – SEL$1 / BOWIE@SEL$1
3 – SEL$1 / BOWIE@SEL$1

Predicate Information (identified by operation id):
—————————————————

1 – filter(TO_DATE(’01-GEN-2011′)’01-GEN-2011′ AND “HIST_DATE”<'02-GEN-2011')

Column Projection Information (identified by operation id):
———————————————————–

1 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
2 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
3 – "BOWIE".ROWID[ROWID,10], "HIST_DATE"[DATE,7]

——————————————————————————-

and thnis for the wrong:

EXPLAIN PLANS SECTION
——————————————————————————-

1- Original
———–
Plan hash value: 1539877125

———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BOWIE_DATE_I | 1500 | | 5 (0)| 00:00:01 |
———————————————————————————————

Query Block Name / Object Alias (identified by operation id):
————————————————————-

1 – SEL$1
2 – SEL$1 / BOWIE@SEL$1
3 – SEL$1 / BOWIE@SEL$1

Predicate Information (identified by operation id):
—————————————————

1 – filter(TO_DATE('01-GEN-2011')=’01-GEN-2011′ AND “HIST_DATE”= ’01-GEN-2011′ and hist_date <= '02-GEN-2011';
EXPLAIN PLANS SECTION
——————————————————————————-

1- Original
———–
Plan hash value: 1539877125

———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BOWIE_DATE_I | 1500 | | 5 (0)| 00:00:01 |
———————————————————————————————

Query Block Name / Object Alias (identified by operation id):
————————————————————-

1 – SEL$1
2 – SEL$1 / BOWIE@SEL$1
3 – SEL$1 / BOWIE@SEL$1

Predicate Information (identified by operation id):
—————————————————

1 – filter(TO_DATE('01-GEN-2011')=’01-GEN-2011′ AND “HIST_DATE”<='02-GEN-2011')

Column Projection Information (identified by operation id):
———————————————————–

1 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
2 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
3 – "BOWIE".ROWID[ROWID,10], "HIST_DATE"[DATE,7]

——————————————————————————-

same result of the wrong query.
the difference for me is the clause between.
Using AND operator between operator is equivalent to 2
conditions using "greater than or equal to" and
"less than or equal to" operators respectively

Ciao
Alberto

Like

16. albertofrosi - October 5, 2011

Sorry, is missing the main statements who creates the difference….
😉

select * from proddta.bowie where hist_date >= ’01-GEN-2011′ and hist_date <= '02-GEN-2011';

Alberto

Like

albertofrosi - October 5, 2011

Sorry again… :-(… another part:

but my focus is on rows 1001 vs 500 and for bytes 18018 vs 9000, and
I tryed to run this statement:

select * from proddta.bowie where hist_date >= ’01-GEN-2011′ and hist_date <= '02-GEN-2011';
EXPLAIN PLANS SECTION
——————————————————————————-

1- Original
———–
Plan hash value: 1539877125

———————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————
| 0 | SELECT STATEMENT | | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1001 | 18018 | 9 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BOWIE_DATE_I | 1500 | | 5 (0)| 00:00:01 |
———————————————————————————————

Query Block Name / Object Alias (identified by operation id):
————————————————————-

1 – SEL$1
2 – SEL$1 / BOWIE@SEL$1
3 – SEL$1 / BOWIE@SEL$1

Predicate Information (identified by operation id):
—————————————————

1 – filter(TO_DATE('01-GEN-2011')=’01-GEN-2011′ AND “HIST_DATE”<='02-GEN-2011')

Column Projection Information (identified by operation id):
———————————————————–

1 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
2 – "BOWIE"."ID"[NUMBER,22], "HIST_DATE"[DATE,7], "BOWIE"."TEXT"[VARCHAR2,30]
3 – "BOWIE".ROWID[ROWID,10], "HIST_DATE"[DATE,7]

——————————————————————————-

same result of the wrong query.
the difference for me is the clause between.
Using AND operator between operator is equivalent to 2
conditions using "greater than or equal to" and
"less than or equal to" operators respectively

Ciao
Alberto

Like

17. Donatello Settembrino - October 5, 2011

Hi,
imho, the difference is the arithmetic used by the CBO:

“where hist_date > ’01-JAN-2011′ and hist_date select * from bowie where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

—————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————
| 0 | SELECT STATEMENT | | 1500 | 28500 | 588 (5)| 00:00:09 |
|* 1 | TABLE ACCESS FULL| BOWIE | 1500 | 28500 | 588 (5)| 00:00:09 |
—————————————————————————

11.2.0.2@prod>select /*+ index(t, bowie_date_i) */ * from bowie t where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

——————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————–
| 0 | SELECT STATEMENT | | 1500 | 28500 | 1507 (1)| 00:00:22 |
| 1 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1500 | 28500 | 1507 (1)| 00:00:22 |
|* 2 | INDEX RANGE SCAN | BOWIE_DATE_I | 1500 | | 4 (0)| 00:00:01 |
——————————————————————————————–

consequently, the CBO decides choosing the lowest cost….a FTS

Like

18. Donatello Settembrino - October 5, 2011

hi,
problems with formatting…try again…
imho, the difference is the arithmetic used by the CBO:

“where hist_date > ’01-JAN-2011′ and hist_date select * from bowie where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

—————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————
| 0 | SELECT STATEMENT | | 1500 | 28500 | 588 (5)| 00:00:09 |
|* 1 | TABLE ACCESS FULL| BOWIE | 1500 | 28500 | 588 (5)| 00:00:09 |
—————————————————————————

11.2.0.2@prod>select /*+ index(t, bowie_date_i) */ * from bowie t where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

——————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————–
| 0 | SELECT STATEMENT | | 1500 | 28500 | 1507 (1)| 00:00:22 |
| 1 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1500 | 28500 | 1507 (1)| 00:00:22 |
|* 2 | INDEX RANGE SCAN | BOWIE_DATE_I | 1500 | | 4 (0)| 00:00:01 |
——————————————————————————————–

chooses the lowest cost….a FTS

Like

19. Donatello Settembrino - October 5, 2011

I have problems with formatting … someone tell me how to correctly insert formulas?

Like

20. Donatello Settembrino - October 6, 2011

imho, the difference is the arithmetic used by the CBO:


where hist_date > ’01-JAN-2011′ and hist_date < '02-JAN-2011';

is calculated as:


(02-JAN-2011 – 01-JAN-2011) / (05-OCT-2011 – 15-APR-2006) = 1/1999 = 0.0005


“CARD = 1000000*0.0005= 500”


where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

is calculated as:


(02-JAN-2011 – 01-JAN-2011) / (05-OCT-2011 – 15-APR-2006) + 2 * 1/NDV= (1/1999) + 2 * (1/2000) = 1+2*0.0005 = 0.0005 + 2*0.0005= 0.0015


“CARD = 1000000*0.0015 = 1500”

In the last case the cardinality is wrong

The cbo comparing the two plans


11.2.0.2@prod>select * from bowie where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

—————————————————————————
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   588   (5)| 00:00:09 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   588   (5)| 00:00:09 |
—————————————————————————

11.2.0.2@prod>select /*+ index(t, bowie_date_i) */ * from bowie t where hist_date between ’01-JAN-2011′ and ’02-JAN-2011′;

——————————————————————————————–
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
——————————————————————————————–
|   0 | SELECT STATEMENT            |              |  1500 | 28500 |  1507   (1)| 00:00:22 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |  1500 | 28500 |  1507   (1)| 00:00:22 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |  1500 |       |     4   (0)| 00:00:01 |
——————————————————————————————–

chooses the lowest cost….a FTS

Regards,

Like

21. albertofrosi - October 6, 2011

hi all,
I’m be able to reproduce this issue finally in another DB.
I agree with Donatello and his analisys, the cardinality it’s the main factor to change the plans, CBO so.
…. and now we wait the answer from Richard…. 😉

Ciao
Alberto

Like

22. Donatello Settembrino - October 6, 2011

In addition, I believe that some things have been said ambiguous ….

In the example of Richard, the CF is equal to the number of rows in the table.

Instead, a CF close to the number of blocks of the table, would leave unchanged the estimate (wrong) the selectivity, but instead of a FTS (with BETWEEN predicate) there would be access-type “range scan” …

so I think we should point out two things different:

1) the cardinality estimate is wrong for the calculation that I showed

2) the value of the CF “higher” cause an access type FTS

Regards,

Like


Leave a reply to Houri Mohamed Cancel reply