jump to navigation

Cost of Virtual Indexes (Little Lies) May 30, 2012

Posted by Richard Foote in CBO, Clustering Factor, Fake Indexes, Oracle Indexes, Virtual Indexes.
trackback

I’ve previously discussed Virtual Indexes and how they can be used to do basic “what if” analysis if such an index really existed. However, a recent comment on the OTN forums regarding using them to compare index costs made me think a follow-up post regarding the dangers of Virtual Indexes might be warranted.

The big advantage of a Virtual Index of course is that it doesn’t really exist and so consumes no storage and can be created extremely quickly/cheaply. The disadvantage of a Virtual index is that it doesn’t really exist and so Oracle can not collect segment level statistics. Without statistics however, the CBO has a very tough time of doing its job properly …

To illustrate, a simple little demo. I begin by creating a table in which the data in the table is stored in CODE column order. An index on the CODE column would therefore have an excellent (very low) Clustering Factor. Note that the Clustering Factor is the most important index related statistic regarding the efficiency and potential cost of using the index.

SQL> create table bowie_ordered (id number, code number, name varchar2(30));

Table created.

SQL> create sequence bowie_seq;

Sequence created.

SQL> declare
  2  begin
  3  for i in 1..100 loop
  4     for j in 1..10000 loop
  5        insert into bowie_ordered values (bowie_seq.nextval, i, 'DAVID BOWIE');
  6     end loop;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

OK, I’m now going to create a Virtual Index on the CODE column and collect 100% accurate statistics on the table:

SQL> create index bowie_ordered_i on bowie_ordered(code) nosegment;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_ORDERED', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Oracle allows statistics to be collected on the table and associated Virtual Index (so that existing statistic gathering jobs won’t now fail), however without an associated segment, no statistics can actually be derived for the index itself.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I';

no rows selected

I’ll list the system statistics so anyone who wants to replicate the demo can get similar results (the database blocksize is 8K):

SQL> SELECT pname, pval1 FROM SYS.AUX_STATS$
     WHERE pname IN ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED');

PNAME                               PVAL1
------------------------------ ----------
SREADTIM                                2
MREADTIM                               10
CPUSPEED                             1000
MBRC                                   20

If we run the following query:

SQL> set arraysize 5000
SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1678744259

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_ORDERED | 10000 |   195K|  1005  (13)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We note that the CBO uses a Full Table Scan as the CBO has no real choice here as Virtual Indexes are not considered by default.

However, if we change the following hidden parameter and re-run:

SQL> alter session set "_use_nosegment_indexes" = true;

Session altered.

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|        9(0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|        9(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |        1(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice a few interesting details. Firstly, the CBO has decided to use the Virtual Index however the number of consistent gets remains the same as the previous run so we can clearly see that behind the covers, the Full Table Scan is still performed. The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.

If we look at the execution plan costs, both the estimate row (10000) and byte values are spot on as these statistics are based on the underlining table/column statistics and the 100 distinct CODE values are evenly distributed. However, the index related costs look remarkably low. Just a cost of 1 to read the index and extract 10,000 index entries (that means an index entry is less than 1 byte in length on average !!). Just a cost of 9 to visit the table and read 10,000 rows. Even with the most efficient of physical indexes, these costings are not realistic and are based on highly questionable default metrics.

Basically, the creation of this Virtual Column is telling us that there is no reason why the index couldn’t potentially be used, IF (that’s a big IF in case no-one noticed) the actual index related statistics are such that the CBO determines the index to be the cheaper option. But it depends on the actual characteristics of the index which can’t be accurately determined until it’s been physically created.

As the Virtual Index suggests the index might be used if it existed, let’s now create it for real:

SQL> drop index bowie_ordered_i;

Index dropped.

SQL> create index bowie_ordered_i on bowie_ordered(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_ORDERED_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_ORDERED_I                   1000000              3546

As predicted, a Clustering Factor of 3546 on an index with 1M index entries is indeed nice and low.

If we now re-run the query again:

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|       60(4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|       60(4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |       23(5)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         61  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

Indeed, the index has been used by the CBO. However, note that the costs are substantially higher (and more accurate) than previously suggested with the Virtual Index. Indeed the final cost of 60 is very close to the number of consistent gets (61) required by the execution plan and so suggests the CBO is making reasonable index based calculations here.

OK, another demo, but this time with a table in which the CODE values are distributed throughout the whole table (rather than being perfectly clustered together as in the first example):

SQL> create table bowie_random (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie_random select rownum, mod(rownum,100)+1, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_RANDOM', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

When we now create a Virtual Index based on the CODE column and re-run the same query:

SQL> create index bowie_random_i on bowie_random(code) nosegment;

Index created.

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice that both the execution plan and all the associated costs are identical to those of the previous example. So although the actual Clustering Factor of the index is likely to be dramatically greater here than it was in the previous example and so likely dramatically impact the costs associated with using this index, the Virtual Index is treated and costed identically. This is the simple consequence of not having the physical index structure by which to calculate the appropriate segment statistics.

If we now physically create this index for real:

SQL> drop index bowie_random_i;

Index dropped.

SQL> create index bowie_random_i on bowie_random(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_RANDOM_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_RANDOM_I                    1000000            344700

We can see that indeed the Clustering Factor is dramatically worse than before, increasing here from 3546 to 344700.

If we now re-run the query:

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1983602984

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_RANDOM | 10000 |   195K|  1005  (13)| 00:00:03 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We see the CBO has decided to perform the now cheaper Full Table Scan. Although the Virtual Index on this column was used, once the actual characteristics of the index are determined via the index statistics, the CBO has decided the actual physical index was just too expensive to use to retrieve the 1% of rows.

If we re-run the query with an index hint:

SQL> select /*+ index (bowie_random) */ * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|  3483   (1)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|  3483   (1)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |    23   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3472  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

The index is used but we can see why at a cost of 3483, the Full Table Scan at a cost of only 1005 was selected by the CBO.

Virtual Indexes can be useful to quickly determine whether an index is a viable option if it were to be actually created. However, caution needs to be exercised if Virtual Indexes are used for cost comparison purposes and although Virtual Indexes might be  used by the CBO, it might be another story entirely once the index is physically created and the actual index related statistics determined.

Comments»

1. saruamit4 - May 30, 2012

Thanks Richard for such a nice explanation ..

Like

2. Kevin Fries - May 31, 2012

DB2 LUW has something like this and it’s poorly documented, as in I can’t find anything worthwhile. But this is a great idea for my Oracle databases as well and the lessons you’ve imparted are applicable to either.

Like

3. kamallo - June 6, 2012

Hi Richard,
sorry for the off-topic, but I get “403: Access Denied” when I try to see any of the txt files linked here: https://richardfoote.wordpress.com/articles-demos/

Thanks for the blog!


Kamal

Like

Richard Foote - June 8, 2012

Hi Kamallo

Yes, I’m sorry it’s a known issue. Should be fixed before Christmas !!

Like

4. tonyhasler - July 24, 2012

Hi Richard,

Very informative as usual. I have to confess that I can’t understand how a virtual index with no clustering factor can be of much use to a Tuning Advisor. But that may well explain why its recomendations on creating indexes is frequently bogus.

When you say “The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.” a reader might conclude that if the autotrace plan used no virtual index the reparse would not take place.

Execution plans produced by autotrace are the result of an ‘explain plan’ call followed by DBMS_XPLAN.DISPLAY rather than any call to DBMS_XPLAN.DISPLAY_CURSOR or equivalent. There are lots of reasons (a cached plan from prior to the last stats run being just one of many) why the plan produced by autotrace will differ from the one actually used by the CBO at runtime. In any event, the parse used by autotrace is never used by the CBO for an actual execution regardless of the presence of a a virtual index. I have been burned by this so often that I mentally take the extreme position that if the autotrace plan matches the plan used by the CBO at runtime it is just a happy coincidence!

A call to DBMS_XPLAN.DISPLAY_CURSOR is always more reliable than autotrace and would give you the plan actually used in this case, I am sure.

I’ll get off my soapbox now 🙂

Like


Leave a comment