jump to navigation

Cost of Virtual Indexes (Little Lies) May 30, 2012

Posted by Richard Foote in CBO, Clustering Factor, Fake Indexes, Oracle Indexes, Virtual Indexes.
5 comments

I’ve previously discussed Virtual Indexes and how they can be used to do basic “what if” analysis if such an index really existed. However, a recent comment on the OTN forums regarding using them to compare index costs made me think a follow-up post regarding the dangers of Virtual Indexes might be warranted.

The big advantage of a Virtual Index of course is that it doesn’t really exist and so consumes no storage and can be created extremely quickly/cheaply. The disadvantage of a Virtual index is that it doesn’t really exist and so Oracle can not collect segment level statistics. Without statistics however, the CBO has a very tough time of doing its job properly …

To illustrate, a simple little demo. I begin by creating a table in which the data in the table is stored in CODE column order. An index on the CODE column would therefore have an excellent (very low) Clustering Factor. Note that the Clustering Factor is the most important index related statistic regarding the efficiency and potential cost of using the index.

SQL> create table bowie_ordered (id number, code number, name varchar2(30));

Table created.

SQL> create sequence bowie_seq;

Sequence created.

SQL> declare
  2  begin
  3  for i in 1..100 loop
  4     for j in 1..10000 loop
  5        insert into bowie_ordered values (bowie_seq.nextval, i, 'DAVID BOWIE');
  6     end loop;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

OK, I’m now going to create a Virtual Index on the CODE column and collect 100% accurate statistics on the table:

SQL> create index bowie_ordered_i on bowie_ordered(code) nosegment;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_ORDERED', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Oracle allows statistics to be collected on the table and associated Virtual Index (so that existing statistic gathering jobs won’t now fail), however without an associated segment, no statistics can actually be derived for the index itself.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I';

no rows selected

I’ll list the system statistics so anyone who wants to replicate the demo can get similar results (the database blocksize is 8K):

SQL> SELECT pname, pval1 FROM SYS.AUX_STATS$
     WHERE pname IN ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED');

PNAME                               PVAL1
------------------------------ ----------
SREADTIM                                2
MREADTIM                               10
CPUSPEED                             1000
MBRC                                   20

If we run the following query:

SQL> set arraysize 5000
SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1678744259

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_ORDERED | 10000 |   195K|  1005  (13)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We note that the CBO uses a Full Table Scan as the CBO has no real choice here as Virtual Indexes are not considered by default.

However, if we change the following hidden parameter and re-run:

SQL> alter session set "_use_nosegment_indexes" = true;

Session altered.

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|        9(0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|        9(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |        1(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice a few interesting details. Firstly, the CBO has decided to use the Virtual Index however the number of consistent gets remains the same as the previous run so we can clearly see that behind the covers, the Full Table Scan is still performed. The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.

If we look at the execution plan costs, both the estimate row (10000) and byte values are spot on as these statistics are based on the underlining table/column statistics and the 100 distinct CODE values are evenly distributed. However, the index related costs look remarkably low. Just a cost of 1 to read the index and extract 10,000 index entries (that means an index entry is less than 1 byte in length on average !!). Just a cost of 9 to visit the table and read 10,000 rows. Even with the most efficient of physical indexes, these costings are not realistic and are based on highly questionable default metrics.

Basically, the creation of this Virtual Column is telling us that there is no reason why the index couldn’t potentially be used, IF (that’s a big IF in case no-one noticed) the actual index related statistics are such that the CBO determines the index to be the cheaper option. But it depends on the actual characteristics of the index which can’t be accurately determined until it’s been physically created.

As the Virtual Index suggests the index might be used if it existed, let’s now create it for real:

SQL> drop index bowie_ordered_i;

Index dropped.

SQL> create index bowie_ordered_i on bowie_ordered(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_ORDERED_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_ORDERED_I                   1000000              3546

As predicted, a Clustering Factor of 3546 on an index with 1M index entries is indeed nice and low.

If we now re-run the query again:

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|       60(4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|       60(4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |       23(5)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         61  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

Indeed, the index has been used by the CBO. However, note that the costs are substantially higher (and more accurate) than previously suggested with the Virtual Index. Indeed the final cost of 60 is very close to the number of consistent gets (61) required by the execution plan and so suggests the CBO is making reasonable index based calculations here.

OK, another demo, but this time with a table in which the CODE values are distributed throughout the whole table (rather than being perfectly clustered together as in the first example):

SQL> create table bowie_random (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie_random select rownum, mod(rownum,100)+1, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_RANDOM', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

When we now create a Virtual Index based on the CODE column and re-run the same query:

SQL> create index bowie_random_i on bowie_random(code) nosegment;

Index created.

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice that both the execution plan and all the associated costs are identical to those of the previous example. So although the actual Clustering Factor of the index is likely to be dramatically greater here than it was in the previous example and so likely dramatically impact the costs associated with using this index, the Virtual Index is treated and costed identically. This is the simple consequence of not having the physical index structure by which to calculate the appropriate segment statistics.

If we now physically create this index for real:

SQL> drop index bowie_random_i;

Index dropped.

SQL> create index bowie_random_i on bowie_random(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_RANDOM_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_RANDOM_I                    1000000            344700

We can see that indeed the Clustering Factor is dramatically worse than before, increasing here from 3546 to 344700.

If we now re-run the query:

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1983602984

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_RANDOM | 10000 |   195K|  1005  (13)| 00:00:03 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We see the CBO has decided to perform the now cheaper Full Table Scan. Although the Virtual Index on this column was used, once the actual characteristics of the index are determined via the index statistics, the CBO has decided the actual physical index was just too expensive to use to retrieve the 1% of rows.

If we re-run the query with an index hint:

SQL> select /*+ index (bowie_random) */ * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|  3483   (1)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|  3483   (1)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |    23   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3472  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

The index is used but we can see why at a cost of 3483, the Full Table Scan at a cost of only 1005 was selected by the CBO.

Virtual Indexes can be useful to quickly determine whether an index is a viable option if it were to be actually created. However, caution needs to be exercised if Virtual Indexes are used for cost comparison purposes and although Virtual Indexes might be  used by the CBO, it might be another story entirely once the index is physically created and the actual index related statistics determined.

Introduction to Fake / Virtual / NOSEGMENT Indexes January 11, 2008

Posted by Richard Foote in Fake Indexes, Index Access Path, NOSEGMENT Option, Oracle Cost Based Optimizer, Oracle Indexes, Virtual Indexes.
9 comments

OK, as promised, answer to index fact #5 you may not have known:

“It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist”.

Before I start, please note this feature is not officially documented other than the odd Metalink note and requires the setting of an undocumented parameter to work, so please exercise caution.

Fake Indexes (also known as Virtual or Nosegment Indexes) have been around for a long time, since 8i days. They’re used primarily by Oracle Enterprise Manager and its Tuning Pack which has various wizards that can do “what if” type analysis. One of these is the Index Wizard which can kinda “pretend” to create an index and see what the Cost Based Optimizer might do if such an index really existed.

It’s possible to create these Fake indexes manually by using the NOSEGMENT clause when creating an index:

CREATE INDEX Bowie_idx ON Bowie_Table(Ziggy) NOSEGMENT;

This will populate some (but not many) DD related tables but will not actually create an index segment or consume any actual storage. It’s not maintained in any way by DML operations on the parent table and it can’t be altered or rebuilt as can a conventional, “real” index (it will generate an ORA-08114 error if you try to do so). You can analyze or run dbms_stats over the index but the index is not treated as analyzed as such (as can be seen via a 10053 trace).

It’s also only visible to the CBO, if and only if a session has the following parameter set:

ALTER SESSION SET “_use_nosegment_indexes” = true;

The CBO will now consider the index and potentially include it within an execution plan. However, at execution time Oracle can of course not use the index and will revert to the next best thing.

A Fake index is basically an index you have when you don’t really have an index in order to see if it could be useful if it really existed.

This Fake Indexes Demo shows how they work and can be used.

1 down, 7 to go … ;)

Follow

Get every new post delivered to your Inbox.

Join 1,820 other followers