jump to navigation

Important !! Clustering Factor Calculation Improvement (Fix You) May 8, 2013

Posted by Richard Foote in 11g, ASSM, CBO, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
43 comments

Believe me, this article is worth reading :)

I’m currently not allowed to discuss Oracle 12c Database goodies but I am allowed to discuss things perhaps initially intended for 12c that are currently available and already back-ported to 11g. This includes a wonderful improvement in the manageability of how the Clustering Factor (CF) of an index can now be calculated. Many thanks to Martin Decker for pointing this out to me.

As anyone who has attended my Index Seminars will know, the CF of an index is one of the most important statistics used by the Cost Based Optimizer (CBO) in determining the most efficient execution plan. As such, it has always been an issue for me that the manner in which the CF is calculated has been so flawed.

Basically, the CF is calculated by performing a Full Index Scan and looking at the rowid of each index entry. If the table block being referenced differs from that of the previous index entry, the CF is incremented. If the table block being referenced is the same as the previous index entry, the CF is not incremented. So the CF gives an indication of how well ordered the data in the table is in relation to the index entries (which are always sorted and stored in the order of the index entries). The better (lower) the CF, the more efficient it would be to use the index as less table blocks would need to be accessed to retrieve the necessary data via the index.

However, there’s a basic flaw here. The CF calculation doesn’t take into consideration the fact the referenced table block, although maybe different from the previous one index entry, might already have recently been accessed. As such, during an index scan, the table block being accessed is almost certainly still cached in the buffer cache from the previous access, thereby not reducing the effectiveness of the index in any appreciable manner. A classic example of this would be a table with a few freelists. Although the data being inserted is not ordered precisely within the same data blocks, the data might actually be very well clustered within only a few blocks of each other.

Picture a table with 100 rows being inserted by 2 sessions simultaneously, each inserting 50 rows based on an ordered sequence. With one freelist, the data is basically inserted in one block first and then once full a second table block. The data is therefore perfectly ordered/clustered and the CF will evaluate to a value of 2 on such an indexed column. But with 2 freelists, one session could insert data into one block while the other session inserts into a second block, with the ordered sequenced values being randomly distributed among the 2 blocks.  The CF could now potentially evaluate to a value of 100 as the rows are jumbled or “toggled” across the two blocks. This is a much much worse value (2 vs. 100) that can adversely impact the CBO calculations, although the efficiency of such an index is really almost identical as both table blocks are certain to be cached during an index scan regardless.

This is also a very common scenario with Automatic Segment Space Management (ASSM) tablespaces as I’ve discussed previously, which of course is now the default these days.

OK, let’s look at an example scenario. I’ll begin by creating a simple little table, an ordered sequence and a procedure that inserts 100,000 rows into the table:


SQL> create table bowie (id number, text varchar2(30));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> CREATE OR REPLACE PROCEDURE bowie_proc AS

2  BEGIN

3     FOR i IN 1..100000 LOOP

4         INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');

5         COMMIT;

6     END LOOP;

7  END;

8  /

Procedure created.

We note the table lives in an ASSM tablespace:


SQL> select table_name, i.tablespace_name, segment_space_management

from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';

TABLE_NAME   TABLESPACE_NAME                SEGMEN

------------ ------------------------------ ------

BOWIE        USERS                          AUTO

We next have 3 different sessions that simultaneously run the procedure to load the table. Note that an ordered sequence is used which means the 3 sessions are randomly grabbing the next sequenced value to insert. The data though is basically being inserted in order of the ID column, it’s just that the data is being distributed across a few blocks as we go along the table, rather than strictly one block after the other.


SQL> exec bowie_proc

PL/SQL procedure successfully completed.

Let’s create an index on the ID (sequenced) column and collect fresh statistics:


SQL> create index bowie_id_i on bowie(id);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE',      estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000            241465

We notice that although the data in the table in reality is actually quite well clustered/ordered on the ID column, the actual CF of the index is not reflecting this. At a massive 241,465 it’s an extremely high (bad) CF, much closer in value to rows in the table than the number of table blocks, as the CF calculation keeps flipping back and forth between differing blocks. With such a high CF, the CBO is therefore going to cost an index scan accordingly:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------

| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       |   389 |  7780 |   310   (1)| 00:00:04 |

|*  1 |  TABLE ACCESS FULL| BOWIE |   389 |  7780 |   310   (1)| 00:00:04 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID"<=429 AND "ID">=42)

Statistics

----------------------------------------------------------

0  recursive calls

1  db block gets

1093  consistent gets

0  physical reads

0  redo size

4084  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

Even though only approx. 0.13% of rows are being accessed and more importantly a similar low percentage of table blocks, the CBO has determined that a Full Table Scan (FTS) is the cheaper alternative. This is an all too familiar scenario, all down to the fact the CF is not accurately reflecting the true clustering of the data and subsequent efficiency of the index.

Finally, at long last, there’s now an official fix for this !!

Bug 13262857 Enh: provide some control over DBMS_STATS index clustering factor computation INDEX describes this scenario and currently has available patches that can be applied on both Exadata databases and Oracle versions 11.1.0.7, 11.2.0.2 and 11.2.0.3. The patches (eg. Patch ID 15830250) describe the fix as addressing “Index Clustering Factor Computation Is Pessimistic“. I couldn’t have described it better myself :)

Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering. The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the collection of index statistics, it will not increment the CF if the table block being referenced by the current index entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not increment the CF if the current table block has already been accessed “x” index entries previously.

The TABLE_CACHED_BLOCKS preference can be set by either the DBMS_STATS.SET_TABLE_PREFS, DBMS_STATS.SET_SCHEMA_PREFS or DBMS_STATS.SET_DATABASE_PREFS procedures.

So let’s now change the TABLE_CACHED_BLOCKS preference for this table and re-calculate the index statistics:


SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',

pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000              1035

We notice that the CF has now been significantly reduced (down from 241465 to just 1035), reflecting far more accurately the true clustering of the data when considering the actual effectiveness of using the index.

If we now run the same query as before:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3472402785

------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   389 |  7780 |     4   (0)|00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   389 |  7780 |     4   (0)|00:00:01 |

|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |   389 |       |     2   (0)|00:00:01 |

------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID">=42 AND "ID"<=429)

Statistics

----------------------------------------------------------

0  recursive calls

0  db block gets

6  consistent gets

0  physical reads

0  redo size

9882  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

We notice the index is now being selected by the CBO. At a cost of 4 (previously the cost was somewhat greater than the 310 cost of the FTS), this much more accurately reflects the true cost of using the index (notice only 6 consistent gets are performed).

Being able to now set the TABLE_CACHED_BLOCKS preference during statistics collection finally gives us a fully supported and easy method to collect more accurate CF statistics. This in turn can only lead to more informed and accurate decisions by the CBO and ultimately better performing applications. Although available right now via the back ported patches, this will no doubt all be fully documented once the 12c database is finally released.

I can’t recommend enough the use of this new capability :)

Storage Indexes vs Database Indexes IV: 8 Column Limit (Eight Line Poem) May 1, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

As Exadata Storage Indexes (SI) are purely memory only structures located on the Exadata storage servers, care needs to be taken in how much memory they can potentially consume. As a result, there is a limit of 8 columns (or 8 SIs) that can be defined for a given 1M storage region at any point in time, even though SIs are much smaller structures than equivalent database indexes. As database indexes are physical constructs however, there’s no such limit on the number of indexes that can be defined for a table. This can become another key difference between SIs and database indexes.

The following table has more than 8 columns, each with differing numbers of distinct values or distributions of data:

SQL> create table radiohead (id number, col1 number, col2 number, col3 number, col4 number, col5 number, col6 number, col7 number, col8 number, col9 number, col10 number, col11 number, col12 number, some_text varchar2(50));
Table created.

SQL> insert into radiohead select rownum, mod(rownum,10), mod(rownum,100), mod(rownum,1000), mod(rownum,10000), mod (rownum,100000), mod(rownum,1000000), ceil(dbms_random.value(0,10)), ceil(dbms_random.value(0,100)), ceil
 (dbms_random.value(0,1000)), ceil(dbms_random.value(0,10000)), ceil(dbms_random.value(0,100000)), ceil(dbms_random.value (0,1000000)), 'OK COMPUTER' from dual connect by level <=2000000; 2000000 rows created. SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

4000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

8000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'RADIOHEAD', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

Let’s start by running a highly selective query of the ID column:

SQL> select * from radiohead where id = 42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=121212)
 filter("ID"=121212)

If we look at the session statistics, we’ll notice that a SI has been created and saved us physical IOs. Note: If you follow the demo, you’ll need to keep track of these statistics after each query or simply reconnect as a new session to ensure a SI has or has not been used.

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1333.99219
 cell physical IO interconnect bytes returned by smart scan .164878845

OK, let’s now run a selective query on the COL1 column (there are no values 42 in this case and so no rows are returned):

SQL> select * from radiohead where col1=42;
no rows selected

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 52 | 42440 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1 | 52 | 42440 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL1"=42)
 filter("COL1"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2546.90625
 cell physical IO interconnect bytes returned by smart scan .341438293

Again, a SI has been created and used here. The SI in this case has been extremely beneficial because no data exists for 42 (only the values 1 – 10 exist). However, if an existing value were to be selected, the SI would be next to useless as such a value would exist throughout all 1M storage regions. With just 10 distinct randomly distributed values, this SI has the potential to be a waste of time. But while we search for values that don’t exist, it serves a very useful purpose. An important consideration in what’s to come.

If we now run a query now using column COL4:

SQL> select * from radiohead where col4=42;
1600 rows selected.

Elapsed: 00:00:00.68

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=42)
 filter("COL4"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2612.64063 <= + 66MB
 cell physical IO interconnect bytes returned by smart scan 32.3048401

OK, this is the really important one to note. Firstly, yes again a SI has been created and used. Note though that this has not been the first SI to be created or used on this table; SIs have previously been created and used in the previous two queries on different columns. This will also not be the most recent SI to be created, more will soon follow. However, this is by far the least effective use of a SI, because of both the selectivity of the query and distribution of data. Here, only some 66MB or so of physical IOs have been saved.

We now repeat this process, running a query against a different column, being very selective and ensuring a SI is created and used. We finally reach a point when 8 SIs have been created on the table:

SQL> select * from radiohead where col9=0;
no rows selected

Elapsed: 00:00:00.02

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 15984 | 811K| 42561 (1)| 00:08:31 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 15984 | 811K| 42561 (1)| 00:08:31 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL9"=0)
 filter("COL9"=0)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 8792.94531
 cell physical IO interconnect bytes returned by smart scan 45.3872757

OK, we’ve now reached the point where  8 SIs have definitely been created on this table.

If we now run a query that could potentially use a SI but isn’t particularly effective, basically on a par to the one we saw created previously on COL4:

SQL> select * from radiohead where col10=42;
1536 rows selected.

Elapsed: 00:00:00.73

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1600 | 83200 | 42577 (1)| 00:08:31 |
|* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42577 (1)| 00:08:31 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("COL10"=42)
 filter("COL10"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 8792.94531
cell physical IO interconnect bytes returned by smart scan 45.9288864

We notice that no bytes have been saved here via a SI. We can run this query repeatedly and the results will be the same. No SI is created and no bytes are saved. Although Oracle could potentially create a SI and save some work, the fact we already have 8 SIs created for this table means we have already reached the limit on the number of SIs that can be created for this table. 8 is it.

Let’s run another query now using yet another different column (COL12), but this time it’s again a very selective query, much more selective and efficient than the previous query based on COL4 in which a SI had been created:

SQL> select * from radiohead where col12=42;
8 rows selected.

Elapsed: 00:00:00.39

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 19 | 988 | 42607 (1)| 00:08:32 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 19 | 988 | 42607 (1)| 00:08:32 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL12"=42)
 filter("COL12"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 9526.01563
 cell physical IO interconnect bytes returned by smart scan 53.5218124

This time however the number of bytes saved has again gone up from previously meaning that indeed a new SI has been created and used for column COL12. But this makes 9 SIs in total now for this table where the limit should be a maximum of 8 ?

Does this mean that a previously created SI has been dropped and replaced by this new one. If so, which SI is now gone ?

Well, let’s go back to the first SI we created and the one that hasn’t been used for the longest period of time. If we re-run the first query again:

SQL> select * from radiohead where id=42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=42)
 filter("ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844
 cell physical IO interconnect bytes returned by smart scan 53.5264816

We notice that not only has it used a SI, but it has saved the maximum amount of IO bytes possible here meaning there was no “warming up” processes happening here indicating a newly created SI. So not only did it use a SI, it clearly used a previously created one. So this SI was not obviously impacted by the creation of this “9th” SI.

However, if we run the query again that used column COL4, the query that previously used a SI but was by far the least effective in saving physical IOs:

SQL> select * from radiohead where col4 = 4242;
1600 rows selected.

Elapsed: 00:00:00.73

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=4242)
 filter("COL4"=4242)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844 No Change !!
 cell physical IO interconnect bytes returned by smart scan 54.1522598

This time we notice there’s no change to the number of bytes saved by a SI. No matter how often we run this query now, no SI is used. So the SI that was created previously is now gone as a result of creating the more effective SI on the COL12 column. Indeed there are still just the 8 SIs on this table.

So Oracle will indeed limit the number of SIs to 8 for each table/storage region. However, it’s not simply a case of “first in first served” or some such, with Oracle using (undocumented) performance metrics to determine which 8 SIs/columns to choose. This means it might be possible in some rarer scenarios where more than 8 columns get referenced in SQL statements for SIs to come and go depending on changing workloads.  Another example of where database indexes may yet play a role in Exadata environments, where currently tables have more than 8+ indexed columns.

Storage Indexes vs Database Indexes Part III: Hidden Values (Hide Away) February 7, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
3 comments

OK, my holiday to Hawaii is now slowing fading away into distant memory. Time for a new post :)

In my previous post on differences between Exadata Storage Indexes and Database Indexes Part II, I discussed how the clustering of data within the data is an important factor (pun fully intended !!) in the performance and efficiency of both types of indexes. In this post, I’ll expand a little more on this point and highlight scenarios where a normal database index can significantly improve performance but be totally ineffective as a Storage Index.

Storage Indexes can only determine where required data can’t exist within a (1M default) storage region via the associated Min/Max values of the index. If a required value can’t possibly exist within the Min/Max range boundary, the specific storage region can be ignored and not be accessed during the full scan operation. However, in some scenarios, there can be limitations in the usefulness of only having Min/Max values to work with if both Min and Max values are relatively common and randomly distributed throughout the table. This is not an uncommon scenario when dealing with columns that have data such as code status values.

In the BIG_BOWIE table I’ve been using in this series, I have a FORMAT_ID column that was initially populated with just values 2,4,6,8 and 10, evenly distributed throughout the table. I then updated a relatively small number of these values in the following manner:

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;
1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.

So I have relatively few FORMAT_ID = 3 (1000 in 10M rows, 0.01% of data) littered throughout the table and a few FORMAT_ID = 5 located in a specific portion/location of the table.

I’ll now create a normal database index on this FORMAT_ID column and a histogram to let the CBO know there are indeed relatively few occurances of values 3 and 5 within the table:

SQL> create index big_bowie_format_id_i on big_bowie(format_id);
Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=> 'FOR COLUMNS FORMAT_ID SIZE 10');

PL/SQL procedure successfully completed.

OK, if we now run a query looking for all the FORMAT_ID = 3:

SQL> select album_id, total_sales from big_bowie where format_id = 3;
1000 rows selected.

Elapsed: 00:00:00.01

Execution Plan
 -----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 -----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 72 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 11000 | 72 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_FORMAT_ID_I | 1000 | | 4 (0)| 00:00:01 |
 -----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("FORMAT_ID"=3)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1072 consistent gets
 0 physical reads
 0 redo size
 22984 bytes sent via SQL*Net to client
 1250 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

We notice  the CBO has used the index to efficiently retrieve just the 0.01% of required rows because of the high selectivity of the query, even though the overall Clustering Factor of the index is appalling.

It does an even better job when accessing FORMAT_ID = 5 as these specific values are all well clustered and  found within in a very tight portion of the table:

SQL> select album_id, total_sales from big_bowie where format_id = 5;
1000 rows selected.

Elapsed: 00:00:00.00

Execution Plan
 -----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 -----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 72 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 11000 | 72 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_FORMAT_ID_I | 1000 | | 4 (0)| 00:00:01 |
 -----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("FORMAT_ID"=5)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 154 consistent gets
 0 physical reads
 0 redo size
 22685 bytes sent via SQL*Net to client
 1250 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

At just 154 consistent gets in this example, the results return almost instantaneously. The database index is extremely effective in these examples because it can point just to the locations within the table where these relatively few rows of interest are located.

Let’s see how Storage Indexes cope in this scenario. First, we make the database index invisible:

SQL> alter index big_bowie_format_id_i invisible;
Index altered.

We now run the same query again (this can be repeated as many times as you like):

SQL> select album_id, total_sales from big_bowie where format_id = 3;
1000 rows selected.

Elapsed: 00:00:00.80

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 36682 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 1000 | 11000 | 36682 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("FORMAT_ID"=3)
 filter("FORMAT_ID"=3)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan .397476196

We notice no matter how often we try, that Storage Indexes have managed to save us precisely nothing. The entire table has had to be scanned and read during the Full Table Scan operation.

SQL> select album_id, total_sales from big_bowie where format_id = 5;
1000 rows selected.

Elapsed: 00:00:00.89

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 36682 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 1000 | 11000 | 36682 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("FORMAT_ID"=5)
 filter("FORMAT_ID"=5)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan .565643311

Same results when accessing the FORMAT_ID = 5. Any possible Storage Indexes have been totally ineffective and the Full Table Scan has had to read the entire table.

The result being performance is significantly slower than with the database indexes visible and in place.

Why ?

The data consists primarily of randomly distributed values 2,4,6,8 and 10, with the odd value 3 and 5 littered around. This means that if created, the Storage Index Min/Max values will effectively be 2 and 10 for all 1M storage regions throughout the entire table.

Therefore, both values 3 and 5 could possibly exist within all/any of the table storage regions as they sit inside the 2 – 10 Min/Max range boundaries. These 3 and 5 values are effectively “hidden” within the Storage Index Min/Max values making Storage Indexes totally ineffective in this scenario, as they can’t skip a thing.

A final point here. If one of these rarer values of interest were to be say value “12”, then a Storage Index could be useful when searching for this particular value as any Min/Max ranges of 2-10 that don’t therefore contain a value of 12 can be skipped. However, if values 3 and 5 were also of interest, the Storage Index would still only be useful in a subset of required cases. A database index would therefore still be required to cater all necessarily cases.

As Database indexes can directly point to indexed values of interest, this is another example of where we would likely not want to just drop a database index in Exadata.

Storage Indexes vs Database Indexes Part II: Clustering Factor (Fast Track) December 19, 2012

Posted by Richard Foote in Clustering Factor, Exadata, Oracle Indexes, Storage Indexes.
2 comments

Two posts in two days !! Well, with Christmas just around the corner, I thought I better finish off a couple of blog posts before I get fully immersed in the festive season :)

The Clustering Factor (CF) is the most important index related statistic, with the efficiency of an index performing multi-row range scans very much dependent on the CF of the index. If the data in the table is relatively well clustered in relation to the index (i.e. it has a “low” CF), then an index range scan can visit relatively few table blocks to obtain the necessary data. If the data is effectively randomised and not well clustered in relation to the index (i.e. has a “high” CF), then an index range scan has to visit many more table blocks and not be as efficient/effective as a result. The CBO will be less inclined to use such an index as a result, depending on the overall selectivity of the query.

It’s something I’ve discussed here many times before.

It’s a very similar story for Exadata Storage Indexes (SI) as well. The better the table data is clustered in relation to the SIs, the more efficient and effective the SIs are likely to be in relation to being able to eliminate accessing storage regions that can’t possibly contain data of interest. By having the data more clustered (or ordered) in relation to a specific SI, the Min/Max ranges associated with the SI are more likely to be able to determine areas of the table where data can’t exist.

For the data I’ll be using in the following examples, I refer you to the previous post where I setup the necessary data.

The following query is on a 10 million row table, based on the ALBUM_ID column that has an excellent CF:

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:01.07

Execution Plan
 ----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 1550 (1)| 00:00:19 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 100K| 8984K| 1550 (1)| 00:00:19 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 100K| | 199 (1)| 00:00:03 |
 ----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1590 consistent gets
 1550 physical reads
 0 redo size
 9689267 bytes sent via SQL*Net to client
 733 bytes received via SQL*Net from client
 21 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

An index range scan access path is selected by the CBO to retrieve the 100,000 rows. At just 1590 consistent gets, with such an excellent CF, the index can very efficiently access just the necessary 100,000 rows of data. Notice that most of these consistent gets are still physical reads (1550). If we re-run the query several times and are able to cache the corresponding index/table blocks in the database buffer cache:

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27

Execution Plan
 ----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 1550 (1)| 00:00:19 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 100K| 8984K| 1550 (1)| 00:00:19 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 100K| | 199 (1)| 00:00:03 |
 ----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1590 consistent gets
 0 physical reads
 0 redo size
 9689267 bytes sent via SQL*Net to client
 733 bytes received via SQL*Net from client
 21 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

The overall execution times reduce down from 1.07 to just 0.27 seconds. Not bad at all considering we’re returning 100,000 rows.

However, if we run the same query on the same data with all the smarts turned on in Exadata (with the index made Invisible so that it doesn’t get used by the CBO):

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27
 Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 8984K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 134834 consistent gets
 134809 physical reads
 0 redo size
 4345496 bytes sent via SQL*Net to client
 73850 bytes received via SQL*Net from client
 6668 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

 SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1042.24219
 cell physical IO interconnect bytes returned by smart scan 9.56161499

We notice that although a Full Table Scan is being performed, the overall performance of the query is practically identical to that of using the index. That’s because the SIs are kicking in here and by saving 1042 MB (approximately 99% of the table), Oracle only has to actually physically access a tiny 1% of the table (basically, the 1% selectivity of the query itself). SIs based on the well clustered ALBUM_ID column are therefore very effective at eliminating the access of unnecessary data.

If we now run a query based on the TOTAL_SALES column in which the data is randomly distributed all over the place and so the associated index has a very poor CF:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:01.45

Execution Plan
 -------------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time |
 -------------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 2048 (1)| 00:00:25 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 2040 | 26520 | 2048 (1)| 00:00:25 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 2040 | | 7 (0)| 00:00:01 |
 -------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 2150 consistent gets
 2005 physical reads
 0 redo size
 43311 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

We notice that although only 2009 rows are retrieved with this query, 2150 consistent gets have been performed (practically 1 for each row returned) . This is somewhat more than the 1590 consistent gets of the previous example when a full 100,000 rows were returned. Using this index therefore is nowhere near as efficient/effective in retrieving data as was the index in the previous example.

If all this data can be cached in the buffer cache however, we can again improve overall execution times:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:00.02

Execution Plan
 ------------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time |
 ------------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 2048 (1)| 00:00:25 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 2040 | 26520 | 2048 (1)| 00:00:25 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 2040 | | 7 (0)| 00:00:01 |
 ------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 2150 consistent gets
 0 physical reads
 0 redo size
 43308 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

So with all the index/table data now cached, we can return the 2000 odd rows in just 0.02 seconds.

If we now run the same query with the same data in Exadata:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:01.25

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 36700 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 2040 | 26520 | 36700 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42)
 filter("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 134834 consistent gets
 134809 physical reads
 0 redo size
 47506 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

 SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 72.65625
 cell physical IO interconnect bytes returned by smart scan .383415222

We noticed that the physical IO bytes saved by the SIs has significantly reduced from the previous example (just 72 MBs down from 1042 MBs), even though at just 2000 odd rows we require much less data than before. In this example, only approximately 7% of table storage need not be accessed, meaning we still have to access a significant 93% of the table as the required data could potentially exist throughout the majority of the table. The poor clustering of the data in relation the TOTAL_SALES column has effectively neutralised the effectiveness of the associated SIs on the TOTAL_SALES column.

Note also that the 1.25 seconds is as good as it gets when performing a FTS with the fully generated SIs in place. In this case, using a fully cached database index access path can outperform the FTS/SI combination and provide a more efficient and ultimately more scalable method of accessing this data. As the required selectively on this column is low enough to warrant the use of a database index despite the poor CF, this is again another example of an index we may not necessarily want to automatically drop when moving to Exadata.

Storage Indexes vs Database Indexes Part I MIN/MAX (Maxwell’s Silver Hammer) December 18, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
add a comment

It’s often stated that in Exadata, you don’t need conventional database indexes anymore as everything runs so damn fast that indexes are simply a waste of time and space. Simply drop all database indexes and things will run just as fast.

Well, not quite …

There are many many scenarios where database indexes are still critical for optimal performance and scalability in an Exadata environment.

Prompted by a question in an earlier post (thanks Paul), I thought I might start looking at some scenarios where dropping a database index in Exadata would not be the best of ideas.

The first example is the scenario where one wants either the minimum/maximum of a column value. As database index entries are always ordered, this can very efficiently be found by traversing down to either the first or last index leaf block within an appropriate index. There’s an execution the INDEX FULL SCAN MIN/MAX execution path by which the CBO can decide to access an index in this manner.

For example, if I create an index on the ID column of my large DWH_BOWIE table and select the MIN(ID):

SQL> create index dwh_bowie_id_i on dwh_bowie(id);
Index created.

SQL> select min(id) from dwh_bowie;

MIN(ID)
 ----------
 1

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)| 00:00:01 |
 | 1 | SORT AGGREGATE | | 1 | 6 | | |
 | 2 | INDEX FULL SCAN (MIN/MAX)| DWH_BOWIE_ID_I | 1 | 6 | 4 (0)| 00:00:01 |
 ---------------------------------------------------------------------------------------------

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 4 consistent gets
 3 physical reads
 0 redo size
 525 bytes sent via SQL*Net to client
 524 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed

With only a handful of consistent gets, we have managed to get the minimum ID value extremely quickly and efficiently.

However, if we drop this index and re-run the same query in Exadata numerous times:

SQL> select min(id) from dwh_bowie;
MIN(ID)
 ----------
 1

Elapsed: 00:00:41.31

Execution Plan
 ----------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 6 | 2345K (1)| 07:49:12 |
 | 1 | SORT AGGREGATE | | 1 | 6 | | |
 | 2 | TABLE ACCESS STORAGE FULL| DWH_BOWIE | 640M| 3662M| 2345K (1)| 07:49:12 |
 ----------------------------------------------------------------------------------------

Statist5ics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 8626963 consistent gets
 8625479 physical reads
 0 redo size
 525 bytes sent via SQL*Net to client
 524 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan 7644.38194

We notice the query takes considerably longer and that the Storage Indexes have been unable to be of any help with not a single byte saved.

This is because all aggregation type operations are performed at the database level, not within the storage servers. Even though in theory Storage Indexes could be seen as perhaps being of value here, they are totally ignored by Oracle when retrieving either the minimum/maximum of a column value. Remember also that a key difference between a Storage Index and a Database Index is that a Storage Index does not necessarily have to exist fully for all regions of a given table.

Drop database indexes used in this manner at your peril …

Many more examples to come :)

Exadata Storage Indexes Part V: Warming Up (Here Come The Warm Jets) December 3, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
add a comment

As I mentioned in a previous post, there are a number of Similarities between Storage Indexes and Database Indexes.

One of these similarities is the “warming up” process that needs to take place before indexes become “optimal” after either the Storage Server (in the case of Storage Indexes) or the Database Server (in the case of Database Indexes) is restarted.

When a database server is restarted, the contents associated with the database buffer cache is lost and has to be reloaded (as well as other components of the SGA of course). This means for a period of time while the instance “warms up”, there is a spike in physical IO (PIO) activity. Index range scans, which generally greatly benefit from re-using cached data therefore need to perform additional PIOs for both the index blocks and the table blocks they reference. The overheads associated with these additional PIOs in turn slows the overall performance of (especially) these index related execution plans.

As a simple illustration, the following query is executed after a flushed buffer cache:

SQL> select * from big_bowie where album_id = 42;
1000 rows selected.

Elapsed: 00:00:01.99

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2703824528

----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 75000 | 101 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 75000 | 101 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 1000 | | 1 (0)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)
 Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 1072 consistent gets
 1005 physical reads
 0 redo size
 91231 bytes sent via SQL*Net to client
 1245 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

There is a considerable amount of physical reads associated with having to re-load the contents of the database buffer cache. If we now re-run the query:

SQL> select * from big_bowie where album_id = 42;
1000 rows selected.

Elapsed: 00:00:00.01

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2703824528

----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 75000 | 101 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 75000 | 101 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 1000 | | 1 (0)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1072 consistent gets
 0 physical reads
 0 redo size
 91231 bytes sent via SQL*Net to client
 1245 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

All the data is now cached and the overall query response time improves considerably. Once a database instance has warmed up and much of the database’s hot, data-set of interest has been cached, PIOs are minimised and performance improves accordingly.

Much has been made of the fact Storage Indexes (SIs) need to be created on the fly but in a similar manner to database indexes, they just need a period of time to warm up and be created/stored in memory to become optimal. The difference being that SIs are actually created on the fly when cached in memory unlike database indexes which physically preexist on disk.

So when an Exadata storage server is re-started, the performance associated with the SIs on the server is initially sluggish as well, while Oracle performs the additional PIOs off disk to create the necessary SIs in memory. Additionally, new queries using new column predicates will also be initially sluggish while any necessary SIs are created and cached in memory. Note that SIs are created on a 1M storage region basis so it could well take a period of time for a SI to be created across all the associated storage regions of the table. There is also a limit of 8 SIs per 1M storage region which might limit whether a SI is actually created or preserved (more on this point in a later post).

The following query is executed on a relatively new table and in a new session to capture fresh session statistics:

SQL> select artist_id, total_sales from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.89

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 1269K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 1269K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 114.601563
 cell physical IO interconnect bytes returned by smart scan 1.77637482

The amount of physical IO bytes saved by a SI is initially minimal, about 114 MB.

However, when the query is re-executed and the associated SI on the ALBUM_ID column has been fully created:

SQL> select artist_id, total_sales from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 1269K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 1269K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1156.84375
 cell physical IO interconnect bytes returned by smart scan 3.41764832

The amount of physical IO bytes saved by SIs increases considerably and overall performance improves as a result.

Restarting a database server will result in sluggish performance for a period until most of the database hot data is cached. This is also the case when an Exadata Storage server is re-started, in part while the necessary SIs are recreated and cached.

I’ll next discuss another similarity, that being the importance of the clustering of data within the table to the efficiency of these indexes.

Exadata Storage Indexes Part IV – Fast Full Table Scans (Speed of Life) November 8, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

OK, let’s look at Storage Indexes in action.

But first, following is the setup for the various demos to come. I basically create one table called BIG_BOWIE that’s about 1GB in size and then simply create another table called DWH_BOWIE where the contents of this are re-insert into itself a few times to get to about a 60GB table. The various columns have differing distinct values and distributions of data.

I used an X2-2 1/2 rack as my toy and yes, once people saw the table names instantly knew who created them :)

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null,
format_id number, release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
  2  begin
  3     for v_album_id in 1..100 loop
  4         for v_artist_id in 1..100000 loop
  5             insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2,
  6               trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)),
'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
  7         end loop;
  8         commit;
  9     end loop;
 10     commit;
 11  end;
 12  /

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

I modified some of the data to have a few occurrences of some specific data. This will be used on a later post.

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE‘, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name = 'BIG_BOWIE';

    BLOCKS
----------
    134809

Like I said, another DWH_BOWIE table as basically created with this data re-inserted into itself 6 times to create roughly a 60GB table.

OK, let’s now run an “expensive” query on this bigger table without any of the Exadata smart scan magic enabled:

SQL> alter session set cell_offload_processing=false;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:04:43.38

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    8644512  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

So it takes about 4 mins and 43 secs to return the necessary 266,176 rows from an approx. 60GB table (that’s 8.6 million 8K blocks). Not bad really.

However, if we run the same query with the Exadata smarts enabled:

SQL> alter session set cell_offload_processing=true;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:00:03.97

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS STORAGE FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)
       filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    8626963  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

We see that execution times reduce significantly, down to just 4 secs. So we manage to read via a FTS a 60GB table in under 4 seconds, not bad at all.

A hint that something different might have occurred here is the appearance of the “storage” predicate listing. This basically tells us that a smart scan “might” have occurred. Note though that the “reported” physical reads and consistent gets as reported by the database is basically the same as the previous run. More on this in later posts.

The reason for this improvement is due in large part to the use of Exadata Storage Indexes. As discussed previously, a Storage Index can potentially automatically avoid having to read significant portions of data by determining areas of storage that can’t possibly contain data of interest. This is a classic example of Storage Indexes in operation, putting into practice the notion that the quickest way to do something is to avoid doing as much work as possible.

But how to tell whether a Storage Index really did kick in and how much work did it actually save ?

The V$SQL view now has a number of additional columns that provides useful information:

SQL> select sql_text, io_cell_offload_eligible_bytes, io_interconnect_bytes, io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
     from v$sql where sql_id = 'admdhphbh1a1c';

SQL_TEXT
--------------------------------------------------------------------------------
IO_CELL_OFFLOAD_ELIGIBLE_BYTES IO_INTERCONNECT_BYTES IO_CELL_UNCOMPRESSED_BYTES
------------------------------ --------------------- --------------------------
IO_CELL_OFFLOAD_RETURNED_BYTES
------------------------------
select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200
                    1.4132E+11              55217792                 1.6718E+10
                      55217792

There are also numerous new statistics which begin with ‘cell ‘ that I generally capture before and after a particular operation to see the storage related workloads. Two statistics I find particularly useful are:

SQL> select name , value/1024/1024 MB from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                MB
----------------------------------------------------------- ----------
cell physical IO bytes saved by storage index               59414.9453
cell physical IO interconnect bytes returned by smart scan   26.329895

The cell physical IO bytes saved by storage index statistic denotes how much storage has not had to be read due to Storage Indexes. So this tells us just how useful Storage Indexes have been in reducing physical IO operations. The cell physical IO interconnect bytes returned by smart scan statistic denotes how much data has actually been returned to the database servers as a result of a smart scan.

As the above numbers highlight (note this last query had been the only activity within the session), the Storage Indexes were highly effective and were able to physically skip reading the vast majority of the table (59,414MB) during the Full Table Scan operation and that only a  relatively small amount data (26MB) had to be returned back to the database servers.

By not having to read most of the data, the resultant Full Table Scan on this relatively large table was completed not in minutes as previously, but in a matter of a few seconds.

The potential power of Storage Indexes …

Exadata Storage Indexes Part III – Similarities With Database Indexes (Same Old Scene) October 15, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
2 comments

As discussed previously, there are quite a number of differences between Storage Indexes (SIs) and Database Indexes (DIs). However, there are also a number similarities between both of them as well.

The obvious one is that they’re both designed specifically to reduce the overheads associated with retrieving the required data out of the database. Both index structures provides a method by which Oracle can avoid having to read every row/block in a table when searching for data of interest. It’s just the actual implementation of this mechanism that differs between the two general index types as I’ve previously discussed.

The efficiency of both index types is very largely dependant upon the clustering of the indexed data within the table (i.e. the index Clustoring Factor). The better the clustering of the related indexed data within the table, the more tightly coupled the required data is likely to be and so the more efficient both index types would be in avoiding accessing unnecessary data. If the required data were to be less well clustered and randomly distributed throughout the table, the less efficient would be both index types in retrieving the necessary data. Some actual examples of this to come in future posts.

Both index types have a period of “warming up” before being fully effective. It’s simply the manner in which this warming up process occurs that differs between the two. Database indexes on a freshly bounced database server initially incur substantial physical I/Os (PIOs) until the related index/table data is cached within the database buffer cache (and indeed in the flash cache). These PIOs can significantly reduce the performance of the SQL plans utilising database indexes. Storages indexes on a freshly bounced storage server need to be recreated and can’t immediately eliminate accessing unnecessary storage regions. This can significantly reduce the performance of Full Table Scans until the associated SIs are fully created. Again, some actual demos on all this to come in future posts.

Both index types can use “Index Combine” like logic and use multiple indexes in combination with each other to further reduce the actual number of table blocks that need to be accessed to retrieve the data of interest. Storage and Bitmap database indexes are especially suited to these types of index combine operations, although B-Tree indexes can also be used in this manner.

Both Oracle index types are really quite interesting and often misunderstood and so meets the general theme of this blog, meaning I can quite happily blog about them without shocking too many people in the process :)

Like I said, more to come …

Exadata Storage Indexes Part II – Differences With Database Indexes (Space Dementia) October 9, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
14 comments

Let’s explore some of the key differences between Storage Indexes (SI) and Database Indexes (DI). In no particular order, they include:

SIs are structures that exist only within the storage servers of an Exadata box, while DIs logically exist and can be accessed within the database servers.

SIs are purely memory only structures while DIs are physical segments that take up storage. As such, DIs are relatively expensive to both create and subsequently maintain as they generate considerable undo and redo within the database, can cause concurrency issues and require storage resources. SIs meanwhile require no physical storage and have little impact on DML operations.

SIs are generated automatically and transparently while DIs generally need to be explicitly created (except in some scenarios such as in the creation of Primary/Unique keys when they can be implicitly created).

SIs being memory only structures are transient in that if a storage server were to be restarted, the corresponding SIs are lost and need to be re-created. Additionally, Oracle may decide to drop a SI for a particular column and create one on a different column depending on current load and conditions. DIs are permanent objects that need to be explicitly dropped (except in some scenarios such as the dropping/disabling of Primary/Unique Key constraints when they can be implicitly dropped).

SIs can be stored in memory as they contain very brief summary information, just the min/max value and a null flag for each 1MB storage region. A corresponding DI (especially a B-Tree) would generally be significantly larger as it needs to store all indexed values from the table with associated rowids (unless compressed but still likely much larger even so).

SIs can index only a portion of a table at a specific point in time as they get generated and dropped (see above). DIs index the entire table/partition (prior to 12c), unless using smarts such as decode function-based indexes (which also index the entire table but based only on the results of the function).

SIs are limited to only 8 columns whereas DIs have no such limitations per table.

SIs reference a 1MB storage region whereas a DI references a specific database block (say 8K). Therefore, a DI is more “focused” in terms of the minimum amount of data that needs to be accessed.

SIs basically work by determining which areas of storage can not possibly contain data of interest, accessing just those storage regions that might contain data of interest. Therefore, it’s quite possible for a SI to generate false positives by having to access storage that might in the end not actually contain data of interest after all. A DI meanwhile via the rowid explicitly points to the exact location of an indexed value and does not generate false positives. A bitmap index is a little different in the manner in which rowids are stored and generated (and can have 0 bits set for rows that don’t actually exist) but again do not generate false positives when actually accessing the table blocks.

SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not.

As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.

SIs are used, even if the majority of the data needs to be accessed regardless. As SIs are only used during a FTS, the concept of only using an index when it’s the cheaper alternative doesn’t apply to SIs. If a SI can save (say) just 5% of physical I/Os during a FTS, it’s better than no savings at all. DIs meanwhile are only used when the Cost Based Optimizer (CBO) considers it the cheapest option when accessing data.

As SIs are storage based structures, the CBO has no knowledge of their existence and play no part in the CBO cost calculations. DIs are fully known to the CBO and the DI related statistics are an important factor in the CBO calculations. The CBO only determines whether a FTS is the cheaper alternative, however the decision to perform a Direct-Read operation and so potentially enable the use of SIs is a run-time decision not made by the CBO.

SIs can be effectively used for IS NULL predicates, thanks to the null existence flag component of the SI. B-Tree Indexes can’t if all indexed columns are null (as such entries are not indexed) although Bitmap indexes can.

SIs can not be used to police Primary/Unique constraints. DIs can.

SIs can not be used to avoid performance issues in relation to Foreign Keys (such as locking implications and FTS requirements on child tables when deleting parent rows). DIs can.

SIs can not avoid sort operations. DIs can as data read via an index range scan is guaranteed to be returned in the order of the index entries.

SIs can not provide additional statistical information to the CBO, such as accurate selectivity information in multi-column predicates available in concatenated index distinct keys statistics. DIs can.

SIs can not be used to efficiently access the MIN/MAX of a column. DIs can.

Function-Based SIs are not supported. Function-Based DIs are supported.

SIs can not be treated as smaller tables and used as an alternative by the CBO to access just index related data, eg. select count(*), select indexed_column, etc. as SIs do not contain all the required data and are not visible to the CBO anyways. DIs can be treated as smaller tables and accessed accordingly if appropriate.

OK, that’s enough of a list for now to get one thinking about some of these differences :)

In the following posts, I’ll go through the benefits of SIs and show examples of how they’re implemented and used by Oracle.

Exadata Storage Indexes – Part I (Beginning To See The Light) October 4, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
12 comments

Thought I might discuss Exadata Storage Indexes, explore what they are, how they work and discuss their advantages and disadvantages. Following is but a brief introduction on this very cool Exadata feature.

A Storage Index basically maintains summary information about database table data down on the Exadata storage servers. The information consists of the minimum value, the maximum value and a flag to denote whether any Nulls exist for up to 8 different columns within each 1M storage region of disk.

My little diagram above attempts to describe this (click on image for a larger version).

The Storage Indexes are created automatically and transparently based on the SQL predicate information executed by Oracle and passed down to the storage servers from the database servers. Storage Indexes take up no physical storage of themselves and are built and maintained entirely in memory. As only this very basic summary information is stored for a maximum of 8 columns for each 1M of storage, Storage Indexes are very lightweight and can be created and maintained with minimal general overheads.

So how are they used ?

During an Exadata Smart Scan, Oracle can perform predicate filtering down at the storage layer and so only return just the rows of interest back up to the database. As part of this process, Oracle can use the Storage Indexes to visit just the 1M storage regions that can potentially contain rows of interest. Those storage regions that can’t possibly contain data of interest can be eliminated and not accessed at all during the Smart Scan operation.

So, in a very simple example, if we have an SQL predicate such as WHERE CODE = 5, if a corresponding Storage Index on the CODE column of the first 1M region of the table has MIN=2 and MAX=10, Oracle would need to access this portion of the table as the CODE value of interest could potentially exist here. However, if the next 1M storage region had a CODE Storage Index with a MIN=7 and MAX=12, then the CODE value of 5 can’t possibly exist within this portion of the table and can be ignored and not accessed at all during the Smart Scan.

So depending on the column, predicates and data distribution, a Storage Index can potentially eliminate having to physically access significant portions of a table during a Smart Scan. In an extreme example, on a search of a CODE value = 42 where the maximum CODE value that actually exists is say 35, a Smart Scan can perform a so-called Full Table Scan (FTS) via a Storage Index that doesn’t actually have to perform any physical I/O at all and can ignore the entire table.

The less physical I/O performed, the less work required and the faster the response time. If a FTS of say a 500GB table only had to physically read and access a few MBs here and there AND just return the data of interest back to the database servers, that would significantly improve the overall performance and overheads associated with the FTS.

Storage Indexes can be very cool indeed.

Much more to come …

Indexes: Oracle11g New Features Presentation (Get Back) September 19, 2012

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes.
5 comments

I’m in the early stages of compiling an Oracle12c Indexes New Features presentation so I thought I might make available the 11g version I’ve presented previously at Oracle OpenWorld and InSync conferences:

http://richardfoote.files.wordpress.com/2012/09/oracle-indexing-new-features-oracle-11g-release-1-and.pdf

Enjoy :)

E4 2012 Enkitec Extreme Exadata Expo (Blue Jean) June 14, 2012

Posted by Richard Foote in Advert, Exadata, Oracle Indexes.
6 comments

I’m very pleased to have been invited to speak at the E4 2012 Enkitec Exadata Expo to be held in Dallas, USA on 13-14 August.

It’s basically the first ever conference that’s dedicated exclusively to Oracle’s exciting Exadata platform. It should be a fabulous event, featuring some of the best Oracle talent going around, including Jonathan Lewis, Tanel Poder, Kerry Osbourne, Cary Millsap, Arup Nanda, Fritz Hoogland, Doug Burns, Karen Morton, Maria Colgan and Peter Bach to name but a few. The full list of speakers is listed here.

My talk will naturally be on “Indexing In Exadata“, the abstract being:

There’s often confusion regarding how indexing requirements may change when moving to Exadata, with some even suggesting that indexes are perhaps no longer required at all. Considering indexes can consume a considerable proportion of total storage within a database and can be crucial to general database performance, care needs to be taken to fully consider indexing requirements when moving to Exadata. This presentation will discuss the indexing structures unique to Exadata, how indexing considerations change (and don’t change), how database usage is critical to indexing requirements and how to implement safely an appropriate indexing strategy when migrating to Exadata that will ensure indexes get used when appropriate without compromising Exadata specific features such as Smart Scans and Storage Indexes.”

I’ve had the opportunity to research and have a good play with Exadata since re-joining the Oracle mothership so I’m looking forward to sharing some of what I’ve learnt.

Hopefully, you can make this historic event. If not in person, you might still be able to attend the event in a virtual capacity.

I’m hoping to confirm my attendance at another conference in the very near future. More details soon.

And yes, I’m planning to blog at some point about the challenges of having an appropriate indexing strategy on Exadata, busting a few myths in the making while I’m at it :)

Indexes vs. Full Table Scan: Picture vs. 1000 Words (Pictures Of Lily) June 8, 2012

Posted by Richard Foote in CBO, Clustering Factor, Oracle Indexes.
19 comments

I’m in the process of writing a number of new presentations and in one I’ve included a favorite little graph of mine that I’ve used over the years to help illustrate the relationship between the cost of using an index vs. the cost of using a Full Table Scan (FTS). It’s occurred to me that I’ve never actually shared this graph on this blog, so I thought it about time I did.

The Cost Based Optimizer (CBO) when choosing between an index scan and a FTS will simply go for the cheapest option. The more rows that are retrieved (or the greater the percentage of rows retrieved), the more expensive the index option as it needs to perform more logical I/Os. There will generally be a point when the selectivity of  a query is such, that so many rows are retrieved, that the index costs will increase beyond those of the FTS and the FTS becomes the cheaper option.

The cost of a FTS meanwhile is pretty well constant regardless of  the number of rows retrieved. It needs to read all the blocks in the table, whatever the selectivity of the query.

Although I’ve not quite reached 1000 words, the below graph illustrates this point:

The red line represents the constant cost of the FTS. The green lines represents the cost of using various indexes, which increases as more rows are retrieved. The “steepness” of the green line and the subsequent increase in cost of the index as more rows are retrieved is due entirely to the Clustering Factor of the index. The steeper the line, the worse (higher) the Clustering Factor, the less efficient the index and the quicker we get to the point when the FTS becomes cheaper. The less steep the line, the better (lower) the Clustering Factor, the more efficient the index and the longer it takes for the FTS to become the cheaper option.

In some rarer cases, the index might be so efficient (or the FTS so inefficient) that the index never reaches the point of the FTS and the CBO decides it’s overall cheaper for the index to potentially access 100% of all rows in a table rather than via a FTS.

Ok, so now you have almost 1000 words and the picture :)

Cost of Virtual Indexes (Little Lies) May 30, 2012

Posted by Richard Foote in CBO, Clustering Factor, Fake Indexes, Oracle Indexes, Virtual Indexes.
5 comments

I’ve previously discussed Virtual Indexes and how they can be used to do basic “what if” analysis if such an index really existed. However, a recent comment on the OTN forums regarding using them to compare index costs made me think a follow-up post regarding the dangers of Virtual Indexes might be warranted.

The big advantage of a Virtual Index of course is that it doesn’t really exist and so consumes no storage and can be created extremely quickly/cheaply. The disadvantage of a Virtual index is that it doesn’t really exist and so Oracle can not collect segment level statistics. Without statistics however, the CBO has a very tough time of doing its job properly …

To illustrate, a simple little demo. I begin by creating a table in which the data in the table is stored in CODE column order. An index on the CODE column would therefore have an excellent (very low) Clustering Factor. Note that the Clustering Factor is the most important index related statistic regarding the efficiency and potential cost of using the index.

SQL> create table bowie_ordered (id number, code number, name varchar2(30));

Table created.

SQL> create sequence bowie_seq;

Sequence created.

SQL> declare
  2  begin
  3  for i in 1..100 loop
  4     for j in 1..10000 loop
  5        insert into bowie_ordered values (bowie_seq.nextval, i, 'DAVID BOWIE');
  6     end loop;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

OK, I’m now going to create a Virtual Index on the CODE column and collect 100% accurate statistics on the table:

SQL> create index bowie_ordered_i on bowie_ordered(code) nosegment;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_ORDERED', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Oracle allows statistics to be collected on the table and associated Virtual Index (so that existing statistic gathering jobs won’t now fail), however without an associated segment, no statistics can actually be derived for the index itself.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I';

no rows selected

I’ll list the system statistics so anyone who wants to replicate the demo can get similar results (the database blocksize is 8K):

SQL> SELECT pname, pval1 FROM SYS.AUX_STATS$
     WHERE pname IN ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED');

PNAME                               PVAL1
------------------------------ ----------
SREADTIM                                2
MREADTIM                               10
CPUSPEED                             1000
MBRC                                   20

If we run the following query:

SQL> set arraysize 5000
SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1678744259

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_ORDERED | 10000 |   195K|  1005  (13)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We note that the CBO uses a Full Table Scan as the CBO has no real choice here as Virtual Indexes are not considered by default.

However, if we change the following hidden parameter and re-run:

SQL> alter session set "_use_nosegment_indexes" = true;

Session altered.

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|        9(0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|        9(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |        1(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice a few interesting details. Firstly, the CBO has decided to use the Virtual Index however the number of consistent gets remains the same as the previous run so we can clearly see that behind the covers, the Full Table Scan is still performed. The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.

If we look at the execution plan costs, both the estimate row (10000) and byte values are spot on as these statistics are based on the underlining table/column statistics and the 100 distinct CODE values are evenly distributed. However, the index related costs look remarkably low. Just a cost of 1 to read the index and extract 10,000 index entries (that means an index entry is less than 1 byte in length on average !!). Just a cost of 9 to visit the table and read 10,000 rows. Even with the most efficient of physical indexes, these costings are not realistic and are based on highly questionable default metrics.

Basically, the creation of this Virtual Column is telling us that there is no reason why the index couldn’t potentially be used, IF (that’s a big IF in case no-one noticed) the actual index related statistics are such that the CBO determines the index to be the cheaper option. But it depends on the actual characteristics of the index which can’t be accurately determined until it’s been physically created.

As the Virtual Index suggests the index might be used if it existed, let’s now create it for real:

SQL> drop index bowie_ordered_i;

Index dropped.

SQL> create index bowie_ordered_i on bowie_ordered(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_ORDERED_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_ORDERED_I                   1000000              3546

As predicted, a Clustering Factor of 3546 on an index with 1M index entries is indeed nice and low.

If we now re-run the query again:

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|       60(4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|       60(4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |       23(5)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         61  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

Indeed, the index has been used by the CBO. However, note that the costs are substantially higher (and more accurate) than previously suggested with the Virtual Index. Indeed the final cost of 60 is very close to the number of consistent gets (61) required by the execution plan and so suggests the CBO is making reasonable index based calculations here.

OK, another demo, but this time with a table in which the CODE values are distributed throughout the whole table (rather than being perfectly clustered together as in the first example):

SQL> create table bowie_random (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie_random select rownum, mod(rownum,100)+1, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_RANDOM', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

When we now create a Virtual Index based on the CODE column and re-run the same query:

SQL> create index bowie_random_i on bowie_random(code) nosegment;

Index created.

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice that both the execution plan and all the associated costs are identical to those of the previous example. So although the actual Clustering Factor of the index is likely to be dramatically greater here than it was in the previous example and so likely dramatically impact the costs associated with using this index, the Virtual Index is treated and costed identically. This is the simple consequence of not having the physical index structure by which to calculate the appropriate segment statistics.

If we now physically create this index for real:

SQL> drop index bowie_random_i;

Index dropped.

SQL> create index bowie_random_i on bowie_random(code);

Index created.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_
name='BOWIE_RANDOM_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_RANDOM_I                    1000000            344700

We can see that indeed the Clustering Factor is dramatically worse than before, increasing here from 3546 to 344700.

If we now re-run the query:

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1983602984

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_RANDOM | 10000 |   195K|  1005  (13)| 00:00:03 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We see the CBO has decided to perform the now cheaper Full Table Scan. Although the Virtual Index on this column was used, once the actual characteristics of the index are determined via the index statistics, the CBO has decided the actual physical index was just too expensive to use to retrieve the 1% of rows.

If we re-run the query with an index hint:

SQL> select /*+ index (bowie_random) */ * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|  3483   (1)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|  3483   (1)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |    23   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3472  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

The index is used but we can see why at a cost of 3483, the Full Table Scan at a cost of only 1005 was selected by the CBO.

Virtual Indexes can be useful to quickly determine whether an index is a viable option if it were to be actually created. However, caution needs to be exercised if Virtual Indexes are used for cost comparison purposes and although Virtual Indexes might be  used by the CBO, it might be another story entirely once the index is physically created and the actual index related statistics determined.

Index Rebuild – Does it use the Index or the Table ? (Nothing Touches Me) May 15, 2012

Posted by Richard Foote in Index Rebuild, Oracle Indexes, Secondary Indexes.
10 comments

A common question that gets asked is does Oracle access the index itself or the parent table during an index rebuild to extract the necessary data for the index ? Thought it might be worth a blog post to discuss.

Now if the index is currently in an UNUSABLE state, then Oracle clearly can’t use the existing index during the index rebuild operation. So we’ll assume both table and index are hunky dory.

OK, to setup the first demo (using 11.2.0.1), we create and populate a table and index with the index being somewhat smaller than the parent table as is most common:

SQL> create table bowie (id number, code number, name1 varchar2(30), name2 varchar2(30), name3 varchar2(30), name4 varchar2(30), name5 varchar2(30), name6 varchar2(30), name7 varchar2(30), name8 varchar2(30), name9 varchar2(30), name10 varchar2(30));

Table created.

SQL> insert into bowie select rownum, mod(rownum, 100), 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE','DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

If we look at the corresponding size of table and index:

SQL> select table_name, blocks from dba_tables where table_name = 'BOWIE';

TABLE_NAME                         BLOCKS
------------------------------ ----------
BOWIE                               19277

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE_CODE_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE_CODE_I                          1948

As is common, the table is somewhat larger than the corresponding index.

Now in my first demo, I’m just going to perform a normal offline Index Rebuild. I’ll however trace the session to see what might be happening behind the scenes (the good old alter session set events ‘10046 trace name context forever, level 12′; still does the job). I’ll also flush the buffer cache as well to ensure the trace file shows me which blocks from which object get accessed.

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

There’s lots of information of interest in the resultant trace file, well, for someone with an unhealthy interest in Oracle indexes anyways :) However, the portion that’s of direct interest in this discussion is to see which object Oracle accesses in order to read the necessary data for the index rebuild. The trace file will contain a relatively extensive section with the following wait events (the following is just a short sample):

WAIT #6: nam=’db file scattered read’ ela= 933 file#=4 block#=79339 blocks=5 obj#=75737 tim=20402099526
WAIT #6: nam=’db file scattered read’ ela= 1016 file#=4 block#=79344 blocks=8 obj#=75737 tim=20402102334
WAIT #6: nam=’db file scattered read’ ela= 978 file#=4 block#=79353 blocks=7 obj#=75737 tim=20402106904
WAIT #6: nam=’db file scattered read’ ela= 9519 file#=4 block#=80000 blocks=8 obj#=75737 tim=20402119605
WAIT #6: nam=’db file scattered read’ ela= 2800 file#=4 block#=80009 blocks=7 obj#=75737 tim=20402131869

….

If we query the database for the identity of object 75737:

SQL> select object_name from dba_objects where object_id = 75737;

OBJECT_NAME
-----------------------

BOWIE_CODE_I

We can see that Oracle has accessed the data from the Index itself, using multi-block reads. As the index is the smallest segment that contains the necessary data, Oracle can very efficiently read all the required data (the expensive bit) from the index itself, perform a sort of all the data (as a multi-block read will not return the data in a sorted format) and complete the rebuild process relatively quickly. Note the table is locked throughout the entire index rebuild operation preventing DML operations on the table/index and so for an offline index rebuild, Oracle can access the Index segment without complication.

I’m going to repeat the same process but this time perform an Online index rebuild operation:

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie_code_i rebuild online;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

We notice this time there are many more wait events than previously and that another object is referenced:

WAIT #5: nam=’db file scattered read’ ela= 8259 file#=4 block#=5635 blocks=5 obj#=75736 tim=4520179453
WAIT #5: nam=’db file scattered read’ ela= 1656 file#=4 block#=5640 blocks=8 obj#=75736 tim=4520181368
WAIT #5: nam=’db file scattered read’ ela= 891 file#=4 block#=5649 blocks=7 obj#=75736 tim=4520182459
WAIT #5: nam=’db file scattered read’ ela= 886 file#=4 block#=5656 blocks=8 obj#=75736 tim=4520183544
WAIT #5: nam=’db file scattered read’ ela= 827 file#=4 block#=5665 blocks=7 obj#=75736 tim=4520184579

SQL> select object_name from dba_objects where object_id = 75736;

OBJECT_NAME
-------------------------

BOWIE

This time, the much larger BOWIE parent table has been accessed. So with an Online rebuild, Oracle is forced to use the parent table to access the data for the rebuild operation due to the concurrency issues associated with changes being permitted to the underlying table/index during the rebuild process. So although an online index rebuild has availability advantages, it comes at the cost of having to access the parent table which can result in much additional I/O operations. So if you don’t have availability concerns, an offline index rebuild is probably going to be the more efficient option.

In fact, Oracle can be quite clever in deciding which object to access with an offline rebuild …

In this next example, I’m going to create another table/index, only this time the index is somewhat larger than the parent table. This scenario is less common but certainly possible depending on circumstances:

SQL> create table bowie2 (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie2 select rownum, mod(rownum,100), 'DAVID BOWIE' from dual connect by level<= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie2_code_i on bowie2(code) pctfree 90;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

SQL> select table_name, blocks from dba_tables where table_name = 'BOWIE2';

TABLE_NAME                         BLOCKS
------------------------------ ----------
BOWIE2                               3520

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE2_CODE_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE2_CODE_I                        21726

So the index is indeed much larger than the table. Which object will Oracle access now during an offline rebuild ?

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie2_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #15: nam=’db file scattered read’ ela= 2278 file#=4 block#=81723 blocks=5 obj#=75744 tim=8570990574
WAIT #15: nam=’db file scattered read’ ela= 2733 file#=4 block#=81728 blocks=8 obj#=75744 tim=8570994765
WAIT #15: nam=’db file scattered read’ ela= 2398 file#=4 block#=81737 blocks=7 obj#=75744 tim=8570999057
WAIT #15: nam=’db file scattered read’ ela= 2661 file#=4 block#=81744 blocks=8 obj#=75744 tim=8571003369
WAIT #15: nam=’db file scattered read’ ela= 1918 file#=4 block#=81753 blocks=7 obj#=75744 tim=8571006709

SQL> select object_name from dba_objects where object_id = 75744;

OBJECT_NAME
----------------------------

BOWIE2

In this case, the smaller table segment is accessed. So during an offline rebuild, Oracle will access either the table or index, depending on which one is smaller and cheaper to read.

What if we now create another index that also contains the CODE column which is smaller than both the table and the existing index.

SQL> create index bowie2_code_id_i on bowie2(code, id);

Index created.

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE2_CODE_ID_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE2_CODE_ID_I                      2642

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie2_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #6: nam=’db file scattered read’ ela= 2070 file#=4 block#=85179 blocks=5 obj#=75747 tim=8925949081
WAIT #6: nam=’db file scattered read’ ela= 2864 file#=4 block#=85184 blocks=8 obj#=75747 tim=8925957161
WAIT #6: nam=’db file scattered read’ ela= 2605 file#=4 block#=85193 blocks=7 obj#=75747 tim=8925969901
WAIT #6: nam=’db file scattered read’ ela= 10636 file#=4 block#=85536 blocks=8 obj#=75747 tim=8925989726
WAIT #6: nam=’db file scattered read’ ela= 2188 file#=4 block#=85545 blocks=7 obj#=75747 tim=8925996890

SQL> select object_name from dba_objects where object_id = 75747;

OBJECT_NAME
------------------------------

BOWIE2_CODE_ID_I

In this case, the smaller alterative index is actually accessed. So it might not be the table or the index being rebuilt that gets accessed, but the smallest segment that contains the data of interest which in this case is another index entirely.

My final little demo brings me back to the subject of secondary indexes on Index Organized Tables (IOTs) I’ve been recently discussing. In this example, I create an IOT and a much smaller secondary index:

SQL> create table bowie3 (id number constraint bowie_pk primary key, code number, name1 varchar2(30), name2 varchar2(30), name3 varchar2(30), name4 varchar2(30), name5 varchar2 (30), name6 varchar2(30), name7 varchar2(30), name8 varchar2(30), name9 varchar2(30), name10 varchar2(30)) organization index;

Table created.

SQL> insert into bowie3 select rownum, mod(rownum, 100), 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE','DAVID BOWIE','DAVID BOWIE', 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie3_code_i on bowie3(code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE3', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

SQL> select index_name, leaf_blocks from dba_indexes where table_name = 'BOWIE3';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE_PK                             16950
BOWIE3_CODE_I                         2782

So the secondary index is much smaller. However, if I rebuild it offline:

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie3_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #5: nam=’db file scattered read’ ela= 13019 file#=4 block#=217856 blocks=4 obj#=75733 tim=8949436015
WAIT #5: nam=’db file scattered read’ ela= 1869 file#=4 block#=72915 blocks=5 obj#=75733 tim=8949438360
WAIT #5: nam=’db file scattered read’ ela= 3023 file#=4 block#=72920 blocks=8 obj#=75733 tim=8949442877
WAIT #5: nam=’db file scattered read’ ela= 2381 file#=4 block#=72929 blocks=7 obj#=75733 tim=8949448410
WAIT #5: nam=’db file scattered read’ ela= 2613 file#=4 block#=72936 blocks=8 obj#=75733 tim=8949453521

SQL> select object_name from dba_objects where object_id = 75733;

OBJECT_NAME
---------------------------

BOWIE_PK

In this case, we see that the much larger IOT PK segment is accessed and not the smaller secondary index. When rebuilding the secondary index of an IOT, Oracle has no choice but to access the parent IOT PK segment itself as of course the secondary index doesn’t contain all the necessary information required for the index rebuild operation. The physical guess component within the secondary index might be stale and the only way for Oracle to determine the correct current address of all the rows is to access the IOT PK segment. This is another disadvantage of secondary indexes associated with IOTs, even offline index rebuilds must access the potentially much larger IOT PK segment in order to ensure the correctness of the physical guess components of the logical rowids.

So the general answer of whether an index rebuild accesses the table or index is that it depends and that it could very well be neither of them …

Follow

Get every new post delivered to your Inbox.

Join 1,918 other followers