jump to navigation

Clustering Factor Calculation Improvement Part II (Blocks On Blocks) May 14, 2013

Posted by Richard Foote in 11g, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
6 comments

My previous post on the new TABLE_CACHED_BLOCKS statistics gathering preference certainly generated some interest 🙂 My blog hits for the week have gone off the charts !!

One of the concerns raised by this new capability was that setting such a preference might result in really unrealistic and inaccurate Clustering Factor (CF) values, especially for those tables that truly have appalling CFs. Although there are certainly some dangers, Oracle has limited the possible “abuse” by ensuring TABLE_CACHED_BLOCKS can only be set to a maximum of 255. This means Oracle will only ignore a maximum of 255 table blocks that have recently been accessed during the CF calculation. For larger tables with truly randomised data patterns, not even the maximum 255 setting if utilised will make an appreciable difference to the final CF.

A couple of examples to demonstrate.

The first table is a relatively “large” table that has a DOB column that is effectively randomised throughout the table. There are approximately 20,000 different DOB values in a 2 million row table (so each DOB occurs approximately 100 times, give or take).

SQL> create table major_tom (id number, DOB date, text varchar2(30));

Table created.

SQL> insert into major_tom select rownum,  sysdate-trunc(dbms_random.value(0, 20000)), 'DAVID BOWIE' from dual connectby level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

Let’s now create an index on this DOB column and have a look at the CF:

SQL> create index major_tom_dob_i on major_tom(dob);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'MAJOR_TOM', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1988164

So at 1,988,164, the CF is terrible. This is as expected as the DOB values are all randomised throughout the table. The index is not being used as we had hope (naively) so let’s use the new TABLE_CACHED_BLOCKS preference to now improve the calculated CF by setting it to the maximum 255 setting and recalculate the index statistics:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'MAJOR_TOM',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'MAJOR_TOM_DOB_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1941946

We notice that although the CF has improved marginally, at whopping 1,941,946 it’s still terrible and has made no real appreciable difference. Why ?

Well let’s do some basic maths here. There are 9077 blocks in the table and the next DOB referenced in the index can potentially be in any one of them. Therefore, the chances of the next DOB being in one of the 255 previously accessed table blocks is only 255/9077 x 100 = approximately 2.8%. So in only 2.8% of the time is the CF likely to not be incremented and so the CF is only likely to drop by around this 2.8% amount.

Let’s check. (1988164 – 1941946)/1988164 x 100  indeed does equal approximately 2.8%.

So statistically with such a poor CF on such a “large” table, to limit the CF calculation if any of the last 255 table blocks are referenced is only going to improve things by 2.8% on average. Effectively of no real use at all.

Another example now, but this time with a CODE column with just 100 distinct values that are randomly distributed throughout another reasonable “large” 2 million row table. For those mathematically challenged, that means each value occurs approximately 20,000 times, give or take:

SQL> create table ziggy (id number, code number, text varchar2(30));

Table created.

SQL> insert into ziggy select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID
BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

So at 662,962 it’s what I would describe as a “poor to average” CF. It’s not particularly great with there being just  7,048 table blocks but it’s still some distance from the 2,000,000 row value.

The index is not being used in SQL statements as we (naively) wish, so let’s try and improve things by lowering the index CF by setting the new TABLE_CACHED_BLOCKS preference to the maximum 255 setting:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'ZIGGY',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'ZIGGY_CODE_I',
estimate_percent=>null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

We notice to our great disappointment (well, not really) that the CF remains completely unchanged at 662,962 !! Why ?

Again, let’s do some basic maths and consider the data distribution.

The table has some 7048 blocks but each distinct CODE value has some 20,000 occurrences on average. Therefore, each value is going to be found 20000/7048 = roughly 2 to 3 times per block. As the index is in CODE order and for each CODE in rowid order, the CF is going to increment for each CODE value for each distinct block we visit. We will therefore only go back to a previously visited table block (except for the 2 to 3 visits to the current block) when the CODE value changes but this will take us all the way back to the first block which is always going to be some 7047 blocks away from the current one. As 7047 is much greater than the 255 the CF calculation will only cater for, the CF is going to remain unchanged from the default calculation as a result.

And this is all as it should be, as the fundamental CF is indeed poor for these scenarios and even going back the maximum 255 data blocks will not reduce appreciably the manner in which the CF is calculated.

Of course, if there was no limit, then a setting of TABLE_CACHED_BLOCKS  of say 7100 would enable the CF to be recalculated as being perfect in the above scenario, which would indeed be a concern. But 255 is the limit and so limits the potential “damaged” that can be done.

More on all this to come 🙂

Important !! Clustering Factor Calculation Improvement (Fix You) May 8, 2013

Posted by Richard Foote in 11g, ASSM, CBO, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
57 comments

Believe me, this article is worth reading 🙂

I’m currently not allowed to discuss Oracle 12c Database goodies but I am allowed to discuss things perhaps initially intended for 12c that are currently available and already back-ported to 11g. This includes a wonderful improvement in the manageability of how the Clustering Factor (CF) of an index can now be calculated. Many thanks to Martin Decker for pointing this out to me.

As anyone who has attended my Index Seminars will know, the CF of an index is one of the most important statistics used by the Cost Based Optimizer (CBO) in determining the most efficient execution plan. As such, it has always been an issue for me that the manner in which the CF is calculated has been so flawed.

Basically, the CF is calculated by performing a Full Index Scan and looking at the rowid of each index entry. If the table block being referenced differs from that of the previous index entry, the CF is incremented. If the table block being referenced is the same as the previous index entry, the CF is not incremented. So the CF gives an indication of how well ordered the data in the table is in relation to the index entries (which are always sorted and stored in the order of the index entries). The better (lower) the CF, the more efficient it would be to use the index as less table blocks would need to be accessed to retrieve the necessary data via the index.

However, there’s a basic flaw here. The CF calculation doesn’t take into consideration the fact the referenced table block, although maybe different from the previous one index entry, might already have recently been accessed. As such, during an index scan, the table block being accessed is almost certainly still cached in the buffer cache from the previous access, thereby not reducing the effectiveness of the index in any appreciable manner. A classic example of this would be a table with a few freelists. Although the data being inserted is not ordered precisely within the same data blocks, the data might actually be very well clustered within only a few blocks of each other.

Picture a table with 100 rows being inserted by 2 sessions simultaneously, each inserting 50 rows based on an ordered sequence. With one freelist, the data is basically inserted in one block first and then once full a second table block. The data is therefore perfectly ordered/clustered and the CF will evaluate to a value of 2 on such an indexed column. But with 2 freelists, one session could insert data into one block while the other session inserts into a second block, with the ordered sequenced values being randomly distributed among the 2 blocks.  The CF could now potentially evaluate to a value of 100 as the rows are jumbled or “toggled” across the two blocks. This is a much much worse value (2 vs. 100) that can adversely impact the CBO calculations, although the efficiency of such an index is really almost identical as both table blocks are certain to be cached during an index scan regardless.

This is also a very common scenario with Automatic Segment Space Management (ASSM) tablespaces as I’ve discussed previously, which of course is now the default these days.

OK, let’s look at an example scenario. I’ll begin by creating a simple little table, an ordered sequence and a procedure that inserts 100,000 rows into the table:


SQL> create table bowie (id number, text varchar2(30));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> CREATE OR REPLACE PROCEDURE bowie_proc AS

2  BEGIN

3     FOR i IN 1..100000 LOOP

4         INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');

5         COMMIT;

6     END LOOP;

7  END;

8  /

Procedure created.

We note the table lives in an ASSM tablespace:


SQL> select table_name, i.tablespace_name, segment_space_management

from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';

TABLE_NAME   TABLESPACE_NAME                SEGMEN

------------ ------------------------------ ------

BOWIE        USERS                          AUTO

We next have 3 different sessions that simultaneously run the procedure to load the table. Note that an ordered sequence is used which means the 3 sessions are randomly grabbing the next sequenced value to insert. The data though is basically being inserted in order of the ID column, it’s just that the data is being distributed across a few blocks as we go along the table, rather than strictly one block after the other.


SQL> exec bowie_proc

PL/SQL procedure successfully completed.

Let’s create an index on the ID (sequenced) column and collect fresh statistics:


SQL> create index bowie_id_i on bowie(id);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE',      estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000            241465

We notice that although the data in the table in reality is actually quite well clustered/ordered on the ID column, the actual CF of the index is not reflecting this. At a massive 241,465 it’s an extremely high (bad) CF, much closer in value to rows in the table than the number of table blocks, as the CF calculation keeps flipping back and forth between differing blocks. With such a high CF, the CBO is therefore going to cost an index scan accordingly:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------

| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       |   389 |  7780 |   310   (1)| 00:00:04 |

|*  1 |  TABLE ACCESS FULL| BOWIE |   389 |  7780 |   310   (1)| 00:00:04 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID"<=429 AND "ID">=42)

Statistics

----------------------------------------------------------

0  recursive calls

1  db block gets

1093  consistent gets

0  physical reads

0  redo size

4084  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

Even though only approx. 0.13% of rows are being accessed and more importantly a similar low percentage of table blocks, the CBO has determined that a Full Table Scan (FTS) is the cheaper alternative. This is an all too familiar scenario, all down to the fact the CF is not accurately reflecting the true clustering of the data and subsequent efficiency of the index.

Finally, at long last, there’s now an official fix for this !!

Bug 13262857 Enh: provide some control over DBMS_STATS index clustering factor computation INDEX describes this scenario and currently has available patches that can be applied on both Exadata databases and Oracle versions 11.1.0.7, 11.2.0.2 and 11.2.0.3. The patches (eg. Patch ID 15830250) describe the fix as addressing “Index Clustering Factor Computation Is Pessimistic“. I couldn’t have described it better myself 🙂

Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering. The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the collection of index statistics, it will not increment the CF if the table block being referenced by the current index entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not increment the CF if the current table block has already been accessed “x” index entries previously.

The TABLE_CACHED_BLOCKS preference can be set by either the DBMS_STATS.SET_TABLE_PREFS, DBMS_STATS.SET_SCHEMA_PREFS or DBMS_STATS.SET_DATABASE_PREFS procedures.

So let’s now change the TABLE_CACHED_BLOCKS preference for this table and re-calculate the index statistics:


SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',

pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000              1035

We notice that the CF has now been significantly reduced (down from 241465 to just 1035), reflecting far more accurately the true clustering of the data when considering the actual effectiveness of using the index.

If we now run the same query as before:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3472402785

------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   389 |  7780 |     4   (0)|00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   389 |  7780 |     4   (0)|00:00:01 |

|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |   389 |       |     2   (0)|00:00:01 |

------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID">=42 AND "ID"<=429)

Statistics

----------------------------------------------------------

0  recursive calls

0  db block gets

6  consistent gets

0  physical reads

0  redo size

9882  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

We notice the index is now being selected by the CBO. At a cost of 4 (previously the cost was somewhat greater than the 310 cost of the FTS), this much more accurately reflects the true cost of using the index (notice only 6 consistent gets are performed).

Being able to now set the TABLE_CACHED_BLOCKS preference during statistics collection finally gives us a fully supported and easy method to collect more accurate CF statistics. This in turn can only lead to more informed and accurate decisions by the CBO and ultimately better performing applications. Although available right now via the back ported patches, this will no doubt all be fully documented once the 12c database is finally released.

I can’t recommend enough the use of this new capability 🙂

Storage Indexes vs Database Indexes IV: 8 Column Limit (Eight Line Poem) May 1, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

As Exadata Storage Indexes (SI) are purely memory only structures located on the Exadata storage servers, care needs to be taken in how much memory they can potentially consume. As a result, there is a limit of 8 columns (or 8 SIs) that can be defined for a given 1M storage region at any point in time, even though SIs are much smaller structures than equivalent database indexes. As database indexes are physical constructs however, there’s no such limit on the number of indexes that can be defined for a table. This can become another key difference between SIs and database indexes.

The following table has more than 8 columns, each with differing numbers of distinct values or distributions of data:

SQL> create table radiohead (id number, col1 number, col2 number, col3 number, col4 number, col5 number, col6 number, col7 number, col8 number, col9 number, col10 number, col11 number, col12 number, some_text varchar2(50));
Table created.

SQL> insert into radiohead select rownum, mod(rownum,10), mod(rownum,100), mod(rownum,1000), mod(rownum,10000), mod (rownum,100000), mod(rownum,1000000), ceil(dbms_random.value(0,10)), ceil(dbms_random.value(0,100)), ceil
 (dbms_random.value(0,1000)), ceil(dbms_random.value(0,10000)), ceil(dbms_random.value(0,100000)), ceil(dbms_random.value (0,1000000)), 'OK COMPUTER' from dual connect by level <=2000000; 2000000 rows created. SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

4000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

8000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'RADIOHEAD', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

Let’s start by running a highly selective query of the ID column:

SQL> select * from radiohead where id = 42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=121212)
 filter("ID"=121212)

If we look at the session statistics, we’ll notice that a SI has been created and saved us physical IOs. Note: If you follow the demo, you’ll need to keep track of these statistics after each query or simply reconnect as a new session to ensure a SI has or has not been used.

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1333.99219
 cell physical IO interconnect bytes returned by smart scan .164878845

OK, let’s now run a selective query on the COL1 column (there are no values 42 in this case and so no rows are returned):

SQL> select * from radiohead where col1=42;
no rows selected

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 52 | 42440 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1 | 52 | 42440 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL1"=42)
 filter("COL1"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2546.90625
 cell physical IO interconnect bytes returned by smart scan .341438293

Again, a SI has been created and used here. The SI in this case has been extremely beneficial because no data exists for 42 (only the values 1 – 10 exist). However, if an existing value were to be selected, the SI would be next to useless as such a value would exist throughout all 1M storage regions. With just 10 distinct randomly distributed values, this SI has the potential to be a waste of time. But while we search for values that don’t exist, it serves a very useful purpose. An important consideration in what’s to come.

If we now run a query now using column COL4:

SQL> select * from radiohead where col4=42;
1600 rows selected.

Elapsed: 00:00:00.68

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=42)
 filter("COL4"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2612.64063 <= + 66MB
 cell physical IO interconnect bytes returned by smart scan 32.3048401

OK, this is the really important one to note. Firstly, yes again a SI has been created and used. Note though that this has not been the first SI to be created or used on this table; SIs have previously been created and used in the previous two queries on different columns. This will also not be the most recent SI to be created, more will soon follow. However, this is by far the least effective use of a SI, because of both the selectivity of the query and distribution of data. Here, only some 66MB or so of physical IOs have been saved.

We now repeat this process, running a query against a different column, being very selective and ensuring a SI is created and used. We finally reach a point when 8 SIs have been created on the table:

SQL> select * from radiohead where col9=0;
no rows selected

Elapsed: 00:00:00.02

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 15984 | 811K| 42561 (1)| 00:08:31 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 15984 | 811K| 42561 (1)| 00:08:31 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL9"=0)
 filter("COL9"=0)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 8792.94531
 cell physical IO interconnect bytes returned by smart scan 45.3872757

OK, we’ve now reached the point where  8 SIs have definitely been created on this table.

If we now run a query that could potentially use a SI but isn’t particularly effective, basically on a par to the one we saw created previously on COL4:

SQL> select * from radiohead where col10=42;
1536 rows selected.

Elapsed: 00:00:00.73

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1600 | 83200 | 42577 (1)| 00:08:31 |
|* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42577 (1)| 00:08:31 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("COL10"=42)
 filter("COL10"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 8792.94531
cell physical IO interconnect bytes returned by smart scan 45.9288864

We notice that no bytes have been saved here via a SI. We can run this query repeatedly and the results will be the same. No SI is created and no bytes are saved. Although Oracle could potentially create a SI and save some work, the fact we already have 8 SIs created for this table means we have already reached the limit on the number of SIs that can be created for this table. 8 is it.

Let’s run another query now using yet another different column (COL12), but this time it’s again a very selective query, much more selective and efficient than the previous query based on COL4 in which a SI had been created:

SQL> select * from radiohead where col12=42;
8 rows selected.

Elapsed: 00:00:00.39

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 19 | 988 | 42607 (1)| 00:08:32 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 19 | 988 | 42607 (1)| 00:08:32 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL12"=42)
 filter("COL12"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 9526.01563
 cell physical IO interconnect bytes returned by smart scan 53.5218124

This time however the number of bytes saved has again gone up from previously meaning that indeed a new SI has been created and used for column COL12. But this makes 9 SIs in total now for this table where the limit should be a maximum of 8 ?

Does this mean that a previously created SI has been dropped and replaced by this new one. If so, which SI is now gone ?

Well, let’s go back to the first SI we created and the one that hasn’t been used for the longest period of time. If we re-run the first query again:

SQL> select * from radiohead where id=42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=42)
 filter("ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844
 cell physical IO interconnect bytes returned by smart scan 53.5264816

We notice that not only has it used a SI, but it has saved the maximum amount of IO bytes possible here meaning there was no “warming up” processes happening here indicating a newly created SI. So not only did it use a SI, it clearly used a previously created one. So this SI was not obviously impacted by the creation of this “9th” SI.

However, if we run the query again that used column COL4, the query that previously used a SI but was by far the least effective in saving physical IOs:

SQL> select * from radiohead where col4 = 4242;
1600 rows selected.

Elapsed: 00:00:00.73

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=4242)
 filter("COL4"=4242)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844 No Change !!
 cell physical IO interconnect bytes returned by smart scan 54.1522598

This time we notice there’s no change to the number of bytes saved by a SI. No matter how often we run this query now, no SI is used. So the SI that was created previously is now gone as a result of creating the more effective SI on the COL12 column. Indeed there are still just the 8 SIs on this table.

So Oracle will indeed limit the number of SIs to 8 for each table/storage region. However, it’s not simply a case of “first in first served” or some such, with Oracle using (undocumented) performance metrics to determine which 8 SIs/columns to choose. This means it might be possible in some rarer scenarios where more than 8 columns get referenced in SQL statements for SIs to come and go depending on changing workloads.  Another example of where database indexes may yet play a role in Exadata environments, where currently tables have more than 8+ indexed columns.