jump to navigation

Rebuilding Indexes: Danger With Clustering Factor Calculation (Chilly Down) July 17, 2018

Posted by Richard Foote in CBO, Clustering Factor, Data Clustering, Index Rebuild, Oracle Indexes, TABLE_CACHED_BLOCKS.
add a comment

chilly down

Let me start by saying if you don’t already following Jonathan Lewis’s excellent Oracle blog, do yourself a favour. In a recent article, Jonathan highlighted a danger with rebuilding indexes (or indeed creating an index) when used in relation to collecting index statistics with the TABLE_CACHED_BLOCKS preference.

I’ve discussed the importance of the TABLE_CACHED_BLOCKS statistics collection preference a number of times previously, but the issue discussed by Jonathan is worth repeating here.

Let me start by repeating a demo I’ve used previously, by creating a table stored in an ASSM tablespace with data that is well clustered, but reported as being badly clustered due to how the Clustering Factor (CF) is calculated by default.

Firstly, I create a simple table and sequence and run a procedure that populates the table with a monotonically increasing ID column  populated via the sequence. But importantly, the procedure is executed concurrently from 3 separate sessions such that the monotonically increasing ID values are not stored in the table in precisely ID order as each of the 3 sessions inserts rows into different sets of table blocks:

SQL> create table bowie_assm (id number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

SQL> create or replace procedure pop_bowie_assm as
2 begin
3 for i in 1..100000 loop
4 insert into bowie_assm values (bowie_assm_seq.nextval, 'DAVID BOWIE');
5 commit;
6 end loop;
7 end;
8 /

Procedure created.

The following is executed concurrently in 3 different sessions:

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

If you can imagine 3 different blocks within the table, block one has rows with ID values 1,4,7,10,13,16…, block two has rows with ID values 2,5,8,11,14,17… and block three has rows with ID values 3,6,9,12,15,18…

So the data is well clustered in that the data for a large number of consecutive IDs are stored within a few blocks, but they’re not stored precisely in ID order within the table.

If we now create an index on the ID column and look at the Clustering Factor (CF) of the index:

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

We note the calculated CF is extremely poor at 219416 (a value much closer to the number of index entries than the number of blocks in the table) as the default calculation notes that most index entries have a rowid that points to a different table block to the previous index entry rowid.

If we run a query that only requires a moderate number of rows (approx. 0.13% of the table) to be returned:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) |  Time     |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
 974 consistent gets
   0 physical reads
   0 redo size
8869 bytes sent via SQL*Net to client
 883 bytes received via SQL*Net from client
  27 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

We note the CBO decides to use a Full Table Scan (FTS) as the index is too costly and inefficient to use with such a poor CF value.

However, if say retrieving 100 rows, the CBO thinks it needs to visit many more table blocks than the 3 blocks that in actual fact contain the 100 rows of interest.

The TABLE_CACHED_BLOCKS statistics preference allows us to modify how the CF is calculated by not incrementing the CF value if an index rowid points to a block that was visited just TABLE_CACHED_BLOCKS ago.

If we now re-calculate the CF but with the TABLE_CACHED_BLOCKS preference set to say 42:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

We notice the CF has dropped significantly, down to just 909 from its previous 219416 value.

If we now re-run the same query as before:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                 |  389 |  6613 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_ASSM      |  389 |  6613 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_ASSM_ID_I |  389 |       |       2 (0) |  00:00:01 |
-------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   6  consistent gets
   0  physical reads
   0  redo size
8734  bytes sent via SQL*Net to client
 608  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
 388  rows processed

We notice the CBO now automatically decides to use the index and more importantly, that at just 6 consistent gets, the query is now much more efficient as a result.

The index was always the more efficient access method, but because of the poor CF that was previously calculated, the CBO got it wrong. Now that a more “accurate” CF is calculated, all is now well.

However, if we now decide to rebuild this index:

alter index bowie_assm_id_i rebuild;

Index altered.

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) | Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   3 recursive calls
   0 db block gets
 956 consistent gets
   0 physical reads
   0 redo size
4094 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

So we’re back to the less efficient FTS. Why ? A look at the CF reveals the problem:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

When the index is rebuilt and so when the index statistics are implicitly recalculated, the TABLE_CACHED_BLOCKS preference is ignored. This applies even if this preference is set at the schema or database level:

SQL> exec dbms_stats.set_schema_prefs(ownname=>user, pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_database_prefs(pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> alter index bowie_assm_id_i rebuild online;

Index altered.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

This issue also applies when an index is newly created, any TABLE_CACHED_BLOCKS setting is ignored, until the time when statistics are again collected via DBMS_STATS:

SQL> drop index bowie_assm_id_i;

Index dropped.

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

This is currently being investigation by Oracle as unpublished bug 28292026.

Again, another example of the dangers of blindly rebuilding indexes without a valid justification…

Advertisements

Clustering Factor Calculation Improvement Part III (Too Much Rope) June 4, 2013

Posted by Richard Foote in 11g, CBO, Clustering Factor, Index statistics, TABLE_CACHED_BLOCKS.
9 comments

In my previous post, I discussed how for larger tables that really do have randomised data, not even setting the new TABLE_CACHED_BLOCKS to its 255 maximum value is going to make any real difference to the resultant Clustering Factor (CF) of an index. The 255 maximum value here protects us somewhat from abusing this capability and setting unrealistic CF values.

However, for smaller tables in particular, we do need to exercise some caution.

In the following example, we’re only creating a relatively small table and associated index with a CODE column that is randomly distributed throughout the table:

SQL> create table bowie (id number, code number, text varchar2(30));
Table created.

SQL> insert into bowie select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID BOWIE'
from dual connect by level <= 70000;

70000 rows created.

SQL> commit;

Commit complete.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
     method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index bowie_code_i on bowie(code);

Index created.

If we look at the CF of this index:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ ------------ ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I        244      70000             22711

We notice that at 22,711, the CF it’s pretty average. The table though is quite small at only 244 blocks.

If we run a simple query:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  2814 | 56280 |    65   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  2814 | 56280 |    65   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter('CODE'<=44 AND 'CODE'>=42)

Statistics
----------------------------------------------------------

0  recursive calls
1  db block gets
254  consistent gets
0  physical reads
0  redo size
25044  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO goes for a Full Table Scan. This is not unexpected as we likely have to visit all 244 blocks anyways to fetch the required 2050 rows due to the CODE data being so randomly distributed throughout the table. In a naive attempt to improve things, we decide to improve the CF by setting the TABLE_CACHED_BLOCKS to the maximum 255 value:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ -------------- ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I          244      70000               240

Indeed, we have improved the CF, dramatically reducing it down to just 240 from the previous 22711. Even though the column value for the CODE really is randomly distributed throughout the table, the CF now suggests the data is perfectly clustered. This is because with a table with only 244 blocks, incrementing the CF if the current index entry references a table block more than 255 blocks ago is now impossible. The CF is now guaranteed to be “perfect” as each index entry can only reference one of the 244 table blocks and so is incremented only when each table block is referenced the first time.

This dramatic reduction in the CF will certainly make the index more attractive to the CBO. But is this really a good thing:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1602289932
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  2814 | 56280 |    17   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |  2814 | 56280 |    17   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_CODE_I |  2814 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access('CODE'>=42 AND 'CODE'<=44)

Statistics
----------------------------------------------------------

0  recursive calls
0  db block gets
681  consistent gets
0  physical reads
0  redo size
20895  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO is now using the index, but is now performing significantly more consistent gets, 681 rather than the previous 254. This execution plan is actually much less efficient than the previous FTS execution plan. The CBO is getting this wrong now as the CF isn’t really anywhere near as good as it’s now being lead to believe.

Caution setting TABLE_CACHED_BLOCKS to a value that is anywhere close to the number of blocks in the table. This is one of the reasons for Oracle having a 1% of table blocks default value for this setting.

Clustering Factor Calculation Improvement Part II (Blocks On Blocks) May 14, 2013

Posted by Richard Foote in 11g, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
6 comments

My previous post on the new TABLE_CACHED_BLOCKS statistics gathering preference certainly generated some interest 🙂 My blog hits for the week have gone off the charts !!

One of the concerns raised by this new capability was that setting such a preference might result in really unrealistic and inaccurate Clustering Factor (CF) values, especially for those tables that truly have appalling CFs. Although there are certainly some dangers, Oracle has limited the possible “abuse” by ensuring TABLE_CACHED_BLOCKS can only be set to a maximum of 255. This means Oracle will only ignore a maximum of 255 table blocks that have recently been accessed during the CF calculation. For larger tables with truly randomised data patterns, not even the maximum 255 setting if utilised will make an appreciable difference to the final CF.

A couple of examples to demonstrate.

The first table is a relatively “large” table that has a DOB column that is effectively randomised throughout the table. There are approximately 20,000 different DOB values in a 2 million row table (so each DOB occurs approximately 100 times, give or take).

SQL> create table major_tom (id number, DOB date, text varchar2(30));

Table created.

SQL> insert into major_tom select rownum,  sysdate-trunc(dbms_random.value(0, 20000)), 'DAVID BOWIE' from dual connectby level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

Let’s now create an index on this DOB column and have a look at the CF:

SQL> create index major_tom_dob_i on major_tom(dob);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'MAJOR_TOM', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1988164

So at 1,988,164, the CF is terrible. This is as expected as the DOB values are all randomised throughout the table. The index is not being used as we had hope (naively) so let’s use the new TABLE_CACHED_BLOCKS preference to now improve the calculated CF by setting it to the maximum 255 setting and recalculate the index statistics:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'MAJOR_TOM',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'MAJOR_TOM_DOB_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1941946

We notice that although the CF has improved marginally, at whopping 1,941,946 it’s still terrible and has made no real appreciable difference. Why ?

Well let’s do some basic maths here. There are 9077 blocks in the table and the next DOB referenced in the index can potentially be in any one of them. Therefore, the chances of the next DOB being in one of the 255 previously accessed table blocks is only 255/9077 x 100 = approximately 2.8%. So in only 2.8% of the time is the CF likely to not be incremented and so the CF is only likely to drop by around this 2.8% amount.

Let’s check. (1988164 – 1941946)/1988164 x 100  indeed does equal approximately 2.8%.

So statistically with such a poor CF on such a “large” table, to limit the CF calculation if any of the last 255 table blocks are referenced is only going to improve things by 2.8% on average. Effectively of no real use at all.

Another example now, but this time with a CODE column with just 100 distinct values that are randomly distributed throughout another reasonable “large” 2 million row table. For those mathematically challenged, that means each value occurs approximately 20,000 times, give or take:

SQL> create table ziggy (id number, code number, text varchar2(30));

Table created.

SQL> insert into ziggy select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID
BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

So at 662,962 it’s what I would describe as a “poor to average” CF. It’s not particularly great with there being just  7,048 table blocks but it’s still some distance from the 2,000,000 row value.

The index is not being used in SQL statements as we (naively) wish, so let’s try and improve things by lowering the index CF by setting the new TABLE_CACHED_BLOCKS preference to the maximum 255 setting:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'ZIGGY',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'ZIGGY_CODE_I',
estimate_percent=>null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

We notice to our great disappointment (well, not really) that the CF remains completely unchanged at 662,962 !! Why ?

Again, let’s do some basic maths and consider the data distribution.

The table has some 7048 blocks but each distinct CODE value has some 20,000 occurrences on average. Therefore, each value is going to be found 20000/7048 = roughly 2 to 3 times per block. As the index is in CODE order and for each CODE in rowid order, the CF is going to increment for each CODE value for each distinct block we visit. We will therefore only go back to a previously visited table block (except for the 2 to 3 visits to the current block) when the CODE value changes but this will take us all the way back to the first block which is always going to be some 7047 blocks away from the current one. As 7047 is much greater than the 255 the CF calculation will only cater for, the CF is going to remain unchanged from the default calculation as a result.

And this is all as it should be, as the fundamental CF is indeed poor for these scenarios and even going back the maximum 255 data blocks will not reduce appreciably the manner in which the CF is calculated.

Of course, if there was no limit, then a setting of TABLE_CACHED_BLOCKS  of say 7100 would enable the CF to be recalculated as being perfect in the above scenario, which would indeed be a concern. But 255 is the limit and so limits the potential “damaged” that can be done.

More on all this to come 🙂

Important !! Clustering Factor Calculation Improvement (Fix You) May 8, 2013

Posted by Richard Foote in 11g, ASSM, CBO, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
55 comments

Believe me, this article is worth reading 🙂

I’m currently not allowed to discuss Oracle 12c Database goodies but I am allowed to discuss things perhaps initially intended for 12c that are currently available and already back-ported to 11g. This includes a wonderful improvement in the manageability of how the Clustering Factor (CF) of an index can now be calculated. Many thanks to Martin Decker for pointing this out to me.

As anyone who has attended my Index Seminars will know, the CF of an index is one of the most important statistics used by the Cost Based Optimizer (CBO) in determining the most efficient execution plan. As such, it has always been an issue for me that the manner in which the CF is calculated has been so flawed.

Basically, the CF is calculated by performing a Full Index Scan and looking at the rowid of each index entry. If the table block being referenced differs from that of the previous index entry, the CF is incremented. If the table block being referenced is the same as the previous index entry, the CF is not incremented. So the CF gives an indication of how well ordered the data in the table is in relation to the index entries (which are always sorted and stored in the order of the index entries). The better (lower) the CF, the more efficient it would be to use the index as less table blocks would need to be accessed to retrieve the necessary data via the index.

However, there’s a basic flaw here. The CF calculation doesn’t take into consideration the fact the referenced table block, although maybe different from the previous one index entry, might already have recently been accessed. As such, during an index scan, the table block being accessed is almost certainly still cached in the buffer cache from the previous access, thereby not reducing the effectiveness of the index in any appreciable manner. A classic example of this would be a table with a few freelists. Although the data being inserted is not ordered precisely within the same data blocks, the data might actually be very well clustered within only a few blocks of each other.

Picture a table with 100 rows being inserted by 2 sessions simultaneously, each inserting 50 rows based on an ordered sequence. With one freelist, the data is basically inserted in one block first and then once full a second table block. The data is therefore perfectly ordered/clustered and the CF will evaluate to a value of 2 on such an indexed column. But with 2 freelists, one session could insert data into one block while the other session inserts into a second block, with the ordered sequenced values being randomly distributed among the 2 blocks.  The CF could now potentially evaluate to a value of 100 as the rows are jumbled or “toggled” across the two blocks. This is a much much worse value (2 vs. 100) that can adversely impact the CBO calculations, although the efficiency of such an index is really almost identical as both table blocks are certain to be cached during an index scan regardless.

This is also a very common scenario with Automatic Segment Space Management (ASSM) tablespaces as I’ve discussed previously, which of course is now the default these days.

OK, let’s look at an example scenario. I’ll begin by creating a simple little table, an ordered sequence and a procedure that inserts 100,000 rows into the table:


SQL> create table bowie (id number, text varchar2(30));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> CREATE OR REPLACE PROCEDURE bowie_proc AS

2  BEGIN

3     FOR i IN 1..100000 LOOP

4         INSERT INTO bowie VALUES (bowie_seq.NEXTVAL, 'ZIGGY STARDUST');

5         COMMIT;

6     END LOOP;

7  END;

8  /

Procedure created.

We note the table lives in an ASSM tablespace:


SQL> select table_name, i.tablespace_name, segment_space_management

from dba_tables i, dba_tablespaces t   where i.tablespace_name = t.tablespace_name and table_name='BOWIE';

TABLE_NAME   TABLESPACE_NAME                SEGMEN

------------ ------------------------------ ------

BOWIE        USERS                          AUTO

We next have 3 different sessions that simultaneously run the procedure to load the table. Note that an ordered sequence is used which means the 3 sessions are randomly grabbing the next sequenced value to insert. The data though is basically being inserted in order of the ID column, it’s just that the data is being distributed across a few blocks as we go along the table, rather than strictly one block after the other.


SQL> exec bowie_proc

PL/SQL procedure successfully completed.

Let’s create an index on the ID (sequenced) column and collect fresh statistics:


SQL> create index bowie_id_i on bowie(id);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE',      estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000            241465

We notice that although the data in the table in reality is actually quite well clustered/ordered on the ID column, the actual CF of the index is not reflecting this. At a massive 241,465 it’s an extremely high (bad) CF, much closer in value to rows in the table than the number of table blocks, as the CF calculation keeps flipping back and forth between differing blocks. With such a high CF, the CBO is therefore going to cost an index scan accordingly:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------

| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT  |       |   389 |  7780 |   310   (1)| 00:00:04 |

|*  1 |  TABLE ACCESS FULL| BOWIE |   389 |  7780 |   310   (1)| 00:00:04 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("ID"<=429 AND "ID">=42)

Statistics

----------------------------------------------------------

0  recursive calls

1  db block gets

1093  consistent gets

0  physical reads

0  redo size

4084  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

Even though only approx. 0.13% of rows are being accessed and more importantly a similar low percentage of table blocks, the CBO has determined that a Full Table Scan (FTS) is the cheaper alternative. This is an all too familiar scenario, all down to the fact the CF is not accurately reflecting the true clustering of the data and subsequent efficiency of the index.

Finally, at long last, there’s now an official fix for this !!

Bug 13262857 Enh: provide some control over DBMS_STATS index clustering factor computation INDEX describes this scenario and currently has available patches that can be applied on both Exadata databases and Oracle versions 11.1.0.7, 11.2.0.2 and 11.2.0.3. The patches (eg. Patch ID 15830250) describe the fix as addressing “Index Clustering Factor Computation Is Pessimistic“. I couldn’t have described it better myself 🙂

Once applied (the following demo is on a patched 11.2.0.3 database), there is a new statistics collection preference that can be defined, called TABLE_CACHED_BLOCKS. This basically sets the number of table blocks we can assume would already be cached when performing an index scan and can be ignored when incrementing the CF during statistics gathering. The default is 1 (i.e. as performed presently) but can be set up to be a value between 1 and 255, meaning during the collection of index statistics, it will not increment the CF if the table block being referenced by the current index entry has already been referenced by any of the prior 255 index entries (if set to 255). It basically sets the appropriate parameter in the sys_op_countchg function used to calculate the CF value during statistic gathering to not increment the CF if the current table block has already been accessed “x” index entries previously.

The TABLE_CACHED_BLOCKS preference can be set by either the DBMS_STATS.SET_TABLE_PREFS, DBMS_STATS.SET_SCHEMA_PREFS or DBMS_STATS.SET_DATABASE_PREFS procedures.

So let’s now change the TABLE_CACHED_BLOCKS preference for this table and re-calculate the index statistics:


SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',

pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ID_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor

2  FROM user_tables t, user_indexes i

3  WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ID_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR

------------ ------------ ---------- ---------- -----------------

BOWIE        BOWIE_ID_I         1126     300000              1035

We notice that the CF has now been significantly reduced (down from 241465 to just 1035), reflecting far more accurately the true clustering of the data when considering the actual effectiveness of using the index.

If we now run the same query as before:


SQL> select * from bowie where id between 42 and 429;

388 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 3472402785

------------------------------------------------------------------------------------------

| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |

------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT            |            |   389 |  7780 |     4   (0)|00:00:01 |

|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   389 |  7780 |     4   (0)|00:00:01 |

|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |   389 |       |     2   (0)|00:00:01 |

------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID">=42 AND "ID"<=429)

Statistics

----------------------------------------------------------

0  recursive calls

0  db block gets

6  consistent gets

0  physical reads

0  redo size

9882  bytes sent via SQL*Net to client

519  bytes received via SQL*Net from client

2  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

388  rows processed

We notice the index is now being selected by the CBO. At a cost of 4 (previously the cost was somewhat greater than the 310 cost of the FTS), this much more accurately reflects the true cost of using the index (notice only 6 consistent gets are performed).

Being able to now set the TABLE_CACHED_BLOCKS preference during statistics collection finally gives us a fully supported and easy method to collect more accurate CF statistics. This in turn can only lead to more informed and accurate decisions by the CBO and ultimately better performing applications. Although available right now via the back ported patches, this will no doubt all be fully documented once the 12c database is finally released.

I can’t recommend enough the use of this new capability 🙂

Index Internals Seminar December 15, 2007

Posted by Richard Foote in .
comments closed

I resigned from Oracle in September 2017 and started my own independent company Richard Foote Consulting. As such I will again for running my “Oracle Indexing Internals and Best Practices” seminar.

Please contact me at richard@richardfooteconsulting.com for further information or upcoming seminars or visit my Seminar Dates page.

I run a very in-depth 2 day Oracle Indexing Internals and Best Practices Seminar. There is also a condensed 1 day version of the Oracle Index Internals and Best Practices Seminar upon request.

Indexes are fundamental to every Oracle database and are crucial for optimal performance. However, there’s an incredible amount of misconception, misunderstanding and pure myth regarding how Oracle indexes function and should be maintained. Many applications and databases are suboptimal and run inefficiently primarily because an inappropriate indexing strategy has been implemented.

This seminar examines most available Oracle index structures/options and discusses in considerable detail how indexes function, how/when they should be used and how they should be maintained. A key component of the seminar is how indexes are costed and evaluated by the Cost Based Optimizer (CBO) and how appropriate data management practices are vital for an effective indexing strategy.  It also covers many useful tips and strategies to maximise the benefits of indexes on application/database performance and scalability, as well as maximising Oracle database investments. Much of the material is exclusive to this seminar and is not generally available in Oracle documentation or in Oracle University courses.

This is a must attend seminar that would be of much benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications. The seminar is developed and personally delivered by Richard “Mr Index” Foote, a well-respected expert in Oracle Database technologies. All seminars are small class environments, with plenty of opportunity for attendees to ask questions specific to their particular environment.

For list of all upcoming seminars, see the Seminar Dates page.

All seminars include:

  • Detailed course notes
  • Tea/Coffee
  • Lunch

 

Seminar Content (Subject To Minor Changes)

In no particular order, the following Oracle Indexing “myths” (or misleading generalisations) are discussed throughout my seminar and are still prevalent out there in the Oracle universe:

  • Oracle B-Tree indexes become “unbalanced” over time and need to be rebuilt
  • Deleted space in index is “deadwood” and over time requires index to be rebuilt
  • If index height greater than ‘x’, it becomes inefficient and needs to be rebuilt
  • If index grows to 2x its height, it is 2x more costly to use
  • PCTFREE enables space for index entries to grow within current leaf block
  • If index has a poor (very high) Clustering Factor, rebuild the index
  • To improve the Clustering Factor, you have to rebuild the underling table
  • Clustering Factor “as good as it gets” after gathering 100% estimate index statistics
  • To improve performance, regularly rebuild indexes
  • You never have to rebuild an index to improve performance
  • Statistics from INDEX_STATS provides reliable metrics on when to rebuild an index
  • If delete rows as reported in INDEX_STATS > x%, rebuild
  • If delete rows as reported in INDEX_STATS < x%, don’t rebuild
  • Analyze Index Validate Structure is a safe method of collecting index metrics
  • Index rebuilds are inexpensive and unobtrusive
  • Primary/Unique Key constraints require a unique index
  • Drop/disable a constraint, unique index policing index is automatically dropped
  • All Foreign Key constraints must be indexed
  • Indexes should eliminate sorting
  • Only indexed columns require statistics
  • Bitmap Indexes only useful with low cardinality columns
  • Bitmap Index will be used when a B-tree is not for low cardinality columns
  • Null values are not indexed
  • Small tables (say < 100 rows) don’t benefit from indexing
  • Separating indexes from tables in tablespaces improves performance
  • Range scans not possible with Reverse Key indexes
  • Local indexes improve performance
  • Put most discriminating column first in concatenated indexes
  • If SQL references all columns in index, index column order is irrelevant
  • If leading column of index is not referenced in SQL, index not considered by CBO
  • Monitoring Indexes will highlight which indexes can be safely dropped
  • Indexing Tracking will highlight which indexes can be safely dropped
  • Index Compression make indexes smaller
  • B-Tree Index only useful with high cardinality columns
  • Pointless indexing a column with one distinct value
  • If more than x% rows returned, index is inappropriate, where x% between 0 & 100
  • Full Table Scan more efficient than index range scan with table access, when returning 100% of data
  • The CBO cost an internal value of no practical use for tuning/comparison purposes
  • Index is best solution to return 0% of data
  • You don’t need indexes in Exadata
  • Indexes less important in Exadata, as Storage Indexes can take over if database index is missing

It’s important to note it potentially only takes just the one bad row inserted in a table, one inaccurate statistic, one index being used inappropriately, one missing index not created during a application upgrade, to bring down a whole Oracle database, RAC nodes running on an Exadata included…

 

Session One: Overview of Oracle Index Structures and Options

  • Common Myths and Misconceptions
  • Oracle Indexing Structures
  • Oracle Indexing Options

Session Two: Introduction To B-Tree Indexes

  • Understanding B-Tree Index Structure and Navigation
  • B-Tree Index Structure Internals
  • Tree Dumps
  • Index Block Dump Examination
  • Study of DML Operations on Index Internals
  • Study of Concurrent Transactions on Index Internals
  • Index Block Split Internals
  • Index Root Block Internals
  • Index INITRANS

Session Three: Index Statistics

  • Available Index Statistics – DBA_INDEXES, INDEX_STATS, V$SEGMENT_STATS
  • Shortfalls with VALIDATE STRUCTURE
  • Clustering Factor Study: The Most Important Statistic
  • How To Ensure Clustering Factor Has Appropriate Value
  • Clustering Attribute
  • Table Re-Organisation And Index Importance
  • TABLE_CACHED_BLOCKS
  • Index Height
  • Statistics Collection

Session Four: Indexes and Constraints

  • Indexing PK and Unique Constraints
  • Important Differences Between Unique and Non-Unique Indexes
  • Indexing Options With Database Constraints
  • Impact of Constraint Options on Indexing
  • Indexing and Constraint Management
  • Use Cases For Multiple Indexes On Same Column List
  • IGNORE_ROW_ON_DUPKEY_INDEX Use Cases
  • Foreign Keys and Indexing Strategies
  • 18c Memoptimised Rowstore
  • 18c Scalable Indexes

Session Five: Rebuilding, Coalescing and Shrinking Indexes

  • Index PCTFREE
  • Deleted Index Space Management
  • Index Fragmentation Internals
  • How To Avoid Index Fragmentation
  • Index Rebuild Criteria: The Chosen Few
  • How To Determine Index Optimal Size
  • Dangers Of Index Rebuilds
  • Index BLEVEL
  • Index Rebuild Internals
  • Index Coalesce Internals
  • Index Shrink Internals
  • When to Rebuild or Coalesce or Shrink
  • Deferred Invalidation During Index Maintenance

Session Six: Indexes And The Cost Based Optimizer (CBO)

  • Indexes and CBO Case Studies
  • How CBO Calculates Selectivity
  • How CBO Costs Index Accesses
  • How to Use Index To Access 100% of Rows
  • Table Clustering Attribute
  • CBO and System Statistics
  • Indexes vs. Full Table Scans
  • Index Related Access Paths
  • Indexes and Sorting
  • Index Related CBO Parameters
  • Online Table Maintenance and Index Impact

Session Seven: Miscellaneous Index Tips, Tricks and Traps

  • Concatenated Index Column Order Study
  • Why Isn’t Oracle Using My Index ?
  • Outliers and Other Statistic Anomalies
  • Using B-Tree Indexes With Low Cardinality Columns
  • Indexing NULLs
  • Using Zero Sized Indexes
  • Index Compression Options
  • Index Monitoring
  • Tracking Index Usage
  • Indexing Small Tables
  • Indexing Extended Data Types
  • Indexing Encrypted Data
  • SKIP_UNUSABLE_INDEXES

Session Eight: Additional Indexing Options

  • Reverse Key Indexes Internals
  • Index Organized Tables Internals
  • Index Organized Table Secondary Indexes Internals
  • Function-Based Indexes and Virtual Columns
  • Fake Indexes
  • Invisible Indexes
  • Case In-Sensitive Indexes
  • Indexing JSON Document Store

Session Nine: Partitioned Indexes

  • Partitioning Options
  • Partitioned Index Block Dumps
  • Partition Pruning
  • Global Indexes (Partitioned / Non-Partitioned)
  • Local Indexes (Prefixed / Non-Prefixed)
  • Unique Partitioned Indexes
  • Partial Indexes
  • Partition Access Paths
  • Online Partition Index Conversion
  • Asynchronous Global Index Maintenance
  • Partition Statistics

Session Ten: Bitmap Indexes

  • Overview Of Bitmap Indexes
  • Bitmap Index Block Internals
  • Bitmap Index Misconceptions
  • Bitmap Index Size Considerations
  • Bitmap Index Access Paths
  • Star Transformations
  • OLTP and Bitmap Indexes: Locking Implications
  • Bitmap-Join Indexes
  • Bitmap Index Restrictions

 

Optional Section: Indexing with Exadata

  • Exadata Smart Scans vs. Index Accesses
  • Storage Indexes and their Limitations
  • Zone Maps