Clustering Factor Calculation Improvement Part III (Too Much Rope)

Clustering Factor Calculation Improvement Part III (Too Much Rope) June 4, 2013

Posted by Richard Foote in 11g, CBO, Clustering Factor, Index statistics, TABLE_CACHED_BLOCKS.
trackback

In my previous post, I discussed how for larger tables that really do have randomised data, not even setting the new TABLE_CACHED_BLOCKS to its 255 maximum value is going to make any real difference to the resultant Clustering Factor (CF) of an index. The 255 maximum value here protects us somewhat from abusing this capability and setting unrealistic CF values.

However, for smaller tables in particular, we do need to exercise some caution.

In the following example, we’re only creating a relatively small table and associated index with a CODE column that is randomly distributed throughout the table:

SQL> create table bowie (id number, code number, text varchar2(30));
Table created.

SQL> insert into bowie select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID BOWIE'
from dual connect by level <= 70000;

70000 rows created.

SQL> commit;

Commit complete.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
     method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index bowie_code_i on bowie(code);

Index created.

If we look at the CF of this index:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ ------------ ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I        244      70000             22711

We notice that at 22,711, the CF it’s pretty average. The table though is quite small at only 244 blocks.

If we run a simple query:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  2814 | 56280 |    65   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  2814 | 56280 |    65   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter('CODE'<=44 AND 'CODE'>=42)

Statistics
----------------------------------------------------------

0  recursive calls
1  db block gets
254  consistent gets
0  physical reads
0  redo size
25044  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO goes for a Full Table Scan. This is not unexpected as we likely have to visit all 244 blocks anyways to fetch the required 2050 rows due to the CODE data being so randomly distributed throughout the table. In a naive attempt to improve things, we decide to improve the CF by setting the TABLE_CACHED_BLOCKS to the maximum 255 value:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ -------------- ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I          244      70000               240

Indeed, we have improved the CF, dramatically reducing it down to just 240 from the previous 22711. Even though the column value for the CODE really is randomly distributed throughout the table, the CF now suggests the data is perfectly clustered. This is because with a table with only 244 blocks, incrementing the CF if the current index entry references a table block more than 255 blocks ago is now impossible. The CF is now guaranteed to be “perfect” as each index entry can only reference one of the 244 table blocks and so is incremented only when each table block is referenced the first time.

This dramatic reduction in the CF will certainly make the index more attractive to the CBO. But is this really a good thing:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1602289932
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  2814 | 56280 |    17   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |  2814 | 56280 |    17   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_CODE_I |  2814 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access('CODE'>=42 AND 'CODE'<=44)

Statistics
----------------------------------------------------------

0  recursive calls
0  db block gets
681  consistent gets
0  physical reads
0  redo size
20895  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO is now using the index, but is now performing significantly more consistent gets, 681 rather than the previous 254. This execution plan is actually much less efficient than the previous FTS execution plan. The CBO is getting this wrong now as the CF isn’t really anywhere near as good as it’s now being lead to believe.

Caution setting TABLE_CACHED_BLOCKS to a value that is anywhere close to the number of blocks in the table. This is one of the reasons for Oracle having a 1% of table blocks default value for this setting.

Comments»

1. Martin Preiss - June 5, 2013

Hi Richard,

one loosely related question: how can the CF be smaller than the number of blocks? Even if the CF “is incremented only when each table block is referenced the first time” I would expect at least 244.

Doing the most simple test I could imagine I just saw that this is happening all the time:

create table test 
as 
select rownum id 
  from dual 
connect by level <= 10000;

create index test_idx on test(id);

select index_name
     , LEAF_BLOCKS
     , CLUSTERING_FACTOR
  from dba_indexes
 where table_name = 'TEST';

INDEX_NAME                     LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ----------- -----------------
TEST_IDX                                21                16

select sys_op_countchg(substrb(rowid,1,15), 1) cf 
  from test;

        CF
----------
        16

-- so perhaps the leaf_blocks are misleading?
-- not according to a tree dump
alter session set events 'immediate trace name treedump level 102308';

----- begin tree dump
branch: 0x1408261 21004897 (0: nrow: 21, level: 1)
   leaf: 0x1408262 21004898 (-1: nrow: 485 rrow: 485)
   leaf: 0x1408263 21004899 (0: nrow: 479 rrow: 479)
   leaf: 0x1408264 21004900 (1: nrow: 479 rrow: 479)
   leaf: 0x1408265 21004901 (2: nrow: 479 rrow: 479)
   leaf: 0x1408266 21004902 (3: nrow: 479 rrow: 479)
   leaf: 0x1408267 21004903 (4: nrow: 479 rrow: 479)
   leaf: 0x1408268 21004904 (5: nrow: 479 rrow: 479)
   leaf: 0x1408269 21004905 (6: nrow: 479 rrow: 479)
   leaf: 0x140826a 21004906 (7: nrow: 479 rrow: 479)
   leaf: 0x140826b 21004907 (8: nrow: 479 rrow: 479)
   leaf: 0x140826c 21004908 (9: nrow: 479 rrow: 479)
   leaf: 0x140826d 21004909 (10: nrow: 479 rrow: 479)
   leaf: 0x140826e 21004910 (11: nrow: 479 rrow: 479)
   leaf: 0x140826f 21004911 (12: nrow: 479 rrow: 479)
   leaf: 0x1408270 21004912 (13: nrow: 479 rrow: 479)
   leaf: 0x1408271 21004913 (14: nrow: 479 rrow: 479)
   leaf: 0x1408272 21004914 (15: nrow: 479 rrow: 479)
   leaf: 0x1408273 21004915 (16: nrow: 479 rrow: 479)
   leaf: 0x1408274 21004916 (17: nrow: 479 rrow: 479)
   leaf: 0x1408275 21004917 (18: nrow: 479 rrow: 479)
   leaf: 0x1408276 21004918 (19: nrow: 414 rrow: 414)
----- end tree dump

Why does sys_op_countchg ignore some of the leafs?

Regards

Martin

LikeLike

Martin Preiss - June 5, 2013

oh, I just realized that I was talking nonsense …- of course it’s the number of switches between table blocks that sys_op_countchg is counting and the number of index blocks doesn’t matter …

LikeLike

Richard Foote - June 6, 2013

Hi Martin

No worries 🙂

One thing some people do get a little confused on is how can one have a CF less than the number of blocks in the table below the segment HWM.

This is entirely possible if such table blocks don’t contain any rows. This can happen obviously if you delete a whole heap of rows, leaving some table blocks effectively empty. It can also happen to very densely populated tables when the HWM increments by whatever number of blocks but some of these have yet to be populated by subsequent inserts.

So having a CF less than the number of table blocks is quite possible.

LikeLike

2. Neues vom Clustering Factor « Oraculix - June 5, 2013

[…] dritten Artikel wird als Gegenbeispiel die maximale Einstellung auf eine kleine Tabelle angewendet und dargestellt, […]

LikeLike

3. Martin Preiss - June 6, 2013

Hi Richard,

the role of unused blocks was the reason for my block dump – but it was late in the evening and I began to confuse the basics …

For the table blocks a block dump of course would not be necessary: in this case I would only ask dbms_rowid:

select count(distinct dbms_rowid.rowid_relative_fno(rowid)
                      || ',' dbms_rowid.rowid_block_number(rowid)) block_count
  from test;

BLOCK_COUNT
-----------
         16

So in this simple case without deletes I get my expected result: CF = number of table blocks.

LikeLike

4. Vijay - June 26, 2013

Hi Richard,

Nice explanation , This seems to be issue when 1) Table blocks are less than 255 and 2) Number of occurrences for each distinct Col Value is very high (this case 700) ,

But same case looks good if Number of occurrences for each distinct Col Value : in below case its 7

Example:
insert into bowie select rownum, trunc(dbms_random.value(0, 10000)), ‘DAVID BOWIE’
from dual connect by level <= 70000;

Seems like Oracle should consider above two when calculation CF.
any suggestions ?

Not sure why Oracle came up with TABLE_CACHED_BLOCKS setting at table level which effecting all columns/indexes CF in table ,I think its good to have TABLE_CACHED_BLOCKS setting at Column level as CF is related how tables blocks clustered/ordered on column which we are interested .

Thanks
Vijay

LikeLike

Richard Foote - June 26, 2013

Hi Vijay

Being able to set this per column has merit but I guess will also complicate things further. At the end of the day, for a given table, some columns would have a good CF, most an average/bad CF. However, it’s for those columns that are actually well clustered but have a bad CF because of the block toggling effect associated with ASSM, freelists and bad luck for which this will have a (correct) impact. If the CF really is bad and the table is large enough, then this will have no impact. If the CF is already good, then again no impact. So the risks associated with this are minimised for most scenarios but there are always exceptions as this article highlights.

LikeLike

5. Saurabh - May 30, 2014

Grouped in wrong section 11G, TABLE_CACHED_BLOCKS seems to be 12C feature, correct me if I am wrong

LikeLike

6. Richard Foote - July 8, 2014

Hi Saurabh

No, this was first introduced via a patch (13262857) in 11.2, so it fits in with an 11g tag. It was only officially documented in 12c.

LikeLike

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

Richard Foote's Oracle Blog