Clustering Factor Calculation Improvement Part III (Too Much Rope) June 4, 2013
Posted by Richard Foote in 11g, CBO, Clustering Factor, Index statistics, TABLE_CACHED_BLOCKS.trackback
In my previous post, I discussed how for larger tables that really do have randomised data, not even setting the new TABLE_CACHED_BLOCKS to its 255 maximum value is going to make any real difference to the resultant Clustering Factor (CF) of an index. The 255 maximum value here protects us somewhat from abusing this capability and setting unrealistic CF values.
However, for smaller tables in particular, we do need to exercise some caution.
In the following example, we’re only creating a relatively small table and associated index with a CODE column that is randomly distributed throughout the table:
SQL> create table bowie (id number, code number, text varchar2(30)); Table created. SQL> insert into bowie select rownum, trunc(dbms_random.value(0, 100)), 'DAVID BOWIE' from dual connect by level <= 70000; 70000 rows created. SQL> commit; Commit complete. SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> create index bowie_code_i on bowie(code); Index created.
If we look at the CF of this index:
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I'; TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR ------------ ------------ ---------- ---------- ----------------- BOWIE BOWIE_CODE_I 244 70000 22711
We notice that at 22,711, the CF it’s pretty average. The table though is quite small at only 244 blocks.
If we run a simple query:
SQL> select * from bowie where code between 42 and 44; 2050 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1845943507 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2814 | 56280 | 65 (4)| 00:00:01 | |* 1 | TABLE ACCESS FULL| BOWIE | 2814 | 56280 | 65 (4)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter('CODE'<=44 AND 'CODE'>=42) Statistics ---------------------------------------------------------- 0 recursive calls 1 db block gets 254 consistent gets 0 physical reads 0 redo size 25044 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2050 rows processed
The CBO goes for a Full Table Scan. This is not unexpected as we likely have to visit all 244 blocks anyways to fetch the required 2050 rows due to the CODE data being so randomly distributed throughout the table. In a naive attempt to improve things, we decide to improve the CF by setting the TABLE_CACHED_BLOCKS to the maximum 255 value:
SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255); PL/SQL procedure successfully completed. SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I'; TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR ------------ -------------- ---------- ---------- ----------------- BOWIE BOWIE_CODE_I 244 70000 240
Indeed, we have improved the CF, dramatically reducing it down to just 240 from the previous 22711. Even though the column value for the CODE really is randomly distributed throughout the table, the CF now suggests the data is perfectly clustered. This is because with a table with only 244 blocks, incrementing the CF if the current index entry references a table block more than 255 blocks ago is now impossible. The CF is now guaranteed to be “perfect” as each index entry can only reference one of the 244 table blocks and so is incremented only when each table block is referenced the first time.
This dramatic reduction in the CF will certainly make the index more attractive to the CBO. But is this really a good thing:
SQL> select * from bowie where code between 42 and 44; 2050 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1602289932 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2814 | 56280 | 17 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE | 2814 | 56280 | 17 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_CODE_I | 2814 | | 7 (0)| 00:00:01 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access('CODE'>=42 AND 'CODE'<=44) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 681 consistent gets 0 physical reads 0 redo size 20895 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2050 rows processed
The CBO is now using the index, but is now performing significantly more consistent gets, 681 rather than the previous 254. This execution plan is actually much less efficient than the previous FTS execution plan. The CBO is getting this wrong now as the CF isn’t really anywhere near as good as it’s now being lead to believe.
Caution setting TABLE_CACHED_BLOCKS to a value that is anywhere close to the number of blocks in the table. This is one of the reasons for Oracle having a 1% of table blocks default value for this setting.
Hi Richard,
one loosely related question: how can the CF be smaller than the number of blocks? Even if the CF “is incremented only when each table block is referenced the first time” I would expect at least 244.
Doing the most simple test I could imagine I just saw that this is happening all the time:
Why does sys_op_countchg ignore some of the leafs?
Regards
Martin
LikeLike
oh, I just realized that I was talking nonsense …- of course it’s the number of switches between table blocks that sys_op_countchg is counting and the number of index blocks doesn’t matter …
LikeLike
Hi Martin
No worries 🙂
One thing some people do get a little confused on is how can one have a CF less than the number of blocks in the table below the segment HWM.
This is entirely possible if such table blocks don’t contain any rows. This can happen obviously if you delete a whole heap of rows, leaving some table blocks effectively empty. It can also happen to very densely populated tables when the HWM increments by whatever number of blocks but some of these have yet to be populated by subsequent inserts.
So having a CF less than the number of table blocks is quite possible.
LikeLike
[…] dritten Artikel wird als Gegenbeispiel die maximale Einstellung auf eine kleine Tabelle angewendet und dargestellt, […]
LikeLike
Hi Richard,
the role of unused blocks was the reason for my block dump – but it was late in the evening and I began to confuse the basics …
For the table blocks a block dump of course would not be necessary: in this case I would only ask dbms_rowid:
So in this simple case without deletes I get my expected result: CF = number of table blocks.
LikeLike
Hi Richard,
Nice explanation , This seems to be issue when 1) Table blocks are less than 255 and 2) Number of occurrences for each distinct Col Value is very high (this case 700) ,
But same case looks good if Number of occurrences for each distinct Col Value : in below case its 7
Example:
insert into bowie select rownum, trunc(dbms_random.value(0, 10000)), ‘DAVID BOWIE’
from dual connect by level <= 70000;
Seems like Oracle should consider above two when calculation CF.
any suggestions ?
Not sure why Oracle came up with TABLE_CACHED_BLOCKS setting at table level which effecting all columns/indexes CF in table ,I think its good to have TABLE_CACHED_BLOCKS setting at Column level as CF is related how tables blocks clustered/ordered on column which we are interested .
Thanks
Vijay
LikeLike
Hi Vijay
Being able to set this per column has merit but I guess will also complicate things further. At the end of the day, for a given table, some columns would have a good CF, most an average/bad CF. However, it’s for those columns that are actually well clustered but have a bad CF because of the block toggling effect associated with ASSM, freelists and bad luck for which this will have a (correct) impact. If the CF really is bad and the table is large enough, then this will have no impact. If the CF is already good, then again no impact. So the risks associated with this are minimised for most scenarios but there are always exceptions as this article highlights.
LikeLike
Grouped in wrong section 11G, TABLE_CACHED_BLOCKS seems to be 12C feature, correct me if I am wrong
LikeLike
Hi Saurabh
No, this was first introduced via a patch (13262857) in 11.2, so it fits in with an 11g tag. It was only officially documented in 12c.
LikeLike