jump to navigation

Clustering Factor Calculation Improvement Part II (Blocks On Blocks) May 14, 2013

Posted by Richard Foote in 11g, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
6 comments

My previous post on the new TABLE_CACHED_BLOCKS statistics gathering preference certainly generated some interest :) My blog hits for the week have gone off the charts !!

One of the concerns raised by this new capability was that setting such a preference might result in really unrealistic and inaccurate Clustering Factor (CF) values, especially for those tables that truly have appalling CFs. Although there are certainly some dangers, Oracle has limited the possible “abuse” by ensuring TABLE_CACHED_BLOCKS can only be set to a maximum of 255. This means Oracle will only ignore a maximum of 255 table blocks that have recently been accessed during the CF calculation. For larger tables with truly randomised data patterns, not even the maximum 255 setting if utilised will make an appreciable difference to the final CF.

A couple of examples to demonstrate.

The first table is a relatively “large” table that has a DOB column that is effectively randomised throughout the table. There are approximately 20,000 different DOB values in a 2 million row table (so each DOB occurs approximately 100 times, give or take).

SQL> create table major_tom (id number, DOB date, text varchar2(30));

Table created.

SQL> insert into major_tom select rownum,  sysdate-trunc(dbms_random.value(0, 20000)), 'DAVID BOWIE' from dual connectby level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

Let’s now create an index on this DOB column and have a look at the CF:

SQL> create index major_tom_dob_i on major_tom(dob);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'MAJOR_TOM', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1988164

So at 1,988,164, the CF is terrible. This is as expected as the DOB values are all randomised throughout the table. The index is not being used as we had hope (naively) so let’s use the new TABLE_CACHED_BLOCKS preference to now improve the calculated CF by setting it to the maximum 255 setting and recalculate the index statistics:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'MAJOR_TOM',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'MAJOR_TOM_DOB_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1941946

We notice that although the CF has improved marginally, at whopping 1,941,946 it’s still terrible and has made no real appreciable difference. Why ?

Well let’s do some basic maths here. There are 9077 blocks in the table and the next DOB referenced in the index can potentially be in any one of them. Therefore, the chances of the next DOB being in one of the 255 previously accessed table blocks is only 255/9077 x 100 = approximately 2.8%. So in only 2.8% of the time is the CF likely to not be incremented and so the CF is only likely to drop by around this 2.8% amount.

Let’s check. (1988164 – 1941946)/1988164 x 100  indeed does equal approximately 2.8%.

So statistically with such a poor CF on such a “large” table, to limit the CF calculation if any of the last 255 table blocks are referenced is only going to improve things by 2.8% on average. Effectively of no real use at all.

Another example now, but this time with a CODE column with just 100 distinct values that are randomly distributed throughout another reasonable “large” 2 million row table. For those mathematically challenged, that means each value occurs approximately 20,000 times, give or take:

SQL> create table ziggy (id number, code number, text varchar2(30));

Table created.

SQL> insert into ziggy select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID
BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

So at 662,962 it’s what I would describe as a “poor to average” CF. It’s not particularly great with there being just  7,048 table blocks but it’s still some distance from the 2,000,000 row value.

The index is not being used in SQL statements as we (naively) wish, so let’s try and improve things by lowering the index CF by setting the new TABLE_CACHED_BLOCKS preference to the maximum 255 setting:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'ZIGGY',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'ZIGGY_CODE_I',
estimate_percent=>null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

We notice to our great disappointment (well, not really) that the CF remains completely unchanged at 662,962 !! Why ?

Again, let’s do some basic maths and consider the data distribution.

The table has some 7048 blocks but each distinct CODE value has some 20,000 occurrences on average. Therefore, each value is going to be found 20000/7048 = roughly 2 to 3 times per block. As the index is in CODE order and for each CODE in rowid order, the CF is going to increment for each CODE value for each distinct block we visit. We will therefore only go back to a previously visited table block (except for the 2 to 3 visits to the current block) when the CODE value changes but this will take us all the way back to the first block which is always going to be some 7047 blocks away from the current one. As 7047 is much greater than the 255 the CF calculation will only cater for, the CF is going to remain unchanged from the default calculation as a result.

And this is all as it should be, as the fundamental CF is indeed poor for these scenarios and even going back the maximum 255 data blocks will not reduce appreciably the manner in which the CF is calculated.

Of course, if there was no limit, then a setting of TABLE_CACHED_BLOCKS  of say 7100 would enable the CF to be recalculated as being perfect in the above scenario, which would indeed be a concern. But 255 is the limit and so limits the potential “damaged” that can be done.

More on all this to come :)

Follow

Get every new post delivered to your Inbox.

Join 1,894 other followers