Clustering Factor Calculation Improvement Part II (Blocks On Blocks) May 14, 2013

Posted by Richard Foote in 11g, Clustering Factor, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
My previous post on the new TABLE_CACHED_BLOCKS statistics gathering preference certainly generated some interest 🙂 My blog hits for the week have gone off the charts !!

One of the concerns raised by this new capability was that setting such a preference might result in really unrealistic and inaccurate Clustering Factor (CF) values, especially for those tables that truly have appalling CFs. Although there are certainly some dangers, Oracle has limited the possible “abuse” by ensuring TABLE_CACHED_BLOCKS can only be set to a maximum of 255. This means Oracle will only ignore a maximum of 255 table blocks that have recently been accessed during the CF calculation. For larger tables with truly randomised data patterns, not even the maximum setting of 255, if utilised, will make an appreciable difference to the final CF.
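As a rough mental model, the calculation can be thought of as walking the table blocks in index key order and incrementing a counter whenever the next block is not among the most recently visited blocks. The following Python sketch is my own simplification for illustration only (the function name and window semantics are assumptions, not Oracle's actual implementation):

```python
from collections import deque

def clustering_factor(block_visits, table_cached_blocks=1):
    """Simplified CF model: walk table blocks in index key order and
    increment the CF only when the next block is NOT among the last
    `table_cached_blocks` distinct blocks visited."""
    recent = deque(maxlen=table_cached_blocks)
    cf = 0
    for blk in block_visits:
        if blk not in recent:
            cf += 1
            recent.append(blk)
    return cf

# Default behaviour (window of 1): every block change bumps the CF.
print(clustering_factor([1, 2, 1, 2, 3]))       # 5
# With a larger window, recently visited blocks are ignored.
print(clustering_factor([1, 2, 1, 2, 3], 255))  # 3
```

In this model, raising the window size only helps when the same table blocks are revisited within the window, which is exactly the behaviour the examples below explore.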

A couple of examples to demonstrate.

The first table is a relatively “large” table that has a DOB column that is effectively randomised throughout the table. There are approximately 20,000 different DOB values in a 2 million row table (so each DOB occurs approximately 100 times, give or take).

SQL> create table major_tom (id number, DOB date, text varchar2(30));

Table created.

SQL> insert into major_tom select rownum, sysdate-trunc(dbms_random.value(0, 20000)), 'DAVID BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

Let’s now create an index on this DOB column and have a look at the CF:

SQL> create index major_tom_dob_i on major_tom(dob);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'MAJOR_TOM', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1988164

So at 1,988,164, the CF is terrible. This is as expected, as the DOB values are all randomised throughout the table. The index is not being used as we had (naively) hoped, so let’s use the new TABLE_CACHED_BLOCKS preference to improve the calculated CF by setting it to its maximum value of 255 and recalculating the index statistics:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'MAJOR_TOM',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'MAJOR_TOM_DOB_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='MAJOR_TOM_DOB_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
MAJOR_TOM    MAJOR_TOM_DOB_I       9077    2000000           1941946

We notice that although the CF has improved marginally, at a whopping 1,941,946 it’s still terrible and has made no real appreciable difference. Why ?

Well, let’s do some basic maths here. There are 9077 blocks in the table and the next DOB referenced in the index can potentially be in any one of them. Therefore, the chance of the next DOB being in one of the 255 previously accessed table blocks is only 255/9077 x 100 = approximately 2.8%. So only around 2.8% of the time is the CF likely not to be incremented, and so the CF is only likely to drop by around this 2.8% amount.

Let’s check. (1988164 – 1941946)/1988164 x 100 comes out at approximately 2.3%, close to the predicted figure.

So statistically, with such a poor CF on such a “large” table, not incrementing the CF when any of the last 255 table blocks is re-referenced is only going to improve things by around 2.8% on average. Effectively of no real use at all.
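That ~2.8% figure falls straight out of a quick simulation. This sketch uses my own simplified window model of the calculation (not Oracle's actual code) and a scaled-down row count for speed; the names and exact percentages are illustrative only:

```python
import random
from collections import deque

def clustering_factor(block_visits, window=1):
    # Simplified model: bump the CF only when the block is not
    # among the last `window` distinct blocks visited.
    recent = deque(maxlen=window)
    cf = 0
    for blk in block_visits:
        if blk not in recent:
            cf += 1
            recent.append(blk)
    return cf

random.seed(42)
BLOCKS, ROWS = 9077, 100_000   # scaled down from the 2M row example
# Randomised DOBs: the next index entry can point at any table block.
visits = [random.randrange(BLOCKS) for _ in range(ROWS)]

cf_default = clustering_factor(visits)
cf_255 = clustering_factor(visits, 255)
drop_pct = (cf_default - cf_255) / cf_default * 100
print(f"default CF: {cf_default}, window 255: {cf_255}, drop: {drop_pct:.1f}%")
```

With uniformly random block visits, the drop hovers right around 255/9077 ≈ 2.8%, regardless of row count.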

Another example now, but this time with a CODE column with just 100 distinct values that are randomly distributed throughout another reasonably “large” 2 million row table. For those mathematically challenged, that means each value occurs approximately 20,000 times, give or take:

SQL> create table ziggy (id number, code number, text varchar2(30));

Table created.

SQL> insert into ziggy select rownum, trunc(dbms_random.value(0, 100)), 'DAVID BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

So at 662,962 it’s what I would describe as a “poor to average” CF. It’s not particularly great with there being just 7,048 table blocks, but it’s still some distance from the 2,000,000 row worst case.

The index is not being used in SQL statements as we (naively) wish, so let’s try to improve things by lowering the index CF, setting the new TABLE_CACHED_BLOCKS preference to its maximum setting of 255:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'ZIGGY',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_index_stats(ownname=>user, indname=>'ZIGGY_CODE_I',
estimate_percent=>null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2  FROM user_tables t, user_indexes i
3  WHERE t.table_name = i.table_name AND i.index_name='ZIGGY_CODE_I';

TABLE_NAME   INDEX_NAME          BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ --------------- ---------- ---------- -----------------
ZIGGY        ZIGGY_CODE_I          7048    2000000            662962

We notice to our great disappointment (well, not really) that the CF remains completely unchanged at 662,962 !! Why ?

Again, let’s do some basic maths and consider the data distribution.

The table has some 7048 blocks but each distinct CODE value has some 20,000 occurrences on average. Therefore, each value is going to be found 20000/7048 = roughly 2 to 3 times per block. As the index is in CODE order and, for each CODE, in rowid order, the CF is going to increment once per distinct block visited for each CODE value. Apart from the 2 to 3 consecutive visits to the current block, we only return to a previously visited table block when the CODE value changes, and that takes us all the way back to the first block, which is always going to be some 7047 blocks away from the current one. As 7047 is much greater than the 255 blocks the CF calculation can look back over, the CF remains unchanged from the default calculation.

And this is all as it should be, as the fundamental CF is indeed poor in these scenarios, and even looking back the maximum 255 data blocks will not appreciably reduce the calculated CF.
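The ZIGGY pattern is easy to reproduce in a toy simulation too. Again, this is my own simplified window model of the calculation, not Oracle's actual code, and I use just 5 of the 100 codes to keep it quick; the point is that every block revisit is a full ~7047 blocks back, so a window of 255 changes nothing:

```python
from collections import deque

def clustering_factor(block_visits, window=1):
    # Simplified model: bump the CF only when the block is not
    # among the last `window` distinct blocks visited.
    recent = deque(maxlen=window)
    cf = 0
    for blk in block_visits:
        if blk not in recent:
            cf += 1
            recent.append(blk)
    return cf

BLOCKS, CODES, ROWS_PER_BLOCK = 7048, 5, 2   # 5 codes instead of 100, for speed
# Index walk: CODEs in key order, and for each CODE its rows in rowid
# (i.e. table block) order -- roughly 2 rows of each CODE per block.
visits = [blk for code in range(CODES)
              for blk in range(BLOCKS)
              for _ in range(ROWS_PER_BLOCK)]

print(clustering_factor(visits))        # one CF increment per block per CODE
print(clustering_factor(visits, 255))   # identical: revisits are 7047 blocks back
```

Both calls return CODES x BLOCKS, because the only blocks ever revisited within the window are the 2 consecutive rows in the current block, which a window of 1 already absorbs.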

Of course, if there were no limit, then a TABLE_CACHED_BLOCKS setting of say 7100 would enable the CF to be recalculated as perfect in the above scenario, which would indeed be a concern. But 255 is the limit, and so limits the potential “damage” that can be done.

More on all this to come 🙂

Comments»

1. vipan - May 14, 2013

Hi Richard,

I followed your test case and tried to reproduce the issue on my 11.2.0.3 test db but I could not. I don’t see a FTS in my execution plan although the CF is almost the same as in your test case; I get an index range scan instead. Can you please tell me what might be going on here?

Richard Foote - May 15, 2013

Hi Vipan

There are a lot of factors that go into the CBO costings, such as the system statistics and optimizer parameter settings, as well as the actual segment statistics. There wasn’t much margin for error with my example, so a little variation here or there can make the difference. Note the cost of your index execution plan, run another more expensive (less selective) query to see the cost of a FTS on the table, and you’ll note the difference between the two, which is likely not much. Perhaps tweak the system statistics to increase the MBRC a tad to make the FTS less expensive.

2. What’s new in statistics | Oracle mechanics - May 16, 2013

[…] A description of the patch, with tests, can be found in Richard Foote’s blog posts Important !! Clustering Factor Calculation Improvement (Fix You) and Clustering Factor Calculation Improvement Part II (Blocks On Blocks): […]

3. Lonny (@sql_handle) - June 4, 2013

Hello! I’ve read your blog posts on optimizer_index_cost_adj, optimizer_index_caching, and table_cached_blocks. Good stuff.
My interest in these parameters is to lower total IO, and especially IO queuing pressure, on a system where up to 100 similar concurrent queries can be executed. Direct path read for full table scans and fast full index scans seems to be the behavior I want to eliminate. But I’d rather not mess with _small_table_threshold and/or _large_object_threshold.
Seems like the trio of optimizer_index_cost_adj, optimizer_index_caching, and table_cached_blocks should get me where I want: potentially replacing repetitive scans into PGA with more targeted use of indexes, with the index and table blocks inserted at the default LRU queue midpoint, giving them as long a window for cache hits as they’ll get.
It seems like optimizer_index_cost_adj is the big hammer among them, followed by optimizer_index_caching, with table_cached_blocks providing potentially the most subtle effect of the three. Based on that assumption, I’m planning to attempt to tune optimizer_index_cost_adj for the workload first, then optimizer_index_caching, then table_cached_blocks, and finally come back to look for cleanup (signs of inefficient plans due to targeted index use rather than a full table scan or fast full index scan that is decidedly more efficient).
Does that sound reasonable? I’d rather work from a hierarchy to tune these, than perform comparative tests for each position in a grid of multiple values for each of the trio.
Thanks!
Lonny

Richard Foote - June 6, 2013

Hi Lonny

At the end of the day, you just want the CBO to pick the most efficient execution plan, be it involving an index range scan or a full table/fast index scan. With reasonably accurate segment and system statistics, the CBO generally does a really good job of it, without having to touch the optimizer parameters.

Out of the trio you’ve listed, I would begin with this new table_cached_blocks preference because it can easily “fix” very common inaccuracies with the CF, which in many ways is the most vital (and problematic) of all the statistics and it can be focused just on those tables that involve problematic execution plans.

That should then leave you with very few problem cases in which inappropriate full scans are being performed. For some fringe cases, you might well prefer a slightly less efficient index scan because it means caching data that can be reused and avoiding repetitive physical direct reads (this can sometimes be the case in Exadata). But these are rare scenarios and best handled on a case-by-case basis IMHO.

4. News on the Clustering Factor « Oraculix - June 5, 2013

[…] in the second article, Richard Foote addresses concerns raised in the comments on Part 1, specifically the concern that the […]

