jump to navigation

BLKS_GETS_PER_ACCESS Index Rebuild Criteria ? (Twisted Logic) April 20, 2011

Posted by Richard Foote in Indexing Myth, Oracle Indexes, Validate Structure.
trackback

A recent question on the database OTN forum and a previous request by Charles Hooper that I cover some basic indexing concepts for newbie’s who might be confused by “dubious” information out there in internet land has prompted me to discuss the BLKS_GETS_PER_ACCESS metric, available in INDEX_STATS after an analyze validate structure operation.
 
The OTN thread had someone questioning why after freshly rebuilding an index, the index still had a particularly “high” BLKS_GETS_PER_ACCESS value of 110 and wouldn’t reduce down below a value of 5. They requested if someone could please explain why the BLKS_GETS_PER_ACCESS was not getting reduced, as they wanted it to be below 5.
 
The two obvious questions I had in return were why of earth would anyone want the BLKS_GETS_PER_ACCESS to be below 5 and why on earth would they think rebuilding the index would reduce the BLKS_GETS_PER_ACCESS from a value of 110 down to 5.
 
However a quick Google search confirmed my suspicions, subsequently confirmed by the OP. This “Identifying Which Indexes to Rebuild” article by Don Burleson states:
 
Gets per index access
 
The number of “gets” per access refers to the amount of logical I/O that is required to fetch a row with the index. As you may know, a logical “get” is not necessarily a physical I/O since much of the index may reside in the Oracle buffer cache. However, any SAP index with a number greater than 10 would probably benefit from an index rebuild.”
 
and
 
We might want to rebuild an index if the “block gets” per access is greater than five, since excessive “blocks gets” indicate a fragmented b-tree structure.”
 
The same claims are made on page 727 in “Oracle Tuning: The Definitive Reference” .
 
The problem with all this of course is that it’s a complete nonsense. The very idea of having a rebuild criteria based on BLKS_GETS_PER_ACCESS being greater than value “X” and that a subsequent index rebuild will reduce BLKS_GETS_PER_ACCESS down to less than value “X” shows a complete lack of understanding of what BLKS_GETS_PER_ACCESS actually represents.
 
BLKS_GETS_PER_ACCESS is basically the number of blocks required to get a randomly chosen row of interest via an index.
 
The Oracle Reference Manual describes BLKS_GETS_PER_ACCESS somewhat ambiguously as:

Expected number of consistent mode block reads per row, assuming that a randomly chosen row is accessed using the index. Used to calculate the number of consistent reads that will occur during an index scan.”
 
The key point here is that it’s a “randomly” chosen row when accessed via the specific index.
 
Now in the case of a Unique index, the number of blocks needed to access a random row can easily be determined as simply being the height of the index plus one additional block to access the associated table block. If the index is Unique, there can only be one row per index value which requires precisely one visit to the table.
 
In the following example, we create a table with 1M rows with two indexes. The index on the ID column is effectively unique as there are 1M distinct values while the index on the CODE column is Non-Unique with only 100 distinct values and so with 10000 occurrences of each indexed value.

  
SQL> create table bowie (id number, code number, name varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, mod(rownum,100), 'ZIGGY STARDUST' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create index bowie_code_i on bowie(code);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, num_rows, distinct_keys from dba_indexes where table_name='BOWIE';
 
INDEX_NAME     NUM_ROWS DISTINCT_KEYS
------------ ---------- -------------
BOWIE_CODE_I    1000000           100
BOWIE_ID_I      1000000       1000000

 
 

If we now collect INDEX_STATS on the effectively Unique index:

  
SQL> analyze index bowie_id_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000       1000000            1                    4
 

 
 

We can see that the BLKS_GETS_PER_ACCESS is 4 as expected. 3 due to the index height and additional 1 to access the specific row in the associated table block.
 
The formula Oracle uses here is basically:

HEIGHT + (LF_ROWS/DISTINCT_KEYS + 1)/2 = HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (1 + 1)/2 = 3 + 2/2 = 3 + 1 = 4.
 
So basically to calculate the BLKS_GETS_PER_ACCESS, we simply take the ROWS_PER_KEY, add 1 to it, then divide the total by 2 and finally add the index height to get the final value. The reason for this exact formula makes more sense when we look at a Non-Unique index.
 
How do we cost the access to a specific row via a non-unique index when there could be multiple occurrences of the specific index value ?
If there are say 100 occurrences of an indexed value, if we want the first of these within the index, then we need to access the same blocks as per the unique index (index height plus 1). However, if we want the last of these 100 occurrences referenced within the index, then we might need to access index height + the 100 blocks until we reach the last occurrence of the indexed value. If we’re simply interested in an “average” row, then on average we might need to access 1/2 the ROWS_PER_KEY in addition to the index height.

So the formula is now basically HEIGHT + ROWS_PER_KEY/2. But as this is only an average guesstimate to begin with and so as we don’t ruin the perfect value we can derive for Unique Indexes, the formula is adjusted a tad by adding 1 to the ROWS_PER_KEY/2 figure so that the result makes sense and is accurate for Unique indexes as well.
 
Hence the final formula of HEIGHT + (ROWS_PER_KEY + 1)/2.
 
If we now look at the index on the CODE column, which has only 100 distinct values (and so 10000 occurrences per distinct value):

 

SQL> analyze index bowie_code_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000           100        10000               5003.5

 
 

HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (10000 + 1)/2 = 3 + 5000.5 = 5003.5.
 
We can see how this 5003.5 figure is actually derived.
 
The actual blocks needed to be accessed when using the index is of course also very highly dependant on the Clustering Factor (CF) of the index but the CF is not calculated as part of Index_stats. The BLKS_GETS_PER_KEY can therefore be viewed as being a guesstimate of the number of blocks required to read a specific indexed value, based on an “average” CF. As the maximum CF is LF_ROWS, an “average” CF can therefore be viewed as simply being 1/2 of LF_ROWS.
 
In the above example, an “average” CF would therefore be 1000000/2 = 500000.
 
To access all table blocks for a specific indexed value would therefore basically be:

500000/distinct keys = 500000/100 = 5000.

If we then add the index height = 5000 + 3 = 5003.
 
5003 is almost identical to 5003.5, the 0.5 difference simply attributed to the additional 1 that’s added in the previous formula.
 
So BLKS_GETS_PER_ACCESS can effectively be viewed as being the either the number of blocks required to access a specific row “on average” within the table or the number of blocks required to read all occurrences of a specific index value IF the index had an average CF.
 
Note that both definitions are nothing but wild guesstimates of the blocks that might actually need to be accessed as both are making very broad assumptions in that the CF is indeed “average”, that the accessed row of interest is indeed “somewhere near the middle” of the index range scan, that the data is indeed evenly distributed, etc. etc. The actual blocks that might need to be accessed when using the index could therefore be significantly different if these basic assumptions are not correct.
 
Now here come a few interesting points.
 
Firstly, note the formula used only takes into consideration the index height and the average expected accesses to the table. The possible accesses to additional index leaf blocks is not considered at all.
 
Why ?
 
Likely because the vast majority of accesses involving an index range scan actually involves the block reads associated with accessing the table, not reading the index itself. As the figure is only a very rough estimate to begin with, it’s somewhat pointless adding a little here or there for the trivial additional leaf blocks that might need to be accessed during a scan . So in order to keep things simple, only the index height is considered in the actual BLKS_GETS_PER_ACCESS costings.
 
Therefore, an index rebuild is likewise going to have a trivial impact on the derived BLKS_GETS_PER_ACCESS as only the index height is considered in its calculation. In the vast majority of cases, rebuilding an index will have absolutely no impact at all as most index heights are not impacted by an index rebuild. In extremely rare occasions, an index might reduce its height but then the final BLKS_GETS_PER_INDEX is only going to reduced by 1. The absolute maximum amount that an index rebuild can impact the BLKS_GETS_PER_ACCESS is just HEIGHT-1.
 
In the above example, the BLKS_GETS_PER_ACCESS is 5003.5, substantially greater than 5 or 10 or even 42. However, rebuilding the index:

  
SQL> alter index bowie_code_i rebuild;
 
Index altered.
 
SQL> analyze index bowie_code_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000           100        10000               5003.5

 

  
makes precisely no difference at all.
 
So what do we do, just keep rebuilding this same index week after week after week as it continually meets the ill-considered index rebuild guidelines …
 
Note also that the actual value of BLKS_GETS_PER_ACCESS could be anything, as it’s primarily based on the number of rows per keys. For a very very large table on an index with relatively few distinct values, this figure could likewise be “very large”, even with the most perfectly compact, defragmented index.
 
Therefore, having an index rebuild criteria based if some nominal value of BLKS_GETS_PER_ACCESS “is excessive” (be it 5 or 10 or 42 or 5000 or whatever) is simply nonsensical as this value has practically nothing to do with the efficiency of an index but is directly proportional to the average number of rows per index key.

Additionally, suggesting an index rebuild will have a significant impact on BLKS_GETS_PER_ACCESS is likewise nonsensical as the only cost directly attributed to the index itself is the index height, which very rarely changes following an index rebuild. In fact, the BLKS_GETS_PER_ACCESS formula specifically negates the impact of any potential index rebuilds by implicitly excluding index block leafs in its calculations.
 
In short, basing an index rebuild criteria on the value of BLKS_GETS_PER_ACCESS is totally ludicrous. It’s actually similar to the silly myth that one should rebuild an index if the clustering factor is “too large”, where in actual fact there is no such threshold value and an index rebuild doesn’t impact the subsequent CF anyways.
 
I hate to think how many indexes have been repeatedly rebuilt unnecessarily based on a rebuild criteria where BLKS_GETS_PER_ACCESS is greater than value “X”, when such an index rebuild has made absolutely no difference to the original rebuild criteria :(

About these ads

Comments»

1. David Aldridge - April 21, 2011

Hmmm, very interesting indeed. I’m amused that the first time I become aware of BLK_GETS_PER_ACCESS is in the context of finding out what it is _not_ useful for. I’m still left with a warm glow of edjumication, though.

Interesting quote from Page 728 of that book: “Even if index rebuilding were to be proven as [sic] a useless activity, the Placebo effect on the end users is enough to justify the task.”. It opens up a whole new class of performance tuning … perhaps it could be called “Faith-based Tuning”.

Jonathan Lewis - April 21, 2011

If the only effect is a placebo effect couldn’t you just tell the users that you had rebuilt the indexes rather than actually doing it ?

Richard Foote - April 21, 2011

Hi Jonathan

Exactly !!

Just tell the users you’re going to rebuild some really important techo things but actually drink a few beers and share a pizza instead :)

Of course, if they want “proof” things generally run faster after being forced to have their applications unavailable, well Houston, we might have a problem …

David Aldridge - April 22, 2011

Oh I don’t think I could condone dishonesty of that type. Not with a straight face anyway.

Richard Foote - April 21, 2011

Hi David

Bizarre isn’t it !! Can you imagine an email from your bank:

Unfortunately, you won’t be able to access your banking facilities for the next 24 hours while we rebuild all our indexes (yes, we’re not entirely sure what indexes are either ?). However, our IT department assure us things will run so much faster afterwards. Apologies for any inconvenience this may cause, it will be worth it though, honest“.

The next day during dinner, the wife remarks just how much faster it appeared to get money out of the machine (usually it takes 20 seconds at least, but this time it felt as if it was only 18 or 19 seconds at most).

Yes, it was a pain paying the bills late when we couldn’t access our money that day but boy, aren’t we lucky we’re with a bank that rebuilds those index things all the time …

Unbelievable !!

2. Jimmy - April 21, 2011

So the “Oracle Tuning: The Definitive Resource” is not really definitive? Why am I not surprised – considering the source.

Richard Foote - April 21, 2011

Hi Jimmy

For an excellent review of this book, see this wonderful post by Charles Hooper:

http://hoopercharles.wordpress.com/2010/11/07/book-review-oracle-tuning-the-definitive-reference-second-edition/

3. Alberto - April 21, 2011

Great article, i don’t really know this Placebo effect, it’s often present, i’ts a very Oracle persistent mith, otherwise i prefer this http://www.placeboworld.co.uk like Placebo effect. :-)

Richard Foote - April 21, 2011

Hi Alberto

Check this out, Placebo with David Bowie:

Enjoy !!

4. Richard Foote - April 22, 2011

Hi David

Yes, I couldn’t condone it either, beside it would only help to put on the weight :)

5. Uwe Hesse - April 26, 2011

Hi Richard, thank you for clarifying this!
I really appreciate your efforts about making the community aware of needless index rebuilts based on wrong criteria since this seems to be a too common case.
Sometimes, though, I suspect that you Oak Table guys have made up the whole DKB just to have a contact surface. Admit it, he is not real, right?

Richard Foote - April 27, 2011

Hi Uwe

Oh he’s real, oh boy is he real :(

I’ll just keep plugging away, showing people how things really work and slowly but surely, the message will get out there and less folk like the poor poster on the OTN thread mentioned in this post will have unnecessary downtimes to perform unnecessary validate structures of indexes to determine invalid criteria in which to rebuild indexes unnecessarily.

6. Andy - April 28, 2011

Wow, the placebo effect advice is totally out of this world! I will certainly start using it on my clients today :p

The world is a better place, thanks to DKB! Amen…

Richard Foote - May 22, 2011

:)

7. DEL_LF_ROWS Index Rebuild Criteria ? (Codex) « Richard Foote’s Oracle Blog - May 22, 2011

[...] the same article and book where the erroneous claims regarding the BLKS_GETS_PER_ACCESS were made, we also find the following recommendation:   “Another rebuild condition would be [...]

8. Charles Hooper - May 28, 2011

Richard,

Thank you for taking the time to start writing the back to basics indexing concepts series of articles.

If the Oracle Documentation library (or Metalink/MOS for that matter) were as clear on the purpose of the BLKS_GETS_PER_ACCESS and DEL_LF_ROWS statistics as your articles, there certainly would be fewer printed Oracle Database books on the market that simply get the facts wrong (cases where the authors and technical reviewers simply spend a far too limited time testing/verifying the accuracy of the book’s contents).

I am looking forward to your next article in this series.

Richard Foote - June 6, 2011

Hi Charles,

Yes, I agree the doco is not exactly great on this. I guess that’s why we have blogs and websites such as ours :)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,895 other followers

%d bloggers like this: