BLKS_GETS_PER_ACCESS Index Rebuild Criteria ? (Twisted Logic) April 20, 2011
Posted by Richard Foote in Indexing Myth, Oracle Indexes, Validate Structure.18 comments
A recent question on the database OTN forum and a previous request by Charles Hooper that I cover some basic indexing concepts for newbie’s who might be confused by “dubious” information out there in internet land has prompted me to discuss the BLKS_GETS_PER_ACCESS metric, available in INDEX_STATS after an analyze validate structure operation.
The OTN thread had someone questioning why after freshly rebuilding an index, the index still had a particularly “high” BLKS_GETS_PER_ACCESS value of 110 and wouldn’t reduce down below a value of 5. They requested if someone could please explain why the BLKS_GETS_PER_ACCESS was not getting reduced, as they wanted it to be below 5.
The two obvious questions I had in return were why of earth would anyone want the BLKS_GETS_PER_ACCESS to be below 5 and why on earth would they think rebuilding the index would reduce the BLKS_GETS_PER_ACCESS from a value of 110 down to 5.
However a quick Google search confirmed my suspicions, subsequently confirmed by the OP. This “Identifying Which Indexes to Rebuild” article by Don Burleson states:
“Gets per index access
The number of “gets” per access refers to the amount of logical I/O that is required to fetch a row with the index. As you may know, a logical “get” is not necessarily a physical I/O since much of the index may reside in the Oracle buffer cache. However, any SAP index with a number greater than 10 would probably benefit from an index rebuild.”
and
“We might want to rebuild an index if the “block gets” per access is greater than five, since excessive “blocks gets” indicate a fragmented b-tree structure.”
The same claims are made on page 727 in “Oracle Tuning: The Definitive Reference” .
The problem with all this of course is that it’s a complete nonsense. The very idea of having a rebuild criteria based on BLKS_GETS_PER_ACCESS being greater than value “X” and that a subsequent index rebuild will reduce BLKS_GETS_PER_ACCESS down to less than value “X” shows a complete lack of understanding of what BLKS_GETS_PER_ACCESS actually represents.
BLKS_GETS_PER_ACCESS is basically the number of blocks required to get a randomly chosen row of interest via an index.
The Oracle Reference Manual describes BLKS_GETS_PER_ACCESS somewhat ambiguously as:
“Expected number of consistent mode block reads per row, assuming that a randomly chosen row is accessed using the index. Used to calculate the number of consistent reads that will occur during an index scan.”
The key point here is that it’s a “randomly” chosen row when accessed via the specific index.
Now in the case of a Unique index, the number of blocks needed to access a random row can easily be determined as simply being the height of the index plus one additional block to access the associated table block. If the index is Unique, there can only be one row per index value which requires precisely one visit to the table.
In the following example, we create a table with 1M rows with two indexes. The index on the ID column is effectively unique as there are 1M distinct values while the index on the CODE column is Non-Unique with only 100 distinct values and so with 10000 occurrences of each indexed value.
SQL> create table bowie (id number, code number, name varchar2(30)); Table created. SQL> insert into bowie select rownum, mod(rownum,100), 'ZIGGY STARDUST' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit; Commit complete. SQL> create index bowie_id_i on bowie(id); Index created. SQL> create index bowie_code_i on bowie(code); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> select index_name, num_rows, distinct_keys from dba_indexes where table_name='BOWIE'; INDEX_NAME NUM_ROWS DISTINCT_KEYS ------------ ---------- ------------- BOWIE_CODE_I 1000000 100 BOWIE_ID_I 1000000 1000000
If we now collect INDEX_STATS on the effectively Unique index:
SQL> analyze index bowie_id_i validate structure; Index analyzed. SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats; HEIGHT LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS ------ ---------- ------------- ------------ -------------------- 3 1000000 1000000 1 4
We can see that the BLKS_GETS_PER_ACCESS is 4 as expected. 3 due to the index height and additional 1 to access the specific row in the associated table block.
The formula Oracle uses here is basically:
HEIGHT + (LF_ROWS/DISTINCT_KEYS + 1)/2 = HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (1 + 1)/2 = 3 + 2/2 = 3 + 1 = 4.
So basically to calculate the BLKS_GETS_PER_ACCESS, we simply take the ROWS_PER_KEY, add 1 to it, then divide the total by 2 and finally add the index height to get the final value. The reason for this exact formula makes more sense when we look at a Non-Unique index.
How do we cost the access to a specific row via a non-unique index when there could be multiple occurrences of the specific index value ?
If there are say 100 occurrences of an indexed value, if we want the first of these within the index, then we need to access the same blocks as per the unique index (index height plus 1). However, if we want the last of these 100 occurrences referenced within the index, then we might need to access index height + the 100 blocks until we reach the last occurrence of the indexed value. If we’re simply interested in an “average” row, then on average we might need to access 1/2 the ROWS_PER_KEY in addition to the index height.
So the formula is now basically HEIGHT + ROWS_PER_KEY/2. But as this is only an average guesstimate to begin with and so as we don’t ruin the perfect value we can derive for Unique Indexes, the formula is adjusted a tad by adding 1 to the ROWS_PER_KEY/2 figure so that the result makes sense and is accurate for Unique indexes as well.
Hence the final formula of HEIGHT + (ROWS_PER_KEY + 1)/2.
If we now look at the index on the CODE column, which has only 100 distinct values (and so 10000 occurrences per distinct value):
SQL> analyze index bowie_code_i validate structure; Index analyzed. SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats; HEIGHT LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS ------ ---------- ------------- ------------ -------------------- 3 1000000 100 10000 5003.5
HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (10000 + 1)/2 = 3 + 5000.5 = 5003.5.
We can see how this 5003.5 figure is actually derived.
The actual blocks needed to be accessed when using the index is of course also very highly dependant on the Clustering Factor (CF) of the index but the CF is not calculated as part of Index_stats. The BLKS_GETS_PER_KEY can therefore be viewed as being a guesstimate of the number of blocks required to read a specific indexed value, based on an “average” CF. As the maximum CF is LF_ROWS, an “average” CF can therefore be viewed as simply being 1/2 of LF_ROWS.
In the above example, an “average” CF would therefore be 1000000/2 = 500000.
To access all table blocks for a specific indexed value would therefore basically be:
500000/distinct keys = 500000/100 = 5000.
If we then add the index height = 5000 + 3 = 5003.
5003 is almost identical to 5003.5, the 0.5 difference simply attributed to the additional 1 that’s added in the previous formula.
So BLKS_GETS_PER_ACCESS can effectively be viewed as being the either the number of blocks required to access a specific row “on average” within the table or the number of blocks required to read all occurrences of a specific index value IF the index had an average CF.
Note that both definitions are nothing but wild guesstimates of the blocks that might actually need to be accessed as both are making very broad assumptions in that the CF is indeed “average”, that the accessed row of interest is indeed “somewhere near the middle” of the index range scan, that the data is indeed evenly distributed, etc. etc. The actual blocks that might need to be accessed when using the index could therefore be significantly different if these basic assumptions are not correct.
Now here come a few interesting points.
Firstly, note the formula used only takes into consideration the index height and the average expected accesses to the table. The possible accesses to additional index leaf blocks is not considered at all.
Why ?
Likely because the vast majority of accesses involving an index range scan actually involves the block reads associated with accessing the table, not reading the index itself. As the figure is only a very rough estimate to begin with, it’s somewhat pointless adding a little here or there for the trivial additional leaf blocks that might need to be accessed during a scan . So in order to keep things simple, only the index height is considered in the actual BLKS_GETS_PER_ACCESS costings.
Therefore, an index rebuild is likewise going to have a trivial impact on the derived BLKS_GETS_PER_ACCESS as only the index height is considered in its calculation. In the vast majority of cases, rebuilding an index will have absolutely no impact at all as most index heights are not impacted by an index rebuild. In extremely rare occasions, an index might reduce its height but then the final BLKS_GETS_PER_INDEX is only going to reduced by 1. The absolute maximum amount that an index rebuild can impact the BLKS_GETS_PER_ACCESS is just HEIGHT-1.
In the above example, the BLKS_GETS_PER_ACCESS is 5003.5, substantially greater than 5 or 10 or even 42. However, rebuilding the index:
SQL> alter index bowie_code_i rebuild; Index altered. SQL> analyze index bowie_code_i validate structure; Index analyzed. SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats; HEIGHT LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS ------ ---------- ------------- ------------ -------------------- 3 1000000 100 10000 5003.5
makes precisely no difference at all.
So what do we do, just keep rebuilding this same index week after week after week as it continually meets the ill-considered index rebuild guidelines …
Note also that the actual value of BLKS_GETS_PER_ACCESS could be anything, as it’s primarily based on the number of rows per keys. For a very very large table on an index with relatively few distinct values, this figure could likewise be “very large”, even with the most perfectly compact, defragmented index.
Therefore, having an index rebuild criteria based if some nominal value of BLKS_GETS_PER_ACCESS “is excessive” (be it 5 or 10 or 42 or 5000 or whatever) is simply nonsensical as this value has practically nothing to do with the efficiency of an index but is directly proportional to the average number of rows per index key.
Additionally, suggesting an index rebuild will have a significant impact on BLKS_GETS_PER_ACCESS is likewise nonsensical as the only cost directly attributed to the index itself is the index height, which very rarely changes following an index rebuild. In fact, the BLKS_GETS_PER_ACCESS formula specifically negates the impact of any potential index rebuilds by implicitly excluding index block leafs in its calculations.
In short, basing an index rebuild criteria on the value of BLKS_GETS_PER_ACCESS is totally ludicrous. It’s actually similar to the silly myth that one should rebuild an index if the clustering factor is “too large”, where in actual fact there is no such threshold value and an index rebuild doesn’t impact the subsequent CF anyways.
I hate to think how many indexes have been repeatedly rebuilt unnecessarily based on a rebuild criteria where BLKS_GETS_PER_ACCESS is greater than value “X”, when such an index rebuild has made absolutely no difference to the original rebuild criteria 😦