jump to navigation

An Index Only Performs How Much Work ??? November 12, 2009

Posted by Richard Foote in Index Rebuild, Oracle Indexes, Oracle Myths.
34 comments

One of the main reasons suggested for performing periodic index rebuilds is to improve “performance”. After rebuilding indexes, applications now run so much faster. Users notice a significant improvement in performance. And so on.
 
There are of course situations when rebuilding specific indexes can indeed improve performance, here’s but one example I’ve discussed previously.
 
However, the question I always ask when someone claims an index rebuild has made things run faster is to simply ask why. Why is such a simple, but powerful question. Why have things improved ? What has specifically changed as a result of rebuilding the index, that Oracle has now reduced the overall work associated with performing the activity, to the point that things run noticeably faster.
 
Knowing why is really important because it confirms that indeed there was an improvement and that it was indeed associated directly with the index rebuild. It means when a similar situation arises again, you know how to directly resolve the problem appropriately and effectively next time. Also knowing why means you can determine the specific root cause, perhaps preventing things from deteriorating so bad in the first place, such that rebuilding the index becomes unnecessary. Prevention being the best cure …
 
Now the most common answer I get for why rebuilding an index has been so beneficial is because the index is now so much smaller after the rebuild that the overheads of reading the index have substantially reduced. If the index is smaller, it means one can read the same amount of index related data with less I/Os. Less I/Os means better performance.
 
For example, if you can reduce the index by half, you can fit the index into only half the number of leaf blocks and that’s a 50% performance improvement right there.
 
Well, firstly it assumes that the index has indeed reduced by half. It would actually be a relatively unusual index for it to be so poorly fragmented or for it to have so much “wasted” space that it could be rebuilt into only half the number of leaf blocks.
 
Possible but somewhat unusual.
 
However, it also assumes that by having a 50% performance improvement, reading the index blocks constitutes the vast majority of the work. Again possible in some scenarios.
 
With most index related activities though, reading the index blocks actually constitutes only a small percentage of the overall work. In most cases, the index only contributes a very small percentage of the overall I/O activity. Therefore by potentially only reducing a small percentage of the overall work by rebuilding just the index, the overall impact is generally going to be minimal.
 
I thought I might perform some basic mathematics to illustrate and put into perspective just what little impact index rebuilding can have in improving performance, even if by doing so the index dramatically reduces in size, because the index itself actually constitutes only a small percentage of the overall costs.
 
Let say we have one of these more unusual indexes that is so poorly fragmented that it has approximately 50% wasted space throughout the entire index structure. Let’s say rebuilding such an index reduces the overall size of the index by half.
 

Before the index rebuild, an index has 50% of wasted space and say:
 
Height of 3
 
1 Root Block
 
50 Intermediate Branch Blocks
 
20,000 Leaf blocks
 

After the rebuild, the index has no wasted space and has now:
 
Height of 3
 
1 Root Block
 
25 Intermediate Branch Blocks
 
10,000 Leaf Blocks
 

Let’s assume the table contains 2M rows and that they fit in 100,000 blocks (i.e. the index is about 1/10 that of the table and the average row size is such that we fit say 20 rows on average per block). Let’s also assume there’s nothing special about this index and that it has an “average” clustering factor of 1M, before and after the rebuild ;) 1M being somewhere in the middle of possible clustering factor values.

The first thing to note is that the height remains the same after such a rebuild, even though the index is only now half the size. It would be extremely unlikely and the index would have to be particularly small and within a very narrow range of sizes for all the contents of all the intermediate branch blocks to fit within just the one root block. The only way for the index height to reduce down from 3 would be for the contents of all intermediate branches to fit within the root block. Possible, but again quite unusual. 

OK, let’s look at the cost of various scans before and after the rebuild, using the index costing formula I’ve discussed recently.
 
If we’re after just one row (a unique index perhaps), then to read the one row before the rebuild would be:
 
1 I/O for the root block + 1 I/O for a branch block + 1 I/O for a leaf block + 1 I/O for the table block = 4 LIOs.
 
After the rebuild, the total cost would be:
 
1 I/O for the root block + 1 I/O for a branch block + 1 I/O for a leaf block + 1 I/O for the table block = 4 LIOs.
 
In effect, no change. Well, we’re not likely to have too much of a performance improvement there.

 
 
Let’s increase the size of the range scan. What if we retrieve 100 rows (or approx. 0.005% of data):
 
Before the rebuild it would be
 
1 I/O for the root block +
1 I/O for a branch block +
1 for all the leaf blocks (0.00005 x 20000) +
50 for all the table blocks (0.00005 x 1M)
= 53.
 
After the rebuild it would be:
 
1 I/O for the root block +
1 I/O for a branch block +
1 for all the leaf blocks (0.00005 x 10000) +
50 for all the table blocks (0.00005 x 1M)
= 53.
 
Again, no difference …

 
 
OK, let’s increase the number of rows accessed substantially to 10,000 (or approx. 0.5% of the data).
 
Before the rebuild it would be:
 
1 I/O for the root block +
1 I/O for a branch block +
100 for all the leaf blocks (0.005 x 20000) +
5000 for all the table blocks (0.005 x 1M)
= 5102.
 

After the rebuild it would be:
 
1 I/O for the root block +
1 I/O for a branch block +
50 for all the leaf blocks (0.005 x 10000) +
5000 for all the table blocks (0.005 x 1M)
= 5052.
 
Or in percentage terms, a reduction of I/Os of approximately 1%. That’s just 1 tiny little percent …
 
So even an index that accesses 10,000 rows, a reasonable number and at 0.5% a reasonable percentage of the overall table, even an index that has reduced in size by half, a substantial reduction in size, only reduces the overall number of I/Os by an unimpressive 1% for such a query in the above scneario.
 
Would reducing I/Os on such a query by 1% really improve performance “substantially” ? Will users really notice much of a difference ?
 
It’s of course all in the numbers and in how much work the actual index is performing, in how many I/Os are actually performed by the index itself and in how much of a reduction in the overall I/Os an index rebuild will actually contribute. I’ll leave it to you to plug in different ranges of selectivity to see what impact rebuilding such an index with the above characteristics might have.
 
The point is that for the vast majority of index related queries, most of the work is performed in getting the data out of the table, not the index.
 
Therefore, reducing the size of an index, even by possibly a substantial amount, may not necessarily reduce the overall I/Os associated with a query if the index only performs a tiny fraction of all the work. You could eliminate the index entirely and providing Oracle could still magically retrieve just the 0.5% of rows of interest in the above example, performance for such a query would unlikely improve “significantly”.
 
So not only must an index reduce in size but the activities associated with directly reading the index must constitute a significant proportion of the overall work (eg. fast full index scan, index only scans, etc.) or something else must have changed for performance to have improved “significantly”.

Follow

Get every new post delivered to your Inbox.

Join 1,852 other followers