Separate Indexes From Tables, Some Thoughts Part I (Everything In Its Right Place) April 16, 2008Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Myths, Oracle Opinion, Tablespace Management.
Although by no means as common as it once was, there’s still some who believe separating indexes in a different tablespace from their parent tables somehow improves performance.
The theory goes that by having indexes in their own tablespace, we can reduce overall contention issues and thereby improve the overall performance of the database.
Here are some thoughts for consideration for those who might be so inclined …
First, let’s just have a little look at the behaviour of a “typical” large scale index range scan, using an index with a height of say 3.
We first begin by accessing the root block of the index. This is a single block read which for many high accessed indexes would typically be cached and not result in a physical I/O. Next we read an intermediate branch block. This is also a single block read and is also likely to be cached if the index is heavily accessed. Regardless, it’s another index related I/O. Next we finally reach and read the first index leaf block containing the start of the index entries of interest. Again, it’s a single block I/O and again it’s index related.
So far we’ve performed 3 “random”, single block I/Os of index related blocks. If the index were in a separate tablespace, all the action would only be on the index tablespace thus far.
We next read our first table block containing the first row referenced by the first index entry of interest. This yet again is a single block I/O that could potentially be any block within the table. If the table were in a separate tablespace from the index, we would still need to perform a physical I/O (assuming the block isn’t already cached) on a “random” block within the table tablespace. If the table were in the same tablespace as the index, we again need to perform a physical I/O on a random table block. Still no difference thus far.
We next (very likely) reference the same index leaf block to determine the second row of interest. Note this block will almost certainly still be cached as it’s just been accessed. Therefore, if the index were in the same or different tablespace to the table, still no difference as there’s no associated physical I/O.
We then read the second table block of interest via a single block I/O. Unless this index has a very good clustering factor, we’re likely to read a totally different table block that could be any other block within the table. It’s extremely unlikely therefore to be the block that is physically contiguous to the block previously read. Only if the index were very well clustered, could it possibly be the same block as previously read or possibly the next logical block in the table.
However, in all these scenarios, having the table in a separate tablespace still makes no difference at this stage. We either need to perform another physical I/O on the table or we perform just a logical I/O. Even in the extremely unlikely case the next block read is physically contiguous to the previous block read, it would still be contiguous whether the index was separate or not and not be impacted by the index read activity thus far. Again, thus far it makes no real difference having the index in a separate tablespace.
We go back to the same index leaf block to determine the next row of interest and then access the next table block, which for a large randomly distributed table is again likely to be another different block. The point being we’re accessing the index and the table in a sequential fashion, reading the index, then reading the table. Reading the index and then reading the table again.
For small index scans, the index leaf block in question is likely to be the same single leaf block as a leaf block can potentially store hundreds of index entries (depending of course on block size, index row size and where within the index leaf block we logically begin to read the index entries of interest). So for small scans, it’s not going to have any real impact having indexes in a separate tablespace as we’re basically reading a few index related blocks followed by the table related blocks.
The table blocks are likely to be different blocks in a randomly distributed, poorly clustered index or possibly (although more rarely) a small sample of blocks in a well clustered index. However, in either scenario, if if we need to access just the one leaf block, it makes no difference whether the index is in a separate tablespace or not, the I/Os and so-called contention are the same regardless.
In some scenarios, Oracle can perform a prefetch step whereby it orders the index entries based on the rowids to first determine which table blocks need to be accessed, thus preventing the same table block having to be re-read several times. However, again, it makes no difference thus far if the index is in a separate tablespace or not as the I/O requirements are the same regardless.
In larger index range scans however, we might need to visit the next logical index leaf block or indeed subsequently many such index leaf blocks. Note each leaf block contains a pointer (referred to as kdxlenxt in a block dump) so Oracle can directly access the next index leaf block. If our index were in a separate tablespace and making the HUGE assumption that there’s thus far been no other activity in the index tablespace, the disk head may not have moved from where it left off after reading the last leaf block. With the indexes and tables coexisting in the same tablespace, we have very likely moved on from this location with any subsequent table related I/O activity.
Maybe now at last, finally we have a benefit in having indexes in their own tablespace …
However, reading the next index leaf block is again a single block read and most importantly is not necessarily “physically” contiguous to the previous leaf block. Remember, index leaf blocks split as part of their natural growth and the new block allocated is simply the next block available in the index freelist. Therefore the next logical index leaf block in an index structure could physically be virtually anywhere within the extents allocated to the index. When we read the next “logical” index leaf block, it does not necessarily mean it’s the next “physical” block within the index segment. It’s likely just another random, single block I/O.
That being the case, again we have no benefit in the index being in a separate tablespace. In both scenarios, we have to go scanning the disk looking for the physical location of the next index leaf block (again assuming the index leaf block isn’t already cached). This activity needs to be performed whether the index is in it’s own tablespace or not.
When we move back to read the next table block based on the first index entry from the newly accessed index leaf block, again, it’s extremely unlikely the next table block accessed will be the next contiguous block from the previously read table block. So again, we very likely need to go a hunting for the next table block on disk, regardless of it being in a separate tablespace from the index. Again, separating indexes from tables makes no real difference.
So not only do we move between index and table in a sequential manner but the actual blocks read within both the index and the table are likely to be totally random, non contiguous, single block reads.
That being the case, what are the performance benefits of storing indexes and tables separately ? How does storing indexes and tables separately actually reduce contention when most physical I/Os in both index and table segments are effectively random, single block reads ?
Now this example has just been a single index scan, performed by one user on just one index and table. The benefits therefore of separating indexes and tables even in a single user environment are somewhat “dubious”.
However, how many environments only have the one user. Not many. Most environments have lots of users, some with many hundreds, some with many thousands of concurrent users . All these users are potentially performing concurrent I/O operations, not only potentially on these very same tables and indexes but on lots of different tables and lots of different indexes within our table and index tablespaces. Even if index leaf blocks were to be physically contiguous in some cases (such as monotonically increasing indexes where this is more likely), by the time we’ve read the index leaf block, processed and read all the associated table blocks referenced by the index leaf block, the chances of there being no subsequent physical activity in the index tablespace due to another user session is virtually nil. We would still need to re-scan the disk to physically access the next index leaf block (or table block) anyways.
Add to the mix the fact many sites now use SANS, NAS, ASM etc. and what might appear to be one contiguous file could actually be physically split and spread all over the place. The whole notion of what is actually physically contiguous and what isn’t is blurred anyways.
The next time someone suggests separating indexes from table improves performance, you may just want to ask a couple of little questions; why and how ?
However, I’ll next discuss how indeed performance can improve by storing indexes in a separate tablespace. But just like our mad scientist thinking flies with no wings go deaf, I’ll explain how the performance improvement is not actually directly related to the indexes being separate from the tables.
I’ll also discuss how database recoveries are not typically helped by having indexes in a separate tablespace as often suggested.