I’ve already written a few posts regarding concatenated indexes and things to consider (and not consider) when deciding the column order of a concatenated (composite) index.
A comment I see from time to time is that the whole question of column order within an index is now somewhat redundant anyways as Oracle since 9i has introduced the Index Skip Scan access path. Prior to 9i, if the leading column of an index wasn’t specified in a predicate, the index was effectively ignored by the CBO. However, if the leading column isn’t referenced now, Oracle can use the index anyways via an Index Skip Scan access path.
So the column order in a concatenated index doesn’t matter that much now, right ?
Well, not quite ….
An Index Skip Scan can only actually be used and considered by the CBO in very specific scenarios and is often an indicator there’s either a missing index or an exisiting index has the columns in the wrong order.
If the leading column of an index is missing, it basically means the values in subsequently referenced columns in the index can potentially appear anywhere within the index structure as the index entries are sorted primarily on the leading indexed column. So if we have column A with 100,000 distinct values and column B with 100,000 distinct values and an index based on (A,B), all index entries are sorted primarily on column A and within a specific value of column A, sorted by column B. Therefore if we attempt a search on just Column B = 42, these values could potentially appear anywhere within the index structure and so the index can not generally be effectively used.
However, what if the leading column actually contained very few distinct values ? Yes, the subsequent column(s) values could appear anywhere within the index structure BUT if these subsequent columns have relatively high cardinality, once we’ve referenced the required index entries for a specific occurrence of a leading column value, we can ignore all subsequent index row entries with the same leading column value. If the leading column has few distinct values, this means we can potentially “skip” several/many leaf blocks until the leading column value changes again, where we can again “probe” the index looking for the subsequent indexed column values of interest.
So if we have a leading column with few distinct values, we may be able to use the index “relatively” efficiently by probing the index as many times as we have distinct leading column values.
On the other hand, if the leading column has relatively high cardinality then an Index Skip Scan is not a viable option. An index can generally store many hundreds of index entries per index leaf block, depending on the block size and the average size of the index row entries of course. So if the leading column were to change just once on average within the subsequent index leaf block, Oracle would be forced to scan the next index leaf block anyways as it may contain the required index row entry.
For Oracle to be able to “skip” an index leaf block, the leaf block must contain nothing but the same leading column value as last changed in a preceding leaf block. Hopefully, there are many leaf blocks where the leading column value doesn’t change and so hopefully there are many leaf blocks that can potentially be “skipped”.
Therefore, the cardinality of the leading column is crucial for an Index Skip Scan to be viable. In the example above where we had 100,000 distinct values for columns A and B, unless the table is massive, it’s unlikely an Index Skip Scan will be viable regardless of which column is the first column in the index. However, if column B only had 10 distinct values, then an index based on (B,A) may very well be able to use an Index Skip Scan whereas an index on (A,B) would not.
Note though that an Index Skip Scan must probe the index at least as many times as there are distinct values of the leading column. This will not be as efficient as an index that only requires the one index probe. Therefore although a query with a predicate based on A=42 could use an Index Skip Scan with an index on (B,A) assuming column B had few distinct values, an index on (A,B) or (A) would be more efficient as it would only require the one index probe.
However, if the performance of index (B,A) were “good enough” and/or a search on just A=42 was uncommon, then the index on (B,A) may be quite adequate and an index on (A,B) may be unnecessary. The index on (B,A) would also be able to handle queries based on columns A and B and queries based on just column B (providing the CBO determined the selectivity acceptable, which it might for unevenly distributed rare values of column B).
See this Index Skip Scan demo to see when it all may prove useful.
No, an Index Skip Scan doesn’t mean we don’t need to consider the column order of an index. If anything, it’s something else that needs to be considered and along with index compression, is another reason why low cardinality leading index columns have advantages.