jump to navigation

Indexes and Sorts (Chant Of The Ever Circling Skeletal Family) May 26, 2008

Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Myths.
trackback

In my previous post regarding Reading 100% of Data via an Index, one of the examples I described was the scenario where an index was used to avoid a sort.

Index entries are always logically stored in the same order as the indexed columns (except of course when using a Reverse Key Index). Therefore if an index range scan is used to retrieve data, the data is returned in indexed column order. Sorts are relatively expensive operations, so if data needs to be retrieved in a specific order, the CBO can use the fact indexes return sorted data to its advantage and can potentially avoid performing the sort operation.

However, there’s a common misconception that if an appropriate index exists, Oracle will always use the index to avoid the sort. This is simply not true. Oracle will only use the index to avoid a sort if the costs of doing so is less than other alternatives. It could well be that the cost of performing say a Full Table Scan plus an associated sort might well be less than performing a really expensive and inefficient index range scan without the need for a sort. It depends on the relative costs of each possible option.

This Indexes and Sorts Part I Demo sets up a scenario and shows how an index can be used to select 100% of all rows in a table. However, note that the index in question has an excellent Clustering Factor with the rows stored in the table in basically the same order as that of the index. An index range scan of the whole table in this case is relatively efficient in that Oracle only needs to generally access each specific block the once. Therefore reading the entire table via an index and thus preventing the sort operation has a lesser cost than a potentially more efficient FTS but with the additional sorting overhead.

One of the more common reasons why an index is ignored when it possibly appears it should be used is due to the possibility of NULL values in the result set. As previously discussed, index entries that consist of nothing but NULLs are not indexed. Therefore an attempt to read all values from a table that has the potential to include NULL values may not use an associated index as the index may not reference all the required rows. The query either needs to be modified to exclude NULL values or the column(s) need to have a NOT NULL constraint for an index to be considered in this scenario.

This Indexes and Sorts Part II Demo shows how an index can not be used to eliminate a sort operation as the query result set could potentially return NULL values as the associated column does not have a NOT NULL constraint. By rewriting the query to exclude NULL values, the index is subsequently used by the CBO to retrieve all data, thus eliminating the sort operation.

Another perhaps more common reason why an index is not used to retrieve data in the index order, thus eliminating the need for a sort is that the cost of using the index to retrieve the necessary data is too costly and the savings in not performing the sort don’t outweigh the additional overheads of using the inefficient index. If an index has a very poor Clustering Factor, it can be extremely expensive for the CBO to use the index to retrieve data as most visits to the table requires a different table block to be accessed. In an extreme scenario, it may be necessary to read and re-read the same table block as many times as there are rows in the block and it may require as many distinct table block visits as there are rows in the table to retrieve all necessary rows via an index.

The cost of reading the table and the number of table block visits calculated by the CBO when using an index range scan is basically the selectivity of the query multiplied by the Clustering Factor of the index. Therefore the Clustering Factor is a crucial variable in deciding whether it’s actually worth using the index to prevent the sort or whether it’s actually less costly to use an alternative access path, such as perhaps an efficient FTS and subsequent sort.

The Indexes and Sorts Part III Demo is similar to the previous Part I demo except it uses a table that is ordered differently thus making the associated index result in a dreadful Clustering Factor. Consequently, the same query that previously used the index and no sort suddenly uses a FTS and a subsequent sort to read the entire table. However, even a query that only retrieves 10% of the data still uses a FTS and a subsequent sort. In fact even a query that retrieves just 1% of the data performs a FTS and subsequent sort. In this specific example, it wasn’t until approximately 0.11% of data was retrieved that the CBO decided it was cheaper to use the index and eliminate the sort than to use a FTS and subsequent sort.

The key message here is that yes, an index can be used to retrieve data in the specific order of the index and thus eliminate a sort operation. However, it will only do so if the cost of reading the necessary data via the index is less than alternative access paths, including the feared FTS, plus associated sort. If an index has a poor Clustering Factor, it is less likely to be considered as a method of eliminating a sort operation. 

Comments»

1. dizwell - May 27, 2008

Correct me if I’m wrong, but the second sentence of your second paragraph is claiming that if an index is used to retrieve data, you will see that data presented in sorted order, without you having to have specified an explicit ORDER BY clause.

But that is not guaranteed, is it? Especially in 11g, where prefetching and other optimiser enhancements can mean that, although the index is used to retrieve the data, the data can be fetched back to you in block order, or some other order… but not (necessarily) the sort order.

I’d be interested in knowing if Tom Kyte, for example, would let the idea that you get ordered data without an explicit order by clause pass, given his comments in this thread, for example: http://tinyurl.com/3f8kes. Or Jonathan Lewis. I think they wouldn’t, especially since I was just discussing this with Jonathan a couple of weeks ago.

Like

2. dizwell - May 27, 2008

I meant to add: it is well established that if you specify an order by clause, Oracle **MAY** decide it doesn’t actually need to perform a physical sort, if it’s fetched it via an index. But that’s subtly different, I think, from saying that ‘if you retrieve via an index, your data WILL be returned in sorted order’.

Anyway, I may have misunderstood the point you were making in that second sentence of para. 2. Apologies if so…

Like

3. Richard Foote - May 27, 2008

Hi Howard

It was certainly not my intention to suggest one doesn’t need an ORDER BY clause to guarantee the order of a result set. If you look at all the demo links, you’ll notice each and every SQL statement indeed has an ORDER BY clause. The statement you pointed out was just trying to suggest that if Oracle decided to perform an index range scan to satisfy a sorting requirement, it can retrieve all the necessary data in the order of the index and thus eliminate the need to perform the actual sort operation. It’s not trying to claim “you” but the “CBO” can see the presented data in sorted order via an index range scan. If you look at all the SQL examples in which the CBO decides to use the index, you’ll notice that all the execution plans only consist of the index range scan and do not have an associated SORT ORDER BY operation as the CBO can indeed guarantee the data is retrieved in the required order. If the SQL statements didn’t include the ORDER BY clause, then of course the data could potentially be returned in any order, I totally agree.

The other point I was trying to emphasise was that just because there’s an index Oracle could use to eliminate the need for the actual sort operation, it may not necessarily use the index if it’s too expensive and inefficient to retrieve all the necessary data. If you look at all the examples in the demos where Oracle doesn’t use an index, they all have an associated SORT ORDER BY operation as the data has to be actually sorted, as without the associated index range scan, the data is effectively retrieved in random column order and doesn’t satisfy the ORDER BY clause.

So yes, I totally agree, if you want the data in a specific order, use an ORDER BY clause as I have done in all my examples.

Hope this helps to “sort” it out 🙂

Like

4. Brian Tkatch - May 27, 2008

Thanx for the post, with the examples.

Good reading.

Like

5. Greg McGuire - June 11, 2008

Hi Richard –

I’d be curious to hear your thoughts on this topic as they pertain to concatenated indexes. As near as I can tell Oracle will only use this sort trick for the first column of a concatenated index, and not for subsequent columns. For example, if I have an index on a, b, and c in a table test and I have a query like:

select c
from test
where a = :something
order by b;

Oracle (9.2.0.8 at least) seems to resort the data after scanning the index for a. I would think that it should be able to range across a, and the data it pulls would already be sorted by b, so no need to resort. This does not appear to be true though.

Additionally, Oracle seems to need to do a TABLE ACCESS (BY INDEX ROWID) to get c, even though it’s in the index.

Thanks!

Like

6. Greg McGuire - June 12, 2008

Following up on my own question, the sort question is still open, but the last part about the TABLE ACCESS is solved. I neglected to mention that b in this example is a timestamp, a datatype which oracle indexes as a function based index (sys_extract_utc(b)) behind the scenes. So you need to say

order by sys_extract_utc(b)

and then your TABLE ACCESS goes away.

Like

7. Richard Foote - June 17, 2008

Hi Greg

The sort must be performed in your example. By using the index, you retrieve the data in column A order. But you want the data sorted via column B which is effectively randomly ordered in the A,B,C index as column B is only ordered with respect to column A.

Therefore you are incorrect in suggesting the data it pulls is already sorted by column b, it isn’t, it’s only sorted via column A. Hence the sort operation.

Like


Leave a comment