Indexes vs. Full Table Scan: Picture vs. 1000 Words (Pictures Of Lily)

Indexes vs. Full Table Scan: Picture vs. 1000 Words (Pictures Of Lily) June 8, 2012

Posted by Richard Foote in CBO, Clustering Factor, Oracle Indexes.
trackback

I’m in the process of writing a number of new presentations and in one I’ve included a favorite little graph of mine that I’ve used over the years to help illustrate the relationship between the cost of using an index vs. the cost of using a Full Table Scan (FTS). It’s occurred to me that I’ve never actually shared this graph on this blog, so I thought it about time I did.

The Cost Based Optimizer (CBO) when choosing between an index scan and a FTS will simply go for the cheapest option. The more rows that are retrieved (or the greater the percentage of rows retrieved), the more expensive the index option as it needs to perform more logical I/Os. There will generally be a point when the selectivity of a query is such, that so many rows are retrieved, that the index costs will increase beyond those of the FTS and the FTS becomes the cheaper option.

The cost of a FTS meanwhile is pretty well constant regardless of the number of rows retrieved. It needs to read all the blocks in the table, whatever the selectivity of the query.

Although I’ve not quite reached 1000 words, the below graph illustrates this point:

The red line represents the constant cost of the FTS. The green lines represents the cost of using various indexes, which increases as more rows are retrieved. The “steepness” of the green line and the subsequent increase in cost of the index as more rows are retrieved is due entirely to the Clustering Factor of the index. The steeper the line, the worse (higher) the Clustering Factor, the less efficient the index and the quicker we get to the point when the FTS becomes cheaper. The less steep the line, the better (lower) the Clustering Factor, the more efficient the index and the longer it takes for the FTS to become the cheaper option.

In some rarer cases, the index might be so efficient (or the FTS so inefficient) that the index never reaches the point of the FTS and the CBO decides it’s overall cheaper for the index to potentially access 100% of all rows in a table rather than via a FTS.

Ok, so now you have almost 1000 words and the picture 🙂

Comments»

1. Alberto Frosi (@Albertofro) - June 8, 2012: Hi Richard,
great graph, more than 1000 words, so in the future we must prepare to view more FTS in our execution plan than index scan…good news for the future so …:-).
Become more important how we get statistics, for CBO best choice.
Thanks for sharing.
Ciao

LikeLike

Reply
Richard Foote - June 8, 2012: Hi Alberto

Indeed, the future holds more promise for the FTS, however don’t worry there’s plenty of life in the old index yet 🙂

LikeLike

Reply
2. Tim Hall - June 8, 2012: Flippin’ heck mate. I normally have to allocate an evening to reading one of your posts. It’s rare to get such a quick hit. 🙂

Cheers

Tim…

LikeLike

Reply
Richard Foote - June 8, 2012: Hi Tim

LOL !! I’ve decided to give writing short stories a go. Let’s see, there’s this cat in a hat …

LikeLike

Reply
nlitchfield - June 8, 2012: Have you observed it yet? If so you know how the story ends? If not its all to play for,….

Nice blog entry

LikeLike

Reply
3. Uwe Hesse - June 8, 2012: Just to inform you: I needed to use Internet Explorer to actually see the graph, while firefox did not show it. Was close to reply: “What graph?” 🙂 And nice visualization, by the way!

LikeLike

Reply
4. Uwe Hesse - June 8, 2012: Oops – now firefox also shows it – just a bit slow on my machine. Never mind!

LikeLike

Reply
5. Ron - June 8, 2012: Would this be the case on an Exadata machine as well ?

LikeLike

Reply
Richard Foote - June 9, 2012: Hi Ron

Yes, fundamentally the case would be the same on an Exadata machine as well. However, there are a number of important differences.

The first is that due to the efficiences of full scans on Exadata, with storage based processing such as smart scans and the use of storage indexes, the FTS line is lower, in cases significantly lower than on Non-Exadata. Therefore indexes that might previously have been below the line might now be above the line and so no longer the cheaper alternative. Note though that the line doesn’t reach all the way down to a cost of 0 and the number of indexes that remain vaiable will depend heavily on the type of processing of the database.

A second difference is that the FTS line is not necessarily the same for all queries in Exadata and can hover up and down depending on the characteristics of the query, the existance of appropriate storage indexes, etc.

A third difference is that the CBO on the database is not fully aware of the cost implications of the processing that is offloaded to the storage cells and so can make inaccurate assumptions of the height of the FTS line. This can therefore lead to indexes being used inappropriately because the CBO is still assuming their costs are below that of the FTS line.

In short, things become interesting when it comes to managing indexes appropriately in Exadata and that’s why one of the papers I’m currently writing is on this very subject, “Indexing In Exadarta” 🙂

LikeLike

Reply
6. Brian Tkatch - June 9, 2012: Nice short one Richard.

If i type in a thousand words, plus shipping, will you send me a signed copy of the image? 🙂

LikeLike

Reply
Richard Foote - June 9, 2012: Hi Brian

But of course 🙂

LikeLike

Reply
7. rsiz - June 9, 2012: Beautiful. Only quibble I have is regards the use of the word selectivity in that I usually reverse the sense of greater and lesser, meaning that greater selectivity means fewer rows. Taking your whole sentence you’re not ambiguous, but it is the reverse of the usage I’m used to. As in “the column predicate values make index good_i very selective so very few rows will be returned and you likely can access them with less total work via the index.”

Using the slope of the line to represent the approximate effect of clustering factor is brilliant.

Another important case where the line usually does not reach the fts constant is when the selected columns are all satisfied by the index, and that of course bears no relation to the cluster factor as a slope. Anyone have any ideas how to graph that? I guess the the right hand point is lower than the fts by the number of (leaf) blocks fewer in the index than in the table and the slope is linear from 0 to that point.

The picture holds true on exadata, but the rate at which you can do the fts is better. So that affects when the index access path crosses the line of the fts to become more expensive, because the fts line is lower.

Very nice Richard

LikeLike

Reply
Richard Foote - June 9, 2012: Hi Mark

Thanks for the feedback !!

I’ve taken your comments on board regarding the wording of selectivity (I agree with you) and changed it a little to be clearer in what I’m saying.

I actually graph the case of a Fast Full Index Scan as another constant line, somewhat lover than the line that corresponds to the FTS as generally the index is a smaller, skinnier segment than the parent table.

LikeLike

Reply
Jonathan Lewis - June 9, 2012: It gets more complicated with Exadata with compression, when the size of the index can be much larger than the size of the table. But then it can get worse the other way round too, because the clustering_factor of an index can be much smaller than the number of blocks in the table even when the index addresses every row in the table !

You’d better believe it when Richard says the optimizer isn’t “fully aware” of Exadata – though possibly that’s a euphemism for “is completely clueless”.

LikeLike

Reply
saruamit4 - June 15, 2012: Hi Jonathan,

Agreed on the point that ‘optimizer isn’t “fully aware” of Exadata’, then how oracle will judge which plan is optimized one as so many time FTS could be faster using smart scan then what could be the point of judgement for optimizer to generate a new plan ?

LikeLike

Reply
Richard Foote - June 19, 2012: Hi Jonathan

“Is completely clueless” is another way of putting it 🙂

LikeLike

Reply
8. David Alejo Marcos - June 10, 2012: I have spent more time reading comments than the blog itself ;). Nice post.

LikeLike

Reply
Richard Foote - June 19, 2012: Hi David

Some of the best content here can be found within the comments 🙂

LikeLike

Reply
9. Richard Foote - June 19, 2012: Hi Saruamit

That’s one of the challenges 🙂

I’m currently writing a new prsentation that answers this question so it’s not something that can easily be addressed with a comment on a blog.

LikeLike

Reply

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30

Richard Foote's Oracle Blog