jump to navigation

The CBO and Indexes: An Introduction (Absolute Beginners) June 9, 2009

Posted by Richard Foote in CBO, Oracle Indexes.
trackback

One of the more common questions I get asked and one of the most common questions asked in the various Oracle related forums is the general question of why doesn’t the CBO choose to use a particular index. There are various reasons why an index might be ignored but generally and rather simplistically (for now), the answer is often simply because the CBO considers the cost of using the index to be more expensive than other alternatives.

As one would expect with the CBO, the cost is a rather important consideration and yet many don’t understand what the costs associated with an execution plan actually means and represents. Some people think the costs associated with an execution plan are basically meaningless and are only used by the CBO in “mysterious ways”. Some people even go so far as to suggest that the CBO sucks and recommend all sorts of inappropriate ways to force Oracle to use an index.

It all comes back to simply not understanding CBO fundamentals and how Oracle costs the basic use of an index.

In reality of course, the CBO doesn’t “suck” with 10g, in fact it’s actually extremely clever and resourceful when it comes to determining appropriate, efficient execution plans. If the CBO is armed with “accurate enough” statistics, it will generally do a remarkably good job. The costs and the associated cardinality of the various steps within an execution plan provide valuable information on why the CBO has made its decisions.

So I thought I might discuss the CBO a little and start with a very basic introduction into how the CBO costs an index to get this message across.

The basic formula for costing an index based range scan is:

basic index range scan cost = index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

Note there are various slight variations and interpretations of this formula from database version to database version and for differing specific scenarios, which can vary the resultant costs marginally. All the following examples are from a windows based 10.2.0.3 database.

So from an index perspective, the index blevel, the number of leaf blocks and the clustering factor are all statistics that directly influence the cost of using the index. However, just as important are the associated selectivities of accessing both the index and the table. These statistics are often based on the associated column statistics (but not always) and if Oracle gets these selectivities wrong, then the CBO will generate wrong costings and all bets are off regarding the appropriateness of the resultant execution plan.

OK, let’s create a nice, simple little scenario which will be the basis of future demos.

SQL> CREATE TABLE bowie_stuff AS SELECT (mod(rownum,100)+1)*10 id, 'Ziggy Stardust' name FROM dual CONNECT BY LEVEL <= 100000;

Table created.

SQL> CREATE INDEX bowie_stuff_i ON bowie_stuff(id);

Index created.

We’ll use the BOWIE_STUFF table in future demos, but for now I’m after a relatively large index with an excellent clustering factor. So I’ll create a second table, ordered on the ID column and double it’s size …

SQL> CREATE TABLE bowie_stuff2 AS SELECT * FROM bowie_stuff ORDER BY id;

Table created.

SQL> INSERT INTO bowie_stuff2 SELECT * FROM bowie_stuff2;

100000 rows created.

SQL> commit;

Commit complete.

SQL> CREATE INDEX bowie_stuff2_i ON bowie_stuff2(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE_STUFF2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now look at some of the key statistics, we note the following:

SQL> SELECT table_name, blocks, num_rows FROM user_tables WHERE table_name = 'BOWIE_STUFF2'; 
TABLE_NAME   BLOCKS NUM_ROWS
------------ ------ --------
BOWIE_STUFF2    659   200000

The table has 200,000 rows and consists of 659 data blocks.

SQL> SELECT column_name, num_distinct, density, num_nulls FROM user_tab_col_statistics WHERE table_name = 'BOWIE_STUFF2' and column_name = 'ID';
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS
----------- ------------ ------- ---------
ID                   100     .01         0

The indexed ID column has 100 distinct values, which are perfectly evenly distributed.  The density of the ID is an “accurate” 0.01 with a selectivity of 1%, so a selection of one distinct value will return 200,000 x 0.01 = 2,000 rows. Nice easy numbers …

SQL> SELECT index_name, blevel, leaf_blocks, num_rows, distinct_keys, clustering_factor FROM user_indexes WHERE index_name = 'BOWIE_STUFF2_I';

INDEX_NAME      BLEVEL LEAF_BLOCKS
-------------- ------- -----------
BOWIE_STUFF2_I       2         605  

NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR
-------- ------------- -----------------  
200000           100               854

The index has a blevel of 2, it has 605 leaf blocks and a rather good clustering factor of just 854 (note the rows in the table were basically inserted in ID order).

OK, we now have all the information we need.

Note to start with, I’ll only use the “older” IO costing model by setting the following parameter:

SQL> alter session set "_optimizer_cost_model" = io;

Session altered.

As we’ll see in future posts, the use of the CPU costing model actually has minimal effect on many index related access costings but we’ll use the IO costing model as a starting point.

This first equality SQL demo shows the costings in relation to a simple, single value equality predicate (I’ve linked to a PDF to preserve formatting).

OK, 2 important pieces of information to note. Firstly, the CBO has got the expected cardinality (2000 rows) spot on.

As the stats are 100% accurate and the values are perfectly evenly distributed, this is to be expected. If Oracle gets the cardinality correct, we can be reasonably confident that the CBO to have made a reasonable decision here.

As previously mentioned, there are 100 distinct values of which we want one of those those values. Oracle has calculated the selectivity of the query to be 1/100 = 1% or a value of 0.01 (as defined in the density column statistic). The expected number of rows is therefore 200,000 x 0.01 = 2000. Like I said, spot on.

Secondly, the CBO has calculated the cost of the index range scan to be 9 and the overall cost of the query to be 18. How has the CBO derived these costings ? Well as it’s a simple, single column index and associated query predicate, we can plug in the 0.01 selectivity into the above formula as follows:

index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

2 + ceil(0.01 x 605) + ceil(0.01 x 854) = 2 + 7 + 9 =    9 + 9 = 18

So we can see that the costs associated with accessing just the index:

blevel + ceil(index selectivity x leaf blocks) = 2 + ceil(0.01 x 605) = 2 + 7 = 9

indeed comes to a total of 9

and the overall cost of the execution plan indeed does come to 18.

The CBO costings all do actually make sense and add up as expected.

In this second IN list demo, we’re just going to expand on things a touch by selecting 3 distinct values in a IN list. However, the actual principles and cost calculations are basically the same.

This time we’re after 3 possible values as listed in the IN clause. So that’s 3 out of the 100 possible values, 3/100 = 3% or 0.03. The required cardinality is therefore 200,000 rows x 0.03 = 6000 which again Oracle has got spot on. Usually a good sign that the CBO has made a reasonable decision.

This time the costs have come to 23 for the index range scan part of the execution plan and 49 overall. Again using our basic formula:

index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)

2 + 3 x ceil(0.01 x 605) + ceil(0.03 x 854) = 2 + 3×7 + 26 = 2 + 21 + 26 = 23 + 26 = 49.

So the cost associated with the index is indeed 23 and the overall cost of the query is indeed 49. Again, all these CBO costings make sense. These costings aren’t meaningless, internal numbers but values that give us a useful insight into how and why the CBO has come to a specific cost and so to a specific execution plan.

Note BTW, one of those subtle differences in how the formula is implemented with 10g (from say 9i) is the fact the selectivity is calculated for the index by performing the CEIL function for one value first and then multiplied by the overall number of expected values in the IN list:

2 + 3 x  ceil(0.01 x 605) = 23

rather than determining and using the total index selectivity within the CEIL function:

2 +  ceil(0.03 x 605) = 21

The net result being such costs are going to be just that little bit higher in 10g than they would have been in 9i.

So indeed, these things can all change a little from release to release but the basic principle is still the same. Feed the CBO acurate enough statistics and it’ll likely do the right thing. If it doesn’t, understanding the CBO and how it costs various execution plans will help to determine what the issues might be. Huge clues are provided by the costings and cardinality estimates in the associated query execution plans. 

I’ll expand on all this is future posts but for more information on this subject, I highly recommend the writings of Wolfgang Breitling (and his excellent paper “A Look Under The Hood Of The CBO: The 10053 Event” in particular) and the writings of Jonathan Lewis (and his excellent book “Cost-Based Oracle Fundamentals” in particular).

Like I said, this is only meant as an introduction, more to come …

About these ads

Comments»

1. Brian Tkatch - June 10, 2009

1) Is there a particular reason the ids are multiplied by 10? “SELECT (mod(rownum,100)+1)*10″

2) The article uses upper(). Should that be CEIL()?

3) Typo: “6000which” is missing a space.

4) I bought and tried to read CBO Fundamentals. It obviously explains it well, but is very confusing if the terminology is not already known.

5) I’m excited about this new series. I’ve read parts of these costings, but not everything has clicked yet. I hope this series makes it all clear.

Richard Foote - June 10, 2009

Hi Brian

1) This is a little table I use as a basis for number of different basic demos and it helps to show the effects of bounded and unbounded range scans if the data doesn’t contain all possible integers between 2 values. Depends what I finally discuss here whether or not it comes into play.

2), 3). Yes indeed, fixed thanks.

4). It’s a great book, useful for beginners and more experienced folk alike. I learnt lots of useful bits of information.

5). Hope I don’t disappoint !!

2. mwidlake - June 10, 2009

Hi Richard,

I think it is really good that you are going back and covering the basics again – not that I don’t like the advanced stuff a hell of a lot too, but covering the basics (a) helps people who don’t know it yet (b) reminds those of use who thought we knew it about it but might have fogotten {i’m getting old} (c) looking at the basics you can realise that something you thought you knew for years… you didn’t. I recently did a simple presentation on explain plan and realised I did not really know what a db block get was….I worked it now, it’s on my own blog.

I have one question. I would have expected the “in” example to result in 3 seperate index interogations which fed back to the table scan, as each value in the “in” would need a new starting point in the index to be found and thus 3 walks down the index depth and a scan for each. As such I would have expected to need 3*(B level +ceil(index selectivty*leaf blocks)) + ceil(table selectivty*clustering factor). But your numbers add up and mine would not (though index selectivity*leaf blocks is 6.02, veeery close to 6… OK, maths does not work like that) .
How does Oracle avoid working down the depth of the index 3 times?

Richard Foote - June 10, 2009

Hi Martin

“How does Oracle avoid working down the depth of the index 3 times” ?

That’s a good question which I’ll expand upon later. The blevel is actually a bit of a variable in the formula and how it’s interpreted between releases and between differing queries and between differing index sizes.

The thing to note is that if the index is accessed a lot, there’s a good chance the non-leaf blocks are going to be cached as they only take up a small proportion of the overall index size. The CBO doesn’t like to include what it thinks might be cached blocks as the cost is meant to determine the overall I/Os required for the execution plan.

If an execution plan has just worked down the index structure, the CBO assumes if it does so again within the same execution plan, there’s a very very good chance those blocks will already be cached and so doesn’t need to include the non-leaf blocks again in its costings.

It’s an automatic use of the optimizer_index_caching parameter if you like, but only for non-leaf blocks. This parameter is used to make a similar assumption for the index leaf blocks as well.

Like I said, I’ll expand on all this later.

3. Unhelpful “helpful” people « Martin Widlake’s Yet Another Oracle Blog - June 10, 2009

[…] Foote has gone back to explaining the basics of indexes and the CBO, which if you are new to CBO or indexes, hot-foote {sorry} it over there immediately and check it […]

4. Brian Tkatch - June 10, 2009

Still a few upper()s in there. I see them at the bottom.

Richard Foote - June 10, 2009

Hi Brian

Blimey, they’re everywhere !!

It’s a real mental block with me, the number of times I write upper in SQL when I mean ceil !!

Thanks again, I think I’ve got them all this time :)

5. Blogroll Report 06/06/09 – 12/06/09 « Coskan’s Approach to Oracle - June 13, 2009

[…] Richard Foote – The CBO and Indexes: An Introduction (Absolute Beginners) […]

6. The CBO and Indexes: Introduction Continues … « Richard Foote’s Oracle Blog - June 15, 2009

[…] just going to look at another example now using the same table setup as before, but this time running an SQL query that has 5 distinct values in an IN list predicate […]

7. Vyacheslav Kryuchkov - June 24, 2009

Hello Richard,

I always believed that cost is some meaningless, internal number. Now you are going to break my dreams :)
In my practice I had optimized some queries, so query’s cost was increased but execution time decreased from minutes to seconds.
It means statistics or multiblock_read_count or some else parameters were wrong in that database?

Thank you.

Richard Foote - June 25, 2009

Hi Vyancheslav

Come on, you must have more interesting dreams than that ;)

Remember, the CBO doesn’t deal strickly with response times, it deals with “costs”, either I/O or I/O and CPU related. Lower costs should ideally mean lower resources and so low response times but not always, especially if the fundamental cost calculations are incorrect.

Generally in the scenario you describe, one either has incorrect cost calculations due (usually) to incorrect cardinality estimates due (usually) to imprecise enough statistics or the CBO is making incorrect assumptions regarding the relative costs associated with the various types of I/Os.

Again, understanding how this number is derived can help in determining why some of these things may be occuring with your execution plans. If a higher cost plan is “better”, you want to know why so that the CBO can ideally make the necessary corrections.

8. Richard Foote - July 23, 2009

For those of you who are regular readers, you’ll hopefully know why the “Oracle SQL Optimizer Equations” described here are wrong:

http://www.dba-oracle.com/t_sql_optimizer_costing_equations.htm
:)

9. Rudi Kristanto - September 3, 2009

from user_indexes we know that leaf block = 605 and clustering factor = 854

but further in your calculation, you used 602 for leaf block and once 852 for clustering factor

little, but irritating :)

Richard Foote - September 6, 2009

Hi Rudi

The use of these numbers were of course fully intentional, designed specifically to see if people were paying attention ;)

I’ve now removed the little irritations :)

Thanks !!

10. The CBO CPU Costing Model and Indexes – Another Introduction « Richard Foote’s Oracle Blog - September 16, 2009

[…] Richard Foote in CBO, Index statistics, Oracle Indexes, System Statistics. trackback I’ve previously discussed some basic concepts and formulas regarding how the CBO derives index related costings via the I/O […]

11. Book Review: Oracle Database 11gR2 Performance Tuning Cookbook (Part 1) « Charles Hooper's Oracle Notes - March 4, 2012

[…] when determining the effectiveness of the index for a specific SQL statement (reference reference2 reference3 page […]

12. Log Buffer #150: A Carnival of the Vanities for DBA’s - February 21, 2013

[…] back to basics, Richard Foote explains Oracle’s cost-based optimizer in CBO and Indexes, an Introduction for Absolute Beginners. Speaking of optimizations, Valcora has Another Way To Do Performance Tuning — make sure you […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,915 other followers

%d bloggers like this: