Monterey and ODTUG Kaleidoscope 2009 Review June 30, 2009Posted by Richard Foote in Kaleidoscope, Oracle ACE Director, Richard's Musings, Travel.
I’ve just recently returned from a really enjoyable trip to the USA to attend an Oracle ACE Director briefing at Redwood Shores and the ODTUG Kaleidoscope Conference in Monterey, CA.
During the week, I had the opportunity to finally meet some of the other Oracle ACE Directors such as Tim Hall, Mark Rittman, Lucus Jellema, Lonneke Dikmans, Mike van Alst, Husnu Sensoy, Debra Lilley and Harshad Oak to name but a few. I also had a great time hanging out with fellow Aussies Chris Muir and Marcel Kratochvil, sharing many a beer and bottle of red wine. A highlight was sitting outside one balmy evening, drinking one of Marcel’s expensive reds with Chris and Tim Hall, whist discussing the bizzare topic of body hair !! During the week, it was announced Chris had been awarded the Oracle ACE Director of the Year, a much deserved honour and Marcel got the opportunity to spread his dislike of having Foreign Keys in the database to anyone who would (and would not) listen.
I also got to meet the infamous Stanley as well. Life can be strange at times …
The ACE Director briefing was a very long day. I can’t say too much until tomorrow’s big Oracle Fusion Middleware 11g Global Launch …
How many Oracle ACE Directors do you recognise (and can you spot me) ;)
The ODTUG Kaleidoscope Conference was excellent, a really well run event, held in the lovely surroundings of the Hyatt Hotel complex in beautiful Monterey, California. As usual with these types of events, the quality of the presentations varied considerably. Some of the highlights included an excellent presentation by Bryn Llewellyn on “Doing SQL from PL/SQL: Best and Worst Practices” and some really interesting presentations by Toons Koppelaars on the importance of “Fat Databases: A layered Approach” and on “Semantic Query Optimization”. Lowlights included a truly awful, error riddled presentation by Burleson on so-called Oracle 11g SQL Tuning Secrets. I was warned beforehand, I should have known better and listened …
Also had time to explore Monterey and check out some of its attractions. It’s a really scenic part of the world and it was great to escape the Canberra winter for a couple of weeks. The Monterey Bay Aquarium is a must visit, one of the most impressive I’ve ever visited and I’ve seen quite a few around the world. The Kelp Forest display with its school of Sardines was just superb as were the jelly fish tanks.
Fisherman’s Wharf was worth checking out with it’s old wooden buildings as was Cannery Row, made famous by John Steinbeck. However, probably the highlight for me was the Whale watching cruise where I along with Chris, Marcel and Debra Lilley were all lucky enough to see two Humpback Whales. What an incredible experience, although I almost (but not quite) nearly saw my breakfast again in the choppy conditions. The smell of the Humpback whale’s plume (something akin to a mixture of rotting broccoli and rotting fish) had me gagging again but I managed to just hang on. If you ever get the chance, do it, but pick a mild calm day if possible.
Finally, I also had the very great pleasure in meeting members of the hard working OTN team responsible for administrating the Oracle ACE program (Justin Kestelyn : follow link for photos, Todd Trichler, Vikki Lira and Lillian Buziak). Thanks guys for all your wonderful work and support in organising such a wonderful event for all the Oracle ACEes.
Back to the CBO and indexes in the next few days.
The CBO and Indexes: Introduction Continues … June 15, 2009Posted by Richard Foote in CBO, Oracle Indexes.
OK, I previously briefly covered how the CBO calculates the basic cost of an index range scan. Yes, those cardinality/rows values in the execution plans are vitally important as they highlight whether or not the CBO has used the appropriate selectivity values in the index costing formula. And yes, the associated cost values are meaningful and potentially useful as they determine the actual costs associated with the execution plan in terms of the expected number of I/Os the CBO estimates will be required (when using the IO costing model and often the CPU costing model as well).
I’m just going to look at another example now using the same table setup as before, but this time running an SQL query that has 5 distinct values in an IN list predicate on our demo table (again, follow the link to see the query and formatted execution plan).
The first thing we notice in this example, is that Oracle has decided to use a FTS rather than use the index on the ID column. Considering we’re only after 5 values out of the possible 100 values, some may not see this as expected behaviour, especially considering the index has such a good Clustering Factor. Basically Oracle is deciding to access each and every block below the HWM of the table, retrieving all 100% of the rows in the table, only to ultimately discard 95% of them.
It certainly appears at first glance to be a more “costly” option than using the index to directly access just the 5% of rows we’re interested in …
The first thing to check is the estimated cardinality figures, to see if the CBO has miscalculated the expected number of rows it needs to retrieve. However, as the statistics have just been fully computed and that the ID column has perfectly even distributed values, we notice the cardinality figures are again spot on. The query returns 10,000 rows and indeed the rows estimate in the execution plan is exactly10,000 rows. The calculation is simply 0.01 (density of column) x 200,000 (rows) x 5 (values in select list) = 10,000.
Let’s now calculate the cost of using the index using our index costing formula, using the CEIL function this time ;)
index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
2 + 5 x ceil(0.01 x 602) + ceil(0.05 x 854) = 2 + 5 x 7 + 43 = 37 + 43 = 80.
So the cost of using an index range scan to retrieve 5% of the rows in our example comes to a total of 80.
If we look at the cost of the FTS in our explain plan in the above link, we notice the cost is just 65. 65 is less than 80, so the FTS wins.
So how did the CBO come to a cost of just 65 when it has to read all 659 blocks in the table ?
Well, the first missing piece of information is the value of the db_file_multiblock_read_count parameter because this governs how many blocks Oracle will attempt to read within a single logical multiblock I/O call. Remember, when performing a FTS, Oracle knows it has to read all the table related blocks below the HWM and so rather than reading one tiny little block at a time, it does so more efficiently by reading multiple blocks at a time. This is the fundamental advantage of the FTS over the index range scan which can only ever access the one block at a time.
SQL> show parameter db_file_multi
NAME TYPE VALUE ----------------------------- ------- ----- db_file_multiblock_read_count integer 16
So the db_file_multiblock_read_count is 16.
The next thing to note is that it’s very unlikely that Oracle will actually read the full 16 blocks at a time as there are a number of factors that prevents this from occurring. Extent boundaries is one classic example (a multiblock read can not span across extent boundaries) but the more common issue is a block within the table already being stored in the buffer cache. Rather than storing the same block at the same consistent point twice in memory, Oracle breaks up the multiblock read and only reads up to the block that is already cached in the buffer cache. Therefore, for Oracle to actually read the entire table using the full 16 block multiblock I/Os, it would mean there are no cached blocks from the table currently in the buffer cache, an unlikely event.
Therefore, Oracle doesn’t use the full 16 value when determining the number of expected multiblock I/Os, but a modified “fudge” value which equates to approximately 10.4. for a MBRC of 16. Again, Jonathan Lewis in his excellent “Cost-Based Oracle Fundamentals” book discusses all this is some detail.
Remember also that Oracle needs to access the segment header as part of a FTS as I explained is some detail in my “Indexes and Small Table” series. So that’s an additional single block I/O on top of the multiblock I/Os.
Therefore the cost of performing a FTS is:
segment header I/O + ceil(table blocks/fudged mbrc value) = 1 + ceil(659/10.4) = 1 + 64 = 65.
The 65 cost for the FTS does make sense when one understands a little how this value is derived by the CBO …
As the FTS can read big chunks of the table at a time whereas the index range scan can only read each necessary block one at a time, the FTS can indeed read the table and retrieve the required 5% of data in fewer LIOs and so has the lesser associated cost than the index.
Now there are a few issues with all of this. Firstly, is the db_file_multiblock_read_count actually a valid and correct setting as this directly impacts not only the actual size of the multiblock read operations but critically, the associated costs relating to FTS operations (and indeed Fast Full Index Scans as well) ?
Also, is it really correct and valid to assume the cost of a multiblock I/O to be the same and equal to the cost of a single block I/O ? Surely, the process of performing a single block I/O is likely to be “cheaper” than that of a multiblock I/O and yet the CBO treats both types of I/Os as having the same fundamental “cost”.
Also the CPU overheads of having to access each and every row in each and every block is likely going to be more significant than the CPU required to access just specific data from specific blocks when using an index.
Perhaps, the more “expensive” index range scan might actually be a better alternative than the FTS if these factors were taken into consideration ?
Now this may indeed be true, if these factors were correctly taken into consideration. However, this may also indeed be quite false and the FTS may really truly be the better and more efficient alternative and attempts to force the use of the index may be inappropriate and ultimately more expensive.
I’ll next discuss some really bad (although still very common) methods of making the CBO favour indexes, using generally inappropriate so-called “Silver Bullets” …
For all the Radiohead fans out there, some exciting news.
They’ve just announced that their “In Rainbows” Disk 2, which was previously only available with the limited edition “In Rainbows” Box-Set, is now available for digital download from their w.a.s.t.e site.
I wrote a piece on the Radiohead “In Rainbows” Box-Set a while ago when it was first released, where I briefly described each track. It’s a fantastic collection of songs, with “Last Flowers To The Hospital” which dates all the way back to “OK Computer” being worth the price of admission alone.
If you’re a Radiohead fan, or simply a lover of great rock music, I highly recommend you check it out.
Dunning-Kruger Effect (The Idiot) June 11, 2009Posted by Richard Foote in Richard's Musings.
In an interesting blog piece by Martin Widlake regarding his annoyance at just how unhelpful some people can unfortunately be in forums, Jonathan Lewis has a really interesting comment regarding the Dunning-Kruger Effect.
I’ve never heard of it before, but it’s really all quite fascinating. In short, the Dunning-Kruger Effect is when someone keeps making mistakes or coming to wrong conclusions but their incompetence robs them of the ability to realise it. Some of the hypotheses of this effect include:
- Incompetent people tend to overestimate their own level of skill
- Incompetent people tend to fail to recognise skills in others
- Incompetent people tend to fail to recognise the extent of their own inadequacies
David Dunning and Justin Kruger who came up with this theory won an Ig Nobel Prize for their report “Unskilled and Unaware Of It: How Difficulties In Recognising Ones Own Incompetence Lead To Inflated Self-Assessments“. The Ig Nobel Prize is given each year for 10 achievements that first make people laugh and then makes them think.
Boy, did I had a chuckle when some people instantly sprang to mind as I read about it all ;)
One of the more common questions I get asked and one of the most common questions asked in the various Oracle related forums is the general question of why doesn’t the CBO choose to use a particular index. There are various reasons why an index might be ignored but generally and rather simplistically (for now), the answer is often simply because the CBO considers the cost of using the index to be more expensive than other alternatives.
As one would expect with the CBO, the cost is a rather important consideration and yet many don’t understand what the costs associated with an execution plan actually means and represents. Some people think the costs associated with an execution plan are basically meaningless and are only used by the CBO in “mysterious ways”. Some people even go so far as to suggest that the CBO sucks and recommend all sorts of inappropriate ways to force Oracle to use an index.
It all comes back to simply not understanding CBO fundamentals and how Oracle costs the basic use of an index.
In reality of course, the CBO doesn’t “suck” with 10g, in fact it’s actually extremely clever and resourceful when it comes to determining appropriate, efficient execution plans. If the CBO is armed with “accurate enough” statistics, it will generally do a remarkably good job. The costs and the associated cardinality of the various steps within an execution plan provide valuable information on why the CBO has made its decisions.
So I thought I might discuss the CBO a little and start with a very basic introduction into how the CBO costs an index to get this message across.
The basic formula for costing an index based range scan is:
basic index range scan cost = index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
Note there are various slight variations and interpretations of this formula from database version to database version and for differing specific scenarios, which can vary the resultant costs marginally. All the following examples are from a windows based 10.2.0.3 database.
So from an index perspective, the index blevel, the number of leaf blocks and the clustering factor are all statistics that directly influence the cost of using the index. However, just as important are the associated selectivities of accessing both the index and the table. These statistics are often based on the associated column statistics (but not always) and if Oracle gets these selectivities wrong, then the CBO will generate wrong costings and all bets are off regarding the appropriateness of the resultant execution plan.
OK, let’s create a nice, simple little scenario which will be the basis of future demos.
SQL> CREATE TABLE bowie_stuff AS SELECT (mod(rownum,100)+1)*10 id, 'Ziggy Stardust' name FROM dual CONNECT BY LEVEL <= 100000; Table created. SQL> CREATE INDEX bowie_stuff_i ON bowie_stuff(id); Index created.
We’ll use the BOWIE_STUFF table in future demos, but for now I’m after a relatively large index with an excellent clustering factor. So I’ll create a second table, ordered on the ID column and double it’s size …
SQL> CREATE TABLE bowie_stuff2 AS SELECT * FROM bowie_stuff ORDER BY id; Table created. SQL> INSERT INTO bowie_stuff2 SELECT * FROM bowie_stuff2; 100000 rows created. SQL> commit; Commit complete. SQL> CREATE INDEX bowie_stuff2_i ON bowie_stuff2(id); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE_STUFF2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we now look at some of the key statistics, we note the following:
SQL> SELECT table_name, blocks, num_rows FROM user_tables WHERE table_name = 'BOWIE_STUFF2'; TABLE_NAME BLOCKS NUM_ROWS ------------ ------ -------- BOWIE_STUFF2 659 200000
The table has 200,000 rows and consists of 659 data blocks.
SQL> SELECT column_name, num_distinct, density, num_nulls FROM user_tab_col_statistics WHERE table_name = 'BOWIE_STUFF2' and column_name = 'ID'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS ----------- ------------ ------- --------- ID 100 .01 0
The indexed ID column has 100 distinct values, which are perfectly evenly distributed. The density of the ID is an “accurate” 0.01 with a selectivity of 1%, so a selection of one distinct value will return 200,000 x 0.01 = 2,000 rows. Nice easy numbers …
SQL> SELECT index_name, blevel, leaf_blocks, num_rows, distinct_keys, clustering_factor FROM user_indexes WHERE index_name = 'BOWIE_STUFF2_I'; INDEX_NAME BLEVEL LEAF_BLOCKS -------------- ------- ----------- BOWIE_STUFF2_I 2 605 NUM_ROWS DISTINCT_KEYS CLUSTERING_FACTOR -------- ------------- ----------------- 200000 100 854
The index has a blevel of 2, it has 605 leaf blocks and a rather good clustering factor of just 854 (note the rows in the table were basically inserted in ID order).
OK, we now have all the information we need.
Note to start with, I’ll only use the “older” IO costing model by setting the following parameter:
SQL> alter session set "_optimizer_cost_model" = io; Session altered.
As we’ll see in future posts, the use of the CPU costing model actually has minimal effect on many index related access costings but we’ll use the IO costing model as a starting point.
This first equality SQL demo shows the costings in relation to a simple, single value equality predicate (I’ve linked to a PDF to preserve formatting).
OK, 2 important pieces of information to note. Firstly, the CBO has got the expected cardinality (2000 rows) spot on.
As the stats are 100% accurate and the values are perfectly evenly distributed, this is to be expected. If Oracle gets the cardinality correct, we can be reasonably confident that the CBO to have made a reasonable decision here.
As previously mentioned, there are 100 distinct values of which we want one of those those values. Oracle has calculated the selectivity of the query to be 1/100 = 1% or a value of 0.01 (as defined in the density column statistic). The expected number of rows is therefore 200,000 x 0.01 = 2000. Like I said, spot on.
Secondly, the CBO has calculated the cost of the index range scan to be 9 and the overall cost of the query to be 18. How has the CBO derived these costings ? Well as it’s a simple, single column index and associated query predicate, we can plug in the 0.01 selectivity into the above formula as follows:
index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
2 + ceil(0.01 x 605) + ceil(0.01 x 854) = 2 + 7 + 9 = 9 + 9 = 18
So we can see that the costs associated with accessing just the index:
blevel + ceil(index selectivity x leaf blocks) = 2 + ceil(0.01 x 605) = 2 + 7 = 9
indeed comes to a total of 9
and the overall cost of the execution plan indeed does come to 18.
The CBO costings all do actually make sense and add up as expected.
In this second IN list demo, we’re just going to expand on things a touch by selecting 3 distinct values in a IN list. However, the actual principles and cost calculations are basically the same.
This time we’re after 3 possible values as listed in the IN clause. So that’s 3 out of the 100 possible values, 3/100 = 3% or 0.03. The required cardinality is therefore 200,000 rows x 0.03 = 6000 which again Oracle has got spot on. Usually a good sign that the CBO has made a reasonable decision.
This time the costs have come to 23 for the index range scan part of the execution plan and 49 overall. Again using our basic formula:
index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
2 + 3 x ceil(0.01 x 605) + ceil(0.03 x 854) = 2 + 3×7 + 26 = 2 + 21 + 26 = 23 + 26 = 49.
So the cost associated with the index is indeed 23 and the overall cost of the query is indeed 49. Again, all these CBO costings make sense. These costings aren’t meaningless, internal numbers but values that give us a useful insight into how and why the CBO has come to a specific cost and so to a specific execution plan.
Note BTW, one of those subtle differences in how the formula is implemented with 10g (from say 9i) is the fact the selectivity is calculated for the index by performing the CEIL function for one value first and then multiplied by the overall number of expected values in the IN list:
2 + 3 x ceil(0.01 x 605) = 23
rather than determining and using the total index selectivity within the CEIL function:
2 + ceil(0.03 x 605) = 21
The net result being such costs are going to be just that little bit higher in 10g than they would have been in 9i.
So indeed, these things can all change a little from release to release but the basic principle is still the same. Feed the CBO acurate enough statistics and it’ll likely do the right thing. If it doesn’t, understanding the CBO and how it costs various execution plans will help to determine what the issues might be. Huge clues are provided by the costings and cardinality estimates in the associated query execution plans.
I’ll expand on all this is future posts but for more information on this subject, I highly recommend the writings of Wolfgang Breitling (and his excellent paper “A Look Under The Hood Of The CBO: The 10053 Event” in particular) and the writings of Jonathan Lewis (and his excellent book “Cost-Based Oracle Fundamentals” in particular).
Like I said, this is only meant as an introduction, more to come …