
How Does An Execution Plan Suddenly Change When The Statistics (And Everything Else) Remains The Same ? (In Limbo) February 16, 2010

Posted by Richard Foote in CBO, Index Access Path, Oracle Cost Based Optimizer, Oracle Indexes, Oracle Myths.

I’ve slipped this post in as there have been a number of discussions recently on how execution plans have changed while nothing else appears to have changed in the database. How can an execution plan suddenly change when no one has made any changes to the database ?
 
By no changes, it means that there have been no alterations to any segments, no new indexes have been added, no changes associated with bind peeking (indeed, there may not even be any bind variables), no parameter changes, no new patches or upgrades, no new outlines or profiles, no new system stats and, perhaps most prevalent of all, no changes to any CBO statistics.
 
The DBA hasn’t touched a thing and yet, for no apparent reason, execution plans suddenly change and (say) an inappropriate index is used, causing performance degradation.
 
How can this be possible ?
 
There are two key points I want to emphasise.
 
The first is that there’s a common misperception that if no new statistics are gathered (and assuming nothing else is altered in the database), execution plans must always remain the same; that by not collecting statistics, one can somehow ensure and guarantee the database will simply perform in the same manner and generate the same execution plans.
 
This is fundamentally not true. In fact, quite the opposite can be true. One might need to collect fresh statistics to make sure vital execution plans don’t change. It’s the act of not refreshing statistics that can cause execution plans to suddenly change.
 
The second point is that when one goes through all the things that might have changed in the database, two important aspects are often overlooked.
 
The first thing that usually does change within most databases is the actual data. Those damn users log on and keep adding new data and modifying existing data all the time. It might not be a database change as such, but the fact that the data changes within a database is a critical change that can directly influence CBO behaviour. When pointing the finger at what might have caused an execution plan to change, many simply ignore the fact that the data is constantly changing in the background.
 
The other aspect that always changes is time. People have tried but it’s very difficult to stop time. When things worked well, it was at a different point in time than now when things have suddenly gone wrong.
 
So some things do change that are not in direct control of the DBA.
 
But if we don’t collect fresh statistics, even though the data might have changed, won’t those data changes be effectively invisible to the CBO? Won’t the statistics fail to reflect any data changes, and if the CBO doesn’t think the data has changed, doesn’t that mean it can’t suddenly change how it determines an execution plan?
 
Not true. It’s quite possible that precisely because the statistics haven’t changed, the CBO is forced into making changes in how it costs and determines an execution plan.
 
A very simple example follows: a classic case of how not refreshing statistics causes the CBO to suddenly change an execution plan for no apparent reason.
 
I’ll begin by creating a simple little table and populate it with approximately 5 years worth of data.

 
SQL> create table muse (id number, muse_date date, name varchar2(10));
 
Table created.
 
SQL> declare
  2  v_count  number;
  3  begin
  4  v_count:=0;
  5  for i in 1..1830 loop
  6     for j in 1..1000 loop
  7     v_count:= v_count+1;
  8     insert into muse values (v_count, sysdate-i, 'MUSE');
  9     end loop;
 10  end loop;
 11  commit;
 12  end;
 13  /
 
PL/SQL procedure successfully completed.
 
SQL> create index muse_i on muse(muse_date);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

So the procedure basically populates the table, setting the MUSE_DATE column with approximately 5 years worth of data, with 1000 rows for each day so the data is evenly distributed across those 5 years.

Note that I’ve collected statistics on the table and index and they’re fully up to date.

The following query is a typical query in our application, where we’re only interested in looking at the previous year’s worth of data. It simply selects all data that is up to a year old. This is a query that’s run all the time and only looks at a “moving window” of data: just those rows that were inserted within the past year.


 
SQL> select * from muse where muse_date > sysdate - 365;
 
364000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2738706195
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   363K|  6390K|  1330  (11)| 00:00:07 |
|*  1 |  TABLE ACCESS FULL| MUSE |   363K|  6390K|  1330  (11)| 00:00:07 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("MUSE_DATE">SYSDATE@!-365)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5992  consistent gets
       5912  physical reads
          0  redo size
    3638996  bytes sent via SQL*Net to client
       1188  bytes received via SQL*Net from client
         74  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     364000  rows processed
 

Notice how the CBO has decided to use a Full Table Scan (FTS) as a year is quite a chunk of the table and is more effectively accessed in this manner. Notice also how the CBO has got the cardinality spot on and has correctly predicted the number of rows to be returned. If the CBO gets the selectivity, and hence the cardinality, of the query correct, we have some confidence that it has indeed come up with the most efficient execution plan. Indeed, the users are perfectly happy with the performance of the query, the DBAs are happy and, because we don’t really want to risk the CBO suddenly changing things, we decide not to collect statistics on this table any more.

However, more data is pumped into the table each and every day by the end-users.

The following procedure will add another year’s worth of data into the table to simulate how the table will be populated in a year’s time …

SQL> declare
  2  v_count  number;
  3  begin
  4  v_count:=1830000;
  5  for i in 1..365 loop
  6     for j in 1..1000 loop
  7     v_count:= v_count+1;
  8     insert into muse values (v_count, sysdate+i, 'MUSE');
  9     end loop;
 10  end loop;
 11  commit;
 12  end;
 13  /
 
PL/SQL procedure successfully completed.

Note that we have NOT collected any new statistics.

OK, let’s now fast track ourselves one year into the future and run the same query again. Note in a year’s time, we will be 365 days past the current sysdate. So we’ll mimic running the identical query by simply adding 365 days to the sysdate and again querying for the latest year’s worth of data:


 
SQL> select * from muse where muse_date > (sysdate+365) - 365;
 
365000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1682432684
 
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   944 | 16992 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MUSE   |   944 | 16992 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | MUSE_I |   944 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("MUSE_DATE">SYSDATE@!+365-365)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       4005  consistent gets
       1301  physical reads
     134192  redo size
    4024147  bytes sent via SQL*Net to client
       1188  bytes received via SQL*Net from client
         74  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     365000  rows processed
 

We notice that the execution plan has now changed !!

It’s now suddenly starting to use an index where previously it was using a FTS. Notice also that the CBO has got the cardinality estimate way wrong, predicting only 944 rows will be returned. Instead of estimating it will get a year’s worth of data, the CBO is estimating only approximately one day’s worth of data, or the selectivity based on one distinct value. If the CBO gets this so terribly wrong, there’s a good chance it has got the execution plan terribly wrong as well.

The query is effectively the same query that would be run in a year’s time, the statistics have not been changed and yet the execution plan has indeed changed. The CBO suddenly using this index may be a terrible thing, resulting in a really inefficient execution plan and a massive increase in LIOs.

Why has the plan changed when the statistics have not ?

The key issue here is that the CBO thinks the maximum date in the table is from a year ago, when the statistics were last calculated. However, the query is attempting to select data that is beyond the range of values known to the CBO. How can it possibly know the cardinality of the query, the number of rows expected to be returned, when we’re only interested in rows that are beyond its maximum known range of data?

The answer is that it can’t. Not accurately anyway.

The CBO has to guess and, depending on the version of Oracle and the type of query being parsed, it can guess in a number of different ways. Because the query is effectively after data that is greater than the maximum known value, the CBO is basically “guessing” there will only be a day’s worth of data greater than its maximum known value, not the full year’s worth that’s really in the table. The CBO having to guess is not a good thing, especially when it’s more than likely to get the guess hopelessly wrong.
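If you want to see this gap for yourself, one rough way (a sketch only, using the demo BOWIE.MUSE objects from above) is to compare what the data dictionary recorded when statistics were last gathered with what’s actually sitting in the table:

-- Column statistics as the CBO currently sees them. HIGH_VALUE is stored in
-- RAW format (it needs dbms_stats.convert_raw_value to decode), but
-- LAST_ANALYZED alone shows how old the known "ceiling" of the data is.
select column_name, num_distinct, density, last_analyzed, high_value
from   dba_tab_col_statistics
where  owner = 'BOWIE' and table_name = 'MUSE' and column_name = 'MUSE_DATE';

-- The actual maximum date now sitting in the table
select max(muse_date) from muse;

The recorded high value will still be roughly a year behind the actual maximum date, which is precisely the region of data the query is now asking about.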

Note this change will have occurred suddenly, one day in the future, when the CBO starts to consider that so few rows will be returned that the index suddenly becomes the best and cheapest option.

How do we fix this inefficient execution plan ?

Rather than having the CBO guess how many rows might be returned, let it actually know. Simply collect fresh statistics and let the CBO know that we actually have a full year’s worth of data since the statistics were previously collected:

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

 

If we run the same query again now …


 
SQL> select * from muse where muse_date > (sysdate+365) - 365;
 
365000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2738706195
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   364K|  6413K|  1652  (14)| 00:00:09 |
|*  1 |  TABLE ACCESS FULL| MUSE |   364K|  6413K|  1652  (14)| 00:00:09 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("MUSE_DATE">SYSDATE@!+365-365)
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7205  consistent gets
       6709  physical reads
          0  redo size
    4024147  bytes sent via SQL*Net to client
       1188  bytes received via SQL*Net from client
         74  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     365000  rows processed

 

We now notice the CBO has got the cardinality spot on again and chooses to use the efficient FTS.

So yes, an execution plan can change even if we don’t make any changes to the database, including not collecting fresh statistics. If you think that by not collecting statistics things will simply remain the same, then one day, when you least expect it, things might suddenly go terribly wrong.

Solving such issues can be extremely difficult if you try to do so by looking at what might have changed, especially if you don’t know what you’re looking for …
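One final practical tip. A rough way (a sketch, assuming table monitoring is enabled and using the demo BOWIE.MUSE table) to see how much the data has moved on since statistics were last gathered is to check the monitoring and staleness views:

SQL> exec dbms_stats.flush_database_monitoring_info;

-- Changes recorded against the table since the last stats gather
select inserts, updates, deletes, timestamp
from   dba_tab_modifications
where  table_owner = 'BOWIE' and table_name = 'MUSE';

-- From 10g onwards, a simple YES/NO staleness flag and the last gather date
select last_analyzed, stale_stats
from   dba_tab_statistics
where  owner = 'BOWIE' and table_name = 'MUSE';

A large volume of changes, or simply a very old LAST_ANALYZED date for this style of “moving window” query, is a hint that the “nothing has changed” assumption may no longer hold.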

Comments»

1. Stefan - February 16, 2010

Hello Richard,
when I was reading that topic header in my feed reader, I would have bet that this issue can depend on the different “dynamic sampling levels” in combination with changed data.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i42991 (Point “14.5.6.4 Dynamic Sampling Levels”).

Nice post anyway 🙂

Regards
Stefan


Richard Foote - February 17, 2010

Hi Stefan

Good old spam checker picked on your comment there for a while 😉

Yes indeed, dynamic sampling can be an issue but of course in my little demo, I had stats on the table so it was innocent this time.


Stefan - February 18, 2010

Hello Richard,
dynamic sampling can also have an influence, if statistics exist.

That’s why I posted the official documentation link to reference the different sampling levels (it gets interesting with 3 or higher).

Just think about the correlation problem with 2 or more columns. With 10.2.0.4 (and higher) the optimizer is collecting simplified column group statistics, if the sampling level is set to a corresponding level (I have seen some examples with level 7).

So an amount of changed data and a “higher dynamic sampling” level can have an influence on the execution plan, even if statistics exist and are not updated.

Regards
Stefan


2. baskar.l - February 16, 2010

Wonderful sir…Thanks for all your postings …


Richard Foote - February 17, 2010

Why thank-you Baskar for your nice feedback.


3. Jon Riegel - February 17, 2010

I’ve encountered a somewhat different version of this same phenomenon, in what appears to be a pretty generalizable case. We had a sizeable table with a composite index on a user column + a date column, and a second index solely on the date column. The application was running queries looking at a week’s worth of data for a particular user. When the statistics got out of date by more than a week, the execution plan stopped using the composite index and started using the date index, and performance suffered. After we updated the statistics, the query performed as normal.

From a developer (i.e. someone not in charge of statistics collection) perspective, I personally would argue that any query similar to the above example, where the amount of time being scanned in the query is less than or equal to the length of time that passes between automatic collection of statistics, should specify the composite index with a hint so as to ensure that the above scenario cannot occur.
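For illustration only (all object and index names here are made up), the sort of hinted query I have in mind would look something like:

select /*+ index(o orders_user_date_ix) */ *
from   orders o
where  o.user_id    = :user_id
and    o.order_date > sysdate - 7;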


Richard Foote - February 17, 2010

Hi Jon

Indeed, the time scale might not be a year but a month or a week or possibly only a day. It depends on the query and the manner in which data is being accessed beyond the known ranges within the stats.

Certainly a hint is an option, although stored outlines and profiles are other options to consider as well. Or indeed a stats gathering methodology appropriate to your database and application.


Bertran Saragusti - April 13, 2010

Hi,

Just wanted to share an experience. We had a report on the last day (sysdate-1) which started to perform badly after the 10g migration.

After many tests and a metalink SR, we arrived at a workaround consisting of deleting the offending column stats (when detected) via a daily cron script.

This way the optimizer is forced to calculate min/max on the fly, avoiding the prorated density calculation (due to not having stats for the last day).

Bertran Saragusti.


4. Tony Sleight - February 17, 2010

A nice example of how time can appear to affect an execution plan change. Is there not another effect in play as well? In your example, you initially create a table with 1000 x 1830 rows, i.e. 1,830,000 rows. Then you collect statistics. Next you insert another 365,000 rows. Depending upon your version of Oracle, assume it is 10.2, once the number of inserts/updates/deletes > 10% of the rows, the statistics are marked as stale and would not be used in the costing calculation. So the conclusion that time has caused the change is not necessarily true; the volume of changes could make it happen much quicker. With Oracle 11g I understand the staleness percentage can be modified, and hence your time example would be complete if the staleness percentage was set to, say, > 20%.


Richard Foote - February 17, 2010

Hi Tony

I think you’re confusing a few issues. Firstly, stale stats still get “used”, it’s just that Oracle can detect stale stats and re-collect them as necessary. The key point I’m making is that no new stats are being gathered at all and hence Oracle only has the stale stats to go on.

Note also that this issue can occur with far less than 10% of data being added. See Martin’s comments below for a perfect example. There’s nothing “special” about 10%, it’s just an arbitrary figure Oracle has plucked which doesn’t cause constant re-gathering of stats but ensures stats are gathered after a moderate amount of change.

When stats go “stale”, the automatic stats gathering job will re-calculate the stats. However, my point is that many get rid of this job, collect stats manually and believe by not collecting stats, everything should remain the same. I’m just highlighting that this might not be the case.
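For what it’s worth, from 11g onwards the staleness threshold itself can be adjusted per table rather than being fixed at 10%. A minimal sketch (using the demo BOWIE.MUSE table):

SQL> exec dbms_stats.set_table_prefs(ownname=>'BOWIE', tabname=>'MUSE', pname=>'STALE_PERCENT', pvalue=>'20');

SQL> select dbms_stats.get_prefs('STALE_PERCENT', 'BOWIE', 'MUSE') from dual;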


5. Brian TKatch - February 17, 2010

typo: “that usually does usually change ”

“People have tried but it’s very difficult to stop time.”

I once did stop it, but none of my queries would finish running.


Richard Foote - February 17, 2010

Hi Brian

As I usually do, I thank you for picking out my typos 😉


6. Asif Momen - February 17, 2010

Superb !!!


Richard Foote - February 17, 2010

Thanks Asif 🙂


7. mwidlake - February 17, 2010

Tony is right to point out that once a segment is changed by 10% or more then the automatic stats job will pick up the object as stale (11g tweaks allowing), gather stats and the plan will change “with nothing happening”. This performance instability, in amongst other issues, does lead to many sites turning off the automatic stats job. Most of the plan changes are good but the odd few that are bad can be serious issues.

The effect that Richard is highlighting, that of the processing window getting beyond the known range of the existing data and thus the plan changing, is magnified by the use of partitions.

Let us say you partition an orders table by week and you have lots of code that looks back at the data for the last hour, to pick up the latest orders. On Monday a new partition begins to be used and data starts going into the partition. You gather stats that night. The date range of the data in that partition is something like 00:00 Monday morning to 01:00 Tuesday morning. 25 hours.
By sometime Tuesday, that one-hour “report on” processing window is going to be so far out of the date minimum/maximum range that the CBO will estimate the data that will be found to be less than half of the actual value. By the end of Tuesday, before the next stats collection runs, the CBO could be estimating you are getting no rows (rounded to 1 row) for that hour processing window and the plans could be very bad indeed.

If you use partitions and see wildly varying performance, you may well need to think about how far beyond the known range of values in a partition you get before new stats are gathered. You might want not to gather stats on such partitions at all, until the partition is “old”, and let dynamic sampling look after things…
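By way of illustration only (a sketch with made-up table and partition names), “not gathering stats on such partitions” can be as simple as deleting whatever has already been collected for the current partition and letting dynamic sampling take over:

-- Remove the stats for just the current, still-filling partition
SQL> exec dbms_stats.delete_table_stats(ownname=>'BOWIE', tabname=>'ORDERS', partname=>'ORDERS_WK07');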


Richard Foote - February 17, 2010

Hi Martin

Exactly !!

I just used a nice simple little demo. However, this can most certainly occur after only a relatively small period of time, as your excellent example demonstrates.

There’s nothing more dangerous than Oracle thinking it’s only going to return 1 row, when in fact it returns 1000s. The manner in which tables are accessed, the order they’re joined and the type of join can combine to make the execution plan from hell.

Thanks for your comments 🙂


Peter Scott - February 20, 2010

Martin – I am not sure that not having partition stats is good in the cases where Oracle can ‘create’ an approximation of global stats from the partitions (providing all of the partitions have stats – one missing => no global stats estimate).
I was bitten by that on 8i when the dba team routinely built next week’s partitions on a Friday and the empty, unanalysed partitions caused range queries to bomb…


Martin Widlake - February 20, 2010

Hi Peter,

As you say, all partitions have to have had stats gathered for the global stats to be derived from them. Just one partition missing stats stops this occurring.
However, gathering stats on an empty partition that then starts having data put in it and then queried leads to trouble, as Oracle sees “zero” in the stats, rounds this to one and does nested loops. It also, as you say, gets it terribly wrong with range queries.
I have found that the CBO usually works ok for queries against the last partition if there are no stats against that partition and so uses dynamic sampling, and uses global stats when the query covers more than one partition.
However, you need all partitions to have stats to get the calculated global stats. Catch 22. Answer? Gather the “empty” stats to allow the global calculation and then delete the stats on the current and future partitions. Clunky but works. Usually.


Richard Foote - February 25, 2010

Hi Peter and Martin

Nice discussion.

I agree that having no stats is better than having stats suggesting there’s nothing when there are in fact rows.

It’s a similar issue we have when we create a bunch of new tables on an application release but they’re not populated yet until the users logon later and start using the new functionality.

It’s better to have no stats at all until the table “warms up a bit” with some usage, rather than collect stats prematurely on the empty (or close to empty) table.


8. Richard Foote on the Impact of stats staying the same. « Martin Widlake’s Yet Another Oracle Blog - February 17, 2010

[…] in performance. Tags: Blogging, performance, statistics trackback I just wanted to highlight this very good posting by Richard Foote. He is talking about how SQL execution plans can change when “nothing else does”. Not […]


9. Noons - February 17, 2010

Er… We *have* made changes to the database! Adding rows is a change, is it not?
But I understand what you mean.
The question left in my mind is at what point are the new stats of advantage? While I am totally against not collecting stats for a year, I am also against collecting stats every 5 minutes.
Where would the boundary be?


Richard Foote - February 17, 2010

Hi Noons

I guess what I’m trying to emphasise is that when the finger of blame gets pointed as to why an execution plan has changed, when questions get asked as to who made what change to the database to cause this sudden performance degradation, when the DBA swears “I didn’t change the stats, honest”, one often forgets there are 1000s of application users out there who have in the background been adding data that is crucial to what’s happened. Ultimately, the finger of blame should perhaps be pointed at the DBA for NOT changing the stats when they should have.

My example of a year was just to make things simple. Note though that even in my example, this change in execution plan would actually have occurred somewhat sooner than the year because, each day we didn’t collect stats, Oracle kept thinking fewer and fewer rows were being returned until eventually the cost of using the index to return the rows suddenly became the cheaper option (although the actual rows being returned were always the same).

The boundary depends on the application and the manner in which data beyond the current known limits is being accessed (and the database version is important as well). Perhaps for some tables, one might need to collect stats daily, for other tables it might indeed be yearly. Note using an unbounded query (as in my example) might be more “dangerous” than a bounded query. There are a lot of factors to consider.

Considering the number of hits I’ve had today following this article, perhaps I should expand on a few more points as it seems to have generated a bit of interest out there !!


10. Connor McDonald - February 17, 2010

11.2 throws an even bigger spanner into the works…”Problematic” predicates are tracked during execution, so a query with

a) no stats changes in the dictionary
b) no data changes to underlying tables
c) no SQL changes
d) no adaptive cursor changes

might *still* pick up a new plan on subsequent executions….

That’s gonna be fun 🙂


Richard Foote - February 17, 2010

Hi Connor

Indeed !! Learning by one’s mistakes: after executing a statement a number of times, the CBO can realise its initial assumptions weren’t correct and modify things as it goes along.

Things will generally change for the better, except for all those exceptions to the rule of course 😉


11. Something about Execution plan in Oracle « Rangarajan Iyengar's Blog - February 17, 2010

[…] else remained same except the addition or modification to the data, then you need to read this excellent article by Richard Foote […]


12. illiyaz mohammad - February 17, 2010

Good one Richard…woke me up from my afternoon’s sleep ;). I am sure this is just one of the reasons and probably one of the major ones as to why a plan has changed…..How about having a series of articles explaining the other possible reasons for plan change (something like your Index and cost calculation series)…I am sure that this is a huge topic but even if a handful of them are covered, it will help the common people like me understand the nuances of Oracle…Keep up the good work…


Richard Foote - February 17, 2010

Hi Illiyaz

Nothing better than an afternoon nap !!

Yes, I might need to expand on a few points but I’m also keen to look at a few block dumps as well.

So much to say, so little time to say it in !!


13. Uwe Hesse - February 18, 2010

Richard,
congratulations on this straightforward and instructive demonstration! It does indeed explain very well why execution plans can change all of a sudden although “we (DBAs) changed nothing”. I will add this to my collection of articles that I point to in my courses 🙂


Richard Foote - February 18, 2010

Hi Uwe

Thank-you !!

I must say I sometimes look back on my days as an Oracle University instructor with some fondness.

But then I remember just how hard it is to teach week after week and the stress of teaching a new features course for the first time and I then think, thank goodness I don’t do it anymore 🙂


14. Richard Foote - February 18, 2010

Hi Stefan

Got it !!

One of the reasons I started this blog was not only to share information but hopefully to learn plenty myself in the process.

You have thankfully made me consider a possibility I wouldn’t have otherwise picked up, so a big thank-you 🙂


15. Hans van Driel - February 19, 2010

Richard

First, congratulations on this excellent demonstration.

Bind variable peeking can also change the plan, even when the data has not been changed or fresh statistics have just been collected.


Richard Foote - February 19, 2010

Hi Hans

Absolutely, and it’s perhaps the most common issue I’ve come across with regard to plans suddenly changing.

However, in my little example, I explicitly ruled this out by not using bind variables.


16. tec - February 19, 2010

Very nice example Richard. I always learn so much from your blog postings!

Curious though, when just looking at the Statistics output, between the INDEX RANGE SCAN (IRS) and the last FTS, one could be led to believe the IRS performed better?

IRS Stats :
——————–
4005 consistent gets
1301 physical reads
134192 redo size

FTS Stats:
——————–
7205 consistent gets
6709 physical reads
0 redo size

Can you shed some light on this?


17. Richard Foote - February 19, 2010

Hi Tec

The FTS might scan more actual blocks but might read them in more efficiently (via larger multiblock reads) than the many single block reads required by the index.

The CBO doesn’t pick the plan with the fewest CRs, but the plan that will ultimately produce the faster overall response time.


18. ADT Security West Hartford - February 20, 2010

Ah, This is exactly what I was looking for! Clears up
some misnomers I’ve been hearing.


19. Log Buffer #179: a Carnival of the Vanities for DBAs | The Pythian Blog - February 20, 2010

[…] Richard Foote, for his part, looks for answers to this puzzler: how does an execution plan suddenly change when the statistics (and everything else) remains the sam… […]


20. Kerry Osborne - February 20, 2010

Very nice post Richard. Obviously you struck a chord with a number of people. I have to agree with the comments on this particular issue affecting partitions, as that’s really the only place I’ve come across the issue “in the wild”. I would enjoy seeing another post on the same topic.

Looking forward to hearing you speak at Hotsos in a few weeks by the way.

Kerry


Richard Foote - February 25, 2010

Hi Kerry

Looking forward very much to Hotsos although I’m not sure I’ve got the ideal spot following Tom Kyte on the first day !!

Make sure you say hello 🙂


Kerry Osborne - March 2, 2010

Ha. I had that same situation a few years ago (my first presentation at Hotsos Symposium). I had a pretty big audience, mainly because I was the speaker after Tom and I think everyone was just too lazy to get up and move to the other room. Anyway, Tom stuck around and listened to a bit of my talk. I got a mention in his blog the next day. He basically said that he was really enjoying my talk, right up to the point where I used his name and the words “ad nauseam” in the same sentence. I was talking about developers not using bind variables, which is one of Tom’s pet peeves. So I had to apologize. I told him that what I meant to say was that he had covered the topic in such exquisite detail that there was really no need for me to comment further. it just came out as “Tom’s gone on about that subject ad nauseam”.

Kerry


21. No Changes,No Alterations..Explain Plan Suddenly Changed !!!! « Anand's Blog - February 23, 2010
22. Anand - February 23, 2010

Awesome..no other word Sir!!!!

Regards,
Anand


Richard Foote - February 25, 2010

Thanks Anand, much appreciated 🙂


23. Henish - March 2, 2010

Excellent Example Sir!!

One quick question, if the data in the table is static will collecting regular stats change execution plan?

Thanks

Regard’s

Henish


mwidlake - March 2, 2010

Hi Henish,

I know I am not Richard, but I can answer that question for you (I imagine Richard is preparing for presenting at Hotsos, so is a very busy man).
The simple answer is that, if the data in the table has not changed since you last gathered stats on the table, gathering stats again will almost certainly not change any plans and will just be a waste of IO and CPU.
This is the basis of only gathering stats on tables or indexes when there has been change, ie the table has become “stale”, which is the case with the automatic stats collection in 10g onwards. Oracle only collects stats on tables where at least 10% of the rows have been changed.
Why do I say “almost certainly” not change the plans?
Well, if you last gathered stats with a sample size of 1% and now you gather stats with a sample size of 25%, Oracle will be more accurate in its figures. You have not changed a single row in the table, but the high and low values for columns will be more accurate, the number of distinct values higher and there is a very small but real chance the plan might change.
I see this occasionally with low sample sizes. A 0.1% sample size on a very large table, say 100 million rows, will estimate there are 1000 different values for a column. A 5% sample size on the same table, no changes to the rows in the table, and the same column is recognised as having 2000 different values.
Think of it as getting a telephone directory and pulling 100 random pages out of it and counting the number of individual surnames. Now pull out 500 pages and count the number of surnames.

So the answer is “No, no change”. But there are edge cases where the answer is “maybe”.

I hope that helps Henish.

Martin


Timur Akhmadeev - March 2, 2010

Well, gathering stats on an unchanged table doesn’t have to be run with different parameters. Gathering stats will invalidate (possibly rolling in 10g onwards) all cursors dependent on this table, which, in turn, will result in a hard parse. And a hard parse is enough to change the plan (if you don’t use some type of plan stability).
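For completeness, and as a sketch only: which invalidation behaviour you get is governed by the no_invalidate parameter. DBMS_STATS.AUTO_INVALIDATE (the 10g default) spreads the invalidations out over time, while FALSE forces immediate invalidation and hence immediate re-parsing:

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', no_invalidate=>dbms_stats.auto_invalidate);

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', no_invalidate=>false);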


Richard Foote - March 2, 2010

Hi Henish

Both Martin and Timur have given you good answers.

If by collecting stats you change the stats somehow (based on sample size and such), then there could be plan differences.

After collecting stats, the statement is reparsed and (say) bind peeking kicks in; again, a plan could change.


24. Michael Mao - March 2, 2010

Hi Richard,

Good to read your articles.
We plan to change our statistics gathering strategy and want to know your opinion on this.
The old strategy is running the statistics gathering job once a week for all the tables. The percent is 70%-90%. The total time window is 14 hours.

In order to improve the execution plan stability, we plan to change to the new strategy. The new strategy is to keep the TOP 100 tables’ statistics unchanged and use DBMS_STATS set_table_stats/set_column_stats scripts to set some columns’ high/low values (mostly date type columns).

The application is Oracle’s EBS system and the db version is 9.2.0.8.

We can find out some columns which need to be set with the high/low values (mostly date type columns), but maybe not all the columns with high/low values.

Do you think the unchanged statistics, with the high/low values set for some columns, can improve the plan stability?

Thanks,
Michael


Richard Foote - March 2, 2010

Hi Michael

It does all sound very “manual” and difficult to automate and it does lend itself to human error and stale statistics (such as density or number of distinct values which are important stats as well, as indeed are many of the index stats). Then there are possible histograms that need to be collected as well.

I would recommend:

1) Moving off 9i and getting to 11g when possible, as its stats sampling and stats accuracy are much improved

2) Reducing the percent to a much lower value (say 10%) and reducing your total time window that way if it’s an issue.

3) Monitoring over time what your problematic SQL might be in terms of plan stability and focusing your stats effort on it specifically if possible, perhaps with the use of stored outlines if absolutely necessary

What you suggest might indeed work in your environment but you’re making a rod for your own back which is likely not necessary
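Just so readers can picture what such a manual approach involves, the kind of adjustment Michael describes would look roughly like the following sketch (the dates and BOWIE.MUSE names are illustrative only, and as noted above I wouldn’t recommend building a strategy around it):

declare
  l_distcnt number;
  l_density number;
  l_nullcnt number;
  l_avgclen number;
  l_srec    dbms_stats.statrec;
  l_dates   dbms_stats.datearray;
begin
  -- read the existing column stats so only the min/max values get adjusted
  dbms_stats.get_column_stats(ownname=>'BOWIE', tabname=>'MUSE', colname=>'MUSE_DATE',
    distcnt=>l_distcnt, density=>l_density, nullcnt=>l_nullcnt,
    srec=>l_srec, avgclen=>l_avgclen);
  -- push the known low/high values out to cover the current data
  l_dates := dbms_stats.datearray(to_date('01-JAN-2005','DD-MON-YYYY'), sysdate);
  dbms_stats.prepare_column_values(l_srec, l_dates);
  dbms_stats.set_column_stats(ownname=>'BOWIE', tabname=>'MUSE', colname=>'MUSE_DATE',
    distcnt=>l_distcnt, density=>l_density, nullcnt=>l_nullcnt,
    srec=>l_srec, avgclen=>l_avgclen);
end;
/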


25. dan - March 2, 2010

Plans can also change when you change your server hardware, for example cpu_count influences CBO when parallel queries are used, correct?


Richard Foote - March 2, 2010

Hi Dan

Yes of course new hardware might make a big difference not only as you suggest but also with significantly differing system stats that can change and impact execution plans.

So yes, correct 🙂


26. Anand - March 2, 2010

Hi Richard,

How can we determine the “estimate percent” for gathering the stats of the database?What value can be good??

Regards,
Anand


Richard Foote - March 3, 2010

Hi Anand

There’s no golden number else Oracle would simply use it by default.

Well, actually there is I guess.

100% means the stats are 100% accurate and are not impacted in any way due to any sampling issue. So if you can afford the time and resources, you can’t beat 100%.

However, since 11g, Oracle’s default of Auto sampling size is excellent in all my tests and is what I would now recommend.

We currently use a sample size of 5% with 10g which has proven to be quite acceptable, but there’s no magic sample size which is perfect for all databases (except perhaps 100%)


Gary - March 4, 2010

100% accurate at the time they are taken. As soon as there is a data change they become less accurate.
One potential drawback of 100% is they pick up outlier mistakes. Had an app where someone had entered a date of 2010 instead of 2001. One rogue row in a million row table threw out the HIGH_VALUE of the stats and upset the cardinality calculation….


mwidlake - March 4, 2010

As Gary says, if you do a 100% sample size you will potentially pick up all the records in the table and might get an odd plan due to the outlier records {often a result of someone deciding to use an “unbelievable” value to replace null, occasionally due to typo issues as Gary says}.

Histograms might avoid that problem if you collect them. They might not.

But then histograms might themselves become the source of other problems (bind variables and histograms, on V10 anyway, can be bad bedmates). *sigh*

I’m not having a pop at Gary at all, his point is very valid. I just think it is worth mentioning that, as the CBO and stats add more flexibility and capability to the SQL optimizer, it massively adds to the edge cases and “ahh but” situations where, for your database working with your workload, things go wrong/right. Which is why a perfect sample size is impossible to give.

If you have a complex or massively large database, you are probably going to have problems with stats. That’s OK. You just need to make time to look at your system, the problems you hit, and work out how to fix those problems. There is no magic sample size.

I’ve traditionally developed code that selects a sample size based on the size of the segment, as the bigger the segment gets, the lower the percentage that is required to get representative stats. It works really well for me. Apart from the odd exception, where I need to gather better stats on that system for that table.

I personally think the best rule of thumb is to always expect exceptions and plan to work around them.


27. Henish - March 3, 2010

Thanks Martin, Timur and Richard for your valuable comments


28. Anand - March 4, 2010

Hi Richard,
One more question popped up in my mind…how do we decide the value for METHOD_OPT while gathering the stats?

Anand


Richard Foote - March 4, 2010

Hi Anand

I recommend FOR ALL COLUMNS SIZE 1 as a default and collect histograms on just those columns that really need them, which you can determine over time.

I don’t like SIZE AUTO for various reasons as I’ve discussed previously:

DBMS_STATS METHOD_OPT default behaviour changed in 10g. Be careful …

Even in the latest version of 11g.
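For anyone wondering what that looks like in practice, a rough sketch (the ORDERS table and skewed STATUS column here are made up for illustration):

-- No histograms by default, then a histogram on just the one column known to be skewed
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'ORDERS', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS SIZE 254');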


29. Richard Foote - March 4, 2010

Hi Gary

I agree, as soon as you collect the stats, they become out of date. I’ve previously discussed the dangers of outlier values:

Outlier Values – An Enemy Of The Index

And yes, 100% stats will pick up the naughty row. But then again one day when you least expect it, so might a sample size of 1%.

There’s no 100% correct answer to what the percentage should be. Any value has a danger one way or the other. As does not collecting them at all or for a considerable period of time.

You can win/lose either way.

That’s why being an Oracle DBA is such fun 😉


30. Richard Foote - March 4, 2010

Hi Martin

You make a good point about the larger the table, the smaller the sample. It’s actually the small tables that can be really problematic if the stats are out or just outdated as the discrepancy can be proportionally much larger or significant.


31. Blogroll Report 12/02/2009 – 19/02/2010 « Coskan’s Approach to Oracle - March 19, 2010

[…] 21-How does and execution plan change when statistics remain same? (comments ***) Richard Foote-How Does An Execution Plan Suddenly Change When The Statistics (And Everything Else) Remains The Sam… […]


32. Anand - March 24, 2010

Hi Richard,
What would you comment on if a table was using indexes and later, after gathering stats, it’s going for a full table scan? The index being used was a composite index with all the columns which were present in the where clause of the query. Can it be an indexing issue?
As per my understanding, if the clustering_factor is near the number of rows, a FTS is preferred, and the same is the case with the above scenario. How do we resolve it?

Regards,
Anand


33. Richard Foote - March 24, 2010

Hi Anand

Have a read of my series on the CBO and the CPU costing model to understand how Oracle calculates the relative costs. The FTS is costed as cheaper than the index; the question is why, if it’s not actually so when you run the query.

A good idea is to run a 10053 trace to see how Oracle has calculated the various options, or simply use an index hint and see the calculations for the index access path and what the index related costs are.

Certainly the CF could be an issue but so could a whole bunch of other factors, you need to look at the actual cost comparisons to make an informed decision.


34. Richard Foote - April 13, 2010

@Bertran

Indeed a strategy that can work with acceptable overheads in some scenarios. Thanks for sharing.


35. Performance Tipping Points « Martin Widlake's Yet Another Oracle Blog - April 14, 2010

[…] For an excellent posting on why code suddenly changes it’s execution plan when nothing has happened but time passing, seethis excellent post and comments on the topic on Richard Foote’s blog […]


36. Vijay - May 9, 2010

Thanks a lot Richard,This was really very good info.

The dilemma of gathering stats on a regular basis is that it might pick a bad execution plan.

This precisely happened in our env; one fine Monday, the plans changed for some major SQL queries, bringing down the performance of the whole database.

Nothing was changed, however we have the stats job running every day through DBMS_SCHEDULER, which is supposed to gather the stale stats.

But this is changing the execution plan for the worse.

In fact we had cases where we had to restore the old stats to get the same execution plan as earlier, and performance was good.

So my question is: are the new updated stats always good?


Richard Foote - May 12, 2010

Hi Vijay

There’s no simple, fool proof way of collecting statistics such that it will always make things better, not worse.

All it takes is one single new value in a table to make the CBO suddenly make a new, totally incorrect decision. Just one single value.

It’s a question of getting it right most of the time and knowing how to fix things if and when things sometimes go wrong 🙂

However, people who think that by not collecting stats things will always remain safe will one day be bitterly disappointed and likely have absolutely no idea what has happened and how to fix things up.


Uwe Hesse - August 24, 2010

Richard, one addition to: “There’s no simple, fool proof way of collecting statistics such that it will always make things better, not worse.”
Well actually there is a way since 11g: You can collect stats and defer their publishing until you have tested them in your own session. It’s simple – but of course it is an effort to do so.
Brief example:
http://uhesse.wordpress.com/2009/03/03/pending-statistics/

A second possibility is to collect SQL Plan Baselines for the critical statements; if new statistics lead to new plans, these new plans are not used until there is proof that they are better. Also an 11g New Feature 🙂
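For anyone who hasn’t seen it, the pending statistics approach is roughly as follows (a sketch only, using the BOWIE.MUSE demo table):

SQL> exec dbms_stats.set_table_prefs('BOWIE', 'MUSE', 'PUBLISH', 'FALSE');
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE');

-- test the unpublished stats in this session only
SQL> alter session set optimizer_use_pending_statistics = true;
SQL> select * from muse where muse_date > sysdate - 365;

-- happy with the plans? make the stats visible to everyone
SQL> exec dbms_stats.publish_pending_stats('BOWIE', 'MUSE');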


37. Hashmi - August 25, 2010

Sir,
I didn’t understand the whole point of this question?
The topic says

How Does An Execution Plan Suddenly Change When The Statistics (And Everything Else) Remains The Same ?”
The execution plan changed because the statistics are the same. The optimizer is unaware of the current state of the database.
You said ” How can an execution plan suddenly change when no one has made any changes to the database ?”
You said “By no changes, it means that there have been no alterations to any segments,”

Since new data has been added, there have been alterations to segments.
So there have been changes in the database.
The reason for the explain plan to change is stale statistics.


Richard Foote - September 9, 2010

Hi Hashmi

The whole point of the piece is that when people are trying to determine why plans have suddenly gone bad and working out what has changed, they forget that the data in the database changes and that having stale stats can change a plan just as much as making changes to the stats can.

Many people think that by not changing stats, executions plans will remain stable. You and I know that’s not necessarily the case 😉


38. Hashmi - August 25, 2010

One more thing.
You said “I agree that having no stats is better than having stats suggesting there’s nothing when there are in fact rows.”

So is it better to use the RBO, forgetting completely about statistics?
Why is the CBO a better option than the RBO?


Richard Foote - September 9, 2010

Hi Hashmi

No. I’m suggesting that dynamic sampling might be better than having totally incorrect stats such as a 0 when there are indeed rows.


39. Mr Blues - August 31, 2010

BTW, if you collect system stats (in 11gR2) you should be aware of the fact that there is a bug that causes multiblock read and single block read times to be totally wrong. I have seen it in my lab earlier this year. It took a while until Oracle confirmed that we had hit a bug. So, meanwhile, you should either not collect system statistics or delete the old ones. By doing so, you then rely only on the I/O cost in execution plans and not CPU.
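If it helps anyone, removing previously gathered workload system statistics is a one-liner (a sketch; test the effect on your plans first):

SQL> exec dbms_stats.delete_system_stats;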


Richard Foote - September 9, 2010

Hi Mr Blues

Do you have any details such a bug number ?

BTW, as a matter of sheer coincidence, I’m actually currently listening to Mr Blue Sky by ELO, in honour of poor Mike Edwards who died recently in a freak accident:

http://www.autospies.com/news/Founding-Member-Of-70-s-Rock-Band-ELO-Dies-In-Freak-Car-Accident-57439/


40. Richard Foote - September 9, 2010

@Uwe

Hi Uwe

Not sure that comprehensively testing out an application before publishing new stats is entirely “fool proof” 🙂

To effectively test a large application would be very time consuming, unless you had something like Database Replay in place as well, and even that has its weaknesses.

Certainly 11g has new features such as plan stability and SQL baselines and the like which fundamentally make databases more stable and predictable in their processing after the collection of newer stats.

BTW, there’s a possibility I might make it back to Germany next month to present my seminar.


41. Anonymous - October 12, 2010
Richard Foote - October 12, 2010

Hi Anonymous

The whole idea of this blog piece is that things change without having to change the stats. That plans can change when stats change is kinda stating the bleeding obvious …


42. Kishore - February 23, 2011

Hi Richard ,

That is an awesome explanation of what can cause the plans to change.
We have two databases (A and B) which are similar in every aspect. But when the same query runs in A, it exhibits a good plan, while the same query, when run in B, exhibits a bad plan.
Why would the execution plan differ? What factors make it change? Can you please shed some light on this?

Thanks a ton !!!


Tony Sleight - February 23, 2011

Have you checked when statistics were last gathered on the tables in database A and B? The whole essence of this thread revolves around valid and up to date statistics accurately describing the distribution of data within tables.
If all things are equal in your databases that’s the first place I’d look.


Richard Foote - February 23, 2011

Hi Kishore

As Tony mentions, are you sure the stats are the same on both databases? Not only the schema stats, but the system stats as well. Are you sure all the optimizer related parameters are the same, the same indexes, the same data distributions, etc. etc.?

Are you sure you’re using the same baselines or the same stored outlines, etc. etc.

There can be many factors that can make plans differ in “similar” databases.


43. Kishore - February 24, 2011

Hi Richard/Tony,

The DB version is (10.2.0.4) .
The statistics are gathered once every weekend at the same time on both databases. All the schemas and objects are the same in databases A and B.
We collect the stats with estimate_percent=>40 with no histograms (due to the Datawarehouse application-OLTP) at table level.
However the system resources (memory/cpu) of both the databases on different servers are not the same. Does this account for the optimizer exhibiting different behaviour even though the system stats are collected? I understand that if the data volume across both databases is not the same, then the optimizer’s plan changes.


Tony Sleight - February 24, 2011

Kishore,
I assume from your comment ‘However the system resources (memory/cpu) of both the databases on different servers are not the same.’ you have Database A and B on servers with different hardware configurations. Are the system statistics different and reflecting the server configuration? System statistics can change execution plans. We had an example of this on one server when we upgraded the hardware platform. After collecting system statistics on the new hardware some queries changed execution plans from an efficient index to a full table scan. The reason for this was the new SAN SREADTIM and MREADTIM were roughly equal due to SAN cache (i.e. SREADTIM/MREADTIM = 2/3). We reverted back to previous system statistics where SREADTIM/MREADTIM = 1/4 and the plan reverted back to using the index.
But what we cured was a symptom, and not the disease. By looking closely at the queries which were now running with an inefficient plan it was seen that the CBO had made this choice through lack of information regarding the distribution of data within the table.

The queries were performing a select on a column representing a status of the record. There are eight possible status types, and the query was looking for seven of them. There was no histogram describing the distribution of data on the status column, so the CBO assumed we were looking for 7/8 (87.5%) of the 3 million records and the full table scan was seen as the most efficient means of retrieving the data. In reality, the picture was different: the one status value the query was not interested in represented 99.9% of the table contents, and the query only returned 0.1% of the table contents. By creating a histogram statistic on the status column the CBO now understood that the query would return a small proportion of the table contents and so used the efficient index to return the data. Even when the system statistics were updated to reflect the new hardware performance, the query utilised the index. In the old hardware it had been luck that the system statistics MREADTIM and SREADTIM were sufficiently different for the CBO to choose an index over a full table scan; in the new hardware, the system stats indicated to the CBO that the full table scan was best.

It may be a similar situation with your system. My advice would be to identify the poorly performing query and check that appropriate statistics had been gathered with attention to a histogram as a potential candidate for improving the query.
Also, check the system statistics are appropriate for the server hardware configuration.
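As a quick sketch, the gathered system statistics (including the SREADTIM and MREADTIM figures mentioned above) can be inspected with something like:

SQL> select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';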


Richard Foote - February 24, 2011

Hi Tony

Thanks for your input 🙂


Kishore - February 25, 2011

Hi Tony ,

Thank you for the detailed explanation. Appreciate the time and patience to explain. I now have a direction to work with.
Thank you.


Richard Foote - February 24, 2011

Hi Kishore

I’ve written a number of posts on the CPU based optimizer, check them out as they’ll explain why yes it can be significant.

Check out what differences in stats you might have at both the object and system levels and see how they impact the plans. The plans have costs and cardinality estimates associated with them; do they make sense, are they accurate, how do they differ between each plan?

As I mentioned before, there can be a number of differences between the two “similar” databases that can make all the difference and yes, system stats is one of them.


Kishore - February 25, 2011

Hi Richard,

Thank you for the posts/response. I will check the posts you have mentioned.

Thanks again Richard for this informative site with detailed posts.

Thanks
Kishore


44. Richard Foote - February 25, 2011

Hi Kishore,

No worries, they should help explain why the CBO can do different things in similar databases.


45. anuragvidyarthiAnurag - June 8, 2011

Hi Richard,

I have a question related to dbms_stats.get_table_stats and dbms_stats.set_table_stats.
1. Get the stats and store them in variables.
2. Truncate and reload the same table with the same data, and then
3. Set the stats from the saved variables.

In that way I can save some time that would otherwise have been wasted in stats gathering.

PS: this does not mean that I will always avoid the stats gathering…it is just that I will do it during the peak hours of my data warehouse environment.

Please suggest !!

Regards,
Anurag


Richard Foote - June 16, 2011

Hi Anurag

You can save even more time by simply not changing or saving anything, as the existing statistics are still valid after you truncate the table. If the data is virtually the same after the reload, don’t do anything with the stats.
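If you want to make “leave the stats alone” explicit, or keep a safety copy around the reload anyway, a rough sketch (reusing the demo BOWIE.MUSE names) might be:

SQL> exec dbms_stats.lock_table_stats(ownname=>'BOWIE', tabname=>'MUSE');

-- or export/import the stats rather than juggling individual get/set calls
SQL> exec dbms_stats.create_stat_table(ownname=>'BOWIE', stattab=>'MY_STATS');
SQL> exec dbms_stats.export_table_stats(ownname=>'BOWIE', tabname=>'MUSE', stattab=>'MY_STATS');
SQL> exec dbms_stats.import_table_stats(ownname=>'BOWIE', tabname=>'MUSE', stattab=>'MY_STATS');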


46. vrajesh dalal - August 4, 2011

I just don’t have words…

simply great..!!!

Regards,
Vrajesh


Richard Foote - August 23, 2011

Thanks Vrajesh, the words you used are greatly appreciated 🙂


47. Gaurav Gulati - November 24, 2011

An Expert Post from an Expert! Please do keep up the good work. We all have something to learn from experts like you everyday.

Thanks again!


Richard Foote - December 19, 2011

Hi Gaurav

Thanks for your kind feedback, much appreciated 🙂


48. Chandan - March 18, 2012

HI Richard,

We have an issue with one of our databases… it was upgraded to 11.2.0.2 from 10.2.0.5 and migrated to an AIX box from HP… 3 weeks back… last week indexes were rebuilt… after that one of the application queries is having issues… it’s a simple query which has 2 bind variables and values are passed to those variables… in 10.2.0.5 it was running smoothly… but after the upgrade… the query is causing more I/O and switching between multiple execution plans with different hash values… I need your expertise here… please suggest what may be the causes for this


Richard Foote - April 5, 2012

Hi Chandan

Not much to go on, but I suspect it’s likely more of an issue of having moved to 11.2.0.2 than to AIX and due to the changes in SQL Plan management and adaptive cursor sharing.

In 11g databases I managed, I implemented SQL Plan baselines and evolved plans as necessary to get around some of these issues.


49. Nick Strange - March 27, 2012

The simple answer (but not very helpful) is that the stats should reflect the data.
Hopefully more helpful is that a material change to the data should result in new stats. A material change can be as little as a single row change that may, for instance, insert a new high value into a column. It may also be that enough time has passed to make all our previous high values too far out of date (as Richard points out above). The trick is identifying what is a material change. You need to understand your data, how it changes and how it’s used. 90(ish)% of the time, collecting stats when they are marked as stale works well. The trick is identifying the rest.

Richard Foote - April 5, 2012

Hi Nick

You’re not related to Steve are you ?

Yes, I’m an old Visage fan 🙂

Absolutely, having stats that accurately (enough) reflects the data is very important.

Nick Strange - April 5, 2012

Hi, I’m not related to Steve – I don’t think.

I should have added that sequences often prove to be problematic from a stats point of view. Given that they usually increase and that new high values in columns can be associated with stale stats – they are nearly as bad as dates. Both sequences and dates can easily account for the ‘nothing has changed but the database is slow today’ comments that I sometimes hear.

50. Richard Foote - May 7, 2012

Hi Nick

Steve Strange brings back lots of fond memories from my days growing up in Manchester, UK:

http://en.wikipedia.org/wiki/Visage

🙂

Nick Strange - May 7, 2012

Another common issue that I often see that causes the plan to change ‘for no reason’ is that the SQL is badly written and/or the data model is not very good. What can then happen is that the optimizer gets lucky and produces a good plan – under these conditions the plan can be very sensitive to small changes and can collapse horribly. I’m thinking of things like UPPER(col), TRUNC(col), columns with strong dependencies between them, redundant predicates – the list is pretty long. I recommend that developers check xplan even for SQL that is behaving well, to ensure that the optimizer understands the requirement. The optimizer needs to understand two things – your data model and what you want to get out of it. It is easy to get either or both wrong – throw in out-of-date stats as well and you have a recipe for disaster.
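
A quick, illustrative way for a developer to check the plan that was actually used for the last statement run in the session (assuming 10g or later, and that serveroutput is off so the last cursor really is the statement of interest; the MUSE query is just the article’s example):

select * from muse where muse_date > sysdate - 365;

-- With no arguments, display_cursor shows the plan of the last statement
-- executed in this session, straight from the cursor cache.
select * from table(dbms_xplan.display_cursor);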

Richard Foote - May 11, 2012

Hi Nick

Certainly, the worse the design, the more sensitive the application and the more common and likely “loose wires” can suddenly become disconnected 🙂

51. Mohammed - October 17, 2012

Hi Richard,

Thanks for sharing good information… it’s very useful.

I have done the same test case… please have a look below. The database version is 10.2.0.4 and the cursor_sharing parameter is EXACT.

I have done the test, but the first query itself is using an index scan rather than the full table scan I expected…

Please shed some light on this.

SQL> select * from v$version;

BANNER
—————————————————————-
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bi
PL/SQL Release 10.2.0.4.0 – Production
CORE 10.2.0.4.0 Production
TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 – Productio
NLSRTL Version 10.2.0.4.0 – Production

SQL>

SQL> show user
USER is “SYS”
SQL>
SQL>
SQL> create user scott identified by tiger account unlock;

User created.

SQL> grant connect,resource to scott;

Grant succeeded.

SQL> grant dba to scott;

Grant succeeded.

SQL> conn scott/tiger
Connected.
SQL>
SQL>
SQL> show user
USER is “SCOTT”
SQL> select * from tab;

no rows selected

SQL>
SQL>
SQL>
SQL>
SQL>
SQL>
SQL> create table muse (id number, muse_date date, name varchar2(10));

Table created.

SQL> set timing on
SQL> declare
v_count number;
begin
v_count:=0;
for i in 1..1830 loop
for j in 1..1000 loop
v_count:= v_count+1;
insert into muse values (v_count, sysdate-i, 'MUSE');
end loop;
end loop;
commit;
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:02:32.13
SQL> SQL>
SQL>
SQL> create index muse_i on muse(muse_date);

Index created.

Elapsed: 00:00:05.82
SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'MUSE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:14.22

SQL> select * from muse where muse_date > sysdate - 365;

364000 rows selected.

SQL> set autotrace traceonly explain
SQL>
SQL> set lines 180
SQL> select * from muse where muse_date > sysdate - 365;
Elapsed: 00:00:00.00

Execution Plan
———————————————————-
Plan hash value: 3330108091

————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 364K| 6401K| 1286 (3)| 00:00:16 |
| 1 | TABLE ACCESS BY INDEX ROWID| MUSE | 364K| 6401K| 1286 (3)| 00:00:16 |
|* 2 | INDEX RANGE SCAN | MUSE_I | 364K| | 500 (3)| 00:00:07 |
————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“MUSE_DATE”>SYSDATE@!-365)

SQL> set autotrace off
SQL>
SQL>

SQL> select sql_text ,sql_id,plan_hash_value,executions from v$sql where sql_text like 'select * from muse%';

SQL_TEXT SQL_ID PLAN_HASH_VALUE EXECUTIONS
———————————————————— ————- ————— ———-
select * from muse where muse_date > sysdate – 365 0kxrbzks20vjf 3330108091 1

Elapsed: 00:00:07.07
SQL>
SQL>

Now I have added one more year of data as per the article…

SQL> declare
v_count number;
begin
v_count:=1830000;
for i in 1..365 loop
for j in 1..1000 loop
v_count:= v_count+1;
insert into muse values (v_count, sysdate+i, 'MUSE');
end loop;
end loop;
commit;
end;
/ 2 3 4 5 6 7 8 9 10 11 12 13

PL/SQL procedure successfully completed.

Elapsed: 00:00:55.60
SQL>
SQL>

Here I have executed the below query and got a response of 365000 rows…

SQL> select * from muse where muse_date > (sysdate+365) - 365;

365000 rows selected.

Elapsed: 00:00:47.57
SQL>
SQL>
SQL> set autotrace traceonly explain
SQL> set line 180
SQL>
SQL> select * from muse where muse_date > (sysdate+365) - 365;
Elapsed: 00:00:00.00

Execution Plan
———————————————————-
Plan hash value: 3330108091

————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 923 | 16614 | 5 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| MUSE | 923 | 16614 | 5 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | MUSE_I | 923 | | 3 (0)| 00:00:01 |
————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“MUSE_DATE”>SYSDATE@!+365-365)

SQL>

SQL> select sql_text ,sql_id,plan_hash_value from v$sql where sql_text like 'select * from muse%';

SQL_TEXT SQL_ID PLAN_HASH_VALUE
———————————————————— ————- —————
select * from muse where muse_date > sysdate – 365 0kxrbzks20vjf 3330108091
select * from muse where muse_date > (sysdate+365) – 365 b1gsudzjfra3f 3330108091

Elapsed: 00:00:07.46
SQL>
SQL>
SQL>

SQL> set autotrace off
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'MUSE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:13.84
SQL>

After gathering the stats, I again executed the same SQL…

SQL> select * from muse where muse_date > (sysdate+365) - 365;

365000 rows selected.

Elapsed: 00:00:44.16

SQL> set autotrace traceonly explain
SQL> select * from muse where muse_date > (sysdate+365) - 365;
Elapsed: 00:00:00.01

Execution Plan
———————————————————-
Plan hash value: 3330108091

————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 364K| 6415K| 1253 (3)| 00:00:16 |
| 1 | TABLE ACCESS BY INDEX ROWID| MUSE | 364K| 6415K| 1253 (3)| 00:00:16 |
|* 2 | INDEX RANGE SCAN | MUSE_I | 364K| | 493 (4)| 00:00:06 |
————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – access(“MUSE_DATE”>SYSDATE@!+365-365)

SQL>

Please suggest…

Thanks

Mohammed.

Richard Foote - October 17, 2012

Hi Mohammed

The “problem” here is that the cost of a FTS is higher in your environment. If you use the FULL hint or a 10053 trace, you will see that the cost of a FTS is something above the 1286 cost of using the index.

Note your example almost worked in that the cost of the SQL dropped down to just 5 in the next year example.

You just need to reduce the costs of a FTS a tad. Reduce (say) the MREADTIM and/or increase the MBRC system stats values to reduce the overall cost of using the FTS.
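
To illustrate both suggestions (the hint to expose the FTS cost, and adjusting the system stats – the values below are purely examples, not recommendations for any particular system):

-- Confirm what the CBO thinks a full table scan would cost for the same query.
set autotrace traceonly explain
select /*+ full(muse) */ * from muse where muse_date > sysdate - 365;
set autotrace off

-- Illustrative values only: lowering MREADTIM and/or raising MBRC makes
-- multiblock reads (and hence full table scans) look cheaper to the CBO.
exec dbms_stats.set_system_stats('MREADTIM', 10);
exec dbms_stats.set_system_stats('MBRC', 16);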

52. Rob - December 31, 2012

Great article. Thanks for sharing. We don’t gather stats weekly in our environment for this very reason… To be fair, we’ve never had an issue. We gather stats when we run upgrades and rebuild/modify tables/indexes. We removed the stats gathering because it was causing SQL plan changes… Bit of a catch-22 situation. Certainly food for thought.

Richard Foote - February 7, 2013

Hi Rob

It’s what makes managing databases such fun 🙂

53. daniesh - February 10, 2013

Hi Richard,

Thanks for this. Even I am facing a slowness issue, which I suspect is due to a wrong execution plan. While refreshing a materialized view, one DELETE query is taking more than 1 hr, even though there are only 25 rows to be deleted.

DELETE FROM "APPS"."GL_BAL_MV" SNA$
WHERE "C3" IN (SELECT /*+ NO_MERGE */ * FROM
(SELECT CHARTOROWID("MAS$"."M_ROW$$") RID$ FROM
"GL"."MLOG$_GL_SETS_OF_BOOKS" "MAS$" WHERE "MAS$".SNAPTIME$$ > :B_ST1
) AS OF SNAPSHOT(:B_SCN) MAS$)

Plan hash value: 2704021294

——————————————————————————————————-

| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time |
——————————————————————————————————-
| 0 | DELETE STATEMENT | | | | 339K(100)| |
| 1 | DELETE | GL_BAL_MV | | | | |
| 2 | NESTED LOOPS | | 1 | 168 | 339K (1)|999:59:59 |
| 3 | SORT UNIQUE | | 1 | 138 | 2 (0)| 00:02:31 |
| 4 | TABLE ACCESS FULL| MLOG$_GL_SETS_OF_BOOKS | 1 | 138 | 2 (0)| 00:02:31 |
| 5 | INDEX RANGE SCAN | C3BOOKS | 110M| 3164M| 339K (0)|999:59:59 |
——————————————————————————————————-

In the MLOG$_GL_SETS_OF_BOOKS table there are only 25 rows, so I should only get 25 rows from it. The query is using the index C3BOOKS, which is correct, but the estimated rows show 110M. Could you please assist me in understanding why the query is using the wrong execution plan?
The row estimate should have been 25 here.

Please suggest

Thanks

54. itguy69 - December 19, 2013

Hi
I am not a DBA, just IT support.
I reached this forum googling around after 72 hours of downtime where processing had slowed down and the root cause was finally found to be “execution plan changed”.
What amazes me is that this has been a known issue since 2010 and Oracle has done nothing about it.
Not a single post points a finger at Oracle.
I am pretty sure that if an OS froze and locked up live production activity, a lot of heads would roll and remediation would be demanded.
However, what would this blog give as a remediation for this?
IMHO I would call it a bug.

55. Richard Foote - January 3, 2014

Hi ITGuy

What I find totally remarkable is that it took a full 72 hours to determine that an execution plan changed so negatively !!

Note that Oracle has had a number of mechanisms for many many years such as stored outlines, sql profiles, sql baselines, etc. to help prevent the sudden change of execution plans.

Execution plans change, the key is to try and make sure they only change for the better and the key here is the DBAs 🙂
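
For example, a minimal sketch of locking in a known good plan with a SQL Plan Baseline (the sql_id and plan_hash_value are placeholders borrowed from the MUSE example earlier in this thread, purely for illustration):

-- Load the known good plan from the cursor cache as an accepted baseline.
set serveroutput on
declare
  v_plans pls_integer;
begin
  v_plans := dbms_spm.load_plans_from_cursor_cache(
               sql_id          => '0kxrbzks20vjf',
               plan_hash_value => 3330108091);
  dbms_output.put_line(v_plans || ' plan(s) loaded');
end;
/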

56. Jatin - February 7, 2014

I think in the example that you gave above, different queries were used to get the data from the muse table.
A possibility can also be that in the second case the query went for a hard parse and that changed the query plan – not necessarily the data change in this example.

Richard Foote - February 14, 2014

Hi Jatin

For the plan to change, a re-parse of some description would have had to take place …

I used a different query because I didn’t particularly want to wait a whole year to get my point across, that being when you indeed re-parse a query, Oracle might adopt a totally different execution plan even though the object statistics haven’t changed 🙂

57. Charlie Callaway - May 15, 2014

Hello Richard–

We have a legacy app on 11.2.0.2 that is running RBO because the vendor won’t certify it on CBO. We had an execution plan change out of the blue that caused heartache until I used a baseline to revert it back to the “good” plan. Since it is on RBO, what could have caused it to change? I’m not even sure where to look for documentation on RBO anymore, much less “why did my RBO execution plan change”?

I checked all the underlying objects, there were no modifications.

Thoughts?

Thank you very much.
Charlie.

Richard Foote - July 8, 2014

Hi Charlie

It’s been a looooong time since I’ve had to tune a RBO application. The most common reason for a plan change in the old days was that some stats were introduced that caused the CBO to kick in but on 11.2, with no object modifications, not really sure. Perhaps the nature of the plan change might trigger some thoughts ?

58. Ajay - August 6, 2014

Hello Richard,

Thank you for the clarification. A few questions are given below; please let me know where I have made a mistake –

1) I have created the table and index and gathered statistics as per your example, and used the query – “select * from muse where muse_date > (sysdate+365)-365;”. The CBO has opted for an “INDEX RANGE SCAN” and the estimated rows are 903. So my question is, why an “INDEX RANGE SCAN”? I expected it to be a FTS.
2) Then I inserted additional rows using your example and did not gather the statistics, then used the query – “select * from muse where muse_date > sysdate-365;”. The CBO performed a “FTS”, even though I haven’t gathered the statistics. So my question is, if both queries are the same, why is the CBO choosing a FTS for one query and an index scan for the other? Does rephrasing the predicate alter the plan?

Thank you in advance

Ajay.

Richard Foote - September 3, 2014

I think the answer to both questions can be found in this blog post 🙂

It used the index in your first example because 903 rows is cheaper than the FTS at the time, and vice versa in the second example (confirm this with appropriate hints). Although it’s logically the same query, they’re actually looking at different windows of data, and the window slowly moves away from the known ranges of data as stored in the schema over time. This in turn can change the associated costs of running the query, as the CBO thinks it’s getting less data than it really is.

ajay - September 4, 2014

Thank you so much Richard for your explanation. Here is what I have understood from your comment –
As you said, “they’re actually looking at different windows of data and the window slowly moves away from the known ranges of data as stored in the schema over time” – how does the CBO come to know of this window of data even though we have not collected the statistics? If your statement is true, then is the CBO generating statistics while generating the execution plan, irrespective of the statistics available, in order to understand the slowly moving window of data? Kindly let me know if this is correct.

Thank you
Ajay

59. Richard Foote - October 8, 2014

Hi Ajay

Oracle takes the maximum (or minimum) known value for a column and compares this with the required data to see if it’s outside the known range.

It then determines how far past the known range it is by comparing this to the full range (both min and max) and determines by what percentage it exceeds the listed full range.

So simplistically, if we have a min value of 0 and a max of 100 and we search for a value of say 142, this is outside of the known values and it’s out by a factor of some 42% of the full range of known values. So Oracle reduces the selectivity by 42% when determining how many rows will likely be returned. So the further out you are from the known range of values, the more the estimate is reduced, until you are out by the full range of known values, at which point Oracle begins to assume no rows are returned.

So if you don’t update your stats, and the min/max values become more and more stale, then the less and less the CBO will assume will actually be returned over time. Fine if indeed true, perhaps not so good if not true.
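
As a rough illustration of that linear scaling (the idea only, not the exact internal CBO formula):

-- Known column range 0..100, predicate value 142:
-- (142 - 100) / (100 - 0) = 0.42, so the selectivity is scaled down by ~42%,
-- i.e. to roughly 58% of the in-range estimate. At a value of 200 (a full
-- range past the high value) the factor bottoms out at 0.
select greatest(0, 1 - (142 - 100) / (100 - 0)) as scaling_factor from dual;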

Ajay - October 9, 2014

Thank you so much Richard for all your time and efforts.

Thanks
Ajay

60. vamshee - April 3, 2015

Hats off to you Richard for this excellent post and the explanation in detail !!!
We recently had this exact scenario on our Prod where the execution plan for a certain set of queries changed without a single change on the database (in my case, this issue still occurred even though we already had a weekly stats job in place), and everyone was under the impression that it was the stats that changed the plans.
Overall, it’s been very challenging to explain the root cause details to the business users.
I wish Oracle had come up with such an explanation that we could show to the world 😉

Richard Foote - June 24, 2015

Hi vamshee

No worries. Yes this can all certainly be a challenge 🙂 One of the benefits of SQL Plan Baselines is that you can have some control over how plans change and mutate, but even baselines have their own challenges.

61. Krishna - May 27, 2015

Excellent Sir. As per my understanding, when the data changes (modifications or new data being added) the table statistics would become stale. From Oracle version 10 we have automatic stats gathering, and so we expect the optimizer would take the best plan. If a query which was performing well till yesterday is performing badly today, what all could be the reasons? Please correct me if my understanding is not correct. Thank you so much.

Richard Foote - June 24, 2015

Hi Krishna

Yes, you can have automatic stats jobs that do as you say but many environments want to have more control over how this important aspect of database management is handled. We expect the CBO to take the best plan but for a whole bunch of reasons, even with the automatic stats job in place, this might not be the case.

Some of these reasons include not collecting statistics for all necessary objects within the maintenance window, insufficiently accurate stats, skewed data and missing histograms, outlier datasets, insufficient partition-based stats, insufficient extended stats, incorrectly set optimizer parameters, incorrect or out-of-date system statistics, complex queries and the inability to determine correct join cardinalities, and bugs, to name a few. A very common reason I come across frequently is having embedded, outdated hints in the SQL that force the CBO down inappropriate paths.

Nick Strange - June 24, 2015

Stale stats are a bad (at best) way of keeping stats up to date. If you have an extremely large table, then it may take weeks to get to 10% changes. One important indicator for the optimiser is the max and min values on join columns. In the case of the big (not yet stale) table, the max and min values can be a long way out of date, and getting further out of date – despite the fact ‘that nothing changed’.

62. Richard Foote - June 25, 2015

Hi Nick

Yes I agree and it’s one of the reasons for my mentioning not accurate enough stats as a cause for why the best plan may not be chosen.

The whole issue of min/max values is one of the key points I raised in this blog piece.

63. Kumar - July 15, 2015

Dear Richard,
I have two different databases and executed the same query on both of them. I received 2 different execution plans.

One is like
Id | Operation | Name
0 | select statement |
1 | count |
2 | index range scan descending (idx_pk_id)

other one is like
Id | Operation | Name
0 | select statement |
1 | sort order by |
2 | index fast full scan (idx_pk_id)
I just want to understand why this is different. I am wondering about the ‘SORT ORDER BY’ – how can it happen when the same query is used for the analysis?

Please clarify and advise.

Richard Foote - October 20, 2015

In the 1st execution plan, Oracle walks through the index slowly in descending order, single leaf block reads at a time, getting the data in the correct order as data is ordered as required within the index.

In the 2nd execution plan, Oracle treats the index as a skinny table, reading the whole thing efficiently using multi-block reads, but therefore not in a sorted manner and so requires the additional sort step to get the data sorted as required.

Possible reasons why they differ in the databases are that sorts are less expensive in the 2nd database (perhaps a larger PGA size) or that multi-block reads are cheaper in the 2nd database (perhaps differing system statistics).

A 10053 trace should tell all.
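
For reference, a sketch of enabling the classic 10053 CBO trace for a session (the statement must be hard parsed while the event is set, and the trace file appears in the diagnostic trace directory; the tracefile_identifier is optional):

-- Tag the trace file so it is easy to find, enable the CBO trace,
-- hard parse the statement of interest, then switch the trace off again.
alter session set tracefile_identifier = 'cbo_10053';
alter session set events '10053 trace name context forever, level 1';

explain plan for
select * from muse where muse_date > sysdate - 365;

alter session set events '10053 trace name context off';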

64. Suresh Kumar - September 29, 2015

Hi, a basic question: if I am creating a new table and its indexes, do I need to gather stats for them?
Another situation is, if I create a partitioned table and populate it from one of the existing tables, rename the existing table to some other name, rename the newly created partitioned table to the name we just freed up, and then recreate all indexes for this table –
do I need to gather stats for this newly created (renamed) table?
Is that necessary? Please suggest.

Richard Foote - October 20, 2015

Depends on how you create and load the table initially and version of Oracle: https://richardfoote.wordpress.com/2014/07/09/12c-index-like-table-statistics-collection-wearing-the-inside-out/

When you build or rebuild an index, statistics are automatically collected.
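
For example, using the MUSE objects from the article, the index statistics can be seen to refresh automatically after a rebuild:

alter index muse_i rebuild;

-- LAST_ANALYZED is updated by the rebuild itself, no separate gather needed.
select index_name, num_rows, leaf_blocks, clustering_factor, last_analyzed
from   user_indexes
where  index_name = 'MUSE_I';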

65. Como pode um plano de execução mudar quando não há alterações no banco? | DBTimeWizard - April 4, 2016
66. Pin old execution plan to memory | datacartondotcom - April 25, 2016

[…] Explanation by Richard Foote […]

67. Borra Rao - October 3, 2016

Unable to run this function in postgresql:
SQL> declare
2 v_count number;
3 begin
4 v_count:=0;
5 for i in 1..1830 loop
6 for j in 1..1000 loop
7 v_count:= v_count+1;
8 insert into muse values (v_count, sysdate-i, 'MUSE');
9 end loop;
10 end loop;
11 commit;
12 end;
13 /

68. Tony - October 10, 2016

Hi Richard,

I agree with the fact that when the data changes, especially in cases where the optimizer isn’t fed the correct distribution of data, it has to make assumptions or use the defaults built within.

But if we compare your test to real world applications, the SQL generated from the app still remains the same. In your demo, the 2nd SQL is different from the first one. It is expected that the optimizer would produce bad plans with outliers or when not fed the right statistics.

With all respect to your work, the demo is not mimicking reality.

Thanks

Richard Foote - November 24, 2016

Hi Tony

I think you’ve missed the point here, the demo is indeed “mimicking” reality.

But that’s fine. If you want to actually replicate reality, simply run my first demo, add the years worth of data and then just wait a year so that the sysdate has progressed a full year and then run the same SQL again.

You’ll find you get the same results as I highlighted, except you’ve had to waste a year waiting …

69. pazyp - November 18, 2016

Reblogged this on pazikas.com.

70. Derrick C - November 2, 2017

Thanks for this awesome work. I had some doubts which are now cleared.

71. Biswajit - June 20, 2018

Thank you for your clear explanation.
Just wondering, if we collect stats on a table at runtime, will the same effect take place when the query is launched the second time?
Let’s say I see a query is slow and, during runtime, I gather the stats for that table XXXX. After the stats collection is finished, do you think that the same query will be parsed again and the execution plan will change?

Pardon me, if you think this is not the correct place to ask the question.

Richard Foote - June 20, 2018

Hi Biswajit

This behaviour can be controlled with the no_invalidate option/parameter when you collect statistics.
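
For example, a minimal sketch (using the MUSE table from the article) of gathering stats so that dependent cursors are invalidated immediately rather than over a rolling window:

begin
  -- no_invalidate => false invalidates dependent cursors straight away, so the
  -- next execution of the query is hard parsed against the fresh statistics.
  -- The default (dbms_stats.auto_invalidate) spreads invalidations over time.
  dbms_stats.gather_table_stats(
    ownname       => 'SCOTT',
    tabname       => 'MUSE',
    cascade       => true,
    no_invalidate => false);
end;
/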

72. Ryan - September 14, 2019

nice article Richard!
