jump to navigation

DBMS_STATS METHOD_OPT default behaviour changed in 10g. Be careful … January 4, 2008

Posted by Richard Foote in Index statistics, Oracle Cost Based Optimizer, Oracle General, Performance Tuning, Richard's Musings.
trackback

A question on the OTN forum has prompted me to quickly knock up a demo on the possible dangers of the default behaviour in 10g with regard to the METHOD_OPT option in DBMS_STATS.

When collecting statistics with DBMS_STATS in 9i, the default value of METHOD_OPT was ‘FOR ALL COLUMNS SIZE 1′. This basically says to Oracle please only collect basic column statistics (min, max, distinct values etc.), do not collect histograms on these columns. For columns that are evenly distributed and for columns that are not referenced in SQL statements, this is perfectly adequate. If a column was unevenly distributed and detrimentally impacted the CBO’s costings of an execution plan, then one could generate histograms for those particular columns separately.

However, this default behaviour changed in 10g and IMHO this change is possibly the most significant and problematic difference when migrating to 10g.

The new default value of METHOD_OPT with 10g is ‘FOR ALL COLUMNS SIZE AUTO’. This basically means that Oracle will automatically decide for us which columns need histograms and which columns don’t based on what it considers to be the distribution of values within a column and based on the “workload” associated with the table (basically are there any SQL statements running in the database referencing columns which might need histograms for those statements to be costed correctly).

This sounds like an ideal scenario, just let Oracle work it out for us.

However, the problem is that Oracle in many cases doesn’t do a particularly good job at determining when it should generate a histogram and when it shouldn’t. In fact, the likelihood is that Oracle will actually generate many many many unnecessary histograms while still missing out on some columns that should have them.

In environments with few tables and with few users executing few distinct SQL statements, the impact of some unnecessary histograms may be minimal. However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.

Note also that by having a histogram, Oracle changes the manner in which the DENSITY statistic for a column is calculated (as stored in DBA_TAB_COLUMNS). This is often used by Oracle to determine the selectivity of predicates so the impact of suddenly having additional unnecessary histograms can be wider and more significant than one might initially imagine.

Of course, the impact on the shared_pool and the row_cache and it’s associated latches in particular can be extremely ugly indeed if suddenly Oracle had to deal with thousands of new histograms when parsing statements.

This silly little demo, “Dangers of default METHOD_OPT behaviour in 10g“,  creates a simple little table with three columns. The first column has an outlier value and as previously discussed here, a histogram might be required to correctly cost range scans. The second column is perfectly distributed, it has 10 distinct values with 100,000 occurrences of each. The third column is also perfectly distributed but it’s a special example in that it has only 1 distinct value.

As you can see by the results of the demo, Oracle has got it wrong one way or the other in varying degrees in all three examples. It hasn’t created a histogram when it was needed and created histograms when they weren’t needed, impacting the Density column statistics as a result.

My advice. Just be very careful when using the default method_opt ‘FOR ALL COLUMNS SIZE AUTO’ behaviour in 10g.

About these ads

Comments»

1. Donald K. Burleson - January 4, 2008

Hi Richard,

Very well done, and I like the offline justification.

“However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.”

That is the core issue! Bingo.

It’s a perpetual trade-off between sub-optimal plans and unneeded overhead.

An exploration of “intelligent histogram creation”, would be great. You could use AWR to correlate the SQL to the objects, avoiding histograms that are never used, and develop a method to only create histograms that “make a difference”.

Good stuff Richard . . . .

PS I gotta ask, WTF is a radiohead? Is that like a Pothead?

2. Neil - January 4, 2008

“WTF is a radiohead?”

Only “On a Friday” would someone ask this…

Excellent blog by the way Richard, really good content

3. Jason Bucata - January 4, 2008

I’d be curious to see how this works with SIZE SKEWONLY, not taking into account the “workload” figures.

4. Jason Bucata - January 4, 2008

I gave it a try myself just now. I reproduced your original behavior on 10.2.0.3 (WinXP), then tried SKEWONLY and saw that it did the same thing.

5. Greg Rahn - January 4, 2008

As mentioned the default for METHOD_OPT has changed from 9i to 10g, but I think it is worth noting that because workloads evolve, so do the stats, and in this case, the potential for histograms to be collected. New histograms can be collected the first time stats are gathered in 10g (from an 9i upgrade) as well as any time the SQL workload and data change to trigger this.

I am not sure that I agree with your view that a histogram should be collected on column ID1. There is only one value that is outside the contiguous range of values and the query you used has an equality predicate, not a range predicate. If there was 1000 distinct values with ranges from 1 to 100000000 and none of them were contiguous or they were in several contiguous ranges, and the workload used range predicates, it might be more plausible that queries could benefit from a histogram.

For example, if you take your same example and run this update statement:
update hist_test set id1 = id1+50000 where id1 between 500 and 999;
And add this query to your workload:
select * from hist_test where id1 between 1 and 999;
and then gather stats again using FOR ALL COLUMNS SIZE AUTO you should see a hight balanced histogram of 254 buckets.

6. Richard Foote - January 4, 2008

Hi Don

Thanks for your comments. The CBO, statistics and indexes all go hand in hand so I plan to discuss them all going forward

BTW, a Radiohead is very similar to a Pothead, only slightly different in shape ;)

7. Richard Foote - January 4, 2008

Hi Jason

Thanks for your comments for for having a play with it all.

SKEWONLY is very similar to AUTO, only worse. It doesn’t even consider workload activity and is likely to produce even more unnecessary histograms.

Also something to be careful of using in larger scale environments.

8. Richard Foote - January 4, 2008

Hi Greg

Yes, very good point, my demo (which I have to admit I put together in a bit of a rush after I saw the OTN thread last night) didn’t produce a very good workload for the first column.

I’ve now changed the demo to now illustrate how problematic the query on the first query could be with a range scan that doesn’t perform at all well due to the outlier value and how the AUTO option still doesn’t generate the histogram which would help.

Hopefully this makes it a lot clearer what I was trying to highlight and how the AUTO option doesn’t deal particularly well with outlier values.

9. Richard Foote - January 4, 2008

Hi Neil

Thanks for the feedback. I dedicate the blog to all the Paranoid Androids out there ;)

10. Richard Foote - January 5, 2008

Hi Greg

I’ve now updated the demo yet again by making the table somewhat larger to highlight the differences in costs a larger table may have with all this to to show how a histogram would have been useful in dealing with an outlier value by creating a histogram manually at the end.

Thanks again for picking up the weakness with the original demo.

11. martin W - January 5, 2008

Hi Richard,

I am not a big fan of this new behaviour either. Oracle’s implementation seems sub-optimal and it is turned on by default when you go to V10 (though I think ‘strongly encouraging’ regular stats collection generally is a positive step forward).

Histogram stats and large numbers of partitions can also lead to your SYSTEM tablespace increasing in size under V10 also. This can lead to your SYSTEM tablespace suddenly increasing from the usual 1/2GB or so to a couple of GB. I’ll need to be back in the office to qualify this with a couple of references though.

The issue seems to be that, with the new default behaviour under 10 of gathering histogram stats where Oracle decides it is sensible, all partitions now get far more histogram data. Each time a partition is DBMS_STATS.GATHER’d (ensured by the nightly/weekend DBMS_STATS_PROC_JOB when the object becomes stale) the old stats are stored away so you can return to them.

Lots of partitions, lots of column histograms, lots of stats gathering as the table grows…It creates a lot of data. The increase of SYSTEM tablespace may catch you out, but so can anything looking at the historgram stats on tables with lots of partitions. I’ve not yet dug into the actual impact on the system of all this data but it makes my hand-cut scripts to look at histogram stats run very slowly :-(.

It does become less of a problem as you get beyond a month past migrating to ten as Oracle trims the stored stats gathering information but it is something worth knowing about if you are migrating a large system with many, many partitions to V10.

12. Richard Foote - January 6, 2008

Hi Martin

Thanks for your comments.

I think the biggest issue with collecting histograms unnecessarily is the subsequent impact it has on the parsing of statements and the additional stress and overheads it puts on the shared pool and the row cache in particular.

The golden rule with regard to histograms (or at least my golden rule) is to create them only when they’re required and don’t create them when they’re unnecessary.

The AUTO option breaks this rule because it doesn’t necessarily create them when they’re required but more importantly, it creates way way way too many unnecessarily.

And lots of unnecessary histograms just add overheads for no benefit, which is never a good thing.

13. Alberto Dell'Era - January 6, 2008

Hi Richard,

I absolutely agree on your caveat about using SIZE AUTO.

A possibly interesting observation: of course, density is used for the cardinality estimation (of equality predicates) only for unpopular values, and since the Frequency Histogram on ID2 shows only values whose multiplicity is > 1, they are all popular. Hence if you select an existent value, the cardinality estimation is exact, otherwise, if you select a non-existent value, the cardinality estimation is 1 (actually 0.5 rounded up to 1) – as exact as it can be:

select * from hist_test where id2 = 6;

———————————————–
| Id | Operation | Name | Rows |
———————————————–
| 0 | SELECT STATEMENT | | 100K|
|* 1 | TABLE ACCESS FULL| HIST_TEST | 100K|
———————————————–

select * from hist_test where id2 = 6.1;

———————————————————
| Id | Operation | Name | Rows |
———————————————————
| 0 | SELECT STATEMENT | | 1 |
| 1 | TABLE ACCESS BY INDEX ROWID| HIST_TEST | 1 |
|* 2 | INDEX RANGE SCAN | T_ID2_IDX | 1 |
———————————————————
(NB: I’ve added T_ID2_IDX on ID2)

Without an histogram, the cardinality for “6.1” would be 100K as well, not accurate at all, and producing a useless FTS.

So it might seem that the histogram is doing a good job – generating a more accurate plan for all values.

But, enter bind variables and bind variable peeking: no problem if the value of the bind variable is always an existent value (or always a non-existent one) – big problem if the application selects a mix of existent and non-existent values (or, a value that is not represented in the histogram since it was added after the stats collection) thus generating a plan that is optimal only for a part of the statements. It would be probably better, in this case, to get rid of the histogram, since a single non-optimal range scan of 100K rows (thus issuing potentially 100K consistent gets on the table) is probably enough to consume a noticeable share of the resources/latches … oh, unless the application almost always selects non-existent values …

So, it really depends on the data, the workload, whether the statements are binded or not … and only a (competent) DBA or Developer has all the informations needed to make a good decision.

14. Richard Foote - January 7, 2008

Hi Alberto

Thanks for your valuable comments.

I think your last statement sums it all up perfectly !!

Just tying this up with my previous discussion on constraints, if I had a table with a column such as ID2, which only has a smallish number of possible values (eg. 1,2,3,4,5,6,7,8,9,10) that were evenly distributed, I would have no histograms as it would be somewhat useless for values that exist AND a validated, non-deferrrable check constraint which would protect us from values that don’t exist (10g onwards).

But even so, it all still depends on bind variables, the workload etc. as you correctly stated.

15. Arul Ramachandran - January 9, 2008

Hi Richard

This is definitely an item to watch out when upgrading to 10g.

Jonathan Lewis had mentioned this before. I am not sure when I came across this, either in his CBO book or his blog entry in Feb-2007: http://jonathanlewis.wordpress.com/2007/02/02/10g-upgrade/

Nevertheless, your blog and the comments make this quite interesting.

One other thingy that has been sneaked into 10g is the default for NO_INVALIDATE. In 9i the default is FALSE, but in 10g the default is DBMS_STATS.AUTO_INVALIDATE – this would enable Oracle to automatically decide whether to invalidate dependent cursors or not. One more thing to watch out for.

-Arul

16. Richard Foote - January 9, 2008

Hi Arul,

Thanks for your input. The change in no_invalidate is indeed another trap for the unwary in 10g …

17. 10g Upgrade - issues with DBMS_STATS package « Mani’s Oracle Scratchpad - January 28, 2008

[...] Note:  I later found Richard Foote has also written an excellent article on this : http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-… [...]

18. VKasi - February 26, 2008

Hi Richard,

This might be very basic…!
We have some tables where stats are collected only for indexed columns with no histograms…(method_opt => for indexed columns size 1).
Currently, the num_distinct,low_value,high_value,density,num_nulls ar all zero for all other columns except for columns that are indexed.
My understanding is that it is necessary to have basic stats on all columns and not just the indexed columns even when they are not part of the predicates as they would influence the optimiser’s decision on selectivity of those unindexed columns.
But, i am not sure on this.
Could this anamoly influence the execution plan?

We are on 9.2.0.8.

19. Richard Foote - February 27, 2008

Hi Vkasi

Absolutely.

You really should collect the minimum statistics on all your columns, not just the indexed ones. Without them, Oracle will be unable to get accurate cardinality estimates for the various steps in your execution plans not involving indexed columns, such as the cardinality of unindexed predicates and many of your join results. If these are inaccurate, then the order and position of subsequent joins could be sub-optimal, resulting in poor execution plans.

In some cases, dynamic sampling may assist but only in some cases.

Start collecting stats …

20. Yosi - June 18, 2008

Some syntax how to check the variables you mention, will be useful. Thanks.

21. Sam - August 12, 2008

I know I am late to the table, we have just finally upgraded to 10g. Performance was bad untill we made sure all instances had good system stats.

We have a couple of schemas which still perform badly. One is a warehousing schema which contains many partitioned tables. I am not sure why but when we gather stats manually on the affected tables (those performing badly during data loads). Performance returns to normal. It seems like the auto stats process is not gathering good stats on these tables. The script we have been using did not set METHOD_OPT so we were using the 10g default.
The second schema is an OLTP system which performs many updates to quantity columns, only a small percentage of the other attributes change. Oracle has gathered many histograms on this schema.
I would like to change the METHOD_OPT paramater to the 9i default. However, since the auto stats gathering process also gathers stats on system tables. I am concerned that changing the default METHOD_OPT will impact stats on the system tables. Is there a danger in setting ‘METHOD_OPT’ to ‘FOR ALL COLUMNS SIZE 1′ for the gathering of stats on the system tables?

We have been using all the defaults with disasterous results using OEM to view tablespaces and files. Response is pitiful against LMT (as much as 5 minutes) but great against the system tablespace which is still DMT (1 -2 seconds). This applies only to 10g databases. We still have a few 9.2.0.7 because 3rd party vendors are not ready. A TAR with oracle did not help. I gave up after more than a month of running queries and submitting trace file after trace file.

Do you have any comments on this?

22. Richard Foote - August 13, 2008

Hi Sam

Yes, we also had issues with execution plans with partitioned objects as well, several of which Oracle Support were not able to sort out.

We solved most issues by manually calculating global partition level stats and by using stored outlines on any remaining nasties.

My suggestion would be to use SIZE 1, even for system statistics. Only use histograms when you need to on a case by case basis. You simply don’t need or desire 1000s of unnecessary histograms which is the danger of the auto options, which can be just as disasterous with system generated tables and code.

Updated: That should have read manually calculating partition level stats

23. Anantha - September 26, 2008

We do have an environment,upgraded from 8i(using RBO) to 10g.

The schema has about 4000 objects with 21,000 indexes,which is a packaged application. The SQL’s are generated dynamically.
The performance is acceptable till now and we have been using the METHOD_OPT’ to ‘FOR ALL COLUMNS SIZE AUTO and ESTIMATE_PERCENT=>DBMS_STATS.AUTO_SAMPLE_SIZE for Stats Collection. All of a sudden last week,due to one Single SQL,the Application crawled,bringing the application to a stand still. Finally,We identified a Single table had casued the performance issue and we computed statistics and now the reponse is flying.

can you throw some light on what could have been wrong and how to correct these issues?

24. Richard Foote - September 29, 2008

Hi Anantha

Looks like you solved issue yourself !!

The fact you collected stats and things went bad and then recollected full stats and things went good again suggests there was a problem with the accuracy of your stats. The danger of collecting sample statistics.

Always a good idea before you make such changes to try and fix things that you try and quickly capture some information while things are bad. The previous stats and the previous execution plan and costings would have been useful in seeing what was wrong. The new stats and the new execution plan will then confirm how things need to be and the differences between the two would have told you much.

It only takes one bad SQL to bring a system down to its knees, it only takes one bad statistic to make a SQL statement go belly up.

My recommendation is not to use Size AUTO, it’s problematic and dangerous in my experience.

Mohammed Mehraj Hussain - August 5, 2009

Hi Richard ,
i faced an issue with the generation of histograms in Oracle 10.2.0.4 with the method_opt parameter defaults to auto..
we got
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
then we degrade to the Oracle 9i default value of for all columns size 1.
Oracle support Reports it as a Bug..
I want to know whether Using of auto_sample_size for estimate_percent will it be Problematic?

Richard Foote - August 20, 2009

Hi Mohammed

I would indeed hope Oracle Support treat it as a bug and not as some kinda feature !!

I haven’t had issues with it and would actually highly recommend it with 11g. Something else I’ll like to cover at some point.

25. stetner - October 8, 2008

Hi Richard,

I have been looking into histograms when bind variables are in use (and with regards to peeking).

Rather than cluter up these comments I have some observations up here: http://douglasstetner.blogspot.com/2008/10/histograms-and-bind-variables.html

And would value comments (from anyone who has observations). The more I look into this the more there is to look into and I have about 3 or 4 other investigations to do now!

26. Richard Foote - October 8, 2008

Hi Douglas

Sorry for the delay in the comment appearing but the comment was originally treated as spam and nearly got lost !!

I don’t have time to respond in detail but a couple of quick comments:

1) 90000 histograms sounds like way way too many for most applications. Oracle going mental with the auto option again.

2) Histograms with bind variables and no bind peeking are somewhat useless as the CBO takes a stab at the selectivity (eg: 0.25% for bounded predicates, 5% for unbounded predicates, etc.) without considering histograms.

3) With bind peeking, then it all depends on the values being peeked, although 11g is much better at dealing with these scenarios.

27. Anantha - October 10, 2008

Hi Richard,

Thanks for your input. The maintenance window that we have is very short and we have tables ranging from 1m to 185m rows. Computing statistics is not possible and we can go with estimate percent between 10 to 25 max. In addition to it, we have started exporting the existing statistics at schema level right before gathering fresh statistics. This would help us in restoring the statistics just in case something is wrong. what is the general guidelines for using METHOD_OPT’.I have seen ‘FOR ALL COLUMNS SIZE AUTO’ is dangerous sometimes. if i had to choose between other options, i can either use FOR ALL COLUMNS SIZE SKEWONLY or explicit numbers.The tables have column data skewed and how can i find the appropriate bucket size for oracle to generate histograms.
I guess SKEWONLY would take few extra hours when compared to AUTO.

Your thoughts please.

28. Richard Foote - October 14, 2008

Hi Anantha

I wasn’t suggesting you compute stats, we also don’t have the time or resources and in most cases it’s unnecessary.

However, there’s always the risk …

SKEWONLY is even worse than AUTO because at least AUTO restricts histograms to just those columns that haveparsed sql statements with associated referenced columns whereas SKEWONLY will collect histograms on any column Oracle deems appropriate, regardless of any workload.

Until Oracle gets it right more consistently, I recommended SIZE 1 for all columns, except those columns where it actually makes a difference to collect histograms. That’s where knowing the applications, knowing the data and historical references comes into play.

29. ANTONY - October 16, 2008

Hi Richard,

I disagree with your demo.The reason why the index wasn’t picked up irrespective histogram present or not that you’re not collecting statistics for the index on the table.I mean to say CASCADE => TRUE option was missing in your demo.

I just did the same demo by including CASCADE option while collecting stats,the index is picked up by the optimizer.

SQL> create table hist_test (id1 number, id2 number, id3 number);

Table created.

SQL> insert into hist_test (id1, id2, id3) select rownum, mod(rownum,10)+1, 100 from dual connect by level update hist_test set id1=1000000000 where id1=1000000;

1 row updated.

SQL> commit;

Commit complete.

SQL> alter table hist_test add primary key(id1);

Table altered.

SQL> exec dbms_stats.gather_table_stats(user,’HIST_TEST’,cascade => true,estimate_percent => –
> dbms_stats.auto_sample_size,method_opt => ‘for all columns size auto’);

PL/SQL procedure successfully completed.

SQL> select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE from dba_tab_histograms where table_name = ‘HIST_TEST’ order by COLUMN_NAME,ENDPOINT_NUMBER;

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
——————————– ——————————– ————— ————–
HIST_TEST ID1 0 111
HIST_TEST ID1 1 999735
HIST_TEST ID2 0 1
HIST_TEST ID2 1 10
HIST_TEST ID3 0 100
HIST_TEST ID3 1 100

6 rows selected.

SQL> select table_name, column_name, num_distinct, density, num_buckets, histogram from dba_tab_columns where table_name = ‘HIST_TEST';

TABLE_NAME COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM
——————————– ——————————– ———— ———- ———– —————
HIST_TEST ID1 1001508 9.9849E-07 1 NONE
HIST_TEST ID2 10 .1 1 NONE
HIST_TEST ID3 1 1 1 NONE

SQL> set autot on
SQL> select * from hist_test where id1 > 1000000;

ID1 ID2 ID3
———- ———- ———-
1000000000 1 100

Execution Plan
———————————————————-
Plan hash value: 742697282

——————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————————-
| 0 | SELECT STATEMENT | | 1 | 10 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| HIST_TEST | 1 | 10 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_C003344 | 1 | | 3 (0)| 00:00:01 |
——————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

2 – access(“ID1″>1000000)

Statistics
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
1 physical reads
0 redo size
638 bytes sent via SQL*Net to client
491 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

30. Richard Foote - October 17, 2008

Hi Antony

I disagree with your disagreement ;)

You’ve missed 2 important facts here that has caused you to misread the situation.

Firstly, the index does in fact have stats in my demo. Since 10g, index stats are automatically collected when an index is created. When the PK is added and the index created, Oracle has actually collected the stats behind the scenes. Re-run my demo and check the index stats just after creating the PK and you’ll see the stats are indeed there.

Therefore the difference you’re seeing has nothing to do with missing index stats …

The second point you’ve missed and what has caused the difference is that in my demo I compute the stats (estimate_percent is null) whereas you only collect sample stats. Here lies the key difference.

In my demo, Oracle is fully aware of the outlier value and has an endpoint_value of the outlier value. However, in your example, because you only collected sample stats, Oracle has missed out on collecting the outlier value and thinks the max value is only 999735.

Therefore, when you run your query, Oracle doesn’t think there are any values above 999735 and uses the average selectivity of a column value to estimate the number of rows to be retrieved (ie. 1 row). Therefore of course the CBO selects the index.

When I run my query, Oracle is aware of the outlier value and therefore assumes the vast majority of rows will be retrieved with my query because it assumes even distribution between my min / max value ranges and doesn’t pick up the fact there’s actually only the 1 row greater than 1000000.

The purpose of the demo was to hightlight issues with outlier values but the CBO isn’t aware of the outlier value in your example. If you were to compute stats fully, you would get the same results as in my demo. You made the classic mistake of changing two things (expliciting collecting stats on the index and only collecting sample stats on the table) and assumed the differences were due to one of the changes (the index stats) when in fact it was caused by the other change (the sample table stats).

Hope it makes sense :)

31. ANTONY - October 17, 2008

Hi Richard,

You are right as the index stats are automatically collected in 10g.My mistake.oh no!But your demo and the answer to my comments forced me to ask few questions.Bear with me :-)

In reality,it’s impossible to collect/compute stats for all tables using 100% sample size.If I use AUTO SAMPLE SIZE (which is the only option i have,considering 3000+ tables and 100s of table having 100+ million data) the endlier value is always going to be incorrect.

Oracle documentation says that the AUTO option of METHOD_OPT force Oracle “to determine the columns to collect histograms based on data distribution and the workload of the columns.”

Please answer me why Oracle isn’t generating histograms on column ID1,though the worload was generated on all columns.

I could understand the workload part on columns and it’s straightforward.But what drives Oracle not to create histograms on column ID1?

<>

from the above explanation,I understood that the histogram on ID1 doesn’t influence optimzer in making decison on execution plans for SQLs with predicate greater than the outlier value as it’s uses average selectivity..This would be the case in real-time databases where it’s impossible to collect stats with 100% sample size.

what’s your thought?

Thanks

32. Richard Foote - October 22, 2008

Hi Antony

Correct, it’s generally not feasible (or necessary) to compute stats but potentially incorrect stats is the risk.

Note that an outlier value could be picked up with sample stats, if depends on whether the sample used includes any outlier values. So it’s not “always” incorrect. In fact with 11g, the auto sampling method is likely to pick up such values.

Unfortunately, the processing that takes place to decide to collect or not to collect histograms is imperfect. In same cases (many in fact), it collects histograms unnecessarily, in other cases it doesn’t collect them when they could be useful.

Therefore, I don’t much like the auto method_opt option.

My thoughts are to generally collect stats with no histograms and only collect them specifically when necessary. In some cases, not having the outlier value in your stats is actually a good thing but is one of the reasons why things can suddenly go sour when they suddenly do get collected during a sample stat collection.

Life wasn’t meant to be easy and despite the docs, life can still be very unpleasant even with auto collected stats.

33. Balakrishna - December 29, 2008

Hi Richard,

Usvally ORACLE Recommends to collect histograms only when there is a skewed data but in your example ID1 does’nt have any duplicate values and that column is higly selective.

Can you please clarify this .

Regards
Balakrishna.

34. Venkat - December 31, 2008

Hi Richard,

Can unnecessary histograms cause unnecessary or excessive latches and also unwanted index range scans in Oracle 9.2.0.6?

Thanks,
Venkat

35. Richard Foote - January 3, 2009

Hi Balakrishna

Yes, but the ID1 column has an outlier value so that the distribution of data between the min and max values is not even.

Yes and equality search condition will result in the correct cardinality estimate by the CBO but a range scan predicate will not.

In the example, when searching for data that has an ID1 value > 1000000, there is actually only 1 row that meets that condition however the CBO is estimating the vast majority of rows will be returned because it uses the following forumla to determine how many rows will be returned:

(max_value – predicate_value) / (max_value – min_value)

Because the max value is so massive and because the data isn’t evenly distributed between the max and min values, the estimated rows returned is way wrong.

You might indeed need a histogram, even if there are no duplicates, in such a scenario.

36. Richard Foote - January 3, 2009

Hi Venkat

Yes indeed.

You suddenly introduce thousands of histograms that need to be loaded into the row cache and indeed the associated latches can burn red from overuse.

Providing the histograms are accurate they shouldn’t generate unwanted range scans, they simply result in Oracle having to do a lot more work to get to the same conclusion as it would have come to had the histograms not been there with the data evenly distributed.

37. Venkat - January 4, 2009

Thanks Richard!

Can you please confirm if both these methods are same in 10g: ALL INDEXED COLUMNS and ALL INDEXED COLUMNS SIZE AUTO. Do both of them generate histograms? What is the best method to gather schema stats if SIZE AUTO is not recommendable as Oracle creates histograms on required columns for us? How do we decide the best method_opt and also sampling size? We can certainly decide whether to have histogram on a column or columns for few SQLs. But, at whole schema level, what is the best approach to collect stats? We know that skewed data need histograms. Stating this, generally speaking, histograms are best for OLTP databases or Data Warehouse databases?

Thanks for your time!

Regards,
Venkat

38. Richard Foote - January 5, 2009

Hi Venkat

Yes, they’re both effectively the same as both would use an Auto sample size.

Just a couple of quick comments.

1) I wouldn’t recommend ALL INDEXED COLUMNS. Non-indexed columns can benefit from stats and histograms as well as indexed columns as the stats can provide important cardinality estimates of steps within an execution plan that can influence join orders and when steps are performed

2) The other way around, not all skewed data necessarily needs a histogram if the column is not used in such a manner as it governs how the CBO costs a plan (eg. if it’s only in the select list)

3) IMHO, in order to do the “best” thing by the CBO, you must know your data, you must know your applications and you must know how your data is being accessed. If you don’t know these things, you can’t accurately determine when a histogram should be collected and you’re forced to miss some you need or create (many) you don’t need. The best approach is to only collect histograms when they’re needed.

4) Generally speaking, histograms are just as useful with both OLTP and DWH applications. But with OLTP, bind values can be problematic, although gradually less so with each new release of Oracle.

Hope this helps !!

39. Balakrishna - January 8, 2009

Hi Richard,

Thanks for your explanation . Helped a lot to understand the concept well !!!.

Regards

Balakrishna.

40. Sanjeev - March 3, 2009

Chris,

Thanks for the valuable information.
When we are tuning complex queries with multiple join and filter conditions and huge million row tables Is the suggested approach is to take predicate column and run it seperately and get the data distribution to determine if it would benefit with histograms. Also the number of buckets in the following syntax
method_opt => ‘FOR COLUMNS ID1 SIZE 254′
estimate_percent=>null

For method_opt – will 254 still be valid if the number of distinct values is > 254. Is there any recommended value for SIZE based on distinct no of values?

estimate_percent – we usually specify 10% for huge tables so this will still be the valid and works ok for million rows table?

41. Peter Tranp - March 18, 2009

Hi Richard,

I couldn’t reproduce your results with Oracle 10.2.0.3.0 nor could I reproduce it with Oracle 11.1.0.7.0.

Can you confirm whether Oracle has fixed this or change the behavior?

Thanks!

42. Richard Foote - March 24, 2009

Hi Peter

I can reproduce the demo results consistently on 10.2.0.3 on both windows and AIX and on 11.1.0.6 on both windows and AIX.

The most “common” error with running the demo is not running the various select statements first to produce the necessary workload prior to running the auto stats.

What results can’t you reproduce exactly ?

43. Mohammed Mehraj Hussain - August 7, 2009

Hi Richard ,
i faced an issue with the generation of histograms in Oracle 10.2.0.4 with the method_opt parameter defaults to auto..
we got
ORA-00600: internal error code, arguments: [19004], [], [], [], [], [], [], []
then we degrade to the Oracle 9i default value of for all columns size 1.
Oracle support Reports it as a Bug..
I want to know whether Using of auto_sample_size for estimate_percent will it be Problematic?

Richard Foote - August 20, 2009

Hi Mohammed

I would indeed hope Oracle Support treat it as a bug and not as some kinda feature !!

I haven’t had issues with it and would actually highly recommend it with 11g. Something else I’ll like to cover at some point.

44. bob lettau - December 19, 2009

Richard, your blog is invaluable. Thank you. I have three questions.

1. Histograms are a real problem in a DW I support (10.2.0.4). Can I change the Histograms parameter in OEM > Manage Optimizer Statistics > Global Statistics Gathering Options from “FOR ALL COLUMNS SIZE AUTO” to “FOR ALL COLUMNS SIZE 1″ to prevent histograms from being generated? Do you recommend any other changes to the Global Statistics Gathering Options?

2. If I want to disable the daily gather_stats_job (it never finishes by 6am) and create gather_schema_stats for each schema, does the following command look correct? Does ‘GATHER AUTO’ override ‘FOR ALL COLUMNS SIZE 1’?

begin dbms_stats.gather_schema_stats(ownname=>”PEOPLE_WHOUSE_USER”,options=> ”GATHER AUTO”,CASCADE=>TRUE,method_opt=>”FOR ALL COLUMNS SIZE 1”); end;

BUG Note: Apparently, in 10.2.0.4, the gather_stats_job will not not analyze indexes on stale tables due to Bug 5676887 (fixed in 10.2.0.5).

Richard Foote - December 21, 2009

Hi Bob

1) If histograms are causing you problems, then yes, ‘FOR ALL COLUMNS SIZE 1′ will get rid of them. Note though histograms are generally more useful in a DW so save your stats first (eg. with export_schema_stats) to make sure getting rid of them all doesn’t cause more problems than it solves. Then reintroduce any that really are needed.

2) Yes it does, the method_opt will be ignored. Get rid of the options clause to make method_opt have an effect.

3). Thanks, noted.

45. rasel - February 13, 2010

Hi,

I found this thread very useful for my problem with System TableSpace. I have to disable Auto Gather job because of frequent increment of System Table space. However i have asked by metalink to set ‘METHOD_OPT’,’FOR ALL COLUMNS SIZE 1′. However how can i determine what is the current value for ‘METHOD_OPT’?

Richard Foote - February 16, 2010

Hi Rasel

If you look at the number of buckets used for any given segment (eg. dba_tab_columns), if it’s a height base histogram, you’ll know the value and if it’s frequency based you know that at least this value was specified for method_opt.

46. Vengat - March 16, 2010

Hi Richard,

1)We were facing some issue with one of siebel application database.The plan is getting changed for one of the sql and froze the application for three times.Then after analysis there were some unanalyzed indexes corresponding to the table. Then we analyzed it ,then it went to a better plan and issue never came. Also we saw lots of unanalyzed indexes,then planned to analyze all the indexes and found the performance drastically became down and dropped the idea.This was done in stage .

2) There was one more occurences where in we had a plan change and siebel recommended not to collect statistics for zero row tables and we are doing the testing for the same.

3) Now the same plan change issue is coming for some sql or the other. The plan gets changed of course after a weekly analyze job. Then when application report slowness,we flush the shared pool either the previous better plan comes or even some time new plan which is very efficient comes in .We do a default analyze of 10% for table and 10% for index and was running without no issues for last 2 years after 10g upgrade .Application team as usual tells the data is being loaded as it was loaded for last 3 years .. Do you have any comments on it where could be the issue?

47. Richard Foote - March 24, 2010

Hi Vengat

Some comments.

Firstly, you need to capture the before and after plans, make comparisons and determine why the CBO has made a different decision.

Without much to go on, it sounds like you could have a bind peeking issue, with plans changing depending the different values you may have for bind variables at different points in time.

If you go with the default settings, you likely have lots of histograms and hence more likely to run into these sorts of issues.

If you have some tables/indexes, some not, then this is usually not a good thing as the CBO as Oracle needs to either make guesses or perform dynamic sampling on the fly which can cause different stats on the fly at different times.

That said, for tables that are empty when you collect stats but are populated later, having no stats is actually a good idea as dynamic sampling is far better than the CBO thinking a table is empty when it’s not.

I would ensure all segments have stats (that are not empty), that you check number of histograms and that you check for bind peeking issues (capture v$bind_capture at different times).

48. DBMS_STATS.GATHER_SCHEMA_STATS Failing on 10.2.0.3 « Anand's Blog - August 25, 2010

[...] References :-http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-… [...]

49. bheem - October 31, 2010

Hi Richard,

This is a very useful article followed by very useful discussions. I have a question. We are not 9.2.0.8. As you suggested and many people opined that it is not a good idea to gather histograms all on columns and not even on all indexed columns. We run gather stats on all all tables using DBMS_STATS once in every three weeks. My question is, is there a way to figure out what tables I need to gather histograms using method_opt > ‘for all indexes columns size auto’?

Thanks

Richard Foote - November 4, 2010

There’s no easy way. It really comes back to knowing your applications, knowing your data and spotting which execution plans are problematic due to skewed data (users are good at letting you know which parts of the application run poorly, checking statspack/v$sql for high resource queries etc.).

Once an application has been around for a while, these problem columns rise and bubble to the surface and can be dealt with via a histogram as necessary.

The day will come when we can simply just collect histograms on everything but in most cases, we’re not at that point yet.

50. bheem - October 31, 2010

I meant we are on 9.2.0.8.

51. MAB - November 2, 2010

Hi Richard ,

Need your suggetion on below, we have a Production Database which i have started looking into recently, where 10g Auto stats gather job(GATHER_STATS_JOB: daily as per the maintanence window) and additionally a manual stats gathering job(DBMS_STATS.GATHER_SCHEMA_STATS : weekly thrice) are running.

Problem is these jobs are taking more the 12 hours, which means even these are running in bussiness hours as well.

My questions are

1. Is it required to run both the jobs, if not which one is recommanded.
2. Below is the job which runs thrice a week, is it required to modify/tune ?

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname=>’${schema_name}’,estimate_percent=>”,method_opt => ‘FOR ALL COLUMNS SIZE
AUTO’,cascade => TRUE,options=>’GATHER’);

Note: My Database is an OLTP which is of around 150GB and it is used by the end users 18×6 in a week.

I have gone through your article on METHOD_OPT default behavior, but not sure should i use the default or the “SIZE 1″ option.

Your help appriciated.

Thanks.
Abdul

Richard Foote - November 4, 2010

Hi Abdul

You only need one or the other.

If you’re not comfortable with how best to collect stats, then the automatic job and the 10% stale mark is probably the way to go.

If you are comfortable with how to best collect stats, you know your data and applications, then probably collecting stats manually with a default size of 1 is best, collecting histograms as necessary.

If also depends somewhat on the size of the database, number of columns in the database, load, number of users, distinct queries, amount of system resources, etc. If these metrics are low (eg. few users, fewish columns, lowish load), then the automatic job is likely fine.

Additionally, if it’s taking too long (12 hours being too long), then look at the sample size and how you’re collecting partition level stats as well. Lower sample, less time and resources and possibly accurate enough stats.

52. daniesh shaikh - November 13, 2010

Hi Richard,
Very good article.
How to find the default value for method_opt.

Thanks

Richard Foote - November 24, 2010

Hi Daniesh

The manuals is one good place to start :)

The dbms_stats.get_param function can be useful as well:

SQL> select dbms_stats.get_param(‘method_opt’) from dual;

DBMS_STATS.GET_PARAM(‘METHOD_OPT’)
——————————————————————————–

FOR ALL COLUMNS SIZE AUTO

53. Mark Rios - December 10, 2010

Richard,

We have a 10.2.0.4 database small datawarehouse, I have a 32gig table that is taking 21 hours to regather stats after deleting stats. I use 20% estimate and method_opt of for all columns skewed only. I’ve read many articles on 10g STATS and they say to grab the biggest estimate% possible. What are your thoughts ? Do you think the “skewed only” clause is slowing me down ? Also my plan is to run this INITIAL gather of ALL but then nightly just regather STALE with same STATS settings. Does this sound like a good approach ?

Thanks Mark

Richard Foote - December 21, 2010

Hi Mark

20% is a very large sample, you might consider a smaller sample size (say 10% initially, then to 5% if there are still no issues, etc.) and see if the stats are good enough while reducing your stats gathering times.

I would certainly caution against skewed only, it will likely produce many more histograms than are necessary in your enviornment. AUTO is bad enough but skewed only is even worse as it doesn’t consider columns being used in SQL to warrant a histogram in the first place. So yes, it’s very likely slowing you down AND generating unnecessary histograms in the process. Only collect histograms when necessary (which might not be so easy to determine initially with tons of histograms around).

These 2 steps alone should significantly reduce gathering times. Also, if you have partitioned objects, might need to consider how partition stats are gathered as well.

54. dfitzjarrell - May 27, 2011

I am unable to download the example you’ve written due to a 403 error:

— 403: Access Denied —

This file requires authorization:

You must both be a user of this blog as well as be currently logged into WordPress.com

How can I obtain your code?

Richard Foote - June 6, 2011

Yes, I’ll try and find some time to determine what’s going on here.

55. PD - May 29, 2011

Hello Richard,

I am also getting exactly the same errors that dfitzjarrell has mentioned just before.

Also, can I pls ask a very basic question here regarding histograms?

How are two different type of histograms, Frequency and Height Balanced, generated? In other words, what (logic) makes Oracle to decide whether to create a FH or an HBH when lets I run run this :

exec dbms_stats.gather_table_stats(”,’MY_TEST’,estimate_percent=>100, method_opt=>’FOR ALL COLUMNS SIZE [AUTO|10|254]‘);

I understand FH is when there are upto 254 distinct values and then beyond this a HBH will be generated? But I’ve see HBH even when there are less than 254 distinct values so how’s that created pls and could I somehow tell Oracle to generate the type of histogram that I want pls?

Richard Foote - June 6, 2011

Hi PD

Oracle uses HBH when there are fewer distinct values than there are buckets available as this means each value can have its own bucket and be effectively 100% accurate (based on a 100% sample size of course).

The default number of buckets is 75 so you can have FBH even if there are fewer than 254 distinct values if not enough buckets have been defined.

Answer given in case someone else has read your question but not the doco :)

56. PD - June 1, 2011

Richard, I got my answer thanks – it was hidden in the documentation in a line which I was overlooking everytime I was reading thru it (in a rush!). Thanks anyway.

57. tfdavao - January 31, 2012

Thanks for the blog post richard, very interesting findings. We had a similar issue with histograms in our environment. We had gather stats running every weekend. Everything was working fine for the past months until one day suddenly everyone was screaming that their queries were taking a long time to run all of a sudden. We did everything, even added an additional CPU to the server because we suspected that it was a capacity issue – which was a bad move. Later on we found out that the problem was with the histograms. Disabling them for some indexes did resolve the issue.

By the way, I am still not able to view your demo. Would it be possible to ask you if you can add me to your blog or to the privacy page so that I can download your demos? It seems like the new guidelines for wordpress on privacy of media is making it difficult to share things.

All the best!

Richard Foote - February 8, 2012

Hi tfdavao

Histograms can certainly introduce as many issues as they resolve :)

I’ve been converting some of my early demos in .txt format to PDF to address this issue. Will make this one my next one I promise.

58. Rani - May 9, 2012

Hi Richard
Very useful article and follow up discussions… Thanks..

Understand 10g default method_opt being ‘for all columns size auto’
and the modification of using
method_opt => ‘for all columns size 1’ to avoid generation of (unnecessary) histograms.

The default ‘method_opt’ of Oracle 11g (11.2.0.3.0) is AUTO.

SQL> select dbms_stats.get_param(‘method_opt’) from dual;
FOR ALL COLUMNS SIZE AUTO

In view of the issues with 10g default ‘AUTO’, has Oracle 11g improved in its use of dbms_stats — Method_opt ( being AUTO ) ?

At 10g (10.2.0.3.0), we are using
method_opt=>”FOR ALL COLUMNS SIZE 1″

We are now migrating to Oracle 11g (11.2.0.3.0).
Do you recommend using the 11g default method_opt (FOR ALL COLUMNS SIZE AUTO)
OR
continue using
method_opt=>”FOR ALL COLUMNS SIZE 1″

OR
Are are there any different approaches ?

Thanks
Regards
Rani

Richard Foote - May 11, 2012

Hi Rani

My personal recommendation is still to generally use SIZE 1 and only collect histograms as and when required, even in 11g R2.

59. Rani - May 11, 2012

OK. Thanks. Rani.

60. Arnold - June 8, 2012

Hi Richard,

Thank you so much for sharing your enriched experience with the professional.

Hats off.

Richard Foote - June 8, 2012

Thanks Arnold, it’s my pleasure :)

61. Sada - August 3, 2012

Richard,

I read comments on the blog. They are very helpful to me. I have a question. Assume I have gathered stats using method_opt=>’FOR ALL COLUMNS SIZE 1′. Then after collecting stats, can I add histograms to the already collected stats one by one basing on AWR reports ? I am using 10.2.0.4 version of oracle. What I mean is after collecting stats with no histograms, Can I add histograms based on my AWR Report one after the other. Please clarify.

Richard Foote - September 19, 2012

Hi Sada

Yes you can and it’s basically what I recommend. Collect standard column stats and then histograms on those special cases that warrant them.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,860 other followers

%d bloggers like this: