jump to navigation

Function-Based Indexes and Missing Statistics (No Surprises) December 4, 2008

Posted by Richard Foote in Function Based Indexes, Oracle Cost Based Optimizer, Oracle Indexes.
trackback

Thought I might mention a common trap associated with Function-Based Indexes following a recent question on the OTN Database Forum.

First of all, considered this “simple” little scenario. You have a column in a table that’s a standard VARCHAR2 field and it has 100 distinct values. Therefore, if you select just one value, assuming even distribution of data, you’ll select approximately 1% of the data.

However, this table can potentially allow mixed case fields, with values such as ‘Bowie’, ‘bowie’, ‘BowiE’, etc.

Therefore, you decide to write a query that first converts all the fields to UPPER case such that you can now compare all possible values of a specific name to say ‘BOWIE’.

Remembering you previously had a selectivity of 1%, what’s the selectively of the column after you’ve converted everything to upper case ?

Ummmm, actually, there’s no easy answer to that. It all depends on how many actual different names you have in your data. In theory, you may now have just 1 distinct value or you may have 100 distinct values, you can’t really tell.

However, what if the function did something more complex and converted the name to some numerical value based on the sum of all the characters via some weird formula. In theory, you could have no distinct values in that everything could get converted to a NULL or you could possibly have as many distinct values as there are rows in the table.

Again, Oracle can’t really tell how many rows may get selected based on the determination of such a function.

When a predicate is used with a function call, Oracle has great difficulty in determining the correct selectivity. Therefore when you create a Function-Based Index, which Oracle assumes you may wish to access, Oracle behind the scenes creates a hidden virtual column on the parent table in order to capture the data characteristics of the function so that the CBO can make an accurate determination of the selectivity associated with using the function.

If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine the associated selectivity and cardinality when the function is used on the column (assuming even distribution of data) and hence calculate an accurate cost and determine whether the use of the function-based index is appropriate.

However, and here comes the trap, when a function-based index is created, Oracle will now (since 10g) automatically calculate the statistics associated with the index (such as the blevel, number of leaf blocks, clustering factor, etc.) but it will NOT calculate the statistics associated with the hidden virtual column as these statistics are associated with the parent table, not directly with the index itself.

Here I create a new function-based index on a table:

SQL> create index pink_floyd_upp_tab_name_i on pink_floyd(upper(table_name));

 

Index created.

 

But there are still no statistics on the hidden virtual column created by Oracle behind the scenes.

 

SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;

 

COLUMN_NAME     NUM_DISTINCT HID VIR
--------------- ------------ --- ---
OWNER                     74 NO  NO
TABLE_NAME              5739 NO  NO
NUM_ROWS                 886 NO  NO
BLOCKS                   152 NO  NO
SYS_NC00005$                 YES YES

Therefore, even after you’ve created the function-based index, Oracle still has no idea on the selectivity associated the function because the necessary virtual column statistics are still missing. Oracle simply takes a “guess” and as with most guesses, it’s quite likely to be wrong which means the selectivity is likely to be wrong which means the costings is likely to be wrong which means the execution plan could very well be wrong as well.

What should you do after you create a function-based index ? You should collect the statistics on the hidden virtual column so that the CBO can accurately determine the cardinality associated with using the function and so make correct decisions regarding the most appropriate execution plan.

The ‘FOR ALL HIDDEN COLUMNS’ method_opt option with DBMS_STATS is one way of collecting just the necessary table statistics on these hidden virtual columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> ‘PINK_FLOYD’, estimate_percent=>null, cascade=>true, method_opt=> ‘FOR ALL HIDDEN COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.
SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;

COLUMN_NAME     NUM_DISTINCT HID VIR
--------------- ------------ --- ---
OWNER                     74 NO  NO
TABLE_NAME              5739 NO  NO
NUM_ROWS                 886 NO  NO
BLOCKS                   152 NO  NO
SYS_NC00005$            5739 YES YES

This demo on missing virtual column statistics asscociated with function-based indexes goes through this whole issue.

My early Christmas wish to Oracle would be for the CBO to at least consider the DISTINCT_KEYS index statistic when determining the correctly selectivity when hidden virtual column statistics are missing on function-based indexes, rather than take it’s 0.4% / 1% guess …

More on virtual columns, hidden and otherwise, in the future …

Comments»

1. Brian Tkatch - December 4, 2008

Thanx Richard, this is good stuff.

I would like to make a request. Please don;t use sample data from *_Tables when the test will then use *_Tables as well. It took a minute to realize which TABLE you were referring to.

Anyway, this is definitely something to bear in mind about FBIs.

Like

2. Konstantin - December 4, 2008

Thanks for the reminder Richard.

… was just about to do some work with function-based indexes, and I saw your article in google reader.

Great timing!

Like

3. gary - December 4, 2008

“If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine….”
Well MORE accurately determine. There might be skew in there. Maybe 11gR2 will have an option to determine skew based on the Scrabble values of the letters involved.

Like

4. Richard Foote - December 5, 2008

Hi Brian, noted next time I feel lazy !!

Hi Konstantin, timing is everything 🙂

Hi Gary, yes it’s a point I make in the demo, now amended the blog piece as well, thanks.

Like

5. Robert Klemme - December 6, 2008

Funny that you just blog over this topic when I was investigating statistics for FBI’s. Just in time. 🙂

As a caveat for others: For FBI’s I found statistics like NUM_DISTINCT only via USER_TAB_COLS and not via USER_TAB_COL_STATISTICS. I believed there were no stats for FBI columns because I queried USER_TAB_COL_STATISTICS. Then used Richard’s statement to gather them and was left wondering why they did not appear. Only after I queried the other table they showed up.

Well, I could have known: documentation of USER_TAB_COLS has this in there “This view differs from “USER_TAB_COLUMNS” in that hidden columns are not filtered out.” – but no remark about the filtering is to be found in doc of USER_TAB_COLUMNS…

Btw 10.2.0.2.

Cheers

Like

6. Log Buffer #126: a Carnival of the Vanities for DBAs - December 6, 2008

[…] Richard Foote looks at a gotcha in his item on function-based indexes and missing statistics. […]

Like

7. Calculating Statistics For Function-Based Index Virtual Hidden Columns at joeramsey.net - December 8, 2008

[…] read a very interesting blog post the other day by Richard Foote entitled “Function Based Indexes and Missing Statistics – No Suprises.“  The post explains how in Oracle 10g that normal statistics are generated for […]

Like

8. My learning | Database and Cloud Notes - August 27, 2018
9. Fjo - March 21, 2023

Richard, why these 2 options ? –
estimate_percent=>null, cascade=>true

And SIZE 1, so I guess you’re deliberately suppressing generation of histograms?

Like


Leave a reply to Richard Foote Cancel reply