## Function-Based Indexes and Missing Statistics (No Surprises)December 4, 2008

Posted by Richard Foote in Function Based Indexes, Oracle Cost Based Optimizer, Oracle Indexes.

Thought I might mention a common trap associated with Function-Based Indexes following a recent question on the OTN Database Forum.

First of all, considered this “simple” little scenario. You have a column in a table that’s a standard VARCHAR2 field and it has 100 distinct values. Therefore, if you select just one value, assuming even distribution of data, you’ll select approximately 1% of the data.

However, this table can potentially allow mixed case fields, with values such as ‘Bowie’, ‘bowie’, ‘BowiE’, etc.

Therefore, you decide to write a query that first converts all the fields to UPPER case such that you can now compare all possible values of a specific name to say ‘BOWIE’.

Remembering you previously had a selectivity of 1%, what’s the selectively of the column after you’ve converted everything to upper case ?

Ummmm, actually, there’s no easy answer to that. It all depends on how many actual different names you have in your data. In theory, you may now have just 1 distinct value or you may have 100 distinct values, you can’t really tell.

However, what if the function did something more complex and converted the name to some numerical value based on the sum of all the characters via some weird formula. In theory, you could have no distinct values in that everything could get converted to a NULL or you could possibly have as many distinct values as there are rows in the table.

Again, Oracle can’t really tell how many rows may get selected based on the determination of such a function.

When a predicate is used with a function call, Oracle has great difficulty in determining the correct selectivity. Therefore when you create a Function-Based Index, which Oracle assumes you may wish to access, Oracle behind the scenes creates a hidden virtual column on the parent table in order to capture the data characteristics of the function so that the CBO can make an accurate determination of the selectivity associated with using the function.

If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine the associated selectivity and cardinality when the function is used on the column (assuming even distribution of data) and hence calculate an accurate cost and determine whether the use of the function-based index is appropriate.

However, and here comes the trap, when a function-based index is created, Oracle will now (since 10g) automatically calculate the statistics associated with the index (such as the blevel, number of leaf blocks, clustering factor, etc.) but it will NOT calculate the statistics associated with the hidden virtual column as these statistics are associated with the parent table, not directly with the index itself.

Here I create a new function-based index on a table:

SQL> create index pink_floyd_upp_tab_name_i on pink_floyd(upper(table_name));

Index created.

But there are still no statistics on the hidden virtual column created by Oracle behind the scenes.

SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;

```COLUMN_NAME     NUM_DISTINCT HID VIR
--------------- ------------ --- ---
OWNER                     74 NO  NO
TABLE_NAME              5739 NO  NO
NUM_ROWS                 886 NO  NO
BLOCKS                   152 NO  NO
SYS_NC00005\$                 YES YES```

Therefore, even after you’ve created the function-based index, Oracle still has no idea on the selectivity associated the function because the necessary virtual column statistics are still missing. Oracle simply takes a “guess” and as with most guesses, it’s quite likely to be wrong which means the selectivity is likely to be wrong which means the costings is likely to be wrong which means the execution plan could very well be wrong as well.

What should you do after you create a function-based index ? You should collect the statistics on the hidden virtual column so that the CBO can accurately determine the cardinality associated with using the function and so make correct decisions regarding the most appropriate execution plan.

The ‘FOR ALL HIDDEN COLUMNS’ method_opt option with DBMS_STATS is one way of collecting just the necessary table statistics on these hidden virtual columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> ‘PINK_FLOYD’, estimate_percent=>null, cascade=>true, method_opt=> ‘FOR ALL HIDDEN COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.
SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;

```COLUMN_NAME     NUM_DISTINCT HID VIR
--------------- ------------ --- ---
OWNER                     74 NO  NO
TABLE_NAME              5739 NO  NO
NUM_ROWS                 886 NO  NO
BLOCKS                   152 NO  NO
SYS_NC00005\$            5739 YES YES```

This demo on missing virtual column statistics asscociated with function-based indexes goes through this whole issue.

My early Christmas wish to Oracle would be for the CBO to at least consider the DISTINCT_KEYS index statistic when determining the correctly selectivity when hidden virtual column statistics are missing on function-based indexes, rather than take it’s 0.4% / 1% guess …

More on virtual columns, hidden and otherwise, in the future …