jump to navigation

Oracle 19c Automatic Indexing: Function-Based Indexes? (No Plan) February 4, 2021

Posted by Richard Foote in 19c, 19c New Features, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.
3 comments

I previously discussed how Automatic Indexing only currently supports Equality based predicates.

The question I have today is does Automatic Indexing support function-based indexes? Let’s take a look.

The below DAVID table has the key column NAME which is an effectively unique VARCHAR2 column:

SQL> create table david (id number, code number, name varchar2(42), more_stuff1 varchar2(42), more_stuff2 varchar2(42), more_stuff3 varchar2(42), more_stuff4 varchar2(42), more_stuff5 varchar2(42), more_stuff6 varchar2(42), more_stuff7 varchar2(42), more_stuff8 varchar2(42), more_stuff9 varchar2(42), more_stuff10 varchar2(42));

Table created.

SQL> insert into david select rownum, mod(rownum, 10000)+1, 'David Bowie '|| rownum, 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null);

PL/SQL procedure successfully completed.

If we look at the current details of the table columns:

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID

We notice the same oddity of my previous post that all columns have histograms…

Let’s run the following query with an UPPER function-based predicate that returns only the one row:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2426813604

-----------------------------------------------------------------------------------
| Id | Operation                 | Name  | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |       | 100K |   17M |    3350 (6) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | DAVID | 100K |   17M |    3350 (6) | 00:00:01  |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage(UPPER("NAME")='DAVID BOWIE 4242')
    filter(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     263469 consistent gets
     263452 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

What does Automatic Indexing make of this scenario?

Basically, it does nothing. Currently, Automatic Indexing does NOT support such function-based indexes, even with equality based predicates (as of at least version 19.5.0.0.0). If we look at the next Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 2
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

No such function-based index is ever created by Automatic Indexing:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID';

no rows selected

To improve the performance of this query, one has to manually create the necessary function-based index:

SQL> create index david_upper_name_i on david(upper(name));

Index created.

If we now re-run the query:

SQL> select name from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows  | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |  100K | 4199K |    3175 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |  100K | 4199K |    3175 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I | 40000 |       |       3 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
        369 bytes sent via SQL*Net to client
        384 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The query now uses the function-based index to significantly improve the performance of this query, with just 5 consistent gets.

Note however as with all function-based indexes, by default the estimated cardinality estimate and associated CBO costs are way off (100K rows are estimated, not the 1 row that is actually returned). This is due to the CBO having no real idea of the number and distribution of values coming out of the “black box” function-based predicate.

This is why Oracle automatically creates an hidden virtual column by which to store the necessary statistics associated to the function (in this case the SYS_NC00014$ column):

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$                                 NONE

But we need to first collect statistics on this hidden virtual column for the statistics to be populated:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', no_invalidate=> false, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1');

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$              9947366          0 HYBRID

Now the CBO has the necessary statistics by which to determine a much more accurate cardinality estimate for the function-based predicate and so potentially a more efficient execution plan:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |    1 |   200 |       4 (0) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |    1 |   200 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

With the virtual column statistics in place, the CBO now has the cardinality estimate of 1 and associated costs spot on, which is always a good thing.

This requirement to collect the necessary statistics on the associated virtual column created as a result of the function-based index to ensure the index is costed and used effectively is perhaps but one reason why function-based indexes are currently not supported by Automatic Indexing.

As always, this can always change in the future…