jump to navigation

Oracle 19c Automatic Indexing: Function-Based Indexes? (No Plan) February 4, 2021

Posted by Richard Foote in 19c, 19c New Features, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.
trackback

I previously discussed how Automatic Indexing only currently supports Equality based predicates.

The question I have today is does Automatic Indexing support function-based indexes? Let’s take a look.

The below DAVID table has the key column NAME which is an effectively unique VARCHAR2 column:

SQL> create table david (id number, code number, name varchar2(42), more_stuff1 varchar2(42), more_stuff2 varchar2(42), more_stuff3 varchar2(42), more_stuff4 varchar2(42), more_stuff5 varchar2(42), more_stuff6 varchar2(42), more_stuff7 varchar2(42), more_stuff8 varchar2(42), more_stuff9 varchar2(42), more_stuff10 varchar2(42));

Table created.

SQL> insert into david select rownum, mod(rownum, 10000)+1, 'David Bowie '|| rownum, 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null);

PL/SQL procedure successfully completed.

If we look at the current details of the table columns:

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID

We notice the same oddity of my previous post that all columns have histograms…

Let’s run the following query with an UPPER function-based predicate that returns only the one row:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2426813604

-----------------------------------------------------------------------------------
| Id | Operation                 | Name  | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |       | 100K |   17M |    3350 (6) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | DAVID | 100K |   17M |    3350 (6) | 00:00:01  |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage(UPPER("NAME")='DAVID BOWIE 4242')
    filter(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     263469 consistent gets
     263452 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

What does Automatic Indexing make of this scenario?

Basically, it does nothing. Currently, Automatic Indexing does NOT support such function-based indexes, even with equality based predicates (as of at least version 19.5.0.0.0). If we look at the next Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 2
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

No such function-based index is ever created by Automatic Indexing:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID';

no rows selected

To improve the performance of this query, one has to manually create the necessary function-based index:

SQL> create index david_upper_name_i on david(upper(name));

Index created.

If we now re-run the query:

SQL> select name from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows  | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |  100K | 4199K |    3175 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |  100K | 4199K |    3175 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I | 40000 |       |       3 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
        369 bytes sent via SQL*Net to client
        384 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The query now uses the function-based index to significantly improve the performance of this query, with just 5 consistent gets.

Note however as with all function-based indexes, by default the estimated cardinality estimate and associated CBO costs are way off (100K rows are estimated, not the 1 row that is actually returned). This is due to the CBO having no real idea of the number and distribution of values coming out of the “black box” function-based predicate.

This is why Oracle automatically creates an hidden virtual column by which to store the necessary statistics associated to the function (in this case the SYS_NC00014$ column):

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$                                 NONE

But we need to first collect statistics on this hidden virtual column for the statistics to be populated:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', no_invalidate=> false, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1');

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$              9947366          0 HYBRID

Now the CBO has the necessary statistics by which to determine a much more accurate cardinality estimate for the function-based predicate and so potentially a more efficient execution plan:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |    1 |   200 |       4 (0) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |    1 |   200 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

With the virtual column statistics in place, the CBO now has the cardinality estimate of 1 and associated costs spot on, which is always a good thing.

This requirement to collect the necessary statistics on the associated virtual column created as a result of the function-based index to ensure the index is costed and used effectively is perhaps but one reason why function-based indexes are currently not supported by Automatic Indexing.

As always, this can always change in the future…

Comments»

1. Connor McDonald - February 4, 2021

True, but the general consensus nowadays is that if people are going to use expressions consistently on columns, they should create a virtual column on that expression…

And auto indexing CAN index virtual columns

Liked by 1 person

Richard Foote - February 4, 2021

Hi Connor

Hey, don’t giveaway the ending to a Part II. Although it’s not always an option depending on the app.

Like

2. Oracle 19c Automatic Indexing: Function-Based Indexes? Part II (If You Can See Me) | Richard Foote's Oracle Blog - February 5, 2021

[…] my previous post, I discussed how Automatic Indexing does not currently support creating an index based on a […]

Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: