Oracle 19c Automatic Indexing: Function-Based Indexes? (No Plan) February 4, 2021
Posted by Richard Foote in 19c, 19c New Features, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.trackback
I previously discussed how Automatic Indexing only currently supports Equality based predicates.
The question I have today is does Automatic Indexing support function-based indexes? Let’s take a look.
The below DAVID table has the key column NAME which is an effectively unique VARCHAR2 column:
SQL> create table david (id number, code number, name varchar2(42), more_stuff1 varchar2(42), more_stuff2 varchar2(42), more_stuff3 varchar2(42), more_stuff4 varchar2(42), more_stuff5 varchar2(42), more_stuff6 varchar2(42), more_stuff7 varchar2(42), more_stuff8 varchar2(42), more_stuff9 varchar2(42), more_stuff10 varchar2(42)); Table created. SQL> insert into david select rownum, mod(rownum, 10000)+1, 'David Bowie '|| rownum, 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null); PL/SQL procedure successfully completed.
If we look at the current details of the table columns:
SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID
We notice the same oddity of my previous post that all columns have histograms…
Let’s run the following query with an UPPER function-based predicate that returns only the one row:
SQL> select * from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2426813604 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K | 17M | 3350 (6) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | DAVID | 100K | 17M | 3350 (6) | 00:00:01 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(UPPER("NAME")='DAVID BOWIE 4242') filter(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 263469 consistent gets 263452 physical reads 0 redo size 1256 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
What does Automatic Indexing make of this scenario?
Basically, it does nothing. Currently, Automatic Indexing does NOT support such function-based indexes, even with equality based predicates (as of at least version 19.5.0.0.0). If we look at the next Automatic Indexing report:
SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 2 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
No such function-based index is ever created by Automatic Indexing:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID'; no rows selected
To improve the performance of this query, one has to manually create the necessary function-based index:
SQL> create index david_upper_name_i on david(upper(name)); Index created.
If we now re-run the query:
SQL> select name from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2675555529 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K | 4199K | 3175 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 100K | 4199K | 3175 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | DAVID_UPPER_NAME_I | 40000 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 369 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The query now uses the function-based index to significantly improve the performance of this query, with just 5 consistent gets.
Note however as with all function-based indexes, by default the estimated cardinality estimate and associated CBO costs are way off (100K rows are estimated, not the 1 row that is actually returned). This is due to the CBO having no real idea of the number and distribution of values coming out of the “black box” function-based predicate.
This is why Oracle automatically creates an hidden virtual column by which to store the necessary statistics associated to the function (in this case the SYS_NC00014$ column):
SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID SYS_NC00014$ NONE
But we need to first collect statistics on this hidden virtual column for the statistics to be populated:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', no_invalidate=> false, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1'); SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID SYS_NC00014$ 9947366 0 HYBRID
Now the CBO has the necessary statistics by which to determine a much more accurate cardinality estimate for the function-based predicate and so potentially a more efficient execution plan:
SQL> select * from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2675555529 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 200 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 1 | 200 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | DAVID_UPPER_NAME_I | 1 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1256 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
With the virtual column statistics in place, the CBO now has the cardinality estimate of 1 and associated costs spot on, which is always a good thing.
This requirement to collect the necessary statistics on the associated virtual column created as a result of the function-based index to ensure the index is costed and used effectively is perhaps but one reason why function-based indexes are currently not supported by Automatic Indexing.
As always, this can always change in the future…
True, but the general consensus nowadays is that if people are going to use expressions consistently on columns, they should create a virtual column on that expression…
And auto indexing CAN index virtual columns
LikeLiked by 1 person
Hi Connor
Hey, don’t giveaway the ending to a Part II. Although it’s not always an option depending on the app.
LikeLike
[…] my previous post, I discussed how Automatic Indexing does not currently support creating an index based on a […]
LikeLike