jump to navigation

Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022

Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.
1 comment so far

When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.

However, the Oracle Database 21c doco mentions:

Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)“.

So on my (admittedly dodgy) “Exadata” VM, I thought I’ll check out how AI now indeed deals with JSON expressions.

I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):

SQL> CREATE TABLE bowie_json
       (id number,
        bowie_date date,
        bowie_order JSON);

SQL> insert into bowie_json
     select
     rownum,
     sysdate,
     '{"PONumber" : ' || rownum || ',
       "Reference" : "2022' || rownum || 'DBOWIE",
       "Requestor" : "David Bowie",
       "User" : "DBOWIE",
       "CostCenter" : "A42",
       "ShippingInstructions" : {"name" : "David Bowie",
                                 "Address": {"street" : "42 Ziggy Street",
                                             "city" : "Canberra",
                                              "state" : "ACT",
                                              "zipCode" : 2601,
                                              "country" : "Australia"},
                                 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
                                            {"type" : "Mobile", "number" : "417-555-1234"}]},
       "Special Instructions" : null,
       "AllowPartialShipment" : true,
       "LineItems" : [{"ItemNumber" : 1,
                       "Part" : {"Description" : "Hunky Dory",
                                 "UnitPrice" : 10.95},
                                  "Quantity" : 5.0},
                      {"ItemNumber" : 2,
                       "Part" : {"Description" : "Pin-Ups",
                                 "UnitPrice" : 10.95},
                                 "Quantity" : 3.0}]}'
from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON');

PL/SQL procedure successfully completed.

As always, it’s important to ensure the table has statistics, as AI does not work properly without them.

I then run a number of SQL statements, with different JSON expression based predicates, including:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242;

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42;

Execution Plan
----------------------------------------------------------
Plan hash value: 1196930810

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            | 20000 |   12M |  34476 (1) | 00:00:02  |
|* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 |   12M |  34476 (1) | 00:00:02  |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON
           , '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     259127 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

They all return just the one row, but must currently use a Full Table Scan with no indexes present.

So what does AI make of things?

The first thing to note is that running the AI last activity report generates the following error:

SQL> select dbms_auto_index.report_last_activity() report from dual;
ERROR:
ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167
ORA-06512: at line 1

no rows selected

If we look at the indexes now present with the table:

SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes
where table_name='BOWIE_JSON';

INDEX_NAME                INDEX_TYPE                AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- ------------------------- --- --------- -------- ---------- ----------- -----------------
SYS_IL0000081096C00003$$  LOB                       NO  VISIBLE   VALID
SYS_AI_ayvj257jd93cv      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        5141            380000
SYS_AI_gpdkwzugdn055      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        4596            200000

SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON';

INDEX_NAME                COLUMN_EXPRESSION
------------------------- --------------------------------------------------------------------------------
SYS_AI_ayvj257jd93cv      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR
OR ON ERROR NULL ON EMPTY)

SYS_AI_gpdkwzugdn055      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER
ROR NULL ON EMPTY)

We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.

If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |  8000 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is return, this is as one would hope.

But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.

I’ll leave it to the discernible reader to spot these issues, before I cover them in Part II in the coming days…

Oracle 19c Automatic Indexing: Function-Based Indexes? Part II (If You Can See Me) February 5, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.
1 comment so far

In my previous post, I discussed how Automatic Indexing does not currently support creating an index based on a function or expression predicate, even if it’s an equality predicate. You must manually create the associated function-based index.

However, if you have access to the application, there’s a better strategy when frequently searching on a function-based predicate. That’s to create a Virtual Column and use this column in your searching criteria (as mentioned by Connor McDonald in this comment).

To illustrate, I’m going to drop the previously manually created function-based index and hence the associated hidden virtual column, as Oracle quite rightly doesn’t allow you to have two virtual columns based on the same expression in the same table.

SQL> drop index david_upper_name_i;

Index dropped.

Since Oracle 11g, Oracle has supported the use of Visible Virtual Columns, a column that doesn’t physically exist, but defines a function/expression that can be easily accessed and populated when queried.

I’ll next create a Virtual Column called UPPER_NAME that is defined not based on a Data Type, but on the result on the UPPER function on the previously defined NAME column:

SQL> alter table david add (upper_name as (upper(name)));

Table altered.

Regardless of size of table, this column is added virtually instantly (pun fully intended), as no data is physically stored in the table itself. I view it (yep, another pun) as a “mini-view”, that can be used to hide complexity from the developer, with the actual data derived at run-time when the column is accessed in an SQL.

After I generate fresh statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null);

PL/SQL procedure successfully completed.

SQL> select column_name, hidden_column, virtual_column, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          HID VIR NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- --- --- ------------ ---------- ---------------
NAME                 NO  NO      10000000          0 HYBRID
MORE_STUFF9          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF8          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF7          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF6          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF5          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF4          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF3          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF2          NO  NO             1  .00000005 FREQUENCY
MORE_STUFF10         NO  NO             1  .00000005 FREQUENCY
MORE_STUFF1          NO  NO             1  .00000005 FREQUENCY
ID                   NO  NO      10000000          0 HYBRID
CODE                 NO  NO         10000      .0001 HYBRID
UPPER_NAME           NO YES      10000000          0 HYBRID

Note how the UPPER_NAME virtual column is NOT hidden and now has up to date statistics.

We can now run this simplified query based on the new UPPER_NAME column, which does not need to include the potentially complex function expression:

SQL> select * from david where upper_name='DAVID BOWIE 42';

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2426813604

-----------------------------------------------------------------------------------
| Id | Operation                 | Name  | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |       |    1 |   200 |    3349 (6) | 00:00:01  | 
|* 1 | TABLE ACCESS STORAGE FULL | DAVID |    1 |   200 |    3349 (6) | 00:00:01  |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("UPPER_NAME"='DAVID BOWIE 42')
    filter("UPPER_NAME"='DAVID BOWIE 42')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     263469 consistent gets
     263452 physical reads
          0 redo size
       1328 bytes sent via SQL*Net to client
        375 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

If we look at portions of the subsequent Automatic Indexing report:

 

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 360.71 MB (360.71 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 2 (263476.8x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 263476.8x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Owner | Table | Index                | Key        | Type   | Properties |
---------------------------------------------------------------------------
| BOWIE | DAVID | SYS_AI_4k4mkgkw049ht | UPPER_NAME | B-TREE | NONE       |
---------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 7tfqh3pu526mt
SQL Text            : select * from david where upper_name='DAVID BOWIE 42'
Improvement Factor  : 263484.7x

Execution Statistics:
-----------------------------
                        Original Plan                Auto Index Plan
                        ---------------------------- ----------------------------
Elapsed Time (s):       1471249                      1414
CPU Time (s):           300584                       986
Buffer Gets:            3161816                      4
Optimizer Cost:         3349                         4
Disk Reads:             3161432                      3
Direct Writes:          0                            0
Rows Processed:         12                           1
Executions:             12                           1

PLANS SECTION
--------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value : 2426813604

-----------------------------------------------------------------------------
| Id | Operation                 | Name  | Rows | Bytes | Cost | Time       |
-----------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |       |      |       | 3349 |            |
|  1 | TABLE ACCESS STORAGE FULL | DAVID |    1 |   200 | 3349 | 00:00:01   |
-----------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no
- cardinality_feedback = yes

- With Auto Indexes
-----------------------------
Plan Hash Value : 1447691372

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                 | Rows | Bytes | Cost | Time      |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    1 |   200 |    4 | 00:00:01  |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | DAVID                |    1 |   200 |    4 | 00:00:01  |
| * 2 |  INDEX RANGE SCAN                    | SYS_AI_4k4mkgkw049ht |    1 |       |    3 | 00:00:01  |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("UPPER_NAME"='DAVID BOWIE 42')

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We see from the report that Automatic Indexing has now created the associated, implicitly created function-based index (SYS_AI_4k4mkgkw049ht) based on the virtual UPPER_NAME column:

SQL> select index_name, index_type, auto, constraint_index, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='DAVID';

INDEX_NAME           INDEX_TYPE                  AUT CON VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --------------------------- --- --- --------- -------- ---------- ----------- -----------------
SYS_AI_4k4mkgkw049ht FUNCTION-BASED NORMAL       YES NO  VISIBLE   VALID      10000000       43104           2136839

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='DAVID' order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_4k4mkgkw049ht UPPER_NAME                         1

 

If we now re-run the SQL query:

SQL> select * from david where upper_name='DAVID BOWIE 4242';

1 row selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1447691372

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                      |    1 |   200 |       4 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | DAVID                |    1 |   200 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_4k4mkgkw049ht |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("UPPER_NAME"='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
       1334 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The CBO now uses the new Automatic Index to significantly improve the performance of the query.

So not only is using a user defined Virtual Column a cleaner solution with respect to the frequent use of a function-based expressions, but has the added advantage of being supported with Automatic Indexing.

Oracle 19c Automatic Indexing: Function-Based Indexes? (No Plan) February 4, 2021

Posted by Richard Foote in 19c, 19c New Features, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.
3 comments

I previously discussed how Automatic Indexing only currently supports Equality based predicates.

The question I have today is does Automatic Indexing support function-based indexes? Let’s take a look.

The below DAVID table has the key column NAME which is an effectively unique VARCHAR2 column:

SQL> create table david (id number, code number, name varchar2(42), more_stuff1 varchar2(42), more_stuff2 varchar2(42), more_stuff3 varchar2(42), more_stuff4 varchar2(42), more_stuff5 varchar2(42), more_stuff6 varchar2(42), more_stuff7 varchar2(42), more_stuff8 varchar2(42), more_stuff9 varchar2(42), more_stuff10 varchar2(42));

Table created.

SQL> insert into david select rownum, mod(rownum, 10000)+1, 'David Bowie '|| rownum, 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null);

PL/SQL procedure successfully completed.

If we look at the current details of the table columns:

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID

We notice the same oddity of my previous post that all columns have histograms…

Let’s run the following query with an UPPER function-based predicate that returns only the one row:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2426813604

-----------------------------------------------------------------------------------
| Id | Operation                 | Name  | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |       | 100K |   17M |    3350 (6) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | DAVID | 100K |   17M |    3350 (6) | 00:00:01  |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage(UPPER("NAME")='DAVID BOWIE 4242')
    filter(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     263469 consistent gets
     263452 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

What does Automatic Indexing make of this scenario?

Basically, it does nothing. Currently, Automatic Indexing does NOT support such function-based indexes, even with equality based predicates (as of at least version 19.5.0.0.0). If we look at the next Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 2
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

No such function-based index is ever created by Automatic Indexing:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID';

no rows selected

To improve the performance of this query, one has to manually create the necessary function-based index:

SQL> create index david_upper_name_i on david(upper(name));

Index created.

If we now re-run the query:

SQL> select name from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows  | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |  100K | 4199K |    3175 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |  100K | 4199K |    3175 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I | 40000 |       |       3 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
        369 bytes sent via SQL*Net to client
        384 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The query now uses the function-based index to significantly improve the performance of this query, with just 5 consistent gets.

Note however as with all function-based indexes, by default the estimated cardinality estimate and associated CBO costs are way off (100K rows are estimated, not the 1 row that is actually returned). This is due to the CBO having no real idea of the number and distribution of values coming out of the “black box” function-based predicate.

This is why Oracle automatically creates an hidden virtual column by which to store the necessary statistics associated to the function (in this case the SYS_NC00014$ column):

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$                                 NONE

But we need to first collect statistics on this hidden virtual column for the statistics to be populated:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', no_invalidate=> false, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1');

SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
NAME                     10000000          0 HYBRID
MORE_STUFF9                     1  .00000005 FREQUENCY
MORE_STUFF8                     1  .00000005 FREQUENCY
MORE_STUFF7                     1  .00000005 FREQUENCY
MORE_STUFF6                     1  .00000005 FREQUENCY
MORE_STUFF5                     1  .00000005 FREQUENCY
MORE_STUFF4                     1  .00000005 FREQUENCY
MORE_STUFF3                     1  .00000005 FREQUENCY
MORE_STUFF2                     1  .00000005 FREQUENCY
MORE_STUFF10                    1  .00000005 FREQUENCY
MORE_STUFF1                     1  .00000005 FREQUENCY
ID                       10000000          0 HYBRID
CODE                        10000      .0001 HYBRID
SYS_NC00014$              9947366          0 HYBRID

Now the CBO has the necessary statistics by which to determine a much more accurate cardinality estimate for the function-based predicate and so potentially a more efficient execution plan:

SQL> select * from david where upper(name) = 'DAVID BOWIE 4242';

Execution Plan
----------------------------------------------------------
Plan hash value: 2675555529

----------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name               | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                    |    1 |   200 |       4 (0) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID              |    1 |   200 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | DAVID_UPPER_NAME_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(UPPER("NAME")='DAVID BOWIE 4242')

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
       1256 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

With the virtual column statistics in place, the CBO now has the cardinality estimate of 1 and associated costs spot on, which is always a good thing.

This requirement to collect the necessary statistics on the associated virtual column created as a result of the function-based index to ensure the index is costed and used effectively is perhaps but one reason why function-based indexes are currently not supported by Automatic Indexing.

As always, this can always change in the future…

11g Virtual Columns and Fast Refreshable Materialized Views (What In The World) November 24, 2010

Posted by Richard Foote in 11g, 11g New features, Function Based Indexes, Oracle Bugs, Virtual Columns.
15 comments

Previous to Oracle 11g Rel 2, two very common and useful features previously worked well together, they being fast refreshable materialized views and the introduction of virtual columns due to the creation of function-based indexes.
 
To illustrate, we create and populate a little demo table: 

 
SQL> create table bowie (a number, b number, c number);
 
Table created.
 
SQL> insert into bowie select rownum, mod(rownum,10), mod(rownum,100) from dual connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> alter table bowie add primary key (a);
 
Table altered.

 
We now create a simple little function-based index:
 

 
SQL> create index bowie_func_i on bowie(b+c);
 
Index created.

 
 
If we look at the columns in the table via DBA_TAB_COLS:
 

 
SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE';
 
COLUMN_NAME  DATA_DEFAULT VIR HID
------------ ------------ --- ---
SYS_NC00004$ "B"+"C"      YES YES
C                         NO  NO
B                         NO  NO
A                         NO  NO

 

We notice Oracle has introduced a new, hidden virtual column (SYS_NC00004$), required to store statistics for use by the Cost Based Optimizer.
 
Next we create a materialized view log on this table and a fast refreshable materialized view:
 

 
SQL> create materialized view log on bowie WITH PRIMARY KEY,SEQUENCE, ROWID (b,c) INCLUDING NEW VALUES;
 
Materialized view log created.
 
SQL> create materialized view bowie_mv
  2  build immediate
  3  refresh fast
  4  with primary key
  5  enable query rewrite
  6  as
  7  select b, count(*) from bowie group by b;
 
Materialized view created.

 
 
Collect a few statistics and we note the Materialized View does indeed get used during a query rewrite scenario:
 

 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1')
 
PL/SQL procedure successfully completed.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_MV', estimate_percent=>null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1')
 
PL/SQL procedure successfully completed. 

SQL> select b, count(*) from bowie having b > 3 group by b;
 
         B   COUNT(*)
---------- ----------
         6      10000
         4      10000
         5      10000
         8      10000
         7      10000
         9      10000
 
6 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 593592962
 
-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     7 |    42 |     2   (0)| 00:00:01 |
|*  1 |  MAT_VIEW REWRITE ACCESS FULL| BOWIE_MV |     7 |    42 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("BOWIE_MV"."B">3)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        538  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

 
 

And indeed the materialized view is fast refreshable:
 

 
SQL> insert into bowie values (100001, 5, 42);
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_mview.refresh('BOWIE_MV', 'F');
 
PL/SQL procedure successfully completed.
 
 
SQL> select b, count(*) from bowie having b > 3 group by b;
 
         B   COUNT(*)
---------- ----------
         6      10000
         4      10000
         5      10001
         8      10000
         7      10000
         9      10000
 
6 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 593592962
 
-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     7 |    42 |     2   (0)| 00:00:01 |
|*  1 |  MAT_VIEW REWRITE ACCESS FULL| BOWIE_MV |     7 |    42 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("BOWIE_MV"."B">3)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        546  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

 
Notice how the materialized view does indeed displayed the correct updated information via the query rewrite operation . So the materialized view behaved and worked as expected even though the underlining master table has a virtual column due to the creation of the function-based index (note that QUERY_REWRITE_INTEGRITY is set to STALE_TOLERATED)
 
Unfortunately, things go off the rails somewhat since Oracle 11g Rel 2 when a virtual column is introduced due to one of the 11g new features. For example, I now collect some Extended Statistics on this table:
 

 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=> 'FOR COLUMNS (A,B,C) SIZE 254');
 
PL/SQL procedure successfully completed.
 
SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE';
 
COLUMN_NAME                    DATA_DEFAULT                      VIR HID
------------------------------ --------------------------------- --- ---
SYS_STUM4KJU$CCICS9C1UJ6UWC4YP SYS_OP_COMBINED_HASH("A","B","C") YES YES
SYS_NC00004$                   "B"+"C"                           YES YES
C                                                                NO  NO
B                                                                NO  NO
A                                                                NO  NO

 
 
Notice how extended statistics has resulted in another hidden virtual column (SYS_STUM4KJU$CCICS9C1UJ6UWC4YP) being created to store the resultant statistics.
 
However, if now attempt to perform a fast refresh on the Materialized View:

 
SQL> insert into bowie values (100002, 5, 42);
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_mview.refresh('BOWIE_MV', 'F');
BEGIN dbms_mview.refresh('BOWIE_MV', 'F'); END;
 
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00904: "MAS$"."SYS_STUM4KJU$CCICS9C1UJ6UWC4YP": invalid identifier
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2558
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2771
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2740
ORA-06512: at line 1

 

We get an error, complaining about the existence of this new virtual column.
 
If we attempted to drop and re-create the materialized view:

 
SQL> drop materialized view bowie_mv;
 
Materialized view dropped.
 
SQL> create materialized view bowie_mv
  2  build immediate
  3  refresh fast
  4  with primary key
  5  enable query rewrite
  6  as
  7  select b, count(*) from bowie group by b;
select b, count(*) from bowie group by b
                        *
ERROR at line 7:
ORA-12033: cannot use filter columns from materialized view log on "BOWIE"."BOWIE"

 

It fails, complaining that the materialized view log is somehow missing a filter column (which it isn’t).  We get exactly the same set of issues if we add a visible virtual column via this new 11g capability: 
 
 

 
SQL> create table bowie2 (a number, b number, c number, d as (a+b+c));
 
Table created.
 
SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE2';
 
COLUMN_NAME  DATA_DEFAULT VIR HID
------------ ------------ --- ---
D            "A"+"B"+"C"  YES NO
C                         NO  NO
B                         NO  NO
A                         NO  NO
 
SQL> insert into bowie2 (a,b,c) select rownum, mod(rownum,10), mod(rownum,100) from dual connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> alter table bowie2 add primary key (a);
 
Table altered.
 
SQL> create materialized view log on bowie2 WITH PRIMARY KEY,SEQUENCE, ROWID (b,c) INCLUDING NEW VALUES;
 
Materialized view log created.
 
SQL> create materialized view bowie2_mv
  2  build immediate
  3  refresh fast
  4  with primary key
  5  enable query rewrite
  6  as
  7  select b, count(*) from bowie2 group by b;
select b, count(*) from bowie2 group by b
                        *
ERROR at line 7:
ORA-12033: cannot use filter columns from materialized view log on "BOWIE"."BOWIE2"

 
Extended statistics and visible virtual columns are both potentially extremely useful new features introduced in 11g but unfortunately both can not be implemented on any table that needs to be fast refreshable within a complex materialized view.

I raised this issue with Oracle Support who have raised bug 10281402 as a result as it occurs in both 11.2.0.1 and 11.2.0.2 on various platforms I’ve tested.