Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022
Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.trackback
When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.
However, the Oracle Database 21c doco mentions:
“Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)“.
So on my (admittedly dodgy) “Exadata” VM, I thought I’ll check out how AI now indeed deals with JSON expressions.
I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):
SQL> CREATE TABLE bowie_json (id number, bowie_date date, bowie_order JSON); SQL> insert into bowie_json select rownum, sysdate, '{"PONumber" : ' || rownum || ', "Reference" : "2022' || rownum || 'DBOWIE", "Requestor" : "David Bowie", "User" : "DBOWIE", "CostCenter" : "A42", "ShippingInstructions" : {"name" : "David Bowie", "Address": {"street" : "42 Ziggy Street", "city" : "Canberra", "state" : "ACT", "zipCode" : 2601, "country" : "Australia"}, "Phone" : [{"type" : "Office", "number" : "417-555-7777"}, {"type" : "Mobile", "number" : "417-555-1234"}]}, "Special Instructions" : null, "AllowPartialShipment" : true, "LineItems" : [{"ItemNumber" : 1, "Part" : {"Description" : "Hunky Dory", "UnitPrice" : 10.95}, "Quantity" : 5.0}, {"ItemNumber" : 2, "Part" : {"Description" : "Pin-Ups", "UnitPrice" : 10.95}, "Quantity" : 3.0}]}' from dual connect by level <= 2000000; 2000000 rows created. SQL> commit; Commit complete SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON'); PL/SQL procedure successfully completed.
As always, it’s important to ensure the table has statistics, as AI does not work properly without them.
I then run a number of SQL statements, with different JSON expression based predicates, including:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242; SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42; Execution Plan ---------------------------------------------------------- Plan hash value: 1196930810 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 12M | 34476 (1) | 00:00:02 | |* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 | 12M | 34476 (1) | 00:00:02 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 259127 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
They all return just the one row, but must currently use a Full Table Scan with no indexes present.
So what does AI make of things?
The first thing to note is that running the AI last activity report generates the following error:
SQL> select dbms_auto_index.report_last_activity() report from dual; ERROR: ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167 ORA-06512: at line 1 no rows selected
If we look at the indexes now present with the table:
SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_JSON'; INDEX_NAME INDEX_TYPE AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------- ------------------------- --- --------- -------- ---------- ----------- ----------------- SYS_IL0000081096C00003$$ LOB NO VISIBLE VALID SYS_AI_ayvj257jd93cv FUNCTION-BASED NORMAL YES VISIBLE VALID 2000000 5141 380000 SYS_AI_gpdkwzugdn055 FUNCTION-BASED NORMAL YES VISIBLE VALID 2000000 4596 200000 SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON'; INDEX_NAME COLUMN_EXPRESSION ------------------------- -------------------------------------------------------------------------------- SYS_AI_ayvj257jd93cv JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR OR ON ERROR NULL ON EMPTY) SYS_AI_gpdkwzugdn055 JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER ROR NULL ON EMPTY)
We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.
If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20000 | 12M | 1524 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 20000 | 12M | 1524 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 8000 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 234168 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is return, this is as one would hope.
But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.
I’ll leave it to the discernible reader to spot these issues, before I cover them in Part II in the coming days…
[…] my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes […]
LikeLiked by 1 person