Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part III (“Loaded”) April 28, 2022
Posted by Richard Foote in 19c, Advanced Index Compression, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Column Order, Index Compression, Oracle, Oracle 21c, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Overloading.add a comment
In my previous two posts, I’ve discussed scenarios where Automatic Indexing (AI) does not currently created automatic indexes and you may need to manually create the necessary indexes.
In this post, I’ll discuss a third scenario where AI will create an index, but you may want to manually create an even better one…
I’ll start by creating a relatively “large” table, with 20+ columns:
SQL> create table bowie_overload (id number, code1 number, code2 number, stuff1 varchar2(42), stuff2 varchar2(42), stuff3 varchar2(42), stuff4 varchar2(42), stuff5 varchar2(42), stuff6 varchar2(42), stuff7 varchar2(42), stuff8 varchar2(42), stuff9 varchar2(42), stuff10 varchar2(42), stuff11 varchar2(42), stuff12 varchar2(42), stuff13 varchar2(42), stuff14 varchar2(42), stuff15 varchar2(42), stuff16 varchar2(42), stuff17 varchar2(42), stuff18 varchar2(42), stuff19 varchar2(42), stuff20 varchar2(42), name varchar2(42)); Table created. SQL> insert into bowie_overload select rownum, mod(rownum, 1000)+1, '42', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'The Spiders From Mars' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_OVERLOAD'); PL/SQL procedure successfully completed.
The main columns to note here are CODE1 which contains 1000 distinct values (and so is kinda selective on a 10M row table, but not spectacularly so, especially with a poor clustering factor) and CODE2 which always contains the same value of “42” (and so will compress wonderfully for maximum effect).
I’ll next run the following query a number of times:
SQL> select code1, code2 from bowie_overload where code1=42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1883860831 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 70000 | 74817 (1) | 00:00:03 | | * 1 | TABLE ACCESS STORAGE FULL | BOWIE_OVERLOAD | 10000 | 70000 | 74817 (1) | 00:00:03 | -------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=24) filter("CODE1"=24) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 869893 consistent gets 434670 physical reads 0 redo size 183890 bytes sent via SQL*Net to client 7378 bytes received via SQL*Net from client 668 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
Without an index, the CBO currently has no choice but to perform a FTS. An index on the CODE1 column would provide the necessary filtering to fetch and return the required rows.
BUT, if this query was important enough, we could improve things further by “Overloading” this index with the CODE2 column, so we could use the index exclusively to get all the necessary data, without having to access the table at all. Considering an index on just the CODE1 column would need to fetch a reasonable number of rows (10000) and would need to visit a substantial number of different table blocks due to its poor clustering, overloading the index in this scenario would substantially reduce the necessary workloads of this query.
So what does AI do in this scenario, is overloading an index considered?
If we look at the AI report:
GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 28-APR-2022 12:15:45 Activity end : 28-APR-2022 12:16:33 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 134.22 MB (134.22 MB / 0 B) Indexes dropped : 0 SQL statements verified : 2 SQL statements improved (improvement factor) : 2 (47.1x) SQL plan baselines created : 0 Overall improvement factor : 47.1x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ------------------------------------------------------------------------------- | BOWIE | BOWIE_OVERLOAD | SYS_AI_aat8t6ad0ux0h | CODE1 | B-TREE | NONE | ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : bh5cuyv8ga0bt SQL Text : select code1, code2 from bowie_overload where code1=42 Improvement Factor : 46.9x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 42619069 241844 CPU Time (s): 25387841 217676 Buffer Gets: 12148771 18499 Optimizer Cost: 74817 10021 Disk Reads: 6085380 9957 Direct Writes: 0 0 Rows Processed: 140000 10000 Executions: 14 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 1883860831 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 74817 | | | 1 | TABLE ACCESS FULL | BOWIE_OVERLOAD | 10000 | 70000 | 74817 | 00:00:03 | -------------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 2541132923 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9281 | 64967 | 10021 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD | 9281 | 64967 | 10021 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_aat8t6ad0ux0h | 10000 | | 18 | 00:00:01 | --------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE1"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We see that an automatic index on just the CODE1 column was created.
SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_OVERLOAD'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_aat8t6ad0ux0h YES VISIBLE ADVANCED LOW VALID 10000000 15363 10000000 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------- --------------- --------------- SYS_AI_aat8t6ad0ux0h CODE1 1
If we now re-run the query (noting in Oracle21c after you invalidate the current cursor):
SQL> select code1, code2 from bowie_overload where code1=42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2541132923 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 10000 | 70000 | 10021 (1)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD | 10000 | 70000 | 10021 (1)| 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_aat8t6ad0ux0h | 10000 | | 18 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE1"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 10021 consistent gets 0 physical reads 0 redo size 50890 bytes sent via SQL*Net to client 63 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
The query now uses the newly created automatic index.
BUT, at 10021 consistent gets, it’s still doing a substantial amount to work here.
If we manually create another index that overloads the only other column (CODE2) required in this query:
SQL> create index bowie_overload_code1_code2_i on bowie_overload(code1,code2) compress advanced low; Index created.
I’m using COMPRESS ADVANCED LOW as used by the automatic index, noting that CODE2 only contains the value “42” for all rows, making it particularly perfect for compression and a “best case” scenario when it comes to the minimal overheads potentially associated with overloading this index (I’m trying yo give AI every chance here):
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_OVERLOAD'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_aat8t6ad0ux0h YES NO VISIBLE ADVANCED LOW VALID 10000000 15363 10000000 BOWIE_OVERLOAD_CODE1_CODE2_I NO NO VISIBLE ADVANCED LOW VALID 10000000 15363 10000000 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- BOWIE_OVERLOAD_CODE1_CODE2_I CODE1 1 BOWIE_OVERLOAD_CODE1_CODE2_I CODE2 2 SYS_AI_aat8t6ad0ux0h CODE1 1
In fact, my manually created index is effectively the same size as the automatic index, with the same number (15363) of leaf blocks.
So I’m giving AI the best possible scenario in which it could potentially create an overloaded index.
But I’ve never been able to get AI to create overloaded indexes. Only columns in filtering predicates are considered for inclusion in automatic indexes.
If I now re-run my query again:
SQL> select code1, code2 from bowie_overload where code1=42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1161047960 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 70000 | 18 (0)| 00:00:01 | | * 1 | INDEX RANGE SCAN | BOWIE_OVERLOAD_CODE1_CODE2_I | 10000 | 70000 | 18 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("CODE1"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 21 consistent gets 0 physical reads 0 redo size 50890 bytes sent via SQL*Net to client 63 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
We notice the CBO now uses the manually created index without any table access path, as it can just use the index to access the necessary data.
The number of consistent gets as a result has reduced significantly, down to just 21, a fraction of the previous 10021 when the automatic index was used.
So the scenario an of overloaded index that could significantly reduce database resources, which is currently not supported by AI, is another example of where may want to manually create a necessary index.
As always, this may change in the future releases…
Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part II (“Ragazzo Solo, Ragazza Sola” April 27, 2022
Posted by Richard Foote in 19c, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Constraints, Exadata, Foreign Keys, Full Table Scans, Index Internals, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.1 comment so far
In my last post, I discussed how Automatic Indexing doesn’t create an automatic index in the scenario where the minimum or maximum of a column is required.
Another scenario when an automatic index is not created is when we hit issues associated with a missing index on a Foreign Key (FK) constraint.
As I’ve discussed many times previously, if you delete a parent record without an index on the dependant FK constraints, you hit a number of issues including having to perform a (potentially expensive and problematic) Full Table Scan (FTS) on the child tables and the associated locking problems.
To illustrate, I’ll first create a small parent table:
SQL> create table daddy (id number constraint daddy_pk primary key , name varchar2(42)); Table created. SQL> insert into daddy select rownum, 'David Bowie '|| rownum from dual connect by level <= 10000; 10000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DADDY'); PL/SQL procedure successfully completed.
And then a somewhat larger child table, with no index on the associated foreign key constraint:
SQL> create table kiddy (id number constraint kiddy_pk primary key , code1 number constraint daddy_fk references daddy(id), code2 number, code3 number, name varchar2(42)); Table created. SQL> insert into kiddy select rownum, mod(rownum,1000)+1000 , mod(rownum, 10000)+1, mod(rownum, 100000)+1, 'Ziggy Stardust '|| rownum from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'KIDDY'); PL/SQL procedure successfully completed.
If we delete a number of parent rows, for example:
SQL> delete from daddy where id = 101; 1 row deleted. Execution Plan ---------------------------------------------------------- Plan hash value: 1477800718 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------- | 0 | DELETE STATEMENT | | 1 | 4 | 1 (0) | 00:00:01 | | 1 | DELETE | DADDY | | | | | |* 2 | INDEX UNIQUE SCAN | DADDY_PK | 1 | 4 | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=101) Statistics ---------------------------------------------------------- 18 recursive calls 13 db block gets 117462 consistent gets 22292 physical reads 4645500 redo size 204 bytes sent via SQL*Net to client 41 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We notice that even though we only delete one row from a relatively small table, we perform a large number of consistent gets (117462) due to the necessary FTS on the child table, as Oracle is forced to check the table for any possible FK violations. Without an index on the child CODE1 column, Oracle has no choice but to perform the relatively expensive FTS.
Additionally, if we have an existing transaction of a child table (in Session 1):
SQL> insert into kiddy values (10000001,1042,1042,1042,'Iggy Pop'); 1 row created.
And then in another session attempt to delete a parent row (in Session 2):
SQL> delete from daddy where id = 112;
The delete hangs in a locked state due to the child transaction in Session 1. This can lead to further locking issues in other sessions (Session 3):
insert into kiddy values (10000002,1042,1042,1042,'Iggy Pop');
The FTS on the child table and these associated locks can all be avoided by having an index on the FK constraint, as the index can then be used to effectively police the constraint during such delete operations.
What does AI do in this scenario?
Currently, nothing.
I’ve been unable to ever get AI to create a usable automatic index in this scenario. In Oracle Database 19c, I’ve not been able to get an AI created at all. In Oracle Database 21c, the best I’ve seen has been a Unusable/Invisible AI:
SQL> select index_name, index_type, auto, constraint_index, visibility, status, num_rows from user_indexes where table_n ame='KIDDY'; INDEX_NAME INDEX_TYPE AUT CON VISIBILIT STATUS NUM_ROWS ------------------------------ --------------------------- --- --- --------- -------- ---------- KIDDY_PK NORMAL NO YES VISIBLE VALID 10000004 SYS_AI_31thttf8v6r35 NORMAL YES NO INVISIBLE UNUSABLE 10000004 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='KIDDY'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- KIDDY_PK ID 1 SYS_AI_31thttf8v6r35 CODE1 1
So you may need to manually create such an index on the FK constraint to improve performance and eliminate these locking issues:
SQL> create index kiddy_code1_i on kiddy(code1); Index created. SQL> delete from daddy where id = 142; 1 row deleted. Execution Plan ---------------------------------------------------------- Plan hash value: 1477800718 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------- | 0 | DELETE STATEMENT | | 1 | 4 | 1 (0) | 00:00:01 | | 1 | DELETE | DADDY | | | | | |* 2 | INDEX UNIQUE SCAN | DADDY_PK | 1 | 4 | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=142) Statistics ---------------------------------------------------------- 1 recursive calls 8 db block gets 2 consistent gets 2 physical reads 132 redo size 204 bytes sent via SQL*Net to client 41 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Consistent gets have dropped off massively (down to just 8) as Oracle can now use the index to avoid the FTS search on the child table. The associated locking issues are eliminated as well.
Note: As always, this AI behaviour can always change in the future…
Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part I (“Always Crashing In The Same Car”) April 26, 2022
Posted by Richard Foote in 19c, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, MAX, MIN, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.1 comment so far
As I’ve discussed previously, Oracle has increased the number of scenarios in which it will now create automatic indexes, such as with non-equality predicates and JSON expressions.
However, as of Oracle Database 21c, there are still a number of scenarios where an automatic index will NOT be created, even though an index might prove beneficial.
One such scenario is when the Min/Max of a column is required.
As I’ve discussed a number of times previously, Oracle can very efficiently use an index to determine either the Min or Max value of a column, by (hopefully) just visiting the first or last leaf block in an index. The INDEX FULL SCAN (MIN/MAX) execution plan path can be used explicitly for this purpose.
If I create a simple table as follows:
SQL> create table bowie_min (id number constraint bowie_min_pk primary key, code number, name varchar2(42)); Table created. SQL> insert into bowie_min select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_MIN'); PL/SQL procedure successfully completed.
And then run the following queries a number of times that return the Min and Max of the CODE column:
SQL> select min(code) from bowie_min; Execution Plan ---------------------------------------------------------- Plan hash value: 1068446691 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 6706 (2) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 5 | | | | 2 | TABLE ACCESS STORAGE FULL | BOWIE_MIN | 10M | 47M | 6706 (2) | 00:00:01 | ---------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 569 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select max(code) from bowie_min; Execution Plan ---------------------------------------------------------- Plan hash value: 1068446691 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 6706 (2) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 5 | | | | 2 | TABLE ACCESS STORAGE FULL | BOWIE_MIN | 10M | 47M | 6706 (2) | 00:00:01 | ---------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 569 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Currently, the CBO has no choice but to use a Full Table Scan (FTS) as there is currently no index on the CODE column.
So what does Automatic Indexing (AI) make of things?
Nothing.
Currently, AI will not create an index in this scenario, no matter how many times I execute these queries.
If we look at the indexes on the table after a significant period of time after running these queries:
SQL> select index_name, auto from user_indexes where table_name='BOWIE_MIN'; INDEX_NAME AUT ------------ --- BOWIE_MIN_PK NO
No Automatic Indexes. To improve the performance of these queries, we currently have to manually create the associated index:
SQL> create index bowie_min_code_i on bowie_min(code); Index created.
If we now re-run these queries and look at the execution plan:
SQL> select min(code) from bowie_min; Execution Plan ---------------------------------------------------------- Plan hash value: 252811132 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 | 3 (0) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 5 | | | | 2 | INDEX FULL SCAN (MIN/MAX) | BOWIE_MIN_CODE_I | 1 | 5 | 3 (0) | 00:00:01 | ----------------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 569 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can see that the CBO is now indeed using the index to return the Min/Max values with a vastly reduced number of consistent gets (down to just 3 from the previous 38538).
However, a key point here is that Automatic Indexes only works on an Exadata platform and Exadata has various smarts that potentially makes accessing data via a “FTS” in this manner much more efficient than in non-Exadata environments.
Oracle may well take the position that getting Min/Max data on a Exadata is potentially efficient enough and doesn’t on its own warrant the creation of an index.
More on this in future posts…
Automatic Indexing: Deferred Invalidations (“The Post War Dream”) April 19, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Deferred Invalidation, Exadata, Function Based Indexes, Index Access Path, Index Internals, JSON, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Richard's Blog.2 comments
In my previous post on how JSON expressions can now be automatically indexed, I mentioned there was an outstanding issue with the associated CBO execution plan, immediately post the creation of the automatic index:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20000 | 12M | 1524 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 20000 | 12M | 1524 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 8000 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 234168 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
If we look at the number of recursive calls, we notice that it remains at 0. If we look at both the number of consistent gets (234168) and physical reads (200279), they both remain very high and identical to that of the previous Full Table Scan plan.
Basically, although autotrace suggests the newly created automatic index is being used, in fact the previous Full Table Scan plan is still being invoked. (Note: this of course is one of the dangers of the autotrace plan, in that it might not display the actual plan being invoked).
So what’s going on here?
The Oracle Database 21c New Features Guide makes the following point: “an enhancement has been introduced to reduce the overhead of cursor invalidations when a new automatic index is created”.
Oracle 12.2 introduced a new feature in which one can now defer the invalidation of dependent SQL cursors when an index is created or modified. I’ve of course discussed this previously in this 12.2 Index Deferred Invalidation post.
When an automatic index is created in 21c, the current SQL cursors are NOT invalidated (to reduce the overhead of having to potentially reparse of large number of current SQL cursors). However, this means that currently inefficient SQL statements will keep their existing sub-optimal execution plans post the creation of newly created automatic indexes, until the existing SQL cursors aged out.
At which point, the new CBO plan using the automatic index will actually be invoked:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 671 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 1 | 671 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 30 recursive calls 0 db block gets 46 consistent gets 11 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
So just be aware in Oracle Database 21c that your beautifully created automatic indexes may not actually get used as desired for a period of time…
Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing)” April 14, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Column Statistics, High Frequency Statistics Collection.2 comments
In my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes were created:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20000 | 12M | 1524 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 20000 | 12M | 1524 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 8000 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 234168 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The first issue is in relation to the Cost and the estimated cardinality (Rows) of the generated plan.
The execution plan has a Cost of 1524 but most importantly, an estimate cardinality of 20,000 rows. As only 1 row is actually returned, this row estimate is way way off and so therefore are the associated costs. This could potentially result in an inefficient plan and with the index not being used by the CBO.
The 20,000 row estimate comes from it being 1% of the number of rows (2 million) in the table. As I’ve discussed previously (as in this rather humourous post), the issue here is that the CBO has no idea what the expected cardinality might be, as the output from the JSON expression is effectively a black box.
Oracle generates virtual columns for this purpose, to capture column statistics that gives the CBO an accurate idea on the selectivity of expression based predicates.
But, if we look at the column statistics after the generation of the automatic indexes:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON'; TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR ---------- -------------------------------------------------- ------------ --- BOWIE_JSON ID 1996800 NO BOWIE_JSON BOWIE_DATE 1 NO BOWIE_JSON BOWIE_ORDER 0 NO BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D 1 YES BOWIE_JSON SYS_NC00005$ YES BOWIE_JSON SYS_NC00006$ YES
We notice that the two virtual columns generated for the JSON based expressions have no statistics. This is because we have yet to collect new statistics (or statistics specifically on hidden columns) since the creation of the automatic indexes. As a result, the CBO still has no idea on how many rows are estimated to come from the JSON based expressions and so has to resort to the (entirely wrong) 1% estimate.
The good news with Exadata environments (which of course includes the Autonomous Database environments), is that Oracle has the High Frequency Statistics Collection capability, which will automatically collect these missing statistics after a small (configurable) period of time. I’ve previously discussed High Frequency Statistics Collection here.
So if I just wait approximately 15 minutes in my “Exadata” environment and check out the columns statistics again:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON'; TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR ---------- -------------------------------------------------- ------------ --- BOWIE_JSON ID 1996800 NO BOWIE_JSON BOWIE_DATE 1 NO BOWIE_JSON BOWIE_ORDER 0 NO BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D 1 YES BOWIE_JSON SYS_NC00005$ 2000000 YES BOWIE_JSON SYS_NC00006$ 1996800 YES
We can see that the missing statistics have now been populated and the CBO can now accurately determine that these virtual columns are effectively unique.
If we now re-run the queries again, e.g.:
SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242; Execution Plan ---------------------------------------------------------- Plan hash value: 1921179906 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 669 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 1 | 669 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_gpdkwzugdn055 | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=4242) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 233150 consistent gets 200279 physical reads 0 redo size 1599 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can see that the CBO has now correctly estimated that just 1 row is to be returned and the associated CBO cost has reduced to just 4 (down from 1524) as a result.
So if you create a function-based index, make sure the generated virtual column (whether created automatically or if manually generated before the associated index) has the necessary statistics.
In the upcoming days, I’ll discuss the remaining issue associated with this plan…
Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022
Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.1 comment so far
When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.
However, the Oracle Database 21c doco mentions:
“Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)“.
So on my (admittedly dodgy) “Exadata” VM, I thought I’ll check out how AI now indeed deals with JSON expressions.
I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):
SQL> CREATE TABLE bowie_json (id number, bowie_date date, bowie_order JSON); SQL> insert into bowie_json select rownum, sysdate, '{"PONumber" : ' || rownum || ', "Reference" : "2022' || rownum || 'DBOWIE", "Requestor" : "David Bowie", "User" : "DBOWIE", "CostCenter" : "A42", "ShippingInstructions" : {"name" : "David Bowie", "Address": {"street" : "42 Ziggy Street", "city" : "Canberra", "state" : "ACT", "zipCode" : 2601, "country" : "Australia"}, "Phone" : [{"type" : "Office", "number" : "417-555-7777"}, {"type" : "Mobile", "number" : "417-555-1234"}]}, "Special Instructions" : null, "AllowPartialShipment" : true, "LineItems" : [{"ItemNumber" : 1, "Part" : {"Description" : "Hunky Dory", "UnitPrice" : 10.95}, "Quantity" : 5.0}, {"ItemNumber" : 2, "Part" : {"Description" : "Pin-Ups", "UnitPrice" : 10.95}, "Quantity" : 3.0}]}' from dual connect by level <= 2000000; 2000000 rows created. SQL> commit; Commit complete SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON'); PL/SQL procedure successfully completed.
As always, it’s important to ensure the table has statistics, as AI does not work properly without them.
I then run a number of SQL statements, with different JSON expression based predicates, including:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242; SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42; Execution Plan ---------------------------------------------------------- Plan hash value: 1196930810 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20000 | 12M | 34476 (1) | 00:00:02 | |* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 | 12M | 34476 (1) | 00:00:02 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 259127 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
They all return just the one row, but must currently use a Full Table Scan with no indexes present.
So what does AI make of things?
The first thing to note is that running the AI last activity report generates the following error:
SQL> select dbms_auto_index.report_last_activity() report from dual; ERROR: ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676 ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89 ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167 ORA-06512: at line 1 no rows selected
If we look at the indexes now present with the table:
SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_JSON'; INDEX_NAME INDEX_TYPE AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------- ------------------------- --- --------- -------- ---------- ----------- ----------------- SYS_IL0000081096C00003$$ LOB NO VISIBLE VALID SYS_AI_ayvj257jd93cv FUNCTION-BASED NORMAL YES VISIBLE VALID 2000000 5141 380000 SYS_AI_gpdkwzugdn055 FUNCTION-BASED NORMAL YES VISIBLE VALID 2000000 4596 200000 SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON'; INDEX_NAME COLUMN_EXPRESSION ------------------------- -------------------------------------------------------------------------------- SYS_AI_ayvj257jd93cv JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR OR ON ERROR NULL ON EMPTY) SYS_AI_gpdkwzugdn055 JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER ROR NULL ON EMPTY)
We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.
If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20000 | 12M | 1524 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 20000 | 12M | 1524 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 8000 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 234168 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is return, this is as one would hope.
But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.
I’ll leave it to the discernible reader to spot these issues, before I cover them in Part II in the coming days…