Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.add a comment
In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.
So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?
I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:
SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD'); PL/SQL procedure successfully completed. SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1'); PL/SQL procedure successfully completed.
So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.
I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 564520720 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 856 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 (11) | 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 64424 consistent gets 64413 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- ----------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ----------------------------------------------------------------------------- | BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE | ----------------------------------------------------------------------------- ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 7wag3gbk0b3tm SQL Text : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021' Improvement Factor : 64442.3x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 568513 2771 CPU Time (s): 275534 1874 Buffer Gets: 1031078 406 Optimizer Cost: 856 405 Disk Reads: 1030609 3 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 16 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 564520720 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 856 | | | 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 2703636439 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 405 | 00:00:01 | | * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 405 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 403 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) * 2 - access("CODE"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
If we look at the definitions of all indexes currently on these tables:
SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_dp2t0j12zux49 YES INVISIBLE ADVANCED LOW UNUSABLE 10000000 21702 4161898 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_dp2t0j12zux49 CODE 1 SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD1'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_96snkmu4sk44g YES VISIBLE ADVANCED LOW VALID 10000000 15400 9969473 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_96snkmu4sk44g CODE 1
In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.
However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.
The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.
So if we re-run the first query:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
It continues to use a Full Table Scan.
If we re-run the second query:
SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2703636439 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 41 | 415 (0) | 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 415 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 412 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 406 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).
BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:
SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low; Index created. SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3366491378 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | PINK_FLOYD1_CODE_CREATE_DATE_I | 1 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "CREATE_DATE" IS NOT NULL) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Performance improves significantly further, by reducing Consistent Gets down to just 3.
So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:
A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or
A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…
Oracle 19c Automatic Indexing: Non-Equality Predicates Part I (Lucy Can’t Dance) January 14, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cloud, Oracle Indexes, Oracle19c.2 comments
I’ve been waiting a while before posting a series on the various limitations associated with Automatic Indexing, in order to see how the feature matures over time.
The following have all been re-tested post 1 January 2021 on the Autonomous ATP Database Cloud service, using Oracle Database version 19.5.0.0.0.
In the Oracle Documentation (including version 21c), the only limitations with regard Automatic Indexing listed are the following:
- Auto indexes are local B-tree indexes.
- Auto indexes can be created for partitioned as well as non-partitioned tables.
- Auto indexes cannot be created for temporary tables.
Well, as I discussed in the previous series on Automatic Indexing on Partitioned tables, we already saw how Oracle can actually also create Non-Partitioned (Global) indexes. So the limitation on Automatic Indexes being “local” indexes is not actually correct, even with 19c.
But are there other limitations that are not officially documented?
If you look at every example I’ve used previously with regard Automatic Indexing, they all feature Equality predicates. In the following examples, I’m going to run a series on Range Scan predicates that heavily filter and would benefit greatly from an index.
I first create a simple table with 10M rows:
SQL> create table ziggy1 (id number, code number, name varchar2(42)); Table created. SQL> insert into ziggy1 select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY1'); PL/SQL procedure successfully completed.
I then run the following range scan queries several times that each return only a few rows:
SQL> select * from ziggy1 where id between 42 and 50; 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 8 | 184 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 8 | 184 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<=50 AND "ID">=42) filter("ID"<=50 AND "ID">=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 596 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9 rows processed SQL> select * from ziggy1 where id < 0; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 1 | 23 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<0) filter("ID"<0) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 364 bytes sent via SQL*Net to client 344 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> select * from ziggy1 where id > 100000000000; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 1 | 23 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID">100000000000) filter("ID">100000000000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 364 bytes sent via SQL*Net to client 355 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
If we look at the subsequent Automatic Indexing report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 13-JAN-2021 11:55:37 Activity end : 13-JAN-2021 11:56:20 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 3 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
We notice NO Automatic Indexes were created.
We can run these queries endlessly and Automatic Indexing will never create associated Automatic Indexes:
SQL> select index_name, auto, constraint_index, visibility from user_indexes where table_name='ZIGGY1'; no rows selected
These queries are doomed to perform Full Table Scans unless indexes are manually created:
SQL> select * from ziggy1 where id between 42 and 50; 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 8 | 184 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 8 | 184 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<=50 AND "ID">=42) filter("ID"<=50 AND "ID">=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 596 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9 rows processed
Currently Automatic Indexes do not support Non-Equality predicates. Automatic Indexes are only created based on Equality-based predicates.
Obviously, Automatic Indexing is a fabulous feature and this might all change in the future. But with Non-Equality predicates so prevalent in SQL, it’s vital to note this current limitation when using and relying on Automatic Indexing…
Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part II (Neighbourhood Threat) January 13, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Transaction Processing, CBO, Exadata, Local Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning.1 comment so far
In my first post on Automatic Indexing on Partitioned Tables, I discussed how Automatic Indexing (AI) can now create a Non-Partitioned index if deemed the most effective indexing structure (this wasn’t supported when AI was initially released). A Non-Partitioned index is indeed likely the most efficient indexing structure if the underlying table has many partitions and associated SQL equality predicates only reference non-partition key columns. A Non-Partitioned index ensure Oracle only needs to scan the single index structure and not all the partitions of a Local index.
But what if SQLs do reference the column by which the underlying table is partitioned?
The following SQL has an equality filtering predicate on the RELEASE_DATE column, the column by which the BIG_BOWIE1 table is partitioned:
SQL> SELECT * FROM big_bowie1 where release_date = to_date('2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss'); no rows selected
If we look at the subsequent AI report:
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------ | Owner | Table | Index | Key | Type | Properties | ------------------------------------------------------------------------------------------------ | BOWIE | BIG_BOWIE1 | SYS_AI_14gpurjp8m76s | RELEASE_DATE | B-TREE | LOCAL | ------------------------------------------------------------------------------------------------ -------------------------------------------------------------------------------
We notice that Automatic Indexing has in this instance created a Local Index.
If we look further down the AI report:
------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 4mm3mbkk38pa8 SQL Text : SELECT * FROM big_bowie1 where release_date = to_date('2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss') Improvement Factor : 8339x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 146957 71 CPU Time (s): 146124 71 Buffer Gets: 16678 3 Optimizer Cost: 162 4 Disk Reads: 0 0 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 2 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 4031749531 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 162 | | | 1 | PARTITION RANGE SINGLE | | 3602 | 93652 | 162 | 00:00:01 | | 2 | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 | 3602 | 93652 | 162 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 4049653350 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 78 | 4 | 00:00:01 | | 1 | PARTITION RANGE SINGLE | | 3 | 78 | 4 | 00:00:01 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1 | 3 | 78 | 4 | 00:00:01 | | * 3 | INDEX RANGE SCAN | SYS_AI_14gpurjp8m76s | 1 | | 3 | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 3 - access("RELEASE_DATE"=TO_DATE(' 2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss')) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We can see Automatic Indexing has created the index because it provides an average Improvement Factor of 8339x. As the necessary indexed column(s) matches the table partitioning key, it makes sense for the associated index be a Local index as Oracle is certain which specific index partition to visit based on the value of the equality predicate.
If we look at the details of this new AI:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BIG_BOWIE1'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_14gpurjp8m76s YES NO VISIBLE ADVANCED LOW N/A 20000000 30742 19941449 SYS_AI_8armv0hqq73fa YES NO VISIBLE ADVANCED LOW VALID 20000000 42697 19995451 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BIG_BOWIE1' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_14gpurjp8m76s RELEASE_DATE 1 SYS_AI_8armv0hqq73fa TOTAL_SALES 1 SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PARTITION PARTITION_COUNT LOCALI ------------------------------ --------- --------------- ------ SYS_AI_14gpurjp8m76s RANGE 8 LOCAL SQL> select index_name, partition_name, status, compression from user_ind_partitions where index_name in (select index_name from user_indexes where table_name='BIG_BOWIE1') order by partition_position; INDEX_NAME PARTITION_NAME STATUS COMPRESSION -------------------- -------------------- -------- ------------- SYS_AI_14gpurjp8m76s ALBUMS_2013 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2014 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2015 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2016 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2017 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2018 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2019 USABLE ADVANCED LOW SYS_AI_14gpurjp8m76s ALBUMS_2020 USABLE ADVANCED LOW
We can see that indeed, a Visible, Usable, Local index was created by Automatic Indexing.
So depending on the column(s) within the index, Automatic Indexing can potentially create either a Local or Non-Partitioned index when indexing a partitioned table.
Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation Piece) October 14, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Index Access Path, Local Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning, Performance Tuning.1 comment so far
In this little series, I’m going to discuss how Automatic Indexing works in relation to Partitioning.
I’ve discussed Indexing and Partitioning many times previously and how Oracle has various options when indexing a partitioned table:
- Non-Partitioned Index
- Globally Partitioned Index
- Locally Partitioned Index
So the question(s) are how does Automatic Indexing handle scenarios with partitioned objects?
A very important point to make at the start is that based on my research, the answer has already changed significantly since Automatic Indexing was first released. So it’s important to understand that Automatic Indexing is an ever evolving capability, that will advance and improve as time goes on.
I’ll focus on how the feature currently works (as of Oracle Database 19.5), but will mention previously identified behaviour as a reference on how things can easily change.
In my first simple little example, I’m just going to create a range-partitioned table, partitioned based on RELEASE_DATE, with a partition for each year’s worth of data:
SQL> CREATE TABLE big_bowie1(id number, album_id number, country_id number, release_date date, total_sales number) PARTITION BY RANGE (release_date) (PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')), PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')), PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')), PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')), PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')), PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')), PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')), PARTITION ALBUMS_2020 VALUES LESS THAN (MAXVALUE)); Table created.
I’ll now add about 8 years worth of data:
SQL> INSERT INTO big_bowie1 SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800), ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000; 10000000 rows created. SQL> COMMIT; Commit complete.
As discussed previously, I’ll importantly collect statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE1'); PL/SQL procedure successfully completed.
I’ll now run the following very selective query based the TOTAL_SALES column that is NOT part of the partitioning key:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 19 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2468051548 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 643 (15)| 00:00:01 | | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 643 (15)| 00:00:01 | 1 | 8 | |* 2 | TABLE ACCESS STORAGE FULL| BIG_BOWIE1 | 20 | 520 | 643 (15)| 00:00:01 | 1 | 8 | --------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - storage("TOTAL_SALES"=42) filter("TOTAL_SALES"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 44014 consistent gets 9516 physical reads 0 redo size 1107 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 19 rows processed
Without an index in place, the CBO has no choice but to use a FTS. But what will Automatic Indexing make of things?
If we look at the next Automatic Indexing report:
SQL> select dbms_auto_index.report_last_activity() from dual; GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 13-OCT-2020 01:47:48 Activity end : 13-OCT-2020 02:59:48 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 184.55 MB (184.55 MB / 0 B) Indexes dropped : 0 SQL statements verified : 2 SQL statements improved (improvement factor) : 1 (44119.6x) SQL plan baselines created : 0 Overall improvement factor : 25135.8x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- --------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------------- | BOWIE | BIG_BOWIE1 | SYS_AI_2zt7rg40mxa4n | TOTAL_SALES | B-TREE | NONE | --------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : chwm2gubm8fx9 SQL Text : SELECT * FROM big_bowie1 WHERE total_sales = 42 Improvement Factor : 44119.6x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 4387193 1173 CPU Time (s): 2599423 1037 Buffer Gets: 749507 22 Optimizer Cost: 643 22 Disk Reads: 470976 2 Direct Writes: 0 0 Rows Processed: 323 19 Executions: 17 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 2468051548 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 643 | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 643 | 00:00:01 | | 2 | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 | 20 | 520 | 643 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 937174207 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 19 | 494 | 22 | 00:00:01 | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE1 | 19 | 494 | 22 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_2zt7rg40mxa4n | 19 | | 3 | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("TOTAL_SALES"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We notice a couple of interesting points.
Firstly, yes Automatic Indexing has created an index based on the TOTAL_SALES column (SYS_AI_2zt7rg40mxa4n) as it improves performance by a reported 44119.6x.
Note also that the Automatic Index is a Non-Partitioned (Global) Index. From a performance perspective, this is the most efficient index to create to improve the performance of this query as the CBO only has the one index structure to navigate (vs. a LOCAL index that would require having to navigate down all 8 index structures for each table partition.
If we look at the index details:
SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PAR AUT VISIBILIT STATUS ------------------------------ --- --- --------- -------- SYS_AI_2zt7rg40mxa4n NO YES VISIBLE VALID
We notice that this is indeed a Non-Partitioned Index, that is both VISIBLE and VALID and so can be potentially used by any database session.
If we now re-run the query:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 19 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 937174207 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 23 (0)| 00:00:01 | | | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| BIG_BOWIE1 | 20 | 520 | 23 (0)| 00:00:01 | ROWID | ROWID | |* 2 | INDEX RANGE SCAN | SYS_AI_2zt7rg40mxa4n | 20 | | 3 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TOTAL_SALES"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 23 consistent gets 0 physical reads 0 redo size 1166 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 19 rows processed
We can see the query now uses the newly created Automatic Index and is indeed more efficient, performing now just 23 consistent gets (previously 44014 consistent gets).
However, this was NOT previous behaviour.
The documentation previously mentioned that only LOCAL indexes are used when indexing partitioned tables.
If we run the same demo on Oracle Database 19.3, we get the following report:
GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 14-OCT-2020 13:12:07 Activity end : 14-OCT-2020 14:24:07 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 192.94 MB (192.94 MB / 0 B) Indexes dropped : 0 SQL statements verified : 1 SQL statements improved (improvement factor) : 1 (1950.5x) SQL plan baselines created : 0 Overall improvement factor : 1950.5x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- --------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------------- | BOWIE | BIG_BOWIE1 | SYS_AI_8armv0hqq73fa | TOTAL_SALES | B-TREE | LOCAL | --------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 2pp8ypramw30s SQL Text : SELECT * FROM big_bowie1 WHERE total_sales = 42 Improvement Factor : 1950.5x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 6996973 27327 CPU Time (s): 6704215 12819 Buffer Gets: 815306 49 Optimizer Cost: 12793 28 Disk Reads: 2 40 Direct Writes: 0 0 Rows Processed: 475 25 Executions: 19 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 4294056405 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 12793 | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 12793 | 00:00:01 | | 2 | TABLE ACCESS FULL | BIG_BOWIE1 | 20 | 520 | 12793 | 00:00:01 | ----------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 3781269341 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 25 | 650 | 28 | 00:00:01 | | 1 | PARTITION RANGE ALL | | 25 | 650 | 28 | 00:00:01 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1 | 25 | 650 | 28 | 00:00:01 | | * 3 | INDEX RANGE SCAN | SYS_AI_8armv0hqq73fa | 25 | | 17 | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 3 - access("TOTAL_SALES"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
As we can see, in this scenario, the newly created Automatic Index has a “Property” of LOCAL.
If we look at its index details:
SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PAR AUT VISIBILIT STATUS ------------------------------ --- --- --------- -------- SYS_AI_8armv0hqq73fa YES YES VISIBLE N/A SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PARTITION PARTITION_COUNT LOCALI ------------------------------ --------- --------------- ------ SYS_AI_8armv0hqq73fa RANGE 8 LOCAL
We can see how a Local Index was previously created.
As such if we re-run an equivalent query:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 25 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3781269341 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 26 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 26 (0)| 00:00:01 | 1 | 8 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| BIG_BOWIE1 | 20 | 520 | 26 (0)| 00:00:01 | 1 | 8 | |* 3 | INDEX RANGE SCAN | SYS_AI_8armv0hqq73fa | 20 | | 17 (0)| 00:00:01 | 1 | 8 | ----------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 50 consistent gets 0 physical reads 0 redo size 1555 bytes sent via SQL*Net to client 409 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 25 rows processed
Although the query is returning 6 more rows (as with the random number generation, has a slightly different data set), it’s more expensive proportionally now having to perform 50 consistent gets as it now has to read 8 index structures rather than just the one.
So (IMHO), Automatic Indexing has improved here, creating a more efficient index structure than previously. So always bear in mind that Automatic Indexing is an evolving beast, improving and adapting as time moves on.
However, note the compromise here is that by having an effectively Global index structure, there may be some additional issues depending on any subsequent structural changes to the table.
More on Automatic Indexing and Partitioning in my next post…
Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.1 comment so far
In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.
If we were to now to collect the missing statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE'); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE'; TABLE_NAME NUM_ROWS BLOCKS LAST_ANAL -------------------- ---------- ---------- --------- BOWIE_STALE 10000000 39677 06-JUL-20 SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols where table_name='BOWIE_STALE'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM LAST_ANAL -------------------- ------------ ---------- --------------- --------- ID 10000000 0 HYBRID 06-JUL-20 CODE 971092 .000001 HYBRID 06-JUL-20 NAME 1 4.9416E-08 FREQUENCY 06-JUL-20
If we now repeatedly re-run the problematic query many times:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 65903426 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 544 (14)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 10 | 230 | 544 (14)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 610 bytes sent via SQL*Net to client 361 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.
If we wait for the next Automatic Indexing reporting period:
SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 06-JUL-2020 05:12:42 Activity end : 06-JUL-2020 05:13:34 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 0 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.
If we look at the index details on the table:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_STALE'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ---------------------- --- --- --------- ------------- -------- ---------- ----------- ----------------- BOWIE_STALE_PK NO YES VISIBLE DISABLED VALID 10000000 20164 59110 SYS_AI_300kk2unp8tr0 YES NO INVISIBLE ADVANCED LOW UNUSABLE 10000000 23058 4147514
So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and can not be used by the CBO.
NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios after stale statistics when indexes were created in but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.
If we simply re-run the same queries again from the time when the dependant object statistics were stale, any SQL is just ignored by the Automatic Indexing process.
As such, if we now subsequently re-run the problematic query again:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 65903426 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 544 (14)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 10 | 230 | 544 (14)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 610 bytes sent via SQL*Net to client 361 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
Again, the CBO has no choice here with no viable VALID/VISIBLE index present but to perform a FTS, even though its getting the cardinality estimates spot on since statistics gathering.
In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.add a comment
In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.
However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.
Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).
These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.
One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.
This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:
SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time    | ---------------------------------------------------------------------------------- |  0 | SELECT STATEMENT |             | 1005K|  135M| 11411  (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 1005K|  135M| 11411  (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2705 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.
However, if we now remove the histogram on the CODE column:
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’); PL/SQL procedure successfully completed.
There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.
So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start              : 18-AUG-2020 16:42:33 Activity end                : 18-AUG-2020 16:43:06 Executions completed        : 1 Executions interrupted      : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates                            : 0 Indexes created                             : 0 Space used                                  : 0 B Indexes dropped                             : 0 SQL statements verified                     : 1 SQL statements improved                     : 0 SQL plan baselines created (SQL statements) : 1 (1) Overall improvement factor                  : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes   : 0 Space used       : 0 B Unusable indexes : 0
We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.
If we look at the Verification Details part of this report:
VERIFICATION DETAILS ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- The following SQL plan baselines were created: ------------------------------------------------------------------------------- Parsing Schema Name    : BOWIE SQL ID                 : 3yz8unzhhvnuz SQL Text               : select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000) SQL Signature          : 3910785437403172730 SQL Handle             : SQL_3645e6a2952fcf7a SQL Plan Baselines (1) : SQL_PLAN_3cjg6naakzmvu198c05b9
We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.
Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.
Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.
Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.
This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:
SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz'; EXECUTION_NAMEÂ Â Â Â Â Â Â Â Â Â Â Â ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS -------------------------- -------------------- ---------------------- --------- SYS_AI_2020-08-18/16:42:33Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 41169Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 410291 REGRESSED
If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):
SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time    | ---------------------------------------------------------------------------------- |  0 | SELECT STATEMENT |             |   32 | 4512 | 11425  (2)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY |   32 | 4512 | 11425  (2)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Hint Report (identified by operation id / Query Block Name / Object Alias): Total hints for statement: 1 (U - Unused (1)) --------------------------------------------------------------------------- 1 - SEL$1 U - dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS Note ----- - SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement Statistics ---------------------------------------------------------- 9 recursive calls 4 db block gets 41170 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2705 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.
So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.1 comment so far
In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.
To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:
SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142)); Table created. SQL> begin 2Â Â Â Â for i in 1..2000000 loop 3Â Â Â Â Â Â if mod(i,2) = 0 then 4Â Â Â Â Â Â Â Â Â insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history'); 5Â Â Â Â Â Â else 6Â Â Â Â Â Â Â Â Â insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.'); 7Â Â Â Â Â Â end if; 8Â Â Â Â end loop; 9Â Â Â Â commit; 10Â end; 11Â / PL/SQL procedure successfully completed.
So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.
Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we run a query based on a rare value for CODE:
SQL> set arraysize 5000 SQL> select * from space_oddity where code=25; Execution Plan ---------------------------------------------------------------------------------- | Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time    | ---------------------------------------------------------------------------------- |  0 | SELECT STATEMENT |             |    3 |  423 | 11356  (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY |    3 |  423 | 11356  (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=25) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40974 consistent gets 0 physical reads 0 redo size 1018 bytes sent via SQL*Net to client 402 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.
However, the following SQL accesses many of the common values of CODE and returns many rows:
SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time    | ---------------------------------------------------------------------------------- |  0 | SELECT STATEMENT |             |   32 | 4512 | 11425  (2)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY |   32 | 4512 | 11425  (2)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.
So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start              : 18-AUG-2020 15:57:14 Activity end                : 18-AUG-2020 15:58:10 Executions completed        : 1 Executions interrupted      : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates                             : 1 Indexes created (visible / invisible)        : 1 (1 / 0) Space used (visible / invisible)             : 35.65 MB (35.65 MB / 0 B) Indexes dropped                              : 0 SQL statements verified                      : 1 SQL statements improved (improvement factor) : 1 (40984.3x) SQL plan baselines created                   : 0 Overall improvement factor                   : 40984.3x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes   : 0 Space used       : 0 B Unusable indexes : 0 INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ---------------------------------------------------------------------------- | Owner | Table       | Index               | Key | Type  | Properties | ---------------------------------------------------------------------------- | BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE      | ----------------------------------------------------------------------------
So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:
VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID              : 19sv1g6tt0g1y SQL Text            : select * from space_oddity where code=25 Improvement Factor  : 40984.3x Execution Statistics: ----------------------------- Original Plan                Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 5417408                      139265 CPU Time (s):     1771880                      7797 Buffer Gets:      327876                       5 Optimizer Cost:   11356                        5 Disk Reads:       649                          2 Direct Writes:    0                            0 Rows Processed:   16                           2 Executions:       8                            1
The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.
This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.
BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.
BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.
BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.
Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…
So if we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 2301175572 ----------------------------------------------------------------------------- | Id | Operation          | Name        | Rows | Bytes | Cost | Time    | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT   |             |     |      | 11356 |         | | 1 |  TABLE ACCESS FULL | SPACE_ODDITY |   3 |  423 | 11356 | 00:00:01 | ----------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 54782313 ------------------------------------------------------------------------------------------------------- | Id | Operation                            | Name                | Rows | Bytes | Cost | Time    | ------------------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT                     |                     |   3 |  423 |   5 | 00:00:01 | |  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY        |   3 |  423 |   5 | 00:00:01 | | * 2 |   INDEX RANGE SCAN                  | SYS_AI_82bdnqs7q8rtm |   2 |      |   3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=25) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).
But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.
If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):
SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000); 1000011 rows selected. Execution Plan ------------------------------------------------------------------------------------------------------------- | Id | Operation                           | Name                | Rows | Bytes | Cost (%CPU)| Time    | ------------------------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT                    |                     |   32 | 4512 |   35  (0)| 00:00:01 | |  1 | INLIST ITERATOR                    |                     |      |      |           |         | |  2 |  TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY        |   32 | 4512 |   35  (0)| 00:00:01 | |* 3 |   INDEX RANGE SCAN                 | SYS_AI_82bdnqs7q8rtm |   32 |      |   12  (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 410422 consistent gets 0 physical reads 0 redo size 145536076 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.
As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).
One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048’); PL/SQL procedure successfully completed.
If we now re-run this SQL:
SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time    | ---------------------------------------------------------------------------------- |  0 | SELECT STATEMENT |             |  996K|  133M| 11411  (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY |  996K|  133M| 11411  (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.
In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.add a comment
As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).
As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.
One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).
If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:
SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 101K| 2388K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - dynamic statistics used: dynamic sampling (level=AUTO) - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 609 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.
Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true)Â in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.
In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.
We currently have the following columns and statistics on the IGGY_POP table:
SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM -------------------- ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY
If we now collect Extended Statistics on both CODE1, CODE2 columns:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254'); PL/SQL procedure successfully completed. SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM ------------------------------ ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY SYS_STU#29QF8Y9BUDOW2HCDL47N44 99 .00000005 100 FREQUENCY
The CBO now has some idea on the cardinality if both columns are used within a predicate.
If we re-run the problematic query without the hint:
SQL> select * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 100K| 2343K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 581 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.
However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:
SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50000 | 878K | 575 (15) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 | 878K | 575 (15) | 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=1 AND "CODE2"=42) filter("CODE1"=1 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 368 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.
This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.
So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.
As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…
Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Index Access Path, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.add a comment
I’m going to expand just a tad on my previous posts on data skew and run a simple query that returns a few rows based on a column predicate AND another query on the same column that returns many rows.
The following table has a CODE column as with previous posts with the data heavily skewed:
SQL> create table bowie_skew (id number, code number, name varchar2(42)); Table created. SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000; 1000000 rows created. SQL> update bowie_skew set code = 9 where mod(id,3) = 0; 333333 rows updated. SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000; 10000 rows updated. SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000; 5000 rows updated. SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000; 1000 rows updated. SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000; 1000 rows updated. SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000; 1000 rows updated. SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000; 100 rows updated. SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000; 100 rows updated. SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000; 100 rows updated. SQL> commit; Commit complete.
I’ll next collect statistics with NO histogram, as I don’t think they’re required at this point:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie_skew', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we look at the table data:
SQL> select code, count(*) from bowie_skew group by code order by code; CODEÂ COUNT(*) ---------- ---------- 1Â Â Â Â 10000 2Â Â Â Â Â 5000 3Â Â Â Â Â 1000 4Â Â Â Â Â 1000 5Â Â Â Â Â 1000 6Â Â Â Â Â Â 100 7Â Â Â Â Â Â 100 8Â Â Â Â Â Â 100 9Â Â Â Â 327235 10Â Â Â Â 654465
The value “7” only has 100 associated rows, while the value “10” is very common with 654,465 rows.
But I currently have no histograms:
SQL> select column_name, num_buckets, histogram from user_tab_cols where table_name='BOWIE_SKEW'; COLUMN_NAMEÂ Â Â Â NUM_BUCKETS HISTOGRAM --------------- ----------- --------------- IDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE CODEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE NAMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE
If I run the following query with a CODE=7 predicate just once:
SQL> select * from bowie_skew where code=7; 100 rows selected. Execution Plan -------------------------------------------------------------------------------------------- | Id | Operation                  | Name     | Rows | Bytes | Cost (%CPU)| Time    | -------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT           |          |  100K| 1953K|  570  (7)| 00:00:01 | |  1 | PX COORDINATOR             |          |     |     |          |        | |  2 |  PX SEND QC (RANDOM)      | :TQ10000  |  100K| 1953K|  570  (7)| 00:00:01 | |  3 |   PX BLOCK ITERATOR       |          |  100K| 1953K|  570  (7)| 00:00:01 | |* 4 |    TABLE ACCESS STORAGE FULL| bowie_skew |  100K| 1953K|  570  (7)| 00:00:01 | --------------------------------------------------------------------------------------------
It uses a Full Table Scan (the CBO has no choice without an index) AND hopelessly gets the cardinality estimate wrong, thinking 100K are going to be returned (and not the 100 actual rows). So the CBO is unlikely to use an index anyways as it would be deemed too expensive to return so many rows.
I’ll now run the following query many times on the CODE=10 predicate that returns many rows:
SQL> select * from bowie_skew where code=10; 654465 rows selected. Execution Plan -------------------------------------------------------------------------------------------- | Id | Operation                  | Name     | Rows | Bytes | Cost (%CPU)| Time    | -------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT           |          |  100K| 1953K|  570  (7)| 00:00:01 | |  1 | PX COORDINATOR             |          |     |     |          |         | |  2 |  PX SEND QC (RANDOM)      | :TQ10000  |  100K| 1953K|  570  (7)| 00:00:01 | |  3 |   PX BLOCK ITERATOR       |          |  100K| 1953K|  570  (7)| 00:00:01 | |* 4 |    TABLE ACCESS STORAGE FULL| bowie_skew |  100K| 1953K|  570  (7)| 00:00:01 | --------------------------------------------------------------------------------------------
So again, no choice here with a FTS and we likely wouldn’t want to use an index anyways as it would be just too expensive.
If we check out what the Automatic Indexing process has done with such a workload:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- -------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | -------------------------------------------------------------------------- | BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 6fm3m8cg2jnun SQL Text : select * from bowie_skew where code=7 Improvement Factor : 46.6x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 36653 1992 CPU Time (s): 33899 967 Buffer Gets: 4291 103 Optimizer Cost: 52 4 Disk Reads: 0 2 Direct Writes: 0 0 Rows Processed: 100 100 Executions: 1 1
An Automatic Index on the CODE column is created (SYS_AI_7psvzc164vbng), with ONLY the SQL based on the CODE=7 predicate listed in the report. The other query is indeed too expensive for a new index to be viable and so isn’t listed.
If we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 410492785 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 52 | | | 1 | TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 | 52 | 00:00:01 | -------------------------------------------------------------------------------------- Notes ----- - dop_reason = no expensive parallel operation - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 140816325 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 119 | 2380 | 4 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW | 119 | 2380 | 4 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_7psvzc164vbng | 100 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=7) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
The important point to note here is that the cardinality estimates are relatively accurate despite there being no histograms at this stage because the Automatic Indexing session uses Dynamic Sampling Level=11. Missing/inaccurate statistics are calculated on fly and this enables the session to accurately determine the size of the returned data set and that an index is indeed the more efficient access path.
So with mixed workloads, all it takes is one SQL executed once that demonstrably improves thanks to an index for the associated Automatic Index to be created as a VISIBLE/VALID index:
SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_SKEW'; INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- -------- ---------- ----------- ----------------- SYS_AI_7psvzc164vbng YES VISIBLE VALID 1000000 1537 8534
If we now run the query AFTER the histograms are subsequently created thanks to the High-Frequency Automatic Statistics Collection (see previous post), the new Automatic Index is now used:
SQL> select * from bowie_skew where code=7; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 140816325 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 100 | 2000 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW | 100 | 2000 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_7psvzc164vbng | 100 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=7) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 104 consistent gets 0 physical reads 0 redo size 2871 bytes sent via SQL*Net to client 359 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Note if the histogram is NOT yet collected, the CBO will not determine the correct cardinality estimate and will ignore the new Automatic Index (as previously discussed).
If we run again the query that returns many rows:
SQL> select * from bowie_skew where code=10; 654465 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 410492785 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 654K| 12M| 52 (16)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_SKEW | 654K| 12M| 52 (16)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=10) filter("CODE"=10) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3725 consistent gets 0 physical reads 0 redo size 6549708 bytes sent via SQL*Net to client 1790 bytes received via SQL*Net from client 132 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 654465 rows processed
The new Automatic Index is correctly ignored by the CBO, as the query returns too many rows for the index to be viable.
So in this example, Automatic Indexing works exactly as it should. It creates a new Automatic Index for a query where it will indeed improve the performance, while other queries on the same column in which many more rows are returned are also run. For these other queries, the new Automatic Index is correctly not used as such an index would degrade the performance of the query.
In my next post, I’ll look at the first example with data skew where Automatic Indexing can be problematic…
Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.1 comment so far
When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some columns values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.
I’ll start with a simple little example, that has an interesting little twist at the end.
The following table has a CODE column, which has 10 distinct values that a widely skewed, with some values much less common than others:
SQL> create table bowie_skew (id number, code number, name varchar2(42)); Table created. SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000; 1000000 rows created. SQL> update bowie_skew set code = 9 where mod(id,3) = 0; 333333 rows updated. SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000; 10000 rows updated. SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000; 5000 rows updated. SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000; 1000 rows updated. SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000; 1000 rows updated. SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000; 1000 rows updated. SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000; 100 rows updated. SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000; 100 rows updated. SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000; 100 rows updated. SQL> commit; Commit complete.
I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected data with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_SKEW', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
We can clearly see that some CODE values (such as “6”) have relatively few values, with only 100 occurrences:
SQL> select code, count(*) from bowie_skew group by code order by code; CODE COUNT(*) ---------- ---------- 1Â Â Â Â 10000 2Â Â Â Â Â 5000 3Â Â Â Â Â 1000 4Â Â Â Â Â 1000 5Â Â Â Â Â 1000 6Â Â Â Â Â Â 100 7Â Â Â Â Â Â 100 8Â Â Â Â Â Â 100 9Â Â Â Â 327235 10Â Â Â 654465
As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:
SQL> select column_name, num_buckets, histogram from user_tab_cols where table_name='BOWIE_SKEW'; COLUMN_NAMEÂ Â Â NUM_BUCKETS HISTOGRAM --------------- ----------- --------------- IDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE CODEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE NAMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 NONE
Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:
SQL> select * from bowie_skew where code=6; 100 rows selected. Execution Plan ------------------------------------------------------------------------------------------- | Id | Operation                    | Name       | Rows | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT             |            |  100K| 1953K|  570  (7)| 00:00:01 | |  1 | PX COORDINATOR               |            |       |       |            |          | |  2 |  PX SEND QC (RANDOM)        | :TQ10000  |  100K| 1953K|  570  (7)| 00:00:01 | |  3 |   PX BLOCK ITERATOR         |            |  100K| 1953K|  570  (7)| 00:00:01 | |* 4 |    TABLE ACCESS STORAGE FULL| BOWIE_SKEW |  100K| 1953K|  570  (7)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - storage("CODE"=6) filter("CODE"=6) Statistics ---------------------------------------------------------- 6 recursive calls 0 db block gets 3781 consistent gets 0 physical reads 0 redo size 2796 bytes sent via SQL*Net to client 654 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.
So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: -------------------------------------------------------------------------- | Owner | Table     | Index              | Key | Type  | Properties | -------------------------------------------------------------------------- | BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE      | -------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID              : fn4shnphu4bvj SQL Text           : select * from bowie_skew where code=6 Improvement Factor  : 41.1x Execution Statistics: ----------------------------- Original Plan               Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 119596                      322 CPU Time (s):    100781                      322 Buffer Gets:     11347                       103 Optimizer Cost:   570                          4 Disk Reads:      0                           0 Direct Writes:    0                           0 Rows Processed:   100                          100 Executions:      1                           1
So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.
If we look further down the Auto Indexing report and compare the execution plans:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 3374004665 ----------------------------------------------------------------------------------------- | Id | Operation                    | Name     | Rows  | Bytes  | Cost | Time   | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT              |           |      |       | 570 |        | | 1 | PX COORDINATOR               |           |      |       |     |        | | 2 |  PX SEND QC (RANDOM)       | :TQ10000  | 100000 | 2000000 | 570 | 00:00:01 | | 3 |   PX BLOCK ITERATOR        |           | 100000 | 2000000 | 570 | 00:00:01 | | 4 |    TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 | 570 | 00:00:01 | ----------------------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 140816325 ------------------------------------------------------------------------------------------------------- | Id | Operation                           | Name               | Rows | Bytes | Cost | Time    | ------------------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT                    |                     | 100 | 2000 |   4 | 00:00:01 | |  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW         | 100 | 2000 |   4 | 00:00:01 | | * 2 |   INDEX RANGE SCAN                 | SYS_AI_7psvzc164vbng | 100 |      |   3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=6) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We can see that new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.
Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly and can cause difficulties between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).
BUT when I now rerun the SQL query again:
SQL> select * from bowie_skew where code=6; 100 rows selected. Execution Plan --------------------------------------------------------------------------------------------------- | Id | Operation                           | Name               | Rows | Bytes | Cost (%CPU)| --------------------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT                    |                     |  100 | 2000 |    4  (0)| |  1 | PX COORDINATOR                      |                     |      |      |           | |  2 |  PX SEND QC (RANDOM)               | :TQ10001           |  100 | 2000 |    4  (0)| |  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW         |  100 | 2000 |    4  (0)| |  4 |    BUFFER SORT                      |                     |     |      |           | |  5 |     PX RECEIVE                      |                     | 100 |      |    3  (0)| |  6 |      PX SEND HASH (BLOCK ADDRESS)   | :TQ10000           | 100 |      |    3  (0)| |  7 |       PX SELECTOR                  |                     |     |      |           | |* 8 |          INDEX RANGE SCAN           | SYS_AI_7psvzc164vbng | 100 |      |    3  (0)| --------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 8 - access("CODE"=6) Statistics ---------------------------------------------------------- 12 recursive calls 0 db block gets 103 consistent gets 0 physical reads 0 redo size 2796 bytes sent via SQL*Net to client 654 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 2 sorts (memory) 0 sorts (disk) 100 rows processed
We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?
I’ll explain in my next post.
Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down) August 6, 2020
Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Full Table Scans, Index Rebuild, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.4 comments
I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important.
The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing Cloud environment.
I begin by creating a simple table that has the key column CODE, in which data is populated in a manner where the data is very poorly clustered:
SQL> create table nickcave (id number, code number, name varchar2(42)); Table created. SQL> insert into nickcave select rownum, mod(rownum, 100), 'Nick Cave and the Bad Seeds' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'NICKCAVE'); PL/SQL procedure successfully completed.
So we have 100 evenly distributed distinct CODE values but they’re all distributed throughout the table.
The following SQL statement is basically returning just 1% of the data and is executed a number of times:
SQL> select * from nickcave where code=42; 100000 rows selected. Execution Plan ----------------------------------------------------------------------------------------- | Id | Operation                  | Name   | Rows  | Bytes | Cost (%CPU)| Time  | ----------------------------------------------------------------------------------------- |  0 | SELECT STATEMENT           |        |  100K| 3613K| 9125  (5)| 00:00:01| |  1 | PX COORDINATOR             |        |       |     |          |        | |  2 |  PX SEND QC (RANDOM)      | :TQ10000 |  100K| 3613K| 9125  (5)| 00:00:01| |  3 |   PX BLOCK ITERATOR       |        |  100K| 3613K| 9125  (5)| 00:00:01| |* 4 |    TABLE ACCESS STORAGE FULL| NICKCAVE |  100K| 3613K| 9125  (5)| 00:00:01| ------------------------------------------------------------------------------------------
Without an index, the CBO currently has no choice but to use a Full Table Scan to access the table. So we wait for the next Automatic Index process to kick in:
SQL> select dbms_auto_index.report_last_activity() report from dual;
The Automatic Indexing report makes no mention of Automatic Indexes on the NICKCAVE table…
If we look to see if any indexes have actually been created:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE'; INDEX_NAMEÂ Â Â Â Â Â Â Â Â AUT CON VISIBILIT COMPRESSIONÂ Â STATUSÂ Â Â NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR -------------------- --- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_dh8pumfww3f4r YES NOÂ INVISIBLE DISABLEDÂ Â Â Â Â UNUSABLEÂ 10000000Â Â Â Â Â Â 20346Â Â Â Â Â Â Â Â Â 4158302 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='NICKCAVE' order by index_name, column_position; INDEX_NAMEÂ Â Â Â Â Â Â Â COLUMN_NAMEÂ Â Â Â Â Â Â Â Â COLUMN_POSITION -------------------- -------------------- --------------- SYS_AI_dh8pumfww3f4r CODEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1
We can see that yes, an Automatic Index (SYS_AI_dh8pumfww3f4r) has been created on the CODE column of the NICKCAVE table BUT it remains in an INVISIBLE, UNUSABLE state.
So Automatic Indexing considered an index on CODE, created it in an INVISIBLE, USABLE state but when testing it, failed in that it found it to be less efficient than the current FTS and so reverted the Automatic Index back to an UNUSABLE index.
Therefore, if we run a bunch of other similar SQL statements such as the following:
SQL> select * from nickcave where code=24;
SQL> select * from nickcave where code=42;
SQL> select * from nickcave where code=13;
They all use the FTS as again, the CBO has no choice with no VALID index on the CODE column available.
If we keep checking the Automatic Indexing report:
SQL> select dbms_auto_index.report_last_activity() report from dual;
There’s still no mention of an index on the CODE column. The existing Automatic Index remains in an UNUSABLE state:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE'; INDEX_NAMEÂ Â Â Â Â Â Â Â Â AUT CON VISIBILIT COMPRESSIONÂ Â STATUSÂ Â Â NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR -------------------- --- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_dh8pumfww3f4r YES NOÂ INVISIBLE DISABLEDÂ Â Â Â Â UNUSABLEÂ 10000000Â Â Â Â Â Â 20346Â Â Â Â Â Â Â Â Â 4158302
Basically, the index remains ineffective because with a Clustering Factor of 4158302, it’s just too inefficient to return the 1% (100000 rows) of the table.
Even in an Autonomous Database environment, nothing will automatically change with this scenario.
In my next post, we’ll look at how we can improve the performance of this query and get an Automatic Index to actually kick in with a USABLE index…
Oracle 19c Automatic Indexing: Configuration (All I Need) July 29, 2019
Posted by Richard Foote in Automatic Indexing, AUTO_INDEX_COMPRESSION, AUTO_INDEX_DEFAULT_TABLESPACE, AUTO_INDEX_MODE, AUTO_INDEX_REPORT_RETENTION, AUTO_INDEX_RETENTION_FOR_AUTO, AUTO_INDEX_RETENTION_FOR_MANUAL, AUTO_INDEX_SCHEMA, AUTO_INDEX_SPACE_BUDGET, DBA_AUTO_INDEX_CONFIG, DBMS_AUTO_INDEX.CONFIGURE, Oracle Indexes, Oracle19c, SMB$CONFIG.add a comment
In this post, I’ll discuss how to configure the new Automatic Indexing capability introduced in Oracle Database 19c.
The intent of Oracle here is to make the configuration of Automatic Indexing as simplistic as possible, with as few levers for DBAs to potentially stuff up as possible. The ultimate goal would be to have a switch that just turns the feature on and that all necessary indexes then simply be created/modified/dropped as required. It’s not quite there yet, but it’ll no doubt get closer with each new release.
By default, Automatic Indexing is turned OFF. To turn on these capabilities, you simply run the following using the DBMS_AUTO_INDEX.CONFIGURE procedure:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE‘,’IMPLEMENT‘);
PL/SQL procedure successfully completed.
That’s it. Automatic Indexing is now enabled and as discussed in the previous blog post on the Automatic Indexing Methodology, every 15 minutes, the SYS_AUTO_INDEX_TASK background task will kickoff and automatically create/replace/drop any database indexes as necessary.
Another option, is to enable Automatic Indexing in ‘REPORT ONLY‘ mode:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE’,’REPORT ONLY‘);
PL/SQL procedure successfully completed.
This will only create new indexes as INVISIBLE indexes, which are not considered by default by the CBO. The intent here is that the DBA can investigate the newly created Automatic Indexes and decide whether turning on this feature for real would be a good idea. Need to exercise some caution with this option though, as the limited options regarding how to subsequently administer the created Invisible Automatic Indexing can be problematic. I’ll discuss all this is more detail in a future post.
To turn off Automatic Indexing, simply set the AUTO_INDEX_MODE to ‘OFF’:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE’,’OFF‘);
PL/SQL procedure successfully completed.
Note here the documentation states “the existing auto indexes are disabled” which is incorrect. New Automatic Indexes will no longer be created, but existing Automatic Indexes will still be both Visible and Valid and available to the CBO for continued use.
By default, Automatic Indexing considers all tables in all “user created” schemas. However, this can be controlled with the AUTO_INDEX_SCHEMA option within the DBMS_AUTO_INDEX.CONFIGURE procedure. You can control which schemas to either explicitly include or exclude from Automatic Indexing considerations.
To add the BOWIE schema to an “Inclusion” list of schemas, you run the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_SCHEMA‘,’BOWIE’, TRUE);
PL/SQL procedure successfully completed.
To add the BOWIE schema to an “Exclusion” list of schemas, you run the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_SCHEMA‘,’BOWIE’, FALSE);
PL/SQL procedure successfully completed.
To remove the BOWIE schema from whichever list it belongs, you run the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_SCHEMA‘, ‘BOWIE’, NULL);
PL/SQL procedure successfully completed.
To remove all schemas from the “Inclusion” list, you run the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_SCHEMA‘, NULL, TRUE);
PL/SQL procedure successfully completed.
You can configure a tablespace to be the tablespace in which all Automatic Indexes are to now be created by running the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_DEFAULT_TABLESPACE‘,’INDEX_TS’);
PL/SQL procedure successfully completed
The INDEX_TS tablespace is now the location of all newly created Automatic Indexes.
You can also control how much of the configured Automatic Indexing tablespace is to be reserved for use by Automatic Indexes (default 50%) by running the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_SPACE_BUDGET‘, ’42’);
PL/SQL procedure successfully completed.
Now, only 42% of the INDEX_TS tablespace can be used by Automatic Indexes.
You can control the number of days (the default is 373 days) in which if an Automatic Index has been deemed NOT to have been used, it will be automatically dropped. The following command:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_RETENTION_FOR_AUTO‘, ’42’);
PL/SQL procedure successfully completed.
will automatically drop any Automatic Index that is deemed not to have been used in the last 42 days.
Note: Oracle uses the new Indexing Tracking feature introduced in Oracle 12.2 to determine if an index has/has not been used, which has limitations that could potentially result in an Automatic Index that has been “lightly” used during the retention period being dropped. This will be discussed in more detail in a future post.
A similar retention configuration can be implemented for manually created indexes. The following command:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_RETENTION_FOR_MANUAL’,’42’);
PL/SQL procedure successfully completed.
will automatically drop any manually created index that is deemed not to have been used in the last 42 days.
Note: the same Indexing Tracking limitations means that manually created indexes lightly used during the last 42 days could also be automatically dropped.
I will discuss various undocumented implications of automatically dropping both Automatic and Manual Indexes in future posts.
Very importantly, the Automatic Indexing logs on which Automatic Indexing Reports are based are only retained by default for just 31 days. To change the time before the Automatic Indexing logs are deleted, you run the following:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_REPORT_RETENTION‘,’342’);
PL/SQL procedure successfully completed.
The Automatic Indexing logs are now retained for 342 days. These Automatic Indexing reports (and hence logs) are critical for understanding what the Automatic Indexing featuring is doing within the database and will be discussed extensively in future posts. I would recommend increasing the retention period from the 31 days default.
The final configuration option is not officially documented (yet) and controls whether or not Advanced Compression is used for Automatic Indexes. The default is that Advanced Compression is disabled, but this can be changed as follows:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_COMPRESSION‘,’ON’);
PL/SQL procedure successfully completed.
All Automatic Indexes are now automatically compressed using Advanced Low Compression.
I believe this was at some stage going to be the default behaviour, but due to licencing considerations and that many sites don’t have the Advanced Compression Option, this was subsequently changed. If you do have Advanced Compression, I would strongly recommend turning this ON, as implementing Advanced Low Compression is a relatively no-brainer beneficial decision.
I will however discuss the various undocumented implications of Advanced Compression in relation to Automatic Indexing in future posts.
The documented DBA_AUTO_INDEX_CONFIG view can be used to view the current setting for all of these configuration options:
SQL> select * from dba_auto_index_config; PARAMETER_NAMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â PARAMETER_VALUEÂ Â Â Â Â LAST_MODIFIEDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â MODIFIED_BY ---------------------------------------- -------------------- ---------------------------------------- -------------------- AUTO_INDEX_COMPRESSIONÂ OFF AUTO_INDEX_DEFAULT_TABLESPACE AUTO_INDEX_MODEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â REPORT ONLYÂ Â Â Â Â Â Â Â Â 03-JUL-19 05.43.28.000000 AMÂ Â Â Â Â Â Â Â Â Â BOWIE AUTO_INDEX_REPORT_RETENTIONÂ 31 AUTO_INDEX_RETENTION_FOR_AUTOÂ 42Â Â Â Â Â Â Â Â Â 20-JUN-19 06.32.06.000000 AMÂ Â Â Â Â Â Â Â Â Â BOWIE AUTO_INDEX_RETENTION_FOR_MANUAL Â Â Â Â Â Â Â Â Â Â 02-JUL-19 12.12.21.000000 AMÂ Â Â Â Â Â Â Â Â Â BOWIE AUTO_INDEX_SCHEMAÂ schema IN (BOWIE)Â Â Â 20-JUN-19 06.27.26.000000 AMÂ Â Â Â Â Â Â Â Â Â BOWIE AUTO_INDEX_SPACE_BUDGET Â Â 50
Additionally, you can view both the documented and undocumented settings regarding Automatic Indexing by looking at the SMB$CONFIG table:
SQL> select parameter_name, parameter_value from sys.SMB$CONFIG; PARAMETER_NAMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â PARAMETER_VALUE ---------------------------------------- --------------- SPACE_BUDGET_PERCENTÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 10 PLAN_RETENTION_WEEKSÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 53 SPM_TRACINGÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_CAPTURE_PARSING_SCHEMA_NAMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_CAPTURE_MODULEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_CAPTURE_ACTIONÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_CAPTURE_SQL_TEXTÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_INDEX_SCHEMAÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_INDEX_DEFAULT_TABLESPACEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_INDEX_SPACE_BUDGETÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 50 AUTO_INDEX_REPORT_RETENTIONÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 31 AUTO_INDEX_RETENTION_FOR_AUTOÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_INDEX_RETENTION_FOR_MANUALÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_INDEX_MODEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 _AUTO_INDEX_TRACEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 _AUTO_INDEX_TASK_INTERVALÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 900 _AUTO_INDEX_TASK_MAX_RUNTIMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 3600 _AUTO_INDEX_IMPROVEMENT_THRESHOLDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 20 _AUTO_INDEX_REGRESSION_THRESHOLDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 10 _AUTO_INDEX_ABSDIFF_THRESHOLDÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 100 _AUTO_INDEX_STS_CAPTURE_TASKÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 _AUTO_INDEX_CONTROLÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 _AUTO_INDEX_DERIVE_STATISTICSÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 _AUTO_INDEX_CONCURRENCYÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 _AUTO_INDEX_SPA_CONCURRENCYÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1 _AUTO_INDEX_REBUILD_TIME_LIMITÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 30 _AUTO_INDEX_REBUILD_COUNT_LIMITÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 5 _AUTO_INDEX_REVERIFY_TIMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 30 AUTO_INDEX_COMPRESSIONÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_SPM_EVOLVE_TASKÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 0 AUTO_SPM_EVOLVE_TASK_INTERVALÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 3600 AUTO_SPM_EVOLVE_TASK_MAX_RUNTIMEÂ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â 1800
I’ll discuss a number of these undocumented configuration options in future posts.
In my next post however, we’ll look at the Automatic Indexing feature in action with a very simple example to start with…
Oracle 19c Automatic Indexing: Methodology Introduction (After Today) July 24, 2019
Posted by Richard Foote in Automatic Indexing, Oracle Indexes, Oracle19c.1 comment so far
For the past month or so I’ve been playing around extensively with the new Oracle 19c “Automatic Indexing” feature, so I thought it was time to start blogging about it.
Considering it’s only in “Version 1” status, my initial impression is very positive in that it works extremely well doing at what it’s initially designed to do. There are certainly “interesting” bits of behavior here and there and some missing functionality, but it’s a whole lot better than DB running sub-optimally due to missing indexes.
This post is purely an introduction in which I’ll just discuss the general methodology behind the Automatic Indexing (AI) capability, initially in a somewhat simplistic manner in order to convey the more important concepts. I’ll expand and discuss many of the actual complexities behind this cool feature in future posts.
The basic concept behind AI is that the DBA via a simple “switch” can turn on the AIÂ feature and have the Oracle Database automatically create/drop/modify any necessary database indexes as required.
The Oracle database continually captures SQL workloads and keeps track of any new column usages within SQL predicates (as well as associated plans and execution statistics) that might warrant the need for a new index. Note that currently, only equality predicates are considered when determining potential candidate Automatic Indexes.
By default, a background task is executed every 15 minutes that by default runs for up to an hour, to determine if the database can identify any new indexes that “might” be warranted. There are a number of reasons why the tasks might actually take considerably more time than the default 1 hour (I’ll expand on reasons why in future posts).
Candidate indexes that have been detected based on the previous 15 mins database workload are initially created as INVISIBLE/UNUSABLE indexes and hard parsed with the captured SQLs to determine if the index could be considered by the CBO.
If the candidate indexes are indeed viable, the indexes are then created as INVISIBLE/USABLE indexes and verified via the SQL Performance Analyzer using SQL Tuning Sets to determine if the performance of the captured SQLs have indeed improved from the existing plans when using the newly created Automatic Indexes.
If performance actually improves for all captured SQLs using a new Automatic Index, the Automatic Index is made VISIBLE and is now available for general database use. If performance is worse for all captured SQLs, the Automatic Index is made UNUSABLE again and hence not available to the CBO. If performance is better for some SQLs but worse for others, well the story gets a little complicated. In “theory”, the Automatic Index is made USABLE but SQL baselines are created for the SQLs that suffer performance degradation to not use the Automatic Index. Any such SQLs are effectively “blacklisted” and are not (easily) considered for future “new” AI deliberations. As I’ll discuss in future posts, things are not actually quite as straightforward as that.
So depending on the scenario, Automatic Indexes can end up being in any of the following states:
- Invisible and Unusable
- Invisible and Valid
- Visible and Unusable
- Visible and Valid
When Visible and Valid, Automatic Indexes can ultimately be:
- Used by the CBO
- Not used by the CBO (even by the SQL that caused its creation)
As part of the identify candidate index process, Oracle will consider if a new index can be logically “merged” with an existing index and effectively replace an existing index by dropping and replacing it with a new Automatic Index.
AI will also monitor if existing (either Automatic or Manual) indexes are not currently being used within the database. If after a configurable period of time an index is deemed not to have been used within that time, the index will be automatically dropped during the AI 15 minute task.
This is the basic AI story. The actual story is a tad more complicated and which I’ll expand upon in many many future posts, so keep your questions until then 🙂
Importantly, AI is ONLY available on the Exadata platform or on Oracle Cloud environments. It is NOT available on standard On-Premises Oracle 19c deployments. Attempts to turn the feature on where not supported will only result in disappointment:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE’,’IMPLEMENT’);
BEGIN DBMS_AUTO_INDEX.CONFIGURE(‘AUTO_INDEX_MODE’,’IMPLEMENT’); END;
*
ERROR at line 1:
ORA-40216: feature not supported
ORA-06512: at “SYS.DBMS_SYS_ERROR”, line 79
ORA-06512: at “SYS.DBMS_AUTO_INDEX_INTERNAL”, line 9180
ORA-06512: at “SYS.DBMS_AUTO_INDEX”, line 283
ORA-06512: at line 1
I’ll next discuss the AI configuration options available to the DBA…