Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.trackback
In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.
So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?
I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:
SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD'); PL/SQL procedure successfully completed. SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1'); PL/SQL procedure successfully completed.
So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.
I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 564520720 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 856 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 (11) | 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 64424 consistent gets 64413 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- ----------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ----------------------------------------------------------------------------- | BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE | ----------------------------------------------------------------------------- ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 7wag3gbk0b3tm SQL Text : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021' Improvement Factor : 64442.3x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 568513 2771 CPU Time (s): 275534 1874 Buffer Gets: 1031078 406 Optimizer Cost: 856 405 Disk Reads: 1030609 3 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 16 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 564520720 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 856 | | | 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 2703636439 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 405 | 00:00:01 | | * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 405 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 403 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) * 2 - access("CODE"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
If we look at the definitions of all indexes currently on these tables:
SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_dp2t0j12zux49 YES INVISIBLE ADVANCED LOW UNUSABLE 10000000 21702 4161898 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_dp2t0j12zux49 CODE 1 SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD1'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_96snkmu4sk44g YES VISIBLE ADVANCED LOW VALID 10000000 15400 9969473 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_96snkmu4sk44g CODE 1
In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.
However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.
The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.
So if we re-run the first query:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
It continues to use a Full Table Scan.
If we re-run the second query:
SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2703636439 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 41 | 415 (0) | 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 415 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 412 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 406 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).
BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:
SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low; Index created. SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3366491378 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | PINK_FLOYD1_CODE_CREATE_DATE_I | 1 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "CREATE_DATE" IS NOT NULL) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Performance improves significantly further, by reducing Consistent Gets down to just 3.
So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:
A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or
A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…
[…] I previously discussed how Automatic Indexing only currently supports Equality based predicates. […]
LikeLike