jump to navigation

Oracle 19c Automatic Indexing: Mixing Manual and Automatic Indexes Part II (Stay) May 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Add Column To Existing Index, Automatic Indexing, Oracle Indexes.
add a comment

stay single

 

In my previous post, I discussed how Automatic Indexing did not recognise there was already an existing logically equivalent manually created index and so created effectively a redundant Automatic Index.

I also discussed previously how Automatic Indexing was clever enough to logically add new columns to existing Automatic Indexes if it determined such a new index can be used effectively for both previous and new workloads.

In this post, how will Automatic Indexing handle the scenario if a previously manually created index could also potentially be improved by adding a new column.

I’ll start by creating a table similar to my previous post but with more distinct values for the CODE3 column such that the test query will be more selective and so make the CBO favour the use of an index (Note: all examples are run on the OLTP Autonomous Database Cloud Service and hence the odd parallel execution plans):

SQL> create table major_tom6 (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom6 select rownum, mod(rownum, 1000)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 100)), 'David Bowie' from dual connect by level = 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM6');

PL/SQL procedure successfully completed.

 

I’ll now manually create an index for BOTH combinations of the CODE2, CODE3 columns:

 

SQL> create index major_tom6_code2_code3_i on major_tom6(code2, code3);

Index created.

SQL> create index major_tom6_code3_code2_i on major_tom6(code3, code2);

Index created.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='MAJOR_TOM6';

INDEX_NAME                AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM6_CODE2_CODE3_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23697           9890973
MAJOR_TOM6_CODE3_CODE2_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23697           9890973

 

If I now run the following query:

SQL> select * from major_tom6 where code2=42 and code3=42;

983 rows selected.

Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                          |  1000 | 28000 |   997   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001                 |  1000 | 28000 |   997   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MAJOR_TOM6               |  1000 | 28000 |   997   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                          |       |       |            |          |
|   5 |      PX RECEIVE                       |                          |  1000 |       |     5   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000                 |  1000 |       |     5   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                          |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | MAJOR_TOM6_CODE2_CODE3_I |  1000 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     8 - access("CODE2"=42 AND "CODE3"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       971  consistent gets
         0  physical reads
         0  redo size
     27836  bytes sent via SQL*Net to client
      1303  bytes received via SQL*Net from client
        67  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       983  rows processed

 

The CBO favours the use of an index as with just 983 rows returned from a 10M row table, the index is the cheaper access method.

If I now run the following SQL which also includes the more selectively CODE1 column predicate as well (which returns just 1 row):

SQL> select * from major_tom6 where code1=42 and code2=42 and code3=42;

Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                          |     1 |    28 |   997   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001                 |     1 |    28 |   997   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MAJOR_TOM6               |     1 |    28 |   997   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                          |       |       |            |          |
|   5 |      PX RECEIVE                       |                          |  1000 |       |     5   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000                 |  1000 |       |     5   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                          |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | MAJOR_TOM6_CODE2_CODE3_I |  1000 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     3 - filter("CODE1"=42)
     8 - access("CODE2"=42 AND "CODE3"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       971  consistent gets
         0  physical reads
         0  redo size
       867  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The CBO again uses the same index based on columns CODE2, CODE3 as this has already been proven to be more efficient than a FTS. However, an index that also included the CODE1 column would be even more efficient as the CBO could simply use this index to fetch just the row(s) of interest, without having to perform the unnecessary filtering on the CODE1 column.

So what does Automatic Indexing do in this scenario? If we look at the corresponding Auto Indexing Report:

 

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM6 | SYS_AI_4nc6g08bw8db7 | CODE2,CODE3,CODE1 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

 

We notice Auto Indexing has created a new index based on columns CODE2, CODE3, CODE1, however it has NOT dropped any indexes.

If we look at the Verification section of the Auto Indexing Report:

 

VERIFICATION DETAILS

-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 93zw1kj4n43n9
SQL Text             : select * from major_tom6 where code1=42 and code2=42 and code3=42
Improvement Factor   : 972.2x

Execution Statistics:
-----------------------------

                              Original Plan                 Auto Index Plan
                              ----------------------------  ----------------------------
Elapsed Time (s):             159122                        1240
CPU Time (s):                 70379                         1320
Buffer Gets:                  10698                         4
Optimizer Cost:               997                           4
Disk Reads:                   0                             2
Direct Writes:                0                             0
Rows Processed:               11                            1
Executions:                   11                            1

 

We can see the index was created because of a 972.2x improvement in the performance of the SQL query I ran.

If we look at the details of the indexes that now exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='MAJOR_TOM6';

INDEX_NAME                AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM6_CODE2_CODE3_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23556           9890973
MAJOR_TOM6_CODE3_CODE2_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       24029           9890973
SYS_AI_4nc6g08bw8db7      YES NO  VISIBLE   DISABLED      VALID      10000000       29125           9999444

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM6' order by index_name, column_position;

INDEX_NAME                COLUMN_NAME          COLUMN_POSITION
------------------------- -------------------- ---------------
MAJOR_TOM6_CODE2_CODE3_I  CODE2                              1
MAJOR_TOM6_CODE2_CODE3_I  CODE3                              2
MAJOR_TOM6_CODE3_CODE2_I  CODE3                              1
MAJOR_TOM6_CODE3_CODE2_I  CODE2                              2
SYS_AI_4nc6g08bw8db7      CODE2                              1
SYS_AI_4nc6g08bw8db7      CODE3                              2
SYS_AI_4nc6g08bw8db7      CODE1                              3

 

We notice a couple of key points.

Firstly, even though the previously created manual index on the columns (CODE2, CODE3) is now totally redundant because it has the same column list as the leading columns of the newly created Auto Index based on the columns (CODE2, CODE3, CODE1), Auto Indexing does NOT automatically drop the manually created index.

Auto Indexing ONLY automatically drops and logically recreates Auto Indexes.

Secondly, Auto Indexing is certainly aware of the previous workload because it has created the new Auto Index with the column list (CODE2, CODE3, CODE1) and NOT in the default CODE1, CODE2, CODE3 column order (as defined in the table definition).

This suggests Auto Indexing is indeed trying to create a new index that is able to cater for all known SQL workloads (predicates on just the CODE2, CODE3 columns and predicates on columns CODE1, CODE2, CODE3).

However, Auto Indexing does not (yet) have the capability to logically modify or drop obviously redundant manually created indexes (it can only do so on previously created Auto Indexes). This is likely one of the reasons why Oracle has provided us with the DROP_SECONDARY_INDEXES procedure in order to get rid of all those annoying manually created secondary indexes that can get in the way of an optimal indexing strategy.

More on DROP_SECONDARY_INDEXES in future posts.

Oracle 19c Automatic Indexing: Adding Columns To Existing Automatic Indexes (2+2=5) April 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Add Column To Existing Index, Automatic Indexing, Oracle Indexes.
2 comments

2+2=5 Single

 

In my previous post, I discussed how when the following query is run:

select * from major_tom3 where code3=4 and code2=42;

the Automatic Indexing process will create an index on (CODE2, CODE3) but ultimately not use the index as the CBO considers the corresponding index based execution plan too expensive.

I’m going to expand on the demo and run now the following SQL for the first time (note these examples are run on the OLTP Autonomous Cloud service which explains the odd default parallel based execution plans):

SQL> select * from major_tom3
where code1=42 and code3=4 and code2=42;

10 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |    10 |   280 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |    10 |   280 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |    10 |   280 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM3 |    10 |   280 |  7354   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE1"=42 AND "CODE2"=42 AND "CODE3"=4)
    filter("CODE1"=42 AND "CODE2"=42 AND "CODE3"=4)

Statistics
----------------------------------------------------------
         6 recursive calls
         0 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
      1032 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
        10 rows processed

 

This query has a predicate that includes both CODE2 and CODE3 filtering columns as previously, but now also a new filtering column on CODE1. This now makes the resultant SQL much more selective than the previous SQL, returning just 10 rows where the previous SQL returned 9968 rows.

The current explain plan still uses the previous Full Table Scan, as the only index available is the Automatic Index created previously based on CODE2, CODE3 which has already been shown to be too expensive to return the necessary rows. The additional filtering predicate based on CODE1 doesn’t make the index any more efficient, it still has to access the 9968 rows that match the CODE2, CODE3 predicates and then filter out most of the rows less the 10 that are actually required.

So what does the Automatic Index process do in this scenario?

Let’s look at the resultant Automatic Index report:

 

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM3 | SYS_AI_cy8rs2dqb0nrp | CODE2,CODE3,CODE1 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

The following indexes were dropped:
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties|
---------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM3 | SYS_AI_bnyacywycxx8b | CODE2,CODE3 | B-TREE | NONE      |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

So the first thing to note is that Oracle first creates a new index based on columns CODE2,CODE3,CODE1.

It then drops the previously created index based on the CODE2,CODE3 columns.

If we look at the Verification Details section of the report:

VERIFICATION DETAILS

-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 1hv3d685x2cy4
SQL Text             : select * from major_tom3 where code1=43 and code3=4 and code2=42
Improvement Factor   : 45853.6x

-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 97by2q15zprgc
SQL Text            : select * from major_tom3 where code1=42 and code3=4 and code2=42
Improvement Factor  : 45856.2x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    2815446                      1046
CPU Time (s):        2741134                      1013
Buffer Gets:         596135                       13
Optimizer Cost:      7354                         13
Disk Reads:          0                            2
Direct Writes:       0                            0
Rows Processed:      130                          10
Executions:          13                           1
PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 2354969370

--------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT          |            |      |       | 7354 |          |
| 1  | PX COORDINATOR            |            |      |       |      |          |
| 2  | PX SEND QC (RANDOM)       | :TQ10000   |   10 |   280 | 7354 | 00:00:01 |
| 3  | PX BLOCK ITERATOR         |            |   10 |   280 | 7354 | 00:00:01 |
| 4  | TABLE ACCESS STORAGE FULL | MAJOR_TOM3 |   10 |   280 | 7354 | 00:00:01 |
--------------------------------------------------------------------------------

Notes
-----
- dop = 2
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value : 2892362571

-----------------------------------------------------------------------------------------------------
|  Id | Operation                           | Name                 | Rows | Bytes | Cost | Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   280 |   13 | 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | MAJOR_TOM3           |   10 |   280 |   13 | 00:00:01 |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_cy8rs2dqb0nrp |   10 |       |    3 | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE2"=42 AND "CODE3"=4 AND "CODE1"=42)
Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

So what’s going on here?

Oracle has correctly determined that an index based on all 3 columns would enable a much more efficient access path for the new SQL, by a factor of 45856.2x no less.

In order to service BOTH known queries that the Automatic Index process has determined would benefit from an index based on predicates (CODE2=42 and CODE3=4) and (CODE2=42 and CODE3=4 and CODE1=42), a single index based on CODE2,CODE3,CODE1 would suffice.

As such, the existing index based on just CODE2,CODE3 is redundant as the new index has the same leading columns. Therefore, the existing index can be safely dropped.

If we look at the definition of the indexes on this table:

SQL> select index_name, auto, constraint_index, visibility, status compression from user_indexes where table_name='MAJOR_TOM3';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS
-------------------- --- --- --------- ------------- --------
SYS_AI_cy8rs2dqb0nrp YES NO  VISIBLE   DISABLED      VALID

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM3' order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_cy8rs2dqb0nrp CODE2                              1
SYS_AI_cy8rs2dqb0nrp CODE3                              2
SYS_AI_cy8rs2dqb0nrp CODE1                              3

 

We notice we now have just the new index, which is both VISIBLE and VALID, based on columns in CODE2,CODE3,CODE1 order.

So Automatic Indexing is clever enough to recognise the scenario where a new index can replace an existing index by adding additional columns to cater for new SQL workloads.

Now that’s rather impressive…

Much more on Automatic Indexing to come.