jump to navigation

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part III (“Loaded”) April 28, 2022

Posted by Richard Foote in 19c, Advanced Index Compression, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Column Order, Index Compression, Oracle, Oracle 21c, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Overloading.
add a comment

In my previous two posts, I’ve discussed scenarios where Automatic Indexing (AI) does not currently created automatic indexes and you may need to manually create the necessary indexes.

In this post, I’ll discuss a third scenario where AI will create an index, but you may want to manually create an even better one…

I’ll start by creating a relatively “large” table, with 20+ columns:

SQL> create table bowie_overload (id number, code1 number, code2 number, stuff1 varchar2(42), stuff2 varchar2(42), stuff3 varchar2(42), stuff4 varchar2(42), stuff5 varchar2(42), stuff6 varchar2(42), stuff7 varchar2(42), stuff8 varchar2(42), stuff9 varchar2(42), stuff10 varchar2(42), stuff11 varchar2(42), stuff12 varchar2(42), stuff13 varchar2(42), stuff14 varchar2(42), stuff15 varchar2(42), stuff16 varchar2(42), stuff17 varchar2(42), stuff18 varchar2(42), stuff19 varchar2(42), stuff20 varchar2(42), name varchar2(42));

Table created.

SQL> insert into bowie_overload select rownum, mod(rownum, 1000)+1, '42', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'The Spiders From Mars' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_OVERLOAD');

PL/SQL procedure successfully completed.

 

The main columns to note here are CODE1 which contains 1000 distinct values (and so is kinda selective on a 10M row table, but not spectacularly so, especially with a poor clustering factor) and CODE2 which always contains the same value of “42” (and so will compress wonderfully for maximum effect).

I’ll next run the following query a number of times:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1883860831

--------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                | 10000 | 70000 |  74817 (1) | 00:00:03 |
| * 1 | TABLE ACCESS STORAGE FULL | BOWIE_OVERLOAD | 10000 | 70000 |  74817 (1) | 00:00:03 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=24)
    filter("CODE1"=24)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     869893 consistent gets
     434670 physical reads
          0 redo size
     183890 bytes sent via SQL*Net to client
       7378 bytes received via SQL*Net from client
        668 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

 

Without an index, the CBO currently has no choice but to perform a FTS. An index on the CODE1 column would provide the necessary filtering to fetch and return the required rows.

BUT, if this query was important enough, we could improve things further by “Overloading” this index with the CODE2 column, so we could use the index exclusively to get all the necessary data, without having to access the table at all. Considering an index on just the CODE1 column would need to fetch a reasonable number of rows (10000) and would need to visit a substantial number of different table blocks due to its poor clustering, overloading the index in this scenario would substantially reduce the necessary workloads of this query.

So what does AI do in this scenario, is overloading an index considered?

If we look at the AI report:

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 28-APR-2022 12:15:45
Activity end                : 28-APR-2022 12:16:33
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 2 (47.1x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 47.1x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table          | Index                | Key   | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | BOWIE_OVERLOAD | SYS_AI_aat8t6ad0ux0h | CODE1 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE
SQL ID              : bh5cuyv8ga0bt
SQL Text            : select code1, code2 from bowie_overload where code1=42
Improvement Factor  : 46.9x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   42619069                     241844
CPU Time (s):       25387841                     217676
Buffer Gets:        12148771                     18499
Optimizer Cost:     74817                        10021
Disk Reads:         6085380                      9957
Direct Writes:      0                            0
Rows Processed:     140000                       10000
Executions:         14                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 1883860831

--------------------------------------------------------------------------------
| Id | Operation         | Name           | Rows  | Bytes | Cost  | Time       |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                |       |       | 74817 |            |
|  1 | TABLE ACCESS FULL | BOWIE_OVERLOAD | 10000 | 70000 | 74817 | 00:00:03   |
--------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 2541132923

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost  | Time       |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  9281 | 64967 | 10021 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       |  9281 | 64967 | 10021 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |    18 | 00:00:01   |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We see that an automatic index on just the CODE1 column was created.

 

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h      YES VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                COLUMN_NAME     COLUMN_POSITION
------------------------- --------------- ---------------
SYS_AI_aat8t6ad0ux0h      CODE1                         1

 

If we now re-run the query (noting in Oracle21c after you invalidate the current cursor):

 

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2541132923

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 |  Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      | 10000 | 70000 |   10021 (1)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       | 10000 | 70000 |   10021 (1)| 00:00:01 |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |      18 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      10021 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

The query now uses the newly created automatic index.

BUT, at 10021 consistent gets, it’s still doing a substantial amount to work here.

If we manually create another index that overloads the only other column (CODE2) required in this query:

SQL> create index bowie_overload_code1_code2_i on bowie_overload(code1,code2) compress advanced low;

Index created.

I’m using COMPRESS ADVANCED LOW as used by the automatic index, noting that CODE2 only contains the value “42” for all rows, making it particularly perfect for compression and a “best case” scenario when it comes to the minimal overheads potentially associated with overloading this index (I’m trying yo give AI every chance here):

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000
BOWIE_OVERLOAD_CODE1_CODE2_I   NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE1                         1
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE2                         2
SYS_AI_aat8t6ad0ux0h           CODE1                         1

In fact, my manually created index is effectively the same size as the automatic index, with the same number (15363) of leaf blocks.

So I’m giving AI the best possible scenario in which it could potentially create an overloaded index.

But I’ve never been able to get AI to create overloaded indexes. Only columns in filtering predicates are considered for inclusion in automatic indexes.

If I now re-run my query again:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1161047960

-------------------------------------------------------------------------------------------------
| Id  | Operation        | Name                         |  Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                              | 10000 | 70000 |      18 (0)| 00:00:01 |
| * 1 | INDEX RANGE SCAN | BOWIE_OVERLOAD_CODE1_CODE2_I | 10000 | 70000 |      18 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         21 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

We notice the CBO now uses the manually created index without any table access path, as it can just use the index to access the necessary data.

The number of consistent gets as a result has reduced significantly, down to just 21, a fraction of the previous 10021 when the automatic index was used.

So the scenario an of overloaded index that could significantly reduce database resources, which is currently not supported by AI, is another example of where may want to manually create a necessary index.

As always, this may change in the future releases…