jump to navigation

Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022

Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.
add a comment

In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.

In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.

To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:

SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id);

Index created.

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor

FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I';

INDEX_NAME                        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
----------------------------- ---------- ----------- -----------------
RADIOHEAD_CODE_ID_I                    1         265             98619

If we now re-run the previous query:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_CODE_ID_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.

This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).

Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).

If we now look at the associated costings:

Effective Index Selectivity = CODE selectivity x ID selectivity

= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))

= 0.0001 x ((4000/9999) + 0.0002)

= 0.0001 x 0.40024)

= 0.000040024

Effective Table Selectivity = same as Index Selectivity

= 0.000040024

 

The effective index selectivity of 0.000040024 is now much lower than the previous (0.40024), as the CBO can now consider the product of the selectivities of both columns).

If we now plug this improved effective index selectivity into the index path costing calculations:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)

= 1 + 1  + 4

= 2 + 4

= 6

Index Access Cost  = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)

= 2 + 4

= 6

We can see how the respective 2 and 6 improved CBO index costings are derived.

So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…

Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”) July 14, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, Index Column Order, Invisible Indexes, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning, Richard Foote Training, Richard's Blog.
3 comments

I’m currently putting together some Exadata related training for a couple of customers and came across a rather strange anomaly with regard the status of Automatic Indexes, when created in part on unselective, non-equality predicates.

As discussed previously, Oracle Database 21c now allows the creation of Automatic Indexes based on non-equality predicates (previously, Automatic Indexes were only created on equality-based predicates).

But one appears to get rather odd resultant Automatic Indexes in the scenario where the non-equality predicate is not particularly selective but other predicates are highly selective.

To illustrate, I’ll create a basic table that has two columns (ID and CODE) that are both highly selective:

SQL> create table ziggy_new (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy_new select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_NEW');

PL/SQL procedure successfully completed.

So there are currently no indexes on this table.

I’ll next run the following SQL (and others similar) a number of times:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
       filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38605 consistent gets
      38600 physical reads
          0 redo size
        725 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

Without any indexes, the CBO currently has no choice but to use a Full Table Scan.

But only 1 row is returned. The first equality predicate on the CODE column is highly selective and on its own would only return 10 rows out of the 10M row table. The second, non-equality range-based predicate on the ID column is nowhere near as selective and offers limited additional filtering.

The CBO stops calculating index related costs after a non-equality predicate column (as subsequent column values could exist anywhere within the preceding range), and so the more effective index here is one based on (CODE, ID) with the non-equality predicate column second,  or potentially just on the CODE column only, as the ID range offers minimal filtering benefits.

So what does Automatic Indexing make of things?

If we look at the subsequent Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 3
Indexes created (visible / invisible)        : 1 (0 / 1)
Space used (visible / invisible)             : 209.72 MB (0 B / 209.72 MB)
Indexes dropped                              : 0
SQL statements verified                      : 44
SQL statements improved (improvement factor) : 12 (64.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.6x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------

So Automatic Indexing has indeed created an index based on CODE,ID (a common Automatic Indexing trait appears to be to create an index based on all available predicates).

BUT the index is created as an INVISIBLE Index and so can not generally be used by database sessions.

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='ZIGGY_NEW';

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES INVISIBLE VALID      10000000       25123          10000000

SQL> select index_name, column_name, column_position
     from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

 

So re-running the previous SQL statements continues to use a Full Table Scan:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
       filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38605 consistent gets
      38600 physical reads
          0 redo size
        725 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

 

Automatic Indexing appears to only create Invisible indexes when there is an inefficient non-equality predicate present. It won’t create the index as a Visible index, even though it would significantly benefit these SQL statements that caused its creation. And Automatic Indexing won’t create an index on just the highly selective CODE equality predicate, which would also be of much benefit to these SQL statements.

If we now run similar queries, but with much more selective non-equality predicates, such as:

SQL> select * from ziggy_new where code=1 and id between 1 and 10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=1 AND "ID"<=10 AND "ID">=1)
       filter("CODE"=1 AND "ID"<=10 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38604 consistent gets
      38600 physical reads
          0 redo size
        503 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

Again, with no (Visible) index present, the CBO currently has no choice but to use the Full Table Scan.

But during the next cycle, after Automatic Indexing kicks in again:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 5
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 209.72 MB (209.72 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 89
SQL statements improved (improvement factor) : 31 (71.9x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : d4znwcu4h52ca
SQL Text            : select * from ziggy_new where code=42 and id between 1 and 10
Improvement Factor  : 38604x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   3398605                      68
CPU Time (s):       3166824                      68
Buffer Gets:        463250                       3
Optimizer Cost:     6738                         4
Disk Reads:         463200                       0
Direct Writes:      0                            0
Rows Processed:     0                            0
Executions:         12                           1

PLANS SECTION
--------------------------------------------------------------------------------
-------------

- Original
-----------------------------
Plan Hash Value : 3165184525

--------------------------------------------------------------------------------
| Id | Operation                  | Name      | Rows | Bytes | Cost | Time     |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |           |      |       | 6738 |          |
|  1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 | 6738 | 00:00:01 |
--------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 1514586396

-------------------------------------------------------------------------------------------------------
|  Id | Operation                            | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    1 |    23 |    4 | 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |    4 | 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

But this time, the index on the CODE,ID columns is created as a Visible index.

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES VISIBLE   VALID      10000000       25123          10000000

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

So this index can be generally used, both by the newer SQLs that generated the now Visible index:

SQL> select * from ziggy_new where code=42 and id between 1 and 10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        503 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

And also used by the SQLs with the unselective non-equality predicates, that Automatic Indexing would only create as Invisible indexes:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42 AND "ID">=1 AND "ID"<=100000)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
        729 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

 

Automatic Indexing appears to currently not quite do the right thing with SQL statements that have unselective non-equality predicates, by creating such indexes as only Invisible Indexes, inclusive of the unselective columns.

Although an edge case, I would recommend looking through the list of created Automatic Indexes to see if any such Invisible/Valid indexes exists, as it can suggest there are current inefficient SQL statements that could benefit from such indexes being Visible.

Automatic Indexes: Automatically Rebuild Unusable Indexes Part IV (“Nothing Has Changed”) May 31, 2022

Posted by Richard Foote in 19c, 19c New Features, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Column Order, Index Internals, Local Indexes, Mixing Auto and Manual Indexes, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle19c, Unusable Indexes.
1 comment so far

In a previous post, I discussed how Automatic Indexing (AI) does not automatically rebuild a manually built index that is in an Unusable state (but will rebuild an Unusable automatically created index).

The demo I used was a simple one, based on manually created indexes referencing a non-partitioned table.

In this post, I’m going to use a demo based on manually created indexes referencing a partitioned table.

I’ll start by creating a rather basic range-based partitioned table, using the RELEASE_DATE column to partition the data by year:

SQL> CREATE TABLE big_bowie (id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2020 VALUES LESS THAN (TO_DATE('01-JAN-2021', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2021 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_bowie SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE');

PL/SQL procedure successfully completed.

I’ll next manually create a couple indexes; a non-partitioned index based on just the ALBUM_ID column and a prefixed locally partitioned index, based on the columns RELEASE_DATE, TOTAL_SALES:

 

SQL> create index album_id_i on big_bowie(album_id);

Index created.

SQL> create index release_date_total_sales_i on big_bowie(release_date, total_sales) local;

Index created.

 

If we now re-organise just partition ALBUMS_2017 (without using the ONLINE clause):

SQL> alter table big_bowie move partition albums_2017;

Table altered.

This results in the non-partitioned index and the ALBUMS_2017 local index partition becoming Unusable:

SQL> select index_name, status from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS
------------------------------ --------
ALBUM_ID_I                     UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     N/A

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

Let’s now run a number of queries a number of times. The first series is based on a predicate on just the ALBUM_ID column, such as:

SQL> select * from big_bowie where album_id=42;

2000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1510748290

-------------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           | 2000 | 52000 |    7959 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE ALL |           | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
| * 2 |  TABLE ACCESS FULL  | BIG_BOWIE | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("ALBUM_ID"=42)
  - filter("ALBUM_ID"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      48593 consistent gets
      42881 physical reads
          0 redo size
      44289 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       2000 rows processed

We’ll also run a series of queries based on both the RELEASE_DATE column using dates from the unusable index partition and the TOTAL_SALES column, such as:

SQL> select * from big_bowie where release_date='01-JUN-2017' and total_sales=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3245457041

----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    1 |    26 |     986 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE |           |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
| * 2 |  TABLE ACCESS FULL     | BIG_BOWIE |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
   - filter("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       5573 consistent gets
          0 physical reads
          0 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

Without a valid/usable index, the CBO currently has no choice but to use a Full Table Scan on the first query, and a Full Partition Scan on the partition with the unusable local index.

So what does AI make of things? Does it rebuild the unusable manually created indexes so the associated indexes can be used to improve these queries?

If we wait until the next AI task completes and check out the indexes on the table:

SQL> select index_name, status, partitioned from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS   PAR
------------------------------ -------- ---
RELEASE_DATE_TOTAL_SALES_I     N/A      YES
ALBUM_ID_I                     UNUSABLE NO
SYS_AI_aw2825ffpus5s           VALID    NO
SYS_AI_2hf33fpvnqztw           VALID    NO

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

We notice that AI has created two new non-partitioned automatic indexes, while both the manually created indexes remain in the same unusable state. If we look at the columns associated with these new automatic indexes:

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BIG_BOWIE';

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
ALBUM_ID_I                     ALBUM_ID                           1
RELEASE_DATE_TOTAL_SALES_I     RELEASE_DATE                       1
RELEASE_DATE_TOTAL_SALES_I     TOTAL_SALES                        2
SYS_AI_aw2825ffpus5s           ALBUM_ID                           1
SYS_AI_aw2825ffpus5s           RELEASE_DATE                       2
SYS_AI_2hf33fpvnqztw           TOTAL_SALES                        1
SYS_AI_2hf33fpvnqztw           RELEASE_DATE                       2

As we can see, AI has logically replaced both unusable indexes.

The manual index based on ALBUM_ID has been replaced with an inferior index based on the ALBUM_ID, RELEASE_DATE columns. Inferior in that the automatic index is both redundant (if only the manual index on ALBUM_ID were rebuilt) and in that it has the logically unnecessary RELEASE_DATE column to inflate the size of the index.

The manual index based on the RELEASE_DATE, TOTAL_SALES columns has been replaced with a redundant automatic index based on the reversed TOTAL_SALES, RELEASE_DATE columns.

Now, AI has indeed automatically addressed the current FTS performance issues associated with these queries by creating these indexes, but a better remedy would have been to rebuild the unusable manual indexes and hence negate the need for these redundant automatic indexes.

But currently (including with version 21.3), AI will NOT rebuild unusable manually created indexes, no matter the scenario, and will instead create additional automatic indexes if it’s viable for it to do so.

A reason why Oracle at times recommends dropping all current manually created secondary indexes before implementing AI (although of course this comes with a range of obvious issues and concerns).

If these manually created indexes didn’t exist, I’ll leave it as an exercise to the discernable reader on what automatic indexes would have been created…

As always, this restriction may change in future releases…

Automatic Indexes: Automatically Rebuild Unusable Indexes Part III (“Waiting For The Man”) May 17, 2022

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, Exadata, Full Table Scans, Manual Indexes, Mixing Auto and Manual Indexes, Oracle, Oracle Blog, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Unusable Indexes.
1 comment so far

I’ve previously discussed how Automatic Indexing (AI) will not only create missing indexes, but will also rebuild unusable indexes, be it a Global or Local index.

However, all my previous examples have been with Automatic Indexes. How does AI handle unusable indexes in which the indexes were manually created?

In my first demo, I’ll start by creating a basic non-partitioned table:

SQL> create table bowie_stuff (id number, album_id number, country_id number, release_date date, total_sales number);

Table created.

SQL> insert into bowie_stuff select rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE_STUFF');

PL/SQL procedure successfully completed.

We next manually create an index on the highly selective TOTAL_SALES column:

SQL> create index bowie_stuff_total_sales_i on bowie_stuff(total_sales);

Index created.

Let’s now invalidate the index by re-organising the table without the online clause:

SQL> alter table bowie_stuff move;

Table altered.

SQL> select index_name, status from user_indexes where table_name='BOWIE_STUFF';

INDEX_NAME                     STATUS
------------------------------ --------
BOWIE_STUFF_TOTAL_SALES_I      UNUSABLE

So the index is now in an UNUSABLE state.

To perk up the interest of AI, I’ll run a number of queries such as the following with a predicate condition on TOTAL_SALES:

select * from bowie_stuff where total_sales=42;

18 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 910563088

---------------------------------------------------------------------------------
| Id | Operation          | Name        | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |             |   20 |   520 |    7427 (2) | 00:00:01 |
|* 1 |  TABLE ACCESS FULL | BOWIE_STUFF |   20 |   520 |    7427 (2) | 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("TOTAL_SALES"=42)
    filter("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      42746 consistent gets
      42741 physical reads
          0 redo size
       1392 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         18 rows processed

Without a valid index, the CBO has no choice but to perform an expensive full table scan.

However, it doesn’t matter how long I wait or how many different queries I run similar to the above, AI currently will never rebuild an unusable index if the index was manually created.

AI will only rebuild unusable automatically created indexes.

I’ve discussed previously how automatic and manually created indexes often don’t gel well together and is one of the key reasons why Oracle recommends dropping all manually created secondary indexes if you wish to implement AI (using the DBMS_AUTO_INDEX.DROP_SECONDARY_INDEXES procedure, which I’ll discuss in a future post).

Things can get a little interesting with AI, if the underlining table is partitioned and you have manually created unusable indexes.

As I’ll discuss in my next post…

Automatic Indexes: Automatically Rebuild Unusable Indexes Part II (“I Wish You Would”) May 11, 2022

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Local Indexes, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning, Performance Tuning, Rebuild Unusable Indexes.
1 comment so far

Within a few hours of publishing my last blog piece on how Automatic Indexing (AI) can automatically rebuild indexes that have been placed in an UNUSABLE state, I was asked by a couple of readers a similar question: “Does this also work if just a single partition of an partitioned index becomes unusable”?

My answer to them both is that I’ve provided them the basic framework in the demo to check out the answer to that question for themselves (Note: a fantastic aspect of working with the Oracle Database is that it’s available for free to play around with, including the Autonomous Database environments).

But based on the principle that for every time someone asks a question, there’s probably a 100 others who potentially might be wondering the same thing, thought I’ll quickly whip up a demo to answer this for all.

I’ll begin with the same table format and data as my previous blog:

SQL> CREATE TABLE big_ziggy(id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2020 VALUES LESS THAN (TO_DATE('01-JAN-2021', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2021 VALUES LESS THAN (TO_DATE('01-JAN-2022', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2022 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_ziggy SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_ZIGGY');

PL/SQL procedure successfully completed.

 

But this time, I’ll run a number of queries similar to the following, that also has a predicate based on the partitioned key (RELEASE_DATE) of the table:

SQL> select * FROM big_ziggy where release_date = '01-JUN-2017' and total_sales = 123456;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3599046327

----------------------------------------------------------------------------------------------------
| Id | Operation              | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |           |    1 |    26 |    1051 (2) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE |           |    1 |    26 |    1051 (2) | 00:00:01 |      3 |     3 |
|* 2 |  TABLE ACCESS FULL     | BIG_ZIGGY |    1 |    26 |    1051 (2) | 00:00:01 |      3 |     3 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage(("TOTAL_SALES"=123456 AND "RELEASE_DATE"=TO_DATE('2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
    filter(("TOTAL_SALES"=123456 AND "RELEASE_DATE"=TO_DATE('2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       5618 consistent gets
          0 physical reads
          0 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

If we wait for the next AI task to kick in:

DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY()
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 11-MAY-2022 10:55:43
Activity end                : 11-MAY-2022 10:56:27
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 192.94 MB (192.94 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 6
SQL statements improved (improvement factor) : 3 (6670.1x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 2x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-------------
| Owner | Table     | Index                | Key                      | Type   | Properties |
---------------------------------------------------------------------------------------------
| BOWIE | BIG_ZIGGY | SYS_AI_6wv99zdbsy8ar | RELEASE_DATE,TOTAL_SALES | B-TREE | LOCAL      |
---------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We can see that AI has indeed automatically created a LOCAL, partitioned index (on columns RELEASE_DATE, TOTAL_SALES) in this scenario, as we have an equality predicate based on the partitioned key (RELEASE_DATE).

Currently, all is well with the index, with all partitions in a USABLE state:

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_ZIGGY';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_6wv99zdbsy8ar           YES YES VISIBLE   N/A

SQL> select index_name, partition_name, status from user_ind_partitions where index_name='SYS_AI_6wv99zdbsy8ar';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
SYS_AI_6wv99zdbsy8ar           ALBUMS_2015          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2016          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2017          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2018          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2019          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2020          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2021          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2022          USABLE

SQL> select index_name, column_name, column_position from user_ind_columns 
     where index_name='SYS_AI_6wv99zdbsy8ar';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_6wv99zdbsy8ar           RELEASE_DATE                  1
SYS_AI_6wv99zdbsy8ar           TOTAL_SALES                   2

 

But if we now do an offline reorg of a specific table partition:

SQL> alter table big_ziggy move partition albums_2017;

Table altered.

SQL> select index_name, partition_name, status from user_ind_partitions where index_name='SYS_AI_6wv99zdbsy8ar';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
SYS_AI_6wv99zdbsy8ar           ALBUMS_2015          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2016          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2017          UNUSABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2018          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2019          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2020          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2021          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2022          USABLE

 

We can see we’ve now made the associated Local Index partition UNUSABLE.

If we run the following query:

SQL> select * FROM big_ziggy where release_date = '01-JUN-2017' and total_sales = 123456;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3599046327

----------------------------------------------------------------------------------------------------
| Id | Operation              | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT       |           |    1 |    26 |     986 (2) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE |           |    1 |    26 |     986 (2) | 00:00:01 |      3 |     3 |
|* 2 |  TABLE ACCESS FULL     | BIG_ZIGGY |    1 |    26 |     986 (2) | 00:00:01 |      3 |     3 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage(("TOTAL_SALES"=123456 AND "RELEASE_DATE"=TO_DATE('2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
    filter(("TOTAL_SALES"=123456 AND "RELEASE_DATE"=TO_DATE('2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

Statistics
----------------------------------------------------------
          3 recursive calls
          4 db block gets
       5578 consistent gets
       5571 physical reads
        924 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

The CBO has no choice here but to do a full partition table scan.

If now wait again for the next AI task to strut its stuff:

SQL> select dbms_auto_index.report_last_activity() from dual;

DBMS_AUTO_INDEX.REPORT_LAST_ACTIVITY()
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 11-MAY-2022 11:42:42
Activity end                : 11-MAY-2022 11:43:13
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 192.94 MB (192.94 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 4
SQL statements improved (improvement factor) : 1 (5573x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.1x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-------------
| Owner | Table     | Index                | Key                      | Type   | Properties |
---------------------------------------------------------------------------------------------
| BOWIE | BIG_ZIGGY | SYS_AI_6wv99zdbsy8ar | RELEASE_DATE,TOTAL_SALES | B-TREE | LOCAL      |
---------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------


SQL> select index_name, partition_name, status from user_ind_partitions where index_name='SYS_AI_6wv99zdbsy8ar';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
SYS_AI_6wv99zdbsy8ar           ALBUMS_2015          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2016          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2017          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2018          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2019          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2020          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2021          USABLE
SYS_AI_6wv99zdbsy8ar           ALBUMS_2022          USABLE

The index partition is now automatically in a USABLE state again.

If we look at the index object data:

SQL> select object_name, subobject_name, to_char(created, 'dd-Mon-yy hh24:mi:ss') created, to_char(last_ddl_time, 'dd-Mon-yy hh24:mi:ss’)
last_ddl_time from dba_objects where object_name='SYS_AI_6wv99zdbsy8ar';

OBJECT_NAME                    SUBOBJECT_NAME       CREATED                     LAST_DDL_TIME
------------------------------ -------------------- --------------------------- ---------------------------
SYS_AI_6wv99zdbsy8ar           ALBUMS_2015          11-May-22 10:41:33          11-May-22 10:56:14
SYS_AI_6wv99zdbsy8ar           ALBUMS_2016          11-May-22 10:41:33          11-May-22 10:56:15
SYS_AI_6wv99zdbsy8ar           ALBUMS_2017          11-May-22 10:41:33          11-May-22 11:42:42
SYS_AI_6wv99zdbsy8ar           ALBUMS_2018          11-May-22 10:41:33          11-May-22 10:56:18
SYS_AI_6wv99zdbsy8ar           ALBUMS_2019          11-May-22 10:41:33          11-May-22 10:56:19
SYS_AI_6wv99zdbsy8ar           ALBUMS_2020          11-May-22 10:41:33          11-May-22 10:56:20
SYS_AI_6wv99zdbsy8ar           ALBUMS_2021          11-May-22 10:41:33          11-May-22 10:56:22
SYS_AI_6wv99zdbsy8ar           ALBUMS_2022          11-May-22 10:41:33          11-May-22 10:56:22
SYS_AI_6wv99zdbsy8ar                                11-May-22 10:41:33          11-May-22 11:43:13

 

We can see that just the impacted index partition has been rebuilt.

The CBO can now successfully use the index to avoid the full partition table scan:

SQL> select * FROM big_ziggy where release_date = '01-JUN-2017' and total_sales = 123456;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3640710173

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                 | Rows | Bytes | Cost (%CPU)| Time     | Pstart | Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                      |    1 |    26 |      4 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE                     |                      |    1 |    26 |      4 (0) | 00:00:01 |      3 |     3 |
|  2 |  TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_ZIGGY            |    1 |    26 |      4 (0) | 00:00:01 |      3 |     3 |
|* 3 |   INDEX RANGE SCAN                         | SYS_AI_6wv99zdbsy8ar |    1 |       |      3 (0) | 00:00:01 |      3 |     3 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "TOTAL_SALES"=123456)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

I’ll leave it to the discernible reader to determine if this also works in the scenario where the partitioned index were to be global… 🙂

Automatic Indexes: Automatically Rebuild Unusable Indexes Part I (“Andy Warhol”) May 10, 2022

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Rebuild Unusable Indexes.
2 comments

Obviously, the main feature of Automatic Indexing (AI) is for Oracle to automatically create indexes, that have been proven to improve performance, in a relatively safe and timely manner.

However, another nice and useful capability is for AI to automatically rebuild indexes that are placed in an “Unusable” state.

The documentation states that:

Automatic indexing provides the following functionality:

Rebuilds the indexes that are marked unusable due to table partitioning maintenance operations, such as ALTER TABLE MOVE.

Now, when AI was initially released, I was unable to get this rebuild capability to work as advertised. I don’t know whether this was because the capability had not yet been successfully implemented or because of some failings in my testing.

However, with both the current versions of Oracle Database 19c (19.15.0.1.0 as now implemented in Autonomous Databases) and Oracle Database 21c, the following demo now works successfully.

Let’s begin by creating a simple partitioned table:

SQL> CREATE TABLE big_bowie(id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2020 VALUES LESS THAN (TO_DATE('01-JAN-2021', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2021 VALUES LESS THAN (TO_DATE('01-JAN-2022', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2022 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_bowie SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> COMMIT;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE');

PL/SQL procedure successfully completed.

We next run a number of SQL statements such as the following:

SQL> SELECT * FROM big_bowie WHERE total_sales = 123456;

19 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1510748290

-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |   20 |   520 |    7958 (2) | 00:00:01 |       |      |
|   1 |  PARTITION RANGE ALL |           |   20 |   520 |    7958 (2) | 00:00:01 |     1 |    8 |
| * 2 |   TABLE ACCESS FULL  | BIG_BOWIE |   20 |   520 |    7958 (2) | 00:00:01 |     1 |    8 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=123456)
    filter("TOTAL_SALES"=123456)

Statistics
----------------------------------------------------------
          1 recursive calls
          0 db block gets
      49573 consistent gets
      42778 physical reads
          0 redo size
       1423 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         19 rows processed

If we wait for the AI task to kick in, we notice is has successfully created an associated automatic index:

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_17cd4101fvrk1           NO  YES VISIBLE   VALID

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BIG_BOWIE';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_17cd4101fvrk1           TOTAL_SALES                   1

As discussed previously, AI can now create a non-partitioned, Global index if deemed more efficient than a corresponding Local index.

Note that the newly created automatic index is currently VALID.

However, if we re-organise a partition within the table without using the Online clause:

SQL> alter table big_bowie move partition albums_2017;

Table altered.

SQL> select index_name, partitioned, auto, visibility, status from user_indexes where table_name = 'BIG_BOWIE';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_17cd4101fvrk1           NO  YES VISIBLE   UNUSABLE

The index as a result goes into an UNUSABLE state.

Running similar queries from this point will result in a FTS again:

SQL> select * from big_bowie where total_sales=42;

22 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1510748290

-------------------------------------------------------------------------------------------------
| Id | Operation            | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT     |           |   20 |   520 |    7937 (2) | 00:00:01 |       |       |
|  1 |  PARTITION RANGE ALL |           |   20 |   520 |    7937 (2) | 00:00:01 |     1 |     8 |
|* 2 |   TABLE ACCESS FULL  | BIG_BOWIE |   20 |   520 |    7937 (2) | 00:00:01 |     1 |     8 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=123456)
    filter("TOTAL_SALES"=123456)

Statistics
----------------------------------------------------------
          126 recursive calls
            0 db block gets
        48962 consistent gets
        42799 physical reads
            0 redo size
         1497 bytes sent via SQL*Net to client
           52 bytes received via SQL*Net from client
            2 SQL*Net roundtrips to/from client
           17 sorts (memory)
            0 sorts (disk)
           22 rows processed

If we now wait until the next AI task period and check out the index:

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_17cd4101fvrk1           NO  YES VISIBLE   VALID

We notice the index is now back in a VALID state again.

 

Checking out the date attributes of the index confirms the index has indeed been rebuilt:

SQL> select object_name, to_char(created, 'dd-Mon-yy hh24:mi:ss') created, to_char(last_ddl_time, 'dd-Mon-yyhh24:mi:ss’)
last_ddl_time from dba_objects where object_name='SYS_AI_17cd4101fvrk1';

OBJECT_NAME                    CREATED                     LAST_DDL_TIME
------------------------------ --------------------------- ---------------------------
SYS_AI_17cd4101fvrk1           18-Apr-22 11:59:36          18-Apr-22 18:37:42

Being in a VALID state again, the CBO can now use the automatic index:

SQL> select * from big_bowie where total_sales=42;

22 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 920768077

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                   | Name                 | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                            |                      |   20 |   520 |      23 (0) | 00:00:01 |       |       |
|  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE            |   20 |   520 |      23 (0) | 00:00:01 | ROWID | ROWID |
|* 2 |   INDEX RANGE SCAN                          | SYS_AI_17cd4101fvrk1 |   20 |       |       3 (0) | 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      48711 consistent gets
      42799 physical reads
          0 redo size
       1497 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         22 rows processed

Note: This scenario works the same if the table is Non-Partitioned.

In my next post, I’ll discuss a scenario where the automatic rebuild of an Unusable index will currently NOT work…

Automatic Indexes: AUTO_INDEX_TABLE Configuration (“Without You”) May 3, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, AUTO_INDEX_TABLE, DBMS_AUTO_INDEX.CONFIGURE, Exadata, Oracle, Oracle 21c, Oracle Cloud, Oracle General, Oracle Indexes.
2 comments

One of the more common questions I get regarding Automatic Indexing (AI) are areas of concern around having large and expensive automatic index build operations suddenly occurring in one’s database and the impact this may have on overall performance.

Additionally, I’ve had questions around scenarios where very large automatic indexes are suddenly being built, but then get canceled because they couldn’t complete in the default (3600 seconds, 1 hour) allocated time, only for them to be attempted to be built again and for this cycle to be forever ongoing.

And this is fair enough. You may not necessarily want to have indexes built on specific tables, perhaps because they’re massive and you want to control when and how indexes on such tables are built, perhaps because you’re satisfied that such tables are already indexed satisfactorily, etc. etc.

Note: the impact on overall database performance of the AI task creating large indexes is reduced, by Oracle only allowing one index to be created serially at any given time.

However, to help address these concerns, Oracle has now (from Oracle Database 21c) introduced a new configuration option within the DBMS_AUTO_INDEX.CONFIGURE procedure, AUTO_INDEX_TABLE. This now allows us to explicitly state which tables we may wish to either include or exclude from the AI process. Previously, we only had the ability to state which schemas we wanted to in/exclude from the AI process.

To add the BOWIE.SALES table to an exclusion list:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_TABLE', ‘BOWIE.SALES’, FALSE);

PL/SQL procedure successfully completed.

To add the BOWIE.PRODUCTS table to an inclusion list:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_TABLE', ‘BOWIE.PRODUCTS', TRUE);

PL/SQL procedure successfully completed.

 

To view current AI settings:

SQL> select parameter_name, parameter_value from dba_auto_index_config;

PARAMETER_NAME                      PARAMETER_VALUE
----------------------------------- -----------------------------------------------------------------
AUTO_INDEX_COMPRESSION              ON
AUTO_INDEX_DEFAULT_TABLESPACE       USERDATA2
AUTO_INDEX_MODE                     IMPLEMENT
AUTO_INDEX_REPORT_RETENTION         100
AUTO_INDEX_RETENTION_FOR_AUTO       373
AUTO_INDEX_RETENTION_FOR_MANUAL
AUTO_INDEX_SCHEMA                   schema IN (BOWIE)
AUTO_INDEX_SPACE_BUDGET             100
AUTO_INDEX_TABLE                    table IN ("BOWIE"."PRODUCTS") AND table NOT IN ("BOWIE"."SALES")

To remove all tables from both inclusion/exclusion table lists:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_TABLE', NULL);

PL/SQL procedure successfully completed.

 

This means you can now more safely deploy AI, by determining explicitly which tables you wish to in/exclude.

Note if you wish to include large tables that can potentially take longer to build than the default 3600 seconds allowed for the AI task to complete, you can change the MAX_RUN_TIME of the AI task as follows (e.g. increase the max run time to 18000 seconds, 5 hours):

SQL> select task_id, task_name, enabled, interval, max_run_time, enabled from dba_autotask_settings
where task_name = 'Auto Index Task';

   TASK_ID TASK_NAME            ENABL   INTERVAL MAX_RUN_TIME ENABL
---------- -------------------- ----- ---------- ------------ -----
         3 Auto Index Task      TRUE         900         3600 TRUE

SQL> exec dbms_auto_task_admin.modify_autotask_setting('Auto Index Task', 'MAX RUN TIME', 18000);

PL/SQL procedure successfully completed.

SQL> select task_id, task_name, enabled, interval, max_run_time, enabled from dba_autotask_settings
     where task_name = 'Auto Index Task';

   TASK_ID TASK_NAME            ENABL   INTERVAL MAX_RUN_TIME ENABL
---------- -------------------- ----- ---------- ------------ -----
         3 Auto Index Task      TRUE         900        18000 TRUE

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part III (“Loaded”) April 28, 2022

Posted by Richard Foote in 19c, Advanced Index Compression, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Column Order, Index Compression, Oracle, Oracle 21c, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Overloading.
add a comment

In my previous two posts, I’ve discussed scenarios where Automatic Indexing (AI) does not currently created automatic indexes and you may need to manually create the necessary indexes.

In this post, I’ll discuss a third scenario where AI will create an index, but you may want to manually create an even better one…

I’ll start by creating a relatively “large” table, with 20+ columns:

SQL> create table bowie_overload (id number, code1 number, code2 number, stuff1 varchar2(42), stuff2 varchar2(42), stuff3 varchar2(42), stuff4 varchar2(42), stuff5 varchar2(42), stuff6 varchar2(42), stuff7 varchar2(42), stuff8 varchar2(42), stuff9 varchar2(42), stuff10 varchar2(42), stuff11 varchar2(42), stuff12 varchar2(42), stuff13 varchar2(42), stuff14 varchar2(42), stuff15 varchar2(42), stuff16 varchar2(42), stuff17 varchar2(42), stuff18 varchar2(42), stuff19 varchar2(42), stuff20 varchar2(42), name varchar2(42));

Table created.

SQL> insert into bowie_overload select rownum, mod(rownum, 1000)+1, '42', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'The Spiders From Mars' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_OVERLOAD');

PL/SQL procedure successfully completed.

 

The main columns to note here are CODE1 which contains 1000 distinct values (and so is kinda selective on a 10M row table, but not spectacularly so, especially with a poor clustering factor) and CODE2 which always contains the same value of “42” (and so will compress wonderfully for maximum effect).

I’ll next run the following query a number of times:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1883860831

--------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                | 10000 | 70000 |  74817 (1) | 00:00:03 |
| * 1 | TABLE ACCESS STORAGE FULL | BOWIE_OVERLOAD | 10000 | 70000 |  74817 (1) | 00:00:03 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=24)
    filter("CODE1"=24)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     869893 consistent gets
     434670 physical reads
          0 redo size
     183890 bytes sent via SQL*Net to client
       7378 bytes received via SQL*Net from client
        668 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

 

Without an index, the CBO currently has no choice but to perform a FTS. An index on the CODE1 column would provide the necessary filtering to fetch and return the required rows.

BUT, if this query was important enough, we could improve things further by “Overloading” this index with the CODE2 column, so we could use the index exclusively to get all the necessary data, without having to access the table at all. Considering an index on just the CODE1 column would need to fetch a reasonable number of rows (10000) and would need to visit a substantial number of different table blocks due to its poor clustering, overloading the index in this scenario would substantially reduce the necessary workloads of this query.

So what does AI do in this scenario, is overloading an index considered?

If we look at the AI report:

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 28-APR-2022 12:15:45
Activity end                : 28-APR-2022 12:16:33
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 2 (47.1x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 47.1x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table          | Index                | Key   | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | BOWIE_OVERLOAD | SYS_AI_aat8t6ad0ux0h | CODE1 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE
SQL ID              : bh5cuyv8ga0bt
SQL Text            : select code1, code2 from bowie_overload where code1=42
Improvement Factor  : 46.9x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   42619069                     241844
CPU Time (s):       25387841                     217676
Buffer Gets:        12148771                     18499
Optimizer Cost:     74817                        10021
Disk Reads:         6085380                      9957
Direct Writes:      0                            0
Rows Processed:     140000                       10000
Executions:         14                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 1883860831

--------------------------------------------------------------------------------
| Id | Operation         | Name           | Rows  | Bytes | Cost  | Time       |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                |       |       | 74817 |            |
|  1 | TABLE ACCESS FULL | BOWIE_OVERLOAD | 10000 | 70000 | 74817 | 00:00:03   |
--------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 2541132923

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost  | Time       |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  9281 | 64967 | 10021 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       |  9281 | 64967 | 10021 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |    18 | 00:00:01   |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We see that an automatic index on just the CODE1 column was created.

 

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h      YES VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                COLUMN_NAME     COLUMN_POSITION
------------------------- --------------- ---------------
SYS_AI_aat8t6ad0ux0h      CODE1                         1

 

If we now re-run the query (noting in Oracle21c after you invalidate the current cursor):

 

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2541132923

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 |  Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      | 10000 | 70000 |   10021 (1)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       | 10000 | 70000 |   10021 (1)| 00:00:01 |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |      18 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      10021 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

The query now uses the newly created automatic index.

BUT, at 10021 consistent gets, it’s still doing a substantial amount to work here.

If we manually create another index that overloads the only other column (CODE2) required in this query:

SQL> create index bowie_overload_code1_code2_i on bowie_overload(code1,code2) compress advanced low;

Index created.

I’m using COMPRESS ADVANCED LOW as used by the automatic index, noting that CODE2 only contains the value “42” for all rows, making it particularly perfect for compression and a “best case” scenario when it comes to the minimal overheads potentially associated with overloading this index (I’m trying yo give AI every chance here):

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000
BOWIE_OVERLOAD_CODE1_CODE2_I   NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE1                         1
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE2                         2
SYS_AI_aat8t6ad0ux0h           CODE1                         1

In fact, my manually created index is effectively the same size as the automatic index, with the same number (15363) of leaf blocks.

So I’m giving AI the best possible scenario in which it could potentially create an overloaded index.

But I’ve never been able to get AI to create overloaded indexes. Only columns in filtering predicates are considered for inclusion in automatic indexes.

If I now re-run my query again:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1161047960

-------------------------------------------------------------------------------------------------
| Id  | Operation        | Name                         |  Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                              | 10000 | 70000 |      18 (0)| 00:00:01 |
| * 1 | INDEX RANGE SCAN | BOWIE_OVERLOAD_CODE1_CODE2_I | 10000 | 70000 |      18 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         21 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

We notice the CBO now uses the manually created index without any table access path, as it can just use the index to access the necessary data.

The number of consistent gets as a result has reduced significantly, down to just 21, a fraction of the previous 10021 when the automatic index was used.

So the scenario an of overloaded index that could significantly reduce database resources, which is currently not supported by AI, is another example of where may want to manually create a necessary index.

As always, this may change in the future releases…

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part II (“Ragazzo Solo, Ragazza Sola” April 27, 2022

Posted by Richard Foote in 19c, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Constraints, Exadata, Foreign Keys, Full Table Scans, Index Internals, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

In my last post, I discussed how Automatic Indexing doesn’t create an automatic index in the scenario where the minimum or maximum of a column is required.

Another scenario when an automatic index is not created is when we hit issues associated with a missing index on a Foreign Key (FK) constraint.

As I’ve discussed many times previously, if you delete a parent record without an index on the dependant FK constraints, you hit a number of issues including having to perform a (potentially expensive and problematic) Full Table Scan (FTS) on the child tables and the associated locking problems.

To illustrate, I’ll first create a small parent table:

SQL> create table daddy (id number constraint daddy_pk primary key , name varchar2(42));

Table created.

SQL> insert into daddy select rownum, 'David Bowie '|| rownum from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DADDY');

PL/SQL procedure successfully completed.

And then a somewhat larger child table, with no index on the associated foreign key constraint:

SQL> create table kiddy (id number constraint kiddy_pk primary key , code1 number constraint daddy_fk references daddy(id), code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into kiddy select rownum, mod(rownum,1000)+1000 , mod(rownum, 10000)+1, mod(rownum, 100000)+1, 'Ziggy Stardust '|| rownum from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'KIDDY');

PL/SQL procedure successfully completed.

 

If we delete a number of parent rows, for example:

SQL> delete from daddy where id = 101;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 1477800718

-------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU) |   Time     |
-------------------------------------------------------------------------------
|  0 | DELETE STATEMENT  |          |    1 |     4 |       1 (0) |   00:00:01 |
|  1 | DELETE            | DADDY    |      |       |             |            |
|* 2 | INDEX UNIQUE SCAN | DADDY_PK |    1 |     4 |       1 (0) |   00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=101)

Statistics
----------------------------------------------------------
         18 recursive calls
         13 db block gets
     117462 consistent gets
      22292 physical reads
    4645500 redo size
        204 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We notice that even though we only delete one row from a relatively small table, we perform a large number of consistent gets (117462) due to the necessary FTS on the child table, as Oracle is forced to check the table for any possible FK violations. Without an index on the child CODE1 column, Oracle has no choice but to perform the relatively expensive FTS.

Additionally, if we have an existing transaction of a child table (in Session 1):

SQL> insert into kiddy values (10000001,1042,1042,1042,'Iggy Pop');

1 row created.

And then in another session attempt to delete a parent row (in Session 2):

SQL> delete from daddy where id = 112;

The delete hangs in a locked state due to the child transaction in Session 1. This can lead to further locking issues in other sessions (Session 3):

insert into kiddy values (10000002,1042,1042,1042,'Iggy Pop');

 

The FTS on the child table and these associated locks can all be avoided by having an index on the FK constraint, as the index can then be used to effectively police the constraint during such delete operations.

What does AI do in this scenario?

Currently, nothing.

I’ve been unable to ever get AI to create a usable automatic index in this scenario. In Oracle Database 19c, I’ve not been able to get an AI created at all. In Oracle Database 21c, the best I’ve seen has been a Unusable/Invisible AI:

SQL> select index_name, index_type, auto, constraint_index, visibility, status, num_rows from user_indexes where table_n
ame='KIDDY';

INDEX_NAME                     INDEX_TYPE                  AUT CON VISIBILIT STATUS     NUM_ROWS
------------------------------ --------------------------- --- --- --------- -------- ----------
KIDDY_PK                       NORMAL                      NO  YES VISIBLE   VALID      10000004
SYS_AI_31thttf8v6r35           NORMAL                      YES NO  INVISIBLE UNUSABLE   10000004

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='KIDDY';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
KIDDY_PK                       ID                            1
SYS_AI_31thttf8v6r35           CODE1                         1

So you may need to manually create such an index on the FK constraint to improve performance and eliminate these locking issues:

SQL> create index kiddy_code1_i on kiddy(code1);

Index created.

SQL> delete from daddy where id = 142;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 1477800718

-------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU) |   Time     |
-------------------------------------------------------------------------------
|  0 | DELETE STATEMENT  |          |    1 |     4 |       1 (0) |   00:00:01 |
|  1 | DELETE            | DADDY    |      |       |             |            |
|* 2 | INDEX UNIQUE SCAN | DADDY_PK |    1 |     4 |       1 (0) |   00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=142)

Statistics
----------------------------------------------------------
          1 recursive calls
          8 db block gets
          2 consistent gets
          2 physical reads
        132 redo size
        204 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

Consistent gets have dropped off massively (down to just 8) as Oracle can now use the index to avoid the FTS search on the child table. The associated locking issues are eliminated as well.

Note: As always, this AI behaviour can always change in the future…

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part I (“Always Crashing In The Same Car”) April 26, 2022

Posted by Richard Foote in 19c, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, MAX, MIN, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
1 comment so far

As I’ve discussed previously, Oracle has increased the number of scenarios in which it will now create automatic indexes, such as with non-equality predicates and JSON expressions.

However, as of Oracle Database 21c, there are still a number of scenarios where an automatic index will NOT be created, even though an index might prove beneficial.

One such scenario is when the Min/Max of a column is required.

As I’ve discussed a number of times previously, Oracle can very efficiently use an index to determine either the Min or Max value of a column, by (hopefully) just visiting the first or last leaf block in an index. The INDEX FULL SCAN (MIN/MAX) execution plan path can be used explicitly for this purpose.

If I create a simple table as follows:

SQL> create table bowie_min (id number constraint bowie_min_pk primary key, code number, name varchar2(42));

Table created.

SQL> insert into bowie_min select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_MIN');

PL/SQL procedure successfully completed.

 

And then run the following queries a number of times that return the Min and Max of the CODE column:

SQL> select min(code) from bowie_min;

Execution Plan
----------------------------------------------------------
Plan hash value: 1068446691

----------------------------------------------------------------------------------------
| Id | Operation                 | Name      | Rows | Bytes | Cost (%CPU) | Time       |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |           |    1 |     5 |    6706 (2) | 00:00:01   |
|  1 | SORT AGGREGATE            |           |    1 |     5 |             |            |
|  2 | TABLE ACCESS STORAGE FULL | BOWIE_MIN |  10M |   47M |    6706 (2) | 00:00:01   |
----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        569 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

SQL> select max(code) from bowie_min;

Execution Plan
----------------------------------------------------------
Plan hash value: 1068446691

----------------------------------------------------------------------------------------
| Id | Operation                 | Name      | Rows | Bytes | Cost (%CPU) | Time       |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |           |    1 |     5 |    6706 (2) | 00:00:01   |
|  1 | SORT AGGREGATE            |           |    1 |     5 |             |            |
|  2 | TABLE ACCESS STORAGE FULL | BOWIE_MIN |  10M |   47M |    6706 (2) | 00:00:01   |
----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        569 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

 

Currently, the CBO has no choice but to use a Full Table Scan (FTS) as there is currently no index on the CODE column.

So what does Automatic Indexing (AI) make of things?

Nothing.

Currently, AI will not create an index in this scenario, no matter how many times I execute these queries.

If we look at the indexes on the table after a significant period of time after running these queries:

SQL> select index_name, auto from user_indexes where table_name='BOWIE_MIN';

INDEX_NAME   AUT
------------ ---
BOWIE_MIN_PK NO

No Automatic Indexes. To improve the performance of these queries, we currently have to manually create the associated index:

SQL> create index bowie_min_code_i on bowie_min(code);

Index created.

If we now re-run these queries and look at the execution plan:

SQL> select min(code) from bowie_min;

Execution Plan
----------------------------------------------------------
Plan hash value: 252811132

-----------------------------------------------------------------------------------------------
| Id | Operation                 | Name             | Rows | Bytes | Cost (%CPU) | Time       |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |                  |    1 |     5 |       3 (0) | 00:00:01   |
|  1 | SORT AGGREGATE            |                  |    1 |     5 |             |            |
|  2 | INDEX FULL SCAN (MIN/MAX) | BOWIE_MIN_CODE_I |    1 |     5 |       3 (0) | 00:00:01   |
-----------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        569 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We can see that the CBO is now indeed using the index to return the Min/Max values with a vastly reduced number of consistent gets (down to just 3 from the previous 38538).

However, a key point here is that Automatic Indexes only works on an Exadata platform and Exadata has various smarts that potentially makes accessing data via a “FTS” in this manner much more efficient than in non-Exadata environments.

Oracle may well take the position that getting Min/Max data on a Exadata is potentially efficient enough and doesn’t on its own warrant the creation of an index.

More on this in future posts…

Automatic Indexing: Deferred Invalidations (“The Post War Dream”) April 19, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Deferred Invalidation, Exadata, Function Based Indexes, Index Access Path, Index Internals, JSON, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Richard's Blog.
1 comment so far

In my previous post on how JSON expressions can now be automatically indexed, I mentioned there was an outstanding issue with the associated CBO execution plan, immediately post the creation of the automatic index:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv | 8000  |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

If we look at the number of recursive calls, we notice that it remains at 0. If we look at both the number of consistent gets (234168) and physical reads (200279), they both remain very high and identical to that of the previous Full Table Scan plan.

Basically, although autotrace suggests the newly created automatic index is being used, in fact the previous Full Table Scan plan is still being invoked.  (Note: this of course is one of the dangers of the autotrace plan, in that it might not display the actual plan being invoked).

So what’s going on here?

The Oracle Database 21c New Features Guide makes the following point: “an enhancement has been introduced to reduce the overhead of cursor invalidations when a new automatic index is created”.

Oracle 12.2 introduced a new feature in which one can now defer the invalidation of dependent SQL cursors when an index is created or modified. I’ve of course discussed this previously in this 12.2 Index Deferred Invalidation post.

When an automatic index is created in 21c, the current SQL cursors are NOT invalidated (to reduce the overhead of having to potentially reparse of large number of current SQL cursors). However, this means that currently inefficient SQL statements will keep their existing sub-optimal execution plans post the creation of newly created automatic indexes, until the existing SQL cursors aged out.

At which point, the new CBO plan using the automatic index will actually be invoked:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) |  Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |   671 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           |    1 |   671 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |    1 |       |       3 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          30 recursive calls
           0 db block gets
          46 consistent gets
          11 physical reads
           0 redo size
        1595 bytes sent via SQL*Net to client
         526 bytes received via SQL*Net from client
           3 SQL*Net roundtrips to/from client
           0 sorts (memory)
           0 sorts (disk)
           1 rows processed

So just be aware in Oracle Database 21c that your beautifully created automatic indexes may not actually get used as desired for a period of time…

Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing)” April 14, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Column Statistics, High Frequency Statistics Collection.
2 comments

In my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes were created:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |  12M |    1524 (1) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |  12M |    1524 (1) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |  8000 |      |       3 (0) | 00:00:01  |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The first issue is in relation to the Cost and the estimated cardinality (Rows) of the generated plan.

The execution plan has a Cost of 1524 but most importantly, an estimate cardinality of 20,000 rows. As only 1 row is actually returned, this row estimate is way way off and so therefore are the associated costs. This could potentially result in an inefficient plan and with the index not being used by the CBO.

The 20,000 row estimate comes from it being 1% of the number of rows (2 million) in the table. As I’ve discussed previously (as in this rather humourous post), the issue here is that the CBO has no idea what the expected cardinality might be, as the output from the JSON expression is effectively a black box.

Oracle generates virtual columns for this purpose, to capture column statistics that gives the CBO an accurate idea on the selectivity of expression based predicates.

But, if we look at the column statistics after the generation of the automatic indexes:

SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON';

TABLE_NAME COLUMN_NAME                                        NUM_DISTINCT VIR
---------- -------------------------------------------------- ------------ ---
BOWIE_JSON ID                                                      1996800 NO
BOWIE_JSON BOWIE_DATE                                                    1 NO
BOWIE_JSON BOWIE_ORDER                                                   0 NO
BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D                 1 YES
BOWIE_JSON SYS_NC00005$                                                    YES
BOWIE_JSON SYS_NC00006$                                                    YES

We notice that the two virtual columns generated for the JSON based expressions have no statistics. This is because we have yet to collect new statistics (or statistics specifically on hidden columns) since the creation of the automatic indexes. As a result, the CBO still has no idea on how many rows are estimated to come from the JSON based expressions and so has to resort to the (entirely wrong) 1% estimate.

The good news with Exadata environments (which of course includes the Autonomous Database environments), is that Oracle has the High Frequency Statistics Collection capability, which will automatically collect these missing statistics after a small (configurable) period of time. I’ve previously discussed High Frequency Statistics Collection here.

So if I just wait approximately 15 minutes in my “Exadata” environment and check out the columns statistics again:

SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON';

TABLE_NAME COLUMN_NAME                                        NUM_DISTINCT VIR
---------- -------------------------------------------------- ------------ ---
BOWIE_JSON ID                                                      1996800 NO
BOWIE_JSON BOWIE_DATE                                                    1 NO
BOWIE_JSON BOWIE_ORDER                                                   0 NO
BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D                 1 YES
BOWIE_JSON SYS_NC00005$                                            2000000 YES
BOWIE_JSON SYS_NC00006$                                            1996800 YES

We can see that the missing statistics have now been populated and the CBO can now accurately determine that these virtual columns are effectively unique.

If we now re-run the queries again, e.g.:

SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242;

Execution Plan
----------------------------------------------------------
Plan hash value: 1921179906

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) |  Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |   669 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           |    1 |   669 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_gpdkwzugdn055 |    1 |       |       3 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           NUMBER ERROR ON ERROR NULL ON EMPTY)=4242)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     233150 consistent gets
     200279 physical reads
          0 redo size
       1599 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We can see that the CBO has now correctly estimated that just 1 row is to be returned and the associated CBO cost has reduced to just 4 (down from 1524) as a result.

So if you create a function-based index, make sure the generated virtual column (whether created automatically or if manually generated before the associated index) has the necessary statistics.

In the upcoming days, I’ll discuss the remaining issue associated with this plan…

Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022

Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.
1 comment so far

When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.

However, the Oracle Database 21c doco mentions:

Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)“.

So on my (admittedly dodgy) “Exadata” VM, I thought I’ll check out how AI now indeed deals with JSON expressions.

I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):

SQL> CREATE TABLE bowie_json
       (id number,
        bowie_date date,
        bowie_order JSON);

SQL> insert into bowie_json
     select
     rownum,
     sysdate,
     '{"PONumber" : ' || rownum || ',
       "Reference" : "2022' || rownum || 'DBOWIE",
       "Requestor" : "David Bowie",
       "User" : "DBOWIE",
       "CostCenter" : "A42",
       "ShippingInstructions" : {"name" : "David Bowie",
                                 "Address": {"street" : "42 Ziggy Street",
                                             "city" : "Canberra",
                                              "state" : "ACT",
                                              "zipCode" : 2601,
                                              "country" : "Australia"},
                                 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
                                            {"type" : "Mobile", "number" : "417-555-1234"}]},
       "Special Instructions" : null,
       "AllowPartialShipment" : true,
       "LineItems" : [{"ItemNumber" : 1,
                       "Part" : {"Description" : "Hunky Dory",
                                 "UnitPrice" : 10.95},
                                  "Quantity" : 5.0},
                      {"ItemNumber" : 2,
                       "Part" : {"Description" : "Pin-Ups",
                                 "UnitPrice" : 10.95},
                                 "Quantity" : 3.0}]}'
from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON');

PL/SQL procedure successfully completed.

As always, it’s important to ensure the table has statistics, as AI does not work properly without them.

I then run a number of SQL statements, with different JSON expression based predicates, including:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242;

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42;

Execution Plan
----------------------------------------------------------
Plan hash value: 1196930810

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            | 20000 |   12M |  34476 (1) | 00:00:02  |
|* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 |   12M |  34476 (1) | 00:00:02  |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON
           , '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     259127 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

They all return just the one row, but must currently use a Full Table Scan with no indexes present.

So what does AI make of things?

The first thing to note is that running the AI last activity report generates the following error:

SQL> select dbms_auto_index.report_last_activity() report from dual;
ERROR:
ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167
ORA-06512: at line 1

no rows selected

If we look at the indexes now present with the table:

SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes
where table_name='BOWIE_JSON';

INDEX_NAME                INDEX_TYPE                AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- ------------------------- --- --------- -------- ---------- ----------- -----------------
SYS_IL0000081096C00003$$  LOB                       NO  VISIBLE   VALID
SYS_AI_ayvj257jd93cv      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        5141            380000
SYS_AI_gpdkwzugdn055      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        4596            200000

SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON';

INDEX_NAME                COLUMN_EXPRESSION
------------------------- --------------------------------------------------------------------------------
SYS_AI_ayvj257jd93cv      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR
OR ON ERROR NULL ON EMPTY)

SYS_AI_gpdkwzugdn055      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER
ROR NULL ON EMPTY)

We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.

If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |  8000 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is return, this is as one would hope.

But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.

I’ll leave it to the discernible reader to spot these issues, before I cover them in Part II in the coming days…

Automatic Indexing: Non-Equality Predicates Part III (“Who Can I Be Now”) December 21, 2021

Posted by Richard Foote in 21c New Features, Automatic Indexing, Non-Equality Predicates, Oracle Indexes.
2 comments

In previous posts, I discussed how Automatic Indexing supported the creation of indexes based on equality predicates only.

One of the most significant improvements with Automatic Indexing introduced with Oracle Database 21c is that non-equality predicates are now supported.

Previously, if I created the following 10 million row table and ran an SQL query based on a non-equality predicate that only returned 9 rows:

SQL> create table ziggy (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select * from ziggy where id between 42 and 50;

9 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    10 |   230 |  6173   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000 |    10 |   230 |  6173   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |          |    10 |   230 |  6173   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| ZIGGY    |    10 |   230 |  6173   (6)| 00:00:01 |
-----------------------------------------------------------------------------------------

It didn’t matter how often I ran the query, Automatic Indexing would never created the necessary Automatic Index on the ID column.

 

However, run the same query now on a 21c database and Automatic Indexing will generate the following index:

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
--------------------------------------------------------------------
| Owner | Table | Index                | Key | Type   | Properties |
--------------------------------------------------------------------
| BOWIE | ZIGGY | SYS_AI_8102kh14m1mf8 | ID  | B-TREE | NONE       |
--------------------------------------------------------------------
-------------------------------------------------------------------------------

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='ZIGGY';

INDEX_NAME                AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --------- -------- ---------- ----------- -----------------
SYS_AI_8102kh14m1mf8      YES VISIBLE   VALID      10000000       23780             38451

 

If I re-run the equivalent query:

SQL> select * from ziggy where id between 42 and 50;

Execution Plan
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    10 |   230 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY                |    10 |   230 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_8102kh14m1mf8 |    10 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
       2 - access("ID">=42 AND "ID"<=50)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        963  bytes sent via SQL*Net to client
         52  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed

 

Automatic Indexing has kicked in and significantly improved the performance of this query for me.

Automatic Indexing may still have a number of restrictions, but by supporting non-equality predicates, it has now removed its most significant restriction…

Automatic Indexing: 3 Possible States Of Newly Created Automatic Indexes Part II (“Because You’re Young”) December 20, 2021

Posted by Richard Foote in 21c New Features, Automatic Indexing.
add a comment

In a previous post, I discussed how there can be three different states of newly created Automatic Indexes, based on the predicate selectivities of the SQLs that generated the indexes.

In this post, I’m going to highlight new behaviour I’ve noticed with Oracle 21c that results in the three different states of newly created Automatic Indexes based on the number of times of new SQL statements are executed.

To demonstrate, I’m going to create 3 identical tables and populate them with identical data:

SQL> create table bowie1 (id number, code number, name varchar2(42));
SQL> create table bowie2 (id number, code number, name varchar2(42));
SQL> create table bowie3 (id number, code number, name varchar2(42));

SQL> insert into bowie1 select rownum, mod(rownum, 1000000)+1, 'David Bowie’ from dual connect by level <= 10000000;
SQL> insert into bowie2 select rownum, mod(rownum, 1000000)+1, 'David Bowie’ from dual connect by level <= 10000000;
SQL> insert into bowie3 select rownum, mod(rownum, 1000000)+1, 'David Bowie’ from dual connect by level <= 10000000;

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE1');
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2');
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE3');

I then run the following queries in the following manner.

I run this query based on a CODE equality predicate on the BOWIE1 table just one time only:

select * from bowie1 where code=42;

I run the same query on the BOWIE2 table a few times (definitely more than once, say 5 times):

select * from bowie2 where code=42;

I run the following queries on the BOWIE3 table multiple times (say 10 or more times):

select * from bowie3 where code=42;
select * from bowie3 where code=24;
select * from bowie3 where code=4242;
select * from bowie3 where code=4444;

 

I then await the Automatic Indexing process and examine the resultant generated Automatic Indexes:

SQL> select table_name, index_name, auto, visibility, status, num_rows, leaf_blocks
from user_indexes where table_name in ('BOWIE1','BOWIE2', 'BOWIE3');

TABLE_NAME INDEX_NAME           AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS
---------- -------------------- --- --------- -------- ---------- -----------
BOWIE1     SYS_AI_1c6k1qpybh694 YES INVISIBLE UNUSABLE   10000000       23058
BOWIE2     SYS_AI_8v1sh0j56g4hs YES INVISIBLE VALID      10000000       16891
BOWIE3     SYS_AI_g6tta1zrmak4d YES VISIBLE   VALID      10000000       16891

 

We notice we have each of our possible 3 Automatic Index states.

The Automatic Index on the BOWIE1 table with a query that was “only executed once” has an INVISIBLE/UNUSABLE index. Where previously Automatic Indexing would have simply generated a VISIBLE/VALID index even on a query that was only executed once, in more current releases, the logic seems to be if a query is very very infrequently executed, it will not be enough on it’s own to generate a usable Automatic Index.

The Automatic Index on the BOWIE2 table with a single query that was only executed a “very few number of times” (5 in this example) has an INVISIBLE/VALID index. Where previously again Automatic Indexing would have simply generated a VISIBLE/VALID index, in more current releases, the logic seems to be if a query is executed only a few times, it’s close to being useful, but not quite to generate a Visible index. Having it as a Visible index is perhaps something to consider in the near future, if the useful load increases sufficiently.

The Automatic Index on the BOWIE3 table with a query that runs “many times” and/or with multiple queries on the column has a VISIBLE/VALID index. The logic seems to be if the index is used sufficiently to justify its creation, the index is actually created in a state in which it can be generally subsequently used.

So there are a number of possible scenarios now in which an Automatic Index is created in this “in-between” INVISIBLE/VALID state.

It would perhaps be a nice addition if we had the ability to alter the Visibility state of an Automatic Index. But I guess the whole idea behind Automatic Indexing is that we shouldn’t need to worry about such things…