jump to navigation

METHOD_OPT Default In Oracle Autonomous Databases (She’ll Drive The Big Car) March 2, 2021

Posted by Richard Foote in 19c, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Histograms, METHOD_OPT, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Statistics.
add a comment

 

In a recent post on Invisible Automatic Indexes, I was puzzled by a couple of “oddities” in relation to some behaviour in the Oracle Autonomous Database Cloud environments.

The first one was how Oracle appeared to be creating Histograms on a much more regular basis than it had previously.

As one can see in the demo below, if I create and populate a table:

SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

And then collect statistics using the “default” options:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD');

PL/SQL procedure successfully completed.

All the columns in the table now have histograms, regardless of whether they’ve been used in SQL predicates or if they have data skew:

SQL> select column_name, num_distinct, density, histogram from dba_tab_columns where table_name='PINK_FLOYD';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM
-------------------- ------------ ---------- ---------------
ID                        9705425          0 HYBRID
CODE                         4835     .00005 HYBRID
CREATE_DATE                 50357     .00002 HYBRID
NAME                            1 4.9639E-08 FREQUENCY

 

The explanation for this is embarrassingly simple. A quick check on the default settings for METHOD_OPT shows the following:

SQL> select dbms_stats.get_prefs('METHOD_OPT') from dual;

DBMS_STATS.GET_PREFS('METHOD_OPT')
--------------------------------------------------------------------------------
FOR ALL COLUMNS SIZE 254

 

The default is FOR ALL COLUMNS 254, meaning that we will now indeed have histograms collected on all columns. With new capabilities such as High Frequency Statistics Collection, it’s interesting that Oracle has taken this approach but Oracle has obviously taken the attitude that with Exadata as the hosted infrastructure, it can afford to simply collect histograms globally on all columns in the Autonomous Database environments.

If you wanted to change this, you can do so by for example:

SQL> exec DBMS_STATS.SET_GLOBAL_PREFS ('METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO');

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_prefs('METHOD_OPT') from dual;

DBMS_STATS.GET_PREFS('METHOD_OPT')
--------------------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO

 

So not an “oddity”, but expected behaviour now on Oracle Autonomous Databases.

The other “oddity” I noticed were Invisible Valid Automatic indexes at times being created. The explanation for this will be the topic of my next blog post…

Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation Piece) October 14, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Index Access Path, Local Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning, Performance Tuning.
1 comment so far

In this little series, I’m going to discuss how Automatic Indexing works in relation to Partitioning.

I’ve discussed Indexing and Partitioning many times previously and how Oracle has various options when indexing a partitioned table:

  • Non-Partitioned Index
  • Globally Partitioned Index
  • Locally Partitioned Index

So the question(s) are how does Automatic Indexing handle scenarios with partitioned objects?

A very important point to make at the start is that based on my research, the answer has already changed significantly since Automatic Indexing was first released. So it’s important to understand that Automatic Indexing is an ever evolving capability, that will advance and improve as time goes on.

I’ll focus on how the feature currently works (as of Oracle Database 19.5), but will mention previously identified behaviour as a reference on how things can easily change.

In my first simple little example, I’m just going to create a range-partitioned table, partitioned based on RELEASE_DATE, with a partition for each year’s worth of data:

SQL> CREATE TABLE big_bowie1(id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')),
PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
PARTITION ALBUMS_2020 VALUES LESS THAN (MAXVALUE));

Table created.

 

I’ll now add about 8 years worth of data:

SQL> INSERT INTO big_bowie1 SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> COMMIT;

Commit complete.

 

As discussed previously, I’ll importantly collect statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE1');

PL/SQL procedure successfully completed.

 

I’ll now run the following very selective query based the TOTAL_SALES column that is NOT part of the partitioning key:

 

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

19 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2468051548

---------------------------------------------------------------------------------------------------------
| Id | Operation                | Name       | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop     |
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |            |   20 |   520 |    643 (15)| 00:00:01 |       |           |
|  1 | PARTITION RANGE ALL      |            |   20 |   520 |    643 (15)| 00:00:01 |     1 |         8 |
|* 2 | TABLE ACCESS STORAGE FULL| BIG_BOWIE1 |   20 |   520 |    643 (15)| 00:00:01 |     1 |         8 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=42)
    filter("TOTAL_SALES"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      44014 consistent gets
       9516 physical reads
          0 redo size
       1107 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         19 rows processed

 

Without an index in place, the CBO has no choice but to use a FTS. But what will Automatic Indexing make of things?

If we look at the next Automatic Indexing report:

 

SQL> select dbms_auto_index.report_last_activity() from dual;

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 13-OCT-2020 01:47:48
Activity end                : 13-OCT-2020 02:59:48
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 184.55 MB (184.55 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 1 (44119.6x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 25135.8x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties |
---------------------------------------------------------------------------------
| BOWIE | BIG_BOWIE1 | SYS_AI_2zt7rg40mxa4n | TOTAL_SALES | B-TREE | NONE       |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : chwm2gubm8fx9
SQL Text            : SELECT * FROM big_bowie1 WHERE total_sales = 42
Improvement Factor  : 44119.6x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    4387193                      1173
CPU Time (s):        2599423                      1037
Buffer Gets:         749507                       22
Optimizer Cost:      643                          22
Disk Reads:          470976                       2
Direct Writes:       0                            0
Rows Processed:      323                          19
Executions:          17                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 2468051548

-----------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |      |       |  643 |             |
|  1 | PARTITION RANGE ALL       |            |   20 |   520 |  643 | 00:00:01    |
|  2 | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 |   20 |   520 |  643 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 937174207

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                 | Rows | Bytes | Cost | Time       |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                      |   19 |   494 |   22 | 00:00:01   |
|   1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE1           |   19 |   494 |   22 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                           | SYS_AI_2zt7rg40mxa4n |   19 |       |    3 | 00:00:01   |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("TOTAL_SALES"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We notice a couple of interesting points.

Firstly, yes Automatic Indexing has created an index based on the TOTAL_SALES column (SYS_AI_2zt7rg40mxa4n) as it improves performance by a reported 44119.6x.

Note also that the Automatic Index is a Non-Partitioned (Global) Index. From a performance perspective, this is the most efficient index to create to improve the performance of this query as the CBO only has the one index structure to navigate (vs. a LOCAL index that would require having to navigate down all 8 index structures for each table partition.

If we look at the index details:

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_2zt7rg40mxa4n           NO  YES VISIBLE   VALID

 

We notice that this is indeed a Non-Partitioned Index, that is both VISIBLE and VALID and so can be potentially used by any database session.

If we now re-run the query:

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

19 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 937174207

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                 | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop    |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                      |   20 |   520 |      23 (0)| 00:00:01 |       |          |
|  1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| BIG_BOWIE1           |   20 |   520 |      23 (0)| 00:00:01 | ROWID | ROWID    |
|* 2 | INDEX RANGE SCAN                          | SYS_AI_2zt7rg40mxa4n |   20 |       |       3 (0)| 00:00:01 |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         23 consistent gets
          0 physical reads
          0 redo size
       1166 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         19 rows processed

 

We can see the query now uses the newly created Automatic Index and is indeed more efficient, performing now just 23 consistent gets (previously 44014 consistent gets).

 

However, this was NOT previous behaviour.

The documentation previously mentioned that only LOCAL indexes are used when indexing partitioned tables.

If we run the same demo on Oracle Database 19.3, we get the following report:

 

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 14-OCT-2020 13:12:07
Activity end                : 14-OCT-2020 14:24:07
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 192.94 MB (192.94 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (1950.5x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1950.5x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties |
---------------------------------------------------------------------------------
| BOWIE | BIG_BOWIE1 | SYS_AI_8armv0hqq73fa | TOTAL_SALES | B-TREE | LOCAL      |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 2pp8ypramw30s
SQL Text            : SELECT * FROM big_bowie1 WHERE total_sales = 42
Improvement Factor  : 1950.5x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    6996973                      27327
CPU Time (s):        6704215                      12819
Buffer Gets:         815306                       49
Optimizer Cost:      12793                        28
Disk Reads:          2                            40
Direct Writes:       0                            0
Rows Processed:      475                          25
Executions:          19                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 4294056405

-----------------------------------------------------------------------------
| Id | Operation          | Name       | Rows | Bytes | Cost  | Time        |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT    |            |      |       | 12793 |             |
| 1 | PARTITION RANGE ALL |            |   20 |   520 | 12793 | 00:00:01    |
| 2 | TABLE ACCESS FULL   | BIG_BOWIE1 |   20 |   520 | 12793 | 00:00:01    |
-----------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 3781269341

--------------------------------------------------------------------------------------------------------------
|  Id | Operation                                 | Name                 | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                      |   25 |   650 |   28 | 00:00:01    |
|   1 | PARTITION RANGE ALL                       |                      |   25 |   650 |   28 | 00:00:01    |
|   2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1           |   25 |   650 |   28 | 00:00:01    |
| * 3 | INDEX RANGE SCAN                          | SYS_AI_8armv0hqq73fa |   25 |       |   17 | 00:00:01    |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 3 - access("TOTAL_SALES"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

As we can see, in this scenario, the newly created Automatic Index has a “Property” of LOCAL.

If we look at its index details:

 

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_8armv0hqq73fa           YES YES VISIBLE   N/A

SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PARTITION PARTITION_COUNT LOCALI
------------------------------ --------- --------------- ------
SYS_AI_8armv0hqq73fa           RANGE                   8 LOCAL

 

We can see how a Local Index was previously created.

As such if we re-run an equivalent query:

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3781269341

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                | Name                 | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop     |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                         |                      |   20 |   520 |      26 (0)| 00:00:01 |       |           |
|  1 | PARTITION RANGE ALL                      |                      |   20 |   520 |      26 (0)| 00:00:01 |     1 |         8 |
|  2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| BIG_BOWIE1           |   20 |   520 |      26 (0)| 00:00:01 |     1 |         8 |
|* 3 | INDEX RANGE SCAN                         | SYS_AI_8armv0hqq73fa |   20 |       |      17 (0)| 00:00:01 |     1 |         8 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         50 consistent gets
          0 physical reads
          0 redo size
       1555 bytes sent via SQL*Net to client
        409 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         25 rows processed

 

Although the query is returning 6 more rows (as with the random number generation, has a slightly different data set), it’s more expensive proportionally now having to perform 50 consistent gets as it now has to read 8 index structures rather than just the one.

So (IMHO), Automatic Indexing has improved here, creating a more efficient index structure than previously. So always bear in mind that Automatic Indexing is an evolving beast, improving and adapting as time moves on.

However, note the compromise here is that by having an effectively Global index structure, there may be some additional issues depending on any subsequent structural changes to the table.

More on Automatic Indexing and Partitioning in my next post…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part III (Do Anything You Say) October 8, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Stale Statistics.
2 comments

In Part I of this series, we saw how Automatic Indexing will not create a viable Automatic Index if there are stale or missing statistics on the underlining segments. In Part II we saw how these SQL statements effectively become blacklisted and when segment statistics are subsequently collected, Automatic Indexing will still not create viable Automatic Indexes when the SQL statements are re-run.

So how do we get Automatic Indexing to now kick in and create necessary indexes on these problematic SQLs?

As I’ve discussed previously in relation to blacklisted SQLs, we need to run a NEW SQL statement that hasn’t been blacklist that will result in a necessary index to be created. An easy way to do this is just to include a new comment within the previous SQL to give the SQL a new signature.

If we now run the following “new” SQL statement (identical to the problematic SQL but with a comment embedded):

SQL> select /* new */ * from bowie_stale where code=42;

        ID       CODE NAME
---------- ---------- ------------------------------------------
   1000041         42 David Bowie
   6000041         42 David Bowie
        41         42 David Bowie
   3000041         42 David Bowie
   7000041         42 David Bowie
   8000041         42 David Bowie
   4000041         42 David Bowie
   9000041         42 David Bowie
   5000041         42 David Bowie
   2000041         42 David Bowie

 

If we now wait to see what the next Automatic Indexing task makes of things:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 07-JUL-2020 06:34:49
Activity end                : 07-JUL-2020 06:35:54
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 142.61 MB (142.61 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (19787.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 19787.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties |
---------------------------------------------------------------------------
| BOWIE | BOWIE_STALE | SYS_AI_300kk2unp8tr0 | CODE | B-TREE | NONE       |
---------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We see that the index on the CODE column (SYS_AI_300kk2unp8tr0) has now been created.

Further down the report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : du6psd0xmzpg5
SQL Text            : select /* new */ * from bowie_stale where code=42
Improvement Factor  : 19787.7x

Execution Statistics:
-----------------------------
                  Original Plan Auto           Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 137261                       2620
CPU Time (s):     84621                        1769
Buffer Gets:      277028                       13
Optimizer Cost:   544                          13
Disk Reads:       275947                       2
Direct Writes:    0                            0
Rows Processed:   70                           10
Executions:       7                            1

 

A new index was indeed created because of this new SQL statement, with a performance improvement of 19787.7x.

Further down the report to the Plans Section:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 65903426

-----------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |             |      |       |  544 |             |
| 1 | TABLE ACCESS STORAGE FULL | BOWIE_STALE |   10 |   230 |  544 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2558864466

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   230 |   13 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_STALE          |   10 |   230 |   13 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_300kk2unp8tr0 |   10 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We can see that the new plan using the new Automatic Index with a much lower CBO cost.

If we now look at the status of this index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       16891          10000000

 

We see that the index is now both VISIBLE and VALID (previously, it was INVISIBLE and UNUSABLE).

As such, the Automatic Index can now potentially be used by any SQL, including the previous problematic query.

So with a viable index now in place, if we re-run the initial problematic query:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2558864466

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      |   10 |   230 |      14 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_STALE          |   10 |   230 |      14 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_300kk2unp8tr0 |   10 |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         14 consistent gets
          0 physical reads
          0 redo size
        738 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

We see that finally, the SQL uses the new Automatic Index and is indeed much more efficient as a result, with just 14 consistent gets required (when previously it was 39430 consistent gets).

So if ever you come across the scenario where an SQL does not have an Automatic Index created when clearly it should, it could be that it has been blacklisted and needs a different SQL to actually generate the necessary index.

To avoid some of these issues, make sure you do not have stale or missing statistics when reliant on Automatic Indexing. The new High Frequency Statistics Collection capability to designed to specifically avoid such a scenario.

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.
1 comment so far

 

 

In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.

If we were to now to collect the missing statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME             NUM_ROWS     BLOCKS LAST_ANAL
-------------------- ---------- ---------- ---------
BOWIE_STALE            10000000      39677 06-JUL-20

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                       10000000          0 HYBRID          06-JUL-20
CODE                       971092    .000001 HYBRID          06-JUL-20
NAME                            1 4.9416E-08 FREQUENCY       06-JUL-20

 

If we now repeatedly re-run the problematic query many times:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)|  Time       |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)|  00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)|  00:00:01   |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

 

The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.

If we wait for the next Automatic Indexing reporting period:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 06-JUL-2020 05:12:42
Activity end                : 06-JUL-2020 05:13:34
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 0
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

 

We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.

If we look at the index details on the table:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK         NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0   YES NO  INVISIBLE ADVANCED LOW  UNUSABLE   10000000       23058           4147514 

 

So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and can not be used by the CBO.

NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios after stale statistics when indexes were created in but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.

If we simply re-run the same queries again from the time when the dependant object statistics were stale, any SQL is just ignored by the Automatic Indexing process.

As such, if we now subsequently re-run the problematic query again:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

Again, the CBO has no choice here with no viable VALID/VISIBLE index present but to perform a FTS, even though its getting the cardinality estimates spot on since statistics gathering.

 

In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part I (Dead Against It) October 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, High Frequency Statistics Collection, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Stale Statistics, Unusable Indexes.
5 comments

A “golden rule” when working with Automatic Indexing is that things don’t work properly if there are stale statistics on the dependant objects. Stale statistics can of course be problematic but they can be particularly troublesome when dealing with Automatic Indexing.

In the Oracle Autonomous Database environments, this issue is addressed somewhat by the new High Frequency Statistics Collection capability, which helps to automatically collect stale statistics on a regular basis. However, in on-prem Exadata environments where this can more easily be turned off or collected less frequently, it’s a potential issue worth consideration.

I’ll start with a simple little table, with a CODE column that has lots of distinct values:

SQL> create table bowie_stale (id number constraint bowie_stale_pk primary key, code number, name varchar2(42));

Table created.

SQL> insert into bowie_stale select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

Importantly, I don’t collect statistics on this newly populated table…

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME        NUM_ROWS     BLOCKS LAST_ANAL
--------------- ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

If we now run the following query a number of times while there are no statistics on the table:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice but to use a FTS as I don’t yet have an index on the CODE column.

If I now wait for the next Automatic Indexing task to kick in AND if there are still NO statistics on the table:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 05-JUL-2020 06:36:31
Activity end                 : 05-JUL-2020 06:37:07
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates            : 1
Indexes created             : 0
Space used                  : 0 B
Indexes dropped             : 0
SQL statements verified     : 0
SQL statements improved     : 0
SQL plan baselines created  : 0
Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

You can see that there was the one index candidate BUT no Automatic Index appears to have been created.

Assuming there are still no statistics:

 

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE';

TABLE_NAME                       NUM_ROWS    BLOCKS  LAST_ANAL
------------------------------ ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE2';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

 

If we look now at what indexes exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID
SYS_AI_300kk2unp8tr0           YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='BOWIE_STALE2' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
BOWIE_STALE_PK                 ID                                 1
SYS_AI_300kk2unp8tr0           CODE                               1

 

We notice there is now an Automatic Index BUT it remains in an UNUSABLE/INVISIBLE state. This means the index can’t be used by the CBO.

So if we now re-run the SQL query again:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice still but to use the FTS.

In Part II, we’ll see that once we get into this scenario, it can be a tad problematic to get ourselves out of it and get the Automatic Index created as we would like…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
add a comment

 

In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.

One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’);

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.

So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.

So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
1 comment so far

In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.

To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:

SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142));

Table created.

SQL> begin
2     for i in 1..2000000 loop
3       if mod(i,2) = 0 then
4          insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history');
5       else
6          insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.');
7       end if;
8     end loop;
9     commit;
10  end;
11  /

PL/SQL procedure successfully completed.

 

So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.

Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

If we run a query based on a rare value for CODE:

SQL> set arraysize 5000

SQL> select * from space_oddity where code=25;

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     3 |   423 | 11356   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |     3 |   423 | 11356   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=25)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      40974  consistent gets
          0  physical reads
          0  redo size
       1018  bytes sent via SQL*Net to client
        402  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.

However, the following SQL accesses many of the common values of CODE and returns many rows:

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.

So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 15:57:14
Activity end                 : 18-AUG-2020 15:58:10
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 35.65 MB (35.65 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 1
SQL statements improved (improvement factor)  : 1 (40984.3x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 40984.3x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
----------------------------------------------------------------------------
| Owner | Table        | Index                | Key  | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE       |
----------------------------------------------------------------------------

 

So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name  : BOWIE
SQL ID               : 19sv1g6tt0g1y
SQL Text             : select * from space_oddity where code=25
Improvement Factor   : 40984.3x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  5417408                       139265
CPU Time (s):      1771880                       7797
Buffer Gets:       327876                        5
Optimizer Cost:    11356                         5
Disk Reads:        649                           2
Direct Writes:     0                             0
Rows Processed:    16                            2
Executions:        8                             1

 

The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.

This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.

BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.

BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.

BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.

Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…

So if we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION

---------------------------------------------------------------------------------------------
- Original
-----------------------------

Plan Hash Value  : 2301175572
-----------------------------------------------------------------------------
| Id | Operation           | Name         | Rows | Bytes | Cost  | Time     |
-----------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |              |      |       | 11356 |          |
|  1 |   TABLE ACCESS FULL | SPACE_ODDITY |    3 |   423 | 11356 | 00:00:01 |
-----------------------------------------------------------------------------

- With Auto Indexes

-----------------------------
Plan Hash Value  : 54782313
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |    3 |   423 |    5 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY         |    3 |   423 |    5 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_82bdnqs7q8rtm |    2 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=25)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).

But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.

If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    32 |  4512 |    35   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY         |    32 |  4512 |    35   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | SYS_AI_82bdnqs7q8rtm |    32 |       |    12   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR
           "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     410422  consistent gets
          0  physical reads
          0  redo size
  145536076  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.

As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).

One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048’);

PL/SQL procedure successfully completed.

If we now re-run this SQL:

SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   996K|   133M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |   996K|   133M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.

In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
add a comment

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).

As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.

One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).

If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:

SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  101K|  2388K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        609 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.

Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true)  in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.

In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.

We currently have the following columns and statistics on the IGGY_POP table:

SQL> select column_name, num_distinct, density, num_buckets, histogram
from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME          NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
-------------------- ------------ ---------- ----------- ---------------
ID                        9705425          0         254 HYBRID
CODE1                         100  .00000005         100 FREQUENCY
CODE2                         100  .00000005         100 FREQUENCY
NAME                            1 5.0210E-08           1 FREQUENCY

 

If we now collect Extended Statistics on both CODE1, CODE2 columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                  9705425          0         254 HYBRID
CODE1                                   100  .00000005         100 FREQUENCY
CODE2                                   100  .00000005         100 FREQUENCY
NAME                                      1 5.0210E-08           1 FREQUENCY
SYS_STU#29QF8Y9BUDOW2HCDL47N44           99  .00000005         100 FREQUENCY

 

The CBO now has some idea on the cardinality if both columns are used within a predicate.

If we re-run the problematic query without the hint:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  100K|  2343K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.

However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 50000 |  878K |   575 (15) | 00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 |  878K |   575 (15) | 00:00:01   |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.

This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.

So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.

 

As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.
1 comment so far

Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.

As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…

I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:

SQL> create table iggy_pop (id number, code1 number, code2 number, name varchar2(42));

Table created.

SQL> insert into iggy_pop select rownum, mod(rownum, 100)+1, mod(rownum, 100)+1, 'David Bowie'
from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP');

PL/SQL procedure successfully completed.

 

Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.

If we run the following query for a combination of data that exists:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name      | Rows | Bytes | Cost (%CPU)|   Time     |
--------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT         |          |   1000|  24000|    575 (15)|   00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |   1000|  24000|    575 (15)|   00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).

If we run a query on a combination of data that doesn’t exist:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 1000 |  18000|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000 |  18000|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.

Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 18-SEP-2020 01:24:17
Activity end                : 18-SEP-2020 01:25:29
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (41301.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 41301.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table    | Index                | Key         | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | IGGY_POP | SYS_AI_1awkddqkwa4f8 | CODE1,CODE2 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : bdnf0barn3jk7
SQL Text            : select code1, code2, name from iggy_pop where code1=1 and code2=42
Improvement Factor  : 41301.7x

Execution Statistics:
-----------------------------
                  Original Plan                 Auto Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 72085                        1342
CPU Time (s):     39272                        679
Buffer Gets:      123907                       3
Optimizer Cost:   575                          4
Disk Reads:       122859                       2
Direct Writes:    0                            0
Rows Processed:   0                            0
Executions:       3                            1

 

So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 3288467

--------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |          |      |       |  575 |             |
| 1 | TABLE ACCESS STORAGE FULL | IGGY_POP | 1000 | 18000 |  575 | 00:00:01    |
--------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2496796491

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    2 |    36 |    4 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP             |    2 |    36 |    4 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_1awkddqkwa4f8 |    1 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=1 AND "CODE2"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.

The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).

The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.

This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.

So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.

BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…

If we look at the new Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='IGGY_POP';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_1awkddqkwa4f8           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15362           4083700

 

We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.

Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.

SQL> show parameter sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
SQL> show parameter optimizer_adaptive

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_plans             boolean     TRUE
optimizer_adaptive_reporting_only    boolean     FALSE
optimizer_adaptive_statistics        boolean     FALSE

 

So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.

If we now re-run the SQL that returns no rows:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 18000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 18000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.

When when re-run the SQL that returns many many rows:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 24000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 24000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
         25 recursive calls
          0 db block gets
      41981 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          1 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.

BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.

This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.

In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…

Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Index Access Path, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.
add a comment

 

I’m going to expand just a tad on my previous posts on data skew and run a simple query that returns a few rows based on a column predicate AND another query on the same column that returns many rows.

The following table has a CODE column as with previous posts with the data heavily skewed:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll next collect statistics with NO histogram, as I don’t think they’re required at this point:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie_skew', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the table data:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

The value “7” only has 100 associated rows, while the value “10” is very common with 654,465 rows.

But I currently have no histograms:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

If I run the following query with a CODE=7 predicate just once:

SQL> select * from bowie_skew where code=7;

100 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100K|  1953K|   570   (7)| 00:00:01  |
|   1 |  PX COORDINATOR              |            |       |       |            |           |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01  |
|   3 |    PX BLOCK ITERATOR         |            |   100K|  1953K|   570   (7)| 00:00:01  |
|*  4 |     TABLE ACCESS STORAGE FULL| bowie_skew |   100K|  1953K|   570   (7)| 00:00:01  |
--------------------------------------------------------------------------------------------

 

It uses a Full Table Scan (the CBO has no choice without an index) AND hopelessly gets the cardinality estimate wrong, thinking 100K are going to be returned (and not the 100 actual rows).  So the CBO is unlikely to use an index anyways as it would be deemed too expensive to return so many rows.

I’ll now run the following query many times on the CODE=10 predicate that returns many rows:

SQL> select * from bowie_skew where code=10;

654465 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100K|  1953K|   570   (7)| 00:00:01  |
|   1 |  PX COORDINATOR              |            |       |       |            |           |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01  |
|   3 |    PX BLOCK ITERATOR         |            |   100K|  1953K|   570   (7)| 00:00:01  |
|*  4 |     TABLE ACCESS STORAGE FULL| bowie_skew |   100K|  1953K|   570   (7)| 00:00:01  |
--------------------------------------------------------------------------------------------

 

So again, no choice here with a FTS and we likely wouldn’t want to use an index anyways as it would be just too expensive.

If we check out what the Automatic Indexing process has done with such a workload:

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 6fm3m8cg2jnun
SQL Text            : select * from bowie_skew where code=7
Improvement Factor  : 46.6x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   36653                        1992
CPU Time (s):       33899                        967
Buffer Gets:        4291                         103
Optimizer Cost:     52                           4
Disk Reads:         0                            2
Direct Writes:      0                            0
Rows Processed:     100                          100
Executions:         1                            1

 

An Automatic Index on the CODE column is created (SYS_AI_7psvzc164vbng), with ONLY the SQL based on the CODE=7 predicate listed in the report. The other query is indeed too expensive for a new index to be viable and so isn’t listed.

If we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 410492785

--------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows   | Bytes   | Cost | Time       |
--------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT          |            |        |         | 52   |            |
| 1  | TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 | 52   | 00:00:01   |
--------------------------------------------------------------------------------------

Notes
-----
- dop_reason = no expensive parallel operation
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 140816325

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      | 119  | 2380  | 4    | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           | 119  | 2380  | 4    | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_7psvzc164vbng | 100  |       | 3    | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=7)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

The important point to note here is that the cardinality estimates are relatively accurate despite there being no histograms at this stage because the Automatic Indexing session uses Dynamic Sampling Level=11. Missing/inaccurate statistics are calculated on fly and this enables the session to accurately determine the size of the returned data set and that an index is indeed the more efficient access path.

So with mixed workloads, all it takes is one SQL executed once that demonstrably improves thanks to an index for the associated Automatic Index to be created as a VISIBLE/VALID index:

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_SKEW';

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_7psvzc164vbng           YES VISIBLE   VALID       1000000        1537              8534

 

If we now run the query AFTER the histograms are subsequently created thanks to the High-Frequency Automatic Statistics Collection (see previous post), the new Automatic Index is now used:

SQL> select * from bowie_skew where code=7;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 140816325

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 100  | 2000  |       4 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           | 100  | 2000  |       4 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng | 100  |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=7)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
        104 consistent gets
          0 physical reads
          0 redo size
       2871 bytes sent via SQL*Net to client
        359 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
        100 rows processed

 

Note if the histogram is NOT yet collected, the CBO will not determine the correct cardinality estimate and will ignore the new Automatic Index (as previously discussed).

If we run again the query that returns many rows:

SQL> select * from bowie_skew where code=10;

654465 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 410492785

----------------------------------------------------------------------------------------
| Id | Operation                | Name       | Rows | Bytes | Cost (%CPU)| Time        |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |            |  654K|    12M|     52 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_SKEW |  654K|    12M|     52 (16)| 00:00:01    |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=10)
    filter("CODE"=10)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       3725 consistent gets
          0 physical reads
          0 redo size
    6549708 bytes sent via SQL*Net to client
       1790 bytes received via SQL*Net from client
        132 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     654465 rows processed

The new Automatic Index is correctly ignored by the CBO, as the query returns too many rows for the index to be viable.

So in this example, Automatic Indexing works exactly as it should. It creates a new Automatic Index for a query where it will indeed improve the performance, while other queries on the same column in which many more rows are returned are also run. For these other queries, the new Automatic Index is correctly not used as such an index would degrade the performance of the query.

In my next post, I’ll look at the first example with data skew where Automatic Indexing can be problematic…

Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some columns values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.

I’ll start with a simple little example, that has an interesting little twist at the end.

The following table has a CODE column, which has 10 distinct values that a widely skewed, with some values much less common than others:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected data with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_SKEW', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We can clearly see that some CODE values (such as “6”) have relatively few values, with only 100 occurrences:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time       |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              |   100K|  1953K|   570   (7)| 00:00:01 |
|   1 |  PX COORDINATOR                |              |         |         |              |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR           |              |   100K|  1953K|   570   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_SKEW |   100K|  1953K|   570   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE"=6)
    filter("CODE"=6)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
      3781  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.

So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : fn4shnphu4bvj
SQL Text             : select * from bowie_skew where code=6
Improvement Factor   : 41.1x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  119596                        322
CPU Time (s):      100781                        322
Buffer Gets:       11347                         103
Optimizer Cost:    570                           4
Disk Reads:        0                             0
Direct Writes:     0                             0
Rows Processed:    100                           100
Executions:        1                             1

 

So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.

If we look further down the Auto Indexing report and compare the execution plans:

 

PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value  : 3374004665
-----------------------------------------------------------------------------------------
| Id | Operation                      | Name       | Rows   | Bytes   | Cost | Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |            |        |         |  570 |          |
|  1 |  PX COORDINATOR                |            |        |         |      |          |
|  2 |    PX SEND QC (RANDOM)         | :TQ10000   | 100000 | 2000000 |  570 | 00:00:01 |
|  3 |     PX BLOCK ITERATOR          |            | 100000 | 2000000 |  570 | 00:00:01 |
|  4 |      TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 |  570 | 00:00:01 |
-----------------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value  : 140816325
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100 |  2000 |    4 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           |  100 |  2000 |    4 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng |  100 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=6)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

We can see that new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.

Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly and can cause difficulties between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).

BUT when I now rerun the SQL query again:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |   100 |  2000 |     4   (0)|
|   1 |  PX COORDINATOR                       |                      |       |       |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |   100 |  2000 |     4   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           |   100 |  2000 |     4   (0)|
|   4 |     BUFFER SORT                       |                      |       |       |            |
|   5 |      PX RECEIVE                       |                      |   100 |       |     3   (0)|
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |   100 |       |     3   (0)|
|   7 |        PX SELECTOR                    |                      |       |       |            |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_7psvzc164vbng |   100 |       |     3   (0)|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("CODE"=6)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       103  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?

I’ll explain in my next post.

Oracle 19c Automatic Indexing: DDL Statements With Auto Indexes (No Control) September 1, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Drop Automatic Indexing, Drop Index, Index Coalesce, Index Rebuild, Index Shrink, Invisible Indexes, Online DDL, Oracle Indexes.
2 comments

 

I’ve had a number of questions in relation to DDL support for Automatic Indexes since my last post on how one can now drop Automatic Indexes, so decided to quickly discuss what DDL statements are supported with Automatic Indexes.

Many DDL commands are NOT supported with Automatic Indexes, such as making indexes (IN)VISIBLE and (UN)USABLE and changing storage attributes:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" invisible;

alter index "SYS_AI_600vgjmtqsgv3" invisible
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes


SQL> alter index "SYS_AI_600vgjmtqsgv3" unusable;

alter index "SYS_AI_600vgjmtqsgv3" unusable
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes


SQL> ALTER INDEX "SYS_AI_600vgjmtqsgv3" INITRANS 5;
ALTER INDEX "SYS_AI_600vgjmtqsgv3" INITRANS 5
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

You also can’t drop indexes with the DDL statement:

SQL> drop index "SYS_AI_600vgjmtqsgv3";

drop index "SYS_AI_600vgjmtqsgv3"
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

Although as discussed in my last post, you can now drop Automatic Indexes by using DBMS_AUTO_INDEX.DROP_AUTO_INDEXES.

 

You can however potentially improve the structure of an Automatic Index by using the REBUILD, COALESCE or SHRINK (SPACE) options:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild online;

Index altered.

SQL> alter index "SYS_AI_600vgjmtqsgv3" coalesce;

Index altered.

SQL> alter index "SYS_AI_600vgjmtqsgv3" shrink space;

Index altered.

 

Interestingly, if Oracle considers an Automatic Index but decides it’s not efficient enough to be created, the Automatic Indexing process can leave a new Automatic Index in UNUSABLE / INVISIBLE state (as previously discussed), which can be subsequently rebuilt:

SQL> select index_name, status, visibility from user_indexes where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT
------------------------------ -------- ---------
SYS_AI_600vgjmtqsgv3           UNUSABLE INVISIBLE

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild online;

Index altered.

SQL> select index_name, status, visibility from user_indexes where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT
------------------------------ -------- ---------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE

 

So the index is now VALID and actually physically created. But you can’t subsequently make it VISIBLE, which means it can’t ordinarily be used by the CBO:

SQL> alter index "SYS_AI_600vgjmtqsgv3" visible;
alter index "SYS_AI_600vgjmtqsgv3" visible
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

When you rebuild an Automatic Index, you can however change the manner in which it’s compressed:

SQL> select index_name, status, visibility, compression from user_indexes
where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT COMPRESSION
------------------------------ -------- --------- -------------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE ADVANCED LOW

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild nocompress;

Index altered.

SQL> select index_name, status, visibility, compression from user_indexes
where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT COMPRESSION
------------------------------ -------- --------- -------------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE DISABLED

 

And no, you can’t rename an Automatic Index:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" rename to BOWIE_INDEX;
alter index "SYS_AI_600vgjmtqsgv3" rename to BOWIE_INDEX
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

So the answer is it depends on what one can and can’t do currently with an Automatic Index, which of course is subject to change in the future…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part III (Star) August 11, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Internals, Index statistics, Oracle, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
1 comment so far

In Part I we looked at a scenario where an index was deemed to be too inefficient for Automatic Indexing to create a VALID index, because of the poor clustering of data within the table.

In Part II we improved the data clustering but the previous SQLs could still not generate a new Automatic Index because they had effectively been blacklisted.

So how do we get Automatic Indexing to improve the performance of these queries?

Basically, we need to run some new SQL statements to those previously run which have not been blacklisted, that can make the Automatic Indexing process kick in and create the necessary indexes.

For example, if we now run the following SQL statements that have not previously run:

select * from nickcave where code=1;

select * from nickcave where code=2;

select * from nickcave where code=3;

 

And now wait for the next Automatic Indexing process period and look at the following (partial) Automatic Indexing report:

 

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 22-JUN-2020 04:26:31
Activity end                 : 22-JUN-2020 04:27:25
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                              : 0
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 167.77 MB (167.77 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 3
SQL statements improved (improvement factor)  : 3 (76x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 76x


INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table    | Index                | Key  | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | NICKCAVE | SYS_AI_dh8pumfww3f4r | CODE | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 5k1wmtu7um5q9
SQL Text             : select * from nickcave where code=1
Improvement Factor   : 76x

Execution Statistics:
-----------------------------

                   Original Plan                   Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  1725103                       106145
CPU Time (s):      1534305                       62314
Buffer Gets:       291835                        779
Optimizer Cost:    9125                          792
Disk Reads:        0                             197
Direct Writes:     0                             0
Rows Processed:    500000                        100000
Executions:        5                             1

 

We can see that an index has indeed now been created on the CODE column because one of the new statements is now deemed to be 76x more efficient thanks to the new index.

If we look at details of this new Automatic Index:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  VISIBLE   DISABLED      VALID      10000000       19518             57983

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='NICKCAVE'
order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that the index is now indeed VALID and VISIBLE with a much improved Clustering Factor at just 57983.

If we now re-run newer SQL statement:

 

SQL> select * from nickcave where code=1;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("CODE"=1)

Statistics
----------------------------------------------------------
          12  recursive calls
           0  db block gets
         779  consistent gets
           0  physical reads
         176  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

We notice the SQL statement is now indeed using this new Automatic Index.

If we now re-run our original SQL statement that had been using a FTS execution plan and that we couldn’t make Automatic Indexing create a VALID index because when originally run, the data clustering was too poor within the table:

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    8 - access("CODE"=42)

Statistics
----------------------------------------------------------
          14  recursive calls
           4  db block gets
         780  consistent gets
         198  physical reads
       15224  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

This query is now also finally using the newly created index, because the CBO now too deems it to be more efficient with an index based execution plan.

The moral of the story. Automatic Indexing may initially deem a potential index to not be efficient enough to be created. However, things may change such as the clustering of table data (or the distribution of data values, etc. etc.) that may make a new index now viable. This though requires a NEW SQL statement to be executed, such that a non-blacklisted SQL can invoke the Automatic Indexing process to create the necessary Automatic Index.

Of course, things may change in the future. Future releases may have the facility to automatically re-cluster the data in tables optimally based on existing workloads and may also have a mechanism to identify that things have sufficient “changed” such that previously “failed” SQL statements from an Automatic Indexing perspective may warrant reevaluation.

This has only been tested up to version Oracle Database 19.5 of the Oracle Autonomous Database environments.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part II (Wild Is The Wind) August 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Oracle Indexes, Performance Tuning.
2 comments

 

In my previous post, I discussed a scenario in which Oracle Automatic Indexing refused to create a VALID index, because the resultant index was too inefficient to access the necessary rows due to the poor clustering of data within the table.

If the performance of such an SQL were critical for business requirements, there is a way to address this scenario, by re-clustering the data within the table to align itself with the index. Although the re-clustering table operation can now be very easily performed online since Oracle Database 12.2 (without having to use the dbms_redefinition process), this is NOT automatically performed within the Autonomous Database self-tuning framework (yet).

But it’s an activity we can perform manually to improve the performance of such critical SQLs as follows:

SQL> alter table nickcave add clustering by linear order(code);

Table altered.

SQL> alter table nickcave move online;

Table altered.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

With the data in the table now perfectly aligned with the index, we would ordinarily now expect the index to be more efficient method to retrieve this 1% of the data.

However, if we now re-run the previously executed SQLs each a number of times:

 

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

And now wait until next Automatic Indexing process period:

SQL> select dbms_auto_index.report_last_activity() report from dual;

...

We notice that the Automatic Index is still NOT mentioned in the Automatic Indexing reports and still remains UNUSABLE:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

So what’s going on?

In order to prevent the same SQLs from being continually re-evaluated to see if an index might be preferable, the Automatic Indexing process puts previously evaluated SQLs on a type of blacklist and therefore don’t get subsequently re-evaluated.

So although the new clustering of the data within the table would now likely warrant the creation of a new index, if we just run the some SQLs as previously, nothing changes. No Automatic Index is created and the SQLs remain in their current “sub-optimal” state.

In Part III, we’ll look at how to finally get Automatic Indexing to create these indexes and improve the performance of these queries…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down) August 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Full Table Scans, Index Rebuild, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
4 comments

I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important.

The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing Cloud environment.

I begin by creating a simple table that has the key column CODE, in which data is populated in a manner where the data is very poorly clustered:

 

SQL> create table nickcave (id number, code number, name varchar2(42));

Table created.

SQL> insert into nickcave select rownum, mod(rownum, 100), 'Nick Cave and the Bad Seeds'
     from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'NICKCAVE');

PL/SQL procedure successfully completed.

 

So we have 100 evenly distributed distinct CODE values but they’re all distributed throughout the table.

The following SQL statement is basically returning just 1% of the data and is executed a number of times:

 

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows    | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     100K|  3613K|  9125   (5)| 00:00:01|
|   1 |  PX COORDINATOR              |          |         |       |            |         |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000 |     100K|  3613K|  9125   (5)| 00:00:01|
|   3 |    PX BLOCK ITERATOR         |          |     100K|  3613K|  9125   (5)| 00:00:01|
|*  4 |     TABLE ACCESS STORAGE FULL| NICKCAVE |     100K|  3613K|  9125   (5)| 00:00:01|
------------------------------------------------------------------------------------------

Without an index, the CBO currently has no choice but to use a Full Table Scan to access the table. So we wait for the next Automatic Index process to kick in:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

The Automatic Indexing report makes no mention of Automatic Indexes on the NICKCAVE table…

If we look to see if any indexes have actually been created:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor 
     from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='NICKCAVE'
     order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that yes, an Automatic Index (SYS_AI_dh8pumfww3f4r) has been created on the CODE column of the NICKCAVE table BUT it remains in an INVISIBLE, UNUSABLE state.

So Automatic Indexing considered an index on CODE, created it in an INVISIBLE, USABLE state but when testing it, failed in that it found it to be less efficient than the current FTS and so reverted the Automatic Index back to an UNUSABLE index.

Therefore, if we run a bunch of other similar SQL statements such as the following:

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

 

They all use the FTS as again, the CBO has no choice with no VALID index on the CODE column available.

If we keep checking the Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

There’s still no mention of an index on the CODE column. The existing Automatic Index remains in an UNUSABLE state:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

 

Basically, the index remains ineffective because with a Clustering Factor of 4158302, it’s just too inefficient to return the 1% (100000 rows) of the table.

Even in an Autonomous Database environment, nothing will automatically change with this scenario.

In my next post, we’ll look at how we can improve the performance of this query and get an Automatic Index to actually kick in with a USABLE index…