jump to navigation

Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
add a comment

In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.

So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?

I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:

SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD');

PL/SQL procedure successfully completed.


SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1');

PL/SQL procedure successfully completed.

 

So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.

I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:

 

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033

----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
     filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      63660 consistent gets
      63649 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        380 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed


SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 564520720

-----------------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |             |    1 |    41 |    856 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 |    856 (11) | 00:00:01  |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))
     filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      64424 consistent gets
      64413 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):

 

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties   |
-----------------------------------------------------------------------------
| BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE         |
-----------------------------------------------------------------------------
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 7wag3gbk0b3tm
SQL Text            : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'
Improvement Factor  : 64442.3x

Execution Statistics:
-----------------------------
                      Original Plan                Auto Index Plan
                      ---------------------------- ----------------------------
Elapsed Time (s):     568513                       2771
CPU Time (s):         275534                       1874
Buffer Gets:          1031078                      406
Optimizer Cost:       856                          405
Disk Reads:           1030609                      3
Direct Writes:        0                            0
Rows Processed:       0                            0
Executions:           16                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 564520720

-----------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost | Time       |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |             |      |       |  856 |            |
|  1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 |  856 | 00:00:01   |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2703636439

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    1 |    41 |  405 | 00:00:01   |
| * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 |  405 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  403 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

If we look at the definitions of all indexes currently on these tables:

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS   NUM_ROWS   LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dp2t0j12zux49           YES INVISIBLE ADVANCED LOW  UNUSABLE   10000000       21702           4161898

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_dp2t0j12zux49           CODE                          1


SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD1';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_96snkmu4sk44g           YES VISIBLE   ADVANCED LOW  VALID      10000000       15400           9969473

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_96snkmu4sk44g           CODE                          1

 

In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.

However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.

The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.

So if we re-run the first query:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033

----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
     filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      63660 consistent gets
      63649 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        380 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

It continues to use a Full Table Scan.

If we re-run the second query:

 

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2703636439

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    41 |     415 (0) | 00:00:01  |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 |     415 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  412 |       |       3 (0) | 00:00:01  |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
        406 consistent gets
          0 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).

BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:

SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low;

Index created.

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3366491378

----------------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                           | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                                |    1 |    41 |       4 (0) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1                    |    1 |    41 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | PINK_FLOYD1_CODE_CREATE_DATE_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"CREATE_DATE" IS NOT NULL)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

Performance improves significantly further, by reducing Consistent Gets down to just 3.

So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:

A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or

A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…

Oracle 19c Automatic Indexing: Non-Equality Predicates Part I (Lucy Can’t Dance) January 14, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cloud, Oracle Indexes, Oracle19c.
2 comments

 

I’ve been waiting a while before posting a series on the various limitations associated with Automatic Indexing, in order to see how the feature matures over time.

The following have all been re-tested post 1 January 2021 on the Autonomous ATP Database Cloud service, using Oracle Database version 19.5.0.0.0.

In the Oracle Documentation (including version 21c), the only limitations with regard Automatic Indexing listed are the following:

  • Auto indexes are local B-tree indexes.
  • Auto indexes can be created for partitioned as well as non-partitioned tables.
  • Auto indexes cannot be created for temporary tables.

Well, as I discussed in the previous series on Automatic Indexing on Partitioned tables, we already saw how Oracle can actually also create Non-Partitioned (Global) indexes. So the limitation on Automatic Indexes being “local” indexes is not actually correct, even with 19c.

But are there other limitations that are not officially documented?

If you look at every example I’ve used previously with regard Automatic Indexing, they all feature Equality predicates. In the following examples, I’m going to run a series on Range Scan predicates that heavily filter and would benefit greatly from an index.

I first create a simple table with 10M rows:

SQL> create table ziggy1 (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy1 select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY1');

PL/SQL procedure successfully completed.

 

I then run the following range scan queries several times that each return only a few rows:

SQL> select * from ziggy1 where id between 42 and 50;

9 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    8 |   184 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    8 |   184 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<=50 AND "ID">=42)
    filter("ID"<=50 AND "ID">=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        596 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          9 rows processed


SQL> select * from ziggy1 where id < 0;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    1 |    23 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    1 |    23 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<0)
    filter("ID"<0)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        364 bytes sent via SQL*Net to client
        344 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

SQL> select * from ziggy1 where id > 100000000000;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    1 |    23 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    1 |    23 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID">100000000000)
    filter("ID">100000000000)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        364 bytes sent via SQL*Net to client
        355 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

If we look at the subsequent Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 13-JAN-2021 11:55:37
Activity end                : 13-JAN-2021 11:56:20
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 3
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

We notice NO Automatic Indexes were created.

We can run these queries endlessly and Automatic Indexing will never create associated Automatic Indexes:

SQL> select index_name, auto, constraint_index, visibility from user_indexes where table_name='ZIGGY1';

no rows selected

These queries are doomed to perform Full Table Scans unless indexes are manually created:

SQL> select * from ziggy1 where id between 42 and 50;

9 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    8 |   184 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    8 |   184 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<=50 AND "ID">=42)
    filter("ID"<=50 AND "ID">=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        596 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          9 rows processed

 

Currently Automatic Indexes do not support Non-Equality predicates. Automatic Indexes are only created based on Equality-based predicates.

Obviously, Automatic Indexing is a fabulous feature and this might all change in the future. But with Non-Equality predicates so prevalent in SQL, it’s vital to note this current limitation when using and relying on Automatic Indexing…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part III (Do Anything You Say) October 8, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Stale Statistics.
2 comments

In Part I of this series, we saw how Automatic Indexing will not create a viable Automatic Index if there are stale or missing statistics on the underlining segments. In Part II we saw how these SQL statements effectively become blacklisted and when segment statistics are subsequently collected, Automatic Indexing will still not create viable Automatic Indexes when the SQL statements are re-run.

So how do we get Automatic Indexing to now kick in and create necessary indexes on these problematic SQLs?

As I’ve discussed previously in relation to blacklisted SQLs, we need to run a NEW SQL statement that hasn’t been blacklist that will result in a necessary index to be created. An easy way to do this is just to include a new comment within the previous SQL to give the SQL a new signature.

If we now run the following “new” SQL statement (identical to the problematic SQL but with a comment embedded):

SQL> select /* new */ * from bowie_stale where code=42;

        ID       CODE NAME
---------- ---------- ------------------------------------------
   1000041         42 David Bowie
   6000041         42 David Bowie
        41         42 David Bowie
   3000041         42 David Bowie
   7000041         42 David Bowie
   8000041         42 David Bowie
   4000041         42 David Bowie
   9000041         42 David Bowie
   5000041         42 David Bowie
   2000041         42 David Bowie

 

If we now wait to see what the next Automatic Indexing task makes of things:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 07-JUL-2020 06:34:49
Activity end                : 07-JUL-2020 06:35:54
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 142.61 MB (142.61 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (19787.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 19787.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties |
---------------------------------------------------------------------------
| BOWIE | BOWIE_STALE | SYS_AI_300kk2unp8tr0 | CODE | B-TREE | NONE       |
---------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We see that the index on the CODE column (SYS_AI_300kk2unp8tr0) has now been created.

Further down the report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : du6psd0xmzpg5
SQL Text            : select /* new */ * from bowie_stale where code=42
Improvement Factor  : 19787.7x

Execution Statistics:
-----------------------------
                  Original Plan Auto           Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 137261                       2620
CPU Time (s):     84621                        1769
Buffer Gets:      277028                       13
Optimizer Cost:   544                          13
Disk Reads:       275947                       2
Direct Writes:    0                            0
Rows Processed:   70                           10
Executions:       7                            1

 

A new index was indeed created because of this new SQL statement, with a performance improvement of 19787.7x.

Further down the report to the Plans Section:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 65903426

-----------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |             |      |       |  544 |             |
| 1 | TABLE ACCESS STORAGE FULL | BOWIE_STALE |   10 |   230 |  544 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2558864466

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   230 |   13 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_STALE          |   10 |   230 |   13 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_300kk2unp8tr0 |   10 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We can see that the new plan using the new Automatic Index with a much lower CBO cost.

If we now look at the status of this index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       16891          10000000

 

We see that the index is now both VISIBLE and VALID (previously, it was INVISIBLE and UNUSABLE).

As such, the Automatic Index can now potentially be used by any SQL, including the previous problematic query.

So with a viable index now in place, if we re-run the initial problematic query:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2558864466

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      |   10 |   230 |      14 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_STALE          |   10 |   230 |      14 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_300kk2unp8tr0 |   10 |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         14 consistent gets
          0 physical reads
          0 redo size
        738 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

We see that finally, the SQL uses the new Automatic Index and is indeed much more efficient as a result, with just 14 consistent gets required (when previously it was 39430 consistent gets).

So if ever you come across the scenario where an SQL does not have an Automatic Index created when clearly it should, it could be that it has been blacklisted and needs a different SQL to actually generate the necessary index.

To avoid some of these issues, make sure you do not have stale or missing statistics when reliant on Automatic Indexing. The new High Frequency Statistics Collection capability to designed to specifically avoid such a scenario.

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.
1 comment so far

 

 

In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.

If we were to now to collect the missing statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME             NUM_ROWS     BLOCKS LAST_ANAL
-------------------- ---------- ---------- ---------
BOWIE_STALE            10000000      39677 06-JUL-20

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                       10000000          0 HYBRID          06-JUL-20
CODE                       971092    .000001 HYBRID          06-JUL-20
NAME                            1 4.9416E-08 FREQUENCY       06-JUL-20

 

If we now repeatedly re-run the problematic query many times:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)|  Time       |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)|  00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)|  00:00:01   |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

 

The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.

If we wait for the next Automatic Indexing reporting period:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 06-JUL-2020 05:12:42
Activity end                : 06-JUL-2020 05:13:34
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 0
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

 

We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.

If we look at the index details on the table:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK         NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0   YES NO  INVISIBLE ADVANCED LOW  UNUSABLE   10000000       23058           4147514 

 

So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and can not be used by the CBO.

NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios after stale statistics when indexes were created in but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.

If we simply re-run the same queries again from the time when the dependant object statistics were stale, any SQL is just ignored by the Automatic Indexing process.

As such, if we now subsequently re-run the problematic query again:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

Again, the CBO has no choice here with no viable VALID/VISIBLE index present but to perform a FTS, even though its getting the cardinality estimates spot on since statistics gathering.

 

In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part I (Dead Against It) October 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, High Frequency Statistics Collection, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Stale Statistics, Unusable Indexes.
5 comments

A “golden rule” when working with Automatic Indexing is that things don’t work properly if there are stale statistics on the dependant objects. Stale statistics can of course be problematic but they can be particularly troublesome when dealing with Automatic Indexing.

In the Oracle Autonomous Database environments, this issue is addressed somewhat by the new High Frequency Statistics Collection capability, which helps to automatically collect stale statistics on a regular basis. However, in on-prem Exadata environments where this can more easily be turned off or collected less frequently, it’s a potential issue worth consideration.

I’ll start with a simple little table, with a CODE column that has lots of distinct values:

SQL> create table bowie_stale (id number constraint bowie_stale_pk primary key, code number, name varchar2(42));

Table created.

SQL> insert into bowie_stale select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

Importantly, I don’t collect statistics on this newly populated table…

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME        NUM_ROWS     BLOCKS LAST_ANAL
--------------- ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

If we now run the following query a number of times while there are no statistics on the table:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice but to use a FTS as I don’t yet have an index on the CODE column.

If I now wait for the next Automatic Indexing task to kick in AND if there are still NO statistics on the table:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 05-JUL-2020 06:36:31
Activity end                 : 05-JUL-2020 06:37:07
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates            : 1
Indexes created             : 0
Space used                  : 0 B
Indexes dropped             : 0
SQL statements verified     : 0
SQL statements improved     : 0
SQL plan baselines created  : 0
Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

You can see that there was the one index candidate BUT no Automatic Index appears to have been created.

Assuming there are still no statistics:

 

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE';

TABLE_NAME                       NUM_ROWS    BLOCKS  LAST_ANAL
------------------------------ ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE2';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

 

If we look now at what indexes exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID
SYS_AI_300kk2unp8tr0           YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='BOWIE_STALE2' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
BOWIE_STALE_PK                 ID                                 1
SYS_AI_300kk2unp8tr0           CODE                               1

 

We notice there is now an Automatic Index BUT it remains in an UNUSABLE/INVISIBLE state. This means the index can’t be used by the CBO.

So if we now re-run the SQL query again:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice still but to use the FTS.

In Part II, we’ll see that once we get into this scenario, it can be a tad problematic to get ourselves out of it and get the Automatic Index created as we would like…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
add a comment

 

In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.

One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’);

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.

So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.

So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
1 comment so far

In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.

To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:

SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142));

Table created.

SQL> begin
2     for i in 1..2000000 loop
3       if mod(i,2) = 0 then
4          insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history');
5       else
6          insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.');
7       end if;
8     end loop;
9     commit;
10  end;
11  /

PL/SQL procedure successfully completed.

 

So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.

Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

If we run a query based on a rare value for CODE:

SQL> set arraysize 5000

SQL> select * from space_oddity where code=25;

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     3 |   423 | 11356   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |     3 |   423 | 11356   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=25)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      40974  consistent gets
          0  physical reads
          0  redo size
       1018  bytes sent via SQL*Net to client
        402  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.

However, the following SQL accesses many of the common values of CODE and returns many rows:

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.

So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 15:57:14
Activity end                 : 18-AUG-2020 15:58:10
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 35.65 MB (35.65 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 1
SQL statements improved (improvement factor)  : 1 (40984.3x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 40984.3x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
----------------------------------------------------------------------------
| Owner | Table        | Index                | Key  | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE       |
----------------------------------------------------------------------------

 

So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name  : BOWIE
SQL ID               : 19sv1g6tt0g1y
SQL Text             : select * from space_oddity where code=25
Improvement Factor   : 40984.3x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  5417408                       139265
CPU Time (s):      1771880                       7797
Buffer Gets:       327876                        5
Optimizer Cost:    11356                         5
Disk Reads:        649                           2
Direct Writes:     0                             0
Rows Processed:    16                            2
Executions:        8                             1

 

The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.

This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.

BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.

BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.

BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.

Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…

So if we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION

---------------------------------------------------------------------------------------------
- Original
-----------------------------

Plan Hash Value  : 2301175572
-----------------------------------------------------------------------------
| Id | Operation           | Name         | Rows | Bytes | Cost  | Time     |
-----------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |              |      |       | 11356 |          |
|  1 |   TABLE ACCESS FULL | SPACE_ODDITY |    3 |   423 | 11356 | 00:00:01 |
-----------------------------------------------------------------------------

- With Auto Indexes

-----------------------------
Plan Hash Value  : 54782313
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |    3 |   423 |    5 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY         |    3 |   423 |    5 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_82bdnqs7q8rtm |    2 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=25)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).

But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.

If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    32 |  4512 |    35   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY         |    32 |  4512 |    35   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | SYS_AI_82bdnqs7q8rtm |    32 |       |    12   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR
           "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     410422  consistent gets
          0  physical reads
          0  redo size
  145536076  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.

As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).

One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048’);

PL/SQL procedure successfully completed.

If we now re-run this SQL:

SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   996K|   133M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |   996K|   133M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.

In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.
1 comment so far

Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.

As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…

I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:

SQL> create table iggy_pop (id number, code1 number, code2 number, name varchar2(42));

Table created.

SQL> insert into iggy_pop select rownum, mod(rownum, 100)+1, mod(rownum, 100)+1, 'David Bowie'
from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP');

PL/SQL procedure successfully completed.

 

Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.

If we run the following query for a combination of data that exists:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name      | Rows | Bytes | Cost (%CPU)|   Time     |
--------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT         |          |   1000|  24000|    575 (15)|   00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |   1000|  24000|    575 (15)|   00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).

If we run a query on a combination of data that doesn’t exist:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 1000 |  18000|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000 |  18000|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.

Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 18-SEP-2020 01:24:17
Activity end                : 18-SEP-2020 01:25:29
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (41301.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 41301.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table    | Index                | Key         | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | IGGY_POP | SYS_AI_1awkddqkwa4f8 | CODE1,CODE2 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : bdnf0barn3jk7
SQL Text            : select code1, code2, name from iggy_pop where code1=1 and code2=42
Improvement Factor  : 41301.7x

Execution Statistics:
-----------------------------
                  Original Plan                 Auto Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 72085                        1342
CPU Time (s):     39272                        679
Buffer Gets:      123907                       3
Optimizer Cost:   575                          4
Disk Reads:       122859                       2
Direct Writes:    0                            0
Rows Processed:   0                            0
Executions:       3                            1

 

So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 3288467

--------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |          |      |       |  575 |             |
| 1 | TABLE ACCESS STORAGE FULL | IGGY_POP | 1000 | 18000 |  575 | 00:00:01    |
--------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2496796491

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    2 |    36 |    4 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP             |    2 |    36 |    4 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_1awkddqkwa4f8 |    1 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=1 AND "CODE2"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.

The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).

The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.

This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.

So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.

BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…

If we look at the new Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='IGGY_POP';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_1awkddqkwa4f8           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15362           4083700

 

We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.

Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.

SQL> show parameter sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
SQL> show parameter optimizer_adaptive

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_plans             boolean     TRUE
optimizer_adaptive_reporting_only    boolean     FALSE
optimizer_adaptive_statistics        boolean     FALSE

 

So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.

If we now re-run the SQL that returns no rows:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 18000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 18000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.

When when re-run the SQL that returns many many rows:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 24000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 24000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
         25 recursive calls
          0 db block gets
      41981 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          1 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.

BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.

This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.

In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…

Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some columns values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.

I’ll start with a simple little example, that has an interesting little twist at the end.

The following table has a CODE column, which has 10 distinct values that a widely skewed, with some values much less common than others:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected data with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_SKEW', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We can clearly see that some CODE values (such as “6”) have relatively few values, with only 100 occurrences:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time       |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              |   100K|  1953K|   570   (7)| 00:00:01 |
|   1 |  PX COORDINATOR                |              |         |         |              |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR           |              |   100K|  1953K|   570   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_SKEW |   100K|  1953K|   570   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE"=6)
    filter("CODE"=6)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
      3781  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.

So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : fn4shnphu4bvj
SQL Text             : select * from bowie_skew where code=6
Improvement Factor   : 41.1x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  119596                        322
CPU Time (s):      100781                        322
Buffer Gets:       11347                         103
Optimizer Cost:    570                           4
Disk Reads:        0                             0
Direct Writes:     0                             0
Rows Processed:    100                           100
Executions:        1                             1

 

So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.

If we look further down the Auto Indexing report and compare the execution plans:

 

PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value  : 3374004665
-----------------------------------------------------------------------------------------
| Id | Operation                      | Name       | Rows   | Bytes   | Cost | Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |            |        |         |  570 |          |
|  1 |  PX COORDINATOR                |            |        |         |      |          |
|  2 |    PX SEND QC (RANDOM)         | :TQ10000   | 100000 | 2000000 |  570 | 00:00:01 |
|  3 |     PX BLOCK ITERATOR          |            | 100000 | 2000000 |  570 | 00:00:01 |
|  4 |      TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 |  570 | 00:00:01 |
-----------------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value  : 140816325
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100 |  2000 |    4 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           |  100 |  2000 |    4 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng |  100 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=6)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

We can see that new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.

Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly and can cause difficulties between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).

BUT when I now rerun the SQL query again:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |   100 |  2000 |     4   (0)|
|   1 |  PX COORDINATOR                       |                      |       |       |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |   100 |  2000 |     4   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           |   100 |  2000 |     4   (0)|
|   4 |     BUFFER SORT                       |                      |       |       |            |
|   5 |      PX RECEIVE                       |                      |   100 |       |     3   (0)|
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |   100 |       |     3   (0)|
|   7 |        PX SELECTOR                    |                      |       |       |            |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_7psvzc164vbng |   100 |       |     3   (0)|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("CODE"=6)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       103  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?

I’ll explain in my next post.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down) August 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Full Table Scans, Index Rebuild, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
4 comments

I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important.

The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing Cloud environment.

I begin by creating a simple table that has the key column CODE, in which data is populated in a manner where the data is very poorly clustered:

 

SQL> create table nickcave (id number, code number, name varchar2(42));

Table created.

SQL> insert into nickcave select rownum, mod(rownum, 100), 'Nick Cave and the Bad Seeds'
     from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'NICKCAVE');

PL/SQL procedure successfully completed.

 

So we have 100 evenly distributed distinct CODE values but they’re all distributed throughout the table.

The following SQL statement is basically returning just 1% of the data and is executed a number of times:

 

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows    | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     100K|  3613K|  9125   (5)| 00:00:01|
|   1 |  PX COORDINATOR              |          |         |       |            |         |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000 |     100K|  3613K|  9125   (5)| 00:00:01|
|   3 |    PX BLOCK ITERATOR         |          |     100K|  3613K|  9125   (5)| 00:00:01|
|*  4 |     TABLE ACCESS STORAGE FULL| NICKCAVE |     100K|  3613K|  9125   (5)| 00:00:01|
------------------------------------------------------------------------------------------

Without an index, the CBO currently has no choice but to use a Full Table Scan to access the table. So we wait for the next Automatic Index process to kick in:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

The Automatic Indexing report makes no mention of Automatic Indexes on the NICKCAVE table…

If we look to see if any indexes have actually been created:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor 
     from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='NICKCAVE'
     order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that yes, an Automatic Index (SYS_AI_dh8pumfww3f4r) has been created on the CODE column of the NICKCAVE table BUT it remains in an INVISIBLE, UNUSABLE state.

So Automatic Indexing considered an index on CODE, created it in an INVISIBLE, USABLE state but when testing it, failed in that it found it to be less efficient than the current FTS and so reverted the Automatic Index back to an UNUSABLE index.

Therefore, if we run a bunch of other similar SQL statements such as the following:

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

 

They all use the FTS as again, the CBO has no choice with no VALID index on the CODE column available.

If we keep checking the Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

There’s still no mention of an index on the CODE column. The existing Automatic Index remains in an UNUSABLE state:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

 

Basically, the index remains ineffective because with a Clustering Factor of 4158302, it’s just too inefficient to return the 1% (100000 rows) of the table.

Even in an Autonomous Database environment, nothing will automatically change with this scenario.

In my next post, we’ll look at how we can improve the performance of this query and get an Automatic Index to actually kick in with a USABLE index…

The CBO CPU Costing Model: Indexes vs. Full Table Scans November 25, 2009

Posted by Richard Foote in CBO, Full Table Scans, Oracle Indexes, System Statistics.
8 comments

As previously promised, I thought I might look at how the CBO goes about costing a Full Table Scan (FTS) with system statistics and the CPU costing model, so we can understand why the CBO may have chosen one option over the other.
 
WARNING: You might need to grab a calculator to help you along 🙂

To illustrate, I’m simply going to use the original BOWIE_STUFF table and index setup I created in my earlier Introduction to the CBO. I’ll however recreate the demo here again from scratch to refresh your memory:
 
I first create a table that has 100,000 rows, with an indexed “ID” column that has 100 distinct, evenly distributed values. For those mathematically challenged, this means each distinct value will return 1000 rows.
 

SQL> CREATE TABLE bowie_stuff AS SELECT (mod(rownum,100)+1)*10 id, 'Ziggy Stardust' name FROM dual CONNECT BY LEVEL <= 100000;
 
Table created.
 
SQL> CREATE INDEX bowie_stuff_i ON bowie_stuff(id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE_STUFF', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select blocks from dba_tables where table_name='BOWIE_STUFF';
  
BLOCKS
------
   329

Note the table has 329 blocks. It’s a number I’ll refer to a number of times throughout.
 

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'BOWIE_STUFF_I';
  

INDEX_NAME    BLEVEL
------------- ------
BOWIE_STUFF_I      1

LEAF_BLOCKS CLUSTERING_FACTOR
----------- -----------------
        207             32900

   
Note also that the index has a blevel of 1, 207 leaf blocks and a rather poor Clustering Factor (CF) of 32900, not close at all to the number of blocks in the table. As we’ll see, the CF is so bad that the CBO will choose a FTS over the index.
 

SQL> show parameter db_file_multi
   
NAME                          VALUE
----------------------------- -----
db_file_multiblock_read_count    16

 
   
Note the db_file_multiblock_read_count is manually set to 16. Relevant when calculating the cost of a FTS with the I/O costing model. Less so with CPU costing in use as we’ll see.

Finally, if we look at the system statistics in place:
 

SQL> select pname, pval1 from sys.aux_stats$ where pname in ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED');

PNAME    PVAL1
-------- -----
SREADTIM     5
MREADTIM    10
CPUSPEED  1745
MBRC        10

All of these values will be relevant when calculating the cost of a FTS with the CPU costing model.
 
OK, we now have all the information we need to determine how the CBO will treat both index and FTS activities on this table.
 
Let’s start by refreshing ourselves with how the I/O based CBO model will deal with such a scenario.
 

SQL> alter session set "_optimizer_cost_model" = io;
 
Session altered.

 
OK, let’s just run a simple query that selects data for a specific ID. Remember, there are 100 evenly distributed distinct IDs so this query will return 1% of the data (1000 rows):

SQL> set autotrace traceonly
SQL> SELECT * FROM bowie_stuff WHERE id = 420;
 
  
-----------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost  |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |  1000 | 18000 |       |
|*  1 |  TABLE ACCESS FULL| BOWIE_STUFF |  1000 | 18000 |       |
-----------------------------------------------------------------

 
 

Note: the CBO has decided to use a FTS to select the 1% of rows as it has the lowest associated cost.
 
As previously discussed, the cost of using the index is approximately:
 
index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
 
 = 1 + (207 x 0.01) + (32900 x 0.01) = 1 + 3 + 329 = 333        

Note: the 1 for the blevel can be dropped by the CBO bringing the cost down to 332, to be discussed another time .
 
As previously discussed, the FTS cost is approximately:
 
segment header I/O + ceil(table blocks/fudged mbrc value) 

Note: for a db_file_multiblock_read_count of 16, the adjusted, “fudged” value used by the CBO is approximately 10.4.

Therefore, for the above example, the FTS cost is calculated as:
 
= 1 + ceil(329/10.4) = 1 + 32 = 33
 
33 is significantly less than 333 so the FTS easily wins out.
 

So how do things change when using System Statistics and the CPU costing model ? How does the CBO calculate the cost of the FTS with the above system statistics in place ?
 
As I’ve previously discussed, the significant change with the CPU costing model is not so much how it impacts the cost of index accesses but that of the FTS.
 
Let’s run the same SQL, but this time using the CPU costing model:
 

SQL> alter session set "_optimizer_cost_model" = cpu;
 
Session altered.
 
SQL> SELECT * FROM bowie_stuff WHERE id = 420;
 
1000 rows selected.
 

--------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time    |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |  1000 | 18000 |     70(5)  | 00:00:01|
|*  1 |  TABLE ACCESS FULL| BOWIE_STUFF |  1000 | 18000 |     70(5)  | 00:00:01|
--------------------------------------------------------------------------------

Note: CBO is still picking the FTS as the CF is truly awful. However the cost of the FTS has increased significantly from 33 to 70, although nowhere near the approximate 333 cost of using the index.
 
So why has the FTS cost increased and how is this new cost calculated ?
 
As previously discussed , the CPU costing formula is basically:
 
(sum of all the single block I/Os x average wait time for a single block I/O +
 sum of all the multiblock I/Os x average wait time for a multiblock I/O +
 sum of all the required CPU cycles / CPU cycles per second)
/
average wait time for a single block I/O
 

If we first focus on the single block I/O portion of the formula, the only single block read considered by the CBO during a FTS is the one associated with reading the segment header. Note that the average wait time for a single block read is the SREADTIM system statistic.
 
If there’s just the one single block I/O, the single block read portion of the formula effectively equates to (1 x sreadtim) / sreadtim, which just equals 1. So 1 is basically added to the cost with regard to reading the segment header as it is with the I/O costing model.
 
OK, lets next look at the portion of the formula with regard to multiblock I/Os.
 
The sum of all the multiblock I/Os is calculated in a similar manner as it was with the I/O costing model. It’s simply the number of blocks in the table below the HWM (329 in our example) but this time divided by the MBRC system statistic. Note however the MBRC statistic isn’t some fudged, somewhat arbitrarily set figure based on the db_file_multiblock_read_count parameter, but the actual average size of multiblock I/Os in the specific database environment. Note also that the average wait time for a multiblock read is the MREADTIM system statistic.
 
So the total wait time for all multiblock reads in the above example is:
 
sum of all the multiblock I/Os x average wait time for a multiblock I/O = (BLOCKS/MBRC) x MREADTIM = ceil(329/10) x 10 = 330.
 
This value is then divided by the average wait time for a single block read (the SREADTIM system statistic) to give the overall cost of multiblock reads, but expressed in units of single block I/Os.
 
The total cost for multiblock I/Os is therefore:

 ((BLOCKS/MBRC) x MREADTIM)/ SREADTIM = 330/5 = 66.
 
So the total costs associated for all I/Os is the 1 for reading the segment header plus 66 for all the multiblock reads = 67.
 
However, the cost of the FTS is 70, not 67. Where does the additional cost of 3 come from ?
 
Well, that’s the CPU portion of the formula. The CBO has determined that the FTS operation will require ‘x’ number of CPU cycles and this value is then divided by the CPUSPEED to determine how long this CPU activity will take.
 
This CPU elapsed figure is then again divided by the average wait of a single block read (SREADTIM) to also put the CPU costs in units of single block reads. In this example, the total CPU related costs amount to 3.
 
Oracle gives us an indication of what the CPU component is in the overall cost within the execution plan via the %CPU value (which is 5 in the above execution plan). The (%CPU) value is the ceil of the overall percentage of CPU costs as calculated by the following formula:
 
%CPU = ceil(CPU related costs/overall costs)
 
So in our example, %CPU = ceil(3/70 x 100) = ceil(4.29) = 5% (as indeed displayed in the above execution plan).
 

Again, all the costs associated with a FTS with the CPU costing model can be derived and make some kinda sense. Providing all the necessary inputs are all actually correct and valid, the CBO will indeed correctly decide to use a FTS over an index when it’s the less expensive option.
 
I’ll next expand these points and why understanding how these costs are derived can be extremely useful.

You can now put your calculators away 🙂