jump to navigation

Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
add a comment

In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.

So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?

I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:

SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD');

PL/SQL procedure successfully completed.


SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42));

Table created.

SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon'
from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1');

PL/SQL procedure successfully completed.

 

So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.

I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:

 

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033

----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
     filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      63660 consistent gets
      63649 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        380 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed


SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 564520720

-----------------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |             |    1 |    41 |    856 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 |    856 (11) | 00:00:01  |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))
     filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss'))

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      64424 consistent gets
      64413 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):

 

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
-----------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties   |
-----------------------------------------------------------------------------
| BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE         |
-----------------------------------------------------------------------------
-------------------------------------------------------------------------------


-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 7wag3gbk0b3tm
SQL Text            : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'
Improvement Factor  : 64442.3x

Execution Statistics:
-----------------------------
                      Original Plan                Auto Index Plan
                      ---------------------------- ----------------------------
Elapsed Time (s):     568513                       2771
CPU Time (s):         275534                       1874
Buffer Gets:          1031078                      406
Optimizer Cost:       856                          405
Disk Reads:           1030609                      3
Direct Writes:        0                            0
Rows Processed:       0                            0
Executions:           16                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 564520720

-----------------------------------------------------------------------------------
| Id | Operation                 | Name        | Rows | Bytes | Cost | Time       |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |             |      |       |  856 |            |
|  1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 |    1 |    41 |  856 | 00:00:01   |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2703636439

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    1 |    41 |  405 | 00:00:01   |
| * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 |  405 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  403 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

If we look at the definitions of all indexes currently on these tables:

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS   NUM_ROWS   LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dp2t0j12zux49           YES INVISIBLE ADVANCED LOW  UNUSABLE   10000000       21702           4161898

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_dp2t0j12zux49           CODE                          1


SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='PINK_FLOYD1';

INDEX_NAME                     AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_96snkmu4sk44g           YES VISIBLE   ADVANCED LOW  VALID      10000000       15400           9969473

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_96snkmu4sk44g           CODE                          1

 

In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.

However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.

The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.

So if we re-run the first query:

SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1152280033

----------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |    1 |    40 |    844 (11) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD |    1 |    40 |    844 (11) | 00:00:01  |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)
     filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      63660 consistent gets
      63649 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        380 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

It continues to use a Full Table Scan.

If we re-run the second query:

 

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2703636439

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    41 |     415 (0) | 00:00:01  |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1          |    1 |    41 |     415 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_96snkmu4sk44g |  412 |       |       3 (0) | 00:00:01  |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
        406 consistent gets
          0 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).

BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:

SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low;

Index created.

SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021';

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3366491378

----------------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                           | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                                |    1 |    41 |       4 (0) | 00:00:01  |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1                    |    1 |    41 |       4 (0) | 00:00:01  |
|* 2 | INDEX RANGE SCAN                    | PINK_FLOYD1_CODE_CREATE_DATE_I |    1 |       |       3 (0) | 00:00:01  |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"CREATE_DATE" IS NOT NULL)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        426 bytes sent via SQL*Net to client
        381 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

Performance improves significantly further, by reducing Consistent Gets down to just 3.

So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:

A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or

A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…

Oracle 19c Automatic Indexing: Non-Equality Predicates Part I (Lucy Can’t Dance) January 14, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cloud, Oracle Indexes, Oracle19c.
2 comments

 

I’ve been waiting a while before posting a series on the various limitations associated with Automatic Indexing, in order to see how the feature matures over time.

The following have all been re-tested post 1 January 2021 on the Autonomous ATP Database Cloud service, using Oracle Database version 19.5.0.0.0.

In the Oracle Documentation (including version 21c), the only limitations with regard Automatic Indexing listed are the following:

  • Auto indexes are local B-tree indexes.
  • Auto indexes can be created for partitioned as well as non-partitioned tables.
  • Auto indexes cannot be created for temporary tables.

Well, as I discussed in the previous series on Automatic Indexing on Partitioned tables, we already saw how Oracle can actually also create Non-Partitioned (Global) indexes. So the limitation on Automatic Indexes being “local” indexes is not actually correct, even with 19c.

But are there other limitations that are not officially documented?

If you look at every example I’ve used previously with regard Automatic Indexing, they all feature Equality predicates. In the following examples, I’m going to run a series on Range Scan predicates that heavily filter and would benefit greatly from an index.

I first create a simple table with 10M rows:

SQL> create table ziggy1 (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy1 select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY1');

PL/SQL procedure successfully completed.

 

I then run the following range scan queries several times that each return only a few rows:

SQL> select * from ziggy1 where id between 42 and 50;

9 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    8 |   184 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    8 |   184 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<=50 AND "ID">=42)
    filter("ID"<=50 AND "ID">=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        596 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          9 rows processed


SQL> select * from ziggy1 where id < 0;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    1 |    23 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    1 |    23 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<0)
    filter("ID"<0)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        364 bytes sent via SQL*Net to client
        344 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

SQL> select * from ziggy1 where id > 100000000000;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    1 |    23 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    1 |    23 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID">100000000000)
    filter("ID">100000000000)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        364 bytes sent via SQL*Net to client
        355 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

If we look at the subsequent Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 13-JAN-2021 11:55:37
Activity end                : 13-JAN-2021 11:56:20
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 3
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

We notice NO Automatic Indexes were created.

We can run these queries endlessly and Automatic Indexing will never create associated Automatic Indexes:

SQL> select index_name, auto, constraint_index, visibility from user_indexes where table_name='ZIGGY1';

no rows selected

These queries are doomed to perform Full Table Scans unless indexes are manually created:

SQL> select * from ziggy1 where id between 42 and 50;

9 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4062853157

------------------------------------------------------------------------------------
| Id | Operation                 | Name   | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |        |    8 |   184 |    538 (14) | 00:00:01  |
|* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 |    8 |   184 |    538 (14) | 00:00:01  |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ID"<=50 AND "ID">=42)
    filter("ID"<=50 AND "ID">=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39436 consistent gets
      39425 physical reads
          0 redo size
        596 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          9 rows processed

 

Currently Automatic Indexes do not support Non-Equality predicates. Automatic Indexes are only created based on Equality-based predicates.

Obviously, Automatic Indexing is a fabulous feature and this might all change in the future. But with Non-Equality predicates so prevalent in SQL, it’s vital to note this current limitation when using and relying on Automatic Indexing…

Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part II (Neighbourhood Threat) January 13, 2021

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Transaction Processing, CBO, Exadata, Local Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning.
1 comment so far

In my first post on Automatic Indexing on Partitioned Tables, I discussed how Automatic Indexing (AI) can now create a Non-Partitioned index if deemed the most effective indexing structure (this wasn’t supported when AI was initially released). A Non-Partitioned index is indeed likely the most efficient indexing structure if the underlying table has many partitions and associated SQL equality predicates only reference non-partition key columns. A Non-Partitioned index ensure Oracle only needs to scan the single index structure and not all the partitions of a Local index.

But what if SQLs do reference the column by which the underlying table is partitioned?

The following SQL has an equality filtering predicate on the RELEASE_DATE column, the column by which the BIG_BOWIE1 table is partitioned:

SQL> SELECT * FROM big_bowie1 where release_date = to_date('2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss');

no rows selected

If we look at the subsequent AI report:

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key          | Type   | Properties |
------------------------------------------------------------------------------------------------
| BOWIE | BIG_BOWIE1 | SYS_AI_14gpurjp8m76s | RELEASE_DATE | B-TREE | LOCAL      |
------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We notice that Automatic Indexing has in this instance created a Local Index.

If we look further down the AI report:

-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 4mm3mbkk38pa8
SQL Text            : SELECT * FROM big_bowie1 where release_date = to_date('2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss')
Improvement Factor  : 8339x

Execution Statistics:
-----------------------------
                  Original Plan                Auto Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 146957                       71
CPU Time (s):     146124                       71
Buffer Gets:      16678                        3
Optimizer Cost:   162                          4
Disk Reads:       0                            0
Direct Writes:    0                            0
Rows Processed:   0                            0
Executions:       2                            1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 4031749531

-----------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
| 0  | SELECT STATEMENT          |            |      |       |  162 |             |
| 1  | PARTITION RANGE SINGLE    |            | 3602 | 93652 |  162 | 00:00:01    |
| 2  | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 | 3602 | 93652 |  162 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 4049653350

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                 | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                      |    3 |    78 |    4 | 00:00:01    |
|   1 | PARTITION RANGE SINGLE                    |                      |    3 |    78 |    4 | 00:00:01    |
|   2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1           |    3 |    78 |    4 | 00:00:01    |
| * 3 | INDEX RANGE SCAN                          | SYS_AI_14gpurjp8m76s |    1 |       |    3 | 00:00:01    |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 3 - access("RELEASE_DATE"=TO_DATE(' 2013-12-30 22:15:25', 'syyyy-mm-dd hh24:mi:ss'))

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We can see Automatic Indexing has created the index because it provides an average Improvement Factor of 8339x. As the necessary indexed column(s) matches the table partitioning key, it makes sense for the associated index be a Local index as Oracle is certain which specific index partition to visit based on the value of the equality predicate.

If we look at the details of this new AI:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BIG_BOWIE1';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_14gpurjp8m76s           YES NO  VISIBLE   ADVANCED LOW  N/A        20000000       30742          19941449
SYS_AI_8armv0hqq73fa           YES NO  VISIBLE   ADVANCED LOW  VALID      20000000       42697          19995451

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BIG_BOWIE1' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
SYS_AI_14gpurjp8m76s           RELEASE_DATE                  1
SYS_AI_8armv0hqq73fa           TOTAL_SALES                   1

SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PARTITION PARTITION_COUNT LOCALI
------------------------------ --------- --------------- ------
SYS_AI_14gpurjp8m76s           RANGE                   8 LOCAL

SQL> select index_name, partition_name, status, compression from user_ind_partitions
where index_name in (select index_name from user_indexes where table_name='BIG_BOWIE1')
order by partition_position;

INDEX_NAME           PARTITION_NAME       STATUS   COMPRESSION
-------------------- -------------------- -------- -------------
SYS_AI_14gpurjp8m76s ALBUMS_2013          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2014          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2015          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2016          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2017          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2018          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2019          USABLE   ADVANCED LOW
SYS_AI_14gpurjp8m76s ALBUMS_2020          USABLE   ADVANCED LOW

 

We can see that indeed, a Visible, Usable, Local index was created by Automatic Indexing.

So depending on the column(s) within the index, Automatic Indexing can potentially create either a Local or Non-Partitioned index when indexing a partitioned table.

Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation Piece) October 14, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Index Access Path, Local Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning, Performance Tuning.
1 comment so far

In this little series, I’m going to discuss how Automatic Indexing works in relation to Partitioning.

I’ve discussed Indexing and Partitioning many times previously and how Oracle has various options when indexing a partitioned table:

  • Non-Partitioned Index
  • Globally Partitioned Index
  • Locally Partitioned Index

So the question(s) are how does Automatic Indexing handle scenarios with partitioned objects?

A very important point to make at the start is that based on my research, the answer has already changed significantly since Automatic Indexing was first released. So it’s important to understand that Automatic Indexing is an ever evolving capability, that will advance and improve as time goes on.

I’ll focus on how the feature currently works (as of Oracle Database 19.5), but will mention previously identified behaviour as a reference on how things can easily change.

In my first simple little example, I’m just going to create a range-partitioned table, partitioned based on RELEASE_DATE, with a partition for each year’s worth of data:

SQL> CREATE TABLE big_bowie1(id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')),
PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
PARTITION ALBUMS_2020 VALUES LESS THAN (MAXVALUE));

Table created.

 

I’ll now add about 8 years worth of data:

SQL> INSERT INTO big_bowie1 SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> COMMIT;

Commit complete.

 

As discussed previously, I’ll importantly collect statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE1');

PL/SQL procedure successfully completed.

 

I’ll now run the following very selective query based the TOTAL_SALES column that is NOT part of the partitioning key:

 

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

19 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2468051548

---------------------------------------------------------------------------------------------------------
| Id | Operation                | Name       | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop     |
---------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |            |   20 |   520 |    643 (15)| 00:00:01 |       |           |
|  1 | PARTITION RANGE ALL      |            |   20 |   520 |    643 (15)| 00:00:01 |     1 |         8 |
|* 2 | TABLE ACCESS STORAGE FULL| BIG_BOWIE1 |   20 |   520 |    643 (15)| 00:00:01 |     1 |         8 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=42)
    filter("TOTAL_SALES"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      44014 consistent gets
       9516 physical reads
          0 redo size
       1107 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         19 rows processed

 

Without an index in place, the CBO has no choice but to use a FTS. But what will Automatic Indexing make of things?

If we look at the next Automatic Indexing report:

 

SQL> select dbms_auto_index.report_last_activity() from dual;

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 13-OCT-2020 01:47:48
Activity end                : 13-OCT-2020 02:59:48
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 184.55 MB (184.55 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 1 (44119.6x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 25135.8x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties |
---------------------------------------------------------------------------------
| BOWIE | BIG_BOWIE1 | SYS_AI_2zt7rg40mxa4n | TOTAL_SALES | B-TREE | NONE       |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : chwm2gubm8fx9
SQL Text            : SELECT * FROM big_bowie1 WHERE total_sales = 42
Improvement Factor  : 44119.6x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    4387193                      1173
CPU Time (s):        2599423                      1037
Buffer Gets:         749507                       22
Optimizer Cost:      643                          22
Disk Reads:          470976                       2
Direct Writes:       0                            0
Rows Processed:      323                          19
Executions:          17                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 2468051548

-----------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT          |            |      |       |  643 |             |
|  1 | PARTITION RANGE ALL       |            |   20 |   520 |  643 | 00:00:01    |
|  2 | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 |   20 |   520 |  643 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 937174207

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                 | Rows | Bytes | Cost | Time       |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                      |   19 |   494 |   22 | 00:00:01   |
|   1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE1           |   19 |   494 |   22 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                           | SYS_AI_2zt7rg40mxa4n |   19 |       |    3 | 00:00:01   |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("TOTAL_SALES"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We notice a couple of interesting points.

Firstly, yes Automatic Indexing has created an index based on the TOTAL_SALES column (SYS_AI_2zt7rg40mxa4n) as it improves performance by a reported 44119.6x.

Note also that the Automatic Index is a Non-Partitioned (Global) Index. From a performance perspective, this is the most efficient index to create to improve the performance of this query as the CBO only has the one index structure to navigate (vs. a LOCAL index that would require having to navigate down all 8 index structures for each table partition.

If we look at the index details:

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_2zt7rg40mxa4n           NO  YES VISIBLE   VALID

 

We notice that this is indeed a Non-Partitioned Index, that is both VISIBLE and VALID and so can be potentially used by any database session.

If we now re-run the query:

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

19 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 937174207

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                 | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop    |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                      |   20 |   520 |      23 (0)| 00:00:01 |       |          |
|  1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| BIG_BOWIE1           |   20 |   520 |      23 (0)| 00:00:01 | ROWID | ROWID    |
|* 2 | INDEX RANGE SCAN                          | SYS_AI_2zt7rg40mxa4n |   20 |       |       3 (0)| 00:00:01 |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         23 consistent gets
          0 physical reads
          0 redo size
       1166 bytes sent via SQL*Net to client
        369 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         19 rows processed

 

We can see the query now uses the newly created Automatic Index and is indeed more efficient, performing now just 23 consistent gets (previously 44014 consistent gets).

 

However, this was NOT previous behaviour.

The documentation previously mentioned that only LOCAL indexes are used when indexing partitioned tables.

If we run the same demo on Oracle Database 19.3, we get the following report:

 

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 14-OCT-2020 13:12:07
Activity end                : 14-OCT-2020 14:24:07
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 192.94 MB (192.94 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (1950.5x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1950.5x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties |
---------------------------------------------------------------------------------
| BOWIE | BIG_BOWIE1 | SYS_AI_8armv0hqq73fa | TOTAL_SALES | B-TREE | LOCAL      |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 2pp8ypramw30s
SQL Text            : SELECT * FROM big_bowie1 WHERE total_sales = 42
Improvement Factor  : 1950.5x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    6996973                      27327
CPU Time (s):        6704215                      12819
Buffer Gets:         815306                       49
Optimizer Cost:      12793                        28
Disk Reads:          2                            40
Direct Writes:       0                            0
Rows Processed:      475                          25
Executions:          19                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 4294056405

-----------------------------------------------------------------------------
| Id | Operation          | Name       | Rows | Bytes | Cost  | Time        |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT    |            |      |       | 12793 |             |
| 1 | PARTITION RANGE ALL |            |   20 |   520 | 12793 | 00:00:01    |
| 2 | TABLE ACCESS FULL   | BIG_BOWIE1 |   20 |   520 | 12793 | 00:00:01    |
-----------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 3781269341

--------------------------------------------------------------------------------------------------------------
|  Id | Operation                                 | Name                 | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                      |   25 |   650 |   28 | 00:00:01    |
|   1 | PARTITION RANGE ALL                       |                      |   25 |   650 |   28 | 00:00:01    |
|   2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1           |   25 |   650 |   28 | 00:00:01    |
| * 3 | INDEX RANGE SCAN                          | SYS_AI_8armv0hqq73fa |   25 |       |   17 | 00:00:01    |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 3 - access("TOTAL_SALES"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

As we can see, in this scenario, the newly created Automatic Index has a “Property” of LOCAL.

If we look at its index details:

 

SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PAR AUT VISIBILIT STATUS
------------------------------ --- --- --------- --------
SYS_AI_8armv0hqq73fa           YES YES VISIBLE   N/A

SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes
WHERE table_name = 'BIG_BOWIE1';

INDEX_NAME                     PARTITION PARTITION_COUNT LOCALI
------------------------------ --------- --------------- ------
SYS_AI_8armv0hqq73fa           RANGE                   8 LOCAL

 

We can see how a Local Index was previously created.

As such if we re-run an equivalent query:

SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42;

25 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3781269341

-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                | Name                 | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop     |
-----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                         |                      |   20 |   520 |      26 (0)| 00:00:01 |       |           |
|  1 | PARTITION RANGE ALL                      |                      |   20 |   520 |      26 (0)| 00:00:01 |     1 |         8 |
|  2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| BIG_BOWIE1           |   20 |   520 |      26 (0)| 00:00:01 |     1 |         8 |
|* 3 | INDEX RANGE SCAN                         | SYS_AI_8armv0hqq73fa |   20 |       |      17 (0)| 00:00:01 |     1 |         8 |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         50 consistent gets
          0 physical reads
          0 redo size
       1555 bytes sent via SQL*Net to client
        409 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         25 rows processed

 

Although the query is returning 6 more rows (as with the random number generation, has a slightly different data set), it’s more expensive proportionally now having to perform 50 consistent gets as it now has to read 8 index structures rather than just the one.

So (IMHO), Automatic Indexing has improved here, creating a more efficient index structure than previously. So always bear in mind that Automatic Indexing is an evolving beast, improving and adapting as time moves on.

However, note the compromise here is that by having an effectively Global index structure, there may be some additional issues depending on any subsequent structural changes to the table.

More on Automatic Indexing and Partitioning in my next post…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part III (Do Anything You Say) October 8, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Stale Statistics.
2 comments

In Part I of this series, we saw how Automatic Indexing will not create a viable Automatic Index if there are stale or missing statistics on the underlining segments. In Part II we saw how these SQL statements effectively become blacklisted and when segment statistics are subsequently collected, Automatic Indexing will still not create viable Automatic Indexes when the SQL statements are re-run.

So how do we get Automatic Indexing to now kick in and create necessary indexes on these problematic SQLs?

As I’ve discussed previously in relation to blacklisted SQLs, we need to run a NEW SQL statement that hasn’t been blacklist that will result in a necessary index to be created. An easy way to do this is just to include a new comment within the previous SQL to give the SQL a new signature.

If we now run the following “new” SQL statement (identical to the problematic SQL but with a comment embedded):

SQL> select /* new */ * from bowie_stale where code=42;

        ID       CODE NAME
---------- ---------- ------------------------------------------
   1000041         42 David Bowie
   6000041         42 David Bowie
        41         42 David Bowie
   3000041         42 David Bowie
   7000041         42 David Bowie
   8000041         42 David Bowie
   4000041         42 David Bowie
   9000041         42 David Bowie
   5000041         42 David Bowie
   2000041         42 David Bowie

 

If we now wait to see what the next Automatic Indexing task makes of things:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 07-JUL-2020 06:34:49
Activity end                : 07-JUL-2020 06:35:54
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 142.61 MB (142.61 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (19787.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 19787.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties |
---------------------------------------------------------------------------
| BOWIE | BOWIE_STALE | SYS_AI_300kk2unp8tr0 | CODE | B-TREE | NONE       |
---------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We see that the index on the CODE column (SYS_AI_300kk2unp8tr0) has now been created.

Further down the report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : du6psd0xmzpg5
SQL Text            : select /* new */ * from bowie_stale where code=42
Improvement Factor  : 19787.7x

Execution Statistics:
-----------------------------
                  Original Plan Auto           Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 137261                       2620
CPU Time (s):     84621                        1769
Buffer Gets:      277028                       13
Optimizer Cost:   544                          13
Disk Reads:       275947                       2
Direct Writes:    0                            0
Rows Processed:   70                           10
Executions:       7                            1

 

A new index was indeed created because of this new SQL statement, with a performance improvement of 19787.7x.

Further down the report to the Plans Section:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 65903426

-----------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |             |      |       |  544 |             |
| 1 | TABLE ACCESS STORAGE FULL | BOWIE_STALE |   10 |   230 |  544 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2558864466

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   230 |   13 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_STALE          |   10 |   230 |   13 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_300kk2unp8tr0 |   10 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We can see that the new plan using the new Automatic Index with a much lower CBO cost.

If we now look at the status of this index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       16891          10000000

 

We see that the index is now both VISIBLE and VALID (previously, it was INVISIBLE and UNUSABLE).

As such, the Automatic Index can now potentially be used by any SQL, including the previous problematic query.

So with a viable index now in place, if we re-run the initial problematic query:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2558864466

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      |   10 |   230 |      14 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_STALE          |   10 |   230 |      14 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_300kk2unp8tr0 |   10 |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         14 consistent gets
          0 physical reads
          0 redo size
        738 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

We see that finally, the SQL uses the new Automatic Index and is indeed much more efficient as a result, with just 14 consistent gets required (when previously it was 39430 consistent gets).

So if ever you come across the scenario where an SQL does not have an Automatic Index created when clearly it should, it could be that it has been blacklisted and needs a different SQL to actually generate the necessary index.

To avoid some of these issues, make sure you do not have stale or missing statistics when reliant on Automatic Indexing. The new High Frequency Statistics Collection capability to designed to specifically avoid such a scenario.

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part I (Dead Against It) October 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, High Frequency Statistics Collection, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Stale Statistics, Unusable Indexes.
5 comments

A “golden rule” when working with Automatic Indexing is that things don’t work properly if there are stale statistics on the dependant objects. Stale statistics can of course be problematic but they can be particularly troublesome when dealing with Automatic Indexing.

In the Oracle Autonomous Database environments, this issue is addressed somewhat by the new High Frequency Statistics Collection capability, which helps to automatically collect stale statistics on a regular basis. However, in on-prem Exadata environments where this can more easily be turned off or collected less frequently, it’s a potential issue worth consideration.

I’ll start with a simple little table, with a CODE column that has lots of distinct values:

SQL> create table bowie_stale (id number constraint bowie_stale_pk primary key, code number, name varchar2(42));

Table created.

SQL> insert into bowie_stale select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

Importantly, I don’t collect statistics on this newly populated table…

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME        NUM_ROWS     BLOCKS LAST_ANAL
--------------- ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

If we now run the following query a number of times while there are no statistics on the table:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice but to use a FTS as I don’t yet have an index on the CODE column.

If I now wait for the next Automatic Indexing task to kick in AND if there are still NO statistics on the table:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 05-JUL-2020 06:36:31
Activity end                 : 05-JUL-2020 06:37:07
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates            : 1
Indexes created             : 0
Space used                  : 0 B
Indexes dropped             : 0
SQL statements verified     : 0
SQL statements improved     : 0
SQL plan baselines created  : 0
Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

You can see that there was the one index candidate BUT no Automatic Index appears to have been created.

Assuming there are still no statistics:

 

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE';

TABLE_NAME                       NUM_ROWS    BLOCKS  LAST_ANAL
------------------------------ ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE2';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

 

If we look now at what indexes exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID
SYS_AI_300kk2unp8tr0           YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='BOWIE_STALE2' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
BOWIE_STALE_PK                 ID                                 1
SYS_AI_300kk2unp8tr0           CODE                               1

 

We notice there is now an Automatic Index BUT it remains in an UNUSABLE/INVISIBLE state. This means the index can’t be used by the CBO.

So if we now re-run the SQL query again:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice still but to use the FTS.

In Part II, we’ll see that once we get into this scenario, it can be a tad problematic to get ourselves out of it and get the Automatic Index created as we would like…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
add a comment

 

In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.

One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’);

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.

So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.

So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
1 comment so far

In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.

To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:

SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142));

Table created.

SQL> begin
2     for i in 1..2000000 loop
3       if mod(i,2) = 0 then
4          insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history');
5       else
6          insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.');
7       end if;
8     end loop;
9     commit;
10  end;
11  /

PL/SQL procedure successfully completed.

 

So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.

Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

If we run a query based on a rare value for CODE:

SQL> set arraysize 5000

SQL> select * from space_oddity where code=25;

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     3 |   423 | 11356   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |     3 |   423 | 11356   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=25)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      40974  consistent gets
          0  physical reads
          0  redo size
       1018  bytes sent via SQL*Net to client
        402  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.

However, the following SQL accesses many of the common values of CODE and returns many rows:

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.

So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 15:57:14
Activity end                 : 18-AUG-2020 15:58:10
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 35.65 MB (35.65 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 1
SQL statements improved (improvement factor)  : 1 (40984.3x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 40984.3x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
----------------------------------------------------------------------------
| Owner | Table        | Index                | Key  | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE       |
----------------------------------------------------------------------------

 

So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name  : BOWIE
SQL ID               : 19sv1g6tt0g1y
SQL Text             : select * from space_oddity where code=25
Improvement Factor   : 40984.3x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  5417408                       139265
CPU Time (s):      1771880                       7797
Buffer Gets:       327876                        5
Optimizer Cost:    11356                         5
Disk Reads:        649                           2
Direct Writes:     0                             0
Rows Processed:    16                            2
Executions:        8                             1

 

The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.

This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.

BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.

BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.

BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.

Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…

So if we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION

---------------------------------------------------------------------------------------------
- Original
-----------------------------

Plan Hash Value  : 2301175572
-----------------------------------------------------------------------------
| Id | Operation           | Name         | Rows | Bytes | Cost  | Time     |
-----------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |              |      |       | 11356 |          |
|  1 |   TABLE ACCESS FULL | SPACE_ODDITY |    3 |   423 | 11356 | 00:00:01 |
-----------------------------------------------------------------------------

- With Auto Indexes

-----------------------------
Plan Hash Value  : 54782313
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |    3 |   423 |    5 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY         |    3 |   423 |    5 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_82bdnqs7q8rtm |    2 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=25)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).

But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.

If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    32 |  4512 |    35   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY         |    32 |  4512 |    35   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | SYS_AI_82bdnqs7q8rtm |    32 |       |    12   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR
           "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     410422  consistent gets
          0  physical reads
          0  redo size
  145536076  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.

As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).

One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048’);

PL/SQL procedure successfully completed.

If we now re-run this SQL:

SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   996K|   133M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |   996K|   133M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.

In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
add a comment

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).

As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.

One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).

If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:

SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  101K|  2388K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        609 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.

Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true)  in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.

In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.

We currently have the following columns and statistics on the IGGY_POP table:

SQL> select column_name, num_distinct, density, num_buckets, histogram
from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME          NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
-------------------- ------------ ---------- ----------- ---------------
ID                        9705425          0         254 HYBRID
CODE1                         100  .00000005         100 FREQUENCY
CODE2                         100  .00000005         100 FREQUENCY
NAME                            1 5.0210E-08           1 FREQUENCY

 

If we now collect Extended Statistics on both CODE1, CODE2 columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                  9705425          0         254 HYBRID
CODE1                                   100  .00000005         100 FREQUENCY
CODE2                                   100  .00000005         100 FREQUENCY
NAME                                      1 5.0210E-08           1 FREQUENCY
SYS_STU#29QF8Y9BUDOW2HCDL47N44           99  .00000005         100 FREQUENCY

 

The CBO now has some idea on the cardinality if both columns are used within a predicate.

If we re-run the problematic query without the hint:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  100K|  2343K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.

However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 50000 |  878K |   575 (15) | 00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 |  878K |   575 (15) | 00:00:01   |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.

This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.

So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.

 

As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down) August 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Full Table Scans, Index Rebuild, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
4 comments

I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important.

The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing Cloud environment.

I begin by creating a simple table that has the key column CODE, in which data is populated in a manner where the data is very poorly clustered:

 

SQL> create table nickcave (id number, code number, name varchar2(42));

Table created.

SQL> insert into nickcave select rownum, mod(rownum, 100), 'Nick Cave and the Bad Seeds'
     from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'NICKCAVE');

PL/SQL procedure successfully completed.

 

So we have 100 evenly distributed distinct CODE values but they’re all distributed throughout the table.

The following SQL statement is basically returning just 1% of the data and is executed a number of times:

 

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows    | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     100K|  3613K|  9125   (5)| 00:00:01|
|   1 |  PX COORDINATOR              |          |         |       |            |         |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000 |     100K|  3613K|  9125   (5)| 00:00:01|
|   3 |    PX BLOCK ITERATOR         |          |     100K|  3613K|  9125   (5)| 00:00:01|
|*  4 |     TABLE ACCESS STORAGE FULL| NICKCAVE |     100K|  3613K|  9125   (5)| 00:00:01|
------------------------------------------------------------------------------------------

Without an index, the CBO currently has no choice but to use a Full Table Scan to access the table. So we wait for the next Automatic Index process to kick in:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

The Automatic Indexing report makes no mention of Automatic Indexes on the NICKCAVE table…

If we look to see if any indexes have actually been created:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor 
     from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='NICKCAVE'
     order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that yes, an Automatic Index (SYS_AI_dh8pumfww3f4r) has been created on the CODE column of the NICKCAVE table BUT it remains in an INVISIBLE, UNUSABLE state.

So Automatic Indexing considered an index on CODE, created it in an INVISIBLE, USABLE state but when testing it, failed in that it found it to be less efficient than the current FTS and so reverted the Automatic Index back to an UNUSABLE index.

Therefore, if we run a bunch of other similar SQL statements such as the following:

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

 

They all use the FTS as again, the CBO has no choice with no VALID index on the CODE column available.

If we keep checking the Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

There’s still no mention of an index on the CODE column. The existing Automatic Index remains in an UNUSABLE state:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

 

Basically, the index remains ineffective because with a Clustering Factor of 4158302, it’s just too inefficient to return the 1% (100000 rows) of the table.

Even in an Autonomous Database environment, nothing will automatically change with this scenario.

In my next post, we’ll look at how we can improve the performance of this query and get an Automatic Index to actually kick in with a USABLE index…

New Oracle Cloud Offering – Indexing as a Service (IDXaaS) (I Pity The Fool) April 1, 2016

Posted by Richard Foote in 12c Rel 2, IDXaaS, Oracle Cloud, Oracle Indexes.
12 comments

This of course is an April Fools joke. Sorry !!

A very exciting announcement !! I’ve recently been promoted within Oracle Corporation to lead their brand new Oracle Cloud offering “Indexing as a Service” (IDXaaS) team, based on my previous work and experience in the indexing space. Yes, I’m both thrilled and excited to be involved in this new venture 🙂

Promising both improved levels of service and reductions in maintenance and administration overheads, this new Oracle Cloud offering will take all indexing related hassles from on-premise databases and address them in the Cloud. No more index rebuilds, no more concerns you don’t have the right indexes in place and importantly no more worrying that indexes are taking up valuable TBs of storage. Oracle indexes can now be safely and seamlessly migrated and stored in the Oracle Cloud.

Relational databases depend heavily on indexes to ensure data can be retrieved quickly and efficiently, but indexes can be very difficult to administer, can use considerable resources to maintain and consume considerable amounts of disk space. They can be so problematic that some folk actually devote an entire blog on the subject !! Imagine if all these issues and overheads can be taken away from you when administrating Oracle databases…

Index DDL will be enhanced in the 12c Release 2 Oracle Database to optionally specify a Cloud Domian Identifier (CDI) instead of a tablespace. So you will be able to either build or rebuild an index into the Oracle IDXaaS Cloud with syntax such as follows: CREATE INDEX bowie_idx on bowie(id) CLOUD LOCATION bowies_cloud_domain; This index will fully index the bowie table that exists in an on premise database (or indeed a database that is also in the cloud) but the index itself will be created and maintained in the Oracle Cloud. The database is fully aware of the actual physical location of the index as it is when created in a traditional tablespace and will simply update the index structure as DML is applied to the base table as it has always done. All Oracle indexing types and options will be fully supported including Oracle Text and Spatial indexes.

Like indexes stored in a tablespace, these indexes will need to be rebuilt or coalesced every now and then, but you won’t need to worry about this as Oracle will now implicitly do this for you in the Oracle Cloud. I’m currently hiring a team of DBAs with expertise in the “art” of rebuilding indexes who will ensure all your indexes will be rebuilt automatically for customers as necessary. By default, these operations will usually be performed on Sundays within the Oracle Cloud, but as indexes can be rebuilt online and as these indexes are no longer physically stored within your database, it’s an operation that is now performed automatically and seamlessly for you. Customers no longer need concern themselves with these types of index operations or when they occur, so while “we” work over the weekend, your DBAs can instead relax and keep their gardens nice and weed free. Please leave a comment on the blog if you think you have what it takes to effectively rebuild Oracle indexes and interested in joining my elite IDXaaS team !!

For customers that choose to use the “Premium IDXaaS” offering, Oracle will automatically create or drop indexes as required in the Oracle Cloud. Customers no longer have to make the decision on what indexes should be created on which columns in which order; this will be performed automatically for you by Oracle’s IDXaaS. By running new advanced analysis routines on your database workloads, the Premium IDXaaS offering will create an index if it will help the efficiency of any of your current workloads. Conversely, if Oracle believes an index is redundant (for example if your database is running on Exadata), indexes will first be automatically made “invisible” and if there are no serious repercussions, will be automatically dropped 7 days later. DBAs and Developers will no longer need to worry about which indexes to create/drop, significantly reducing the complexities in managing both current and future Oracle database environments.

Oracle will also introduce a new package that will easily migrate all your existing indexes for you into the Oracle Cloud. The DBMS_INDEX_CLOUD package will automatically migrate all indexes for a specified Table/Partition/Schema/Pluggable Database/Database into the Oracle Cloud, which will be a fully online, seamless operation. Once completed, associated indexes within “on premise” tablespaces will effectively be dropped and be replaced with their Cloud equivalents. All that “expensive” storage that was previously tied up in storing all those Oracle indexes can now be freed up to store much more business rich data such as emails, selfie photos, David Bowie music catalogs, etc. Note that these significant storage savings will also translate in smaller backups, smaller Standby databases, smaller Production clones, etc. Importantly, the Oracle Cost Based Optimizer will be fully aware of the Cloud Indexes (as their definitions, statistics and other metadata are still stored within the database data dictionary) and will use the Cloud Indexes as appropriate. No changes to existing SQL is necessary, the CBO will decide to use an index in the cloud in the same manner as it chooses to use an index currently. There is also a new Cloud Index execution path option that allows data to be retrieved via the Cloud Index in a “good enough” manner which is much more efficient than a normal index scan, although it might not necessarily retrieve all the possible data that meets an SQL predicate. It effectively provides what we refer to as that “internet” data retrieval experience.

Oracle Open World 2016 will be the time of the official release for this exciting new Oracle Cloud offering. I will be presenting a number of sessions at OOW16 on this new IDXaaS and there will be a number of live demos to demonstrate its unique capabilities and the simplicity of the migration process. However, there will be a number of announcements before then, including the opportunity to be involved in a beta customer program. There will also be a number of hands-on workshops being conducted globally, with customers getting the chance to see for themselves how easily it is to move database indexes into the Oracle Cloud.

Lastly and perhaps most exciting of all, this new IDXaaS capability will initially be a free option with all current Oracle databases from 12c Release 2 onwards. You will be able to migrate all your existing 12.2 Oracle database indexes onto the Oracle Cloud and they will be stored, maintained and administrated for free as part of your current Oracle database licenses. All Oracle database editions will be supported. This free offer however is likely to end once the 12c Release 2 database is officially released but the additional pay as you go licensing constructs are proposed to be extremely competitive with yearly “indexed” priced increments. Also to be subsequently released will be Oracle IDXaaS support for most other commercial databases including DB2, SQL Server, NoSQL, MySQL, Sybase, Hana, Hadoop and Notepad.

So administrating Oracle (and other) Databases will soon change forever.  Oracle indexes will no longer have to be a concern for customers as all database indexes can instead be stored in the Oracle Cloud via the new IDXaaS offering, allowing Oracle to automatically store and manage these tricky database objects for you. Not only will index rebuilds be a thing of the past, but so will all concerns regarding index storage, creation and maintenance. I’m very excited to be involved in this new undertaking and indeed with my move to Oracle HQ in San Francisco in the coming months (I’ll up in the Clouds level of Building 1). I look forward to talking IDXaaS with you in the days to come 🙂

 

This of course is an April Fools joke. Sorry !!