jump to navigation

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
add a comment

 

In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.

One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’);

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.

So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.

So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
add a comment

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).

As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.

One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).

If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:

SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  101K|  2388K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        609 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.

Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true)  in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.

In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.

We currently have the following columns and statistics on the IGGY_POP table:

SQL> select column_name, num_distinct, density, num_buckets, histogram
from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME          NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
-------------------- ------------ ---------- ----------- ---------------
ID                        9705425          0         254 HYBRID
CODE1                         100  .00000005         100 FREQUENCY
CODE2                         100  .00000005         100 FREQUENCY
NAME                            1 5.0210E-08           1 FREQUENCY

 

If we now collect Extended Statistics on both CODE1, CODE2 columns:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                  9705425          0         254 HYBRID
CODE1                                   100  .00000005         100 FREQUENCY
CODE2                                   100  .00000005         100 FREQUENCY
NAME                                      1 5.0210E-08           1 FREQUENCY
SYS_STU#29QF8Y9BUDOW2HCDL47N44           99  .00000005         100 FREQUENCY

 

The CBO now has some idea on the cardinality if both columns are used within a predicate.

If we re-run the problematic query without the hint:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  100K|  2343K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.

However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 50000 |  878K |   575 (15) | 00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 |  878K |   575 (15) | 00:00:01   |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.

This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.

So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.

 

As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.
1 comment so far

Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.

As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…

I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:

SQL> create table iggy_pop (id number, code1 number, code2 number, name varchar2(42));

Table created.

SQL> insert into iggy_pop select rownum, mod(rownum, 100)+1, mod(rownum, 100)+1, 'David Bowie'
from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP');

PL/SQL procedure successfully completed.

 

Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.

If we run the following query for a combination of data that exists:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name      | Rows | Bytes | Cost (%CPU)|   Time     |
--------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT         |          |   1000|  24000|    575 (15)|   00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |   1000|  24000|    575 (15)|   00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).

If we run a query on a combination of data that doesn’t exist:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 1000 |  18000|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000 |  18000|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.

Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 18-SEP-2020 01:24:17
Activity end                : 18-SEP-2020 01:25:29
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (41301.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 41301.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table    | Index                | Key         | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | IGGY_POP | SYS_AI_1awkddqkwa4f8 | CODE1,CODE2 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : bdnf0barn3jk7
SQL Text            : select code1, code2, name from iggy_pop where code1=1 and code2=42
Improvement Factor  : 41301.7x

Execution Statistics:
-----------------------------
                  Original Plan                 Auto Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 72085                        1342
CPU Time (s):     39272                        679
Buffer Gets:      123907                       3
Optimizer Cost:   575                          4
Disk Reads:       122859                       2
Direct Writes:    0                            0
Rows Processed:   0                            0
Executions:       3                            1

 

So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 3288467

--------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost | Time        |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |          |      |       |  575 |             |
| 1 | TABLE ACCESS STORAGE FULL | IGGY_POP | 1000 | 18000 |  575 | 00:00:01    |
--------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2496796491

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |    2 |    36 |    4 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP             |    2 |    36 |    4 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_1awkddqkwa4f8 |    1 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=1 AND "CODE2"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.

The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).

The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.

This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.

So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.

BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…

If we look at the new Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='IGGY_POP';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_1awkddqkwa4f8           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15362           4083700

 

We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.

Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.

SQL> show parameter sampling

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
SQL> show parameter optimizer_adaptive

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_adaptive_plans             boolean     TRUE
optimizer_adaptive_reporting_only    boolean     FALSE
optimizer_adaptive_statistics        boolean     FALSE

 

So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.

If we now re-run the SQL that returns no rows:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 18000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 18000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

 

The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.

When when re-run the SQL that returns many many rows:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2496796491

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 1000 | 24000 |     413 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP             | 1000 | 24000 |     413 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_1awkddqkwa4f8 | 1000 |       |       4 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
         25 recursive calls
          0 db block gets
      41981 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          1 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.

BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.

This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.

In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…

Oracle 19c Automatic Indexing: Data Skew Part II (Everything’s Alright) September 14, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Automatic Table Statistics, Autonomous Transaction Processing, Data Skew, Exadata, High Frequency Statistics Collection, Histograms, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning.
1 comment so far

In my previous post, I discussed an example with data skew, in which the Automatic Indexing process created a new index, but somehow the CBO when using the index estimated the correct cardinality estimate even though no histograms were explicitly calculated.

In this post I’ll answer HOW this achieved by the CBO.

Get some idea on the answer by now looking at the column details:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                     10 FREQUENCY
NAME                      1 NONE

We can see that there is now indeed an histogram on the column. When and how were these histograms collected?

The answer lies with a new Oracle Database 19c feature called “High-Frequency Automatic Statistics Collection“, which is available on Exadata environments. As I’m running all these demos on the Oracle Autonomous Transaction Processing Cloud environment which runs on an Exadata platform, this feature is enabled by default.

To highlight the capabilities of this features more fully, I’m going to setup a slightly different demo with three tables:

SQL> create table bowie1 (id number, code number, name varchar2(42));  <= Stale with no stats

Table created.

SQL> insert into bowie1 select rownum, mod(rownum, 100)+1, 'David Bowie' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

 

Table BOWIE1 has no statistics collected on it.

 

SQL> create table bowie2 (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie2 select rownum, mod(rownum, 100)+1, 'David Bowie' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2');

PL/SQL procedure successfully completed.

SQL> insert into bowie2 select rownum+100000, mod(rownum, 100)+1, 'Ziggy Stardust' from dual connect by level <= 50000;

50000 rows created.

SQL> commit;

Commit complete.

 

BOWIE2 table has new rows added after statistics have been collected and so has “stale” outdated stats.

 

SQL> create table bowie3 (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie3 select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie3 set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie3 set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie3 set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie3 set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie3 set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie3 set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie3 set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie3 set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie3 set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie3', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select code, count(*) from bowie3 group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

The BOWIE3 table is as my previous example, with data skew but with NO histograms collected. I’m now going to run a query on BOWIE3 where the CBO gets the cardinality estimate hopelessly wrong because of the missing histogram on the CODE column:

SQL> select * from bowie3 where code=7;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2517725203

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   100K|  1953K|   974   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE3 |   100K|  1953K|   974   (2)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=7)

 

If we look at the current statistics on these tables:

 

SQL> select table_name, num_rows, stale_stats, notes from user_tab_statistics
where table_name in ('BOWIE1', 'BOWIE2', 'BOWIE3');

TABLE_NAME        NUM_ROWS STALE_S NOTES
--------------- ---------- ------- ------------------------------
BOWIE1
BOWIE2              100000 YES
BOWIE3             1000000 NO
BOWIE2              150000         STATS_ON_CONVENTIONAL_DML

 

We can see that BOWIE1 has indeed no statistics.

BOWIE2 is marked as having state statistics, although thanks to another Oracle Database 19c feature called “Real-Time Statistics Collection“, does have some additional statistics captured (such as NUM_ROWS) when the additional rows were inserted. I’ll discuss this feature more fully in a later blog article.

BOWIE3 is considered fine in that it does have statistics which are NOT stale, BUT…

 

SQL> select column_name, num_buckets, histogram from user_tab_col_statistics
where table_name='BOWIE3';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

We don’t currently have any histograms even though a simple single table query was previously run based on a CODE predicate which had hopelessly inaccurate cardinality estimates.

If we wait approximately 15 minutes (default) for the High-Frequency Automatic Statistics Collection process to run and look at these column statistics again:

SQL> select table_name, num_rows, stale_stats from user_tab_statistics
where table_name in ('BOWIE1', 'BOWIE2', 'BOWIE3');

TABLE_NAME        NUM_ROWS STALE_S
--------------- ---------- -------
BOWIE1              100000 NO
BOWIE2              150000 NO
BOWIE3             1000000 NO

SQL> select column_name, num_buckets, histogram from user_tab_col_statistics where table_name='BOWIE3';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                     10 FREQUENCY
NAME                      1 NONE

 

We now notice that:

BOWIE1 now has statistics captured, as the High-Frequency Automatic Statistics Collection process looks for tables with missing statistics.

BOWIE2 now has fully up to date statistics, as the High-Frequency Automatic Statistics Collection process looks for tables with stale statistics.

BOWIE3 now has histograms on the CODE columns, as the High-Frequency Automatic Statistics Collection process looks out for missing histograms if queries have been subsequently run with poor cardinality estimates.

Having more accurate, appropriate and up to date statistics all supports the CBO in making much better decisions in relation to the use of any newly created Automatic Indexes.

 

You can configure High-Frequency Automatic Statistics Collection in the following manner:

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_STATUS','ON');

PL/SQL procedure successfully completed.

This turns the feature ON/OFF. It’s OFF by default on standard Exadata environments but ON by default in Autonomous Database environment.

 

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_MAX_RUN_TIME','900');

PL/SQL procedure successfully completed.

This configures how long to allow the process to run (default is 3600 seconds/60 minutes).

 

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_INTERVAL','900');

PL/SQL procedure successfully completed.

This configures the interval between the process running (default is every 900 seconds/15 minutes).

 

In my next post, I’ll look at a slightly more complex data skew example with Automatic Indexing, where both selective and unselective SQL predicates are invoked…

Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some columns values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.

I’ll start with a simple little example, that has an interesting little twist at the end.

The following table has a CODE column, which has 10 distinct values that a widely skewed, with some values much less common than others:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected data with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_SKEW', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We can clearly see that some CODE values (such as “6”) have relatively few values, with only 100 occurrences:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time       |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              |   100K|  1953K|   570   (7)| 00:00:01 |
|   1 |  PX COORDINATOR                |              |         |         |              |            |
|   2 |   PX SEND QC (RANDOM)          | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR           |              |   100K|  1953K|   570   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_SKEW |   100K|  1953K|   570   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE"=6)
    filter("CODE"=6)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
      3781  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.

So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : fn4shnphu4bvj
SQL Text             : select * from bowie_skew where code=6
Improvement Factor   : 41.1x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  119596                        322
CPU Time (s):      100781                        322
Buffer Gets:       11347                         103
Optimizer Cost:    570                           4
Disk Reads:        0                             0
Direct Writes:     0                             0
Rows Processed:    100                           100
Executions:        1                             1

 

So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.

If we look further down the Auto Indexing report and compare the execution plans:

 

PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value  : 3374004665
-----------------------------------------------------------------------------------------
| Id | Operation                      | Name       | Rows   | Bytes   | Cost | Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |            |        |         |  570 |          |
|  1 |  PX COORDINATOR                |            |        |         |      |          |
|  2 |    PX SEND QC (RANDOM)         | :TQ10000   | 100000 | 2000000 |  570 | 00:00:01 |
|  3 |     PX BLOCK ITERATOR          |            | 100000 | 2000000 |  570 | 00:00:01 |
|  4 |      TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 |  570 | 00:00:01 |
-----------------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value  : 140816325
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100 |  2000 |    4 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           |  100 |  2000 |    4 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng |  100 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=6)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

We can see that new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.

Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly and can cause difficulties between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).

BUT when I now rerun the SQL query again:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |   100 |  2000 |     4   (0)|
|   1 |  PX COORDINATOR                       |                      |       |       |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |   100 |  2000 |     4   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           |   100 |  2000 |     4   (0)|
|   4 |     BUFFER SORT                       |                      |       |       |            |
|   5 |      PX RECEIVE                       |                      |   100 |       |     3   (0)|
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |   100 |       |     3   (0)|
|   7 |        PX SELECTOR                    |                      |       |       |            |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_7psvzc164vbng |   100 |       |     3   (0)|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("CODE"=6)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       103  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?

I’ll explain in my next post.

Oracle 19c Automatic Indexing: DDL Statements With Auto Indexes (No Control) September 1, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Drop Automatic Indexing, Drop Index, Index Coalesce, Index Rebuild, Index Shrink, Invisible Indexes, Online DDL, Oracle Indexes.
2 comments

 

I’ve had a number of questions in relation to DDL support for Automatic Indexes since my last post on how one can now drop Automatic Indexes, so decided to quickly discuss what DDL statements are supported with Automatic Indexes.

Many DDL commands are NOT supported with Automatic Indexes, such as making indexes (IN)VISIBLE and (UN)USABLE and changing storage attributes:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" invisible;

alter index "SYS_AI_600vgjmtqsgv3" invisible
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes


SQL> alter index "SYS_AI_600vgjmtqsgv3" unusable;

alter index "SYS_AI_600vgjmtqsgv3" unusable
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes


SQL> ALTER INDEX "SYS_AI_600vgjmtqsgv3" INITRANS 5;
ALTER INDEX "SYS_AI_600vgjmtqsgv3" INITRANS 5
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

You also can’t drop indexes with the DDL statement:

SQL> drop index "SYS_AI_600vgjmtqsgv3";

drop index "SYS_AI_600vgjmtqsgv3"
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

Although as discussed in my last post, you can now drop Automatic Indexes by using DBMS_AUTO_INDEX.DROP_AUTO_INDEXES.

 

You can however potentially improve the structure of an Automatic Index by using the REBUILD, COALESCE or SHRINK (SPACE) options:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild online;

Index altered.

SQL> alter index "SYS_AI_600vgjmtqsgv3" coalesce;

Index altered.

SQL> alter index "SYS_AI_600vgjmtqsgv3" shrink space;

Index altered.

 

Interestingly, if Oracle considers an Automatic Index but decides it’s not efficient enough to be created, the Automatic Indexing process can leave a new Automatic Index in UNUSABLE / INVISIBLE state (as previously discussed), which can be subsequently rebuilt:

SQL> select index_name, status, visibility from user_indexes where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT
------------------------------ -------- ---------
SYS_AI_600vgjmtqsgv3           UNUSABLE INVISIBLE

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild online;

Index altered.

SQL> select index_name, status, visibility from user_indexes where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT
------------------------------ -------- ---------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE

 

So the index is now VALID and actually physically created. But you can’t subsequently make it VISIBLE, which means it can’t ordinarily be used by the CBO:

SQL> alter index "SYS_AI_600vgjmtqsgv3" visible;
alter index "SYS_AI_600vgjmtqsgv3" visible
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

When you rebuild an Automatic Index, you can however change the manner in which it’s compressed:

SQL> select index_name, status, visibility, compression from user_indexes
where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT COMPRESSION
------------------------------ -------- --------- -------------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE ADVANCED LOW

SQL> alter index "SYS_AI_600vgjmtqsgv3" rebuild nocompress;

Index altered.

SQL> select index_name, status, visibility, compression from user_indexes
where index_name='SYS_AI_600vgjmtqsgv3';

INDEX_NAME                     STATUS   VISIBILIT COMPRESSION
------------------------------ -------- --------- -------------
SYS_AI_600vgjmtqsgv3           VALID    INVISIBLE DISABLED

 

And no, you can’t rename an Automatic Index:

 

SQL> alter index "SYS_AI_600vgjmtqsgv3" rename to BOWIE_INDEX;
alter index "SYS_AI_600vgjmtqsgv3" rename to BOWIE_INDEX
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

 

So the answer is it depends on what one can and can’t do currently with an Automatic Index, which of course is subject to change in the future…

Oracle 19c Automatic Indexing: Dropping Automatic Indexes Part II (New Angels of Promise) August 25, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Drop Automatic Indexing, Drop Index, DROP_AUTO_INDEXES.
3 comments

Just a quick update on a previous post on dropping Automatic Indexes.

As I discussed previously, one could drop Automatic Indexes by moving them to a new tablespace and then dropping the tablespace. This cumbersome technique was necessary because there was no direct facility to drop Automatic Indexes. Additionally, it’s worth noting this process isn’t viable in Autonomous Databases Cloud environments as it’s not possible to create/drop tablespaces.

As I’ve also discussed previously, there may be scenarios when you may want to drop an Automatic Index, such as when newly created Automatic Indexes don’t get used by the CBO in a manner as predicted by the Automatic Indexing process.

Thankfully, there is now (since Oracle Database 20c and back-ported to 19.5) an API option to easily drop Automatic Indexes when desired:

SQL> exec DBMS_AUTO_INDEX.DROP_AUTO_INDEXES(owner=>’BOWIE’, index_name=>'”SYS_AI_600vgjmtqsgv3″‘, allow_recreate=>true);

PL/SQL procedure successfully completed.

 

Note: the index name must be further enclosed in a ” “ due to the mixed-case naming convention of Automatic Indexes.

The last parameter controls whether or not you want to allow the Automatic Indexing process to subsequently potentially recreate the index again.  The default is false.

As I’ve mentioned a number of times in this series, Automatic Indexing is still a relatively new feature that will only improve and expand in future versions…

 

A big thanks to Nigel Bayliss for the heads-up on this new Auto Indexing feature.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part III (Star) August 11, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Internals, Index statistics, Oracle, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
add a comment

In Part I we looked at a scenario where an index was deemed to be too inefficient for Automatic Indexing to create a VALID index, because of the poor clustering of data within the table.

In Part II we improved the data clustering but the previous SQLs could still not generate a new Automatic Index because they had effectively been blacklisted.

So how do we get Automatic Indexing to improve the performance of these queries?

Basically, we need to run some new SQL statements to those previously run which have not been blacklisted, that can make the Automatic Indexing process kick in and create the necessary indexes.

For example, if we now run the following SQL statements that have not previously run:

select * from nickcave where code=1;

select * from nickcave where code=2;

select * from nickcave where code=3;

 

And now wait for the next Automatic Indexing process period and look at the following (partial) Automatic Indexing report:

 

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 22-JUN-2020 04:26:31
Activity end                 : 22-JUN-2020 04:27:25
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                              : 0
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 167.77 MB (167.77 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 3
SQL statements improved (improvement factor)  : 3 (76x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 76x


INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table    | Index                | Key  | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | NICKCAVE | SYS_AI_dh8pumfww3f4r | CODE | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 5k1wmtu7um5q9
SQL Text             : select * from nickcave where code=1
Improvement Factor   : 76x

Execution Statistics:
-----------------------------

                   Original Plan                   Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  1725103                       106145
CPU Time (s):      1534305                       62314
Buffer Gets:       291835                        779
Optimizer Cost:    9125                          792
Disk Reads:        0                             197
Direct Writes:     0                             0
Rows Processed:    500000                        100000
Executions:        5                             1

 

We can see that an index has indeed now been created on the CODE column because one of the new statements is now deemed to be 76x more efficient thanks to the new index.

If we look at details of this new Automatic Index:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  VISIBLE   DISABLED      VALID      10000000       19518             57983

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='NICKCAVE'
order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that the index is now indeed VALID and VISIBLE with a much improved Clustering Factor at just 57983.

If we now re-run newer SQL statement:

 

SQL> select * from nickcave where code=1;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("CODE"=1)

Statistics
----------------------------------------------------------
          12  recursive calls
           0  db block gets
         779  consistent gets
           0  physical reads
         176  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

We notice the SQL statement is now indeed using this new Automatic Index.

If we now re-run our original SQL statement that had been using a FTS execution plan and that we couldn’t make Automatic Indexing create a VALID index because when originally run, the data clustering was too poor within the table:

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    8 - access("CODE"=42)

Statistics
----------------------------------------------------------
          14  recursive calls
           4  db block gets
         780  consistent gets
         198  physical reads
       15224  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

This query is now also finally using the newly created index, because the CBO now too deems it to be more efficient with an index based execution plan.

The moral of the story. Automatic Indexing may initially deem a potential index to not be efficient enough to be created. However, things may change such as the clustering of table data (or the distribution of data values, etc. etc.) that may make a new index now viable. This though requires a NEW SQL statement to be executed, such that a non-blacklisted SQL can invoke the Automatic Indexing process to create the necessary Automatic Index.

Of course, things may change in the future. Future releases may have the facility to automatically re-cluster the data in tables optimally based on existing workloads and may also have a mechanism to identify that things have sufficient “changed” such that previously “failed” SQL statements from an Automatic Indexing perspective may warrant reevaluation.

This has only been tested up to version Oracle Database 19.5 of the Oracle Autonomous Database environments.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part II (Wild Is The Wind) August 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Oracle Indexes, Performance Tuning.
2 comments

 

In my previous post, I discussed a scenario in which Oracle Automatic Indexing refused to create a VALID index, because the resultant index was too inefficient to access the necessary rows due to the poor clustering of data within the table.

If the performance of such an SQL were critical for business requirements, there is a way to address this scenario, by re-clustering the data within the table to align itself with the index. Although the re-clustering table operation can now be very easily performed online since Oracle Database 12.2 (without having to use the dbms_redefinition process), this is NOT automatically performed within the Autonomous Database self-tuning framework (yet).

But it’s an activity we can perform manually to improve the performance of such critical SQLs as follows:

SQL> alter table nickcave add clustering by linear order(code);

Table altered.

SQL> alter table nickcave move online;

Table altered.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

With the data in the table now perfectly aligned with the index, we would ordinarily now expect the index to be more efficient method to retrieve this 1% of the data.

However, if we now re-run the previously executed SQLs each a number of times:

 

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

And now wait until next Automatic Indexing process period:

SQL> select dbms_auto_index.report_last_activity() report from dual;

...

We notice that the Automatic Index is still NOT mentioned in the Automatic Indexing reports and still remains UNUSABLE:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

So what’s going on?

In order to prevent the same SQLs from being continually re-evaluated to see if an index might be preferable, the Automatic Indexing process puts previously evaluated SQLs on a type of blacklist and therefore don’t get subsequently re-evaluated.

So although the new clustering of the data within the table would now likely warrant the creation of a new index, if we just run the some SQLs as previously, nothing changes. No Automatic Index is created and the SQLs remain in their current “sub-optimal” state.

In Part III, we’ll look at how to finally get Automatic Indexing to create these indexes and improve the performance of these queries…

Oracle 19c Automatic Indexing: Common Index Creation Trap (Rat Trap) June 30, 2020

Posted by Richard Foote in 19c, 19c New Features, ASSM, Automatic Indexing, CBO, Clustering Factor, Data Clustering, Oracle Indexes, TABLE_CACHED_BLOCKS.
1 comment so far

When I go to a customer site to resolve performance issues, one of the most common issues I encounter is in relation to inefficient SQL. And one of the most common causes for inefficient SQL I encounter is because of deficiencies the default manner by which the index Clustering Factor is calculated.

When it comes to both Automatic Indexes and in relation to the Oracle Autonomous Database Cloud Services, the “flawed” default manner by which the index Clustering Factor is calculated still applies. So we need to exercise some caution when Auto Indexes are created and the impact their default statistics can have on the performance of subsequent SQL statements.

To illustrate with a simple example, I’ll first create a table with the key column being the ID column which will be effectively unique. The table will be populated via a basic procedure that just inserts 1M rows. The procedure uses an ORDER sequence, such that the ID values are generated in a monotonically increasing manner:

SQL> create table bowie_assm (id number, code number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

Procedure created.

SQL> create or replace procedure pop_bowie_assm as
2  begin
3    for i in 1..1000000 loop
4      insert into bowie_assm values (bowie_assm_seq.nextval, mod(i,1000), 'DAVID BOWIE');
5      commit;
6    end loop;
7  end;
8  /

Procedure created.

 

However crucially, the procedure is executed by 3 different session concurrently, to simulate a multi user environment inserting into a table…

 

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

 

We’ll now collect statistics on the table:

 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE_ASSM';

TABLE_NAME        NUM_ROWS     BLOCKS
--------------- ---------- ----------
BOWIE_ASSM         3000000      12137

 

So the table has 3M rows and is 12137 blocks in size.

If we run an SQL a few times where we select only the one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    22 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |     1 |    22 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |     1 |    22 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |     1 |    22 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"=42)
    filter("ID"=42)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The execution plan shows a Full Table Scan (FTS) is invoked, the only choice the CBO has without an index on the ID column. Clearly an index on the ID column would make the plan substantially more efficient with just 1 row selected from a 3M row table. Hopefully, Automatic Indexing will come to our rescue, so let’s check out the subsequent Automatic Indexing Report:

 

REPORT

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 58.72 MB (58.72 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 2
SQL statements improved (improvement factor)  : 1 (1.2x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 1.1x

-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------

The following indexes were created:
-------------------------------------------------------------------------
| Owner | Table      | Index                | Key | Type   | Properties |
-------------------------------------------------------------------------
| BOWIE | BOWIE_ASSM | SYS_AI_2w1pss6qbdz6z | ID  | B-TREE | NONE       |
-------------------------------------------------------------------------

So yes indeed, an Automatic Index (SYS_AI_2w1pss6qbdz6z) was created on the ID column.

If we look at the default Clustering Factor of this index:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID              2504869

 

We notice the Clustering Factor is relatively high at 2504869, much higher than the 12137 number of blocks in the table.

But if the ID column in the table has been loaded via a monotonically increasing sequence, doesn’t that mean the ID values have been inserted in approximately in ID order? If so, doesn’t that mean the ID column should have a “good” Clustering Factor” as the order of the rows in the table matches the order of the indexed values in the ID index?

Clearly not.

The reason being that the table is stored in the default Automatic Segment Space Management (ASSM) tablespace type, which is designed to avoid contention by concurrent inserts from different sessions. Therefore each of the 3 sessions inserting into the table are each assigned to different table blocks, resulting in the rows not being precisely inserted in ID order. It’s very close to ID order, the the ID values clustered within a few blocks from each other, but not precisely stored in ID order.

However, by default, the Clustering Factor is calculated by reading each index entry and determining if it references a ROWID that accesses a table block different from the PREVIOUS index entry. If it does differ, it increments the Clustering Factor, if it doesn’t differ and accesses the same table block as the previous index entry, the Clustering Factor is NOT incremented.

So in theory, we could have 100 rows that reside in just 2 different table blocks, but if the odd IDs live in one block and the even IDs live in the other block, meaning that each ID is stored in a different table block to the previous, the Clustering Factor would have a value of 100 for these 100 rows, even though they only occupy 2 table blocks. The Clustering Factor is therefore much higher than in reality it should be as ultimately only 2 different table blocks are accessed within a negligible time from each other.

This is the “flaw” with how the default Clustering Factor is calculated. By noting if a table block access differs only from the previous table block accessed, it leaves the Clustering Factor calculation susceptible to exaggerated high values when the data really is relatively well clustered within the table.

If we run the same SQL as previously which only selects one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |     1 |    22 |     4   (0)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |     1 |    22 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |     1 |    22 |     4   (0)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |     1 |       |     3   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |     1 |       |     3   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
         4  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The CBO now uses the new Automatic Index as with just one row, the index is clearly more efficient regardless of the Clustering Factor value.

However, if we now run a query that selects a range of ID values, in this example between 42 and 4242 which represents only a relatively low 0.14% of the table:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |  4202 | 92444 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"<=4242 AND "ID">=42)
    filter("ID"<=4242 AND "ID">=42)

Statistics
----------------------------------------------------------
         8  recursive calls
         4  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

The CBO decides to use a Full Table Scan as it deems the index with the massive Clustering Factor to be too expensive, with it having to visit differing blocks for the majority of the estimated 4202 rows (note at 4201 actual rows returned, this estimate by the CBO is practically spot on).

If we force the use of the index via an appropriate hint:

 

SQL> select /*+ index (bowie_assm) */ * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

Note at an estimated cost of 3530, this is greater than the 1934 cost of the FTS which explains why the CBO decides the FTS is best. However, if we look at the number of Consistent Gets, it’s only 26, meaning the CBO is actually getting these costs way wrong.

Why?

Because of the grossly inflated Clustering Factor.

As I’ve discussed previously, Oracle 12.1 introduced a new TABLE_CACHED_BLOCKS preference. Rather than the default value of 1, we can set this to any value up to 255. When calculating the Clustering Factor during statistics collection, it will NOT increment the Clustering Factor if the index visits a table block again that was one of the last “x” distinct table blocks visited. So by setting TABLE_CACHED_BLOCKS to (say) 42, if the index visits a block that was one of the last 42 distinct table blocks previously visited, don’t now increment the Clustering Factor. This can therefore generate a much more “accurate” Clustering Factor which can be significantly smaller than previously. This in turn makes the index much more efficient to the CBO because it then estimates far fewer table blocks need be accessed during a range scan.

So let’s change the TABLE_CACHED_BLOCKS value for this table to 42 (don’t increment now the Clustering Factor value when collecting statistics if we visit again any of the last 42 differently accessed table blocks) and recollect the segment statistics:

 

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM', cascade=>true);

PL/SQL procedure successfully completed.

 

If we now examine the new Clustering Factor value:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes

where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID                11608

 

We can see that at just 11608 it’s substantially less than the previous 2504869.

If we now rerun the previous range scan SQL without the hint:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |    30   (4)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |    30   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |    30   (4)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

We can see the CBO now automatically uses the new Automatic Index. At a new cost of just 30, it’s substantially less than the previous index cost of 3530 and now much less than the 1934 for the FTS and so why the index is now automatically chosen by the CBO.

When Automatic Indexes are created, it’s usually a good idea to check on the Clustering Factor and because default ASSM tablespaces have a tendency to significantly escalate the values of index Clustering Factors, to look at recalculating them with an non-default setting of the TABLE_CACHED_BLOCKS statistics collection preference.

Of course, not only is this a good idea for Automatic Indexes, but for manually created indexes as well.

Although no doubt Autonomous Database Cloud services will look at these issues in the future, such self-tuning capabilities are not currently available. You will need to go in there and make these changes as necessary to fix the root issues with such inefficient SQL statements…

Oracle 19c Automatic Indexing: A More Complex Example (How Does The Grass Grow) June 16, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, CBO.
3 comments

In this post I’m going to put together in a slightly more complex SQL example a number of the concepts I’ve covered thus far in relation to the current implementation of Oracle Automatic Indexing.

I’ll begin by creating three tables, a larger TABLE1 and two smaller TABLE2 and TABLE3 lookup tables. Each table is created with only a Primary Key, Unique index and currently have no secondary indexes (this demo was run in an ATP Autonomous Cloud environment hence the odd parallel execution plans):

SQL> create table table1 (id number not null, code1 number not null, grade1 number not null, name1 varchar2(42));

Table created.

SQL> insert into table1 select rownum, mod(rownum, 1000)+1, mod(rownum, 200000)+1, 'David Bowie '|| rownum
from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index table1_id_i on table1(id);

Index created.

SQL> alter table table1 add primary key(id);

Table altered.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'TABLE1');

PL/SQL procedure successfully completed.


SQL> create table table2 (id number not null, stuff number not null, name2 varchar2(42));

Table created.

SQL> insert into table2 select rownum, mod(rownum, 100)+1, 'Ziggy Stardust ' || rownum from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index table2_id_i on table2(id);

Index created.

SQL> alter table table2 add primary key(id);

Table altered.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'TABLE2');

PL/SQL procedure successfully completed.


SQL> create table table3 (id number not null, code3 number not null, name3 varchar2(42));

Table created.

SQL> insert into table3 select rownum, mod(rownum, 500)+1, 'Thin White Duke ' || rownum from dual connect by level <=1000;

1000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index table3_id_i on table3(id);

Index created.

SQL> alter table table3 add primary key(id);

Table altered.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'TABLE3');

PL/SQL procedure successfully completed.

 

I’ll next run the following “complex” query a number of times:

 

SQL> select code1, grade1, name1, name2 from table1, table2
where table1.code1=table2.id and
table1.grade1 in (select table3.id from table3 where table3.code3=42);

10 rows selected.

Execution Plan
------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |             |    10 |   600 |     857  (7)| 00:00:01 |
|   1 |  PX COORDINATOR                  |             |       |       |             |          |
|   2 |   PX SEND QC (RANDOM)            | :TQ10000    |    10 |   600 |     857  (7)| 00:00:01 |
|   3 |    NESTED LOOPS                  |             |    10 |   600 |     857  (7)| 00:00:01 |
|   4 |     NESTED LOOPS                 |             |    10 |   600 |     857  (7)| 00:00:01 |
|*  5 |      HASH JOIN                   |             |    10 |   360 |     852  (7)| 00:00:01 |
|   6 |       JOIN FILTER CREATE         | :BF0000     |     2 |    16 |       2  (0)| 00:00:01 |
|*  7 |        TABLE ACCESS STORAGE FULL | TABLE3      |     2 |    16 |       2  (0)| 00:00:01 |
|   8 |       JOIN FILTER USE            | :BF0000     |  1000K|    26M|     833  (5)| 00:00:01 |
|   9 |        PX BLOCK ITERATOR         |             |  1000K|    26M|     833  (5)| 00:00:01 |
|* 10 |   TABLE ACCESS STORAGE FULL      | TABLE1      |  1000K|    26M|     833  (5)| 00:00:01 |
|* 11 |      INDEX UNIQUE SCAN           | TABLE2_ID_I |     1 |       |       0  (0)| 00:00:01 |
|  12 |     TABLE ACCESS BY INDEX ROWID  | TABLE2      |     1 |    24 |       1  (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

      5 - access("TABLE1"."GRADE1"="TABLE3"."ID")
      7 - storage("TABLE3"."CODE3"=42)
          filter("TABLE3"."CODE3"=42)
     10 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"TABLE1"."GRADE1"))
          filter(SYS_OP_BLOOM_FILTER(:BF0000,"TABLE1"."GRADE1"))
     11 - access("TABLE1"."CODE1"="TABLE2"."ID")

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
      5777  consistent gets
         0  physical reads
         0  redo size
      1224  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
        10  rows processed

 

We note there are three predicates listed in which a column access could potentially benefit from an index being created:

table1.code1=table2.id (note that table2.id already has a unique index defined)

table1.grade1 in (Select…)

table3.code3=42

 

Let’s have a look at the corresponding Automatic Indexing report to see what the Oracle Automatic Indexing process made of this example:

 

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------

Activity start                  : 26-JUN-2019 08:56:08
Activity end                    : 26-JUN-2019 08:56:53
Executions completed            : 1
Executions interrupted          : 0
Executions with fatal error     : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                                : 3
Indexes created (visible / invisible)           : 2 (2 / 0)
Space used (visible / invisible)                : 18.94 MB (18.94 MB / 0 B)
Indexes dropped                                 : 0
SQL statements verified                         : 2
SQL statements improved (improvement factor)    : 1 (192.7x)
SQL plan baselines created                      : 0
Overall improvement factor                      : 170.2x

-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------

Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

We note there are 3 index candidates that were considered, BUT only 2 new indexes were actually created. Overall, the created indexes resulted in an estimated 192.7x improvement in the above SQL performance.

If we look further on in the report and at the actual indexes created:

 

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table  | Index                | Key    | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | TABLE1 | SYS_AI_0p5nn18dt9bn1 | GRADE1 | B-TREE | NONE       |
| BOWIE | TABLE3 | SYS_AI_b4ntgxt6pdbfh | CODE3  | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS

-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 21v8yqs9jrnzm
SQL Text             : select code1, grade1, name1, name2 from table1, table2
                       where table1.code1=table2.id and table1.grade1 in (select table3.id from table3 where table3.code3=42)
Improvement Factor   : 192.7x

Execution Statistics:
-----------------------------

                                Original Plan                 Auto Index Plan
                                ----------------------------  ----------------------------

Elapsed Time (s):               299396                        2170
CPU Time (s):                   289510                        1036
Buffer Gets:                    28912                         43
Optimizer Cost:                 857                           27
Disk Reads:                     0                             4
Direct Writes:                  0                             0
Rows Processed:                 50                            10
Executions:                     5                             1

 

We note the report states the two new indexes are created on the TABLE1.GRADE1 and TABLE3.CODE3 columns.

 

If look down further in the report and compare the before and after execution plans in the PLANS SECTION:

 

PLANS SECTION

- Original

-----------------------------
Plan Hash Value  : 1597150496
------------------------------------------------------------------------------------------------
| Id | Operation                          | Name        | Rows    | Bytes    | Cost | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |             |         |          |  857 |          |
|  1 |  PX COORDINATOR                    |             |         |          |      |          |
|  2 |    PX SEND QC (RANDOM)             | :TQ10000    |      10 |      600 |  857 | 00:00:01 |
|  3 |     NESTED LOOPS                   |             |      10 |      600 |  857 | 00:00:01 |
|  4 |      NESTED LOOPS                  |             |      10 |      600 |  857 | 00:00:01 |
|  5 |       HASH JOIN                    |             |      10 |      360 |  852 | 00:00:01 |
|  6 |        JOIN FILTER CREATE          | :BF0000     |       2 |       16 |    2 | 00:00:01 |
|  7 |         TABLE ACCESS STORAGE FULL  | TABLE3      |       2 |       16 |    2 | 00:00:01 |
|  8 |        JOIN FILTER USE             | :BF0000     | 1000000 | 28000000 |  833 | 00:00:01 |
|  9 |         PX BLOCK ITERATOR          |             | 1000000 | 28000000 |  833 | 00:00:01 |
| 10 |  TABLE ACCESS STORAGE FULL         | TABLE1      | 1000000 | 28000000 |  833 | 00:00:01 |
| 11 |       INDEX UNIQUE SCAN            | TABLE2_ID_I |       1 |          |    0 |          |
| 12 |      TABLE ACCESS BY INDEX ROWID   | TABLE2      |       1 |       24 |    1 | 00:00:01 |
------------------------------------------------------------------------------------------------

- With Auto Indexes

-----------------------------
Plan Hash Value  : 2014256176
----------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                 | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                      |   10 |   600 |   27 | 00:00:01 |
|   1 |   NESTED LOOPS                           |                      |   10 |   600 |   27 | 00:00:01 |
|   2 |    NESTED LOOPS                          |                      |   10 |   600 |   27 | 00:00:01 |
|   3 |     NESTED LOOPS                         |                      |   10 |   360 |   17 | 00:00:01 |
|   4 |      TABLE ACCESS BY INDEX ROWID BATCHED | TABLE3               |    2 |    16 |    3 | 00:00:01 |
| * 5 |       INDEX RANGE SCAN                   | SYS_AI_b4ntgxt6pdbfh |    2 |       |    1 | 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID BATCHED | TABLE1               |    5 |   140 |    7 | 00:00:01 |
| * 7 |       INDEX RANGE SCAN                   | SYS_AI_0p5nn18dt9bn1 |    5 |       |    2 | 00:00:01 |
| * 8 |     INDEX UNIQUE SCAN                    | TABLE2_ID_I          |    1 |       |    0 |          |
|   9 |    TABLE ACCESS BY INDEX ROWID           | TABLE2               |    1 |    24 |    1 | 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 5 - access("TABLE3"."CODE3"=42)
* 7 - access("TABLE1"."GRADE1"="TABLE3"."ID")
* 8 - access("TABLE1"."CODE1"="TABLE2"."ID")

 

We can see that the new plan uses both the new automatic indexes and has a new improved cost of 27 (down from 857).

 

If we look at all the indexes that have been created on these tables:

 

SQL> select table_name, index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name like 'TABLE%';

TABLE_NAME   INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------ ---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
TABLE1       TABLE1_ID_I            NO  NO  VISIBLE   DISABLED      VALID       1000000        2088              5202
TABLE1       SYS_AI_85dcbcnkgj0w0   YES NO  INVISIBLE DISABLED      UNUSABLE    1000000        2171            415525
TABLE1       SYS_AI_0p5nn18dt9bn1   YES NO  VISIBLE   DISABLED      VALID       1000000        2221           1000000
TABLE2       TABLE2_ID_I            NO  NO  VISIBLE   DISABLED      VALID         10000          20                44
TABLE3       TABLE3_ID_I            NO  NO  VISIBLE   DISABLED      VALID          1000           2                 5
TABLE3       SYS_AI_b4ntgxt6pdbfh   YES NO  VISIBLE   DISABLED      VALID          1000           3               999

SQL> select table_name, index_name, column_name, column_position from user_ind_columns
where table_name like 'TABLE%' order by table_name, index_name, column_position;

TABLE_NAME   INDEX_NAME             COLUMN_NAME     COLUMN_POSITION
------------ ---------------------- --------------- ---------------
TABLE1       SYS_AI_0p5nn18dt9bn1   GRADE1                        1
TABLE1       SYS_AI_85dcbcnkgj0w0   CODE1                         1
TABLE1       TABLE1_ID_I            ID                            1
TABLE2       TABLE2_ID_I            ID                            1
TABLE3       SYS_AI_b4ntgxt6pdbfh   CODE3                         1
TABLE3       TABLE3_ID_I            ID                            1

 

We note that both indexes listed as created in the Auto Indexing report on the TABLE1.GRADE1 (and TABLE3.CODE3 columns are both listed as being VISIBLE and VALID. Both of these indexes have been proven to improve the performance of the SQL statement by reducing the number of logical reads.

 

However, there is also an Auto Indexed defined on the other potential indexed column table1.code1, called “SYS_AI_85dcbcnkgj0w0” that has been left in an INVISIBLE, UNUSABLE state. This index has been shown to be ineffective in improving SQL performance and has been converted back to an UNUSABLE state.

If we run the query again and look at the resultant execution plan:

 

SQL> select code1, grade1, name1, name2
from table1, table2
where table1.code1=table2.id and
table1.grade1 in (select table3.id from table3 where table3.code3=24);

10 rows selected.

Execution Plan
-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                      |    10 |   600 |    15   (0)| 00:00:01 |
|   1 |  PX COORDINATOR                          |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10000             |    10 |   600 |    15   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                          |                      |    10 |   600 |    15   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                         |                      |    10 |   600 |    15   (0)| 00:00:01 |
|   5 |      NESTED LOOPS                        |                      |    10 |   360 |    10   (0)| 00:00:01 |
|   6 |       PX BLOCK ITERATOR                  |                      |       |       |            |          |
|*  7 |        TABLE ACCESS STORAGE FULL         | TABLE3               |     2 |    16 |     2   (0)| 00:00:01 | 
|   8 |       TABLE ACCESS BY INDEX ROWID BATCHED| TABLE1               |     5 |   140 |     4   (0)| 00:00:01 |
|*  9 |        INDEX RANGE SCAN                  | SYS_AI_0p5nn18dt9bn1 |     5 |       |     1   (0)| 00:00:01 |
|* 10 |      INDEX UNIQUE SCAN                   | TABLE2_ID_I          |     1 |       |     0   (0)| 00:00:01 |
|  11 |     TABLE ACCESS BY INDEX ROWID          | TABLE2               |     1 |    24 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - storage("TABLE3"."CODE3"=24)
       filter("TABLE3"."CODE3"=24)
   9 - access("TABLE1"."GRADE1"="TABLE3"."ID")
  10 - access("TABLE1"."CODE1"="TABLE2"."ID")

Statistics
----------------------------------------------------------
        10  recursive calls
         4  db block gets
        59  consistent gets
         0  physical reads
         0  redo size
      1122  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
        10  rows processed

 

We note the new execution plan now only uses one of newly created Automatic Indexes (SYS_AI_0p5nn18dt9bn1) and at just 59 consistent gets, is significantly more efficient than it was previously where it required 5777 consistent gets.

However, the other VISIBLE automatic index (SYS_AI_b4ntgxt6pdbfh on the BOWIE3 table) is NOT actually used in the new plan. Why?

Because as discussed previously, it’s entirely possible for the Auto Indexing process to consider a new plan with Auto Index to be more efficient because it uses less consistent gets BUT the CBO not use the plan because the plan has a higher cost. The plan generated by the Auto Index process has a cost of 27 and uses 43 buffer gets but the CBO uses the plan without the second Auto Index because it has a reduced cost of only 15, even though it uses 59 consistent gets.

So the Auto Indexing process can create any number of possible indexes for a particular query and may independently ultimately determine different states for the various candidate indexes and so only create and keep the necessary indexes to sufficiently improve an SQL.

We now have an SQL statement that automatic runs much more efficiently without human intervention thanks to these automatically created indexes…

Oracle 19c Automatic Indexing: Dropping Automatic Indexes (Fall Dog Bombs The Moon) May 12, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Drop Index, Index Rebuild, Oracle Indexes.
3 comments

reality

 

Julian Dontcheff recently wrote a nice article on the new Automatic Index Optimization feature available in the upcoming Oracle Database 20c release (I’ll of course blog about this new 20c feature in the near future).

Within the article, Julian mentioned a clever method of how to effectively drop Automatic Indexes that I thought would be worth checking out.

For a number of reasons (which I’ll cover in some detail in upcoming articles, but for now using Automatic Indexing in REPORT ONLY mode is but one reason), you can easily be left with an Automatic Index that might get in the way of things and you may want to drop it.

However, as we’ll see, you can’t easily drop an Automatic Index. The only “supported” manner to drop an Automatic Index is to wait for the Automatic Index Retention period to be exceeded (which is by default some 373 days and assumes the index is not used during this period).

Julian has come up with an alternate strategy.

By way of a demo, I currently have the following Automatic Index (SYS_AI_5zjkc60knz9zp):

SQL> select index_name, auto, status, visibility from user_indexes
where table_name='CRACKED_ACTOR';

INDEX_NAME                     AUT STATUS   VISIBILIT
------------------------------ --- -------- ---------
CRACKED_ACTOR_CODE1_CODE2_I    NO  VALID    VISIBLE
SYS_AI_5zjkc60knz9zp           YES VALID    INVISIBLE

SQL> select index_name, column_name, column_position from user_ind_columns
where index_name='SYS_AI_5zjkc60knz9zp';

INDEX_NAME                     COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------------------------ ---------------
SYS_AI_5zjkc60knz9zp           ID                                           1

 

So I have an Automatic Index on the ID column of the CRACKED_ACTOR table, BUT it’s currently INVISIBLE and so can’t be used be default by the CBO.

If I run the following query:

SQL> select * from cracked_actor where id=42;
Execution Plan
----------------------------------------------------------
Plan hash value: 786009234

-----------------------------------------------------------------------------------
| Id | Operation        | Name          | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT |               |    1 |    24 |       3 (0)| 00:00:01    |
|* 1 | TABLE ACCESS FULL| CRACKED_ACTOR |    1 |    24 |       3 (0)| 00:00:01    |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

     1 - filter("ID"=42)
Statistics
----------------------------------------------------------
         0 recursive calls
         0 db block gets
      7042 consistent gets
         0 physical reads
         0 redo size
       789 bytes sent via SQL*Net to client
       401 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
         1 rows processed

 

The CBO uses a Full Table Scan because the available Automatic Index on the ID column is current invisible.

If I try to convert it to being VISIBLE:

SQL> alter index "SYS_AI_5zjkc60knz9zp" visible;
alter index "SYS_AI_5zjkc60knz9zp" visible
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

I can’t, because you can’t alter an Automatic Index to be Visible/Invisible.

If I try to just drop the Automatic Index:

SQL> drop index "SYS_AI_5zjkc60knz9zp";
drop index "SYS_AI_5zjkc60knz9zp"
*
ERROR at line 1:
ORA-65532: cannot alter or drop automatically created indexes

Again, I can’t just simply drop an Automatic Index.

However, I am allowed to Rebuild (or Coalesce or Shrink) an Automatic Index. Therefore, I can create a new dummy tablespace and rebuild the Automatic Index to reside in this particular tablespace:

SQL> alter index "SYS_AI_5zjkc60knz9zp" rebuild tablespace bowie_stuff;

Index altered.

 

I can then drop this tablespace including all its contents (and hence drop the Automatic Index contained within):

SQL> drop tablespace bowie_stuff including contents and datafiles;

Tablespace dropped.

 

The problematic Automatic Index is now gone:

SQL> select index_name, auto, status, visibility from user_indexes
where table_name='CRACKED_ACTOR';

INDEX_NAME                     AUT STATUS   VISIBILIT
------------------------------ --- -------- ---------
CRACKED_ACTOR_CODE1_CODE2_I    NO  VALID     VISIBLE

I can now either manually create the necessary index or wait for the Automatic Index to now hopefully create a new, visible Automatic Index to address my query:

SQL> create index cracked_actor_id_i on cracked_actor(id);

Index created.

SQL> select index_name, auto, status, visibility from user_indexes where table_name='CRACKED_ACTOR';

INDEX_NAME                     AUT STATUS   VISIBILIT
------------------------------ --- -------- ---------
CRACKED_ACTOR_CODE1_CODE2_I    NO  VALID    VISIBLE
CRACKED_ACTOR_ID_I             NO  VALID    VISIBLE

 

I’ve now addressed the problematic query:

SQL> select * from cracked_actor where id=42;
Execution Plan
----------------------------------------------------------
Plan hash value: 4160941723

----------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name               | Rows | Bytes | Cost (%CPU)| Time        |
----------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                    |    1 |    24 |       2 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| CRACKED_ACTOR      |    1 |    24 |       2 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | CRACKED_ACTOR_ID_I |    1 |       |       1 (0)| 00:00:01    |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

      2 - access("ID"=42)
Statistics
----------------------------------------------------------
         1 recursive calls
         0 db block gets
         3 consistent gets
         0 physical reads
         0 redo size
       793 bytes sent via SQL*Net to client
       401 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
         1 rows processed

 

Thanks Julian for the cool tip 🙂

Oracle 19c Automatic Indexing: Mixing Manual and Automatic Indexes Part II (Stay) May 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Add Column To Existing Index, Automatic Indexing, Oracle Indexes.
2 comments

stay single

 

In my previous post, I discussed how Automatic Indexing did not recognise there was already an existing logically equivalent manually created index and so created effectively a redundant Automatic Index.

I also discussed previously how Automatic Indexing was clever enough to logically add new columns to existing Automatic Indexes if it determined such a new index can be used effectively for both previous and new workloads.

In this post, how will Automatic Indexing handle the scenario if a previously manually created index could also potentially be improved by adding a new column.

I’ll start by creating a table similar to my previous post but with more distinct values for the CODE3 column such that the test query will be more selective and so make the CBO favour the use of an index (Note: all examples are run on the OLTP Autonomous Database Cloud Service and hence the odd parallel execution plans):

SQL> create table major_tom6 (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom6 select rownum, mod(rownum, 1000)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 100)), 'David Bowie' from dual connect by level = 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM6');

PL/SQL procedure successfully completed.

 

I’ll now manually create an index for BOTH combinations of the CODE2, CODE3 columns:

 

SQL> create index major_tom6_code2_code3_i on major_tom6(code2, code3);

Index created.

SQL> create index major_tom6_code3_code2_i on major_tom6(code3, code2);

Index created.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='MAJOR_TOM6';

INDEX_NAME                AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM6_CODE2_CODE3_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23697           9890973
MAJOR_TOM6_CODE3_CODE2_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23697           9890973

 

If I now run the following query:

SQL> select * from major_tom6 where code2=42 and code3=42;

983 rows selected.

Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                          |  1000 | 28000 |   997   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001                 |  1000 | 28000 |   997   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MAJOR_TOM6               |  1000 | 28000 |   997   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                          |       |       |            |          |
|   5 |      PX RECEIVE                       |                          |  1000 |       |     5   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000                 |  1000 |       |     5   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                          |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | MAJOR_TOM6_CODE2_CODE3_I |  1000 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     8 - access("CODE2"=42 AND "CODE3"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       971  consistent gets
         0  physical reads
         0  redo size
     27836  bytes sent via SQL*Net to client
      1303  bytes received via SQL*Net from client
        67  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       983  rows processed

 

The CBO favours the use of an index as with just 983 rows returned from a 10M row table, the index is the cheaper access method.

If I now run the following SQL which also includes the more selectively CODE1 column predicate as well (which returns just 1 row):

SQL> select * from major_tom6 where code1=42 and code2=42 and code3=42;

Execution Plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                          |     1 |    28 |   997   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                          |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001                 |     1 |    28 |   997   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| MAJOR_TOM6               |     1 |    28 |   997   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                          |       |       |            |          |
|   5 |      PX RECEIVE                       |                          |  1000 |       |     5   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000                 |  1000 |       |     5   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                          |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | MAJOR_TOM6_CODE2_CODE3_I |  1000 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     3 - filter("CODE1"=42)
     8 - access("CODE2"=42 AND "CODE3"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       971  consistent gets
         0  physical reads
         0  redo size
       867  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The CBO again uses the same index based on columns CODE2, CODE3 as this has already been proven to be more efficient than a FTS. However, an index that also included the CODE1 column would be even more efficient as the CBO could simply use this index to fetch just the row(s) of interest, without having to perform the unnecessary filtering on the CODE1 column.

So what does Automatic Indexing do in this scenario? If we look at the corresponding Auto Indexing Report:

 

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM6 | SYS_AI_4nc6g08bw8db7 | CODE2,CODE3,CODE1 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

 

We notice Auto Indexing has created a new index based on columns CODE2, CODE3, CODE1, however it has NOT dropped any indexes.

If we look at the Verification section of the Auto Indexing Report:

 

VERIFICATION DETAILS

-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 93zw1kj4n43n9
SQL Text             : select * from major_tom6 where code1=42 and code2=42 and code3=42
Improvement Factor   : 972.2x

Execution Statistics:
-----------------------------

                              Original Plan                 Auto Index Plan
                              ----------------------------  ----------------------------
Elapsed Time (s):             159122                        1240
CPU Time (s):                 70379                         1320
Buffer Gets:                  10698                         4
Optimizer Cost:               997                           4
Disk Reads:                   0                             2
Direct Writes:                0                             0
Rows Processed:               11                            1
Executions:                   11                            1

 

We can see the index was created because of a 972.2x improvement in the performance of the SQL query I ran.

If we look at the details of the indexes that now exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='MAJOR_TOM6';

INDEX_NAME                AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM6_CODE2_CODE3_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       23556           9890973
MAJOR_TOM6_CODE3_CODE2_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       24029           9890973
SYS_AI_4nc6g08bw8db7      YES NO  VISIBLE   DISABLED      VALID      10000000       29125           9999444

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM6' order by index_name, column_position;

INDEX_NAME                COLUMN_NAME          COLUMN_POSITION
------------------------- -------------------- ---------------
MAJOR_TOM6_CODE2_CODE3_I  CODE2                              1
MAJOR_TOM6_CODE2_CODE3_I  CODE3                              2
MAJOR_TOM6_CODE3_CODE2_I  CODE3                              1
MAJOR_TOM6_CODE3_CODE2_I  CODE2                              2
SYS_AI_4nc6g08bw8db7      CODE2                              1
SYS_AI_4nc6g08bw8db7      CODE3                              2
SYS_AI_4nc6g08bw8db7      CODE1                              3

 

We notice a couple of key points.

Firstly, even though the previously created manual index on the columns (CODE2, CODE3) is now totally redundant because it has the same column list as the leading columns of the newly created Auto Index based on the columns (CODE2, CODE3, CODE1), Auto Indexing does NOT automatically drop the manually created index.

Auto Indexing ONLY automatically drops and logically recreates Auto Indexes.

Secondly, Auto Indexing is certainly aware of the previous workload because it has created the new Auto Index with the column list (CODE2, CODE3, CODE1) and NOT in the default CODE1, CODE2, CODE3 column order (as defined in the table definition).

This suggests Auto Indexing is indeed trying to create a new index that is able to cater for all known SQL workloads (predicates on just the CODE2, CODE3 columns and predicates on columns CODE1, CODE2, CODE3).

However, Auto Indexing does not (yet) have the capability to logically modify or drop obviously redundant manually created indexes (it can only do so on previously created Auto Indexes). This is likely one of the reasons why Oracle has provided us with the DROP_SECONDARY_INDEXES procedure in order to get rid of all those annoying manually created secondary indexes that can get in the way of an optimal indexing strategy.

More on DROP_SECONDARY_INDEXES in future posts.

Oracle 19c Automatic Indexing: Mixing Manual and Automatic Indexes Part I (I Can’t Read) April 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, CBO, Clustering Factor, Mixing Auto and Manual Indexes, Oracle Indexes.
1 comment so far

tin machine album

In previous articles, I discussed how Automatic Indexing has the capability to add columns or reorder the column list of previously created Automatic Indexes. However, how does Automatic Indexing handle these types of scenarios with regard to existing manually created indexes?

To investigate, let’s create a table identical to the table I created in my previous blog post where Automatic Indexing created an index that was ultimately not used by the CBO because although Automatic Indexing finds the new index more efficient, the CBO costs it as being too expensive and ignores it.

SQL> create table major_tom5 (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom5 select rownum, mod(rownum, 1000)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 10)),  'David Bowie' from dual connect by level = 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM5');

PL/SQL procedure successfully completed.

However, in this demo, I’m going to first create a manual index, but with the column list in CODE3, CODE2 order. This is the opposite order in which a default Automatic Index would be created (CODE2, CODE3 order) as this is the order of the columns in the table definition:

SQL&gt; create index major_tom5_code3_code2_i on major_tom5(code3, code2);

Index created.

SQL&gt; select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor <span style="color:var(--color-text);">from user_indexes where table_name='MAJOR_TOM5';</span>

INDEX_NAME                AUT CON VISIBILIT COMPRESSION   STATUS   NUM_ROWS   LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM5_CODE3_CODE2_I  NO  NO  VISIBLE   DISABLED      VALID      10000000       24181           8974538

The resultant index has a terrible Clustering Factor of 8974538 on a 10M row table.

If we run the following query with filtering predicates on these 2 indexed columns:

SQL> select * from major_tom5 where code3=4 and code2=42;

10051 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |  9982 |   272K|  7355   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |  9982 |   272K|  7355   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |  9982 |   272K|  7355   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM5 |  9982 |   272K|  7355   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - storage("CODE2"=42 AND "CODE3"=4)
       filter("CODE2"=42 AND "CODE3"=4)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
     45888  consistent gets
        68  physical reads
      5256  redo size
    149822  bytes sent via SQL*Net to client
       610  bytes received via SQL*Net from client
         4  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
     10051  rows processed

The CBO decides to NOT use the available index as it deems it too expensive, especially with such a poor Clustering Factor, to return the resultant 10,051 rows.

But what will Automatic Indexing do now. If we wait the 15 minute period until the next Automatic Indexing period and look at the resultant Automatic Indexing report:

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties |
---------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM5 | SYS_AI_2ajmncxsmg189 | CODE2,CODE3 | B-TREE | NONE       |
---------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : fmpwux2ptvasq
SQL Text             : select * from major_tom5 where code2=42 and code3=4
Improvement Factor   : 5.1x

Automatic Indexing has created a new index based on the column list CODE2, CODE3, because it considers such an index would improve performance of the query by a factor of 5.1x.

However, it has not recognised that the existing manual index based the column list CODE3, CODE2 would have done precisely the same job.

If we look further on in the Automatic Indexing report:

Execution Statistics:
-----------------------------
                              Original Plan                 Auto Index Plan
                              ----------------------------  ----------------------------
Elapsed Time (s):             993225                        26436
CPU Time (s):                 963727                        22535
Buffer Gets:                  137756                        9000
Optimizer Cost:               7355                          9069
Disk Reads:                   0                             26
Direct Writes:                0                             0
Rows Processed:               30153                         10051
Executions:                   3                             1

PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------

Plan Hash Value  : 2129981950
---------------------------------------------------------------------------------------
| Id | Operation                      | Name       | Rows   | Bytes  | Cost  | Time    |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |            |        |        |  7355 |         |
|  1 |  PX COORDINATOR                |            |        |        |       |         |
|  2 |    PX SEND QC (RANDOM)         | :TQ10000   |  10000 | 280000 |  7355 | 00:00:01|
|  3 |     PX BLOCK ITERATOR          |            |  10000 | 280000 |  7355 | 00:00:01|
|  4 |      TABLE ACCESS STORAGE FULL | MAJOR_TOM5 |  10000 | 280000 |  7355 | 00:00:01|
---------------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------

Plan Hash Value  : 459198994
---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes  | Cost | Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      | 10159 | 284452 | 9069 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | MAJOR_TOM5           | 10159 | 284452 | 9069 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_2ajmncxsmg189 | 10051 |        |   27 | 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE2"=42 AND "CODE3"=4)

We notice the new execution plan using the newly created Automatic Index actually has a greater CBO cost than the previous FTS execution plan.

As we discussed in the previous post on when Automatic Indexing creating indexes that are not ultimately used by the CBO, although Automatic Indexing has indeed created this index because it has determined it’s going to be more efficient by a factor of 5.1x due to the reduction in Buffer Gets (137756 buffer gets old plan / 3 executions = 45,919 / 9000 buffer gets with index = 5.1), the CBO considers the execution plan using the Automatic Index to have a larger cost at 9069 than the previous FTS cost at just 7355.

Again just as with the existing, logically equivalent manually created index, the reason why the new Automatic Index is deemed too expensive by the CBO is because it likewise has the same terrible Clustering Factor:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='MAJOR_TOM5';

INDEX_NAME               AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
MAJOR_TOM5_CODE3_CODE2_I NO  NO  VISIBLE   DISABLED      VALID      10000000       24181           8974538
SYS_AI_2ajmncxsmg189     YES NO  VISIBLE   DISABLED      VALID      10000000       23697           8974538

If we re-run the initial query again with the newly created Visible/Valid Automatic Index:

SQL> select * from major_tom5 where code3=4 and code2=42;

10051 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |  9982 |   272K|  7355   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |  9982 |   272K|  7355   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |  9982 |   272K|  7355   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM5 |  9982 |   272K|  7355   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    4 - storage("CODE2"=42 AND "CODE3"=4)
        filter("CODE2"=42 AND "CODE3"=4)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
     45888  consistent gets
        68  physical reads
      5256  redo size
    149822  bytes sent via SQL*Net to client
       610  bytes received via SQL*Net from client
         4  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
     10051  rows processed

The CBO ignores the newly created Automatic Index as it did the logically equivalent manually created index and uses the previous, cheaper FTS execution plan.

Automatic Indexing was NOT able to recognise that we already had an equivalent manually created index and so now we have TWO indexes that the CBO simply ignores as being too expensive…

More on mixing Automatic and Manual Indexes on my next post.

Oracle 19c Automatic Indexing: Adding Columns To Existing Automatic Indexes (2+2=5) April 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Add Column To Existing Index, Automatic Indexing, Oracle Indexes.
2 comments

2+2=5 Single

 

In my previous post, I discussed how when the following query is run:

select * from major_tom3 where code3=4 and code2=42;

the Automatic Indexing process will create an index on (CODE2, CODE3) but ultimately not use the index as the CBO considers the corresponding index based execution plan too expensive.

I’m going to expand on the demo and run now the following SQL for the first time (note these examples are run on the OLTP Autonomous Cloud service which explains the odd default parallel based execution plans):

SQL> select * from major_tom3
where code1=42 and code3=4 and code2=42;

10 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |    10 |   280 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |    10 |   280 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |    10 |   280 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM3 |    10 |   280 |  7354   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE1"=42 AND "CODE2"=42 AND "CODE3"=4)
    filter("CODE1"=42 AND "CODE2"=42 AND "CODE3"=4)

Statistics
----------------------------------------------------------
         6 recursive calls
         0 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
      1032 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
        10 rows processed

 

This query has a predicate that includes both CODE2 and CODE3 filtering columns as previously, but now also a new filtering column on CODE1. This now makes the resultant SQL much more selective than the previous SQL, returning just 10 rows where the previous SQL returned 9968 rows.

The current explain plan still uses the previous Full Table Scan, as the only index available is the Automatic Index created previously based on CODE2, CODE3 which has already been shown to be too expensive to return the necessary rows. The additional filtering predicate based on CODE1 doesn’t make the index any more efficient, it still has to access the 9968 rows that match the CODE2, CODE3 predicates and then filter out most of the rows less the 10 that are actually required.

So what does the Automatic Index process do in this scenario?

Let’s look at the resultant Automatic Index report:

 

INDEX DETAILS

-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM3 | SYS_AI_cy8rs2dqb0nrp | CODE2,CODE3,CODE1 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

The following indexes were dropped:
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner | Table      | Index                | Key         | Type   | Properties|
---------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM3 | SYS_AI_bnyacywycxx8b | CODE2,CODE3 | B-TREE | NONE      |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

So the first thing to note is that Oracle first creates a new index based on columns CODE2,CODE3,CODE1.

It then drops the previously created index based on the CODE2,CODE3 columns.

If we look at the Verification Details section of the report:

VERIFICATION DETAILS

-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 1hv3d685x2cy4
SQL Text             : select * from major_tom3 where code1=43 and code3=4 and code2=42
Improvement Factor   : 45853.6x

-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 97by2q15zprgc
SQL Text            : select * from major_tom3 where code1=42 and code3=4 and code2=42
Improvement Factor  : 45856.2x

Execution Statistics:
-----------------------------
                     Original Plan                Auto Index Plan
                     ---------------------------- ----------------------------
Elapsed Time (s):    2815446                      1046
CPU Time (s):        2741134                      1013
Buffer Gets:         596135                       13
Optimizer Cost:      7354                         13
Disk Reads:          0                            2
Direct Writes:       0                            0
Rows Processed:      130                          10
Executions:          13                           1
PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 2354969370

--------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows | Bytes | Cost | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT          |            |      |       | 7354 |          |
| 1  | PX COORDINATOR            |            |      |       |      |          |
| 2  | PX SEND QC (RANDOM)       | :TQ10000   |   10 |   280 | 7354 | 00:00:01 |
| 3  | PX BLOCK ITERATOR         |            |   10 |   280 | 7354 | 00:00:01 |
| 4  | TABLE ACCESS STORAGE FULL | MAJOR_TOM3 |   10 |   280 | 7354 | 00:00:01 |
--------------------------------------------------------------------------------

Notes
-----
- dop = 2
- px_in_memory_imc = no
- px_in_memory = no
- With Auto Indexes
-----------------------------
Plan Hash Value : 2892362571

-----------------------------------------------------------------------------------------------------
|  Id | Operation                           | Name                 | Rows | Bytes | Cost | Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   280 |   13 | 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | MAJOR_TOM3           |   10 |   280 |   13 | 00:00:01 |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_cy8rs2dqb0nrp |   10 |       |    3 | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE2"=42 AND "CODE3"=4 AND "CODE1"=42)
Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

So what’s going on here?

Oracle has correctly determined that an index based on all 3 columns would enable a much more efficient access path for the new SQL, by a factor of 45856.2x no less.

In order to service BOTH known queries that the Automatic Index process has determined would benefit from an index based on predicates (CODE2=42 and CODE3=4) and (CODE2=42 and CODE3=4 and CODE1=42), a single index based on CODE2,CODE3,CODE1 would suffice.

As such, the existing index based on just CODE2,CODE3 is redundant as the new index has the same leading columns. Therefore, the existing index can be safely dropped.

If we look at the definition of the indexes on this table:

SQL> select index_name, auto, constraint_index, visibility, status compression from user_indexes where table_name='MAJOR_TOM3';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS
-------------------- --- --- --------- ------------- --------
SYS_AI_cy8rs2dqb0nrp YES NO  VISIBLE   DISABLED      VALID

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM3' order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_cy8rs2dqb0nrp CODE2                              1
SYS_AI_cy8rs2dqb0nrp CODE3                              2
SYS_AI_cy8rs2dqb0nrp CODE1                              3

 

We notice we now have just the new index, which is both VISIBLE and VALID, based on columns in CODE2,CODE3,CODE1 order.

So Automatic Indexing is clever enough to recognise the scenario where a new index can replace an existing index by adding additional columns to cater for new SQL workloads.

Now that’s rather impressive…

Much more on Automatic Indexing to come.