Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.add a comment
As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).
As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.
One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).
If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:
SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 101K| 2388K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - dynamic statistics used: dynamic sampling (level=AUTO) - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 609 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.
Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true) in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.
In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.
We currently have the following columns and statistics on the IGGY_POP table:
SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM -------------------- ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY
If we now collect Extended Statistics on both CODE1, CODE2 columns:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254'); PL/SQL procedure successfully completed. SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM ------------------------------ ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY SYS_STU#29QF8Y9BUDOW2HCDL47N44 99 .00000005 100 FREQUENCY
The CBO now has some idea on the cardinality if both columns are used within a predicate.
If we re-run the problematic query without the hint:
SQL> select * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 100K| 2343K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 581 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.
However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:
SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50000 | 878K | 575 (15) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 | 878K | 575 (15) | 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=1 AND "CODE2"=42) filter("CODE1"=1 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 368 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.
This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.
So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.
As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…
Unique Indexes Force Hints To Be “Ignored” Part II (One Of The Few) February 19, 2019
Posted by Richard Foote in CBO, Hash Join, Hints, Oracle Indexes, Transitive Closure, Unique Indexes.5 comments
In Part I, I showed a demo of how the introduction of a Unique Index appears to force a hint to be “ignored”. This is a classic case of what difference a Unique Index can make in the CBO deliberations.
So what’s going on here?
When I run the first, un-hinted query:
SQL> select * from bowie1, bowie2 where bowie1.id=bowie2.code and bowie1.id=1;
Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("BOWIE1"."ID"=1) 6 - access("BOWIE2"."CODE"=1)
------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | | 1 | MERGE JOIN CARTESIAN | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | BUFFER SORT | | 50 | 350 | 3 (0) | 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 6 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | -------------------------------------------------------------------------------------------------------
SQL> select /*+ use_hash(bowie2) */ * from bowie1, bowie2 where bowie1.id=bowie2.code and bowie1.id=1;
Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("BOWIE1"."ID"="BOWIE2"."CODE") 3 - access("BOWIE1"."ID"=1) 5 - access("BOWIE2"."CODE"=1)
------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | |* 1 | HASH JOIN | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 5 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("BOWIE1"."ID"=1) 5 - access("BOWIE2"."CODE"=1)
------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | | 1 | NESTED LOOPS | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX UNIQUE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 5 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------
Unique Indexes Force Hints To Be “Ignored” Part I (What’s Really Happening) February 5, 2019
Posted by Richard Foote in CBO, Hash Join, Hints, Non-Unique Indexes, Oracle Indexes, Unique Indexes.2 comments
As I was compiling my new “Oracle Diagnostics and Performance Tuning” seminar, I realised there were quite a number of indexing performance issues I haven’t discussed here previously.
The following is a classic example of what difference a Unique Index can have over a Non-Unique index, while also covering the classic myth that Oracle sometimes chooses to “ignore” hints that might be present in your SQL.
To set the scene, I’m going to create two simple little tables, but importantly initially create only non-unique indexes for columns of interest (Note: I’ve had to remove the “<” in the “<=” predicate when populating the table to avoid formatting issues):
SQL> create table bowie1 as select rownum id, 'David Bowie' name from dual connect by level = 1000; Table created. SQL> create table bowie2 as select rownum id, mod(rownum,20)+1 code from dual connect by level = 1000; Table created. SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=>'BOWIE1', method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=>'BOWIE2', method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> create index bowie1_id_i on bowie1(id); Index created. SQL> create index bowie2_id_i on bowie2(id); Index created. SQL> create index bowie2_code_i on bowie2(code); Index created.
I’m now going to run the following query which does a simple join between the two tables and filters on the ID column from the BOWIE1 table:
SQL> select * from bowie1, bowie2 where bowie1.id=bowie2.code and bowie1.id=1; 50 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4266778954 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | | 1 | MERGE JOIN CARTESIAN | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | BUFFER SORT | | 50 | 350 | 3 (0) | 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 6 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("BOWIE1"."ID"=1) 6 - access("BOWIE2"."CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 1815 bytes sent via SQL*Net to client 641 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 50 rows processed
The query uses a MERGE JOIN which I (incorrectly) think is a concern and decide that a HASH JOIN should be a better option. So I now put in a basic USE_HASH hint:
SQL> select /*+ use_hash(bowie2) */ * from bowie1, bowie2 where bowie1.id=bowie2.code and bowie1.id=1; 50 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1413846643 ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | |* 1 | HASH JOIN | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 5 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("BOWIE1"."ID"="BOWIE2"."CODE") 3 - access("BOWIE1"."ID"=1) 5 - access("BOWIE2"."CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 15 consistent gets 0 physical reads 0 redo size 1815 bytes sent via SQL*Net to client 641 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 50 rows processed
And the hint has worked as I had hoped.
I then decide that perhaps a Unique Index on the ID column might be a good idea (perhaps because I read up on all the advantages on Unique Indexes in this blog). So I drop and recreate the index as a Unique Index:
SQL> drop index bowie1_id_i; Index dropped. SQL> create unique index bowie1_id_i on bowie1(id); Index created.
I now re-run my hinted query to again use the Hash Join:
SQL> select /*+ use_hash(bowie2) */ * from bowie1, bowie2 where bowie1.id=bowie2.code and bowie1.id=1; 50 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4272245076 ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 50 | 1150 | 5 (0) | 00:00:01 | | 1 | NESTED LOOPS | | 50 | 1150 | 5 (0) | 00:00:01 | | 2 | TABLE ACCESS BY INDEX ROWID | BOWIE1 | 1 | 16 | 2 (0) | 00:00:01 | |* 3 | INDEX UNIQUE SCAN | BOWIE1_ID_I | 1 | | 1 (0) | 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 50 | 350 | 3 (0) | 00:00:01 | |* 5 | INDEX RANGE SCAN | BOWIE2_CODE_I | 50 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("BOWIE1"."ID"=1) 5 - access("BOWIE2"."CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 15 consistent gets 0 physical reads 0 redo size 1815 bytes sent via SQL*Net to client 641 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 50 rows processed
And we notice the hint is now being “ignored”. The hint isn’t really ignored, it’s just no longer relevant to how the CBO has now constructed the query and associated plan with the Unique Index now making a key difference (no pun intended).
I’ll discuss in Part II why the Unique Index has made such a difference and why the hint is no longer viable.
Of course, to learn all this and a lot lot more, you can always attend my new Oracle Diagnostics and Performance Tuning” seminar one day 🙂