
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.

 

In my previous post, I discussed how the Automatic Indexing task, by using Dynamic Sampling Level=11, can determine accurate query cardinality estimates, and how it then assumes the CBO will likewise determine accurate cardinality estimates and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence can NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result in inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.

One potential way to make things more consistent, and to see how Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes a performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can easily be achieved with the following hint, which effectively disables Dynamic Sampling for the previously problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query has good cardinality estimates (1005K vs the 1000011 rows returned) only because we currently have a histogram in place on the CODE column. As such, the query correctly uses a FTS.
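As a quick sanity check (a sketch using the standard USER_TAB_COLS dictionary view, assuming the SPACE_ODDITY table from Part I), we can confirm the histogram is currently in place:

SQL> select column_name, num_buckets, histogram from user_tab_cols
     where table_name='SPACE_ODDITY' and column_name='CODE';

This should report a histogram with many buckets on the CODE column (and NONE once we remove it below).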

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.

So what does the Automatic Indexing task make of things now? If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined that the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. The CBO makes this poor cardinality estimate because there are currently no histograms in place AND because Dynamic Sampling, which would otherwise give it a more accurate picture of things on the fly, has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.
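As an aside, the baseline itself can be inspected via the standard DBA_SQL_PLAN_BASELINES view (an illustrative sketch, using the SQL Handle from the report above):

SQL> select sql_handle, plan_name, origin, enabled, accepted
     from dba_sql_plan_baselines
     where sql_handle='SQL_3645e6a2952fcf7a';

The ORIGIN column indicates how the baseline was loaded.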

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.

So Automatic Indexing CAN create SQL Plan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes, BUT it’s really difficult for it to do this effectively if the Automatic Indexing task and other database sessions have differing Dynamic Sampling settings, as they do by default…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star) September 25, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.

In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexing handles SQL statements that access skewed data. In this post, I’m going to set up the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work, due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.

To set things up, I’m going to recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column: most values are very uncommon, but a handful of values are very common:

SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142));

Table created.

SQL> begin
2     for i in 1..2000000 loop
3       if mod(i,2) = 0 then
4          insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history');
5       else
6          insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.');
7       end if;
8     end loop;
9     commit;
10  end;
11  /

PL/SQL procedure successfully completed.

 

So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.
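We can confirm the skew with a quick illustrative aggregate (each of these ten common values should have roughly 100,000 rows, give or take a few random collisions):

SQL> select code, count(*) from space_oddity
     where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000)
     group by code order by code;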

Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyway if no SQL has previously run with predicates on the column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

If we run a query based on a rare value for CODE:

SQL> set arraysize 5000

SQL> select * from space_oddity where code=25;

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |     3 |   423 | 11356   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |     3 |   423 | 11356   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=25)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      40974  consistent gets
          0  physical reads
          0  redo size
       1018  bytes sent via SQL*Net to client
        402  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

 

Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very close to the estimated 3 rows, which would likely make an index the way to go if such an index existed.

However, the following SQL accesses many of the common values of CODE and returns many rows:

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Again, without an index in place, the CBO has no choice but to perform a FTS, but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO gets the cardinality estimate way, way off, thinking only 32 rows are to be returned and not the actual 1000011 rows.
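As a rough sketch of where the 32 comes from: without a histogram, the CBO simply assumes uniformity, estimating NUM_ROWS/NUM_DISTINCT rows per value. On my data, NUM_DISTINCT on CODE comes out at roughly 630,000 (the exact figure will vary with DBMS_RANDOM), so:

SQL> select t.num_rows, c.num_distinct,
            round(10 * t.num_rows / c.num_distinct) estimated_rows
     from user_tables t, user_tab_col_statistics c
     where t.table_name='SPACE_ODDITY'
     and c.table_name=t.table_name and c.column_name='CODE';

10 x 2,000,000 / ~630,000 ≈ 32 rows, which is exactly the estimate in the plan above.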

So what does Automatic Indexing make of things? Let’s wait and have a look at the next Automatic Indexing report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 15:57:14
Activity end                 : 18-AUG-2020 15:58:10
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 35.65 MB (35.65 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 1
SQL statements improved (improvement factor)  : 1 (40984.3x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 40984.3x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
----------------------------------------------------------------------------
| Owner | Table        | Index                | Key  | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE       |
----------------------------------------------------------------------------

 

So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name  : BOWIE
SQL ID               : 19sv1g6tt0g1y
SQL Text             : select * from space_oddity where code=25
Improvement Factor   : 40984.3x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  5417408                       139265
CPU Time (s):      1771880                       7797
Buffer Gets:       327876                        5
Optimizer Cost:    11356                         5
Disk Reads:        649                           2
Direct Writes:     0                             0
Rows Processed:    16                            2
Executions:        8                             1

 

The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.

This is because the Automatic Indexing task uses Dynamic Sampling Level=11, meaning it determines a more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Index, if used, would be detrimental to performance and would not be used by the CBO.

BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.

BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.

BUT by default, including in Oracle’s Autonomous Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but to 2.

Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…
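You can easily check what a regular session is working with (OPTIMIZER_DYNAMIC_SAMPLING is the standard initialization parameter; on a default setup it will report 2):

SQL> show parameter optimizer_dynamic_sampling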

So if we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION

---------------------------------------------------------------------------------------------
- Original
-----------------------------

Plan Hash Value  : 2301175572
-----------------------------------------------------------------------------
| Id | Operation           | Name         | Rows | Bytes | Cost  | Time     |
-----------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |              |      |       | 11356 |          |
|  1 |   TABLE ACCESS FULL | SPACE_ODDITY |    3 |   423 | 11356 | 00:00:01 |
-----------------------------------------------------------------------------

- With Auto Indexes

-----------------------------
Plan Hash Value  : 54782313
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |    3 |   423 |    5 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY         |    3 |   423 |    5 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_82bdnqs7q8rtm |    2 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=25)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

The new plan for the SQL returning 2 rows uses the new Automatic Index and is much more efficient, with a significantly reduced cost (down from 11356 to just 5).

But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.

If we now rerun the SQL that returns many rows (and BEFORE High-Frequency Statistics Collection potentially kicks in):

SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000);

1000011 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    32 |  4512 |    35   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR                     |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY         |    32 |  4512 |    35   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | SYS_AI_82bdnqs7q8rtm |    32 |       |    12   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR
           "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     410422  consistent gets
          0  physical reads
          0  redo size
  145536076  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

Note that the cardinality estimate is still way, way wrong, thinking that just 32 rows are to be returned, when in fact 1000011 rows are returned.

As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).

One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048');

PL/SQL procedure successfully completed.

If we now re-run this SQL:

SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |   996K|   133M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |   996K|   133M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2678  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The cardinality estimate is now much more accurate and the execution plan now uses the more efficient FTS.

In Part II, we’ll look at how the Automatic Indexing task can be made to identify the dangers of a new index to SQLs that might degrade in performance, and how it will create a Baseline to protect against any such SQL regressions….

Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020

Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Index Access Path, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.

 

I’m going to expand just a tad on my previous posts on data skew and run a simple query that returns a few rows based on a column predicate AND another query on the same column that returns many rows.

The following table has a CODE column as with previous posts with the data heavily skewed:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll next collect statistics with NO histogram, as I don’t think they’re required at this point:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie_skew', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the table data:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

The value “7” only has 100 associated rows, while the value “10” is very common with 654,465 rows.

But I currently have no histograms:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

If I run the following query with a CODE=7 predicate just once:

SQL> select * from bowie_skew where code=7;

100 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100K|  1953K|   570   (7)| 00:00:01  |
|   1 |  PX COORDINATOR              |            |       |       |            |           |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01  |
|   3 |    PX BLOCK ITERATOR         |            |   100K|  1953K|   570   (7)| 00:00:01  |
|*  4 |     TABLE ACCESS STORAGE FULL| bowie_skew |   100K|  1953K|   570   (7)| 00:00:01  |
--------------------------------------------------------------------------------------------

 

It uses a Full Table Scan (the CBO has no choice without an index) AND hopelessly gets the cardinality estimate wrong, thinking 100K rows are going to be returned (and not the actual 100 rows). So the CBO is unlikely to use an index anyway, as it would be deemed too expensive to return so many rows.
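The 100K figure is just the uniform-distribution arithmetic the CBO falls back on without a histogram (a sketch using the standard dictionary views):

SQL> select t.num_rows, c.num_distinct,
            round(t.num_rows / c.num_distinct) estimated_rows
     from user_tables t, user_tab_col_statistics c
     where t.table_name='BOWIE_SKEW'
     and c.table_name=t.table_name and c.column_name='CODE';

1,000,000 rows / 10 distinct values = 100,000 rows per value, regardless of which CODE value we actually query.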

I’ll now run the following query many times on the CODE=10 predicate that returns many rows:

SQL> select * from bowie_skew where code=10;

654465 rows selected.

Execution Plan

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100K|  1953K|   570   (7)| 00:00:01  |
|   1 |  PX COORDINATOR              |            |       |       |            |           |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01  |
|   3 |    PX BLOCK ITERATOR         |            |   100K|  1953K|   570   (7)| 00:00:01  |
|*  4 |     TABLE ACCESS STORAGE FULL| bowie_skew |   100K|  1953K|   570   (7)| 00:00:01  |
--------------------------------------------------------------------------------------------

 

So again, no choice here but a FTS, and we likely wouldn’t want to use an index anyway as it would be just too expensive.

If we check out what the Automatic Indexing process has done with such a workload:

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : 6fm3m8cg2jnun
SQL Text            : select * from bowie_skew where code=7
Improvement Factor  : 46.6x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   36653                        1992
CPU Time (s):       33899                        967
Buffer Gets:        4291                         103
Optimizer Cost:     52                           4
Disk Reads:         0                            2
Direct Writes:      0                            0
Rows Processed:     100                          100
Executions:         1                            1

 

An Automatic Index on the CODE column is created (SYS_AI_7psvzc164vbng), with ONLY the SQL based on the CODE=7 predicate listed in the report. The other query is indeed too expensive for a new index to be viable and so isn’t listed.

If we look at the Plans Section of the Automatic Indexing report:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 410492785

--------------------------------------------------------------------------------------
| Id | Operation                 | Name       | Rows   | Bytes   | Cost | Time       |
--------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT          |            |        |         | 52   |            |
| 1  | TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 | 52   | 00:00:01   |
--------------------------------------------------------------------------------------

Notes
-----
- dop_reason = no expensive parallel operation
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 140816325

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      | 119  | 2380  | 4    | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           | 119  | 2380  | 4    | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_7psvzc164vbng | 100  |       | 3    | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=7)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

The important point to note here is that the cardinality estimates are relatively accurate despite there being no histograms at this stage, because the Automatic Indexing session uses Dynamic Sampling Level=11. Missing/inaccurate statistics are calculated on the fly, and this enables the session to accurately determine the size of the returned data set and that an index is indeed the more efficient access path.
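A regular session can mimic what the Automatic Indexing task sees by hinting the same level (an illustrative sketch; 11 is the documented AUTO level of the dynamic_sampling hint):

SQL> select /*+ dynamic_sampling(11) */ * from bowie_skew where code=7;

With the hint in place, the CODE=7 cardinality estimate should come out close to the actual 100 rows, even with no histogram on the column.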

So with mixed workloads, all it takes is one SQL statement, executed just once, that demonstrably improves thanks to an index for the associated Automatic Index to be created as a VISIBLE/VALID index:

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_SKEW';

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_7psvzc164vbng           YES VISIBLE   VALID       1000000        1537              8534

 

If we now run the query AFTER the histograms have been created thanks to High-Frequency Automatic Statistics Collection (see previous post), the new Automatic Index is now used:

SQL> select * from bowie_skew where code=7;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 140816325

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      | 100  | 2000  |       4 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           | 100  | 2000  |       4 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng | 100  |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=7)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
        104 consistent gets
          0 physical reads
          0 redo size
       2871 bytes sent via SQL*Net to client
        359 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
        100 rows processed

 

Note if the histogram is NOT yet collected, the CBO will not determine the correct cardinality estimate and will ignore the new Automatic Index (as previously discussed).

If we again run the query that returns many rows:

SQL> select * from bowie_skew where code=10;

654465 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 410492785

----------------------------------------------------------------------------------------
| Id | Operation                | Name       | Rows | Bytes | Cost (%CPU)| Time        |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |            |  654K|    12M|     52 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_SKEW |  654K|    12M|     52 (16)| 00:00:01    |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=10)
    filter("CODE"=10)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       3725 consistent gets
          0 physical reads
          0 redo size
    6549708 bytes sent via SQL*Net to client
       1790 bytes received via SQL*Net from client
        132 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     654465 rows processed

The new Automatic Index is correctly ignored by the CBO, as the query returns too many rows for the index to be viable.

So in this example, Automatic Indexing works exactly as it should. It creates a new Automatic Index for a query whose performance it will indeed improve, while other queries on the same column that return many more rows are also run. For these other queries, the new Automatic Index is correctly not used, as such an index would degrade their performance.

In my next post, I’ll look at the first example with data skew where Automatic Indexing can be problematic…

Oracle 19c Automatic Indexing: Data Skew Part II (Everything’s Alright) September 14, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Automatic Table Statistics, Autonomous Transaction Processing, Data Skew, Exadata, High Frequency Statistics Collection, Histograms, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning.

In my previous post, I discussed an example with data skew in which the Automatic Indexing process created a new index, and somehow the CBO, when using the index, arrived at the correct cardinality estimate even though no histograms were explicitly calculated.

In this post, I’ll answer HOW this is achieved by the CBO.

We can get some idea of the answer by now looking at the column details:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                     10 FREQUENCY
NAME                      1 NONE

We can see that there is now indeed a histogram on the CODE column. When and how was this histogram collected?

The answer lies with a new Oracle Database 19c feature called “High-Frequency Automatic Statistics Collection“, which is available on Exadata environments. As I’m running all these demos on the Oracle Autonomous Transaction Processing Cloud environment which runs on an Exadata platform, this feature is enabled by default.
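You can check whether the feature is currently enabled via the documented DBMS_STATS preference (a quick illustrative check):

SQL> select dbms_stats.get_prefs('AUTO_TASK_STATUS') from dual;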

To highlight the capabilities of this feature more fully, I’m going to set up a slightly different demo with three tables:

SQL> create table bowie1 (id number, code number, name varchar2(42));  <= No stats collected

Table created.

SQL> insert into bowie1 select rownum, mod(rownum, 100)+1, 'David Bowie' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

 

Table BOWIE1 has no statistics collected on it.

 

SQL> create table bowie2 (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie2 select rownum, mod(rownum, 100)+1, 'David Bowie' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2');

PL/SQL procedure successfully completed.

SQL> insert into bowie2 select rownum+100000, mod(rownum, 100)+1, 'Ziggy Stardust' from dual connect by level <= 50000;

50000 rows created.

SQL> commit;

Commit complete.

 

The BOWIE2 table has had new rows added after statistics were collected and so has “stale”, outdated stats.

 

SQL> create table bowie3 (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie3 select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie3 set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie3 set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie3 set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie3 set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie3 set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie3 set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie3 set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie3 set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie3 set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie3', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select code, count(*) from bowie3 group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

The BOWIE3 table is as in my previous example, with data skew but with NO histograms collected. I’m now going to run a query on BOWIE3 where the CBO gets the cardinality estimate hopelessly wrong because of the missing histogram on the CODE column:

SQL> select * from bowie3 where code=7;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2517725203

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   100K|  1953K|   974   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE3 |   100K|  1953K|   974   (2)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=7)

 

If we look at the current statistics on these tables:

 

SQL> select table_name, num_rows, stale_stats, notes from user_tab_statistics
where table_name in ('BOWIE1', 'BOWIE2', 'BOWIE3');

TABLE_NAME        NUM_ROWS STALE_S NOTES
--------------- ---------- ------- ------------------------------
BOWIE1
BOWIE2              100000 YES
BOWIE3             1000000 NO
BOWIE2              150000         STATS_ON_CONVENTIONAL_DML

 

We can see that BOWIE1 has indeed no statistics.

BOWIE2 is marked as having stale statistics, although thanks to another Oracle Database 19c feature called “Real-Time Statistics Collection“, it does have some additional statistics captured (such as NUM_ROWS) from when the additional rows were inserted. I’ll discuss this feature more fully in a later blog article.

BOWIE3 is considered fine in that it does have statistics which are NOT stale, BUT…

 

SQL> select column_name, num_buckets, histogram from user_tab_col_statistics
where table_name='BOWIE3';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

We don’t currently have any histograms, even though a simple single-table query with a CODE predicate was previously run with hopelessly inaccurate cardinality estimates.
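Oracle has, however, tracked the column usage from that query, which is what will later prompt the histogram. The documented DBMS_STATS.REPORT_COL_USAGE function shows this (an illustrative sketch; the report should flag an equality predicate against CODE):

SQL> select dbms_stats.report_col_usage(user, 'BOWIE3') from dual;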

If we wait approximately 15 minutes (the default interval) for the High-Frequency Automatic Statistics Collection process to run and then look at these statistics again:

SQL> select table_name, num_rows, stale_stats from user_tab_statistics
where table_name in ('BOWIE1', 'BOWIE2', 'BOWIE3');

TABLE_NAME        NUM_ROWS STALE_S
--------------- ---------- -------
BOWIE1              100000 NO
BOWIE2              150000 NO
BOWIE3             1000000 NO

SQL> select column_name, num_buckets, histogram from user_tab_col_statistics where table_name='BOWIE3';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                     10 FREQUENCY
NAME                      1 NONE

 

We now notice that:

BOWIE1 now has statistics captured, as the High-Frequency Automatic Statistics Collection process looks for tables with missing statistics.

BOWIE2 now has fully up to date statistics, as the High-Frequency Automatic Statistics Collection process looks for tables with stale statistics.

BOWIE3 now has a histogram on the CODE column, as the High-Frequency Automatic Statistics Collection process looks out for missing histograms if queries have subsequently been run with poor cardinality estimates.

Having more accurate, appropriate and up-to-date statistics supports the CBO in making much better decisions in relation to the use of any newly created Automatic Indexes.

 

You can configure High-Frequency Automatic Statistics Collection in the following manner:

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_STATUS','ON');

PL/SQL procedure successfully completed.

This turns the feature ON/OFF. It’s OFF by default in standard Exadata environments but ON by default in Autonomous Database environments.

 

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_MAX_RUN_TIME','900');

PL/SQL procedure successfully completed.

This configures how long to allow the process to run (default is 3600 seconds/60 minutes).

 

SQL> EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_TASK_INTERVAL','900');

PL/SQL procedure successfully completed.

This configures the interval between the process running (default is every 900 seconds/15 minutes).
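To see when the task has actually run, the DBA_AUTO_STAT_EXECUTIONS view can be queried (a sketch; this view accompanies the feature in 19c):

SQL> select opid, status, start_time, completion_time
     from dba_auto_stat_executions order by opid desc;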

 

In my next post, I’ll look at a slightly more complex data skew example with Automatic Indexing, where both selective and unselective SQL predicates are invoked…

METHOD_OPT=> SIZE AUTO Quiz Solution (The Trickster) September 1, 2011

Posted by Richard Foote in CBO, Histograms, Oracle Indexes, Oracle Statistics.

I was going to leave it for a few days but there have already been so many comments and discussions on all this, I thought I better write something up. In case anyone was wondering, yes I probably am driving my colleagues at work mad with my “Question of the Day” !!

Unfortunately, some might be disappointed at both Oracle and myself 🙂

Yes, I did kinda set things up to trick the unwary and yes, perhaps the answer isn’t what many are expecting.

The answer to my previous question of which column is going to have a histogram when using the METHOD_OPT SIZE AUTO option is in fact Column 2. Well done to everyone who got it right.

Why ?

The simplest answer is because it’s the only column of the three that has 254 or less distinct values.

Here’s the key point. When using METHOD_OPT SIZE AUTO, every column with 254 or fewer distinct values that has been referenced within a predicate will have a Frequency-based histogram. Each and every one of them, regardless of whether the data is actually skewed or not. So Column 2, with only 254 distinct values AND having previously been referenced in a predicate, was guaranteed to have a histogram.

If a column has more than 254 distinct values, whether it then has a Height-Based histogram depends on how the data is skewed. If the data is perfectly evenly distributed, then it won’t have a histogram. Column 1, having sequence-based unique values, does not meet the criteria and so will not have a histogram.

Column 3 is interesting. Having inserted the outlier value, it now has 255 distinct values and so no longer qualifies for an automatic Frequency-based histogram. However, if all its values are evenly distributed, then it won’t qualify for a Height-based histogram either, and Column 3 has just the one outlier value; all its other values are evenly distributed. Unfortunately, Oracle doesn’t pick up on rare outlier values (even if you collect 100% statistics and it’s one of the low/high points of the column) and so will not generate a Height-based histogram.

The only column that qualifies is Column 2.

A demo to illustrate. First, let’s create and populate our table:

SQL> create table bowie (id number, code1 number, code2 number);

Table created.

SQL> insert into bowie select rownum, mod(rownum,254), mod(rownum,254) from dual  connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

Notice I’m using a MOD function to generate a perfectly even distribution of data. I’ve noticed a few examples (such as that by Charles Hooper in the comments of the Quiz posting) in which the DBMS_RANDOM function is used. Note this will almost certainly generate data with enough natural skewness on a 1M-row table with 254 random values that, when the outlier 255th value is introduced, it will qualify for a height-based histogram. It’s a very easy thing to test and find out: simply generate the 1M rows with 255 random values and I suggest a height-based histogram will be created regardless.

OK, I’ll run some SQL to generate sufficient workload to qualify the columns for automatic histograms:

SQL> select * from bowie where id = 42;
SQL> select * from bowie where code1 = 42;
SQL> select * from bowie where code2 = 42;

BTW, the difference between the SIZE AUTO and SIZE SKEWONLY options is that AUTO requires previous workload to suggest a histogram might be relevant, while SKEWONLY does not.
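For reference, the two variants look like this (illustrative only, against our demo table):

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE AUTO');

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY');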

If we were to collect statistics at this stage, we would notice that the second and third columns both have a Frequency-Based histogram as both columns only have 254 distinct values and so automatically qualify:

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE', estimate_percent=> null, cascade=>true);

PL/SQL procedure successfully completed.

SQL> select column_name, histogram from dba_tab_columns where table_name = 'BOWIE';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
ID                             NONE
CODE1                          FREQUENCY
CODE2                          FREQUENCY

If we were to run a query using the third column, notice how the cardinality estimates aren’t too bad in this example:

SQL> select * from bowie where code2 > 600;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |    13 |   660   (2)| 00:00:08 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |    13 |   660   (2)| 00:00:08 |
---------------------------------------------------------------------------

There are no rows that are greater than 600 and so an estimate of 1 isn’t too bad at all.

OK, let’s add in this one, tiny little row and collect fresh, 100% accurate statistics (Note: the accurate statistics are very important, as Niall’s examples have demonstrated):


SQL> insert into bowie values (1000001, 42, 99999999);

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE', estimate_percent=> null, cascade=>true);

PL/SQL procedure successfully completed.

SQL> select column_name, histogram from dba_tab_columns where table_name = 'BOWIE';

COLUMN_NAME                    HISTOGRAM
------------------------------ ---------------
ID                             NONE
CODE1                          FREQUENCY
CODE2                          NONE

Note that the third column now has 255 distinct values and so no longer qualifies for the automatic Frequency-Based histogram. As most of its data is perfectly evenly distributed with just the one outlier value, the column doesn’t qualify for a Height-based histogram either and so now has no histogram at all.

Note as I collected 100% accurate statistics, Oracle is definitely aware of this outlier value:

SQL> select column_name, low_value, high_value from dba_tab_columns where table_name='BOWIE' and column_name='CODE2';

COLUMN_NAME  LOW_VALUE  HIGH_VALUE
------------ ---------- ------------
CODE2        80         C464646464

SQL> var high_num number
SQL> exec dbms_stats.convert_raw_value('C464646464',:high_num);

PL/SQL procedure successfully completed.

SQL> print high_num

  HIGH_NUM
----------
  99999999

But it’s not enough for Oracle to automatically generate a histogram. Which is a shame really, because now we can have all sorts of problems:

SQL> select * from bowie where code2 > 600;
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   999K|    12M|   660   (2)| 00:00:08 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   999K|    12M|   660   (2)| 00:00:08 |
---------------------------------------------------------------------------

When previously it had the cardinality estimates spot on, now they’re terrible (expecting not 1 row but 999,000 rows !!) because, without a histogram, Oracle assumes an even distribution between the column’s low and high point values.
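The 999K figure falls out of the standard no-histogram range selectivity, which is a simple linear interpolation between the column low and high values (a simplified sketch of the formula):

  selectivity = (high_value - 600) / (high_value - low_value)
              = (99999999 - 600) / (99999999 - 0)  ≈ 0.999994

  rows ≈ 1000001 x 0.999994 ≈ 999,995 (displayed as 999K)

With the outlier as the high value, virtually the entire assumed range sits above 600, even though no actual rows do.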

I’m not a great fan of either the SIZE AUTO or SIZE SKEWONLY options 😉

Hope you’re enjoying these little quizzes, I’ll have another one for you all soon.

METHOD_OPT => SIZE AUTO Quiz (Automatic For The People) August 31, 2011

Posted by Richard Foote in Method_Opt Size AUTO, Oracle Indexes, Oracle Statistics.

OK, a nice, easy, simple one today. No tricks, honest 😉

You have a table with 3 columns and lets say 1M rows.

Column 1 is effectively unique, so it has 1M distinct values. Let’s say 1 – 1000000, not that it really matters.

Column 2 has 254 distinct values, all evenly distributed so it has approximately the same number of rows for each value. Let’s say the values are 0-253; again, it doesn’t really matter.

Column 3 is identical to Column 2: it also has 254 distinct values, all evenly distributed as well, such that it also has approximately the same number of rows for each value. Let’s say the range of values is the same, 0-253; again, it doesn’t really matter.

You have various queries in the database in which all 3 columns are referenced somewhere in WHERE conditions (eg. WHERE Column1 = 42).

You then insert just one row that has the following values based on our example: VALUES (1000001, 42, 99999999).

The key point here is that for Column 1, it’s just another unique value, just 1 greater than the previous maximum value. Nothing special.

For Column2, it’s just another of the existing 254 values that doesn’t really change the even distribution of the data. Nothing special.

However, for Column 3, it’s not only a new value that didn’t previously exist (and so there’s just the one row with this value in the data, whereas all the other values correspond to roughly 1/254 of the rows) but it’s also a value that is way way way outside the normal range of existing values (nothing comes close to having a value of 99999999).

OK, we have the scenario, hopefully you can see where I going with this.

You decide to collect fresh statistics with DBMS_STATS, and you want them to be completely accurate so you use a 100% sample size (or compute with estimate_percent=>null). But because you want to get with the AUTO program and make life easier for yourself, you decide to let Oracle work out which columns might require and benefit from a histogram by using METHOD_OPT=>'FOR ALL COLUMNS SIZE AUTO'.

Now finally comes the question. Of the three columns, only one will have a histogram. Which one, and why is it so?

If you understand how Oracle collects statistics, the answer will hopefully be obvious 😉