jump to navigation

Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022

Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.
add a comment

In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.

In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.

To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:

SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id);

Index created.

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor

FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I';

INDEX_NAME                        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
----------------------------- ---------- ----------- -----------------
RADIOHEAD_CODE_ID_I                    1         265             98619

If we now re-run the previous query:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_CODE_ID_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.

This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).

Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).

If we now look at the associated costings:

Effective Index Selectivity = CODE selectivity x ID selectivity

= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))

= 0.0001 x ((4000/9999) + 0.0002)

= 0.0001 x 0.40024)

= 0.000040024

Effective Table Selectivity = same as Index Selectivity

= 0.000040024

 

The effective index selectivity of 0.000040024 is now much lower than the previous (0.40024), as the CBO can now consider the product of the selectivities of both columns).

If we now plug this improved effective index selectivity into the index path costing calculations:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)

= 1 + 1  + 4

= 2 + 4

= 6

Index Access Cost  = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)

= 2 + 4

= 6

We can see how the respective 2 and 6 improved CBO index costings are derived.

So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…

Costing Concatenated Indexes With Range Scan Predicates Part I (Nothing To Be Desired) July 22, 2022

Posted by Richard Foote in BLEVEL, CBO, Clustering Factor, Concatenated Indexes, Index Access Path, Index Column Order, Index Column Reorder, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Richard Foote Consulting, Richard Foote Training, Richard's Blog.
1 comment so far

In my previous post, I discussed how Automatic Indexing ordered columns when derived from SQLs containing both equality and non-equality predicates.

I’ve since had offline questions asking why indexes are more effective with leading columns addressing the equality predicates rather than the leading columns addressing non-equality predicates. Based on the theory that for everyone who asks a question, there are likely numerous others wondering the same thing, I thought I’ll try to explain things with these posts.

I’ll start by creating the following simple table that has two columns (ID and CODE) that are both highly selective (they both have 10,000 distinct values in a 100,000 rows table, so 10 rows approximately per value):

SQL> CREATE TABLE radiohead (id NUMBER, code NUMBER, name VARCHAR2(42));

Table created.

SQL> INSERT INTO radiohead SELECT mod(rownum,10000)+1,

ceil(dbms_random.value(0,10000)), 'RADIOHEAD' FROM dual CONNECT BY LEVEL <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

I’ll next create an index based on the ID, CODE columns, with importantly the ID column as the leading column:

SQL> CREATE INDEX radiohead_id_code_i ON radiohead(id, code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'RADIOHEAD',

estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

When it comes to costing index accesses, some of the crucial statistics including the Blevel, Leaf_Blocks and often most crucial of all, the Clustering_Factor:

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'RADIOHEAD_ID_CODE_I';

INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
RADIOHEAD_ID_CODE_I           1         265             99034

 

We begin by running the following query, with an equality predicate on the ID column and a relatively large, non-selective range predicate on the CODE column:

SQL> SELECT * FROM radiohead WHERE id = 42 AND CODE BETWEEN 1000 AND 5000;

Execution Plan
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_ID_CODE_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        824  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

As (perhaps) expected, the CBO uses the index to retrieve the small number of rows (just 5 rows).

However, if we run the following query which also returns a small number of rows  (just 4 rows) BUT with the relatively unselective, non-equality predicate based on the leading indexed ID column:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     4 |    72 |   105  (11)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     4 |    72 |   105  (11)| 00:00:01 |
-------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        363  consistent gets
          0  physical reads
          0  redo size
        770  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice (perhaps unexpectedly) that the CBO now ignores the index and uses a Full Table Scan, even though only 4 rows are returned from a 100,000 row table.

This is a common area of confusion. Why does Oracle not use the index when both columns in the index are referenced in the SQL predicates and only a tiny number of rows are returned?

The answer comes down to the very unselective non-equality predicate (ID BETWEEN 1000 AND 5000) being serviced by the leading column (ID) of the index.

The “ID BETWEEN 1000 AND 5000” predicate basically covers 40% of all known ID values, which means Oracle must now read 40% of all Leaf Blocks within the index (one leaf block at a time), starting with ID =1000 and ending with ID = 5000. Although there are very few rows that then subsequently match up with the other (CODE = 140) predicate based on the second column (CODE) of the index, these relatively few values could exist anywhere within the 40% ID range.

Therefore, when costing the reading of the actual index, the CBO basically stops its calculations after the non-equality predicate on this leading ID column and indeed estimates that a full 40% of the index itself must be scanned.

If we force the CBO into a range scan via a basic index hint:

SQL> SELECT /*+ index(r) */ * FROM radiohead r WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |   116   (4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |   116   (4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_ID_CODE_I |     4 |       |   112   (4)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        114  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice that the overall cost of this index based plan is 116, greater than the 105 cost of the Full Table Scan (and hence why the Full Table Scan was selected). We also notice that the vast majority of this 116 cost can be attributed to the index scan itself in the plan, which has a cost of 112.

If you have a calculator handy, this is basically how these costs are derived.

Range Selectivity = (Max Range Value–Min Range Value)/(Max Column Value–Min Column Value)

Effective Index Selectivity = Range Selectivity + 2 x ID density (as a BETWEEN clause was used which is inclusive of Min/Max range)

= (5000-1000)/(10000-1) + 2 x (1/10000)

= 0.40004 + 0.0002

= 0.40024

Effective Table Selectivity = ID selectivity (as above) x CODE selectivity

= 0.40024 x (1/10000)

= 0.40024 x 0.0001

= 0.000040024

These selectivities are then inserted into the following index costing formula:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.40024 x 265) + ceil(0.000040024 x 99034)

= 1 + 107 + 4

= 108 + 4 = 112.

 

Index Access Cost = IO Costs + CPU Costs (in this plan, 4% of total costs)

= (108 + (112 x 0.04)) + (4 + (4 x 0.04))

= (108 + 4) + (4 + 0)

= 112 + 4

= 116

 

So we can clearly see how the CBO has made its calculations, come up with its costs and has decided that the Full Table Scan is indeed the cheaper alternative with the current index in place.

So Automatic Indexing is doing the right thing, by creating an index with the leading column based on the equality predicate and the second indexed column based on the unselective non-equality predicate.

I’ll expand on this point in an upcoming Part II post.

Oracle Database 19c Automatic Indexing – Indexed Column Reorder (What Shall We Do Now?) December 18, 2019

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Index Column Order, Index Column Reorder.
2 comments

 

I previously discussed how the default column order of an Automatic Index (in the absence of other factors) is based on the Column ID, the order in which the columns are defined in the table.

But what if there are “other factors” based on new workloads and the original index column order is no longer optimal or appropriate ?

I’ll begin by creating a table, with the following key column defined in CODE1, CODE2, CODE3 order:

 

SQL> create table major_tom (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom select rownum, mod(rownum, 10)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 1000)), 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density from user_tab_columns where table_name='MAJOR_TOM';

COLUMN_NAME          NUM_DISTINCT    DENSITY
-------------------- ------------ ----------
ID                        9914368 1.0086E-07
CODE1                          10  .00000005
CODE2                         100  .00000005
CODE3                       1000        .001
NAME                           1           1

If I now run the following query with predicates based on these three columns:

SQL> select * from major_tom where code3=42 and code2=42 and code1=4;

15 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    10 |   280 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  |    10 |   280 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           |    10 |   280 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM |    10 |   280 |  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

     4 - storage("CODE3"=42 AND "CODE2"=42 AND "CODE1"=4)
         filter("CODE3"=42 AND "CODE2"=42 AND "CODE1"=4)

Statistics
----------------------------------------------------------

        34 recursive calls
         5 db block gets
     45861 consistent gets
         0 physical reads
      1044 redo size
      1087 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
        15 rows processed

 

After the default 15 minutes period in which the Automatic Index task is run, if we look at what Automatic Index has been created:

 

INDEX DETAILS

-------------------------------------------------------------------------------

  1. The following indexes were created:

--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_9mrs058nrg9d5 | CODE1,CODE2,CODE3 | B-TREE | NONE       |
--------------------------------------------------------------------------------------

VERIFICATION DETAILS

-------------------------------------------------------------------------------
 	1. The performance of the following statements improved: 
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : ayuj12hggwrvc
SQL Text             : select * from major_tom where code3=42 and code2=42 and code1=4
Improvement Factor   : 45853.8x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------

Elapsed Time (s):  3103394                       946
CPU Time (s):      3092860                       1017
Buffer Gets:       596102                        18
Optimizer Cost:    7354                          18
Disk Reads:        0                             2
Direct Writes:     0                             0
Rows Processed:    195                           15
Executions:        13                            1

 

We can see Oracle has indeed created an Automatic Index (SYS_AI_9mrs058nrg9d5) in the default CODE1, CODE2, CODE3 order.

But if we now run a new query, based on a predicate on just the CODE3 column:

 

SQL> select * from major_tom where code3=42; 

9961 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           | 10000 |   273K|  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  | 10000 |   273K|  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           | 10000 |   273K|  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM | 10000 |   273K|  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

     4 - storage("CODE3"=42)
         filter("CODE3"=42)

Statistics
----------------------------------------------------------
         8 recursive calls
         4 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
    166137 bytes sent via SQL*Net to client
       599 bytes received via SQL*Net from client
         3 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
      9961 rows processed

 

We can see the CBO has NOT used the index, as the leading column of the existing index is not mentioned in the SQL predicate and the CBO deems an Index Skip Scan as too expensive:

If we now run an SQL with predicates based on just the CODE2 and CODE3 columns:

 

SQL> select * from major_tom where code3=42 and code2=42;

101 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   100 |  2800 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  |   100 |  2800 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           |   100 |  2800 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM |   100 |  2800 |  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

     4 - storage("CODE3"=42 AND "CODE2"=42)
         filter("CODE3"=42 AND "CODE2"=42)

Statistics
----------------------------------------------------------

         6 recursive calls
         0 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
      2281 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
       101 rows processed

 

The existing Automatic Index is again not used as the important CODE1 column which is the leading column of the index is not mentioned in the SQL predicates and the CBO deems an Index Skip Scan as too expensive.

I know from experience many DBAs would simply create a new index with CODE3, CODE2 as the leading columns. But what does Automatic Indexing do in the scenario?

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------

Activity start               : 21-JUN-2019 02:39:32
Activity end                 : 21-JUN-2019 02:40:18
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                       : 1
Indexes created (visible / invisible)  : 1 (1 / 0)
Space used (visible / invisible)       : 243.27 MB (243.27 MB / 0 B)
Indexes dropped (space reclaimed)      : 1 (243.27 MB)
SQL statements verified                : 1
SQL statements improved                : 0
SQL plan baselines created             : 0
Overall improvement factor             : 0x

-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------

Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

We notice in the Automatic Indexing report it’s stating that one new index has been created BUT that one index has also been dropped.

 

INDEX DETAILS
-------------------------------------------------------------------------------

         1. The following indexes were created:

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_00hxxxkgb821n | CODE3,CODE2,CODE1 | B-TREE | NONE       |
--------------------------------------------------------------------------------------
 
 	1. The following indexes were dropped: 

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_9mrs058nrg9d5 | CODE1,CODE2,CODE3 | B-TREE | NONE       |
--------------------------------------------------------------------------------------
 


SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM' order by column_position;

INDEX_NAME           COLUMN_NAME     COLUMN_POSITION
-------------------- --------------- ---------------
SYS_AI_00hxxxkgb821n CODE3                         1
SYS_AI_00hxxxkgb821n CODE2                         2
SYS_AI_00hxxxkgb821n CODE1                         3

 

Automatic Indexing has created a new index (SYS_AI_00hxxxkgb821n), with the columns in CODE3, CODE2, CODE1 order, as this index is able to service all currently known SQL predicate combinations:

WHERE CODE3 = x AND CODE2 = y AND CODE1= z

WHERE CODE3 = x AND CODE2 = y

WHERE CODE3 = x

as the leading column in the index is listed in all three current scenarios.

 

This means the previous index in CODE1, CODE2, CODE3 order is now redundant as the new index can fully service all know SQL predicate combinations. As a result, Automatic Indexing drops the redundant index.

This is a really nice capability of Automatic Indexing, the ability to effectively reorder the columns within an index based on new workloads.

But what if subsequent new SQL workloads means the new index is not able on its own to service all such workloads. I’ll discuss this scenario in my next post.