jump to navigation

Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022

Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.
add a comment

In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.

In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.

To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:

SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id);

Index created.

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor

FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I';

INDEX_NAME                        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
----------------------------- ---------- ----------- -----------------
RADIOHEAD_CODE_ID_I                    1         265             98619

If we now re-run the previous query:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_CODE_ID_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.

This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).

Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).

If we now look at the associated costings:

Effective Index Selectivity = CODE selectivity x ID selectivity

= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))

= 0.0001 x ((4000/9999) + 0.0002)

= 0.0001 x 0.40024)

= 0.000040024

Effective Table Selectivity = same as Index Selectivity

= 0.000040024

 

The effective index selectivity of 0.000040024 is now much lower than the previous (0.40024), as the CBO can now consider the product of the selectivities of both columns).

If we now plug this improved effective index selectivity into the index path costing calculations:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)

= 1 + 1  + 4

= 2 + 4

= 6

Index Access Cost  = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)

= 2 + 4

= 6

We can see how the respective 2 and 6 improved CBO index costings are derived.

So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…

Costing Concatenated Indexes With Range Scan Predicates Part I (Nothing To Be Desired) July 22, 2022

Posted by Richard Foote in BLEVEL, CBO, Clustering Factor, Concatenated Indexes, Index Access Path, Index Column Order, Index Column Reorder, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Richard Foote Consulting, Richard Foote Training, Richard's Blog.
1 comment so far

In my previous post, I discussed how Automatic Indexing ordered columns when derived from SQLs containing both equality and non-equality predicates.

I’ve since had offline questions asking why indexes are more effective with leading columns addressing the equality predicates rather than the leading columns addressing non-equality predicates. Based on the theory that for everyone who asks a question, there are likely numerous others wondering the same thing, I thought I’ll try to explain things with these posts.

I’ll start by creating the following simple table that has two columns (ID and CODE) that are both highly selective (they both have 10,000 distinct values in a 100,000 rows table, so 10 rows approximately per value):

SQL> CREATE TABLE radiohead (id NUMBER, code NUMBER, name VARCHAR2(42));

Table created.

SQL> INSERT INTO radiohead SELECT mod(rownum,10000)+1,

ceil(dbms_random.value(0,10000)), 'RADIOHEAD' FROM dual CONNECT BY LEVEL <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

I’ll next create an index based on the ID, CODE columns, with importantly the ID column as the leading column:

SQL> CREATE INDEX radiohead_id_code_i ON radiohead(id, code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'RADIOHEAD',

estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

When it comes to costing index accesses, some of the crucial statistics including the Blevel, Leaf_Blocks and often most crucial of all, the Clustering_Factor:

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'RADIOHEAD_ID_CODE_I';

INDEX_NAME               BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
RADIOHEAD_ID_CODE_I           1         265             99034

 

We begin by running the following query, with an equality predicate on the ID column and a relatively large, non-selective range predicate on the CODE column:

SQL> SELECT * FROM radiohead WHERE id = 42 AND CODE BETWEEN 1000 AND 5000;

Execution Plan
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_ID_CODE_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        824  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

As (perhaps) expected, the CBO uses the index to retrieve the small number of rows (just 5 rows).

However, if we run the following query which also returns a small number of rows  (just 4 rows) BUT with the relatively unselective, non-equality predicate based on the leading indexed ID column:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     4 |    72 |   105  (11)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     4 |    72 |   105  (11)| 00:00:01 |
-------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        363  consistent gets
          0  physical reads
          0  redo size
        770  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice (perhaps unexpectedly) that the CBO now ignores the index and uses a Full Table Scan, even though only 4 rows are returned from a 100,000 row table.

This is a common area of confusion. Why does Oracle not use the index when both columns in the index are referenced in the SQL predicates and only a tiny number of rows are returned?

The answer comes down to the very unselective non-equality predicate (ID BETWEEN 1000 AND 5000) being serviced by the leading column (ID) of the index.

The “ID BETWEEN 1000 AND 5000” predicate basically covers 40% of all known ID values, which means Oracle must now read 40% of all Leaf Blocks within the index (one leaf block at a time), starting with ID =1000 and ending with ID = 5000. Although there are very few rows that then subsequently match up with the other (CODE = 140) predicate based on the second column (CODE) of the index, these relatively few values could exist anywhere within the 40% ID range.

Therefore, when costing the reading of the actual index, the CBO basically stops its calculations after the non-equality predicate on this leading ID column and indeed estimates that a full 40% of the index itself must be scanned.

If we force the CBO into a range scan via a basic index hint:

SQL> SELECT /*+ index(r) */ * FROM radiohead r WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |   116   (4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |   116   (4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_ID_CODE_I |     4 |       |   112   (4)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        114  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice that the overall cost of this index based plan is 116, greater than the 105 cost of the Full Table Scan (and hence why the Full Table Scan was selected). We also notice that the vast majority of this 116 cost can be attributed to the index scan itself in the plan, which has a cost of 112.

If you have a calculator handy, this is basically how these costs are derived.

Range Selectivity = (Max Range Value–Min Range Value)/(Max Column Value–Min Column Value)

Effective Index Selectivity = Range Selectivity + 2 x ID density (as a BETWEEN clause was used which is inclusive of Min/Max range)

= (5000-1000)/(10000-1) + 2 x (1/10000)

= 0.40004 + 0.0002

= 0.40024

Effective Table Selectivity = ID selectivity (as above) x CODE selectivity

= 0.40024 x (1/10000)

= 0.40024 x 0.0001

= 0.000040024

These selectivities are then inserted into the following index costing formula:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.40024 x 265) + ceil(0.000040024 x 99034)

= 1 + 107 + 4

= 108 + 4 = 112.

 

Index Access Cost = IO Costs + CPU Costs (in this plan, 4% of total costs)

= (108 + (112 x 0.04)) + (4 + (4 x 0.04))

= (108 + 4) + (4 + 0)

= 112 + 4

= 116

 

So we can clearly see how the CBO has made its calculations, come up with its costs and has decided that the Full Table Scan is indeed the cheaper alternative with the current index in place.

So Automatic Indexing is doing the right thing, by creating an index with the leading column based on the equality predicate and the second indexed column based on the unselective non-equality predicate.

I’ll expand on this point in an upcoming Part II post.

Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”) July 14, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, Index Column Order, Invisible Indexes, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning, Richard Foote Training, Richard's Blog.
3 comments

I’m currently putting together some Exadata related training for a couple of customers and came across a rather strange anomaly with regard the status of Automatic Indexes, when created in part on unselective, non-equality predicates.

As discussed previously, Oracle Database 21c now allows the creation of Automatic Indexes based on non-equality predicates (previously, Automatic Indexes were only created on equality-based predicates).

But one appears to get rather odd resultant Automatic Indexes in the scenario where the non-equality predicate is not particularly selective but other predicates are highly selective.

To illustrate, I’ll create a basic table that has two columns (ID and CODE) that are both highly selective:

SQL> create table ziggy_new (id number, code number, name varchar2(42));

Table created.

SQL> insert into ziggy_new select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_NEW');

PL/SQL procedure successfully completed.

So there are currently no indexes on this table.

I’ll next run the following SQL (and others similar) a number of times:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
       filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38605 consistent gets
      38600 physical reads
          0 redo size
        725 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

Without any indexes, the CBO currently has no choice but to use a Full Table Scan.

But only 1 row is returned. The first equality predicate on the CODE column is highly selective and on its own would only return 10 rows out of the 10M row table. The second, non-equality range-based predicate on the ID column is nowhere near as selective and offers limited additional filtering.

The CBO stops calculating index related costs after a non-equality predicate column (as subsequent column values could exist anywhere within the preceding range), and so the more effective index here is one based on (CODE, ID) with the non-equality predicate column second,  or potentially just on the CODE column only, as the ID range offers minimal filtering benefits.

So what does Automatic Indexing make of things?

If we look at the subsequent Automatic Indexing report:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 3
Indexes created (visible / invisible)        : 1 (0 / 1)
Space used (visible / invisible)             : 209.72 MB (0 B / 209.72 MB)
Indexes dropped                              : 0
SQL statements verified                      : 44
SQL statements improved (improvement factor) : 12 (64.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.6x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------

So Automatic Indexing has indeed created an index based on CODE,ID (a common Automatic Indexing trait appears to be to create an index based on all available predicates).

BUT the index is created as an INVISIBLE Index and so can not generally be used by database sessions.

SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='ZIGGY_NEW';

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES INVISIBLE VALID      10000000       25123          10000000

SQL> select index_name, column_name, column_position
     from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

 

So re-running the previous SQL statements continues to use a Full Table Scan:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1)
       filter("CODE"=42 AND "ID"<=100000 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38605 consistent gets
      38600 physical reads
          0 redo size
        725 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

 

Automatic Indexing appears to only create Invisible indexes when there is an inefficient non-equality predicate present. It won’t create the index as a Visible index, even though it would significantly benefit these SQL statements that caused its creation. And Automatic Indexing won’t create an index on just the highly selective CODE equality predicate, which would also be of much benefit to these SQL statements.

If we now run similar queries, but with much more selective non-equality predicates, such as:

SQL> select * from ziggy_new where code=1 and id between 1 and 10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3165184525

----------------------------------------------------------------------------------------
|  Id | Operation                  | Name      | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |    1 |    23 |    6738 (2) | 00:00:01 |
| * 1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 |    6738 (2) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("CODE"=1 AND "ID"<=10 AND "ID">=1)
       filter("CODE"=1 AND "ID"<=10 AND "ID">=1)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      38604 consistent gets
      38600 physical reads
          0 redo size
        503 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

Again, with no (Visible) index present, the CBO currently has no choice but to use the Full Table Scan.

But during the next cycle, after Automatic Indexing kicks in again:

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 5
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 209.72 MB (209.72 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 89
SQL statements improved (improvement factor) : 31 (71.9x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 1.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
----------------------------------------------------------------------------
| Owner | Table     | Index                | Key     | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE       |
----------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : d4znwcu4h52ca
SQL Text            : select * from ziggy_new where code=42 and id between 1 and 10
Improvement Factor  : 38604x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   3398605                      68
CPU Time (s):       3166824                      68
Buffer Gets:        463250                       3
Optimizer Cost:     6738                         4
Disk Reads:         463200                       0
Direct Writes:      0                            0
Rows Processed:     0                            0
Executions:         12                           1

PLANS SECTION
--------------------------------------------------------------------------------
-------------

- Original
-----------------------------
Plan Hash Value : 3165184525

--------------------------------------------------------------------------------
| Id | Operation                  | Name      | Rows | Bytes | Cost | Time     |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |           |      |       | 6738 |          |
|  1 |  TABLE ACCESS STORAGE FULL | ZIGGY_NEW |    1 |    23 | 6738 | 00:00:01 |
--------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 1514586396

-------------------------------------------------------------------------------------------------------
|  Id | Operation                            | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                      |    1 |    23 |    4 | 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |    4 | 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

But this time, the index on the CODE,ID columns is created as a Visible index.

INDEX_NAME                     AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --------- -------- ---------- ----------- -----------------
SYS_AI_75j16xff1ag3j           YES VISIBLE   VALID      10000000       25123          10000000

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW';

INDEX_NAME                     COLUMN_NAME  COLUMN_POSITION
------------------------------ ------------ ---------------
SYS_AI_75j16xff1ag3j           CODE                       1
SYS_AI_75j16xff1ag3j           ID                         2

So this index can be generally used, both by the newer SQLs that generated the now Visible index:

SQL> select * from ziggy_new where code=42 and id between 1 and 10;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          3 consistent gets
          0 physical reads
          0 redo size
        503 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

And also used by the SQLs with the unselective non-equality predicates, that Automatic Indexing would only create as Invisible indexes:

SQL> select * from ziggy_new where code=42 and id between 1 and 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1514586396

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                      |    1 |    23 |       4 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW            |    1 |    23 |       4 (0) | 00:00:01 |
|* 2 |   INDEX RANGE SCAN                   | SYS_AI_75j16xff1ag3j |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42 AND "ID">=1 AND "ID"<=100000)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          5 consistent gets
          0 physical reads
          0 redo size
        729 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

 

Automatic Indexing appears to currently not quite do the right thing with SQL statements that have unselective non-equality predicates, by creating such indexes as only Invisible Indexes, inclusive of the unselective columns.

Although an edge case, I would recommend looking through the list of created Automatic Indexes to see if any such Invisible/Valid indexes exists, as it can suggest there are current inefficient SQL statements that could benefit from such indexes being Visible.

Automatic Indexes: Automatically Rebuild Unusable Indexes Part IV (“Nothing Has Changed”) May 31, 2022

Posted by Richard Foote in 19c, 19c New Features, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Column Order, Index Internals, Local Indexes, Mixing Auto and Manual Indexes, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle19c, Unusable Indexes.
1 comment so far

In a previous post, I discussed how Automatic Indexing (AI) does not automatically rebuild a manually built index that is in an Unusable state (but will rebuild an Unusable automatically created index).

The demo I used was a simple one, based on manually created indexes referencing a non-partitioned table.

In this post, I’m going to use a demo based on manually created indexes referencing a partitioned table.

I’ll start by creating a rather basic range-based partitioned table, using the RELEASE_DATE column to partition the data by year:

SQL> CREATE TABLE big_bowie (id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2020 VALUES LESS THAN (TO_DATE('01-JAN-2021', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2021 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_bowie SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE');

PL/SQL procedure successfully completed.

I’ll next manually create a couple indexes; a non-partitioned index based on just the ALBUM_ID column and a prefixed locally partitioned index, based on the columns RELEASE_DATE, TOTAL_SALES:

 

SQL> create index album_id_i on big_bowie(album_id);

Index created.

SQL> create index release_date_total_sales_i on big_bowie(release_date, total_sales) local;

Index created.

 

If we now re-organise just partition ALBUMS_2017 (without using the ONLINE clause):

SQL> alter table big_bowie move partition albums_2017;

Table altered.

This results in the non-partitioned index and the ALBUMS_2017 local index partition becoming Unusable:

SQL> select index_name, status from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS
------------------------------ --------
ALBUM_ID_I                     UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     N/A

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

Let’s now run a number of queries a number of times. The first series is based on a predicate on just the ALBUM_ID column, such as:

SQL> select * from big_bowie where album_id=42;

2000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1510748290

-------------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           | 2000 | 52000 |    7959 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE ALL |           | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
| * 2 |  TABLE ACCESS FULL  | BIG_BOWIE | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("ALBUM_ID"=42)
  - filter("ALBUM_ID"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      48593 consistent gets
      42881 physical reads
          0 redo size
      44289 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       2000 rows processed

We’ll also run a series of queries based on both the RELEASE_DATE column using dates from the unusable index partition and the TOTAL_SALES column, such as:

SQL> select * from big_bowie where release_date='01-JUN-2017' and total_sales=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3245457041

----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    1 |    26 |     986 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE |           |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
| * 2 |  TABLE ACCESS FULL     | BIG_BOWIE |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
   - filter("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       5573 consistent gets
          0 physical reads
          0 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

Without a valid/usable index, the CBO currently has no choice but to use a Full Table Scan on the first query, and a Full Partition Scan on the partition with the unusable local index.

So what does AI make of things? Does it rebuild the unusable manually created indexes so the associated indexes can be used to improve these queries?

If we wait until the next AI task completes and check out the indexes on the table:

SQL> select index_name, status, partitioned from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS   PAR
------------------------------ -------- ---
RELEASE_DATE_TOTAL_SALES_I     N/A      YES
ALBUM_ID_I                     UNUSABLE NO
SYS_AI_aw2825ffpus5s           VALID    NO
SYS_AI_2hf33fpvnqztw           VALID    NO

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

We notice that AI has created two new non-partitioned automatic indexes, while both the manually created indexes remain in the same unusable state. If we look at the columns associated with these new automatic indexes:

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BIG_BOWIE';

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
ALBUM_ID_I                     ALBUM_ID                           1
RELEASE_DATE_TOTAL_SALES_I     RELEASE_DATE                       1
RELEASE_DATE_TOTAL_SALES_I     TOTAL_SALES                        2
SYS_AI_aw2825ffpus5s           ALBUM_ID                           1
SYS_AI_aw2825ffpus5s           RELEASE_DATE                       2
SYS_AI_2hf33fpvnqztw           TOTAL_SALES                        1
SYS_AI_2hf33fpvnqztw           RELEASE_DATE                       2

As we can see, AI has logically replaced both unusable indexes.

The manual index based on ALBUM_ID has been replaced with an inferior index based on the ALBUM_ID, RELEASE_DATE columns. Inferior in that the automatic index is both redundant (if only the manual index on ALBUM_ID were rebuilt) and in that it has the logically unnecessary RELEASE_DATE column to inflate the size of the index.

The manual index based on the RELEASE_DATE, TOTAL_SALES columns has been replaced with a redundant automatic index based on the reversed TOTAL_SALES, RELEASE_DATE columns.

Now, AI has indeed automatically addressed the current FTS performance issues associated with these queries by creating these indexes, but a better remedy would have been to rebuild the unusable manual indexes and hence negate the need for these redundant automatic indexes.

But currently (including with version 21.3), AI will NOT rebuild unusable manually created indexes, no matter the scenario, and will instead create additional automatic indexes if it’s viable for it to do so.

A reason why Oracle at times recommends dropping all current manually created secondary indexes before implementing AI (although of course this comes with a range of obvious issues and concerns).

If these manually created indexes didn’t exist, I’ll leave it as an exercise to the discernable reader on what automatic indexes would have been created…

As always, this restriction may change in future releases…

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part III (“Loaded”) April 28, 2022

Posted by Richard Foote in 19c, Advanced Index Compression, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Column Order, Index Compression, Oracle, Oracle 21c, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Overloading.
add a comment

In my previous two posts, I’ve discussed scenarios where Automatic Indexing (AI) does not currently created automatic indexes and you may need to manually create the necessary indexes.

In this post, I’ll discuss a third scenario where AI will create an index, but you may want to manually create an even better one…

I’ll start by creating a relatively “large” table, with 20+ columns:

SQL> create table bowie_overload (id number, code1 number, code2 number, stuff1 varchar2(42), stuff2 varchar2(42), stuff3 varchar2(42), stuff4 varchar2(42), stuff5 varchar2(42), stuff6 varchar2(42), stuff7 varchar2(42), stuff8 varchar2(42), stuff9 varchar2(42), stuff10 varchar2(42), stuff11 varchar2(42), stuff12 varchar2(42), stuff13 varchar2(42), stuff14 varchar2(42), stuff15 varchar2(42), stuff16 varchar2(42), stuff17 varchar2(42), stuff18 varchar2(42), stuff19 varchar2(42), stuff20 varchar2(42), name varchar2(42));

Table created.

SQL> insert into bowie_overload select rownum, mod(rownum, 1000)+1, '42', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke','David Bowie', 'Major Tom', 'Ziggy Stardust', 'Aladdin Sane', 'Thin White Duke', 'The Spiders From Mars' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_OVERLOAD');

PL/SQL procedure successfully completed.

 

The main columns to note here are CODE1 which contains 1000 distinct values (and so is kinda selective on a 10M row table, but not spectacularly so, especially with a poor clustering factor) and CODE2 which always contains the same value of “42” (and so will compress wonderfully for maximum effect).

I’ll next run the following query a number of times:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1883860831

--------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                | 10000 | 70000 |  74817 (1) | 00:00:03 |
| * 1 | TABLE ACCESS STORAGE FULL | BOWIE_OVERLOAD | 10000 | 70000 |  74817 (1) | 00:00:03 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=24)
    filter("CODE1"=24)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     869893 consistent gets
     434670 physical reads
          0 redo size
     183890 bytes sent via SQL*Net to client
       7378 bytes received via SQL*Net from client
        668 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

 

Without an index, the CBO currently has no choice but to perform a FTS. An index on the CODE1 column would provide the necessary filtering to fetch and return the required rows.

BUT, if this query was important enough, we could improve things further by “Overloading” this index with the CODE2 column, so we could use the index exclusively to get all the necessary data, without having to access the table at all. Considering an index on just the CODE1 column would need to fetch a reasonable number of rows (10000) and would need to visit a substantial number of different table blocks due to its poor clustering, overloading the index in this scenario would substantially reduce the necessary workloads of this query.

So what does AI do in this scenario, is overloading an index considered?

If we look at the AI report:

GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 28-APR-2022 12:15:45
Activity end                : 28-APR-2022 12:16:33
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 1
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 134.22 MB (134.22 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 2
SQL statements improved (improvement factor) : 2 (47.1x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 47.1x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner | Table          | Index                | Key   | Type   | Properties |
-------------------------------------------------------------------------------
| BOWIE | BOWIE_OVERLOAD | SYS_AI_aat8t6ad0ux0h | CODE1 | B-TREE | NONE       |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name : BOWIE
SQL ID              : bh5cuyv8ga0bt
SQL Text            : select code1, code2 from bowie_overload where code1=42
Improvement Factor  : 46.9x

Execution Statistics:
-----------------------------
                    Original Plan                Auto Index Plan
                    ---------------------------- ----------------------------
Elapsed Time (s):   42619069                     241844
CPU Time (s):       25387841                     217676
Buffer Gets:        12148771                     18499
Optimizer Cost:     74817                        10021
Disk Reads:         6085380                      9957
Direct Writes:      0                            0
Rows Processed:     140000                       10000
Executions:         14                           1

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 1883860831

--------------------------------------------------------------------------------
| Id | Operation         | Name           | Rows  | Bytes | Cost  | Time       |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |                |       |       | 74817 |            |
|  1 | TABLE ACCESS FULL | BOWIE_OVERLOAD | 10000 | 70000 | 74817 | 00:00:03   |
--------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value : 2541132923

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost  | Time       |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  9281 | 64967 | 10021 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       |  9281 | 64967 | 10021 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |    18 | 00:00:01   |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE1"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We see that an automatic index on just the CODE1 column was created.

 

SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                AUT VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h      YES VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                COLUMN_NAME     COLUMN_POSITION
------------------------- --------------- ---------------
SYS_AI_aat8t6ad0ux0h      CODE1                         1

 

If we now re-run the query (noting in Oracle21c after you invalidate the current cursor):

 

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2541132923

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 |  Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      | 10000 | 70000 |   10021 (1)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_OVERLOAD       | 10000 | 70000 |   10021 (1)| 00:00:01 |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_aat8t6ad0ux0h | 10000 |       |      18 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      10021 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

The query now uses the newly created automatic index.

BUT, at 10021 consistent gets, it’s still doing a substantial amount to work here.

If we manually create another index that overloads the only other column (CODE2) required in this query:

SQL> create index bowie_overload_code1_code2_i on bowie_overload(code1,code2) compress advanced low;

Index created.

I’m using COMPRESS ADVANCED LOW as used by the automatic index, noting that CODE2 only contains the value “42” for all rows, making it particularly perfect for compression and a “best case” scenario when it comes to the minimal overheads potentially associated with overloading this index (I’m trying yo give AI every chance here):

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_OVERLOAD';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_aat8t6ad0ux0h           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000
BOWIE_OVERLOAD_CODE1_CODE2_I   NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       15363          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_OVERLOAD' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE1                         1
BOWIE_OVERLOAD_CODE1_CODE2_I   CODE2                         2
SYS_AI_aat8t6ad0ux0h           CODE1                         1

In fact, my manually created index is effectively the same size as the automatic index, with the same number (15363) of leaf blocks.

So I’m giving AI the best possible scenario in which it could potentially create an overloaded index.

But I’ve never been able to get AI to create overloaded indexes. Only columns in filtering predicates are considered for inclusion in automatic indexes.

If I now re-run my query again:

SQL> select code1, code2 from bowie_overload where code1=42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1161047960

-------------------------------------------------------------------------------------------------
| Id  | Operation        | Name                         |  Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                              | 10000 | 70000 |      18 (0)| 00:00:01 |
| * 1 | INDEX RANGE SCAN | BOWIE_OVERLOAD_CODE1_CODE2_I | 10000 | 70000 |      18 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("CODE1"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         21 consistent gets
          0 physical reads
          0 redo size
      50890 bytes sent via SQL*Net to client
         63 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
      10000 rows processed

We notice the CBO now uses the manually created index without any table access path, as it can just use the index to access the necessary data.

The number of consistent gets as a result has reduced significantly, down to just 21, a fraction of the previous 10021 when the automatic index was used.

So the scenario an of overloaded index that could significantly reduce database resources, which is currently not supported by AI, is another example of where may want to manually create a necessary index.

As always, this may change in the future releases…

Oracle Database 19c Automatic Indexing: Index Compression Update (New Morning) January 27, 2021

Posted by Richard Foote in 19c, 19c New Features, Advanced Index Compression, Autonomous Database, Autonomous Transaction Processing, AUTO_INDEX_COMPRESSION, Exadata, Index Column Order, Index Compression, Oracle, Oracle Blog, Oracle General, Oracle Indexes, Oracle19c.
add a comment

 

I was reminded in a recent comment by Rajeshwaran Jeyabal that I hadn’t updated my post on Automatic Indexing with Advanced Compression that’s in need of a couple of amendments.

Initially when Automatic Indexing was released, the ability to set Advanced Compression was NOT included in the official documentation, although the EXEC DBMS_AUTO_INDEX.CONFIGURE( ‘AUTO_INDEX_COMPRESSION‘ , ‘ON’); option was readily accessible. This has now been fixed and the associated doco on setting Advanced Compression for Automatic Indexes can be found here.

The other significant change is that Advanced Compression Low is now the default behaviour when Automatic Indexes are created in the Oracle ATP Autonomous Database Cloud environment. This makes sense in that if you have access to the Advanced Compression option, setting all indexes to Advanced Compression Low is the no-brainer setting as I’ve discussed previously. So several of my more recent posts show how Automatic Indexes have been created with Advanced Compression Low set.

What hasn’t changed however is how Automatic Indexing does NOT consider the efficiency of an index in relation to Index Compression when deciding how to order the columns within the index.

The default order of columns within an index (when other SQL predicates are not a consideration) is simply the order by which the columns appear within the table. Even though an index could be significantly smaller thanks to Index Compression if columns with more repeated values are ordered first within an index, this is not something Automatic Indexing currently considers.

The demo in my original piece still works exactly the same in the current 19c database versions of the ATP Autonomous Cloud environments. Manually created indexes can be significantly smaller if index columns are reordered or dropped entirely if they don’t provide filtering benefits.

When reading my blog, please do take note of the date of blog piece, especially in relation to Automatic Indexing. Things are only accurate as at time of publication and may change subsequently.

I thank Rajeshwaran for getting me to pull my finger out and update my blog accordingly…

Oracle Database 19c Automatic Indexing – Indexed Column Reorder (What Shall We Do Now?) December 18, 2019

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Index Column Order, Index Column Reorder.
2 comments

 

I previously discussed how the default column order of an Automatic Index (in the absence of other factors) is based on the Column ID, the order in which the columns are defined in the table.

But what if there are “other factors” based on new workloads and the original index column order is no longer optimal or appropriate ?

I’ll begin by creating a table, with the following key column defined in CODE1, CODE2, CODE3 order:

 

SQL> create table major_tom (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom select rownum, mod(rownum, 10)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 1000)), 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density from user_tab_columns where table_name='MAJOR_TOM';

COLUMN_NAME          NUM_DISTINCT    DENSITY
-------------------- ------------ ----------
ID                        9914368 1.0086E-07
CODE1                          10  .00000005
CODE2                         100  .00000005
CODE3                       1000        .001
NAME                           1           1

If I now run the following query with predicates based on these three columns:

SQL> select * from major_tom where code3=42 and code2=42 and code1=4;

15 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    10 |   280 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  |    10 |   280 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           |    10 |   280 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM |    10 |   280 |  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

     4 - storage("CODE3"=42 AND "CODE2"=42 AND "CODE1"=4)
         filter("CODE3"=42 AND "CODE2"=42 AND "CODE1"=4)

Statistics
----------------------------------------------------------

        34 recursive calls
         5 db block gets
     45861 consistent gets
         0 physical reads
      1044 redo size
      1087 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
        15 rows processed

 

After the default 15 minutes period in which the Automatic Index task is run, if we look at what Automatic Index has been created:

 

INDEX DETAILS

-------------------------------------------------------------------------------

  1. The following indexes were created:

--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_9mrs058nrg9d5 | CODE1,CODE2,CODE3 | B-TREE | NONE       |
--------------------------------------------------------------------------------------

VERIFICATION DETAILS

-------------------------------------------------------------------------------
 	1. The performance of the following statements improved: 
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : ayuj12hggwrvc
SQL Text             : select * from major_tom where code3=42 and code2=42 and code1=4
Improvement Factor   : 45853.8x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------

Elapsed Time (s):  3103394                       946
CPU Time (s):      3092860                       1017
Buffer Gets:       596102                        18
Optimizer Cost:    7354                          18
Disk Reads:        0                             2
Direct Writes:     0                             0
Rows Processed:    195                           15
Executions:        13                            1

 

We can see Oracle has indeed created an Automatic Index (SYS_AI_9mrs058nrg9d5) in the default CODE1, CODE2, CODE3 order.

But if we now run a new query, based on a predicate on just the CODE3 column:

 

SQL> select * from major_tom where code3=42; 

9961 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           | 10000 |   273K|  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  | 10000 |   273K|  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           | 10000 |   273K|  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM | 10000 |   273K|  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

     4 - storage("CODE3"=42)
         filter("CODE3"=42)

Statistics
----------------------------------------------------------
         8 recursive calls
         4 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
    166137 bytes sent via SQL*Net to client
       599 bytes received via SQL*Net from client
         3 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
      9961 rows processed

 

We can see the CBO has NOT used the index, as the leading column of the existing index is not mentioned in the SQL predicate and the CBO deems an Index Skip Scan as too expensive:

If we now run an SQL with predicates based on just the CODE2 and CODE3 columns:

 

SQL> select * from major_tom where code3=42 and code2=42;

101 rows selected.

Execution Plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   100 |  2800 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  |   100 |  2800 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           |   100 |  2800 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM |   100 |  2800 |  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

     4 - storage("CODE3"=42 AND "CODE2"=42)
         filter("CODE3"=42 AND "CODE2"=42)

Statistics
----------------------------------------------------------

         6 recursive calls
         0 db block gets
     45853 consistent gets
         0 physical reads
         0 redo size
      2281 bytes sent via SQL*Net to client
       588 bytes received via SQL*Net from client
         2 SQL*Net roundtrips to/from client
         0 sorts (memory)
         0 sorts (disk)
       101 rows processed

 

The existing Automatic Index is again not used as the important CODE1 column which is the leading column of the index is not mentioned in the SQL predicates and the CBO deems an Index Skip Scan as too expensive.

I know from experience many DBAs would simply create a new index with CODE3, CODE2 as the leading columns. But what does Automatic Indexing do in the scenario?

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------

Activity start               : 21-JUN-2019 02:39:32
Activity end                 : 21-JUN-2019 02:40:18
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                       : 1
Indexes created (visible / invisible)  : 1 (1 / 0)
Space used (visible / invisible)       : 243.27 MB (243.27 MB / 0 B)
Indexes dropped (space reclaimed)      : 1 (243.27 MB)
SQL statements verified                : 1
SQL statements improved                : 0
SQL plan baselines created             : 0
Overall improvement factor             : 0x

-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------

Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

We notice in the Automatic Indexing report it’s stating that one new index has been created BUT that one index has also been dropped.

 

INDEX DETAILS
-------------------------------------------------------------------------------

         1. The following indexes were created:

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_00hxxxkgb821n | CODE3,CODE2,CODE1 | B-TREE | NONE       |
--------------------------------------------------------------------------------------
 
 	1. The following indexes were dropped: 

-------------------------------------------------------------------------------
--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_9mrs058nrg9d5 | CODE1,CODE2,CODE3 | B-TREE | NONE       |
--------------------------------------------------------------------------------------
 


SQL> select index_name, column_name, column_position from user_ind_columns where table_name='MAJOR_TOM' order by column_position;

INDEX_NAME           COLUMN_NAME     COLUMN_POSITION
-------------------- --------------- ---------------
SYS_AI_00hxxxkgb821n CODE3                         1
SYS_AI_00hxxxkgb821n CODE2                         2
SYS_AI_00hxxxkgb821n CODE1                         3

 

Automatic Indexing has created a new index (SYS_AI_00hxxxkgb821n), with the columns in CODE3, CODE2, CODE1 order, as this index is able to service all currently known SQL predicate combinations:

WHERE CODE3 = x AND CODE2 = y AND CODE1= z

WHERE CODE3 = x AND CODE2 = y

WHERE CODE3 = x

as the leading column in the index is listed in all three current scenarios.

 

This means the previous index in CODE1, CODE2, CODE3 order is now redundant as the new index can fully service all know SQL predicate combinations. As a result, Automatic Indexing drops the redundant index.

This is a really nice capability of Automatic Indexing, the ability to effectively reorder the columns within an index based on new workloads.

But what if subsequent new SQL workloads means the new index is not able on its own to service all such workloads. I’ll discuss this scenario in my next post.

Oracle Database 19c Automatic Indexing: Index Compression (Ghosteen) December 9, 2019

Posted by Richard Foote in 19c, 19c New Features, Advanced Index Compression, Automatic Indexing, AUTO_INDEX_COMPRESSION, Index Column Order, Index Compression, Index Internals.
2 comments

 

 

In my previous post on Automatic Indexing, I discussed how the default index column order (in absence of other factors) is column id, the order in which the columns are defined in the table. In this post, I’ll explore if this changes if index compression is also implemented.

By default, Automatic Indexing does NOT use index compression. However, if you have access to the Advanced Compression option, you have the choice to turn on index compression in the following manner:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_COMPRESSION','ON');

PL/SQL procedure successfully completed.

Note: the AUTO_INDEX_COMPRESSION parameter is not actually documented, which could be a documentation bug or that Oracle is not yet ready to release this capability. The above will enable Automatic Indexes to be created with Compress Advanced Low, which is the “no-brain” index compression option as it will compress (deduplicate) safely with negligible CPU overheads. However, index column order is still critical to ensure effective compression as we shall see…

We begin by creating a simple table, that has four columns of interest, CODE1, CODE2, CODE3 and STATUS. They are defined in this order within the table, but CODE1 has the most number of distinct values (5000000 distinct values), then CODE2 (1000), then CODE3 (10) and finally STATUS which only has the 1 distinct value.

SQL> create table bowie_compress (id number, code1 number, code2 number, code3 number, status varchar2(42), name varchar2(42));

Table created.

SQL> insert into bowie_compress select rownum, mod(rownum, 5000000)+1, mod(rownum, 1000)+1, mod(rownum, 10)+1, 'COMPLETED’,

'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_COMPRESS');

PL/SQL procedure successfully completed.

 

In terms of being the most efficient from a compression perspective, it would be better to have the index defined in STATUS, CODE3, CODE2, CODE1 order, so that the leading columns in the index have the most duplicated values that enable effective deduplication and hence index compression. I’ve discussed the importance of index column for effective index compression a number of times previously. Arguably, it would be better not to actually index the STATUS column at all as with just 1 distinct value, provides no effective filtering benefits.

Having the CODE1 column as the leading column however with so many distinct values would effectively make the index non-compressible (with LOW compression), as the leading column would have too many distinct values to benefit much from compression.

So how does Automatic Indexing handle this scenario? Does it keep the same default index column order or does it alter the index column order to provide better index compression benefits?

Let’s run the following SQL with all four columns in the predicates and see what index Automatic Indexing creates…

 

SQL> select * from bowie_compress where code1=42 and code2=42 and code3=2 and status='COMPLETED';

Execution Plan

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     1 |    41 |   9998  (5)| 00:00:01 |
|   1 |  PX COORDINATOR              |                |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000       |     1 |    41 |   9998  (5)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |                |     1 |    41 |   9998  (5)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_COMPRESS |     1 |    41 |   9998  (5)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

4 - storage("CODE1"=42 AND "CODE2"=42 AND "CODE3"=2 AND "STATUS"='COMPLETED')
     filter("CODE1"=42 AND "CODE2"=42 AND "CODE3"=2 AND "STATUS"='COMPLETED')

Statistics
----------------------------------------------------------
        6 recursive calls
        0 db block gets
    63562 consistent gets
        0 physical reads
        0 redo size
     1038 bytes sent via SQL*Net to client
      588 bytes received via SQL*Net from client
        2 SQL*Net roundtrips to/from client
        0 sorts (memory)
        0 sorts (disk)
        2 rows processed

 

If we look at the Automatic Indexing report for the period in which the above SQL was run:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-JUL-2019 00:18:35
Activity end                 : 18-JUL-2019 00:19:58
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                                : 1
Indexes created (visible / invisible)           : 1 (1 / 0)
Space used (visible / invisible)                : 293.6 MB (293.6 MB / 0 B)
Indexes dropped                                 : 0
SQL statements verified                         : 1
SQL statements improved (improvement factor)    : 1 (63563.9x)
SQL plan baselines created                      : 0
Overall improvement factor                      : 63563.9x

-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------------------------------
| Owner | Table          | Index                | Key                      | Type   | Properties |
--------------------------------------------------------------------------------------------------
| BOWIE | BOWIE_COMPRESS | SYS_AI_bkdhrsd29vd4f | CODE1,CODE2,CODE3,STATUS | B-TREE | NONE       |
--------------------------------------------------------------------------------------------------

 

We see that Automatic Index has created the index with all four columns from the SQL predicate in again the default column order as the column order as defined in the table (CODE1, CODE2, CODE3, STATUS). Even though Automatic Index Compression was enabled, it hasn’t considered the column cardinalities in its determination of best index column order.

Automatic Indexing has the tendency to index ALL columns specified in SQL predicates, regardless of whether all such columns provide filtering benefits AND does not consider the best column order from a compression perspective when determining index column order. So definitely room for improvement here methinks.

If we look at the definition and size of the resultant Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_COMPRESS';

INDEX_NAME                   AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_bkdhrsd29vd4f         YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       35451          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_COMPRESS' order by index_name, column_position;

INDEX_NAME                   COLUMN_NAME     COLUMN_POSITION
---------------------------- --------------- ---------------
SYS_AI_bkdhrsd29vd4f         CODE1                         1
SYS_AI_bkdhrsd29vd4f         CODE2                         2
SYS_AI_bkdhrsd29vd4f         CODE3                         3
SYS_AI_bkdhrsd29vd4f         STATUS                        4

 

We note the index has 35451 leaf blocks.

If we were to create the index manully in a more appropriate manner from a compression perspective, with the index columns defined in reverse order and also with another index without the redundant STATUS column:

SQL> create index bowie_compress_best_order_i on bowie_compress(status, code3, code2, code1) compress advanced low;

Index created.

SQL> create index bowie_compress_best_order2_i on bowie_compress(code3, code2, code1) compress advanced low;

Index created.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_COMPRESS';

INDEX_NAME                   AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_bkdhrsd29vd4f         YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       35451          10000000
BOWIE_COMPRESS_BEST_ORDER_I  NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       23509          10000000
BOWIE_COMPRESS_BEST_ORDER2_I NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       23462          10000000

 

We notice the resultant indexes are substantially smaller, at just 23509 and 23462 leaf blocks respectively.

It could well be that Index Compression is not yet documented because Oracle appreciates that more work yet needs to be done to create indexes optimally from a compression perspective…

Oracle Database 19c Automatic Indexing: Default Index Column Order Part II (Future Legend) September 11, 2019

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Clustering Factor, Index Column Order.
4 comments

In Part I, we explored some options that Oracle might adopt when ordering the columns within an Automatic Index by default, in the absence of other factors where there is only the one SQL statement to be concerned with.

A point worth making is that if all columns of an index are specified within SQL equality predicates, then the ordering of columns within an index is of little consequence. I’ve discussed this point a number of times previously.

Let’s explore if perhaps the resultant Clustering Factor of an index might be a factor in the default Automatic Index column order.

I begin by creating a table that has two columns of interest, CODE1 which is very well clustered and CODE2 which is poorly clustered:

SQL> create table muse (id number, code2 number, code1 number, name varchar2(42));

Table created.

SQL> create sequence muse_seq;

Sequence created.

SQL> create or replace procedure pop_muse as
begin
  for code1_value in 1..10000 loop
     for i in 1..100 loop
        insert into muse values (muse_seq.nextval, ceil(dbms_random.value(0,100)), code1_value, 'Back Holes');
     end loop;
   end loop;
   commit;
end;
/

Procedure created.

SQL> exec pop_muse

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MUSE');

 

We then run the following query in which to hopefully create an Automatic Index on both CODE1 and CODE2 columns:

 

SQL> select * from muse where code1=406 and code2=83;

15 rows selected.

 

If we wait for the Automatic Index to be created and check out the Automatic Index report:

 

INDEX DETAILS
-------------------------------------------------------------------------------
  1 The following indexes were created:
*: invisible

----------------------------------------------------------------------------
| Owner | Table | Index                | Key         | Type   | Properties |
----------------------------------------------------------------------------
| BOWIE | MUSE  | SYS_AI_c1m8fkukj1368 | CODE2,CODE1 | B-TREE | NONE       |
----------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
1 The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE

SQL ID               : 0pdqsvpggupnz

SQL Text             : select * from muse where code1=406 and code2=83

Improvement Factor   : 4092.8x

 

SQL> select index_name, column_name, column_position from user_ind_columns
     where table_name='MUSE' order by index_name, column_position;

INDEX_NAME             COLUMN_NAME          COLUMN_POSITION
---------------------- -------------------- ---------------
SYS_AI_c1m8fkukj1368   CODE2                              1
SYS_AI_c1m8fkukj1368   CODE1                              2

 

We notice the index is created in CODE2, CODE1 column order.

If we create a manual index with the column order reversed:

 

SQL> create index muse_code1_code2_i on muse(code1, code2);

Index created.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
     from user_indexes where table_name='MUSE';

INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_c1m8fkukj1368   YES NO  VISIBLE   DISABLED      VALID       1000000        2506            362900
MUSE_CODE1_CODE2_I     NO  NO  VISIBLE   DISABLED      VALID       1000000        2510            129878

 

We notice that the manual index has the better resultant Clustering Factor. So the Clustering Factor doesn’t appear to be a factor in Automatic Index column order (no pun intended).

If we re-create the initial table in Part I, but this time with the columns defined in the table in reverse order:

 

SQL> create table major_tom3 (id number, code3 number, code2 number, code1 number, name varchar2(42));

Table created.

SQL> insert into major_tom3 select rownum, mod(rownum, 1000)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 10)),
'David Bowie' from dual connect by level  commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM3');

PL/SQL procedure successfully completed.

 

If we again run the following query:

 

SQL> select * from major_tom3 where code3=4 and code2=42 and code1=42

...

 

And wait for the Automatic Index to be created and look at the resultant report:

 

INDEX DETAILS

-------------------------------------------------------------------------------
   1 The following indexes were created:
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM3 | SYS_AI_g6sw030tg5ba9 | CODE3,CODE2,CODE1 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
   1 The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE

SQL ID               : 22kts3uwj7kma

SQL Text             : select * from major_tom3 where code3=4 and code2=42 and code1=42

Improvement Factor   : 45854.1x

 

SQL> select i.index_name, i.column_name, i.column_position, t.num_distinct
from user_ind_columns i, user_tab_columns t
where i.table_name = t.table_name and i.column_name = t.column_name and i.table_name='MAJOR_TOM3'
order by i.index_name, i.column_position;

INDEX_NAME           COLUMN_NAME     COLUMN_POSITION NUM_DISTINCT
-------------------- --------------- --------------- ------------
SYS_AI_g6sw030tg5ba9 CODE3                         1         1000
SYS_AI_g6sw030tg5ba9 CODE2                         2          100
SYS_AI_g6sw030tg5ba9 CODE1                         3           10

 

We notice that the resultant Automatic Index has been created in CODE3, CODE2, CODE1 order.

After creating many many Automatic Indexes under all sorts of different scenarios, the DEFAULT behaviour is for Oracle to create Automatic Indexes in Column ID order (the order in which they are defined in the table definition).

Of course as we’ll see in future posts, if there are several conflicting SQL predicates, there are various other factors that govern a more appropriate Automatic Index order, but the fact that Oracle creates Automatic Indexes in Column ID order in the absence of other factors is useful to know.

As I said previously, if all indexed columns are specified in SQL equality predicates, index column order has little consequence. But as we’ll see in the next post, there are scenarios where index column order can be very important and this default index column order may not be the most optimal…

Oracle Database 19c Automatic Indexing: Default Index Column Order Part I (Anyway Anyhow Anywhere) September 2, 2019

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Index Column Order, Oracle Indexes.
3 comments

pin ups

The next thing I was curious about regarding Automatic Indexing was in which order would Oracle by default order the columns within an index. This can be a crucial decision with respect to the effectiveness of the index (but then again, may not be so crucial as well). Certainly one would expect the index column order be dependent on the SQL predicates running in the database and I’ll discuss all that in future posts, but what is the default behaviour here with regard index column order based (for now) on a single SQL predicate.

I could come up with a number of possible options that Oracle might adopt when determining the default index column order such as:

  • Column Name Order
  • Column ID Order
  • (Reverse) Column Cardinality Order
  • Best Clustering Factor
  • Other (Random even)

So to investigate this, I started with a basic table with 3 columns (CODE1, CODE2, CODE3) that had differing levels of cardinality:

SQL> create table major_tom (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom select rownum, mod(rownum, 10)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 1000)), 'David Bowie' from dual connect by level  commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density from user_tab_columns where table_name='MAJOR_TOM';

COLUMN_NAME          NUM_DISTINCT    DENSITY
-------------------- ------------ ----------
ID                        9914368 1.0086E-07
CODE1                          10  .00000005
CODE2                         100  .00000005
CODE3                        1000       .001
NAME                            1          1

I then ran the following query with a predicate based on the 3 columns CODE1, CODE2 and CODE3:

SQL> select * from major_tom where code3=42 and code2=42 and code1=4;

15 rows selected.

Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    10 |   280 |  7354   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |           |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000  |    10 |   280 |  7354   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |           |    10 |   280 |  7354   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| MAJOR_TOM |    10 |   280 |  7354   (7)| 00:00:01 |
------------------------------------------------------------------------------------------

If we look at the resultant Automatic Index:

INDEX DETAILS

-------------------------------------------------------------------------------
1 . The following indexes were created:
--------------------------------------------------------------------------------------
| Owner | Table     | Index                | Key               | Type   | Properties |
--------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM | SYS_AI_9mrs058nrg9d5 | CODE1,CODE2,CODE3 | B-TREE | NONE       |
--------------------------------------------------------------------------------------

 

SQL> select i.index_name, i.column_name, i.column_position, t.num_distinct
from user_ind_columns i, user_tab_columns t
where i.table_name = t.table_name and i.column_name = t.column_name and i.table_name='MAJOR_TOM'
order by i.index_name, i.column_position;

INDEX_NAME           COLUMN_NAME     COLUMN_POSITION NUM_DISTINCT
-------------------- --------------- --------------- ------------
SYS_AI_9mrs058nrg9d5 CODE1                         1           10
SYS_AI_9mrs058nrg9d5 CODE2                         2          100
SYS_AI_9mrs058nrg9d5 CODE3                         3         1000

 

We notice that the Automatic Index is in CODE1, CODE2, CODE3 order.

If we create a similar table, but this time have the columns with a different order of cardinality:

SQL> create table major_tom2 (id number, code1 number, code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into major_tom2 select rownum, mod(rownum, 1000)+1, ceil(dbms_random.value(0, 100)), ceil(dbms_random.value(0, 10)),
'David Bowie' from dual connect by level;

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'MAJOR_TOM2');

PL/SQL procedure successfully completed.

SQL> select * from major_tom where code3=42 and code2=42 and code1=4;

15 rows selected.

 

We notice that the resultant automatic index is still in the same CODE1, CODE2 and CODE3 order:

INDEX DETAILS

-------------------------------------------------------------------------------
1. The following indexes were created:
---------------------------------------------------------------------------------------
| Owner | Table      | Index                | Key               | Type   | Properties |
---------------------------------------------------------------------------------------
| BOWIE | MAJOR_TOM2 | SYS_AI_7w9t3tt9u171r | CODE1,CODE2,CODE3 | B-TREE | NONE       |
---------------------------------------------------------------------------------------

 

SQL> select i.index_name, i.column_name, i.column_position, t.num_distinct
from user_ind_columns i, user_tab_columns t
where i.table_name = t.table_name and i.column_name = t.column_name and i.table_name='MAJOR_TOM2'
order by i.index_name, i.column_position;

INDEX_NAME           COLUMN_NAME     COLUMN_POSITION NUM_DISTINCT
-------------------- --------------- --------------- ------------
SYS_AI_7w9t3tt9u171r CODE1                         1         1000
SYS_AI_7w9t3tt9u171r CODE2                         2          100
SYS_AI_7w9t3tt9u171r CODE3                         3           10

 

So we can eliminate column cardinality as being a contributing factor in Oracle deciding in which manner to order the indexed columns.

Which is unfortunate as we’ll see in a future post when we decide to implement Oracle Index Compression with Automatic Indexing.

In the next post, we’ll explore further other considerations and confirm how Oracle does indeed decide to order columns within an Automatic Index by default.

Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song) July 5, 2018

Posted by Richard Foote in Index Column Order, Index Compression, Oracle Indexes.
add a comment

weeping song

In Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of the index if all columns are referenced via equality predicates.

If the leading column has a high number of distinct columns, it might result in less necessary data within index branches as less data is required to determine the unique path down to the leaf block containing the first index entry of interest. This might save a moderate number of index branch blocks. The number of branch blocks though has a trivial impact on index performance, if as in the vast majority of cases, the index height remains the same.

However, if one can potentially significantly reduce the number of required leaf blocks within an index, this might not only also significantly reduce the number of associated index branch blocks, but obviously the overall size of the index. This is possible with Basic Index Compression, but such compression is only possible if the leading column(s) has relatively few distinct values.

So going back to the demo in Part I, when the index was created with the ID column leading (which had many distinct values):

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

We note the size of the index, with 14135 leaf blocks and 23 branch blocks.

If we now attempt to compress this index with basic index compression:

SQL> alter index ziggy_id_code_i rebuild compress;

Index altered.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      15795         26      197435   126505652  113167136

We notice basic index compression has been totally ineffective. In fact, the index has increased in size with there now being 15795 leaf blocks and 26 branch blocks. The number of compressed index columns makes no difference, as it’s the leading column with high distinct values that is the problem here.

That’s because the de-duplication at the leaf block level necessary for effective basic index compression is impossible with the ID column leading as there are little to no replicated column values. Basic index compression must have high numbers of replicated column values in at least the leading column(s) to be effective.

If we look at the index with the replicated CODE column as the leading column:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

We notice although the number of leaf blocks are similar to the previous non-compressed index at 14125 leaf blocks, at 83 there are more branch blocks (previous index had just 23). As discussed in Part I, this is because the relatively large sized CODE column must be stored in the branch blocks.

However, this index is compressible with the leading CODE column having duplicate values. Therefore, if we compress the index by compressing just the CODE column:

SQL> alter index ziggy_code_id_i rebuild compress 1;

Index altered.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3       4620         28      214696    37166416   33369357

We notice not only has the number of branch blocks reduced (28 down from 83), but more importantly, we have significantly reduced the number of overall leaf blocks (4620 down from 14125).

So if reducing the size of the resultant index is the aim, you will generally get a much better result by using basic index compression and ensuring the columns with the few distinct values are the leading columns, than by potentially moderately reducing branch blocks with the leading column more distinct.

The other advantage to placing the columns with fewer distinct values as the leading columns of an index is that it makes an Index Skip Scan a viable execution path if the leading column(s) is not referenced in a predicate. This is not possible if the leading column is too distinct. I’ve discussed Index Skip Scans previously in this blog.

Note basic index compression is free (you don’t need the Advanced Compression Option), but you do need to be on Enterprise Edition to use this feature.

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018

Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.
15 comments

day in day out bowie

I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:

ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.

Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.

I’ve written a number of times about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns in an index, if all index columns are referenced equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.

I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:

SQL> CREATE TABLE ziggy AS
SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name
FROM dual CONNECT BY LEVEL <= 2000000;

Table created.

I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.

If we look at a partial block dump of a Branch block:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM

We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)
AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We note for now the number of consistent gets (5 and 19) for each of these queries.

If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks are almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, just 0.06%.

The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM

With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.

The branch entry with both the CODE and ID columns has ranges sufficiently selective enough to ensure any indexed value can be found within leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.

So on the surface, it looks as if this index is not as efficient as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.

If we run the same queries as we did when using the previous index:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR
"ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242))

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.

So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.

In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.

Index Advanced Compression: Multi-Column Index Part II (Blow Out) September 24, 2015

Posted by Richard Foote in Advanced Index Compression, Concatenated Indexes, Index Column Order, Index Rebuild, Oracle Indexes.
1 comment so far

I previously discussed how Index Advanced Compression can automatically determine not only the correct number of columns to compress, but also the correct number of columns to compress within specific leaf blocks of the index.

However, this doesn’t mean we can just order the columns within the index without due consideration from a “compression” perspective. As I’ve discussed previously, the column order within an index can be very important (especially with regard the use of the index if the leading column of the index is not specified in the SQL), including with regard to the possible compression capabilities of an index.

Advanced Index Compression does not change this and if we order columns inappropriately, one of the consequences can be the index simply can’t be compressed.

To illustrate, back to my simple little example:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit;

Commit complete.

But this time, I’m going to create the index with the columns the other way around than I had in the previous post. The effectively unique ID column is now the leading column in the index, followed by the CODE column that indeed has many duplicate values. There is a “myth” that suggests this is actually a more “efficient” way to order an index, put the column with most distinct values first in the index. This is of course not true (yes, I’ve covered this one before as well).

SQL> create index bowie_idx on bowie(id, code) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index without compression has 2363 leaf blocks.

As the leading column is effectively unique, we simply can’t now compress this index effectively. That’s because compression requires there to be duplicate index entries starting with at least the leading column. If the leading column has few (or no) duplicates, then by compressing the index Oracle is effectively creating a prefixed entry within the leaf block for each and every index entry. The whole point of index (or table) compression is to effectively de-duplicate the index values but there’s nothing to de-duplicate if there are no repeating values in at least the leading column of the index.

If we attempt to just compress fully the index anyways:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

It actually results in a bigger, not smaller index. The leaf blocks has gone up from 2363 to 3115.

Unlike the previous post where the columns in the index were the other way around, if we attempt to just compress the first column, it makes no difference to the inefficiency of the index compression because the number of prefix entries we create remains exactly the same:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

So the index remains at the higher 3115 leaf blocks.

The good thing with Advanced Index Compression is that we can “give it a go”, but it will not result in a larger index structure. If there’s nothing to compress within a leaf block, Oracle just ignores it and moves on to the next leaf block. If there’s nothing to compress at all within the index, the index remains the same as if it’s not been compressed:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index is now back to 2363 leaf blocks, the same as if it wasn’t compressed at all. No it hasn’t helped, but at least it hasn’t made things worse.

So the order of the columns still plays a vital part in the “compress-ability” of the index, even with Index Advanced Compression at your disposal. If both the ID and CODE columns are referenced in your code, then having CODE as the leading column of the index would both improve the manner in which the index can be compressed and make a Skip-Scan index scan viable in the case when the CODE column might not occasionally be specified.

Now, if we change the leading column and create some duplicates (in this case, we update about 10% of the rows to now have duplicate values in the leading ID column):

SQL> update bowie set id=42 where id between 442000 and 542000;

100001 rows updated.

SQL> commit;

Commit complete.

SQL> alter index bowie_idx rebuild nocompress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2338          2

With a whole bunch of IDs with a value of 42, the non-compressed index now has 2338 leaf blocks. Yes, 10% of the leading columns have duplicates, but 90% of the index doesn’t and remains effectively unique. So if we try and compress this index now:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2941          2

The compressed index now has 2941 leaf blocks and is still larger than the 2338 non-compressed index. Yes, it’s compressed the 10% of the index that it could, but the inefficiencies in dealing with the other 90% has resulted in an overall larger index. So not too good really.

Again, compressing just the leading ID column doesn’t improve matters:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2977          2

In fact, at 2977 it’s even worse than compressing all the index because by compressing both columns, we could also effectively compress the duplicate CODE columns as well within that 10% of the index where we had duplicate ID values. With compressing just the ID column, we don’t get the benefit of compressing the duplicate CODE values. So not very good either.

In either case, compressing the index is ineffective as we end up with a bigger, not smaller index.

But not with Index Advanced Compression:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2265          2

We now have a index structure at just 2265 leaf blocks that is indeed smaller than the non-compressed index (2338 leaf blocks) because Oracle can now compress just the 10% of index where compression is effective and just ignore the rest of the index (90%) where compression is ineffective.

The best of both worlds, where Index Advanced Compression can compress just the part of an index where it effectively can and ignore and not make matters worse in any parts of the index where index compression is ineffective.

An indexing no-brainer …