Oracle 19c Automatic Indexing: Common Index Creation Trap (Rat Trap) June 30, 2020

Posted by Richard Foote in 19c, 19c New Features, ASSM, Automatic Indexing, CBO, Clustering Factor, Data Clustering, Oracle Indexes, TABLE_CACHED_BLOCKS.

When I go to a customer site to resolve performance issues, one of the most common issues I encounter is inefficient SQL. And one of the most common causes of inefficient SQL I encounter is deficiencies in the default manner by which the index Clustering Factor is calculated.

The “flawed” default manner by which the index Clustering Factor is calculated still applies to both Automatic Indexes and the Oracle Autonomous Database Cloud Services. So we need to exercise some caution when Auto Indexes are created, as their default statistics can impact the performance of subsequent SQL statements.

To illustrate with a simple example, I’ll first create a table whose key column is the ID column, which will be effectively unique. The table will be populated via a basic procedure that inserts 1M rows each time it is executed. The procedure uses an ORDER sequence, such that the ID values are generated in a monotonically increasing manner:

SQL> create table bowie_assm (id number, code number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

SQL> create or replace procedure pop_bowie_assm as
begin
  for i in 1..1000000 loop
    insert into bowie_assm values (bowie_assm_seq.nextval, mod(i,1000), 'DAVID BOWIE');
    commit;
  end loop;
end;
/

Procedure created.

 

Crucially, however, the procedure is executed by 3 different sessions concurrently, to simulate a multi-user environment inserting into a table…

 

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

 

We’ll now collect statistics on the table:

 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE_ASSM';

TABLE_NAME        NUM_ROWS     BLOCKS
--------------- ---------- ----------
BOWIE_ASSM         3000000      12137

 

So the table has 3M rows and is 12137 blocks in size.

If we run a SQL statement a few times that selects just one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    22 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |     1 |    22 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |     1 |    22 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |     1 |    22 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"=42)
    filter("ID"=42)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The execution plan shows a Full Table Scan (FTS) is invoked, the only choice the CBO has without an index on the ID column. Clearly an index on the ID column would make the plan substantially more efficient with just 1 row selected from a 3M row table. Hopefully, Automatic Indexing will come to our rescue, so let’s check out the subsequent Automatic Indexing Report:

 

REPORT

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 58.72 MB (58.72 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 2
SQL statements improved (improvement factor)  : 1 (1.2x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 1.1x

-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------

The following indexes were created:
-------------------------------------------------------------------------
| Owner | Table      | Index                | Key | Type   | Properties |
-------------------------------------------------------------------------
| BOWIE | BOWIE_ASSM | SYS_AI_2w1pss6qbdz6z | ID  | B-TREE | NONE       |
-------------------------------------------------------------------------

So yes indeed, an Automatic Index (SYS_AI_2w1pss6qbdz6z) was created on the ID column.

If we look at the default Clustering Factor of this index:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID              2504869

 

We notice the Clustering Factor is relatively high at 2504869, much higher than the 12137 blocks in the table.

But if the ID column in the table has been loaded via a monotonically increasing sequence, doesn’t that mean the ID values have been inserted in approximately ID order? If so, doesn’t that mean the ID column should have a “good” Clustering Factor, as the order of the rows in the table matches the order of the indexed values in the ID index?

Clearly not.

The reason is that the table is stored in the default Automatic Segment Space Management (ASSM) tablespace type, which is designed to avoid contention between concurrent inserts from different sessions. Therefore each of the 3 sessions inserting into the table is assigned to different table blocks, resulting in the rows not being inserted precisely in ID order. It’s very close to ID order, with the ID values clustered within a few blocks of each other, but the rows are not precisely stored in ID order.
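To make this concrete, here is a rough model in Python (not Oracle code). It assumes each session simply fills its own current block before moving to a new one, that a block holds about 250 rows (roughly 3,000,000 rows / 12,137 blocks), and that sequence values are handed to the 3 sessions at random. Real ASSM block assignment is more involved, so treat this purely as a sketch; the row count is scaled down to 3,000.

```python
import random

ROWS_PER_BLOCK = 250   # assumed; roughly 3,000,000 rows / 12,137 blocks
SESSIONS = 3
TOTAL_ROWS = 3000      # scaled down from the 3M rows in the demo

rng = random.Random(42)
next_block = 0
current_block = {}     # session -> block it is currently filling
rows_in_block = {}     # block -> rows inserted so far
block_of_id = []       # table block for each ID, in ID (index) order

for _ in range(TOTAL_ROWS):
    session = rng.randrange(SESSIONS)  # session that grabbed this sequence value
    block = current_block.get(session)
    if block is None or rows_in_block[block] == ROWS_PER_BLOCK:
        # The session starts filling a brand new block of its own.
        block = next_block
        next_block += 1
        current_block[session] = block
        rows_in_block[block] = 0
    rows_in_block[block] += 1
    block_of_id.append(block)

distinct_blocks = len(set(block_of_id))
# Number of times the block differs from the previous ID's block (plus the first).
block_changes = 1 + sum(a != b for a, b in zip(block_of_id, block_of_id[1:]))
print(distinct_blocks, block_changes)
```

Under these assumptions the rows land in only a dozen or so blocks, yet consecutive IDs switch block most of the time, which is the same pattern as the 12137 blocks versus 2504869 Clustering Factor above.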

However, by default, the Clustering Factor is calculated by reading each index entry in order and determining whether it references a ROWID in a table block different from that of the PREVIOUS index entry. If the block differs, the Clustering Factor is incremented. If it doesn’t differ, and the entry accesses the same table block as the previous index entry, the Clustering Factor is NOT incremented.

So in theory, we could have 100 rows that reside in just 2 different table blocks, but if the odd IDs live in one block and the even IDs live in the other block, each ID is stored in a different table block to the previous one. The Clustering Factor would then have a value of 100 for these 100 rows, even though they only occupy 2 table blocks. The Clustering Factor is therefore much higher than it should be in reality, as ultimately only 2 different table blocks are accessed, within a negligible time of each other.
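The odd/even example can be checked with a few lines of Python. This is only a model of the default calculation as described here, not Oracle’s actual code; the function name and the block labels “A” and “B” are made up for illustration.

```python
def clustering_factor(block_ids):
    """Count how often the table block changes between consecutive index entries."""
    changes = 0
    previous = None
    for block in block_ids:
        if block != previous:   # different block from the previous index entry
            changes += 1
        previous = block
    return changes

# 100 rows read in ID order: odd IDs live in block "A", even IDs in block "B".
alternating = ["A" if i % 2 else "B" for i in range(1, 101)]
# The same 2 blocks, but with IDs 1-50 in "A" and IDs 51-100 in "B".
sorted_layout = ["A"] * 50 + ["B"] * 50

print(clustering_factor(alternating))    # 100
print(clustering_factor(sorted_layout))  # 2
```

Both layouts touch exactly 2 table blocks, but the default calculation scores them 100 and 2 respectively.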

This is the “flaw” with how the default Clustering Factor is calculated. By only noting whether a table block access differs from the previous table block accessed, it leaves the Clustering Factor calculation susceptible to exaggeratedly high values when the data really is relatively well clustered within the table.

If we run the same SQL as previously which only selects one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |     1 |    22 |     4   (0)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |     1 |    22 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |     1 |    22 |     4   (0)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |     1 |       |     3   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |     1 |       |     3   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
         4  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The CBO now uses the new Automatic Index because, with just one row selected, the index is clearly more efficient regardless of the Clustering Factor value.

However, if we now run a query that selects a range of ID values, in this example between 42 and 4242 which represents only a relatively low 0.14% of the table:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |  4202 | 92444 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"<=4242 AND "ID">=42)
    filter("ID"<=4242 AND "ID">=42)

Statistics
----------------------------------------------------------
         8  recursive calls
         4  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

The CBO decides to use a Full Table Scan as it deems the index with the massive Clustering Factor to be too expensive, with it having to visit differing blocks for the majority of the estimated 4202 rows (note at 4201 actual rows returned, this estimate by the CBO is practically spot on).

If we force the use of the index via an appropriate hint:

 

SQL> select /*+ index (bowie_assm) */ * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

Note that the estimated cost of 3530 is greater than the 1934 cost of the FTS, which explains why the CBO decides the FTS is best. However, if we look at the number of Consistent Gets, it’s only 26, meaning the CBO is actually getting these costs way wrong.

Why?

Because of the grossly inflated Clustering Factor.
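A back-of-the-envelope check in Python shows how directly the Clustering Factor drives this cost. It uses the commonly cited approximation that the table access part of an index range scan costs roughly selectivity × Clustering Factor; that formula is an assumption on my part (it ignores CPU costing and other adjustments), while the numbers are taken from the statistics and plans above.

```python
import math

num_rows = 3_000_000            # NUM_ROWS from user_tables
estimated_rows = 4202           # CBO cardinality estimate for the range
clustering_factor = 2_504_869   # default Clustering Factor of the index
index_scan_cost = 12            # cost of the INDEX RANGE SCAN step in the plan

selectivity = estimated_rows / num_rows
# Approximation (assumed): table block visits ~ selectivity * Clustering Factor.
table_access_cost = math.ceil(selectivity * clustering_factor)

print(table_access_cost)                    # 3509
print(index_scan_cost + table_access_cost)  # 3521
```

That lands within about 10 of the 3530 shown in the plan, so nearly all of the estimated cost comes from the CBO assuming some 3,500 table block visits, when only 26 consistent gets were actually needed.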

As I’ve discussed previously, Oracle 12.1 introduced a new TABLE_CACHED_BLOCKS statistics preference. Rather than the default value of 1, we can set this to any value up to 255. When calculating the Clustering Factor during statistics collection, Oracle will NOT increment the Clustering Factor if the index entry visits a table block that was one of the last “x” distinct table blocks visited. So by setting TABLE_CACHED_BLOCKS to (say) 42, the Clustering Factor is not incremented if the index visits a block that was one of the last 42 distinct table blocks visited. This can generate a much more “accurate” Clustering Factor, which can be significantly smaller than the default value. This in turn makes the index appear much more efficient to the CBO, because it then estimates that far fewer table blocks need to be accessed during a range scan.
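The effect of the preference can be modelled in Python as follows. This is a sketch of the behaviour described in this post, not Oracle’s implementation; in particular, treating the “last x distinct blocks” as a most-recently-visited list that is refreshed on every visit is my assumption, and the function name is made up.

```python
from collections import OrderedDict

def clustering_factor_cached(block_ids, table_cached_blocks=1):
    """Clustering Factor that ignores revisits to recently seen table blocks."""
    recent = OrderedDict()  # recently visited distinct blocks, oldest first
    factor = 0
    for block in block_ids:
        if block in recent:
            # Revisit of a "cached" block: do not increment, just refresh it.
            recent.move_to_end(block)
        else:
            factor += 1
            recent[block] = True
            if len(recent) > table_cached_blocks:
                recent.popitem(last=False)  # forget the oldest block
    return factor

# 100 rows in ID order alternating between two blocks, "A" and "B".
alternating = ["A" if i % 2 else "B" for i in range(1, 101)]

print(clustering_factor_cached(alternating, 1))  # 100
print(clustering_factor_cached(alternating, 2))  # 2
```

With the default of 1 the model behaves exactly like the previous-block comparison, while remembering just 2 blocks is enough to bring the alternating layout down to its “true” value of 2.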

So let’s change the TABLE_CACHED_BLOCKS value for this table to 42 (so that the Clustering Factor is not incremented during statistics collection when we revisit any of the last 42 distinct table blocks accessed) and recollect the segment statistics:

 

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM', cascade=>true);

PL/SQL procedure successfully completed.

 

If we now examine the new Clustering Factor value:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID                11608

 

We can see that at just 11608 it’s substantially less than the previous 2504869.

If we now rerun the previous range scan SQL without the hint:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |    30   (4)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |    30   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |    30   (4)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

We can see the CBO now automatically uses the new Automatic Index. The new cost of just 30 is substantially less than the previous index cost of 3530, and now much less than the 1934 cost of the FTS, which is why the index is now automatically chosen by the CBO.
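As a rough sanity check in Python, we can estimate the index cost with the new Clustering Factor and also the Clustering Factor value at which the index and the FTS would cost about the same for this query. This assumes the table access cost is approximately selectivity × Clustering Factor, which is a simplification that ignores CPU costing; the input numbers come from the plans above.

```python
import math

num_rows = 3_000_000     # NUM_ROWS from user_tables
estimated_rows = 4202    # CBO cardinality estimate for the range
index_scan_cost = 12     # cost of the INDEX RANGE SCAN step in the plans
fts_cost = 1934          # cost of the Full Table Scan plan

def index_path_cost(clustering_factor):
    # Approximation (assumed): index scan cost plus selectivity * Clustering Factor.
    return index_scan_cost + math.ceil(estimated_rows / num_rows * clustering_factor)

new_cost = index_path_cost(11_608)
# Clustering Factor at which the approximate index cost equals the FTS cost.
break_even_cf = (fts_cost - index_scan_cost) * num_rows / estimated_rows

print(new_cost)              # 29
print(round(break_even_cf))
```

The estimate of 29 is right next to the 30 in the plan, and under this approximation the break-even Clustering Factor is roughly 1.37 million: the default value of 2504869 sits well above it and the recalculated 11608 far below it.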

When Automatic Indexes are created, it’s usually a good idea to check their Clustering Factor. Because default ASSM tablespaces have a tendency to significantly escalate the values of index Clustering Factors, consider recalculating them with a non-default setting of the TABLE_CACHED_BLOCKS statistics collection preference.

Of course, not only is this a good idea for Automatic Indexes, but for manually created indexes as well.

Although no doubt Autonomous Database Cloud services will look at these issues in the future, such self-tuning capabilities are not currently available. You will need to go in there and make these changes as necessary to fix the root issues with such inefficient SQL statements…

Comments»

1. emanueol - June 30, 2020

Nice thanks for sharing 🙂
