jump to navigation

Oracle 19c Automatic Indexing: Common Index Creation Trap (Rat Trap) June 30, 2020

Posted by Richard Foote in 19c, 19c New Features, ASSM, Automatic Indexing, CBO, Clustering Factor, Data Clustering, Oracle Indexes, TABLE_CACHED_BLOCKS.
1 comment so far

When I go to a customer site to resolve performance issues, one of the most common issues I encounter is in relation to inefficient SQL. And one of the most common causes for inefficient SQL I encounter is because of deficiencies the default manner by which the index Clustering Factor is calculated.

When it comes to both Automatic Indexes and in relation to the Oracle Autonomous Database Cloud Services, the “flawed” default manner by which the index Clustering Factor is calculated still applies. So we need to exercise some caution when Auto Indexes are created and the impact their default statistics can have on the performance of subsequent SQL statements.

To illustrate with a simple example, I’ll first create a table with the key column being the ID column which will be effectively unique. The table will be populated via a basic procedure that just inserts 1M rows. The procedure uses an ORDER sequence, such that the ID values are generated in a monotonically increasing manner:

SQL> create table bowie_assm (id number, code number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

Procedure created.

SQL> create or replace procedure pop_bowie_assm as
2  begin
3    for i in 1..1000000 loop
4      insert into bowie_assm values (bowie_assm_seq.nextval, mod(i,1000), 'DAVID BOWIE');
5      commit;
6    end loop;
7  end;
8  /

Procedure created.

 

However crucially, the procedure is executed by 3 different session concurrently, to simulate a multi user environment inserting into a table…

 

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

 

We’ll now collect statistics on the table:

 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE_ASSM';

TABLE_NAME        NUM_ROWS     BLOCKS
--------------- ---------- ----------
BOWIE_ASSM         3000000      12137

 

So the table has 3M rows and is 12137 blocks in size.

If we run an SQL a few times where we select only the one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    22 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |     1 |    22 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |     1 |    22 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |     1 |    22 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"=42)
    filter("ID"=42)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The execution plan shows a Full Table Scan (FTS) is invoked, the only choice the CBO has without an index on the ID column. Clearly an index on the ID column would make the plan substantially more efficient with just 1 row selected from a 3M row table. Hopefully, Automatic Indexing will come to our rescue, so let’s check out the subsequent Automatic Indexing Report:

 

REPORT

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                              : 1
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 58.72 MB (58.72 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 2
SQL statements improved (improvement factor)  : 1 (1.2x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 1.1x

-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------

The following indexes were created:
-------------------------------------------------------------------------
| Owner | Table      | Index                | Key | Type   | Properties |
-------------------------------------------------------------------------
| BOWIE | BOWIE_ASSM | SYS_AI_2w1pss6qbdz6z | ID  | B-TREE | NONE       |
-------------------------------------------------------------------------

So yes indeed, an Automatic Index (SYS_AI_2w1pss6qbdz6z) was created on the ID column.

If we look at the default Clustering Factor of this index:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID              2504869

 

We notice the Clustering Factor is relatively high at 2504869, much higher than the 12137 number of blocks in the table.

But if the ID column in the table has been loaded via a monotonically increasing sequence, doesn’t that mean the ID values have been inserted in approximately in ID order? If so, doesn’t that mean the ID column should have a “good” Clustering Factor” as the order of the rows in the table matches the order of the indexed values in the ID index?

Clearly not.

The reason being that the table is stored in the default Automatic Segment Space Management (ASSM) tablespace type, which is designed to avoid contention by concurrent inserts from different sessions. Therefore each of the 3 sessions inserting into the table are each assigned to different table blocks, resulting in the rows not being precisely inserted in ID order. It’s very close to ID order, the the ID values clustered within a few blocks from each other, but not precisely stored in ID order.

However, by default, the Clustering Factor is calculated by reading each index entry and determining if it references a ROWID that accesses a table block different from the PREVIOUS index entry. If it does differ, it increments the Clustering Factor, if it doesn’t differ and accesses the same table block as the previous index entry, the Clustering Factor is NOT incremented.

So in theory, we could have 100 rows that reside in just 2 different table blocks, but if the odd IDs live in one block and the even IDs live in the other block, meaning that each ID is stored in a different table block to the previous, the Clustering Factor would have a value of 100 for these 100 rows, even though they only occupy 2 table blocks. The Clustering Factor is therefore much higher than in reality it should be as ultimately only 2 different table blocks are accessed within a negligible time from each other.

This is the “flaw” with how the default Clustering Factor is calculated. By noting if a table block access differs only from the previous table block accessed, it leaves the Clustering Factor calculation susceptible to exaggerated high values when the data really is relatively well clustered within the table.

If we run the same SQL as previously which only selects one ID value:

 

SQL> select * from bowie_assm where id = 42;

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |     1 |    22 |     4   (0)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |     1 |    22 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |     1 |    22 |     4   (0)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |     1 |       |     3   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |     1 |       |     3   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID"=42)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
         4  consistent gets
         0  physical reads
         0  redo size
       707  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
         1  rows processed

 

The CBO now uses the new Automatic Index as with just one row, the index is clearly more efficient regardless of the Clustering Factor value.

However, if we now run a query that selects a range of ID values, in this example between 42 and 4242 which represents only a relatively low 0.14% of the table:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |  4202 | 92444 |  1934   (6)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |  4202 | 92444 |  1934   (6)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_ASSM |  4202 | 92444 |  1934   (6)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("ID"<=4242 AND "ID">=42)
    filter("ID"<=4242 AND "ID">=42)

Statistics
----------------------------------------------------------
         8  recursive calls
         4  db block gets
     12138  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

The CBO decides to use a Full Table Scan as it deems the index with the massive Clustering Factor to be too expensive, with it having to visit differing blocks for the majority of the estimated 4202 rows (note at 4201 actual rows returned, this estimate by the CBO is practically spot on).

If we force the use of the index via an appropriate hint:

 

SQL> select /*+ index (bowie_assm) */ * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |  3530   (1)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

Note at an estimated cost of 3530, this is greater than the 1934 cost of the FTS which explains why the CBO decides the FTS is best. However, if we look at the number of Consistent Gets, it’s only 26, meaning the CBO is actually getting these costs way wrong.

Why?

Because of the grossly inflated Clustering Factor.

As I’ve discussed previously, Oracle 12.1 introduced a new TABLE_CACHED_BLOCKS preference. Rather than the default value of 1, we can set this to any value up to 255. When calculating the Clustering Factor during statistics collection, it will NOT increment the Clustering Factor if the index visits a table block again that was one of the last “x” distinct table blocks visited. So by setting TABLE_CACHED_BLOCKS to (say) 42, if the index visits a block that was one of the last 42 distinct table blocks previously visited, don’t now increment the Clustering Factor. This can therefore generate a much more “accurate” Clustering Factor which can be significantly smaller than previously. This in turn makes the index much more efficient to the CBO because it then estimates far fewer table blocks need be accessed during a range scan.

So let’s change the TABLE_CACHED_BLOCKS value for this table to 42 (don’t increment now the Clustering Factor value when collecting statistics if we visit again any of the last 42 differently accessed table blocks) and recollect the segment statistics:

 

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM', cascade=>true);

PL/SQL procedure successfully completed.

 

If we now examine the new Clustering Factor value:

 

SQL> select index_name, auto, constraint_index, visibility, status, clustering_factor from user_indexes

where table_name='BOWIE_ASSM';

INDEX_NAME           AUT CON VISIBILIT STATUS   CLUSTERING_FACTOR
-------------------- --- --- --------- -------- -----------------
SYS_AI_2w1pss6qbdz6z YES NO  VISIBLE   VALID                11608

 

We can see that at just 11608 it’s substantially less than the previous 2504869.

If we now rerun the previous range scan SQL without the hint:

 

SQL> select * from bowie_assm where id between 42 and 4242;

4201 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  4202 | 92444 |    30   (4)| 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  4202 | 92444 |    30   (4)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_ASSM           |  4202 | 92444 |    30   (4)| 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  4202 |       |    12   (0)| 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  4202 |       |    12   (0)| 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_2w1pss6qbdz6z |  4202 |       |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("ID">=42 AND "ID"<=4242)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
        26  consistent gets
         0  physical reads
         0  redo size
     54767  bytes sent via SQL*Net to client
       588  bytes received via SQL*Net from client
         2  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
      4201  rows processed

 

We can see the CBO now automatically uses the new Automatic Index. At a new cost of just 30, it’s substantially less than the previous index cost of 3530 and now much less than the 1934 for the FTS and so why the index is now automatically chosen by the CBO.

When Automatic Indexes are created, it’s usually a good idea to check on the Clustering Factor and because default ASSM tablespaces have a tendency to significantly escalate the values of index Clustering Factors, to look at recalculating them with an non-default setting of the TABLE_CACHED_BLOCKS statistics collection preference.

Of course, not only is this a good idea for Automatic Indexes, but for manually created indexes as well.

Although no doubt Autonomous Database Cloud services will look at these issues in the future, such self-tuning capabilities are not currently available. You will need to go in there and make these changes as necessary to fix the root issues with such inefficient SQL statements…

Rebuilding Indexes: Danger With Clustering Factor Calculation (Chilly Down) July 17, 2018

Posted by Richard Foote in CBO, Clustering Factor, Data Clustering, Index Rebuild, Oracle Indexes, TABLE_CACHED_BLOCKS.
1 comment so far

chilly down

Let me start by saying if you don’t already following Jonathan Lewis’s excellent Oracle blog, do yourself a favour. In a recent article, Jonathan highlighted a danger with rebuilding indexes (or indeed creating an index) when used in relation to collecting index statistics with the TABLE_CACHED_BLOCKS preference.

I’ve discussed the importance of the TABLE_CACHED_BLOCKS statistics collection preference a number of times previously, but the issue discussed by Jonathan is worth repeating here.

Let me start by repeating a demo I’ve used previously, by creating a table stored in an ASSM tablespace with data that is well clustered, but reported as being badly clustered due to how the Clustering Factor (CF) is calculated by default.

Firstly, I create a simple table and sequence and run a procedure that populates the table with a monotonically increasing ID column  populated via the sequence. But importantly, the procedure is executed concurrently from 3 separate sessions such that the monotonically increasing ID values are not stored in the table in precisely ID order as each of the 3 sessions inserts rows into different sets of table blocks:

SQL> create table bowie_assm (id number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

SQL> create or replace procedure pop_bowie_assm as
2 begin
3 for i in 1..100000 loop
4 insert into bowie_assm values (bowie_assm_seq.nextval, 'DAVID BOWIE');
5 commit;
6 end loop;
7 end;
8 /

Procedure created.

The following is executed concurrently in 3 different sessions:

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

If you can imagine 3 different blocks within the table, block one has rows with ID values 1,4,7,10,13,16…, block two has rows with ID values 2,5,8,11,14,17… and block three has rows with ID values 3,6,9,12,15,18…

So the data is well clustered in that the data for a large number of consecutive IDs are stored within a few blocks, but they’re not stored precisely in ID order within the table.

If we now create an index on the ID column and look at the Clustering Factor (CF) of the index:

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

We note the calculated CF is extremely poor at 219416 (a value much closer to the number of index entries than the number of blocks in the table) as the default calculation notes that most index entries have a rowid that points to a different table block to the previous index entry rowid.

If we run a query that only requires a moderate number of rows (approx. 0.13% of the table) to be returned:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) |  Time     |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
 974 consistent gets
   0 physical reads
   0 redo size
8869 bytes sent via SQL*Net to client
 883 bytes received via SQL*Net from client
  27 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

We note the CBO decides to use a Full Table Scan (FTS) as the index is too costly and inefficient to use with such a poor CF value.

However, if say retrieving 100 rows, the CBO thinks it needs to visit many more table blocks than the 3 blocks that in actual fact contain the 100 rows of interest.

The TABLE_CACHED_BLOCKS statistics preference allows us to modify how the CF is calculated by not incrementing the CF value if an index rowid points to a block that was visited just TABLE_CACHED_BLOCKS ago.

If we now re-calculate the CF but with the TABLE_CACHED_BLOCKS preference set to say 42:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

We notice the CF has dropped significantly, down to just 909 from its previous 219416 value.

If we now re-run the same query as before:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                 |  389 |  6613 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_ASSM      |  389 |  6613 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_ASSM_ID_I |  389 |       |       2 (0) |  00:00:01 |
-------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   6  consistent gets
   0  physical reads
   0  redo size
8734  bytes sent via SQL*Net to client
 608  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
 388  rows processed

We notice the CBO now automatically decides to use the index and more importantly, that at just 6 consistent gets, the query is now much more efficient as a result.

The index was always the more efficient access method, but because of the poor CF that was previously calculated, the CBO got it wrong. Now that a more “accurate” CF is calculated, all is now well.

However, if we now decide to rebuild this index:

alter index bowie_assm_id_i rebuild;

Index altered.

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) | Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   3 recursive calls
   0 db block gets
 956 consistent gets
   0 physical reads
   0 redo size
4094 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

So we’re back to the less efficient FTS. Why ? A look at the CF reveals the problem:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

When the index is rebuilt and so when the index statistics are implicitly recalculated, the TABLE_CACHED_BLOCKS preference is ignored. This applies even if this preference is set at the schema or database level:

SQL> exec dbms_stats.set_schema_prefs(ownname=>user, pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_database_prefs(pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> alter index bowie_assm_id_i rebuild online;

Index altered.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

This issue also applies when an index is newly created, any TABLE_CACHED_BLOCKS setting is ignored, until the time when statistics are again collected via DBMS_STATS:

SQL> drop index bowie_assm_id_i;

Index dropped.

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

This is currently being investigation by Oracle as unpublished bug 28292026.

Again, another example of the dangers of blindly rebuilding indexes without a valid justification…

Clustering Factor Calculation Improvement Part III (Too Much Rope) June 4, 2013

Posted by Richard Foote in 11g, CBO, Clustering Factor, Index statistics, TABLE_CACHED_BLOCKS.
9 comments

In my previous post, I discussed how for larger tables that really do have randomised data, not even setting the new TABLE_CACHED_BLOCKS to its 255 maximum value is going to make any real difference to the resultant Clustering Factor (CF) of an index. The 255 maximum value here protects us somewhat from abusing this capability and setting unrealistic CF values.

However, for smaller tables in particular, we do need to exercise some caution.

In the following example, we’re only creating a relatively small table and associated index with a CODE column that is randomly distributed throughout the table:

SQL> create table bowie (id number, code number, text varchar2(30));
Table created.

SQL> insert into bowie select rownum,  trunc(dbms_random.value(0, 100)), 'DAVID BOWIE'
from dual connect by level <= 70000;

70000 rows created.

SQL> commit;

Commit complete.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
     method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index bowie_code_i on bowie(code);

Index created.

If we look at the CF of this index:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME       BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ ------------ ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I        244      70000             22711

We notice that at 22,711, the CF it’s pretty average. The table though is quite small at only 244 blocks.

If we run a simple query:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  2814 | 56280 |    65   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  2814 | 56280 |    65   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter('CODE'<=44 AND 'CODE'>=42)

Statistics
----------------------------------------------------------

0  recursive calls
1  db block gets
254  consistent gets
0  physical reads
0  redo size
25044  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO goes for a Full Table Scan. This is not unexpected as we likely have to visit all 244 blocks anyways to fetch the required 2050 rows due to the CODE data being so randomly distributed throughout the table. In a naive attempt to improve things, we decide to improve the CF by setting the TABLE_CACHED_BLOCKS to the maximum 255 value:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE',
pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', estimate_percent=> null, cascade=> true,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
FROM user_tables t, user_indexes i WHERE t.table_name = i.table_name AND i.index_name='BOWIE_CODE_I';

TABLE_NAME   INDEX_NAME         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------ -------------- ---------- ---------- -----------------
BOWIE        BOWIE_CODE_I          244      70000               240

Indeed, we have improved the CF, dramatically reducing it down to just 240 from the previous 22711. Even though the column value for the CODE really is randomly distributed throughout the table, the CF now suggests the data is perfectly clustered. This is because with a table with only 244 blocks, incrementing the CF if the current index entry references a table block more than 255 blocks ago is now impossible. The CF is now guaranteed to be “perfect” as each index entry can only reference one of the 244 table blocks and so is incremented only when each table block is referenced the first time.

This dramatic reduction in the CF will certainly make the index more attractive to the CBO. But is this really a good thing:

SQL> select * from bowie where code between 42 and 44;
2050 rows selected.

Execution Plan

----------------------------------------------------------

Plan hash value: 1602289932
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |  2814 | 56280 |    17   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |  2814 | 56280 |    17   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_CODE_I |  2814 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access('CODE'>=42 AND 'CODE'<=44)

Statistics
----------------------------------------------------------

0  recursive calls
0  db block gets
681  consistent gets
0  physical reads
0  redo size
20895  bytes sent via SQL*Net to client
519  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2050  rows processed

The CBO is now using the index, but is now performing significantly more consistent gets, 681 rather than the previous 254. This execution plan is actually much less efficient than the previous FTS execution plan. The CBO is getting this wrong now as the CF isn’t really anywhere near as good as it’s now being lead to believe.

Caution setting TABLE_CACHED_BLOCKS to a value that is anywhere close to the number of blocks in the table. This is one of the reasons for Oracle having a 1% of table blocks default value for this setting.