jump to navigation

Empty Leaf Blocks and Statistics (Sense Of Doubt) July 8, 2008

Posted by Richard Foote in Index Access Path, Index Block Splits, Index Delete Operations, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
14 comments

I’ve recently been discussing how empty index blocks or those blocks that contain nothing but deleted index entries are placed on the index freelist and can potentially be recycled during subsequent index block split operations.

A point that’s not so well known about such empty index blocks is how Oracle considers them when calculating index related statistics and the possible implications this may have on the CBO.

Let’s set the scene with an example I’ve used previously where we load a table/index with 10000 entries and then subsequently delete the vast majority of them.

SQL> create table rich as select rownum id, ‘Bowie’ text from dual connect by level <= 10000;
 
Table created.
 
SQL> create index rich_i on rich(id);
 
Index created.

OK, so we now have an index with 10000 entries. Let’s just check to see how many leaf blocks we currently have:

SQL> analyze index rich_i validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, del_lf_rows from index_stats;

   LF_ROWS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
     10000         21           0

So we currently have 10000 LF_ROWS and 21 LK_BLKS with no deleted index rows at this stage.

Let’s now deleted the vast majority of rows from the table and hence index row entries from the index:
SQL> delete rich where id <= 9990;
 
9990 rows deleted.
 
SQL> commit;
 
Commit complete.

OK, so now we have an index with the vast majority of the index entries having been deleted and with all but one index leaf block effectively empty.

Let’s start by looking at how the ANALYZE INDEX … VALIDATE STRUCTURE deals with empty leaf blocks and index entries:

SQL> analyze index rich_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, lf_blks, del_lf_rows from index_stats;

   LF_ROWS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
     10000         21        9990

The first thing we notice is that the LF_ROWS statistics still has a value of 10000. It still counts index entries, even if they’ve been deleted.

We also notice that the LF_BLKS value is 21 so those leaf blocks that are effectively empty are still counted as well.

Let’s now collect statistics using DBMS_STATS as currently recommended by Oracle:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> ‘RICH’, cascade => true, estimate_percent=> null, method_opt=> ‘FOR ALL COLUMNS SIZE 1′);

If we now look at the index statistics:

SQL> select index_name, num_rows, leaf_blocks from dba_indexes where index_name = ‘RICH_I’;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- -------- -----------
RICH_I           10           1

We notice a couple of important differences. Firstly, the NUM_ROWS value is 10, highlighting that only non-deleted index entries are counted. We also notice that the number of LEAF_BLOCKS is only 1, highlighting that only those index leaf blocks that contain non-deleted index entries are counted. Although there are 20 other leaf blocks within the index structure, these are not counted and considered by the CBO when statistics are calculated using DBMS_STATS.

If we run the following simple little query that effectively selects all remaining rows from the table, we notice the following execution plan:

SQL> select * from rich where id between 1 and 10000;

        ID TEXT
---------- -----
      9991 Bowie
      9992 Bowie
      9993 Bowie
      9994 Bowie
      9995 Bowie
      9996 Bowie
      9997 Bowie
      9998 Bowie
      9999 Bowie
     10000 Bowie

Execution Plan
--------------------------------------------
|Id | Operation                   | Name   |
--------------------------------------------
| 0 | SELECT STATEMENT            |        |
| 1 |  TABLE ACCESS BY INDEX ROWID| RICH   |
|*2 |   INDEX RANGE SCAN          | RICH_I |
--------------------------------------------

The index is actually used to select all the remaining 10 rows, in part because the index related costs are so low.

Let’s see what would happens if we were to use the old, ANALYZE command to calculate the index statistics:

SQL> analyze index rich_i compute statistics;

Index analyzed.

First, let’s see if the index statistics are any different …

select index_name, num_rows, leaf_blocks from dba_indexes where index_name = ‘RICH_I’;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- -------- -----------
RICH_I           10          21

OK, a big big difference here. Where previously, DBMS_STATS didn’t include the empty leaf blocks in it’s statistics, we now notice that using the ANALYZE command does include such empty leaf blocks. The LEAF_BLOCKS value is now 21, not 1 as it was previously. Note though that the number of NUM_ROWS is still 10, so it still doesn’t count the deleted index entries themselves, just the empty leaf blocks.

But leaf blocks is one of the key statistics used by the CBO when calculating the cost of using an index related access path. Could this all make a difference in how our previous query is costed by the CBO ?

SQL> select * from rich where id between 1 and 10000;

        ID TEXT
---------- -----
      9991 Bowie
      9992 Bowie
      9993 Bowie
      9994 Bowie
      9995 Bowie
      9996 Bowie
      9997 Bowie
      9998 Bowie
      9999 Bowie
     10000 Bowie

10 rows selected.

Execution Plan
----------------------------------
| Id  | Operation         | Name |
----------------------------------
|   0 | SELECT STATEMENT  |      |
|*  1 |  TABLE ACCESS FULL| RICH |
----------------------------------

Oh yes indeed. Now the CBO has decided to use a Full Table Scan, in large part because of the additional calculated costs associated with using the index.

Note these tests work the same on all supported versions of Oracle.

So empty leaf blocks can still have a large impact on not only how a query may perform but indeed on how the CBO calculates the associated costs, depending on how the statistics are generated.

Yes, there are differences between the ANALYZE command and DBMS_STATS. This is one of the more subtle differences …

Follow

Get every new post delivered to your Inbox.

Join 1,808 other followers