jump to navigation

Empty Leaf Blocks and Statistics (Sense Of Doubt) July 8, 2008

Posted by Richard Foote in Index Access Path, Index Block Splits, Index Delete Operations, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
trackback

I’ve recently been discussing how empty index blocks or those blocks that contain nothing but deleted index entries are placed on the index freelist and can potentially be recycled during subsequent index block split operations.

A point that’s not so well known about such empty index blocks is how Oracle considers them when calculating index related statistics and the possible implications this may have on the CBO.

Let’s set the scene with an example I’ve used previously where we load a table/index with 10000 entries and then subsequently delete the vast majority of them.

SQL> create table rich as select rownum id, ‘Bowie’ text from dual connect by level <= 10000;
 
Table created.
 
SQL> create index rich_i on rich(id);
 
Index created.

OK, so we now have an index with 10000 entries. Let’s just check to see how many leaf blocks we currently have:

SQL> analyze index rich_i validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, del_lf_rows from index_stats;

   LF_ROWS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
     10000         21           0

So we currently have 10000 LF_ROWS and 21 LK_BLKS with no deleted index rows at this stage.

Let’s now deleted the vast majority of rows from the table and hence index row entries from the index:
SQL> delete rich where id <= 9990;
 
9990 rows deleted.
 
SQL> commit;
 
Commit complete.

OK, so now we have an index with the vast majority of the index entries having been deleted and with all but one index leaf block effectively empty.

Let’s start by looking at how the ANALYZE INDEX … VALIDATE STRUCTURE deals with empty leaf blocks and index entries:

SQL> analyze index rich_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, lf_blks, del_lf_rows from index_stats;

   LF_ROWS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
     10000         21        9990

The first thing we notice is that the LF_ROWS statistics still has a value of 10000. It still counts index entries, even if they’ve been deleted.

We also notice that the LF_BLKS value is 21 so those leaf blocks that are effectively empty are still counted as well.

Let’s now collect statistics using DBMS_STATS as currently recommended by Oracle:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> ‘RICH’, cascade => true, estimate_percent=> null, method_opt=> ‘FOR ALL COLUMNS SIZE 1’);

If we now look at the index statistics:

SQL> select index_name, num_rows, leaf_blocks from dba_indexes where index_name = ‘RICH_I’;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- -------- -----------
RICH_I           10           1

We notice a couple of important differences. Firstly, the NUM_ROWS value is 10, highlighting that only non-deleted index entries are counted. We also notice that the number of LEAF_BLOCKS is only 1, highlighting that only those index leaf blocks that contain non-deleted index entries are counted. Although there are 20 other leaf blocks within the index structure, these are not counted and considered by the CBO when statistics are calculated using DBMS_STATS.

If we run the following simple little query that effectively selects all remaining rows from the table, we notice the following execution plan:

SQL> select * from rich where id between 1 and 10000;

        ID TEXT
---------- -----
      9991 Bowie
      9992 Bowie
      9993 Bowie
      9994 Bowie
      9995 Bowie
      9996 Bowie
      9997 Bowie
      9998 Bowie
      9999 Bowie
     10000 Bowie

Execution Plan
--------------------------------------------
|Id | Operation                   | Name   |
--------------------------------------------
| 0 | SELECT STATEMENT            |        |
| 1 |  TABLE ACCESS BY INDEX ROWID| RICH   |
|*2 |   INDEX RANGE SCAN          | RICH_I |
--------------------------------------------

The index is actually used to select all the remaining 10 rows, in part because the index related costs are so low.

Let’s see what would happens if we were to use the old, ANALYZE command to calculate the index statistics:

SQL> analyze index rich_i compute statistics;

Index analyzed.

First, let’s see if the index statistics are any different …

select index_name, num_rows, leaf_blocks from dba_indexes where index_name = ‘RICH_I’;

INDEX_NAME NUM_ROWS LEAF_BLOCKS
---------- -------- -----------
RICH_I           10          21

OK, a big big difference here. Where previously, DBMS_STATS didn’t include the empty leaf blocks in it’s statistics, we now notice that using the ANALYZE command does include such empty leaf blocks. The LEAF_BLOCKS value is now 21, not 1 as it was previously. Note though that the number of NUM_ROWS is still 10, so it still doesn’t count the deleted index entries themselves, just the empty leaf blocks.

But leaf blocks is one of the key statistics used by the CBO when calculating the cost of using an index related access path. Could this all make a difference in how our previous query is costed by the CBO ?

SQL> select * from rich where id between 1 and 10000;

        ID TEXT
---------- -----
      9991 Bowie
      9992 Bowie
      9993 Bowie
      9994 Bowie
      9995 Bowie
      9996 Bowie
      9997 Bowie
      9998 Bowie
      9999 Bowie
     10000 Bowie

10 rows selected.

Execution Plan
----------------------------------
| Id  | Operation         | Name |
----------------------------------
|   0 | SELECT STATEMENT  |      |
|*  1 |  TABLE ACCESS FULL| RICH |
----------------------------------

Oh yes indeed. Now the CBO has decided to use a Full Table Scan, in large part because of the additional calculated costs associated with using the index.

Note these tests work the same on all supported versions of Oracle.

So empty leaf blocks can still have a large impact on not only how a query may perform but indeed on how the CBO calculates the associated costs, depending on how the statistics are generated.

Yes, there are differences between the ANALYZE command and DBMS_STATS. This is one of the more subtle differences …

Comments»

1. kenny r. - July 8, 2008

Richard,

Thanks for another interesting post.

You made a comment at the end of your post saying that there are differences between the ANALYZE command and DBMS_STATS.

I was not able to find these differences documented anywhere. Are these just things that you find by experimentation?

Like

2. Daniel - July 8, 2008

So, are you saying that in this case, using ANALYZE instead of DBMS_STATS would be better?

Like

3. Asif Momen - July 8, 2008

Hi Daniel,

No, not at all. As documented, DBMS_STATS is more superior than ANALYZE, the same is proved in this post also.

When ANALYZE was used to collect statistics, incorrect information was loaded in the dictionary and the Optimizer picked FTS instead of INDEX RANGE SCAN.

Regards

Asif Momen

Like

4. Richard Foote - July 9, 2008

Hi Kenny

There are a number of differences, mainly regarding the various additional options you have with DBMS_STATS (sampling, parallel, etc.).

However perhaps one of the biggest differences is with regard to how partitioned statistics are calculated. With DBMS_STATS, you can gather Global stats at whatever level you may wish but with the Analyze command, you can’t really gather any global stats, although the lowest partitioned granuality avalaible is kinda effectively the same.

But without Global stats, some stats can be problematic.

Possibly something worth discussing at some point.

Like

5. Richard Foote - July 9, 2008

Hi Daniel and Asif

It depends a little which might be better. Oracle has currently taken the approach that if you have empty leaf blocks, you’re not likely to want to access them so let’s not include them in the stats and pretend they’re not there in any costings.

And this would be best if indeed it’s true.

But if you do need to access the empty leaf blocks because they’re in a section of the index you might still traverse, then the costings may not be accurate.

In this particular case, because we do need to access the empty index blocks, I would perhaps argue that indeed analyze provides the better stats although at the end of the day, the costings work out to be somewhat similar either way.

It’s just something to be aware of and to consider in these types of scenarios.

Like

6. The case of the mysterious Table Scan « Alien’s Weblog - July 13, 2008

[…] other day I was reading Richard Foote’s blog (reminder to self, put the blog on the directory) and I wanted to run some tests for […]

Like

7. Purav - July 15, 2008

Hi Richard, In your reply to Daniel & Asif, in line 1 of paragraph 4 you say “because we do need to access the empty index blocks, I would perhaps argue that indeed analyze provides the better stats”.
My question: 1) Why do we need to access the empty index blocks? According to me, we DO NOT need to access the empty index blocks. 2) How does analyze provide better stats? According to me, though it provides correct stats, its not useful. On the contrary, atleast in this case, the stats provided by DBMS_STATS is better leading to a Index based access to the table thereby reducing IO.

Kindly help me understand.

btw, very good article. thanks.

Like

8. Richard Foote - July 24, 2008

Hi Purav

Sorry, just noticed your comment.

1) We need to access the empty leaf blocks because they’re still within the index structure and are not recycled until a subsequent block split. So we do indeed need to access the empty leaf blocks if a query covers the possible range covered empty leaf blocks.

2) Analyze in my opinion provides better stats in the example above, because it correctly costs the I/Os associated with these empty blocks. The range scan (between 1 and 10000) needs to start at the first empty leaf block because there could possibly be an index entry we need to access. These extra I/Os related to the index could in an extreme example drive the CBO to use a more efficient full table scan, if only the CBO knew the empty leaf blocks were there.

Simply run the demo as I’ve set it up and closely look at the consistent reads being generated by the index range scan. You’ll notice it’s a value considerably higher than it could be if the empty leaf blocks weren’t there. To test the difference, rebuild the index and re-run the scan. The consistent reads would have dropped considerably.

DBMS_STATS assumes you don’t need to access a section of the index that has “empty” leaf blocks. This may not always be the case.

Hope this helps your understanding.

Like

9. Vladimir - August 6, 2008

Hi Richard,

I have one little question on this very interesting topic. Your examples are very convincing, but I’d observed you gather statistics of index by ANALYZE and look through it with the view INDEX_STATS for the first time. Later you gather statistics of index by DBMS_STATS and look through it with the view DBA_INDEXES. And for the third time you gather statistics of index by ANALYZE and look through it with the view DBA_INDEXES. That is, there is used different methods with different views.

Is it all right? What view is “more right”? Are these views equivalent as used here?

Thanks.

Like

10. Richard Foote - August 6, 2008

Hi Vladimir

The point of this exercise was to highlight how statistics differ, depending on how you gather them.

If you use ANALYZE VALIDATE STRUCTURE, it poulates INDEX_STATS with just the details of the last index analyzed.

If you use ANALYZE COMPUTE STATISTICS (or estimate) or DBMS_STATS, it doesn’t populate INDEX_STATS, it populates DBA_INDEXES with the latest statistics for the index.

The right view therefore depends on how you actually collect the statistics.

Like

11. Vladimir - August 6, 2008

Thank you for immediate reply!

Yes, I understood the purpose of your experiments. The question consists in the following..

So, ANALYZE VALIDATE STRUCTURE populates INDEX_STATS.
But in the last time you execute the statement:

SQL> analyze index rich_i validate structure;

And then execute the query:

select index_name, num_rows, leaf_blocks from dba_indexes where index_name = ‘RICH_I’;

That is, analyze DBA_INDEXES (and this view was really changed).

I will pleasure your explainations of your actions (analyzing DBA_INDEXES instead of INDEX_STATS in this case). And why DBA_INDEXES was changed if INDEX_STATS had?

Thanks.

Like

12. Richard Foote - August 6, 2008

Hi Vladimir

I see now, thank-you !! Why wasn’t that picked up before !!

Yes, that’s me and my cut ‘n’ paste skills letting me down.

And yes, that most definitely is the compute statistics command changing the stats.

Fixed.

Thanks again.

Would you mind please proof-reading my other posts 🙂

Like

13. Kostas Hairopoulos - March 16, 2009

Richard,

this article is quite interesting. We have a case with a staging table and associated indexes in a 4K ASSM tablespace with auto extend. There is a process (p1) which runs continuously inserting data in this table and another process (p2) which reads a batch of those inserted rows updates other tables and then deletes the rows for the specific batch.
The DBA’s believe that this staging table should not be analyzed or DBMS_STATS but we noticed that the table and his indexes don’t reuse the space. They also claim that without analyzing those staging tables are faster. Table started with 16k extent size and now its 2049 blocks extent. The same with his associated index also. Since the instance has a dynamic sampling of 2, I presume that the optimizer should already find HWM. I cannot understand why tabke and index are not reused. Any comments will be appreciated

Like

14. Richard Foote - March 24, 2009

Hi Kostas

I can only speculate without looking at your database.

Are you inserting more rows than you delete ? Are you using an append or parallel insert ? Do the delete batches span a relatively small number of rows per distinct table block ? Are the deleted index values always less than any newly inserted index values ?

Whether you analyze or don’t will impact how the CBO costs accesses to the staging tables, but not how the table/indexes are populated or how space is /is not reused.

There are a number of bugs associated with ASSM, especially if you have a large number of concurrent inserts occuring, that might be worth checking in metalink and investigating.

Might be worth dumping some of the table blocks to see what the contents might be in case you think space is not being reused appropriately.

Just some thoughts.

Like

Racer I. - October 12, 2018

Hi Richard,

I’ve just had an interesting discussion at AskTom about this (https://asktom.oracle.com/pls/apex/f?p=100:11:::NO::P11_QUESTION_ID:1396006300346456969
, my entries start in the middle, “INSERT growth”).
Would you have some ideas how to address this or think this might be worth mentioning with / adressing by Oracle?
Especially our enhancement suggestions (the second entry)).

regards,

Like


Leave a reply to Vladimir Cancel reply