
Estimate Index Size With Explain Plan (I Can’t Explain) April 24, 2014

Posted by Richard Foote in Estimate Index Size, Explain Plan For Index, Oracle Indexes.

I recently discussed an updated MOS note that details the need for vs. the implications of rebuilding indexes.

Following is a neat little trick if you want to very quickly and cheaply estimate the size of an index, be it an existing index you’re thinking of rebuilding or a new index before you actually create the thing. I meant to blog about this some time ago but was re-reminded of it when I recently came across this entry in Connor McDonald’s excellent blog.

I’ll start by creating a table with a bunch of rows:

SQL> create table ziggy as select o.* from dba_objects o, dba_users;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select count(*) from ziggy;

  COUNT(*)
----------
   3939187

I’m thinking of creating an index on the OBJECT_NAME column, but I’m unsure if I’ll have enough free space in my tablespace. So let’s quickly get an estimate of the index size by simply generating the explain plan of the CREATE INDEX statement:

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

Elapsed: 00:00:00.09

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1219136602

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                     |  3939K|    93M| 22032   (3)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  3939K|    93M|            |          |
|   3 |    TABLE ACCESS FULL   | ZIGGY               |  3939K|    93M| 17199   (4)| 00:00:01 |
----------------------------------------------------------------------------------------------

Note
-----
   - estimated index size: 159M bytes

 

Notice the little note below the execution plan. Oracle has estimated an index size of approximately 159M bytes and it’s only taken it 0.09 seconds to do so. A trace of the session highlights how Oracle simply uses the table statistics in its determination of the estimated index size.
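As an aside, the statistics the estimate draws on are visible in the data dictionary. The following is just an illustrative query of the gathered statistics (it isn’t Oracle’s exact internal formula): the table row count and the average length of the column to be indexed are the key inputs, and multiplying them together (plus some per-entry and per-block overhead) gets you into the same ballpark as the 159M figure.

SQL> select t.num_rows, c.avg_col_len
     from   user_tables t, user_tab_columns c
     where  t.table_name  = 'ZIGGY'
     and    c.table_name  = t.table_name
     and    c.column_name = 'OBJECT_NAME';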

Well, that’s OK, I have sufficient space for an index of that size. Let’s create the physical index and check out its actual size:

SQL> create index ziggy_object_name_i on ziggy(object_name);

Index created.

SQL> select bytes from dba_segments where segment_name='ZIGGY_OBJECT_NAME_I';

     BYTES
----------
 163577856

SQL> analyze index ziggy_object_name_i validate structure;

Index analyzed.

SQL> select btree_space from index_stats;

BTREE_SPACE
-----------
  157875040

 

Not bad at all: the estimated and actual index sizes are pretty well spot on.

There are some limitations however. Let’s pick another column, SUBOBJECT_NAME, which has a large number of NULL values:

SQL> select count(*) from ziggy where subobject_name is not null;

  COUNT(*)
----------
     33669

SQL> explain plan for create index ziggy_subobject_name_i on ziggy(subobject_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 4065057084

-------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                        |  3939K|  7693K| 20132   (4)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_SUBOBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                        |  3939K|  7693K|            |          |
|   3 |    TABLE ACCESS FULL   | ZIGGY                  |  3939K|  7693K| 17238   (4)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Note
-----
   - estimated index size: 100M bytes

The SUBOBJECT_NAME column only has relatively few (33,669) values that are not null, but the explain plan is still estimating the index to have the full 3.9 million rows (remembering that fully null values are not indexed in a B-Tree index). The estimated index size of 100M is therefore not going to be particularly accurate.

SQL> create index ziggy_subobject_name_i on ziggy(subobject_name);

Index created.

SQL> select bytes from dba_segments where segment_name='ZIGGY_SUBOBJECT_NAME_I';

      
     BYTES
----------
   1048576

SQL> analyze index ziggy_subobject_name_i validate structure;

Index analyzed.

SQL> select btree_space from index_stats;

BTREE_SPACE
-----------
     928032

So in this example, the estimated index size is indeed way off. This method doesn’t seem to cater for null index values and assumes the index to be fully populated.

However, if we simply take the known ratio of not null values (in this example, 33669 not null rows / 3939187 total rows = 0.00855) and then apply it to the calculated estimate (100M x 0.00855 = 0.855M), we are now back in accurate ballpark territory again.
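If you’d rather not count the not null values manually, the same ratio can be derived from the gathered column statistics themselves. Again, this is just an illustrative query, using the NUM_NULLS statistic in place of the count:

SQL> select round((t.num_rows - c.num_nulls) / t.num_rows, 5) as not_null_ratio
     from   user_tables t, user_tab_columns c
     where  t.table_name  = 'ZIGGY'
     and    c.table_name  = t.table_name
     and    c.column_name = 'SUBOBJECT_NAME';

Multiplying the reported 100M estimate by this ratio gets you to the more realistic 0.855M figure.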

Of course, such estimates are based on the accuracy of the table statistics. If we have stale statistics, we’ll have stale index size estimates.
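A quick way to sanity check whether the statistics are likely to be stale in the first place (just a suggestion, using the standard dictionary view):

SQL> select table_name, num_rows, last_analyzed, stale_stats
     from   user_tab_statistics
     where  table_name = 'ZIGGY';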

Let’s insert more rows and double the size of the table and associated index: 

SQL> insert into ziggy select * from ziggy;

3939187 rows created.

SQL> commit;

Commit complete.

If we re-run the index creation explain plan:

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 746589531

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                     |  3939K|    93M| 22032   (3)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  3939K|    93M|            |          |
|   3 |    INDEX FAST FULL SCAN| ZIGGY_OBJECT_NAME_I |       |       |            |          |
----------------------------------------------------------------------------------------------

Note
-----
   - estimated index size: 159M bytes

We get the same estimate as before. We need to update the table statistics in order to get an updated and more accurate index size estimate:

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 746589531

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                     |  7878K|   187M| 45811   (3)| 00:00:02 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  7878K|   187M|            |          |
|   3 |    INDEX FAST FULL SCAN| ZIGGY_OBJECT_NAME_I |       |       |            |          |
----------------------------------------------------------------------------------------------

Note
-----
   - estimated index size: 318M bytes

Both the estimated number of index entries and the estimated index size are now much more accurate.

The number of expected index entries is therefore a useful guide as to the potential accuracy of the size estimate.

So the next time you’re wondering whether an index is significantly larger than it should be, or whether you have sufficient space for a new index, this is a useful, simple technique for getting a quick estimate.

Comments»

1. Martin Preiss - April 24, 2014

Hi Richard,

maybe it is worth mentioning that the estimate isn’t very useful for bitmap indexes (giving the same result as for a b*tree index with the same column list – at least in 11.2.0.1 and earlier versions).

And that there is a bug for 32bit systems resulting in extremely wrong predictions for tables with a lot of rows – http://jonathanlewis.wordpress.com/2012/01/04/index-size-bug/.

Regards

Martin


Richard Foote - April 28, 2014

Thanks Martin

Consider it mentioned thanks to your comments 🙂

And yes, it still doesn’t work for bitmap indexes with 12c either as the formula used needs to be completely different.


Martin Preiss - April 28, 2014

Hi Richard,

would it not be quite difficult to predict the size of bitmap indexes given the massive effect of value clustering for their compression?

That said: compression for b*tree indexes seems also not to be considered by the size estimate.


2. Remi T. - April 25, 2014

You get the same exact estimation (apparently) using dbms_space.create_index_cost (from alloc_bytes)


Richard Foote - April 28, 2014

Hi Remi

Yes, create_index_cost actually calls an explain statement under the covers.
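[For anyone who wants to try the DBMS_SPACE approach Remi mentions, a minimal sketch of the call looks something like the following; the local variable names are just placeholders:

SQL> set serveroutput on
SQL> declare
       l_used_bytes  number;
       l_alloc_bytes number;
     begin
       dbms_space.create_index_cost(
         ddl         => 'create index ziggy_object_name_i on ziggy(object_name)',
         used_bytes  => l_used_bytes,
         alloc_bytes => l_alloc_bytes);
       dbms_output.put_line('used_bytes:  '||l_used_bytes);
       dbms_output.put_line('alloc_bytes: '||l_alloc_bytes);
     end;
     /
]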


3. dbabibleantony - July 1, 2014

Hi Richard, this question is not related to this blog. I have a question about a unique key index on columns whose key values are generated by Java code as a combination of letters and numbers.
The key length from the leaf block dump is 74 bytes. I see 50/50 leaf node splits all the time and the space in the leaf blocks is under-utilized. I have also noticed that inserts to the base table are taking longer, with more “db file sequential read” waits on index blocks.

Col1 col2
————————————————– ———————————-
4d5638359f7c46bbbd7c13aa4277faee 9a4f29f785ba4f1dabd0560dedc05100
0c6f5ca0bf474140aadbc1f15608a459 5593a630032841b7a4289b360f10bd2c
8508c2eb02cf46a9ba8fe73bbf18b4f9 fd99d031bbc145c898a6b7cc24774484
e9d2db99c82a47d39ed54ec8a16279fb a6e43327ea734c8187c9a781d5f7afd5
03593bb9f79d41279fde80ef78123da0 45e4d2ddd2684be290d4ea4027acb95f
1a7a14610d3d4c32a44cb7b1cfd52472 ff20d3522ea44b9aa63aaa8503f515d6
9bce07b67d6b41cf8e4b7ba461c249b7 9712eac56db74af28d839fc6749ac917
c650100d7e40486babb31ebe463957b5 8cae1ae0c4e14741b083f371915eb53d
d393600902bc47ac86e65e6d85bf7177 a694eab853774007a9f6f5783be37c1a


4. dbabibleantony - July 1, 2014

It looks like the fact that the values of incoming rows fall within the existing minimum/maximum values is the reason behind the 50/50 splits. I would like to get your expert opinion on rebuilding this index regularly.


5. Richard Foote - July 3, 2014

You basically have a random set of data that will be evenly distributed throughout your index structure. You’ll only have 50-50 block splits as it would be an extremely rare event for the largest known value to be inserted when the right most block is full.

However, this would not be enough IMHO to warrant regular rebuilds. The index will have on average approx. 25-30% free space, but this free space will eventually get used, hence causing more 50-50 block splits. Unless your index is at a very particular size, a rebuild is unlikely to reduce the blevel at all and so inserts will basically cost the same afterwards.

A smaller index would possibly improve its caching characteristics, but I would suggest a slightly larger buffer cache (if necessary) would be an easier long-term fix. Note the index will only be approx. 15-20% smaller as you’ll want say 10% free space to prevent the new index from splitting straight away.

Easy enough to test performance of an insert before the rebuild vs afterwards to see if such a rebuild/coalesce was beneficial.
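[A bare-bones sketch of such a before-and-after test; the table, index and column names here are purely hypothetical placeholders, and SYS_GUID is used simply to generate random hex keys similar to those shown above:

SQL> set timing on
SQL> -- time a representative batch of inserts before the coalesce/rebuild
SQL> insert into my_table (col1, col2)
     select rawtohex(sys_guid()), rawtohex(sys_guid())
     from dual connect by level <= 100000;
SQL> rollback;
SQL> alter index my_unique_key_i coalesce;
SQL> -- repeat the identical batch afterwards and compare the elapsed times
SQL> insert into my_table (col1, col2)
     select rawtohex(sys_guid()), rawtohex(sys_guid())
     from dual connect by level <= 100000;
SQL> rollback;
]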


6. Free script to very quickly and cheaply estimate the size of an index if it were to be rebuilt | Oracle SQL Tuning Tools and Tips - July 19, 2014

[…] that we can actually use a little trick of the EXPLAIN PLAN FOR command and actually use the CBO to estimate what would be the size of an index if we were to create (or rebuild) it. In this latter blog posting Richard explains and demonstrates how it can be […]


