jump to navigation

Estimate Index Size With Explain Plan (I Can’t Explain) April 24, 2014

Posted by Richard Foote in Estimate Index Size, Explain Plan For Index, Oracle Indexes.
9 comments

I discussed recently an updated MOS note that details the needs vs. the implications of rebuilding indexes.

Following is a neat little trick if you want to very quickly and cheaply estimate the size of an index if it were to be rebuilt or a new index before you actually create the thing. I meant to blog about this sometime ago but was re- reminded of it when I recently came across this entry in Connor McDonald’s excellent blog.

I’ll start by creating a table with a bunch of rows:

SQL> create table ziggy as select o.* from dba_objects o, dba_users;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select count(*) from ziggy;

  COUNT(*)
----------
   3939187

I’m thinking of creating an index on the OBJECT_NAME column, but I’m unsure if I’ll have enough free space in my tablespace. So let’s quickly get an estimate of the index size by simply generating the explain plan of the CREATE INDEX statement:

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

Elapsed: 00:00:00.09

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1219136602

----------------------------------------------------------------------------------------------

| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                     |  3939K|    93M| 22032   (3)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  3939K|    93M|            |          |
|   3 |    TABLE ACCESS FULL   | ZIGGY               |  3939K|    93M| 17199   (4)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Note
-----
   - estimated index size: 159M bytes

 

Notice the little note below the execution plan. Oracle has estimated an index size of approximately 159M bytes and it’s only taken it 0.09 seconds to do so. A trace of the session highlights how Oracle simply uses the table statistics in its determination of the estimated index size.

Well, that’s OK I have sufficient space for an index of that size.  Let’s create the physical index and check out its actual size:

SQL> create index ziggy_object_name_i on ziggy(object_name);

Index created.

SQL> select bytes from dba_segments where segment_name='ZIGGY_OBJECT_NAME_I';

     BYTES
----------
 163577856

SQL> analyze index ziggy_object_name_i validate structure;

Index analyzed.

SQL> select btree_space from index_stats;

BTREE_SPACE
-----------
  157875040

 

Not bad at all, the estimate and actual index sizes are pretty well spot on.

There are some limitations however. Let’s pick another column, SUBOBJECT_NAME, which has a large number of NULL values:

SQL> select count(*) from ziggy where subobject_name is not null;

  COUNT(*)
----------
     33669

SQL> explain plan for create index ziggy_subobject_name_i on ziggy(subobject_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 4065057084

-------------------------------------------------------------------------------------------------
| Id  | Operation              | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                        |  3939K|  7693K| 20132  (4)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_SUBOBJECT_NAME_I |       |       |     |          |
|   2 |   SORT CREATE INDEX    |                        |  3939K|  7693K|     |          |
|   3 |    TABLE ACCESS FULL   | ZIGGY                  |  3939K|  7693K| 17238  (4)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------
Note
-----
   - estimated index size: 100M bytes

The SUBOBJECT_NAME column only has a relatively few (33,669) values that are not null, but the explain plan is still estimating the index to have the full 3.9 million rows (remembering that fully null indexed values are not indexed in a B-Tree index). The estimated index size of 100M is therefore not going to be particularly accurate.

SQL> create index ziggy_subobject_name_i on ziggy(subobject_name);

Index created.

SQL> select bytes from dba_segments where segment_name='ZIGGY_SUBOBJECT_NAME_I';

      
     BYTES
----------
   1048576

SQL> analyze index ziggy_subobject_name_i validate structure;

Index analyzed.

SQL> select btree_space from index_stats;

BTREE_SPACE
-----------
     928032

So in this example, the estimated index size is indeed way off. This method doesn’t seem to cater for null index values and assumes the index to be fully populated.

However, if we simply take the known ratio of  not null values (in this example, 33669 not null rows /3939187 total rows =0.00855) and then apply it to the calculated estimate (100M x .00855 = 0.855M), where are now back into accurate ballpark territory again.

Of course, such estimates are based on the accuracy of the table statistics. If we have stale statistics, we’ll have stale index size estimates.

Let’s insert more rows and double the size of the table and associated index: 

SQL> insert into ziggy select * from ziggy;

3939187 rows created.

SQL> commit;

Commit complete.

If we re-run the index creation explain plan:

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 746589531

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |                     |  3939K|    93M| 22032   (3)| 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  3939K|    93M|            |          |
|   3 |    INDEX FAST FULL SCAN| ZIGGY_OBJECT_NAME_I |       |       |            |          |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Note
-----
   - estimated index size: 159M bytes

We get the same estimate as before. We need to update the table statistics in order to get an updated and more accurate index size estimate:

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> explain plan for create index ziggy_object_name_i on ziggy(object_name);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Plan hash value: 746589531

----------------------------------------------------------------------------------------------
| Id  | Operation              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

|   0 | CREATE INDEX STATEMENT |                     |  7878K|   187M| 45811   (3)| 00:00:02 |
|   1 |  INDEX BUILD NON UNIQUE| ZIGGY_OBJECT_NAME_I |       |       |            |          |
|   2 |   SORT CREATE INDEX    |                     |  7878K|   187M|            |          |
|   3 |    INDEX FAST FULL SCAN| ZIGGY_OBJECT_NAME_I |       |       |            |          |


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Note
-----
   - estimated index size: 318M bytes

Both the estimated index entries and  index size are now much more accurate.

The number of expected index entries is therefore a useful guide as to the potential accuracy of the size estimate.

So the next time you’re wondering whether an index is significantly larger than it should or whether you have sufficient space for a new index, this is a useful, simple technique to get a quick estimate.

Follow

Get every new post delivered to your Inbox.

Join 1,901 other followers