jump to navigation

Unique Bitmap Indexes Part I (Unnatural Selection) March 24, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes, Unique Indexes.
trackback

As I’ve discussed previously, a Bitmap index can be considered over a B-tree index (where concurrent DML is not an issue) even if there are potentially tens of millions of distinct values, in a table that has say hundreds of millions of rows.
 
However, if a column is unique or “approaches” uniqueness, then one has gone too far and the bitmap index is going to be larger and less efficient than an equivalent b-tree index. So you wouldn’t consider a bitmap index on a column with a million distinct values if the table itself only has in the vicinity of a million rows as well.
 
To understand why a column approaching uniqueness shouldn’t be considered as a bitmap index, one only needs to understand the structure and corresponding differences of index entries in both bitmap and b-tree indexes.
 
I’ll begin by creating a simple table and populating it with a million rows.


SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie select rownum, 'Ziggy Stardust' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

   

Note that the ID column is unique. We can therefore create a Unique b-tree index:
 


SQL> create unique index bowie_unique_i on bowie(id) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_UNIQUE_I';

INDEX_NAME          BLEVEL LEAF_BLOCKS DISTINCT_KEYS
--------------- ---------- ----------- -------------
BOWIE_UNIQUE_I           2        1875       1000000

 

Note that the unique index has 1875 leaf blocks. If we dump a leaf block and look at some (say 5) of the index entries:
 

row#0[8025] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 01
col 0; len 2; (2):  c1 03
row#2[8003] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 02
col 0; len 2; (2):  c1 04
row#3[7992] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 03
col 0; len 2; (2):  c1 05
row#4[7981] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 04

 

We notice the length of these first 5 index entries are all 11 bytes (len=11).
 
An index entry from this Unique index basically consists of the indexed value (col 0) which is 2 bytes in length in the above sample plus the following overhead:
 
2 bytes for flags and locks
6 bytes for the rowid
1 byte for the index column length
 
So there’s a total of 9 bytes of overhead per index entry in this index in addition to the index value itself. Note also there’s an index entry for each and every indexed value. This is always the case for a non-compressed b-tree index.
 
If we now compare this with an equivalent  Non-Unique index on the same column:

 
 
SQL> drop index bowie_unique_i;
 
Index dropped.
 
SQL> create index bowie_nonunique_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_NONUNIQUE_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_NONUNIQUE_I          2        1999       1000000
 

 

We notice the index is now somewhat larger than the equivalent Unique index, with there now being 1999 leaf blocks, an increase of 124 leaf blocks. A block dump of a leaf block reveals the key difference:
 

 
row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
row#1[8012] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 01
row#2[8000] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 02
row#3[7988] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 03
row#4[7976] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 04

 

As I’ve discussed previously, Oracle makes the Non-Unique index effectively unique by adding the rowid as an additional indexed column within the index entry (col 1 is this additional index column comprising the rowid). There are therefore 2 columns in the index entry, not just the one (denoted by col 0 and col 1). This ensures all duplicate indexed values are subsequently sorted in rowid order within the index and can be efficiently accessed as necessary.
 
The consequence of this subtle difference is that an additional byte is now required to store the length of the rowid column and so the total overhead increases from 9 bytes to 10 bytes per index entry. The overall length of an index entry has therefore increased from 11 to 12 byes (len=12) and this results in the overall increase of 124 leaf blocks in the index, required to effectively store these additional 1 million bytes.
 
If we now create the index as an equivalent bitmap index:
 

  
 
SQL> drop index bowie_nonunique_i;
 
Index dropped.
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_BITMAP_I             2        3124       1000000

  

We now notice the index has increased substantially from 1999 leaf blocks for the Non-Unique index to 3124 leaf blocks.
 
Again, a dump of an index leaf block highlights the reason for the increase:
 

 
row#0[8015] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  00
row#1[7994] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  01
row#2[7973] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  02
row#3[7952] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  03
row#4[7931] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  04
 

 

The index entry structure is now somewhat different. We now have an index that has not 1 column (as in the Unique index) or 2 columns (as in the Non-unique index) but 4 columns in the index entry. As before, we still have the index column value of 2 bytes but we now have the following overheads per index entry:
 
2 bytes for flags and locking (as before)
1 byte for the indexed column length (as before)
6 bytes for a rowid index column (col 1) stating the start of a range of rowids that are covered by the particular index entry
1 byte for the length of this start rowid index column
6 bytes for a rowid index column (col 2) stating the end of a range of rowids that are covered by the particular index entry
1 byte for the length of this end rowid index column
1 byte for the bitmap bit sequence column (col 3) required for all the bits referencing rows within the above rowid ranges
1 byte for the length of this bitmap column
 
So the total overhead for each of the 5 index entries listed above is now 19 bytes, not 9 or 10 bytes as for the equivalent b-tree indexes. The length of an index entry is therefore 21 bytes in total, not 11 or 12 bytes as for the equivalent b-tree indexes.

A few important points to note.
 
As the columns are effectively unique, the number of index entries are the same for both b-tree and bitmap indexes. A key advantage of a bitmap index over a b-tree index is that for each distinct value, a single index entry is sufficient to cater for a range of rowids, potentially covering the whole table. For example, a specific column value with 100 duplicates may only need the one index entry for the column value within a bitmap index, but would require 100 different index entries within a (non-compressed) b-tree. However, as the column in the above example is unique, there are no duplicate values and so this potential saving is not possible in this bitmap index.
 
Notice also the size of the bitmap string for each index entry is actually tiny, just a single byte, even though there are 1 million rows in the table. It doesn’t even look like it’s using a million bits to store the necessary bitmap string information for each index entry. This is because for each index entry, only one bit is ever set to 1 (“true”), all other occurrences are logically false as only 1 row in the table ever has the specific index value. Remember, the column values are effectively unique.

Therefore, Oracle can use a very narrow range of rowid ranges for each index entry and effectively not bother storing details for the vast majority of the possible rowid ranges within the table as there’s only one bit that’s of interest and it only corresponds to a specific part of the table. Even in cases where there might just be a duplicate here or there, most values would be zeros (false) regardless and can be compressed extremely efficiently (topic for another day).

Although many folks commonly think otherwise (see original Burleson article for a perfect example of the following misperception), if a column which is unique or is approaching uniqueness is indexed via a bitmap index, the overheads associated with the bitmap string in the index entry is usually very minimal as by definition most bit values are logically “false” (0), with only the rare “true” (1) bit value needing to be stored and reference.

The issue is not necessarily with the overheads associated with just the bitmap string per se but also with the other overhead components, namely the additional rowid and column length bytes.
 
In short, the bitmap index can’t be any more efficient that use just 1 byte to store the necessary bitmap string information (plus 1 byte for the bitmap string length), however 19 bytes of overhead is the minimum required, mainly because of the requirement to store 2 rowids instead of 1 rowid and for all the additional index column length bytes. If the bitmap index needs to cater for a wider range of rowids and for more occurrences of 1s (true) values, then the overheads associated with the bitmap sequence can of course be much more considerable than the 1 byte (again, a topic for another day).
 
Therefore, the bitmap index is going to be significantly less efficient if the indexed values are unique or near unique as there’s all this additional overhead per index entry without the subsequent savings by not having to store separate index entries for duplicates column values. There needs to be at least some measure of duplication within a column for a bitmap index to have some chance of being the more efficient when compared to an equivalent b-tree index.
 
However, how many duplicate values within a column are actually necessary for a bitmap index to be considered and be viable alternative ? The answer is far fewer than many may think (again see original Burleson article for a common misunderstanding in this respect), although this question will be addressed in a future post on the subject.

About these ads

Comments»

1. Flado - March 25, 2010

Richard,
I know you said it’s a topic for another day, but could you throw some light on the encoding of the bitmap string (col3 in your bitmap index dump)? I mean, given that the rowid range covered is 8 rows, and that a byte has 8 bits, I would have expected to see the values 00, 01, 02, 04, and 08 in the bitmap block dump, not 00, 01, 02, 03, and 04. Why is that?
And I’m impressed that a bitmap of 00 is used to encode a bitstring of seven zeros and a one! Thinking about it, there appears to be no need to use 00 to represent a string of zeros – the mere presence of the index entry means at least one bit must be 1, so we could use the value 00 to encode something useful :-)

Thanks!
Flado

Richard Foote - March 26, 2010

Hi Flado

The reason why is simply because Oracle uses a single byte Hex value to represent a bit string pattern with the number of O bits that preceed a 1 bit value. For a range of 8 rowids that contain a single 1 bit, basically:

00 represents 10000000
01 represents 01000000
02 represents 00100000
03 represents 00010000
04 represents 00001000
05 represents 00000100
06 represents 00000010
07 represents 00000001

and so on for as many bits that can fit in a byte.

Things then get more complex when you start dealing with 2 bytes representations of larger bit strings.

When it comes to bitmap indexes, I always refer to the writings of Jonathan Lewis and this presentation in particular by Julian Dyke:

http://www.juliandyke.com/Presentations/BitmapIndexInternals.ppt

Flado - April 26, 2010

“07 represents 00000001

and so on for as many bits that can fit in a byte.”
:-) nice.
Took me a month to notice the joke, though :-/

2. Martin Preiss - March 26, 2010

Richard,
thank you again for the broad explanation of the topic. I have one additional question: you write: “In part this is because the data is extremely well clustered however by far the more important consideration here is that for each index entry, only one bit is ever set to 1 (“true”) …”. I understand that the size of the bitmap is small because Zeros are compressed, but I don’t see were the clustering becomes important if we do not have any groups of subsequent 1/True – values. I assumed that the physical order of the table would not be relevant for a bitmap index on a unique column – is this wrong?

Regards

Martin Preiß

Richard Foote - March 26, 2010

Hi Martin

You are of course absolutely correct.

This blog piece started life as a discussion on how relatively few duplicate values are necessary for a bitmap to outperform a btree but I decide to change tack part way through and discuss a column that is totally unique instead. In the example I finally used, the clustering of the data within the table will indeed make no difference to the overall size of the bitmap index I created.

I’ll discuss “approaching” uniqueness and what that actually means and the impact of the clustering of the data in a different way in a future post. I’ll also cover how the CF is meaningless as captured for bitmap indexes as well.

I’ve change the text in the post to remove the error. Thanks for pointing it out, much appreciated.

3. Mohammad Illiyaz - April 23, 2010

Lovely article…clear and crisp…As usual Richard foote style…you always get me addicted..cant sleep until i finish the part2..

Richard Foote - April 26, 2010

Hi Mohammad

Make sure you get plenty of sleep, it’s very important :)

4. Vijay - April 23, 2010

Hi Richard,

Is it possible to extend the discussion regarding composite bitmap indexes block level storage. I haven’t find any document explaining about internal storage of composite indexes . If you have written any blogs regarding the topic, please point me to the necessary links.

Regards,
VJ

Richard Foote - April 26, 2010

Hi VJ

As I’ve just said to Martin, yes I promise to discuss composite bitmap indexes in coming posts.

5. Vitek - April 27, 2010

Hello Richard,

how do I convert rowid from the 6 bytes long number stored in unique index to it’s common representation?

Great blog!
Vitek

Flado - April 28, 2010

Hi Vitek,

here is one way (I think Richard might have a better one):

with s as (select replace('02 00 6b 0a 00 00', ' ') rid from dual),
rrid as (select 
  dbms_utility.data_block_address_file(to_number(substr(rid,1,8),'XXXXXXXX')) file#,
  dbms_utility.data_block_address_block(to_number(substr(rid,1,8),'XXXXXXXX')) block#,
  to_number(substr(rid,-4),'XXXX') row#
from s)
select t.*, t.rowid, dbms_rowid.rowid_create(0, 0, file#,block#,row#) restricted_rid
from t, rrid
where t.rowid=dbms_rowid.rowid_create(1, &data_object_id, file#,block#,row#);

These are restricted ROWIDs, whereas by “common representation” you probably mean the 10-byte extended ROWIDs. To convert a restricted rowid to an extended rowid you need the data object id, as demonstrated above. You can (in most cases) find the data object id of the table from file# and block# in dba_extents, but if you’re dumping indexes, you should already know on which table they are defined.

Hope this helps.

Richard Foote - April 30, 2010

Hi Flado

No, it’s a nice way to do it :)

Richard Foote - April 30, 2010

Hi Vitek

See Flado’s comments below for a nice way to determine and convert the rowid from the dump representation.

I actually discuss the dbms_utility functions in my rebuilding the truth presentation.

To do a similar thing “manually”, the following usually does the trick. Using a rowid of ’02 00 6b 0a 00 00′ as in Flado’s example:

For the file number, take the first 3 digits (020), present it as hex020 = 32 and divide by 4 = 8

For the block number, take the next 5 digits (06b0a), present it as hex6b0a = 6 x 4096 + 11x 256 + 0 x 16 + 10 = 24576+2816+0+10 = 27402

For the row number, take the last 4 digits (0000), present it as hex0 = 0.

So it’s file id 8, block 27402, row number 0 (meaning the first row in the block as counts start with 0).

Hope it makes some sense !!

Vitek - May 1, 2010

Thanks for explanation. However, the calculation doesn’t work for me.

The following number is an rowid entry from an unique index dump: 00 41 1b 5a 00 a0.

Converting every 2 bytes separately to dec gives me file id 65/4, block 7002 and row 160.

Converting first 3 numbers, next 5 and last 4 gives me the same results as ROWID_ROW_NUMBER(refferenced row rowid) which is file_id 4/4=1, block 72538 and row 160 which I believe is correct.

6. Vitek - May 1, 2010

I still can’t get the idea how index values are encoded. Would someone give mi a hand with “manual” converting those values back to decimals?

row#98[2282] flag: —-S-, lock: 2, len=11, data:(6): …
col 0; len 2; (2): c1 64
row#99[2293] flag: —-S-, lock: 2, len=11, data:(6): …
col 0; len 2; (2): c2 02
row#100[2304] flag: —-S-, lock: 2, len=12, data:(6): …
col 0; len 3; (3): c2 02 02

Thanks in advance,
Vitek

7. Richard Foote - May 1, 2010

Hi Vitek

Yes of course you’re correct, thank-you. I’ve updated my comments.

Regarding converting raw data “manually” to something meaningful, some values are easier (characters for example) than others. See this nice blog entry:

http://dioncho.wordpress.com/2009/07/14/decoding-block-dump-using-utl_raw/

8. 19/03 /2010 – 26/03/2010 « Coskan’s Approach to Oracle - May 3, 2010

[...] 12-How does oracle store unique bitmap indexes Richard Foote-Unique Bitmap Indexes Part I (Unnatural Selection) [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,895 other followers

%d bloggers like this: