jump to navigation

Bitmap Index Degradation After DML Prior To 10g (Beauty and the Beast) May 25, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Index Internals, Oracle Indexes.
5 comments

Bitmap Indexes have a bad reputation with regard to being problematic and suffering from severe degradation following heavy DML operations, especially larger amounts of insert activity. Bitmap indexes have been known to grow substantially and require periodic rebuilds following such insert activity.

While this was certainly the case in previous versions of Oracle, things have dramatically improved since version 10g. I thought it might be worthwhile explaining why Bitmap Indexes had such issues prior to 10g and how things are now so much better, in many cases making Bitmap index rebuilds unnecessary.

To start with, a demo on an Oracle 9.2.0.7 database. I’m first going to create a simple little table with 1M rows, with a Bitmap index on a CODE column with 1,000 distinct values. The values are effectively inserted into the table in an evenly distributed manner throughout the entire table.

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

We notice that the Bitmap index has 1000 index entries, one for each index value.  There are 500 leaf blocks which means we can only fit 2 index entries in each leaf block on average. If we look at a partial block dump of the first leaf block:

row#0[5013] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 cc 0c 00 e0
col 2; len 6; (6):  02 02 d6 2d 01 7f
col 3; len 2998; (2998):
03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...
row#1[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

I’ve listed just the first portions of each of the 2 index entries in the leaf block. Each indexed value is different (col 0)  and each have a bitmap string of 2998 bytes in size (col3). Therefore in an 8K block, Oracle can’t indeed fit a 3rd index entry in a block, hence why there are only 2 index entries per block. The bitmap string column is quite large because in a 1M row table, there are 1000 occurences of each indexed value littered throughout the table that need to be referenced.

OK, next I’m going to insert just 1 additional row. It has a CODE value of 1, which is the minimum existing CODE value and so should be indexed in the first index leaf block. Let’s see what impact this has on the index statistics:

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

The crucial point here is that although the value (1) already existed within the index, Oracle has nonetheless created an additional index entry (1001 NUM_ROWS up from 1000). If look at a partial dump of the leaf block now:

row#1[1973] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 1; (1):  00
row#2[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

We notice there is indeed a new index entry (row#1) which only has a rowid range (col 1 and col 2) of just 8 rows (between 02 02 d6 2e 00 00 and 02 02 d6 2e 00 07). So rather than somehow modify the existing index entry, Oracle has created a new index entry with a rowid range of just 8 consecutive rowids.

This means we should now be able to add an additional 7 rows with a CODE value of 1, which providing they can all fit within the same table block, will be associated with this allocated rowid range.

SQL> begin
2  for i in 1..7 loop
3  insert into bowie values (1, 1, 'ZIGGY');
4  commit;
5  end loop;
6  end;
7  /

PL/SQL procedure successfully completed.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

Indeed, after inserting these additional rows, no new index entries were required as all the new rows fall within the rowid range of the new index entry. But add one more row …

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1002

Indeed, we now have yet another additional index entry (1002 NUM_ROWS up from 1001).  If we now look at a partial block dump of the index leaf block:

row#1[1819] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 2; (2):  c8 ff
row#2[1798] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 08
col 2; len 6; (6):  02 02 d6 2e 00 0f
col 3; len 1; (1):  00
row#3[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 

We notice we  have yet another index entry (row#2) covering another narrow 8 row range.

Basically prior to 10g, if a new index value didn’t fit within an existing bitmap index rowid range for the corresponding indexed value, a new index entry is added and all the overheads associated with it, with a default range of just 8 rowids. This is an extremely costly process as the new index entry not only has to store the index value itself again, but additionally 2 rowids and corresponding length bytes and the necessary bitmap string column as well. Additionally, in all likelihood, the rows in the specific rowid range are quite likely to contain differing values in the index columns and so will in turn require new index entries. I’ve just inserted a whole bunch of CODE values of 1 in the above example. In a randomly inserted column value, this might be quite a rare event as each subsequent indexed value is quite likely to differ for each new consecutive row.

If we create a table and index first and then populate the table with the 1 million rows, the number of index entries and size of index itself becomes huge …

SQL> create table radiohead (id number, code number, name char(5));

Table created.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> begin
2  for i in 1..1000 loop
3    for j in 1..1000 loop
4     insert into radiohead values (j, j, 'BOWIE');
5     commit;
6    end loop;
7  end loop;
8  end;
9  /

PL/SQL procedure successfully completed.

SQL> analyze index radiohead_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2        5347    1000000

Because the clustering is so bad, each new row for a given index value isn’t within an existing 8 rowid window and so each and every row in the table requires its own bitmap index entry. As a result, the Bitmap index is huge at 5,347 leaf blocks and has a full1M NUM_ROWS. A partital block dump shows just how inefficient this index is:

row#0[4273] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8a 00 00
col 2; len 6; (6):  02 02 d8 8a 00 07
col 3; len 1; (1):  00
row#1[4294] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8c 00 e0
col 2; len 6; (6):  02 02 d8 8c 00 e7
col 3; len 1; (1):  01
row#2[4315] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8f 00 40
col 2; len 6; (6):  02 02 d8 8f 00 47
col 3; len 1; (1):  04
row#3[4336] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 91 01 20
col 2; len 6; (6):  02 02 d8 91 01 27
col 3; len 1; (1):  05
row#4[4357] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 94 00 88
col 2; len 6; (6):  02 02 d8 94 00 8f
col 3; len 1; (1):  00

We can see that each and every indexed value (col 1) in this partial leaf block dump is the same (hex c1 02) with each only having a rowid range (col 1 and col 2) that covers the bare minimum 8 row range. It’s about as inefficient a Bitmap index as you can get with 1000 distinct values in a 1M row table.

If we now however rebuild the Bitmap index:

SQL> alter index radiohead_code_i rebuild compute statistics;

Index altered.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        1         500       1000

We get back to our nice, efficient Bitmap Index structure with just the bare minimum 1000 index entries, all fitting into a fraction of the index leaf blocks (just 500 down from 5,347).

Fortunately, Oracle has got a lot more cleverer since 10g with regard to how it maintains its Bitmap Index structures during DML as I’ll cover soon in Part II.

Follow

Get every new post delivered to your Inbox.

Join 1,901 other followers