jump to navigation

Indexing Foreign Key Constraints With Bitmap Indexes (Locked Out) April 17, 2014

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Foreign Keys, Index Internals, Oracle Indexes.
4 comments

Franck Pachot made a very valid comment in my previous entry on Indexing Foreign Keys (FK) that the use of a Bitmap Index on the FK columns does not avoid the table locks associated with deleting rows from the parent table. Thought I might discuss why this is the case and why only a B-Tree index does the trick.

Let’s first setup some very simple Parent-Child tables:

SQL> create table bowie_dad (id number, dad_name varchar2(30));

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST');

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level >=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

OK, so we have a small parent table (BOWIE_DAD) and a much larger child table (BOWIE_KID) with all the necessary constraints in place. Note we don’t actually have a child row with a  FK DAD_ID = 1. So we can potentially delete this row from the BOWIE_DAD table (where ID = 1).

Let’s begin by creating a B-Tree  index on the FK column (DAD_ID) and have a look a partial block dump of the first leaf block in the index:

SQL> create index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06
…..

 

We’ll compare future block dumps with this one but for now just note that the first index entry has a value of (hex) C1 03, which corresponds to the minimum value for DAD_ID = 2 we currently have in this table/index.

If we insert a new child record in one session (but not yet commit);

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we can delete (but not yet commit) the unwanted parent row without any locking implications thanks to this index on the FK column:

SQL> delete bowie_dad where id = 1;

1 row deleted.

In a third session, we can insert another child record again with no locking implications, providing we don’t attempt to use the parent value the second session is in the process of deleting:

SQL> insert into bowie_kid values (1000002, 'LOW', 3);

1 row created.

But if we do try to insert a new child row with a FK value for which the parent is in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

The statement hangs and it will do so until the transaction deleting the parent record commits (in which case it will receive an ORA-02291 integrity constraint error) or the transaction rolls back (in which case the insert will succeed).

If we take a fresh dump of the first leaf block (which must contain the associated index entry as it’s the minimum value now in the table):

 Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0x0008.004.00000b8a  0x01431602.01c5.14  —-    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: ——-, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 80 7f 38 00 00
row#1[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#2[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03

 

We notice we indeed do have a new index entry (highlighted above), with all the associated locking information in ITL slot 2 for the new row in which the session is locked. So the key point here is that the index is indeed updated and Oracle can proceed or not depending on what happens with the transaction on the parent table. The overhead of this new index entry is minimal and locking can be easily policed and restricted to just the index entries with this specific value (hex) C1 02 which corresponds to DAD_ID = 1.

If we do indeed proceed with the delete on the parent table:

SQL> commit;

Commit complete.

 

The session attempting to insert the now deleted parent FK value indeed fails:

 

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);
insert into bowie_kid values (1000002, 'HEROES', 1)
*
ERROR at line 1:
ORA-02291: integrity constraint (BOWIE.BOWIE_KID_FK) violated - parent key not
found

 

And we notice with a fresh block dump that the index entry has been removed by the now unlocked session:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06

Everything is back to the way it was previously.

 

OK, let’s now re-insert the parent row, drop the FK index and replace it with a Bitmap Index instead:

 

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> commit;

Commit complete.

SQL> drop index bowie_kid_fk_i;

Index dropped.

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

If we take a look at a partial block dump of the first leaf block of this Bitmap Index:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

….

 

We notice the first key difference here in that these Bitmap Index entries are potentially HUGE, with just the 2 index entries in this block. The other thing to note is the combination of Bitmap indexes and DMLs can result in locking hell because if an index entry needs to be modified (resulting in a change in the compressed bitmap string), all rows between the rowid ranges specified within the Bitmap Index entry are effectively locked. So Bitmap Indexes introduce severe locking issues, regardless of the Parent/Child update issue highlighted above.

If we insert a child row in one session:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

And in another session insert another row with the same FK value:

SQL> insert into bowie_kid values (1000002, 'HEROES', 4);

The session hangs until the transaction in the first session completes because of the locking implications introduced with the Bitmap Index.

 

Therefore, with a Bitmap Index in place, the last of our worries will be locking issues associated with deleting a parent row. After rolling back the above, we attempt the following. In one session, we insert a child record:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we delete the unwanted parent row:

SQL> delete bowie_dad where id = 1;

and it hangs. The Bitmap Index is not effective in preventing this lock as it was with the B-Tree Index.

In a third session, we attempt to insert a child row with the soon to be deleted parent key:

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);

and it hangs as well. So the Bitmap Index on the FK does not prevent the locking hell such parent deletes can introduce into our environments.

If we roll all this back and simply have one session delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

And in another session insert a child row with the FK about to be deleted, the insert hangs as expected with an exclusive transaction lock:

SQL> insert into bowie_kid values (1000001, 'BOWIE', 1);

 

However, if we look at a fresh partial block dump of the first Bitmap Index leaf block:

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

…..

 

Unlike the B-Tree index which was updated, the Bitmap index has remained unchanged. No attempt was made by Oracle at this stage to insert the index entry as such a new Bitmap Index entry would likely generate too much overheads and not appreciably reduce the locking implications of these DML statements with these Bitmap Indexes in place anyways. The actual index update is delayed until such as change is possible with the rollback of the parent deletion.

However, in a third session, an insert into the child table with a FK that’s not to be deleted is successful:

SQL> insert into bowie_kid values (1000002, 'BOWIE', 4);

1 row created.

Bitmap indexes are simply not designed with concurrency in mind and have efficiencies that make it easier for single sessions to load data in Data Warehouses environments where they are indeed suitable.

One advantage of the Bitmap index is that at least Oracle doesn’t have to perform a FTS on the (potentially huge) child table when checking for the existence of any associated child FK values. Oracle can quickly use the index to determine whether the parent delete can proceed or not. If we roll everything back and just attempt to delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

       
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
          3  consistent gets
          0  physical reads
        676  redo size
        862  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice at just 3 consistent gets, the potentially expensive FTS on the child table has been avoided. Drop the Bitmap index and the FTS must be performed to ensure no current FK values would violate the constraint when the parent row is deleted:

SQL> drop index bowie_kid_fk_i;

Index dropped.

   
SQL> delete bowie_dad where id = 1;

1 row deleted.

    
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
       3629  consistent gets
          0  physical reads
        676  redo size
        863  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice without the Bitmap Index in place, we are now performing many more (3629) consistent gets due to the necessary FTS.

So using a Bitmap Index to police a FK constraint doesn’t reduce the locking implications associated with deleting parent rows (with Bitmap indexes, we have locking hell regardless if there’s much DML) but it does at least reduce the overheads of checking the associated child table.

Bitmap Indexes & Minimize Records_Per_Block (Little Wonder) July 19, 2011

Posted by Richard Foote in Bitmap Indexes, MINIMIZE RECORDS_PER_BLOCK, Oracle Indexes.
Tags:
6 comments

As mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids that determine the range of potential rows spanned by the corresponding bitmap sequence. However, Oracle has no way of determining how many rows might actually be in each specific data block and so must make an assumption that all blocks might hold the maximum number of rows that could potentially fit within a block and assign a bitmap bit accordingly. If a row doesn’t actually exist, then it’s simply a “phantom” row and is assigned a 0 to signify that it doesn’t contain the value of the index entry.
 
This maximum number of possible rows that could potentially fit in a block is called the “Hakan Factor” and is determined at the creation of the table based on the definition of the table (such as number of columns, type of columns, whether they’re nullable, etc.) and of course the block size. The smaller the possible size of the row, the more rows that could fit in a block and the more bits that need to be assigned by the Bitmap Index to cover all possible rowids within the rowid range within a Bitmap Index entry. As an example within an 8K block, taking into consideration block overheads, the maximum number of rows within a block that Oracle can potentially estimate can be as many as 736 rows.
 
These additional 0s that get assigned to cater for rows that might not actually exist, although compressed to some degree, still takes up some space within the index. This additional space can be very significant if:
 
– The difference between the minimum possible size of a row and the actual average size of a row is large (or to put it another way, if the difference between the estimated number of rows per blocks and the actual number of rows per block is large)
– The effective clustering of the indexed data is poor within the table as this will limit the effective compression of the additional 0 bits
 
To highlight how all this can make a significant difference to the size of a Bitmap Index, a simple demo as usual to illustrate.
 
First, I’m going to create a table that has a number of nullable VARCHAR2(100) fields, so they might contain up to 100 characters or perhaps no value at all. The potential size of a row might be tiny or it might be quite large.

 
SQL> create table muse (id number, code number, name1 varchar2(100), name2 varchar2(100), name3 varchar2(100), name4 varchar2(100), name5 varchar2(100), name6 varchar2(100), name7 varchar2(100), name8 varchar2(100), name9 varchar2(100), name10 varchar2(100));
 
Table created.

 
OK, time to populate the table. A couple of key points with the data I’m going to use.
 
Firstly, the CODE column is going to have 100 distinct values but these values will be evenly distributed throughout the entire table. So the clustering associated with this column will be terrible, as bad as it gets.
 
Secondly, although all the VARCHAR2(100) columns might not contain much data (or indeed any at all), in actual fact they’re going to be almost fully populated with data. So although the potential average size of a row could have been quite tiny, in actual fact all the rows are quite large. Although we could potentially have fitted many rows within our (8K) block, in actual fact we’re only going to be able to fit just 7 rows per block. There isn’t actually a single block within our table that contains more than 7 rows.

 
SQL> insert into muse select rownum, mod(rownum,100), 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia','Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select table_name, num_rows, blocks, avg_row_len from dba_tables where table_name='MUSE';
 
TABLE_NAME   NUM_ROWS     BLOCKS AVG_ROW_LEN
---------- ---------- ---------- -----------
MUSE          1000000     145549         998

 
 
Let’s now create a Bitmap Index on the CODE column. I’ll set the PCTFREE to 0 to build the smallest possible index structure:

 
SQL> create bitmap index muse_code_i on muse(code) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I';
 
INDEX_NAME  LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY   NUM_ROWS
----------- ----------- ----------------------- ----------
MUSE_CODE_I         400                       4        800

 
 
So the Bitmap Index currently consists of 400 leaf blocks.
 
As we now know this table has rows that on average are considerably larger than the minimum possible row size, the Bitmap Index has had to cater for the possible existence of many rows that don’t actually exist. Additionally, as the clustering of the indexed data is very poor, the Bitmap Index will not be able to effectively compress these additional 0 bits as much as it might, as there are bits set to 1 littered all over the place that will hamper the effective compression capabilities of the index (I’ve discuss the impact of the Clustering Factor on the effectiveness of Bitmap Index compression previously).
 
Therefore, it might well be beneficial to more accurately determine the number of rows that really exist within a block. We can change the Hakan Factor by altering the table with the MINIMIZE RECORDS_PER_BLOCK clause. Effectively this results in Oracle performing a full table scan, checking for the number of rows per block (a quick check of the nrow count in the block header suffices) and keeping track of the block that currently contains the most number of rows. The highest value of the nrow count within the table becomes the new Hakan Factor.
 
Let’s give it a go:

 
SQL> alter table muse minimize records_per_block;
alter table muse minimize records_per_block
*
ERROR at line 1:
ORA-28602: statement not permitted on tables containing bitmap indexes

 
Unfortunately, this statement is not permitted if there are already any Bitmap indexes assigned to the table as they have already been based on the current Hakan Factor. All current Bitmap Indexes assigned to the table must first be dropped.

 
SQL> drop index muse_code_i;
 
Index dropped.
 
SQL> alter table muse minimize records_per_block;
 
Table altered.

 
 
OK, so now Oracle has a much more accurate picture of the actual number of rows that exist within a block in this table. The new Hakan Factor is based on the maximum number of rows that actually currently exist within a block in the table (just 7 in this specific example), which is significantly less than was defined previously. Oracle ensures the integrity of the new Hakan Factor from here on in by now limiting the number of rows that can be inserted into blocks within the table to this new value, even if in the future additional rows could potentially have fitted within a block. Once the Hakan Factor is reached, the block is taken off the freelist or marked as full in an ASSM segment.
 
Now any Bitmap Index on this table only has to cater for a relatively small number of rows per block, vastly reducing the number of bits that need to be considered and stored.
 
This can significantly reduce the overall size of associated bitmap indexes:
 

 
SQL> create bitmap index muse_code_i on muse(code) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I';
 
INDEX_NAME  LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY   NUM_ROWS
----------- ----------- ----------------------- ----------
MUSE_CODE_I         150                       1        300

The new Bitmap Index is now only 150 leaf blocks in size, substantially smaller than the previous 400 leaf blocks.

Bitmap Indexes and Not Equal Part II (Sheep) July 7, 2011

Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.
11 comments

An excellent comment/question by mdinh made me realise my demos in Part I might be a little extreme in returning 0 rows and perhaps give the false impression that Not Equal conditions are only considered or applicable if no rows are returned. This is not the case and with the bitmap index now considered with Not Equal conditions, the choice of whether or not to actually use the index as usual comes down to the comparative costs associated with the available plans. 

So, I’ll expand on my demo a tab by introducing a new value for the FLAG column: 

SQL> update radiohead
  2  set flag = 1
  3  where rownum < 101;

 100 rows updated.

 SQL> commit;

 Commit complete.

 SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD',cascade=>true, estimate_percent=>null, method_opt=> 'FOR COLUMNS FLAG SIZE 5');

 PL/SQL procedure successfully completed.

 

 
OK, so now we have some 100 rows which have a value of FLAG which are not equal to 42, which are evenly distributed among all 5 CODE values. I’ve created a histogram however on the FLAG column as the 2 values (1 and 42) are not evenly distributed.

Let’s run the query now:

SQL> select * from radiohead where code = 1 and flag <> 42;
20 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2786215024

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    20 |   300 |     46  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |     46  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP MINUS              |                  |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I |       |       |            |          |
-------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CODE"=1)
   5 - access("FLAG"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

 

 
We notice a few key points. Firstly, as we have a histogram on the FLAG column and the data is perfectly evenly distributed among the CODE values, the CBO has got the estimated cardinality of 20 rows spot on. So all things being equal, we can have some confidence the CBO has done the right thing and selected the most efficient execution plan.

We also notice that the cost has now gone up considerably to 46 (up from 3) but it’s still significantly less than the cost of 761 associated with a Full Table Scan. Therefore, the CBO has still chosen the same execution plan with the two bitmap indexes returning the 20 rows, as it did when it returned none in the previous example.

 
In answer to another comment/question by SJ12345, regarding the use of unbounded predicates, if we now try the following:

SQL> select * from radiohead where code = 1 and flag > 42;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2939001425

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |   150 |      6  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    10 |   150 |      6  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP AND                |                  |       |       |            |          |
|   4 |     BITMAP MERGE             |                  |       |       |            |          |
|*  5 |      BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("FLAG">42)
       filter("FLAG">42)
   6 - access("CODE"=1)

 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Yep, cheap with Bitmap Indexes says the CBO. Note the difference here though is that the CBO uses a BITMAP MERGE to first get all possible rowid values of FLAG that are > 42 and then uses a BITMAP AND operation in combination with the CODE Bitmap index to get all rowids that match from both Bitmap indexes. However, as it evaluates the Bitmap Index on the FLAG index first and there are no index entries with a value > 42, it doesn’t have to actually worry about the CODE condition as no rows can possibly be returned. Therefore a very tiny 2 consistent gets are all that are necessary. 

The following will looking for anything < than 42, remembering we now have 20 rows that meet this condition:

SQL> select * from radiohead where code = 1 and flag < 42;

20 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2939001425

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    20 |   300 |      8  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |      8  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP AND                |                  |       |       |            |          |
|   4 |     BITMAP MERGE             |                  |       |       |            |          |
|*  5 |      BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("FLAG"<42)
       filter("FLAG"<42)
   6 - access("CODE"=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

 
Yep, also cheap with Bitmap Indexes, using the same plan as the previous > than example but using more consistent gets as there are a number of rows that need to be accessed this time (although all in the same data block).

To now complete the picture:

SQL> select * from radiohead where code = 1 and (flag < 42 or flag > 42);

20 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 3720408756

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |    30 |   450 |   14   (0)| 00:00:01 |
|   1 |  CONCATENATION                |                  |       |       |           |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    10 |   150 |    6   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS|                  |       |       |           |          |
|   4 |     BITMAP AND                |                  |       |       |           |          |
|   5 |      BITMAP MERGE             |                  |       |       |           |          |
|*  6 |       BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |           |          |
|*  7 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |           |          |
|   8 |   TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |    8   (0)| 00:00:01 |
|   9 |    BITMAP CONVERSION TO ROWIDS|                  |       |       |           |          |
|  10 |     BITMAP AND                |                  |       |       |           |          |
|  11 |      BITMAP MERGE             |                  |       |       |           |          |
|* 12 |       BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |           |          |
|* 13 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |           |          |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("FLAG">42)
       filter("FLAG">42)
   7 - access("CODE"=1)
  12 - access("FLAG"<42)
       filter(LNNVL("FLAG">42) AND "FLAG"<42)
  13 - access("CODE"=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

We now get a combination of both previous plans, concatenated together. Note in this case, it’s actually a cheaper and more efficient alternative to the first Not Equal example. Having got the setup for this demo, you can of course create the same demo yourselves and have a play and experiment. Me, I’m now off to watch Cadel Evans win the Tour De France :)

Bitmap Indexes and Not Equal (Holy Holy) July 5, 2011

Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.
14 comments

Way back, I previously discussed how the CBO will simply ignore any possible indexes when determining the best execution plan involving a NOT EQUAL(<>) condition, even if an index might in theory provide the most efficient access path. Oracle just assumes that the vast majority of rows are likely to be returned and so doesn’t even bother to cost and consider any potential indexes. The previous discussion was aimed specifically at B-Tree indexes, but as a comment at the time by Christian Antognini highlighted, things are a little different for Bitmap indexes. Thought it might be worth exploring this point a little further.

To start and to recap, I’ll begin by creating a simple little table, populated with 1,000,000 rows. It has 2 columns of interest for now, one called CODE which has just 5 distinct values and another called FLAG that only has the 1 distinct value (a value of ’42’ naturally):

SQL> create table radiohead (code number not null, type number not null, flag number not null, name varchar2(30));

 Table created.

 SQL> insert into radiohead select mod(rownum,5)+1, mod(rownum,20)+1, 42, 'ZIGGY' from dual connect by level <= 1000000;

 1000000 rows created.

 SQL> commit;

 Commit complete. 

 I’ll begin by creating standard B-Tree indexes on these columns:

SQL> create index radiohead_code_i on radiohead(code);

 Index created.

 SQL> create index radiohead_type_i on radiohead(type);

 Index created.

 SQL> create index radiohead_flag_i on radiohead(flag);

 Index created.

 SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

 PL/SQL procedure successfully completed. 

If we run a query that returns all rows that don’t have a FLAG value of 42 (of which there are none):

SQL> select * from radiohead where flag <> 42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Note that although the CBO has estimated it will likely only return just the 1 row, it has opted to go for a Full Table Scan. A 10053 trace would show that the index on the FLAG column wasn’t even considered by the CBO. The use of the Not Equal (<>) condition has totally negated the use of the available index.

If we look at a query now on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

As there are only 5 evenly distributed values, the CBO has got the cardinality estimate spot on and has decided that visiting the table 200,000 times via the index is just too expensive and that the Full Table Scan is the more efficient method. Fair enough.

If we now run a query that looks for all values of a specific CODE but only if the FLAG is not 42 (which again is going to return 0 rows):


SQL> alter session set events '10053 trace name context forever';

Session altered.

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42 AND "CODE"=1)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Again, the Full Table Scan is the way to go says the CBO. The index on the FLAG column is not considered and the index on the CODE column is just too expensive. A 10053 trace confirms this:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: RADIOHEAD  Alias: RADIOHEAD
    #Rows: 1000000  #Blks:  2764  AvgRowLen:  15.00
Index Stats::
  Index: RADIOHEAD_CODE_I  Col#: 1
    LVLS: 2  #LB: 1950  #DK: 5  LB/K: 390.00  DB/K: 2755.00  CLUF: 13775.00
  Index: RADIOHEAD_FLAG_I  Col#: 3
    LVLS: 2  #LB: 1950  #DK: 1  LB/K: 1950.00  DB/K: 2755.00  CLUF: 2755.00
  Index: RADIOHEAD_TYPE_I  Col#: 2
    LVLS: 2  #LB: 1950  #DK: 20  LB/K: 97.00  DB/K: 2755.00  CLUF: 55100.00
Access path analysis for RADIOHEAD
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for RADIOHEAD[RADIOHEAD]

  Table: RADIOHEAD  Alias: RADIOHEAD
    Card: Original: 1000000.000000  Rounded: 1  Computed: 0.20  Non Adjusted: 0.20
  Access Path: TableScan
    Cost:  762.05  Resp: 762.05  Degree: 0
      Cost_io: 750.00  Cost_cpu: 259683730
      Resp_io: 750.00  Resp_cpu: 259683730
  Access Path: index (AllEqRange)
    Index: RADIOHEAD_CODE_I
    resc_io: 3147.00  resc_cpu: 114411172
    ix_sel: 0.200000  ix_sel_with_filters: 0.200000 
    Cost: 3152.31  Resp: 3152.31  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 762.05  Degree: 1  Resp: 762.05  Card: 0.20  Bytes: 0

***************************************

Note that the index on the FLAG column is not even mentioned within the possible execution plans and the index on the CODE column has a cost of 3152.31 which is way more than the Full Table Scan cost of 762. So the Full Table Scan is selected, even though no rows are returned and the CBO estimates that just 1 row is likely to be returned. OK, let’s now drop the B-Tree indexes and replace them with Bitmap indexes:


SQL> drop index radiohead_code_i;

Index dropped.

SQL> drop index radiohead_type_i;

Index dropped.

SQL> drop index radiohead_flag_i;

Index dropped.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> create bitmap index radiohead_type_i on radiohead(type);

Index created.

SQL> create bitmap index radiohead_flag_i on radiohead(flag);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 
If we now run the same query again on the FLAG column:

SQL> select * from radiohead where flag <> 42;

 no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

 ------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
---------------------------------------------------

    1 - filter("FLAG"<>42)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 

We notice that again the index is not chosen, even though the bitmap index stores references to those rowids where this condition is not true (a bitmap value of 0) and even though the CBO estimates only the 1 row is likely to be returned. To see why this is the case, let’s look at a partial bitmap index entry via a block dump of the bitmap index:

 
Block header dump:  0x01c01d1c
 Object id on Block? Y
 seg/obj: 0x13e38  csc: 0x00.1234e2a  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1c01d18 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.01234e2a
Leaf block dump
===============
header address 214311524=0xcc62264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 960=0x3c0
kdxcoavs 920
kdxlespl 0
kdxlende 0
kdxlenxt 29367581=0x1c01d1d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——, lock: 0, len=3537
col 0; len 2; (2):  c1 2b
col 1; len 6; (6):  01 40 2c 82 00 00
col 2; len 6; (6):  01 40 2c c4 00 7f
col 3; len 3516; (3516):
 cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff
 ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff ff ff ff
 ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff
 ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff
 ff ff ff ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf
 …

We notice the bitmap index entry consists of the indexed value (c1 2b), a start rowid (01 40 2c 82 00 00), an end rowid (01 40 2c c4 00 7f) and a bitmap string for which a bit corresponds to every row within the rowid range, set to either 1 (for true) or 0 (for false). The 0s are compressed and represented by a value based on the actual number of compressed bits.

However, if the bitmap entry only has a start and end rowid range, how does it actually know the location of all the corresponding rows, as there could be differing number of rows for any of the given data blocks. How does it know just how many rows actually exist within the rowid range ?

The answer is that it can’t possibly know. Therefore, Oracle makes a very important assumption and based on the definition of the table, determines the maximum number of rows that could potentially fit within a block and assigns a bit for every possible rowid that could in theory exist within the specified rowid range (I’ll expand on this point in my next post).

If the rowid actually corresponds to an existing row, then the bit is set accordingly depending on the value of the indexed column for that row. If the rowid doesn’t exist (or doesn’t exist yet), then the corresponding bit is simply set to a 0. If there are a whole bunch of consecutive 0s for rows that don’t exist, they get compressed and the overheads are minimised.

However, the value of a bit set to 0 can therefore potentially mean one of two things. It could mean that the row exists but doesn’t have the value represented by the index entry or it could mean that the row simply doesn’t exist at all. There is no way for Oracle to tell the difference between these two scenarios.

If one is after rows for which the column has a specific value, then no problem, all the bits with a value of 1 must correspond to rows that really do exist and have the column value of interest. However, if one is after all rows for which the column value is not the one represented by a bitmap index entry (as in a <> condition), then referencing all the bits that have a 0 won’t be sufficient as they could potentially point at rows that don’t actually exist and accessing a table looking up rows that don’t exist will open up a can of worms.

Therefore, just like a B-Tree index, the CBO will not consider a Bitmap index for a query that exclusively contains a not equal or not in condition.

If we now look at the second query based on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

We notice that the CBO again chooses the Full Table Scan as again, the query is returning 20% of all rows and deems it too expensive to visit the table 200,000 times to retrieve the data via the index, even if the Bitmap index structure itself is relatively small and efficient. So again, no difference to the B-Tree index example.

However, if we run the third query based on both the CODE column and the <> condition on the FLAG column:

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1712231689

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |     1 |    15 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | RADIOHEAD        |     1 |    15 |     3   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS |                  |       |       |            |          |
|   3 |    BITMAP MINUS               |                  |       |       |            |          |
|   4 |     BITMAP MINUS              |                  |       |       |            |          |
|*  5 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I |       |       |            |          |
|*  7 |     BITMAP INDEX SINGLE VALUE | RADIOHEAD_FLAG_I |       |       |            |          |
--------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("CODE"=1)
   6 - access("FLAG" IS NULL)
   7 - access("FLAG"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Now we see a distinct difference from the B-Tree index example as both Bitmap Indexes have been used rather than a Full Table Scan.

In conjunction with another index that returns rowids of interest that obviously exist, a bitmap index can be used successfully to determine a Not Equal condition. By logically subtracting all the matching rowids of one bitmap index (that contains rowids than aren’t of interest) from the other bitmap index (which contains rowids that are of interest), a list of actual rowids of interest can be determined to access the table. Note this can also potentially be performed by looking up the 0 bits, as corresponding rows do not have the indexed value and any matching rowids can be proven to exist by their appearance within the other Bitmap index.

As most of this processing only involves simple bit comparisons via accesses to relatively small, efficient Bitmap index structures, the relative overheads can be significantly reduced from that of the Full Table Scan (eg. in this example, consistent gets reduced from 2770 to just 47).

So a Not Equal/Not In can be serviced via a Bitmap Index, providing another index is also accessed that returns rowids of interest.

Oracle11g Bitmap-Join IOTs (Us and Them) January 25, 2011

Posted by Richard Foote in 11g, Bitmap Indexes, Index Organized Tables, Oracle Indexes.
6 comments

With each new database release, nice little improvements and enhanced options continually get added. Since 11g R1, two index related features can finally be used in combination with each other.
 
To demonstrate, I’m first going to create and populate a so-called “large” Data Warehouse table. 
 

 
SQL> CREATE TABLE big_dwh_table (id NUMBER PRIMARY KEY, album_id NUMBER, artist_id NUMBER, country_id NUMBER, format_id NUMBER, release_date DATE, total_sales NUMBER);
 
Table created.
 
SQL> CREATE SEQUENCE dwh_seq;
 
Sequence created.
 
SQL> create or replace procedure pop_big_dwh_table as
  2  v_id          number;
  3  v_artist_id   number;
  4  begin
  5    for v_album_id in 1..10000 loop
  6        v_artist_id:= ceil(dbms_random.value(0,100));
  7        for v_country_id in 1..100 loop
  8          select dwh_seq.nextval into v_id from dual;
  9          insert into big_dwh_table values (v_id, v_album_id, v_artist_id, v_country_id, ceil(dbms_random.value(0,4)), trunc(sysdate-mod(v_id,ceil(dbms_random.value(0,1000)))), ceil(dbms_random.value(0,500000)));
 10       end loop;
 11    end loop;
 12 commit;
 13 end;
 14 /
 
Procedure created.
 
SQL> exec pop_big_dwh_table
 
PL/SQL procedure successfully completed.

 

I’ll next create a standard bitmap index on the ALBUM_ID column and collect a few statistics:
 

  
SQL> create bitmap index big_dwh_table_album_id_i on big_dwh_table(album_id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'BIG_DWH_TABLE', estimate_percent=> null, cascade=> true, method_opt=>'FOR  ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 

OK, I’m now going to create and populate a “smaller” dimension/detail heap table and a few associated indexes:

 
SQL> CREATE TABLE albums (album_id number, album_details varchar2(30));
 
Table created.
 
SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> alter table albums add primary key(album_id);
 
Table altered.
 
SQL> create index albums_details_i on albums(album_details);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 

If we now run a little query that joins the two tables together:
 

 
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1936297994
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                          |   125 |  4250 |    25   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |                          |       |       |            |          |
|   2 |   NESTED LOOPS                |                          |   125 |  4250 |    25   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| ALBUMS                   |     1 |    22 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | ALBUMS_DETAILS_I         |     1 |       |     1   (0)| 00:00:01 |
|   5 |    BITMAP CONVERSION TO ROWIDS|                          |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE | BIG_DWH_TABLE_ALBUM_ID_I |       |       |            |          |
|   7 |   TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE            |   100 |  1200 |    25   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("A"."ALBUM_DETAILS"='TAB$')
   6 - access("B"."ALBUM_ID"="A"."ALBUM_ID")
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 

The resultant execution plan is pretty good and efficient and what we would expect. It performs a nested loop join to join the tables together which based on the relatively small number of rows returned makes sense and uses the b-tree index to get the specific album details from the dimension table and the bitmap index to find the matching albums details from the larger table.
 
However, as this is a very frequently executed join condition, we can potentially improve things and reduce the 10 consistent gets by introducing a bitmap-join index. A bitmap-join index performs the “join” operation once, when the index is created and during subsequent DML operations by creating an index based on column(s) on the smaller dimension tables that directly references rows in the larger fact table.
    

 
SQL> drop index albums_details_i;
 
Index dropped.
 

SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
     FROM big_dwh_table b, albums a
     WHERE b.album_id = a.album_id;
 
Index created.

 

So the bitmap-join index is based on the ALBUM_DETAILS column from the smaller ALBUMS table, but it references and has rowids associated with the larger BIG_DWH_TABLE table, with the bitmap-join definition containing details on how the join between the two tables needs to be performed. It if want to know what rows within the larger table have ALBUM_DETAILS of interest, the corresponding bitmap-join index will find all such rows without having to access the smaller ALBUMS table that contains this column.
 
If we now run the same query as before:
  

 
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 950886520
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   125 |  1500 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE           |   125 |  1500 |    26   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("B"."SYS_NC00008$"='TAB$')
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
 
   

We notice the nested loop operation is no longer required. In fact, we don’t need to reference the smaller ALBUMS table at all as all the required information can now be obtained by using the bitmap-join index and direct accesses to the larger table. The number of consistent gets has therefore reduced from 10 down to just 6.
 
Note in our example, there is no actual Foreign Key (FK) constraint in the larger table (in a Data Warehouse, such constraints may not be necessary and/or get in the way). The bitmap-join index doesn’t require a FK constraint to be in place however it’s necessary that the column in the join condition referencing the detail table be Unique else there could be a many-to-many join condition which wouldn’t make sense when attempting to populate the bitmap-join index.
 
However, make one of the tables in the Bitmap-Join index an Index Organized Table (IOT), in this case the smaller detail table …
 

 
SQL> drop table albums;
 
Table dropped.
 
SQL> CREATE TABLE albums (album_id number primary key, album_details varchar2(30)) organization index;
 
Table created.
 
SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 

SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
  2       FROM big_dwh_table b, albums a
  3       WHERE b.album_id = a.album_id;
CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
                                               *
ERROR at line 1:
ORA-25966: join index cannot be based on an index organized table

 

 
   
 
and we get the above error as prior to 11g R1, there was a restriction that no table within a Bitmap-Join index could be an Index Organized Table.
 

Now, if we run exactly the same demo but in an Oracle11g database:
 
 

 
SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
     FROM big_dwh_table b, albums a
     WHERE b.album_id = a.album_id;
 
Index created.
 

SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 950886520
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   125 |  1500 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE           |   125 |  1500 |    26   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("B"."SYS_NC00008$"='TAB$')
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 

It all now works fine.
 
So since Oracle 11g R1, there’s one less reason not use Index Organized Tables and/or Bitmap-Join indexes in your Data Warehouse :)

Bitmap Index Degradation Since 10g (Fix You) June 1, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes.
6 comments

As discussed in my earlier post on Bitmap Index Degradation After DML Prior To 10g, Oracle wasn’t particularly efficient in the manner it maintained Bitmap Indexes after DML operations. During insert operations, if an existing Bitmap index entry didn’t cover the rowid range of a new row to be inserted, Oracle would create a new Bitmap index entry with a default range of 8 rowids and all the overheads this entails (and all the overheads this entails. As such, Bitmap indexes prior to 9i would often explode in size.

Thankfully since 10g, these issues have been largely addressed.

To illustrate, I’m going to run the exact same demo as I did in 9i, this time specifically on a 10.2.0.4 windows database, although you should get similar results in all versions of 10g/11g as well.

As before, I’m going to create the same table and populate it with the same data, creating a Bitmap index on the CODE column which has 1000 distinct values:

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

 

A freshly created Bitmap index has the same leaf blocks and index entries as in 9i.

Looking at a partial block dump of the first leaf block in the index:

row#0[5013] flag: ------, lock: 0, len=3019
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 42 1e 8c 00 e0
col 2; len 6; (6):  01 42 28 ad 01 7f
col 3; len 2998; (2998):
 03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...

row#1[1994] flag: ------, lock: 0, len=3019
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 42 1e 8a 00 00
col 2; len 6; (6):  01 42 28 ab 00 9f
col 3; len 2998; (2998):
 00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
 c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4
...

 

We’ll look back and compare differences within this leaf block in a moment but for now just note there are 2 index entries within the leaf block, with 1 index entry for each distinct value of the CODE column (as with the 9i example).

OK, next we insert a new row and see what happens. Remember in 9i, Oracle created a new index entry as no existing bitmap index entries had a rowid range that span the rowid of this newly inserted row:

SQL> insert into bowie values (1, 1, 'ZIGGY');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index bowie_code_i compute statistics;
 
Index analyzed.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

OK, we notice a significant difference here from the 9i example. Although we’ve just inserted a new row, Oracle has not created a new Bitmap index entry (still 1000 rows in the index). Clearly, Oracle has managed to reuse the existing index entry for the newly inserted CODE value “1”, rather than add a new index entry.

A partial leaf block dump reveals what has happened:

row#0[1992] flag: ------, lock: 2, len=3021
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 42 1e 8c 00 e0
col 2; len 6; (6):  01 42 28 ae 00 07
col 3; len 3000; (3000):
 03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...

 

Rather than create a new index entry for the CODE value “1”, Oracle has made a couple of key changes to the existing index entry. Firstly, we notice it has changed the end rowid range (col 2) from a value of 01 42 28 ad 01 7f  to  01 42 28 ae 00 07 so that the rowid range now includes the rowid associated with the newly inserted row.

Additionally, it has modified the bitmap index string column (col 3) to incorporate the location of the newly inserted row within the increased rowid range. This has resulted in the bitmap string column increasing from 2998 to 3000 bytes, thus increasing the overall size of the index entry by the 2 additional bytes (3021 up from 3019).

So since 10g, Oracle is significantly more efficient and where possible will simply adjust the current rowid range of the Bitmap index entry and modify the bitmap string accordingly to accommodate a new row value (resulting in an overall increase of just 2 bytes overall in this example) rather than create a totally new index entry (which required an additional 21 bytes in the 9i example).

If we were to populate this Bitmap index from scratch as we did in the 9i example:

SQL> create table radiohead (id number, code number, name char(5));
 
Table created.
 
SQL> create bitmap index radiohead_code_i on radiohead(code);
 
Index created.
 
SQL> begin
  2  for i in 1..1000 loop
  3    for j in 1..1000 loop
  4     insert into radiohead values (j, j, 'BOWIE');
  5     commit;
  6    end loop;
  7  end loop;
  8  end;
  9  /
 
PL/SQL procedure successfully completed.
 
SQL> analyze index radiohead_code_i compute statistics;
 
Index analyzed.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2         781       1000

  

We notice the index has not deteriorated anywhere near to the same degree as the 9i bitmap index example. Previously, the 9i bitmap index grew to a massive 5,347 leaf blocks but this 10g version has only become a moderate 781 leaf blocks, just 281 leaf blocks greater than a freshly rebuilt bitmap index. It has only grown by some 56%, (of which much has to do with the free space associated with subsequent index block splits), whereas the 9i version of the index grew by a massive 969%.

In summary, bitmap indexes in currently “supported” versions of Oracle are maintained in a much more efficient manner than they were previously, to the point where the need for frequently rebuilds has been much reduced, even in tables in which such indexes are not dropped during heavy loads.

That said, Bitmap indexes are still unsuitable in OLTP type environments (even in 11g)  due to the locking implications associated with them. Perhaps a discussion for another day.

Bitmap Index Degradation After DML Prior To 10g (Beauty and the Beast) May 25, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Index Internals, Oracle Indexes.
5 comments

Bitmap Indexes have a bad reputation with regard to being problematic and suffering from severe degradation following heavy DML operations, especially larger amounts of insert activity. Bitmap indexes have been known to grow substantially and require periodic rebuilds following such insert activity.

While this was certainly the case in previous versions of Oracle, things have dramatically improved since version 10g. I thought it might be worthwhile explaining why Bitmap Indexes had such issues prior to 10g and how things are now so much better, in many cases making Bitmap index rebuilds unnecessary.

To start with, a demo on an Oracle 9.2.0.7 database. I’m first going to create a simple little table with 1M rows, with a Bitmap index on a CODE column with 1,000 distinct values. The values are effectively inserted into the table in an evenly distributed manner throughout the entire table.

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

We notice that the Bitmap index has 1000 index entries, one for each index value.  There are 500 leaf blocks which means we can only fit 2 index entries in each leaf block on average. If we look at a partial block dump of the first leaf block:

row#0[5013] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 cc 0c 00 e0
col 2; len 6; (6):  02 02 d6 2d 01 7f
col 3; len 2998; (2998):
03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...
row#1[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

I’ve listed just the first portions of each of the 2 index entries in the leaf block. Each indexed value is different (col 0)  and each have a bitmap string of 2998 bytes in size (col3). Therefore in an 8K block, Oracle can’t indeed fit a 3rd index entry in a block, hence why there are only 2 index entries per block. The bitmap string column is quite large because in a 1M row table, there are 1000 occurences of each indexed value littered throughout the table that need to be referenced.

OK, next I’m going to insert just 1 additional row. It has a CODE value of 1, which is the minimum existing CODE value and so should be indexed in the first index leaf block. Let’s see what impact this has on the index statistics:

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

The crucial point here is that although the value (1) already existed within the index, Oracle has nonetheless created an additional index entry (1001 NUM_ROWS up from 1000). If look at a partial dump of the leaf block now:

row#1[1973] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 1; (1):  00
row#2[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

We notice there is indeed a new index entry (row#1) which only has a rowid range (col 1 and col 2) of just 8 rows (between 02 02 d6 2e 00 00 and 02 02 d6 2e 00 07). So rather than somehow modify the existing index entry, Oracle has created a new index entry with a rowid range of just 8 consecutive rowids.

This means we should now be able to add an additional 7 rows with a CODE value of 1, which providing they can all fit within the same table block, will be associated with this allocated rowid range.

SQL> begin
2  for i in 1..7 loop
3  insert into bowie values (1, 1, 'ZIGGY');
4  commit;
5  end loop;
6  end;
7  /

PL/SQL procedure successfully completed.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

Indeed, after inserting these additional rows, no new index entries were required as all the new rows fall within the rowid range of the new index entry. But add one more row …

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1002

Indeed, we now have yet another additional index entry (1002 NUM_ROWS up from 1001).  If we now look at a partial block dump of the index leaf block:

row#1[1819] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 2; (2):  c8 ff
row#2[1798] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 08
col 2; len 6; (6):  02 02 d6 2e 00 0f
col 3; len 1; (1):  00
row#3[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 

We notice we  have yet another index entry (row#2) covering another narrow 8 row range.

Basically prior to 10g, if a new index value didn’t fit within an existing bitmap index rowid range for the corresponding indexed value, a new index entry is added and all the overheads associated with it, with a default range of just 8 rowids. This is an extremely costly process as the new index entry not only has to store the index value itself again, but additionally 2 rowids and corresponding length bytes and the necessary bitmap string column as well. Additionally, in all likelihood, the rows in the specific rowid range are quite likely to contain differing values in the index columns and so will in turn require new index entries. I’ve just inserted a whole bunch of CODE values of 1 in the above example. In a randomly inserted column value, this might be quite a rare event as each subsequent indexed value is quite likely to differ for each new consecutive row.

If we create a table and index first and then populate the table with the 1 million rows, the number of index entries and size of index itself becomes huge …

SQL> create table radiohead (id number, code number, name char(5));

Table created.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> begin
2  for i in 1..1000 loop
3    for j in 1..1000 loop
4     insert into radiohead values (j, j, 'BOWIE');
5     commit;
6    end loop;
7  end loop;
8  end;
9  /

PL/SQL procedure successfully completed.

SQL> analyze index radiohead_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2        5347    1000000

Because the clustering is so bad, each new row for a given index value isn’t within an existing 8 rowid window and so each and every row in the table requires its own bitmap index entry. As a result, the Bitmap index is huge at 5,347 leaf blocks and has a full1M NUM_ROWS. A partital block dump shows just how inefficient this index is:

row#0[4273] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8a 00 00
col 2; len 6; (6):  02 02 d8 8a 00 07
col 3; len 1; (1):  00
row#1[4294] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8c 00 e0
col 2; len 6; (6):  02 02 d8 8c 00 e7
col 3; len 1; (1):  01
row#2[4315] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8f 00 40
col 2; len 6; (6):  02 02 d8 8f 00 47
col 3; len 1; (1):  04
row#3[4336] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 91 01 20
col 2; len 6; (6):  02 02 d8 91 01 27
col 3; len 1; (1):  05
row#4[4357] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 94 00 88
col 2; len 6; (6):  02 02 d8 94 00 8f
col 3; len 1; (1):  00

We can see that each and every indexed value (col 1) in this partial leaf block dump is the same (hex c1 02) with each only having a rowid range (col 1 and col 2) that covers the bare minimum 8 row range. It’s about as inefficient a Bitmap index as you can get with 1000 distinct values in a 1M row table.

If we now however rebuild the Bitmap index:

SQL> alter index radiohead_code_i rebuild compute statistics;

Index altered.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        1         500       1000

We get back to our nice, efficient Bitmap Index structure with just the bare minimum 1000 index entries, all fitting into a fraction of the index leaf blocks (just 500 down from 5,347).

Fortunately, Oracle has got a lot more cleverer since 10g with regard to how it maintains its Bitmap Index structures during DML as I’ll cover soon in Part II.

Concatenated Bitmap Indexes Part II (Everybody’s Got To Learn Sometime) May 12, 2010

Posted by Richard Foote in Bitmap Indexes, CBO, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle Indexes.
2 comments

A basic little post to conclude this discussion.

The issues regarding whether to go for single column indexes vs. concatenated indexes are similar for Bitmap indexes as they are for B-Tree indexes.
 
It’s generally more efficient to access a concatenated index as it’s only the one index with less processing and less throwaway rowids/rows to contend with.  However it’s more flexible to have single column indexes, especially for Bitmap indexes that are kinda designed to be used concurrently, as concatenated indexes are heavily dependant on the leading column being known in queries.
 
If we look at the second table from Part I which had the concatenated index being significantly larger than the sum of the single column indexes, we notice that it can still have a part to play with the CBO. When we run a query that references both columns in predicates:

SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4165488265
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    21   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_3_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("ID"=42 AND "CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
         26  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
    

 
The CBO favours the concatenated index with the total number of consistent gets at 103. This despite the fact the concatenated index has some 10,000 distinct entries and is somewhat larger than the sum of the single column indexes. If we now drop the concatenated index and re-run the same query:
 
  
 

SQL> drop index bowie2_3_i;
 
Index dropped.
 
SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2338088592
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    22   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    22   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|   3 |    BITMAP AND                |            |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| BOWIE2_1_I |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("ID"=42)
   5 - access("CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 

 

The CBO can use a BITMAP AND operation by accessing and ANDing the associated bitmap columns from both single column indexes. However this is little less efficient than using the single concatenated index (105 vs 103 consistent gets) even though the concatenated index is somewhat larger than the other 2 indexes combined as Oracle needs to access and process two Bitmap index segments, not one. However as is very common, note in both examples, most of the consistent gets are in relation to getting the 100 rows out of the table, not so much with regard to the indexes themselves.
 
However, it we just reference the CODE column in a predicate:

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2522233487

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            | 10000 |   117K|   489   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     | 10000 |   117K|   489   (1)| 00:00:03 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2861  consistent gets
          0  physical reads
          0  redo size
     257130  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
 

   

Providing it’s cheaper than other alternatives, the single column bitmap index can be considered and used by the CBO. However, if we only had the previous concatenated index:

SQL> drop index bowie2_1_i;

Index dropped.

SQL> drop index bowie2_2_i;

Index dropped.

SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;

Index created.

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1495904576

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 10000 |   117K|   497   (6)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE2 | 10000 |   117K|   497   (6)| 00:00:03 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3011  consistent gets
       2343  physical reads
          0  redo size
     165134  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

 

As the leading column is not specified, the concatenated Bitmap index is ineffective and the CBO decides to use a FTS. So it’s a similar scenario as with B-tree indexes.

A concatenated Bitmap index can potentially use less or more space than corresponding single column Bitmap indexes, it depends on the number of index entries that are derived and the distribution of the data with the table. However regardless, a concatenated Bitmap index can still be a viable alternative if at least the leading column is specified and be the more efficient option if all columns are generally specified, even if the overall size of the index is somewhat greater than the sum of the alternative single column Bitmap indexes. Then again, it’s less flexible and may not be considered if the leading column is not referenced.

If columns are generally all specified in SQL predicates, then combining them all in a single concatenated Bitmap index is a viable option. It all depends. Understanding why it depends is of course important in making the correct decision with regard which way to go with Bitmap indexes …

Concatenated Bitmap Indexes Part I (Two Of Us) May 6, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Clustering Factor, Concatenated Indexes, Oracle Indexes.
6 comments

Although Bitmap Indexes are commonly created on one column, you can create multi-column, concatenated Bitmap indexes as well.
 
Many of the same issues and factors in deciding to create a single, multi-column index vs. several, single column indexes apply to Bitmap indexes as they do with B-Tree indexes, although there are a number of key differences to consider as well.
 
The first difference to note is that a bitmap index doesn’t have as many index entries as there are rows in the table (with not null values), as with B-Tree indexes. A Bitmap index can potentially just have only as many index entries as there are distinct values for the indexed columns. This is one of the main reasons why Bitmap indexes can be considerably smaller than equivalent B-Tree indexes. A newly created Bitmap index only needs to have multiple index entries for the same column value if the associated index entry is greater than 1/2 a block size. If an index entry were to be larger than 1/2 a block size, Oracle creates another Bitmap index “piece” for the same indexed value, with a bitmap column covering a different range of rowids. (Note: additional Bitmap index pieces can be created based on subsequent DML, especially in earlier versions of Oracle. To be discussed at a later point in time).
 
Another thing to note regarding a concatenated Bitmap index is that the potential number of index entries is a product of distinct combinations of data of the indexed columns. For example, if two columns have 100 distinct values each, then as separate Bitmap indexes, they potentially may have as few as 100 index entries each. However, when combined as a concatenated Bitmap index, there may be a minimum of just 100 index entries only if there are just 100 different combinations of data between the columns (ie. there is a 1 to 1 relationship between column values). If there are however say 100 x 100 = 10,000 maximum combinations of data, there will be 10,000 index entries as a minimum in the associated concatenated Bitmap index.
 
A key point though is that if there are many more combinations of data when stored in a concatenated index, the occurrence of each distinct value will be far less within the table, meaning there will be many more “0” (not true) values in the corresponding bitmap column for each index entry and so can be compressed more effectively and likely use substantially less space than a corresponding index entry in a single column Bitmap index.
 
A simple little example to illustrate. To start, I’m going to create a 1M row table that has an ID and a CODE Colum, each with 100 distinct values. 

In this first example, there’s an implicit 1 to 1 relationship between these 2 columns (eg. they always have the same corresponding values) such that there’s also only 100 distinct combinations of ID and CODE. Additionally, the values are distributed evenly throughout the table so the effective clustering of the data in relation to the index is awful.
  

SQL> create table bowie as select mod(rownum,100)+1 id, mod(rownum,100)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_1_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_2_i on bowie(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_3_i on bowie(id,code) pctfree 0;
 
Index created.

  

If we look at the size and characteristics of these indexes we notice a couple of interesting points:  

 
SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE_1_I', 'BOWIE_2_I', 'BOWIE_3_I');

INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE_1_I          200        400
BOWIE_2_I          200        400
BOWIE_3_I          200        400

   
Firstly, even though there are only 100 distinct values for each indexed column or columns, there are actually 400 index entries in these indexes. This means there are on average 4 Bitmap index pieces for each distinct indexed value. Oracle can’t fit the associated index entry for a specific value within 1/2 a block (in this example, the block size is 8k). In fact, it takes approximately two 8K index leaf blocks it fit all the index data for a specific value and it therefore requires 4 Bitmap index pieces per indexed value.
 
If we look at a partial block dump of a leaf block from say the BOWIE_1_I:
 

  
row#0[4089] flag: ------, lock: 0, len=3947
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b5 09 00 60
col 2; len 6; (6):  01 c2 b8 f3 00 3f
col 3; len 3926; (3926):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
... 
   
row#1[142] flag: ------, lock: 0, len=3946
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b8 f3 00 a0
col 2; len 6; (6):  02 03 62 5d 00 1f
col 3; len 3925; (3925):
 01 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7
 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c5 18 61 5d 61 c3 1b 5f 63 5f c1
 ...
 

 

We notice there are actually 2 index entries within the block. Each index entry is for the same indexed value (both col 0 have identical values) but because the associated bitmap column (col 3) is so large (3926 and 3925 byes respectively), we need 4 index entries on average to store the bitmap data for each specific indexed value in order for each piece to not exceed the 1/2 block size limit.
 
Remember, for 100 distinct values in a 1M row table, that’s approximately 10,000 occurrences for each distinct value that need to somehow be mapped within the index. Oracle needs 4 index entries per value to fit the necessary bitmap information with the index such that no single index entry exceeds the 1/2 block size limit.  

The next thing to note is that the size of the concatenated Bitmap index is actually about the same size of each of the individual single column Bitmap index (200 leaf blocks). Therefore, the overall storage required to store the two columns in one Bitmap is only half of that required to store the columns as separate Bitmap indexes.
 
If we look at a partial dump of the concatenated index:

    
row#0[4091] flag: ------, lock: 0, len=3945
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 b5 09 00 60
col 3; len 6; (6):  01 c2 b8 f2 00 57
col 4; len 3921; (3921):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
...
 

 

We notice the overall length of an index entry is practically identical to those of the single column index. The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length. Considering the length of the bitmap column (col 4) is in the order of 3920 bytes for each index entry, an additional 3 bytes here or there is trivial and so doesn’t impact the overall size of the Bitmap index.  

OK, let’s look at a slightly different example. The table is again 1M rows with the overall data being similar. However, I’m making a few subtle differences. Firstly, the data for the ID is actually perfectly clustered and is ordered in exactly the same manner as the index. Additionally, the distribution of data between the columns is such that there are now 100 x 100 = 10,000 distinct combinations of ID and CODE values.
 
  

 
SQL> create table bowie2 (id number, code number, name char(5));
 
Table created.
 
SQL> begin
  2  for i in 1..100 loop
  3    for x in 1..100 loop
  4      for y in 1..100 loop
  5        insert into bowie2 values (x, y, 'BOWIE');
  6      end loop;
  7    end loop;
  8  end loop;
  9  commit;
 10  end;
 11  /
PL/SQL procedure successfully completed.
 

SQL> create bitmap index bowie2_1_i on bowie2(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_2_i on bowie2(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;
 
Index created.

        

If we look at the size and characteristics of the these Bitmap indexes:
     

SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE2_1_I', 'BOWIE2_2_I', 'BOWIE2_3_I');
 
INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE2_1_I          25        100
BOWIE2_2_I         200        400
BOWIE2_3_I         417      10000

    
   
We see a number of key differences when compared to the Bitmap indexes in the first example. Firstly, the Bitmap index for the ID column is tiny, just 25 leaf blocks compared to the 200 leaf blocks required previously. Additionally, there are only 100 index entries, rather than the 400 previous index entries. This is due to the fact the data is perfectly clustered within the table and as such, all the “1”s (true) are all grouped together and all the “0”s (false) are likewise grouped together and can be compressed very efficiently. The overall size of the bitmap has now reduced such that it can all fit comfortably within 1/2 a block and so just the 1 index entry is necessary for each indexed value.
 
If we look at a partial block dump of the ID Bitmap index:

  
   

 
row#0[6218] flag: ------, lock: 0, len=1818
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 c6 16 00 d8
col 2; len 6; (6):  02 03 78 18 01 3f
col 3; len 1797; (1797):
 cf f0 ff ff ff ff ff ff ff cc ff ff ff ff ff fc de 10 80 ff ff ff 07 ff 21
 ff ff ff ff ff ff ff ff c8 ff ff de 10 80 ff ff ff ff ff ff ff cd ff ff ff
 ff ff 07 ff de 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 07 f8 21 07 ff de
 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 3f ff fd 19 c0 ff ff ff ff ff ff
 ff cd ff ff ff ff ff 03 ff de 10 fe ff ff ff ff ff ff ff cc ff ff ff ff 1f
...

 
  

We see that the bitmap column (col 3) is now only 1797 bytes and the overall index entry is 1818 bytes, which comfortably fits within 1/2 an 8K block.
  

The Bitmap index for the CODE remains unchanged because its values are still poorly clustered and distributed throughout the entire table.
 
The concatenated Bitmap index is now significantly larger than is was previously (417 leaf blocks, up from 200 leaf blocks), primarily because we now have 10,000 distinct values to deal with rather than just 100. However, as the occurrences of each of these 10,000 distinct values is so much rarer (only 100 occurrences per distinct combination of values), the associated bitmap column is going to be relatively small and well compressed for each Bitmap index entry.
 
If we look at a complete index entry from a partial block dump of the concatenated Bitmap index:

  
  

 
row#0[7710] flag: ------, lock: 0, len=326
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 c6 16 00 d8
col 3; len 6; (6):  02 03 78 18 00 d7
col 4; len 302; (302):
 04 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c4 d8 10 c4 80 11
 c7 d8 10 c3 f8 19 c3 80 11 c6 d8 10 c1 d9 10 c1 80 11 c5 f7 19 c0 d9 10 c0
 80 11 c3 d8 10 c3 80 11 c2 d0 19 c2 80 11 c5 d8 10 c5 80 11 c0 d9 10 c4 f7
 19 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c1 80 11 c4 d8 10
 c7 d8 10 c0 87 09 c3 d8 10 c3 80 11 c6 d8 10 c6 80 11 c1 d9 10 c5 de 08 c5
 80 11 c0 d9 10 c0 80 11 c3 d8 10 c3 80 11 c0 a8 f4 ec bb 01 c3 d8 10 c6 d8
 10 c6 80 11 c1 d9 10 c1 80 11 c5 de e4 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80
 11 c6 d8 10 c7 86 09 c2 d9 10 c2 80 11 c5 d8 10 c0 d9 10 c0 80 11 c4 de 08
 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c5 d8 10 c6 86 09 c1 d9 10 c1 80 11 c4
 d8 10 c4 80 11 c7 d8 10 c0 87 09 c3 d8 10 c6 d8 10 c6 80 11 c1 d9 10 c1 80
 11 c5 de 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80 11 c6 d8 10 c7 86 09 c2 d9 10
 c2 80 11 c5 d8 10 c0 d9 10 c4 f7 19 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c6
 f7 19

 
  

We notice the index entries are relatively small at just 326 bytes even though we have 2 indexed columns (col 0 and col 1). The size of the bitmap column (col 4) is just 302 bytes, only a fraction of that of the single column Bitmap indexes. With so few 1s (true) to contend with, the resultant bitmap is far more efficiently compressed than with the single Bitmap column indexes as it has far more 0s (false) than can be effectively compressed.

A concatenated Bitmap index can potentially use less or more space than corresponding single column indexes, it depends on the number of index entries that are derived and the distribution of the data with the table.

I’ll post Part II in the next few days.

So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2) April 13, 2010

Posted by Richard Foote in Bitmap Indexes, Non-Unique Indexes, Oracle Indexes, Oracle Myths.
16 comments

As I’ve discussed previously, using a Bitmap index on a unique column makes little sense as the underling index must be larger than a corresponding B-tree index due to the implicit additional overheads associated with Bitmap indexes. As such, Oracle doesn’t permit the use of a Bitmap Index on a declared unique column or to police a unique constraint.  Therefore, some amount of index entry duplication is necessary for a Bitmap index to be considered.

However, an interesting question is how much duplication is actually necessary ? At what point does a Bitmap index have the potential to be equivalent or better than a corresponding B-Tree index ?  The answer will perhaps surprise many, especially those that only consider Bitmap Indexes to be viable for so-called “low cardinality”columns on large tables where there could be many millions of occurrences of each distinct value in a column.
 
If one actually looks at what comprises an index entry in each type of index and understands somewhat how the bitmap column is comprised and effectively compressed, the rough ballpark answer becomes quite easy to determine.
 
Remember, for a non-compressed, non-unique B-Tree index, an index entry comprises:
 
The indexed column or columns (however long the index column values might be)
6 bytes for the rowid (assuming a local index or index on non-partitioned tables)
2 bytes for flag and lock bytes
1 byte for each index column (a minimum of 2 bytes)
 
So that’s 10 bytes plus the size of the actual indexed column for a single column B-Tree index. The key point here however is that there’s an index entry for each and every not null index value.
 
For a Bitmap index, an index entry comprises:
 
The index column(s)
2 x 6 byte rowid
2 bytes for flag and lock bytes
1 byte for each index column (a minimum of 4 bytes)
? bytes for the actual bitmap sequence
 
So the additional overheads comes down to the additional 6 byte rowid and the length of the bitmap column. The key point here though is that there may only need be one index entry (or Bitmap index “piece”) for each distinct indexed value. However, if the number of occurrences of each index value is very low (eg: say single figures), then it’s almost certain only one bitmap index entry (piece) would be necessary for each indexed value.
 
The number of bytes required for the actual bitmap column depends on many factors, including the number of occurrences of each indexed value and the clustering of the data in the table. However again, if the number of occurrences of each index value is very low (eg: say single figures), it means the vast majority of bits are 0 (false) within each bitmap sequence and so can be compressed extremely efficiently. If there are only a handful of 1 (true) bits within a bitmap index entry, the bitmap column is going to be tiny and effectively compressed to only a few bytes.
 
Therefore, the actual additional overheads for a bitmap index with few repeated values is only the 7 byte overhead for the additional rowid and its length and a few bytes for the actual bitmap column. But remember, this single bitmap index entry can cater for all occurrences of the indexed column, whereas the B-Tree index requires an index entry for each and every occurrence of the index column.
 
It doesn’t take much for these additional overheads for each Bitmap index entry to start to cancel out …
 
If we look at an indexed column (say length 4 bytes) that has on average just the one duplicate value:
 
Total for a B-Tree Index would be:
 
4 bytes index column
6 bytes rowid
2 bytes for each index column length byte(remembering the rowid is an index column in a non-unique index)
2 bytes for flag and lock bytes
 
= 14 bytes x 2 (for each index entry as there’s a duplicate value) = 28 bytes in total
 
For a corresponding Bitmap index:
 
4 bytes index column
12 bytes for the two rowids
4 bytes for each index column length byte (remembering the rowids and bitmap sequence are effectively additional indexed columns)
2 bytes for flag and lock bytes
2 bytes is all it takes for the bitmap sequence column (if there’s only 2 actual true bits per index entry)
 
= 24 bytes in total.
 
So we’re already in a position for a Bitmap index to potentially be the smaller and more efficient index type, even when there’s only just one duplicate on average per index column value …
 
If we have another duplicate value (so there are on average 3 occurrences of each index value), then the overheads for such a B-Tree becomes:
 
3 x 14 bytes = 42 bytes
 
but the overheads for the bitmap index only increases by a byte or so for the necessary increase in the bitmap column. So the difference in space between the two index types starts to widen significantly.
 
Obviously, the size of the index column becomes a factor in the potential savings with a Bitmap index as it only has to potentially store the index column once whereas the (non-compressed) B-Tree index needs to store all occurrences of the index value. To illustrate the comparative differences between a B-Tree and a Bitmap Index, I’m going to create various tables with columns that have different levels of cardinality and different clustering attributes for their indexed columns and compare the size differences between B-Tree and Bitmap indexes. The indexed column will simply be a small NUMBER type column to make it just that bit harder for the Bitmap index to be the more efficient.

In the first example, 1/3 of all values are unique while the remaining 2/3 of values have just 2 occurrences of each value. The column is certainly not unique but is arguably “approaching” uniqueness.
Initially, the indexed column is very well clustered within the table (although with a bitmap index, the clustering factor in the index statistics is useless as it simply denotes the number of index entries within the index).
 

SQL> create table bowie as select rownum id, ceil(rownum/1.5) key, 'David Bowie' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> select count(distinct key) from bowie;
 
COUNT(DISTINCTKEY)
------------------
            666667
 
SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I           2129            666667
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create index bowie_btree_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1999              3738

 

Although some may claim such a Bitmap index, one with 666,667 distinct values in a 1 million row table would be thousands of times larger and less efficient than an equivalent B-Tree index, it’s actually quite a close call. The bitmap index is only 124 leaf blocks different or approximately 6.5% larger in size than the B-Tree index.
 
If we create an equivalent table but this time with the clustering of the data all over the place:
 
 


SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BITMAP_I          2246            666667
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BTREE_I           1999            999722

 

We notice that the B-Tree index remains the same size but the Bitmap index is now a little larger by 117 additional leaf blocks. So even with an awful clustering, the Bitmap index is only 247 leaf blocks or approximately 12.3% larger than the B-Tree index. So the B-Tree just wins out in this case, the column is still just that bit too unique for the Bitmap index where we have 666,667 distinct values in a 1 million row table.
 
OK, let’s see how the indexes compare when the index column has on average 1 duplicate for each and every indexed value. There are just 2 values for each and every indexed value, 500,000 distinct values in a 1 million row table: 
 

 
SQL> drop table bowie;

Table dropped.

SQL> create table bowie as select rownum id, ceil(rownum/2) key, 'David Bowie' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';

INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I           1628            500000

SQL> drop index bowie_bitmap_i;

Index dropped.

SQL> create index bowie_btree_i on bowie(key) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';

INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1998              3728

 

OK, now the Bitmap index is well ahead. On a well clustered column that has 500,000 distinct values in a 1 million row table, the B-Tree index is now larger by 370 leaf blocks or by 22.7%. What if the data is poorly clustered:


SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BITMAP_I          1806            500000
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BTREE_I           1998            999803

 

OK, the Bitmap index is now larger by 178 leaf blocks than it was before, but the equivalent B-Tree index is still larger by 10.6%.
 
Again, this is on a relatively small, extremely poorly clustered indexed column that has 500,000 distinct values in just a 1 million row table. The Bitmap index is smaller and more efficient than the equivalent B-Tree index.
 
If we now use a column that has on average 4 occurrences for each distinct column value (with 250,000 distinct values in a 1 million row table), the differences between a Bitmap and a B-Tree index begin to widen significantly.
 


SQL> drop table bowie;
 
Table dropped.
 
SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie as select rownum id, ceil(rownum/4) key, 'David Bowie' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I            829            250000
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create index bowie_btree_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1995              3753

 

On a well clustered column with 250,000 distinct values in a 1 million row table, the Bitmap index is less that 1/2 the size of that of an equivalent B-Tree index. If the data were less so clustered:


SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME      LEAF_BLOCKS CLUSTERING_FACTOR
--------------- ----------- -----------------
BOWIE2_BITMAP_I        1145            250000
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME      LEAF_BLOCKS CLUSTERING_FACTOR
--------------- ----------- -----------------
BOWIE2_BTREE_I         1995            999881

 

Even with a really poor clustered index column, the B-Tree index is still some 74% larger than the Bitmap index with some 250,000 distinct values.
 
Despite many claims to the contrary, including the rewrite of the awful Burleson article  that started this series on Bitmap Indexes, where it still claims that “there are rare cases where a bitmap on a column with 10,000 key values might be appropriate” and “in most cases, there will not be a lot of adjacent index values, so it quite rare to see extensive compression“, in reality it’s not that rare at all for a column with “large” numbers of distinct values to be indexed effectively via a Bitmap Index. But, you actually need to understand Bitmap Indexes to appreciated this fact and have at least some understanding on how Oracle stores and compresses the bitmap column. The Burleson description of Bitmap index compression in the above article is totally wrong and so hence are its overall conclusions. 

Here’s an actual, “real life” example. On a 2.2 million row table with a column on people last names, (name columns are often considered way too distinct for Bitmap indexes), there were approximately 6.5 occurrences of each last name on average over the whole table. The most compact B-Tree index on the Last Name column required 5286 leaf blocks but the equivalent Bitmap index only required 2151 leaf blocks, way less than 1/2 the size, even though the clustering factor was terrible at nearly 2 million.
 
As a rough rule, any column that has an on average of just 1 duplicate per distinct column value (just 2 occurrences per distinct value) is a potential candidate for a bitmap index.
 
Bitmap indexes should only be considered in Data Warehouse, low concurrency DML type environments due to their locking implications and certainly pre 10g, Bitmap indexes had growth issues after significant DML changes. However it’s a complete nonsense to suggest that Bitmap indexes should only be considered with columns with “few” distinct values, else things will run 100s of times slower.
 
500,000 distinct values in a 1 million row table is not really that “few” at all is it …

Unique Bitmap Indexes Part II (You Can’t Do That) March 30, 2010

Posted by Richard Foote in Bitmap Indexes, Oracle Indexes, Unique Indexes.
6 comments

As discussed in the previous post, a Bitmap index on a unique column will be larger than a corresponding Btree index due to the additional overheads associated with each index entry (such as the additional rowid, the additional column length bytes and the bitmap column itself). Oracle therefore attempts to protect you from explicitly creating such a “Unique Bitmap” index. 

For example, you can not specify both UNIQUE and BITMAP when creating an index. To do so would make little sense.
  
A bitmap index must therefore be Non-Unique by definition. Any attempt to explicitly create a Unique Bitmap index will fail.
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create unique bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
create unique bitmap index bowie_bitmap_i on bowie(id) pctfree 0
              *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

SQL> create bitmap unique index bowie_bitmap_i on bowie(id) pctfree 0;
create bitmap unique index bowie_bitmap_i on bowie(id) pctfree 0
              *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

The CREATE INDEX syntax only caters for either the BITMAP or the UNIQUE option.
 
Although Oracle permits the use of a Non-Unique index to police either a Primary Key (PK) or Unique Key (UK) constraint, a bitmap index is not permitted to police such constraints. Again, it makes little sense having a bitmap index police such constraints as an equivalent Btree index is going to be more efficient.
 
If an existing bitmap index exists on a column, Oracle can not use it to police the constraint:
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> alter table bowie add primary key (id);
alter table bowie add primary key (id)
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

Oracle is attempting to create a Btree index to police the new PK constraint but it can’t create it as an existing bitmap index already exists. Oracle will not create a Btree index if the same column list is already indexed.
 
It makes no difference if we if declare the constraint as deferrable (or invalidate) where a Non-Unique index is required:
 

SQL> alter table bowie add primary key (id) novalidate;
alter table bowie add primary key (id) novalidate
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

SQL> alter table bowie add primary key (id) deferrable;
alter table bowie add primary key (id) deferrable
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

Attempting to create a Bitmap index at the same time as the constraint is equally fruitless:

SQL> alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id));
alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id))
                                                           *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

SQL> alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id) deferrable);
alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id) deferrable)
                                                           *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

So definitely, looking at creating a Bitmap index on a unique column is not a sensible thing to attempt both because the resultant bitmap index would be larger than a corresponding Btree index if permitted and because in many scenarios as discussed, Oracle simply won’t let you do it anyways.

OK, so a unique column is not suitable for a Bitmap index. The question remains at what point does it make sense to create a bitmap index ? The answer is reasonably obvious when one understands the structure of both types of indexes although the answer may surprise some folks. I’ll look at this question next …

Unique Bitmap Indexes Part I (Unnatural Selection) March 24, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes, Unique Indexes.
17 comments

As I’ve discussed previously, a Bitmap index can be considered over a B-tree index (where concurrent DML is not an issue) even if there are potentially tens of millions of distinct values, in a table that has say hundreds of millions of rows.
 
However, if a column is unique or “approaches” uniqueness, then one has gone too far and the bitmap index is going to be larger and less efficient than an equivalent b-tree index. So you wouldn’t consider a bitmap index on a column with a million distinct values if the table itself only has in the vicinity of a million rows as well.
 
To understand why a column approaching uniqueness shouldn’t be considered as a bitmap index, one only needs to understand the structure and corresponding differences of index entries in both bitmap and b-tree indexes.
 
I’ll begin by creating a simple table and populating it with a million rows.


SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie select rownum, 'Ziggy Stardust' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

   

Note that the ID column is unique. We can therefore create a Unique b-tree index:
 


SQL> create unique index bowie_unique_i on bowie(id) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_UNIQUE_I';

INDEX_NAME          BLEVEL LEAF_BLOCKS DISTINCT_KEYS
--------------- ---------- ----------- -------------
BOWIE_UNIQUE_I           2        1875       1000000

 

Note that the unique index has 1875 leaf blocks. If we dump a leaf block and look at some (say 5) of the index entries:
 

row#0[8025] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 01
col 0; len 2; (2):  c1 03
row#2[8003] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 02
col 0; len 2; (2):  c1 04
row#3[7992] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 03
col 0; len 2; (2):  c1 05
row#4[7981] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 04

 

We notice the length of these first 5 index entries are all 11 bytes (len=11).
 
An index entry from this Unique index basically consists of the indexed value (col 0) which is 2 bytes in length in the above sample plus the following overhead:
 
2 bytes for flags and locks
6 bytes for the rowid
1 byte for the index column length
 
So there’s a total of 9 bytes of overhead per index entry in this index in addition to the index value itself. Note also there’s an index entry for each and every indexed value. This is always the case for a non-compressed b-tree index.
 
If we now compare this with an equivalent  Non-Unique index on the same column:

 
 
SQL> drop index bowie_unique_i;
 
Index dropped.
 
SQL> create index bowie_nonunique_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_NONUNIQUE_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_NONUNIQUE_I          2        1999       1000000
 

 

We notice the index is now somewhat larger than the equivalent Unique index, with there now being 1999 leaf blocks, an increase of 124 leaf blocks. A block dump of a leaf block reveals the key difference:
 

 
row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
row#1[8012] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 01
row#2[8000] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 02
row#3[7988] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 03
row#4[7976] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 04

 

As I’ve discussed previously, Oracle makes the Non-Unique index effectively unique by adding the rowid as an additional indexed column within the index entry (col 1 is this additional index column comprising the rowid). There are therefore 2 columns in the index entry, not just the one (denoted by col 0 and col 1). This ensures all duplicate indexed values are subsequently sorted in rowid order within the index and can be efficiently accessed as necessary.
 
The consequence of this subtle difference is that an additional byte is now required to store the length of the rowid column and so the total overhead increases from 9 bytes to 10 bytes per index entry. The overall length of an index entry has therefore increased from 11 to 12 byes (len=12) and this results in the overall increase of 124 leaf blocks in the index, required to effectively store these additional 1 million bytes.
 
If we now create the index as an equivalent bitmap index:
 

  
 
SQL> drop index bowie_nonunique_i;
 
Index dropped.
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_BITMAP_I             2        3124       1000000

  

We now notice the index has increased substantially from 1999 leaf blocks for the Non-Unique index to 3124 leaf blocks.
 
Again, a dump of an index leaf block highlights the reason for the increase:
 

 
row#0[8015] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  00
row#1[7994] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  01
row#2[7973] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  02
row#3[7952] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  03
row#4[7931] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  04
 

 

The index entry structure is now somewhat different. We now have an index that has not 1 column (as in the Unique index) or 2 columns (as in the Non-unique index) but 4 columns in the index entry. As before, we still have the index column value of 2 bytes but we now have the following overheads per index entry:
 
2 bytes for flags and locking (as before)
1 byte for the indexed column length (as before)
6 bytes for a rowid index column (col 1) stating the start of a range of rowids that are covered by the particular index entry
1 byte for the length of this start rowid index column
6 bytes for a rowid index column (col 2) stating the end of a range of rowids that are covered by the particular index entry
1 byte for the length of this end rowid index column
1 byte for the bitmap bit sequence column (col 3) required for all the bits referencing rows within the above rowid ranges
1 byte for the length of this bitmap column
 
So the total overhead for each of the 5 index entries listed above is now 19 bytes, not 9 or 10 bytes as for the equivalent b-tree indexes. The length of an index entry is therefore 21 bytes in total, not 11 or 12 bytes as for the equivalent b-tree indexes.

A few important points to note.
 
As the columns are effectively unique, the number of index entries are the same for both b-tree and bitmap indexes. A key advantage of a bitmap index over a b-tree index is that for each distinct value, a single index entry is sufficient to cater for a range of rowids, potentially covering the whole table. For example, a specific column value with 100 duplicates may only need the one index entry for the column value within a bitmap index, but would require 100 different index entries within a (non-compressed) b-tree. However, as the column in the above example is unique, there are no duplicate values and so this potential saving is not possible in this bitmap index.
 
Notice also the size of the bitmap string for each index entry is actually tiny, just a single byte, even though there are 1 million rows in the table. It doesn’t even look like it’s using a million bits to store the necessary bitmap string information for each index entry. This is because for each index entry, only one bit is ever set to 1 (“true”), all other occurrences are logically false as only 1 row in the table ever has the specific index value. Remember, the column values are effectively unique.

Therefore, Oracle can use a very narrow range of rowid ranges for each index entry and effectively not bother storing details for the vast majority of the possible rowid ranges within the table as there’s only one bit that’s of interest and it only corresponds to a specific part of the table. Even in cases where there might just be a duplicate here or there, most values would be zeros (false) regardless and can be compressed extremely efficiently (topic for another day).

Although many folks commonly think otherwise (see original Burleson article for a perfect example of the following misperception), if a column which is unique or is approaching uniqueness is indexed via a bitmap index, the overheads associated with the bitmap string in the index entry is usually very minimal as by definition most bit values are logically “false” (0), with only the rare “true” (1) bit value needing to be stored and reference.

The issue is not necessarily with the overheads associated with just the bitmap string per se but also with the other overhead components, namely the additional rowid and column length bytes.
 
In short, the bitmap index can’t be any more efficient that use just 1 byte to store the necessary bitmap string information (plus 1 byte for the bitmap string length), however 19 bytes of overhead is the minimum required, mainly because of the requirement to store 2 rowids instead of 1 rowid and for all the additional index column length bytes. If the bitmap index needs to cater for a wider range of rowids and for more occurrences of 1s (true) values, then the overheads associated with the bitmap sequence can of course be much more considerable than the 1 byte (again, a topic for another day).
 
Therefore, the bitmap index is going to be significantly less efficient if the indexed values are unique or near unique as there’s all this additional overhead per index entry without the subsequent savings by not having to store separate index entries for duplicates column values. There needs to be at least some measure of duplication within a column for a bitmap index to have some chance of being the more efficient when compared to an equivalent b-tree index.
 
However, how many duplicate values within a column are actually necessary for a bitmap index to be considered and be viable alternative ? The answer is far fewer than many may think (again see original Burleson article for a common misunderstanding in this respect), although this question will be addressed in a future post on the subject.

Myth: Bitmap Indexes With High Distinct Columns (Supermassive Black Hole) March 3, 2010

Posted by Richard Foote in Bitmap Indexes, CBO, Clustering Factor, Oracle Indexes, Oracle Myths.
34 comments

As discussed in my previous post, it’s a silly myth to suggest a bitmap index should only be used for so-called “low cardinality” columns else the resultant index would be “huge”. I thought it might be worth a quick refresh to see a simple little demonstration why such claims are a nonsense.  There is in fact no limit to the number of distinct values by which a column could be considered a candidate for a bitmap index.  

I’ve already shown how a bitmap index on a column with 10,000 distinct values could potentially be smaller than an index with just 4 distinct values, even with the same data type and size. The number of distinct values in a column is but one small consideration, the number of rows in the table and the average ratio of repeated values are just as important. The other important consideration that can significant impact the size of a bitmap index is the distribution and clustering of the data within the table as we shall see.    

Using the same demo as the previous post, a reminder of the size of our bitmap index on a column with 10,000 distinct values in a 1 million row table.
 

SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_CODE_BITMAP_I';
 
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ------- -----------
BIG_BOWIE_CODE_BITMAP_I   BITMAP             10000       1          52
 

  

 
Let’s now see if the CBO will actually use this bitmap index on its own: 

  

SQL> select * from big_bowie where code = 42;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4280385727

--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   100 |  7300 |    22   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_BOWIE               |   100 |  7300 |    22   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_BOWIE_CODE_BITMAP_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
 

Absolutely it does. As there are on average just 100 rows per distinct value, that's a small enough selectivity for the CBO to use the bitmap index on its own. Note the query has used just 6 consistent gets to return the 100 rows of data.
 
Let's now drop the bitmap index and see how a regular B-Tree index would compare and perform:
 

  
SQL> drop index big_bowie_code_bitmap_i;
 
Index dropped.
 
SQL> create index big_bowie_code_i on big_bowie(code);
 
Index created.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks, clustering_factor as cf from dba_indexes where index_name = 'BIG_BOWIE_CODE_I';
 
INDEX_NAME           INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS         CF
-------------------- ---------- ------------- ------- ----------- ----------
BIG_BOWIE_CODE_I     NORMAL             10000       2        2090      10895

  

The first thing we notice is that the B-Tree index is significantly larger than the equivalent bitmap index. 1090 leafs blocks whereas the bitmap index was only 56 leaf blocks. So it's not the bitmap index that's so-called "huge" here on a column with 10,000 distinct values but the corresponding B-Tree index. Notice also that the Clustering Factor of the index is quite good at 10,895 in a 1 million row table.   

If we now run the same query as before: 

 
SQL> select * from big_bowie where code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1856845120
 
------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   100 |  7300 |     5 (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_BOWIE        |   100 |  7300 |     5 (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIG_BOWIE_CODE_I |   100 |       |     3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 
 

 

The query also uses the index but as the B-Tree index is somewhat larger, the overall number of consistent reads has increased from 6 up to 8. So not only is the bitmap index substantially smaller despite having 10,000 distinct values, but it's also more efficient to use as a result.A key reason why the bitmap index is so small and compact is because the effective Clustering Factor of the indexed column is excellent. The data was inserted into the table in CODE column order and so all the values with the same CODE value are grouped, ordered and clustered together within the table. Within the bitmap index, this means there are large and few grouping of zeros (0) that can be compressed extremely efficiently. 

For example, for the first CODE value of 1, the bitmap sequence would look something like: 

111111 .... 1110000000000000000000000000000000........000000 

with the 100 values of 1 (true) grouped together followed only by a series of effectively 999,900 zeros (o representing false). The 999,900 zeros can be compressed back almost nothing at all. Note there could actually be somewhat more false (0) values than actual rows in the table but that's a discussion for another day. 

The next value of 2 would have a bitmap sequence something like: 

00000000....0001111111...11111100000000000000000000...0000 

with 100 zeros followed by 100 1s followed by effectively 999,800 zeros, with again the two grouping of zeros compressed down to almost nothing at all. 

And so on ... 
Let's now create a different table that contains the identical data but  this time with the CODE values effectively randomized throughout the table. The Clustering Factor of the CODE column in this case will be terrible: 

 
 
SQL> create table big_bowie_2 as select * from big_bowie order by mod(id,100);
 
Table created.
 
SQL> create bitmap index big_bowie_2_code_bm_i on big_bowie_2(code);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BIG_BOWIE_2', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_2_CODE_BM_I';
 
INDEX_NAME             INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS
---------------------- ---------- ------------- ------- -----------
BIG_BOWIE_2_CODE_BM_I  BITMAP             10000       1         520
 

 

The first thing we notice is that the bitmap index is now substantially larger than it was previously. It's gone from just 52 leaf blocks all the up to 520 blocks, a whole 10 times larger than before. The reason is all to do with the clustering of the data as now values for CODE are littered all over the table and are no longer grouped together. 

The bitmap sequence for the first CODE value of 1 would now look something like: 

00000000000100000000000000000..0000010000000000000000...0000010000000000001000000000...00000100000.... 

with the 1s now littered all over the place. This means it's far less efficient to compress the groups of zeros as there are now substantially more such groupings than before. The index is now 10 times larger as a result. 

If we now run the same query as before: 

  

SQL> select * from big_bowie_2 where code = 42;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1324437154
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                       |   100 |  7300 |  22     (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_BOWIE_2           |   100 |  7300 |  22     (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_BOWIE_2_CODE_BM_I |       |       |            |          |
------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("CODE"=42)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 

 

 Although the CBO is still using the bitmap index on its own, the performance of the query has deteriorated substantially with the number of consistent gets increasing from 6 all the way up to 103.  

So the bitmap index is now nowhere near as efficient. A bitmap index isn't that great with large numbers of distinct values if the associated clustering is poor and so the B-Tree index is the way to go after all, right ? 

Well just wait a minute. If the clustering is poor for a bitmap index, the clustering will likewise be poor for a corresponding B-Tree index as well.  Most of these consistent reads are due to reading the data out of the table, not from reading the index directly. So the performance of using an equivalent B-Tree index is also likely to be impacted as well. 

Let's compare the difference with a B-Tree index: 

   
SQL> drop index big_bowie_2_code_bm_i;
 
Index dropped.
 
SQL> create index big_bowie_2_code_i on big_bowie_2(code);
 
Index created.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks, clustering_factor as cf from dba_indexes where index_name = 'BIG_BOWIE_2_CODE_I';
 
INDEX_NAME             INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS         CF
---------------------- ---------- ------------- ------- ----------- ----------
BIG_BOWIE_2_CODE_I     NORMAL             10000       2        2090     999922
 
 

 

The first thing to note here is that the B-Tree index is still 2090 leaf blocks in size. Even compared with the now far less efficient Bitmap index, at 520 leaf blocks it's still approximately just 25% the size of the B-Tree index. So the 10,000 distinct value bitmap index, even when it's as inefficient as possible, still can't be described as "huge"  here as it's still only a 1/4 the size of the corresponding B-Tree index. With a Clustering Factor of 999,992 on a 1 million rows table, it doesn't really get much worse than that and yet the Bitmap index on a 10,000 distinct column is still way ahead of the B-Tree index.  

Let's see how the query performs now: 

 
SQL> select * from big_bowie_2 where code = 42;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 2550134594
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  7300 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_BOWIE_2        |   100 |  7300 |   103   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIG_BOWIE_2_CODE_I |   100 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE"=42)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, the performance of the B-Tree index has likewise deteriorated with such a bad Clustering Factor. At 105 consistent gets, it's still worse than the corresponding Bitmap index which only needed 103 consistent gets. 

With a Bitmap index that is as inefficient as it can be, on a column that has 10,000 distinct values in a table of only 1 million rows, the Bitmap index still outperforms the corresponding B-Tree index. 

It's a complete myth and utter nonsense that a bitmap index is only suitable for low cardinality columns and would become "HUGE" and be 100s of times slower if created on so-called high cardinality column 

Let me repeat: There is no limit to the number of distinct values in a column for it to be considered for a Bitmap index.

Myth: Bitmap Indexes With High Distinct Columns (Blow Out) February 18, 2010

Posted by Richard Foote in Bitmap Indexes, Oracle Indexes, Oracle Myths.
81 comments

I just couldn’t resist.  

One of the great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns that have so-called low cardinality (few distinct) values. A classic example of this myth being propagated is in this latest Burleson news item, “Oracle bitmap index maximum distinct values“, from Feb 16 2010 (article itself updated January 8, 2010): 

It says “As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values.” 

It also mentions some so-called rules of thumb whereby: 

“1 – 7 distinct values – Queries against bitmap indexes with a low cardinality are very fast.

8-100 distinct values – As the number if distinct values increases, performance decreases proportionally.

100 – 10,000 distinct values – Over 100 distinct values, the bitmap indexes become huge and SQL performance drops off rapidly.

Over 10,000 distinct values – At this point, performance is ten times slower than an index with only 100 distinct values”

Now this of course is all generalised nonsense. Not only can a column with 10,000+ distinct values be perfect as a bitmap index but it can be considerably smaller than a bitmap index with only a handful of distinct values, even with columns of the same size and data type

A very simple example to demonstrate. First, I’m going to create a table with 1 million rows and have a CODE column that has 10,000 distinct values and a TYPE column with just 4 distinct values:

SQL> create table big_bowie (id number, code number, type number, name varchar2(100));
Table created.
SQL> declare
  2  i number;
  3  begin
  4  i:=0;
  5  for j in 1..10000 loop
  6     for k in 1..100 loop
  7      i:=i+1;
  8      insert into big_bowie values (i, j, mod(k,4)+1, 'The Rise And Fall Of Ziggy Stardust And The Spiders From Mars');
  9     end loop;
 10  end loop;
 11  commit;
 12  end;
 13  /
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BIG_BOWIE', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

 
OK, I’m just going to create a standard B-Tree index on the TYPE column and see how large it is:

SQL> create index big_bowie_type_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_I          NORMAL                 4          2        1752

 

OK, so it’s a BLEVEL 2 index with 1752 leaf blocks. Let’s now compare it with an equivalent bitmap index. As the column only has 4 distinct values, it should be perfect as a bitmap index and much smaller than the B-Tree:


SQL> drop index big_bowie_type_i;
Index dropped.
SQL> create bitmap index big_bowie_type_bitmap_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_BITMAP_I   BITMAP                 4          1          84

 

Indeed it is smaller. It’s now just 84 leaf blocks in size, down from the previous 1752 leaf blocks. The Blevel has even reduced to just 1.

OK, let’s attempt something really silly and outrageous. Let’s create a bitmap index on the CODE column, a column with 10,000 distinct values. I know, I’m crazy to even suggest such a thing as the resultant bitmap will simply be “HUGE” right ?

Let’s see.


SQL> create bitmap index big_bowie_code_bitmap_i on big_bowie(code) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_CODE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_CODE_BITMAP_I   BITMAP             10000          1          52

 

Well, would you look at that. It’s not “huge” at all, it’s just a tiny 52 leaf blocks !! In fact, it’s actually smaller and just 62% the size of the TYPE bitmap index that only had 4 distinct values.

So a bitmap index with 10,000 distinct values is actually smaller, more compact and more efficient that a bitmap index with just 4 distinct values !!

Why so small ?

Because the size and efficiency of a bitmap index doesn’t just depend on the number of distinct values but a whole range of other factors as well, not least the size and the clustering of the data in the table. Clue: I inserted the data into my BIG_BOWIE table in a very careful manner. However, one does need to actually understand how bitmap indexes work and how they actually store data to appreciate and determine whether a column is suitable for a bitmap index.

In short, there is no limit to the number of distinct values by which a column is suitable for a bitmap index. You could have a column with 10s of millions of distinct values (in say a billion row table) and a bitmap index might be perfectly suitable and significantly smaller than an equivalent B-Tree index. This is because a B-Tree index needs to store each and every occurence of a column value (unless the index is compressed) as well as a corresponding rowid whereas a Bitmap index might only need to store each distinct column value once with just 2 corresponding rowids. The savings in space can be massive, even if there are relatively few repeated occurences of each distinct value on average.

The next time you unfortunately read that bitmap indexes should only be used with very “few” distinct values and would be “huge” otherwise, well hopefully you’ll appreciate that’s simply not correct.

Bitmap Indexes With Many Distinct Column Values (Wots…uh the deal) February 1, 2008

Posted by Richard Foote in Bitmap Indexes, Oracle General, Oracle Indexes, Oracle Myths.
29 comments

In the seemingly never ending list of 8 things one may not have known about indexes, Number 3 stated:

“Bitmap Indexes can be extremely useful and beneficial even if the column contains thousands of distinct values”.

On the surface, this may seem like a somewhat strange thing to suggest to many folk. It seems to be stated in numerous places that Bitmap indexes are only really beneficial with columns that have low numbers of distinct values.  For example, a column called GENDER (I was going to use another word but I have to deal with enough spam messages as it is :) ) has only a few distinct values, so it would be perfect for a Bitmap Index.

Columns that have say 5 or 10 or maybe 20 distinct values should all be OK. But what if a column has 100 distinct values, that might just be pushing it. A column with 1000 distinct values would obviously be totally inappropriate. I would have to be some kind of deranged fool for even contemplating and suggesting a column with 10,000 distinct values, right !!

A Bitmap Index is actually a B-Tree index in it’s basic structure and shape. It has index branch blocks that point to index leaf blocks that have all the necessary index information stored in an ordered and sorted manner. However, in the leaf blocks, a conventional B-Tree index basically stores the indexed column values followed by its corresponding rowid. A bitmap index differs considerably and basically stores in the leaf blocks for each distinct column, the column value followed by a starting and ending rowid that specifies a range of possible rowids within the table followed by a series of bits that denotes for each possible rowid within the range whether the row contains the column value (1) or not (0). If the index entry is larger than roughly half the index block size, another bitmap “piece” is created for the index entry, specifying a different range of rowids with corresponding bitmaps.

The “biggest” component of the index entry is thus this series of bits. But most of the values will be zeros (as a specific row can only have at most the one value of the column) and all these zeros can be compressed quite efficiently by Oracle within the index entry.

So having lots of distinct column values means having lots of index entries with lots of rowid ranges with lots of bitmaps. So a column with anything approaching 10,000 values would be totally inappropriate, right ?

Well take a look at this demo comparing a B-Tree Index vs. a Bitmap Index for a column that has 10,000 distinct values and you might just be surprised.

The table contains 1 Million rows and one of the columns is a NUMBER field that has 10,000 distinct values (hence a Density value of 1%).

The B-Tree Index required 2,090 leaf blocks to store the index and an equality query returning the expected 100 rows requires 19 consistent reads. Not too bad.

However, the equivalent Bitmap Index required just 56 leaf blocks and an equality query returning the same 100 rows does so with just 11 consistent reads.

Ummm, perhaps bitmaps indexes don’t require such low numbers of distinct column values to be useful after all …

A few points to ponder on from this specific example.

The B-Tree index had to store 1,000,000 index values, once for each and every not null row in the parent table. The Bitmap Index only had to store the index values 10,000 times, once for each unique occurrence of the column value (although there may be times when this ratio may be higher)

The B-Tree index had to stored 1,000,000 rowids, once for each and every index row entry. The Bitmap Index only had to store a pair of rowid values for each unique occurrence of the column value (although there may be times when Bitmap Indexes need to store more than one pair of rowids per index value).

If the rows in the table are clustered together based on the index column value, it means the zeros in the bitmap index can be nice and continuous within the bitmap string and so will compress nicely. Therefore, the manner in which the rows are ordered in the table will have a direct impact in how efficient the Bitmap Index can be compressed.

There’s lots and lots of interesting things to discuss about Bitmap Indexes. For now, just note the next time you read Bitmap Indexes should only be used for columns with few distinct values, you may want to get some clarification on what is meant exactly by “few” …

Follow

Get every new post delivered to your Inbox.

Join 1,852 other followers