jump to navigation

Unique Bitmap Indexes Part II (You Can’t Do That) March 30, 2010

Posted by Richard Foote in Bitmap Indexes, Oracle Indexes, Unique Indexes.
6 comments

As discussed in the previous post, a Bitmap index on a unique column will be larger than a corresponding Btree index due to the additional overheads associated with each index entry (such as the additional rowid, the additional column length bytes and the bitmap column itself). Oracle therefore attempts to protect you from explicitly creating such a “Unique Bitmap” index. 

For example, you can not specify both UNIQUE and BITMAP when creating an index. To do so would make little sense.
  
A bitmap index must therefore be Non-Unique by definition. Any attempt to explicitly create a Unique Bitmap index will fail.
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create unique bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
create unique bitmap index bowie_bitmap_i on bowie(id) pctfree 0
              *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

SQL> create bitmap unique index bowie_bitmap_i on bowie(id) pctfree 0;
create bitmap unique index bowie_bitmap_i on bowie(id) pctfree 0
              *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

The CREATE INDEX syntax only caters for either the BITMAP or the UNIQUE option.
 
Although Oracle permits the use of a Non-Unique index to police either a Primary Key (PK) or Unique Key (UK) constraint, a bitmap index is not permitted to police such constraints. Again, it makes little sense having a bitmap index police such constraints as an equivalent Btree index is going to be more efficient.
 
If an existing bitmap index exists on a column, Oracle can not use it to police the constraint:
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> alter table bowie add primary key (id);
alter table bowie add primary key (id)
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

Oracle is attempting to create a Btree index to police the new PK constraint but it can’t create it as an existing bitmap index already exists. Oracle will not create a Btree index if the same column list is already indexed.
 
It makes no difference if we if declare the constraint as deferrable (or invalidate) where a Non-Unique index is required:
 

SQL> alter table bowie add primary key (id) novalidate;
alter table bowie add primary key (id) novalidate
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

SQL> alter table bowie add primary key (id) deferrable;
alter table bowie add primary key (id) deferrable
*
ERROR at line 1:
ORA-01408: such column list already indexed
 

Attempting to create a Bitmap index at the same time as the constraint is equally fruitless:

SQL> alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id));
alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id))
                                                           *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

SQL> alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id) deferrable);
alter table bowie add primary key (id) using index (create bitmap index bowie_bitmap_i on bowie(id) deferrable)
                                                           *
ERROR at line 1:
ORA-00968: missing INDEX keyword
 

So definitely, looking at creating a Bitmap index on a unique column is not a sensible thing to attempt both because the resultant bitmap index would be larger than a corresponding Btree index if permitted and because in many scenarios as discussed, Oracle simply won’t let you do it anyways.

OK, so a unique column is not suitable for a Bitmap index. The question remains at what point does it make sense to create a bitmap index ? The answer is reasonably obvious when one understands the structure of both types of indexes although the answer may surprise some folks. I’ll look at this question next …

Unique Bitmap Indexes Part I (Unnatural Selection) March 24, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes, Unique Indexes.
17 comments

As I’ve discussed previously, a Bitmap index can be considered over a B-tree index (where concurrent DML is not an issue) even if there are potentially tens of millions of distinct values, in a table that has say hundreds of millions of rows.
 
However, if a column is unique or “approaches” uniqueness, then one has gone too far and the bitmap index is going to be larger and less efficient than an equivalent b-tree index. So you wouldn’t consider a bitmap index on a column with a million distinct values if the table itself only has in the vicinity of a million rows as well.
 
To understand why a column approaching uniqueness shouldn’t be considered as a bitmap index, one only needs to understand the structure and corresponding differences of index entries in both bitmap and b-tree indexes.
 
I’ll begin by creating a simple table and populating it with a million rows.


SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie select rownum, 'Ziggy Stardust' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

   

Note that the ID column is unique. We can therefore create a Unique b-tree index:
 


SQL> create unique index bowie_unique_i on bowie(id) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_UNIQUE_I';

INDEX_NAME          BLEVEL LEAF_BLOCKS DISTINCT_KEYS
--------------- ---------- ----------- -------------
BOWIE_UNIQUE_I           2        1875       1000000

 

Note that the unique index has 1875 leaf blocks. If we dump a leaf block and look at some (say 5) of the index entries:
 

row#0[8025] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 01
col 0; len 2; (2):  c1 03
row#2[8003] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 02
col 0; len 2; (2):  c1 04
row#3[7992] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 03
col 0; len 2; (2):  c1 05
row#4[7981] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 04

 

We notice the length of these first 5 index entries are all 11 bytes (len=11).
 
An index entry from this Unique index basically consists of the indexed value (col 0) which is 2 bytes in length in the above sample plus the following overhead:
 
2 bytes for flags and locks
6 bytes for the rowid
1 byte for the index column length
 
So there’s a total of 9 bytes of overhead per index entry in this index in addition to the index value itself. Note also there’s an index entry for each and every indexed value. This is always the case for a non-compressed b-tree index.
 
If we now compare this with an equivalent  Non-Unique index on the same column:

 
 
SQL> drop index bowie_unique_i;
 
Index dropped.
 
SQL> create index bowie_nonunique_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_NONUNIQUE_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_NONUNIQUE_I          2        1999       1000000
 

 

We notice the index is now somewhat larger than the equivalent Unique index, with there now being 1999 leaf blocks, an increase of 124 leaf blocks. A block dump of a leaf block reveals the key difference:
 

 
row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
row#1[8012] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 01
row#2[8000] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 02
row#3[7988] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 03
row#4[7976] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 04

 

As I’ve discussed previously, Oracle makes the Non-Unique index effectively unique by adding the rowid as an additional indexed column within the index entry (col 1 is this additional index column comprising the rowid). There are therefore 2 columns in the index entry, not just the one (denoted by col 0 and col 1). This ensures all duplicate indexed values are subsequently sorted in rowid order within the index and can be efficiently accessed as necessary.
 
The consequence of this subtle difference is that an additional byte is now required to store the length of the rowid column and so the total overhead increases from 9 bytes to 10 bytes per index entry. The overall length of an index entry has therefore increased from 11 to 12 byes (len=12) and this results in the overall increase of 124 leaf blocks in the index, required to effectively store these additional 1 million bytes.
 
If we now create the index as an equivalent bitmap index:
 

  
 
SQL> drop index bowie_nonunique_i;
 
Index dropped.
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_BITMAP_I             2        3124       1000000

  

We now notice the index has increased substantially from 1999 leaf blocks for the Non-Unique index to 3124 leaf blocks.
 
Again, a dump of an index leaf block highlights the reason for the increase:
 

 
row#0[8015] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  00
row#1[7994] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  01
row#2[7973] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  02
row#3[7952] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  03
row#4[7931] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  04
 

 

The index entry structure is now somewhat different. We now have an index that has not 1 column (as in the Unique index) or 2 columns (as in the Non-unique index) but 4 columns in the index entry. As before, we still have the index column value of 2 bytes but we now have the following overheads per index entry:
 
2 bytes for flags and locking (as before)
1 byte for the indexed column length (as before)
6 bytes for a rowid index column (col 1) stating the start of a range of rowids that are covered by the particular index entry
1 byte for the length of this start rowid index column
6 bytes for a rowid index column (col 2) stating the end of a range of rowids that are covered by the particular index entry
1 byte for the length of this end rowid index column
1 byte for the bitmap bit sequence column (col 3) required for all the bits referencing rows within the above rowid ranges
1 byte for the length of this bitmap column
 
So the total overhead for each of the 5 index entries listed above is now 19 bytes, not 9 or 10 bytes as for the equivalent b-tree indexes. The length of an index entry is therefore 21 bytes in total, not 11 or 12 bytes as for the equivalent b-tree indexes.

A few important points to note.
 
As the columns are effectively unique, the number of index entries are the same for both b-tree and bitmap indexes. A key advantage of a bitmap index over a b-tree index is that for each distinct value, a single index entry is sufficient to cater for a range of rowids, potentially covering the whole table. For example, a specific column value with 100 duplicates may only need the one index entry for the column value within a bitmap index, but would require 100 different index entries within a (non-compressed) b-tree. However, as the column in the above example is unique, there are no duplicate values and so this potential saving is not possible in this bitmap index.
 
Notice also the size of the bitmap string for each index entry is actually tiny, just a single byte, even though there are 1 million rows in the table. It doesn’t even look like it’s using a million bits to store the necessary bitmap string information for each index entry. This is because for each index entry, only one bit is ever set to 1 (“true”), all other occurrences are logically false as only 1 row in the table ever has the specific index value. Remember, the column values are effectively unique.

Therefore, Oracle can use a very narrow range of rowid ranges for each index entry and effectively not bother storing details for the vast majority of the possible rowid ranges within the table as there’s only one bit that’s of interest and it only corresponds to a specific part of the table. Even in cases where there might just be a duplicate here or there, most values would be zeros (false) regardless and can be compressed extremely efficiently (topic for another day).

Although many folks commonly think otherwise (see original Burleson article for a perfect example of the following misperception), if a column which is unique or is approaching uniqueness is indexed via a bitmap index, the overheads associated with the bitmap string in the index entry is usually very minimal as by definition most bit values are logically “false” (0), with only the rare “true” (1) bit value needing to be stored and reference.

The issue is not necessarily with the overheads associated with just the bitmap string per se but also with the other overhead components, namely the additional rowid and column length bytes.
 
In short, the bitmap index can’t be any more efficient that use just 1 byte to store the necessary bitmap string information (plus 1 byte for the bitmap string length), however 19 bytes of overhead is the minimum required, mainly because of the requirement to store 2 rowids instead of 1 rowid and for all the additional index column length bytes. If the bitmap index needs to cater for a wider range of rowids and for more occurrences of 1s (true) values, then the overheads associated with the bitmap sequence can of course be much more considerable than the 1 byte (again, a topic for another day).
 
Therefore, the bitmap index is going to be significantly less efficient if the indexed values are unique or near unique as there’s all this additional overhead per index entry without the subsequent savings by not having to store separate index entries for duplicates column values. There needs to be at least some measure of duplication within a column for a bitmap index to have some chance of being the more efficient when compared to an equivalent b-tree index.
 
However, how many duplicate values within a column are actually necessary for a bitmap index to be considered and be viable alternative ? The answer is far fewer than many may think (again see original Burleson article for a common misunderstanding in this respect), although this question will be addressed in a future post on the subject.

Myth: Bitmap Indexes With High Distinct Columns (Supermassive Black Hole) March 3, 2010

Posted by Richard Foote in Bitmap Indexes, CBO, Clustering Factor, Oracle Indexes, Oracle Myths.
34 comments

As discussed in my previous post, it’s a silly myth to suggest a bitmap index should only be used for so-called “low cardinality” columns else the resultant index would be “huge”. I thought it might be worth a quick refresh to see a simple little demonstration why such claims are a nonsense.  There is in fact no limit to the number of distinct values by which a column could be considered a candidate for a bitmap index.  

I’ve already shown how a bitmap index on a column with 10,000 distinct values could potentially be smaller than an index with just 4 distinct values, even with the same data type and size. The number of distinct values in a column is but one small consideration, the number of rows in the table and the average ratio of repeated values are just as important. The other important consideration that can significant impact the size of a bitmap index is the distribution and clustering of the data within the table as we shall see.    

Using the same demo as the previous post, a reminder of the size of our bitmap index on a column with 10,000 distinct values in a 1 million row table.
 

SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_CODE_BITMAP_I';
 
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ------- -----------
BIG_BOWIE_CODE_BITMAP_I   BITMAP             10000       1          52
 

  

 
Let’s now see if the CBO will actually use this bitmap index on its own: 

  

SQL> select * from big_bowie where code = 42;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4280385727

--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   100 |  7300 |    22   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_BOWIE               |   100 |  7300 |    22   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_BOWIE_CODE_BITMAP_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
 

Absolutely it does. As there are on average just 100 rows per distinct value, that's a small enough selectivity for the CBO to use the bitmap index on its own. Note the query has used just 6 consistent gets to return the 100 rows of data.
 
Let's now drop the bitmap index and see how a regular B-Tree index would compare and perform:
 

  
SQL> drop index big_bowie_code_bitmap_i;
 
Index dropped.
 
SQL> create index big_bowie_code_i on big_bowie(code);
 
Index created.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks, clustering_factor as cf from dba_indexes where index_name = 'BIG_BOWIE_CODE_I';
 
INDEX_NAME           INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS         CF
-------------------- ---------- ------------- ------- ----------- ----------
BIG_BOWIE_CODE_I     NORMAL             10000       2        2090      10895

  

The first thing we notice is that the B-Tree index is significantly larger than the equivalent bitmap index. 1090 leafs blocks whereas the bitmap index was only 56 leaf blocks. So it's not the bitmap index that's so-called "huge" here on a column with 10,000 distinct values but the corresponding B-Tree index. Notice also that the Clustering Factor of the index is quite good at 10,895 in a 1 million row table.   

If we now run the same query as before: 

 
SQL> select * from big_bowie where code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1856845120
 
------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |   100 |  7300 |     5 (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_BOWIE        |   100 |  7300 |     5 (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIG_BOWIE_CODE_I |   100 |       |     3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 
 

 

The query also uses the index but as the B-Tree index is somewhat larger, the overall number of consistent reads has increased from 6 up to 8. So not only is the bitmap index substantially smaller despite having 10,000 distinct values, but it's also more efficient to use as a result.A key reason why the bitmap index is so small and compact is because the effective Clustering Factor of the indexed column is excellent. The data was inserted into the table in CODE column order and so all the values with the same CODE value are grouped, ordered and clustered together within the table. Within the bitmap index, this means there are large and few grouping of zeros (0) that can be compressed extremely efficiently. 

For example, for the first CODE value of 1, the bitmap sequence would look something like: 

111111 .... 1110000000000000000000000000000000........000000 

with the 100 values of 1 (true) grouped together followed only by a series of effectively 999,900 zeros (o representing false). The 999,900 zeros can be compressed back almost nothing at all. Note there could actually be somewhat more false (0) values than actual rows in the table but that's a discussion for another day. 

The next value of 2 would have a bitmap sequence something like: 

00000000....0001111111...11111100000000000000000000...0000 

with 100 zeros followed by 100 1s followed by effectively 999,800 zeros, with again the two grouping of zeros compressed down to almost nothing at all. 

And so on ... 
Let's now create a different table that contains the identical data but  this time with the CODE values effectively randomized throughout the table. The Clustering Factor of the CODE column in this case will be terrible: 

 
 
SQL> create table big_bowie_2 as select * from big_bowie order by mod(id,100);
 
Table created.
 
SQL> create bitmap index big_bowie_2_code_bm_i on big_bowie_2(code);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BIG_BOWIE_2', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_2_CODE_BM_I';
 
INDEX_NAME             INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS
---------------------- ---------- ------------- ------- -----------
BIG_BOWIE_2_CODE_BM_I  BITMAP             10000       1         520
 

 

The first thing we notice is that the bitmap index is now substantially larger than it was previously. It's gone from just 52 leaf blocks all the up to 520 blocks, a whole 10 times larger than before. The reason is all to do with the clustering of the data as now values for CODE are littered all over the table and are no longer grouped together. 

The bitmap sequence for the first CODE value of 1 would now look something like: 

00000000000100000000000000000..0000010000000000000000...0000010000000000001000000000...00000100000.... 

with the 1s now littered all over the place. This means it's far less efficient to compress the groups of zeros as there are now substantially more such groupings than before. The index is now 10 times larger as a result. 

If we now run the same query as before: 

  

SQL> select * from big_bowie_2 where code = 42;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1324437154
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                       |   100 |  7300 |  22     (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_BOWIE_2           |   100 |  7300 |  22     (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_BOWIE_2_CODE_BM_I |       |       |            |          |
------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("CODE"=42)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 

 

 Although the CBO is still using the bitmap index on its own, the performance of the query has deteriorated substantially with the number of consistent gets increasing from 6 all the way up to 103.  

So the bitmap index is now nowhere near as efficient. A bitmap index isn't that great with large numbers of distinct values if the associated clustering is poor and so the B-Tree index is the way to go after all, right ? 

Well just wait a minute. If the clustering is poor for a bitmap index, the clustering will likewise be poor for a corresponding B-Tree index as well.  Most of these consistent reads are due to reading the data out of the table, not from reading the index directly. So the performance of using an equivalent B-Tree index is also likely to be impacted as well. 

Let's compare the difference with a B-Tree index: 

   
SQL> drop index big_bowie_2_code_bm_i;
 
Index dropped.
 
SQL> create index big_bowie_2_code_i on big_bowie_2(code);
 
Index created.
 
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks, clustering_factor as cf from dba_indexes where index_name = 'BIG_BOWIE_2_CODE_I';
 
INDEX_NAME             INDEX_TYPE DISTINCT_KEYS  BLEVEL LEAF_BLOCKS         CF
---------------------- ---------- ------------- ------- ----------- ----------
BIG_BOWIE_2_CODE_I     NORMAL             10000       2        2090     999922
 
 

 

The first thing to note here is that the B-Tree index is still 2090 leaf blocks in size. Even compared with the now far less efficient Bitmap index, at 520 leaf blocks it's still approximately just 25% the size of the B-Tree index. So the 10,000 distinct value bitmap index, even when it's as inefficient as possible, still can't be described as "huge"  here as it's still only a 1/4 the size of the corresponding B-Tree index. With a Clustering Factor of 999,992 on a 1 million rows table, it doesn't really get much worse than that and yet the Bitmap index on a 10,000 distinct column is still way ahead of the B-Tree index.  

Let's see how the query performs now: 

 
SQL> select * from big_bowie_2 where code = 42;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 2550134594
 
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |   100 |  7300 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_BOWIE_2        |   100 |  7300 |   103   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BIG_BOWIE_2_CODE_I |   100 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE"=42)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
       1854  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, the performance of the B-Tree index has likewise deteriorated with such a bad Clustering Factor. At 105 consistent gets, it's still worse than the corresponding Bitmap index which only needed 103 consistent gets. 

With a Bitmap index that is as inefficient as it can be, on a column that has 10,000 distinct values in a table of only 1 million rows, the Bitmap index still outperforms the corresponding B-Tree index. 

It's a complete myth and utter nonsense that a bitmap index is only suitable for low cardinality columns and would become "HUGE" and be 100s of times slower if created on so-called high cardinality column 

Let me repeat: There is no limit to the number of distinct values in a column for it to be considered for a Bitmap index.

Follow

Get every new post delivered to your Inbox.

Join 1,821 other followers