jump to navigation

Myth: Bitmap Indexes With High Distinct Columns (Blow Out) February 18, 2010

Posted by Richard Foote in Bitmap Indexes, Oracle Indexes, Oracle Myths.
81 comments

I just couldn’t resist.  

One of the great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns that have so-called low cardinality (few distinct) values. A classic example of this myth being propagated is in this latest Burleson news item, “Oracle bitmap index maximum distinct values“, from Feb 16 2010 (article itself updated January 8, 2010): 

It says “As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values.” 

It also mentions some so-called rules of thumb whereby: 

“1 – 7 distinct values – Queries against bitmap indexes with a low cardinality are very fast.

8-100 distinct values – As the number if distinct values increases, performance decreases proportionally.

100 – 10,000 distinct values – Over 100 distinct values, the bitmap indexes become huge and SQL performance drops off rapidly.

Over 10,000 distinct values – At this point, performance is ten times slower than an index with only 100 distinct values”

Now this of course is all generalised nonsense. Not only can a column with 10,000+ distinct values be perfect as a bitmap index but it can be considerably smaller than a bitmap index with only a handful of distinct values, even with columns of the same size and data type

A very simple example to demonstrate. First, I’m going to create a table with 1 million rows and have a CODE column that has 10,000 distinct values and a TYPE column with just 4 distinct values:

SQL> create table big_bowie (id number, code number, type number, name varchar2(100));
Table created.
SQL> declare
  2  i number;
  3  begin
  4  i:=0;
  5  for j in 1..10000 loop
  6     for k in 1..100 loop
  7      i:=i+1;
  8      insert into big_bowie values (i, j, mod(k,4)+1, 'The Rise And Fall Of Ziggy Stardust And The Spiders From Mars');
  9     end loop;
 10  end loop;
 11  commit;
 12  end;
 13  /
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BIG_BOWIE', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

 
OK, I’m just going to create a standard B-Tree index on the TYPE column and see how large it is:

SQL> create index big_bowie_type_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_I          NORMAL                 4          2        1752

 

OK, so it’s a BLEVEL 2 index with 1752 leaf blocks. Let’s now compare it with an equivalent bitmap index. As the column only has 4 distinct values, it should be perfect as a bitmap index and much smaller than the B-Tree:


SQL> drop index big_bowie_type_i;
Index dropped.
SQL> create bitmap index big_bowie_type_bitmap_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_BITMAP_I   BITMAP                 4          1          84

 

Indeed it is smaller. It’s now just 84 leaf blocks in size, down from the previous 1752 leaf blocks. The Blevel has even reduced to just 1.

OK, let’s attempt something really silly and outrageous. Let’s create a bitmap index on the CODE column, a column with 10,000 distinct values. I know, I’m crazy to even suggest such a thing as the resultant bitmap will simply be “HUGE” right ?

Let’s see.


SQL> create bitmap index big_bowie_code_bitmap_i on big_bowie(code) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_CODE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_CODE_BITMAP_I   BITMAP             10000          1          52

 

Well, would you look at that. It’s not “huge” at all, it’s just a tiny 52 leaf blocks !! In fact, it’s actually smaller and just 62% the size of the TYPE bitmap index that only had 4 distinct values.

So a bitmap index with 10,000 distinct values is actually smaller, more compact and more efficient that a bitmap index with just 4 distinct values !!

Why so small ?

Because the size and efficiency of a bitmap index doesn’t just depend on the number of distinct values but a whole range of other factors as well, not least the size and the clustering of the data in the table. Clue: I inserted the data into my BIG_BOWIE table in a very careful manner. However, one does need to actually understand how bitmap indexes work and how they actually store data to appreciate and determine whether a column is suitable for a bitmap index.

In short, there is no limit to the number of distinct values by which a column is suitable for a bitmap index. You could have a column with 10s of millions of distinct values (in say a billion row table) and a bitmap index might be perfectly suitable and significantly smaller than an equivalent B-Tree index. This is because a B-Tree index needs to store each and every occurence of a column value (unless the index is compressed) as well as a corresponding rowid whereas a Bitmap index might only need to store each distinct column value once with just 2 corresponding rowids. The savings in space can be massive, even if there are relatively few repeated occurences of each distinct value on average.

The next time you unfortunately read that bitmap indexes should only be used with very “few” distinct values and would be “huge” otherwise, well hopefully you’ll appreciate that’s simply not correct.

Follow

Get every new post delivered to your Inbox.

Join 1,861 other followers