jump to navigation

Bitmap Indexes With Many Distinct Column Values (Wots…uh the deal) February 1, 2008

Posted by Richard Foote in Bitmap Indexes, Oracle General, Oracle Indexes, Oracle Myths.

In the seemingly never ending list of 8 things one may not have known about indexes, Number 3 stated:

“Bitmap Indexes can be extremely useful and beneficial even if the column contains thousands of distinct values”.

On the surface, this may seem like a somewhat strange thing to suggest to many folk. It seems to be stated in numerous places that Bitmap indexes are only really beneficial with columns that have low numbers of distinct values.  For example, a column called GENDER (I was going to use another word but I have to deal with enough spam messages as it is 🙂 ) has only a few distinct values, so it would be perfect for a Bitmap Index.

Columns that have say 5 or 10 or maybe 20 distinct values should all be OK. But what if a column has 100 distinct values, that might just be pushing it. A column with 1000 distinct values would obviously be totally inappropriate. I would have to be some kind of deranged fool for even contemplating and suggesting a column with 10,000 distinct values, right !!

A Bitmap Index is actually a B-Tree index in it’s basic structure and shape. It has index branch blocks that point to index leaf blocks that have all the necessary index information stored in an ordered and sorted manner. However, in the leaf blocks, a conventional B-Tree index basically stores the indexed column values followed by its corresponding rowid. A bitmap index differs considerably and basically stores in the leaf blocks for each distinct column, the column value followed by a starting and ending rowid that specifies a range of possible rowids within the table followed by a series of bits that denotes for each possible rowid within the range whether the row contains the column value (1) or not (0). If the index entry is larger than roughly half the index block size, another bitmap “piece” is created for the index entry, specifying a different range of rowids with corresponding bitmaps.

The “biggest” component of the index entry is thus this series of bits. But most of the values will be zeros (as a specific row can only have at most the one value of the column) and all these zeros can be compressed quite efficiently by Oracle within the index entry.

So having lots of distinct column values means having lots of index entries with lots of rowid ranges with lots of bitmaps. So a column with anything approaching 10,000 values would be totally inappropriate, right ?

Well take a look at this demo comparing a B-Tree Index vs. a Bitmap Index for a column that has 10,000 distinct values and you might just be surprised.

The table contains 1 Million rows and one of the columns is a NUMBER field that has 10,000 distinct values (hence a Density value of 1%).

The B-Tree Index required 2,090 leaf blocks to store the index and an equality query returning the expected 100 rows requires 19 consistent reads. Not too bad.

However, the equivalent Bitmap Index required just 56 leaf blocks and an equality query returning the same 100 rows does so with just 11 consistent reads.

Ummm, perhaps bitmaps indexes don’t require such low numbers of distinct column values to be useful after all …

A few points to ponder on from this specific example.

The B-Tree index had to store 1,000,000 index values, once for each and every not null row in the parent table. The Bitmap Index only had to store the index values 10,000 times, once for each unique occurrence of the column value (although there may be times when this ratio may be higher)

The B-Tree index had to stored 1,000,000 rowids, once for each and every index row entry. The Bitmap Index only had to store a pair of rowid values for each unique occurrence of the column value (although there may be times when Bitmap Indexes need to store more than one pair of rowids per index value).

If the rows in the table are clustered together based on the index column value, it means the zeros in the bitmap index can be nice and continuous within the bitmap string and so will compress nicely. Therefore, the manner in which the rows are ordered in the table will have a direct impact in how efficient the Bitmap Index can be compressed.

There’s lots and lots of interesting things to discuss about Bitmap Indexes. For now, just note the next time you read Bitmap Indexes should only be used for columns with few distinct values, you may want to get some clarification on what is meant exactly by “few” …