jump to navigation

Myth: Bitmap Indexes With High Distinct Columns (Blow Out) February 18, 2010

Posted by Richard Foote in Bitmap Indexes, Oracle Indexes, Oracle Myths.
trackback

I just couldn’t resist.  

One of the great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns that have so-called low cardinality (few distinct) values. A classic example of this myth being propagated is in this latest Burleson news item, “Oracle bitmap index maximum distinct values“, from Feb 16 2010 (article itself updated January 8, 2010): 

It says “As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values.” 

It also mentions some so-called rules of thumb whereby: 

“1 – 7 distinct values – Queries against bitmap indexes with a low cardinality are very fast.

8-100 distinct values – As the number if distinct values increases, performance decreases proportionally.

100 – 10,000 distinct values – Over 100 distinct values, the bitmap indexes become huge and SQL performance drops off rapidly.

Over 10,000 distinct values – At this point, performance is ten times slower than an index with only 100 distinct values”

Now this of course is all generalised nonsense. Not only can a column with 10,000+ distinct values be perfect as a bitmap index but it can be considerably smaller than a bitmap index with only a handful of distinct values, even with columns of the same size and data type

A very simple example to demonstrate. First, I’m going to create a table with 1 million rows and have a CODE column that has 10,000 distinct values and a TYPE column with just 4 distinct values:

SQL> create table big_bowie (id number, code number, type number, name varchar2(100));
Table created.
SQL> declare
  2  i number;
  3  begin
  4  i:=0;
  5  for j in 1..10000 loop
  6     for k in 1..100 loop
  7      i:=i+1;
  8      insert into big_bowie values (i, j, mod(k,4)+1, 'The Rise And Fall Of Ziggy Stardust And The Spiders From Mars');
  9     end loop;
 10  end loop;
 11  commit;
 12  end;
 13  /
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BIG_BOWIE', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.

 
OK, I’m just going to create a standard B-Tree index on the TYPE column and see how large it is:

SQL> create index big_bowie_type_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_I          NORMAL                 4          2        1752

 

OK, so it’s a BLEVEL 2 index with 1752 leaf blocks. Let’s now compare it with an equivalent bitmap index. As the column only has 4 distinct values, it should be perfect as a bitmap index and much smaller than the B-Tree:


SQL> drop index big_bowie_type_i;
Index dropped.
SQL> create bitmap index big_bowie_type_bitmap_i on big_bowie(type) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_TYPE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_TYPE_BITMAP_I   BITMAP                 4          1          84

 

Indeed it is smaller. It’s now just 84 leaf blocks in size, down from the previous 1752 leaf blocks. The Blevel has even reduced to just 1.

OK, let’s attempt something really silly and outrageous. Let’s create a bitmap index on the CODE column, a column with 10,000 distinct values. I know, I’m crazy to even suggest such a thing as the resultant bitmap will simply be “HUGE” right ?

Let’s see.


SQL> create bitmap index big_bowie_code_bitmap_i on big_bowie(code) pctfree 0;
Index created.
SQL> select index_name, index_type, distinct_keys, blevel, leaf_blocks from dba_indexes where index_name = 'BIG_BOWIE_CODE_BITMAP_I';
  
INDEX_NAME                INDEX_TYPE DISTINCT_KEYS     BLEVEL LEAF_BLOCKS
------------------------- ---------- ------------- ---------- -----------
BIG_BOWIE_CODE_BITMAP_I   BITMAP             10000          1          52

 

Well, would you look at that. It’s not “huge” at all, it’s just a tiny 52 leaf blocks !! In fact, it’s actually smaller and just 62% the size of the TYPE bitmap index that only had 4 distinct values.

So a bitmap index with 10,000 distinct values is actually smaller, more compact and more efficient that a bitmap index with just 4 distinct values !!

Why so small ?

Because the size and efficiency of a bitmap index doesn’t just depend on the number of distinct values but a whole range of other factors as well, not least the size and the clustering of the data in the table. Clue: I inserted the data into my BIG_BOWIE table in a very careful manner. However, one does need to actually understand how bitmap indexes work and how they actually store data to appreciate and determine whether a column is suitable for a bitmap index.

In short, there is no limit to the number of distinct values by which a column is suitable for a bitmap index. You could have a column with 10s of millions of distinct values (in say a billion row table) and a bitmap index might be perfectly suitable and significantly smaller than an equivalent B-Tree index. This is because a B-Tree index needs to store each and every occurence of a column value (unless the index is compressed) as well as a corresponding rowid whereas a Bitmap index might only need to store each distinct column value once with just 2 corresponding rowids. The savings in space can be massive, even if there are relatively few repeated occurences of each distinct value on average.

The next time you unfortunately read that bitmap indexes should only be used with very “few” distinct values and would be “huge” otherwise, well hopefully you’ll appreciate that’s simply not correct.

About these ads

Comments»

1. mwidlake - February 18, 2010

Would a more useful set of “rules of thumb” for a bitmap index being more beneficial than a b-tree index not be based more on:
- The ratio of distinct values indexed to number of rows, not just the base cardinality (number of) distinct values. so 100 values across 1 million rows is more likely to be better supportd by a b-tree index than 100 values over 1000 records.
- How often new distinct values are introduced, as the impact of updating the bitmap can be considerable. You don’t want an OLTP-type insert to be slowed right down.
- how often the indexed value is likely to be used in conjuction with other values in sql filter clauses.
- The skew of the indexed values. (if you always want the handful of “in progress” out of the millions of “completed” orders, a bitmap would probably be beneficial. If it is the other way around, it would be a waste or time, as would a b-tree index.

I think I have missed a couple of considerations, but a rule of “10 values is good, 1000 values is bad” is no better than saying “10 miles to the gallon is bad, 100 is good”. Depends on if I am driving a motorbike or a 40 ton truck.

Jeff Moss - February 19, 2010

I agree with the principle that we should look at the ratio of NDV to Cardinality to indicate whether we use a Bitmap Index, or not.

Now, obviously, the question that comes next is “Well, what ratio signifies that you should use a Bitmap or a B Tree?” The answer is, of course, it depends!

If we assume we need an index at all and that we have to choose between a Bitmap or a B Tree, then, supposing we had one million rows in our table and one million different values, i.e. a ratio of exactly one, then you’re going to choose the B Tree index (in this instance it’s a UNIQUE one).

If, on the other hand, we have ten distinct values for our one million rows in the table, then you’re going to choose the Bitmap.

There are a host of ratio values where we need to make a decision – and there are no hard and fast rules…the best thing to do, is, if it’s not obvious like the extreme examples above, to test it in your environment.

A further aspect to think about, is that Bitmap indexes are seldom as effective when used in isolation, as they are when used with other bitmap indexes on the same table – it’s in combining them, that bitmap indexes become so powerful. You need to bear that in mind when doing your testing in your environment, since in isolation, a given candidate might just miss out on being a bitmap index, however, when taken in the context of your environment, with the queries that your users issue, it may, due to combinations with other bitmaps, become an acceptable candidate.

Richard Foote - February 19, 2010

Hi Martin

I agree with all your comments. Yes, it’s more a simple ratio of approx. number of repeated occurrences rather than simply distinct values combined with how well the bitmaps can be compressed. Yes, bitmap indexes and multi-user OLTP don’t mixed, yes bitmaps can be very efficiently combined although a bitmap index with high numbers of distinct values can on its own can potentially outperform an equivalent b-tree and yes data skewness can be a factor as well.

Nice analogy too ;)

2. Tony Sleight - February 18, 2010

Well done, a great example of myth busting. I have always resisted the investigation of bit map indexes just for these reasons. I think I shall have to re-open the case book for bit map indexes.

One question I have is would this example hold if a lot of updates and/or deletes were performed on rows in the table. Is this a case where bit map indexes would be in-appropriate, independent of the number of distinct vales?

Niall Litchfield - February 19, 2010

Hi Tony,

I suspect in the case that you probably have in mind (OLTP with a lot of activity “evenly spread” across database rows) then you are boradly correct. If however all of your update/delete activity on the tables that likely benefit from bitmap indexes comes in well timed batches then borrowing the classic DWH trick of dropping and recreating indexes at the beginning and end of the batch may well be appropriate for you. oops, got a potential use case for regularly rebuilding indexes in there.

I suspect the rule of thumb still holds but over the years Oracle has made bitmap indexes more friendly to DML activity so what would be unacceptable in 9.0.1 might be acceptable in 11.2.

Niall

Richard Foote - February 19, 2010

Hi Tony

If you have concurrent DML, then avoid due to likely locking issues. If it’s a single user (or DML process) system, then since 10g, bitmap index tend to keep their “shape” much better than with previous versions with DML. Then it’s a case of test and see.

3. Niall Litchfield - February 19, 2010

ah and I’ve just read the article. Even the snide remark about hippie states at the end is just wrong in as much as it suggests that gender is a volatile column, mutable doesn’t mean volatile.

Noons - February 19, 2010

you forgot the vapid blondes, no?

Richard Foote - February 19, 2010

Hi Niall

True, so perhaps it won’t cost those hippie states billions after all ;)

ankur jain - April 5, 2010

richard sir can u please let me know in short simple language dat bitmap n btree index selection based on distinct values or cardinality or bitmap combined with other bitmap index on same table. bit confusing coz there are somany saying about btree n bitmap all saying all for both type of index . isn’t there any standard talk about bitmap or btree selection or all depeends upon what??

mwidlake - February 20, 2010

Oh good grief! I did not see that statement at the bottom first time around, I stopped when the advert for his book popped up. Irrespective of his stance on gender, or hippies (I’ve never had a hippie be unpleasant to me) it indicates he has never worked on a clinical system. Last I knew the UK NHS had 9 standard gender codes and that was in 1995. Tsch.

4. Karsten Schmidt - February 19, 2010

I can only confirm what you say here.
In the past I built an bitmap index on a date column for 500M row table with only a few hundred duplicates per second. (hence millions of distinct values)

However, data arrived in strict order, and the table was partitioned, hence the resulting bitmap indexes were fairly small.

and yes query performance improved a lot with the bitmap on the almost-unique column, since the b*tree index that used to be on that column was converted to bitmap on the fly for most queries anyway. (rowid conversion to bitmap in the execution plan)

Richard Foote - February 19, 2010

Hi Karsten

Exactly !!

Data order is an important consideration as it can make a big difference to the size and efficiency of the bitmap index.

But having a bitmap index with millions of distinct values is not only possible but highly desirable in scenarios such as yours.

5. Tom - February 19, 2010

Could the reason this bitmap is small on the column with a high DV, because of how you dispersed the data?

You would have something like.

Value1 111111111111111000000000000000000000000…
Value2 000000000000000111111111111110000000000…
Value3 000000000000000000000000000001111111111…
etc…

And so Oracle is able to efficiently “compress” by not storing all those repeating values, but something more efficient that expresses it?

Where as, if you had something like this.

Value1 1100010010010001000100010001000100010001…
Value2 0010101001001000110010101010110010100110…
Value3 0001000100100110001001000100001001001000…

Then Oracle isn’t able to store it as “compressed” because the number of repeating bits in each value is smaller and then as the number of DV increases, so does the bitmap thereby making it somewhat inefficient?

Peter Scott - February 19, 2010

Its not really compression as such. The bitmap index is roughly speaking two rowids (for the beginning and end of the range of rows covered by the bitmap, the key value and the bitmap itself. There is no need for the bitmaps to cover exactly the same set of rows – so if all of the rows for the key are clustered together then the bitmap can be short as there is no point storing a load of leading or trailing zeros.

Tom - February 19, 2010

So it is essentially saying

Value1 ROWID of first matching row…through…ROWID of last matching row?

Does it store rowid for both matching and nonmatching sequences of rows that match or do not match that value?

Something like.

ROWIDmatches 11111111111111 ROWIDmatches – ROWID not matches 000000000000 ROWID not matches . ROWID matches 11111111111111 ROWIDmatches etc…

Tom - February 19, 2010

I wasn’t finished.

So instead of storing all those one’s and zero’s it stores some sort of ROWID offset to tell it how many trailing 1′s and 0′s there really are?

Richard Foote - February 19, 2010

Hi Tom

Exactly !!

Which means to “compress” the 10,000 distinct column index, it’s very easy and efficient with so many running 0s (false) but the 4 distinct index with the 1s (trues) scattered everywhere, it’s extremely inefficient to compress.

Hence why the clustering of data in the table can make such a huge difference.

6. Tom - February 19, 2010

oops meant bitmap index :)

Richard Foote - February 19, 2010

Hi Tom

No, it’s stores logical 1s and 0s but a series of consecutive 0000000s followed by a 1 can be represented by a single (smaller) value and effectively compressed.

I highly recommend Julian Dykes presentation on bitmap indexes where he explains all this so well:

http://www.juliandyke.com/Presentations/BitmapIndexInternals.ppt

7. Peter Scott - February 19, 2010

I recall that excellent series on bitmap indexes written by Jonathan Lewis and published on DBAZINE.COM… must be at least 4 years old now. Very clear definition of what is being indexed and how – and a clear example of sizing.

It’s sad that the myths that came out back in 9i days are still going strong.

Niall – I think that 11g is much better about locking a big chunk of table when updating bitmap indexes than 9 ever was but it is still not perfect – and the cost in time of setting an index as unusable, do the dml and rebuild it can still be less than doing the update with the index in place… as always benchmark it and see – and of course with bitmap indexes on partitioned tables it is quite possible to restrict rebuilding to a single index partition.

Niall Litchfield - February 19, 2010

oh absolutely test, and especially test over a decent, simulated, period of time to see what will happen. The word *might* was definitely not there by accident!

Jonathan Lewis - February 19, 2010

Peter,

This isn’t “myths appearing in 9i”, this is “myths that I (thought I’d) busted when I published Practical Oracle 8i at the end of the last century” – and that included a demonstration similar to the one Richard has published here.

Regards
Jonathan Lewis

Richard Foote - February 19, 2010

Hi Jonathan

Unfortunately, all these many years later, some people still don’t get it and I strongly suspect never will :(

One reason why your first book is still recommended reading IMHO.

Richard Foote - February 19, 2010

Hi Peter

It’s extremely sad and somewhat incredible that such nonsense is still being written in 2010 AND passed off as if it’s some kind of current news item :(

Locking is still a big issue with 11g, in some cases even worse so with the manner that the rowid ranges can be changed within an index entry. Still not suitable for concurrent OLTP.

Peter Scott - February 20, 2010

Glad I don’t do OLTP then :-)

8. Tom - February 19, 2010
9. My Tuning Mantras « Confessions of an Oracle nOOb - February 19, 2010

[...] and why such rule of thumbs are nonsense, please see the update by Richard Foote at his blog today. http://richardfoote.wordpress.com/2010/02/18/myth-bitmap-indexes-with-high-distinct-columns-blow-ou… Possibly related posts: (automatically generated)Relevance of Time on Task to SPEMicro calendar [...]

10. Gary - February 19, 2010

Adding to the complexity, if your table has columns “colour” and “size”, you have the choice between two single column indexes (btree or bitmap) and/or an index on both “colour” and “size” columns.

If you query on a particular combination of those columns, a multi-column index may perform better than juggling two bitmap indexes. If you might be interested in three sizes in two colours, then the bitmap indexes might be better.

Richard Foote - February 19, 2010

Hi Gary

It’s a balance of flexibilty vs. absolute efficiency and getting the balance right depends on so many factors that testing and seeing for oneself in the specific application is usually the best strategy.

11. baskar.loganathan - February 19, 2010

Very good demonstration sir

12. Richard Foote - February 19, 2010

Hi Baskar

Thank you :)

13. Houri Mohamed - February 20, 2010

Dear Richard

Thanks very much for these precious information about bitmap indexes. I am also still waiting for another one
that you could title it as “The danger of bitmap indexes”

In my actual project which is a multi-user concurent OLTP application(10.2 release) subject to several Insert/delete/update, there is a standard that when the number of distinct value of a index candidate column is small then use a bitmap indexe.

Of course this standard has been followed by developers and it produced several deadlocks in Production. Those deadlocks
disapeared immediately when bitmap indexes have been converted to b-tree indexes

Another thing I want to emphasize is that when you “b-tree” index a foreign key you will avoid a lock of the child table when you delete/update/merge the parent table. However, this is not the case when you “bitmap” index a foreign key.

Please correct me if I am wrong but I believe that we should not discuss about the benefit of bitmap indexes versus
b-tree ones in an OTLP mutli-user concurent application at all?

Or am I wrong?

Regards

Mohamed Houri

Richard Foote - February 25, 2010

Hi Houri

You’re right, I totally agree with you.

The discussion of bitmap vs b-tree is really only a non-issue on concurrent update applications/tables, which rules bitmaps out for most OTLP type applications.

14. Andy - February 23, 2010

Richard,

Have you sent an invoice to Mr. B? ;)

Richard Foote - February 25, 2010

Hi Andy

LOL !!

Unfortunately, with this classic piece of writing, it’s so totally and completely wrong that no amount of editing can save it.

Two choices really, delete the awful thing or just ignore everything and just pretend it really is correct …

15. Log Buffer #180: a Carnival of the Vanities for DBAs | The Pythian Blog - February 27, 2010

[...] Richard Foote knocks holes in another myth: “One of the great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns …” [...]

16. Norman - February 28, 2010

@Martin

“Last I knew the UK NHS had 9 standard gender codes and that was in 1995.”

In my country we have only two genders, female and male.

What are the other 7 genders called?

Martin Widlake - February 28, 2010

Hi Norman,

Male.
Female.
Unknown (not checked, which happens when for example someone comes to A&E and will not confirm genda)
Indeterminate (checked but not sure. This has almost dissapeared due to genetic testing, but a suprising number of babies could not be accurately sexed by external genitalia as many male new-borns still have the genila tucked up into the body cavity).
Male once Female (re-assignment)
female once male (re-assignment)
male been both (re-assignment twice)
female been both (re-assignment twice)
hermaphrodite. (yes, have some of both male and female physical genital attributes, again more common than maybe you realise, probably due to phobic responsed from “the general public”).

This list is from the early 90′s and before simple chromosmal tests were routinely carried out, which did NOT simplify matter as the true gentic basis of sex is more complex and not always in direct relation to physical sexual genitalia. You could be married to a lady who is physical and (as far as one can determine such things) empathetically female and she could have the genetics of XY (genetically male). Similarly, you could have a penis and testicals and find you are genetically XX, female. Now we can expand to people with multiple chromosomal duplicity who are XXY, XXX, XYY, YY {hmm, not sure about YY, as Y chromosome, the “Male” chromosome, is effectively a truncated X chromosome and so lacking any Y genes is a bit of a problem, ie we might all have to be 50% female to survive, but I kind of remember YY can just about work biologically} etc, etc, etc.

Before you question the above Norman, I have a UK degree in Genetics and many, many years in healthcare IT. Looking at your other comments, I think the problem is your attitude, not the rest of the world. The world is a very complex and interesting place. I doubt very much that your “country” has, from a purely biogenetic stance, just the two genders, as they would gave to be pre-mendelian or just terribly ignorant to hold that view.

Do you want me to discuss the impact of sex hormone balance within the gestation period of fetuses and external genetalia and psychological outlook or would you prefer to go and check some scientific journals first? I await your respons with…interest.

Richard Foote - February 28, 2010

Hi Martin

I think you might just have confused poor Norman ;)

Martin Widlake - February 28, 2010

Oh, and “Norman”, who the hell are you? DKB by any chance?

There is something you need to grasp here. Anything you read in the net is just hearsay. Unless it is backed by some test scripts that you or I can run and check the results. Now, DKB is pretty bloody poor at putting up such test scripts, Richard is pretty bloody good at putting them up so YOU can download them, take them apart, try them. If you have commented without doing some tests that you can offer up, shame on you.

I for one am going to believe someone who shows their tests and worked examples. DKB? No tests, no scripts, no kudos.

Richard Foote - February 28, 2010

Hi Martin

Well said :)

17. Norman - February 28, 2010

@Niall,

“mutable doesn’t mean volatile”

Yes it does.

According to my dictionary, “mutable” and “volatile” are close synonyms.

Niall Litchfield - March 1, 2010

Hi Norman,

I was using those words in the sense below (both american heritage dictionary)
Volatile = “Tending to vary often or widely, as in price”. The example given being of stock prices.

Mutable = “Capable of or subject to change or alteration.”

In other words a volatile column is one that changes often – think of the status flag for an order, this goes through a predicatable lifecycle and nearly all rows will get updated more than once. A mutable column is merely one that is not unchanging. In the case of gender, at least for Health care applications, gender is not unchanging, but it is rarely changing. The occasional update to a very few rows would be unlikely to be a major concern for bitmap indexes.

Incidentally I see in your other comments that you are asking for a repeatable test case. There is one in the article you know, have you run it?

18. Norman - February 28, 2010

@Richard,

“great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns that have so-called low cardinality (few distinct) values”

As an Oracle Ace Director, you should not say such things that are dangerous myths themselves.

You represent Oracle Corporation, and the Oracle documentation clearly says that this is not a myth.

Are you saying that the Oracle documentation and MOS notes are incorrect?

“I inserted the data into my BIG_BOWIE table in a very careful manner.”

Can you please explain exactly how you have rigged this table to achieve this result?

Since you are using this as evidence to bash DKB’s reputation, you should at least make a reproducible test case.

You risk being guilty of doing the same thing that you accuse DKB of. You take a single test case, carefully made to show your point and then falsely over generalize from it.

“The next time you unfortunately read that bitmap indexes should only be used with very “few” distinct values and would be “huge” otherwise, well hopefully you’ll appreciate that’s simply not correct.”

No, they are correct.

These bitmap guidelines are exactly those taught my Oracle University andf in the Oracle documentation.

If you think this is true, why are you attacking DKB and not the Oracle guidelines on bitmap indexing?

Everyone knows that there are obscure exceptions to any rule, and you yourself demonstrate by the craftsmanship required to make a counter-case.

Don’t let your jealousy of DKB is make you look pathetic and
Crafting an obscure test case to discredit someone that you are jealous of is not becoming of you.

“Please correct me if I am wrong but I believe that we should not discuss about the benefit of bitmap indexes versus b-tree ones in an OTLP mutli-user concurent application at all?”

See there, we see evidence that you are deceiving or at least misleading people, making myths not finding them.

Richard Foote - February 28, 2010

Hi Norman

I believe I was nominated and accepted as an Oracle ACE Director precisely because of what I say and write …

From the doco if you’re interested (Database Data Warehousing Guide, both 10g and 11g):

“For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can outperform a B-tree index, particularly when this column is often queried in conjunction with other indexed columns. In fact, in a typical data warehouse environments, a bitmap index can be considered for any non-unique column.”.

This recommendation by Oracle therefore suggests my simple little demo is not quite such an obscure exception after all …

Tell you what Norman, how about you copy my test case and see the results for yourself. Hey, perhaps you might even learn why the 10,000 distinct column is so efficient. Clue: The answer is explained right here.

Tell you what. How about you take any table with a column with 10,000 distinct values in a 1million+ row table, create a bitmap index and compare it with an equivalent b-tree index.

Ummm, perhaps those 10,000 distinct columns are not so bad after all when indexed via a bitmap index …

Perhaps you might want to read the comment 4 by Karsten above where they use bitmap indexes on columns with millions of distinct columns.

If you want to believe the nonsense in Burleson’s article, go knock yourself out. But let me repeat: There is no limit to the number of distinct values by which a column is suitable for a bitmap index. One can only lead a horse to water …

19. Andy - February 28, 2010

This is easy to resolve.

1) DKB can provide test cases to prove that:

“1 – 7 distinct values – Queries against bitmap indexes with a low cardinality are very fast.

8-100 distinct values – As the number if distinct values increases, performance decreases proportionally.

100 – 10,000 distinct values – Over 100 distinct values, the bitmap indexes become huge and SQL performance drops off rapidly.

Over 10,000 distinct values – At this point, performance is ten times slower than an index with only 100 distinct values”

When this is done, we all have Richard’s test cases and DKB test cases to dive into , and decide for ourselves. Fair enough? And please, test cases do not include empty claims such as ‘My 80 years of experience says so”. DKB can rig the test cases all he wants, as long as the test cases results are reproducible.

Richard Foote - February 28, 2010

Hi Andy

Wouldn’t something that proves or highlights these claims be a pleasant surprise.

Then we can perhaps say “umm yes, the reason why the 10,000 distinct column example was so bad was because the table itself only had 10,000 rows” ;)

Like I said to Norman, create a column with 10,000 distinct values in a table with a few millions rows, that’s any table, and see the results for yourself …

Andy - February 28, 2010

Richard,

According to Norman, you are jealous of DKB, and your test cases are obscure …

Obscure … hmm … what is more obscure than a standard reply to all technical question/challenge with
1) “It is not allowed by non-disclosure agreement to demonstrate”
2) “I don’t have to demo. I have 130 years of experience”
3) “My customers are from Fortune 5000000″

I wonder if I can use #3 to retort all technical questions thrown at me, since my employer is at the tip of the Fortune 5000,000,000 list…. can I ? Can I? Can I?

20. Richard Foote - February 28, 2010

Hi Andy

Perhaps “Norman” needs to become more familiar with Burleson’s work ;)

Maybe if he read a few more Burleson classics such as this one on “Tuning SQL to invoke nested loops joins”, a news item from Jan 5 2010 (article dated Sept 4 2009):

http://www.dba-oracle.com/t_tuning_sql_nested_loops_joins.htm

which includes this example of a nested loop join:

select /*+ use_nl_with_index */
sales_stuff
from
sales
where
state = ‘idaho’;

Now that’s brilliant, a nested loop join involving the one table :)

I’m sure “Norman” would be impressed …

21. Andy - February 28, 2010

Wow… I’m not sure about Norman, but I am definitely impressed!!!!!!!

I’m going to put that code into my terrabytes datawarehouse tomorrow …

22. Norman - February 28, 2010

@Richard

“is no limit to the number of distinct values by which a column is suitable for a bitmap index.”

Not according to the very same Oracle documentation you cited:

“In a data warehouse, B-tree indexes should only be used for unique columns or other columns with very high cardinalities (that is, columns that are almost unique).”

It also appears that your quote is from an obsolete copy of the Oracle documentation, and this was removed in the 11g documentation.

The 11g documentation agrees exactly with DKB’s web page:

“You typically want to use bitmap indexes on low degree of cardinality columns and B-tree indexes on high degree of cardinality columns.

As a general rule, a cardinality of under 1% makes a good candidate for a bitmap index.”

All I asked is why you attack DKB when the Oracle documentation says the same thing, and you confirm my suspicions by hurling a barrage of unprofessional insults.

I thought that Oracle Ace Directors were official representatives of Oracle Corporation.

I was hoping to come here, ask questions and learn, and I am not accustomed to be treated in such an unprofessional and crude manner, especially from somebody who represents Oracle.

The only thing I have learned is that Oracle Ace’s are rude, unprofessional and abuse their credentials to bash their enemies.

If you were truly motivated by a quest for the truth, you would be citing the “mistakes” in the Oracle docs, not DKB’s web pages.

@Martin

“Before you question the above Norman, I have a UK degree in Genetics and many, many years in healthcare IT.”

Why would I question you? I asked you a polite question, and I was rude to you in no way.

In that same document Richard cites, the Oracle documentation clearly says that there are only two genders:

“A gender column, which only has two distinct values (male and female), is ideal for a bitmap index.”

Why do you insult me?

Andy - February 28, 2010

Norman,

Personally, I think Martin’s response to you was a bit overboard. But that’s probably because he thinks you are DKB ;) But if you are not DKB, then there is nothing to worry about.

Are you?

Richard Foote - March 1, 2010

Now now Norman, we both know my documentation quote is not obsolete and is still in the 11g Rel 2 doco as it directly precedes your little quotation. Link FYI:

http://download.oracle.com/docs/cd/E11882_01/server.112/e10810/indexes.htm#i1004902

with the full quote being:

“For example, on a table with one million rows, a column with 10,000 distinct values is a candidate for a bitmap index. A bitmap index on this column can outperform a B-tree index, particularly when this column is often queried in conjunction with other indexed columns. In fact, in a typical data warehouse environments, a bitmap index can be considered for any non-unique column.

B-tree indexes are most effective for high-cardinality data: that is, for data with many possible values, such as customer_name or phone_number. In a data warehouse, B-tree indexes should be used only for unique columns or other columns with very high cardinalities (that is, columns that are almost unique). The majority of indexes in a data warehouse should be bitmap indexes.”

Oh dear, the full and complete quote rather conflicts with your point of view and that of the Burleson article doesn’t it ?

“You typically want to use bitmap indexes on low degree of cardinality columns and B-tree indexes on high degree of cardinality columns. As a general rule, a cardinality of under 1% makes a good candidate for a bitmap index.”

Thank-you for finding this quote in the doco. Although it’s poorly written, it does however backup everything I’ve been saying because if you were to actually look at my demo, the 10,000 distinct values column does indeed represent a cardinality (or selectivity) of approximately 1%. If you have a large 100 million row data warehouse table, you can have an approx. 1 million distinct value column and still have a cardinality returned of 1% of data.

That’s the whole point, a column with millions of distinct values is potentially a perfect candidate for a bitmap index and is why the Burleson article is such complete nonsense. As Jonathan noted, this myth was dispelled 10+ years ago …

BTW, I hope you don’t depend on the Oracle doco too much. You do of course appreciate that like the software itself, it’s not perfect and can have errors and inaccuracies. A good DBA uses documentation as a resource and as a guide but should never blindly depend on it. That’s what experience and a proper and practical understanding of the necessary concepts is all about.

I can show you the difference between fact and fiction. Whether you actually learn however is beyond my control and is entirely up to you I’m afraid.

One final point. Your constant snide references to the Oracle ACE Director program, your desperation to find quotes in the doco to support your claims and your subsequent inability to do so by only referencing quotes that counter your argument, your inability to see and appreciate a reproducible test that’s sitting right in front of you, your inability to produce any test cases of your own to backup your claims, your phrases such as “hurling a barrage of unprofessional insults”, “Oracle Ace’s are rude, unprofessional and abuse their credentials”, “Everyone knows that there are obscure exceptions to any rule” , “Don’t let your jealousy of DKB is make you look pathetic”, etc. etc. all remind me of somebody but I just can’t put my finger on who it might be ???

Norman, if you look like a duck, if you quack like a duck and if you waddle like a duck, well you know what they say … ;)

Andy - March 1, 2010

While the topic is still sizzling hot ;) …. there is one section of the Oracle Datawarehouse ( forgot the part number though) that is full of error. The same errors can be seen in 10gR2 and 11gR1docs. The SQL statements don’t work ( with syntax error on the date format).

Who says official doc is guaranteed to be 100% correct? ;)

Niall Litchfield - March 1, 2010

Hi Andy,

The docs definitely can have bugs filed against them, and I have done exactly that in the past (the entire database is limited to 121 extents :( ) . Unfortunately the docs tend not to get fixed until the next version. Since the release of 11 you can comment on the docs online, I’m not sure if the comments get incorporated into the online docs, though.

The documentation process appears to work as follows

1. copy the previous versions docs
2. update the new features docs with a development list of NF
3. update the obvious sections that relate to NF
4. remove any old features
5. optionally remove documentation that revealed internal operations (eg compare the streams documentation through the releases)

Andy - March 1, 2010

Hi Niall,

I cited that ‘bad doc’ example for Norman actually, since he was so hung up on ‘The doc says so , but you say something else’ in response to Richard.

Thanks for pointing out the doc process. I did make an online comment after reading the entire chapter. A few months later, I received an email from the doc owner. I could only quote a few examples from the doc to him.

Richard Foote - March 2, 2010

Hi Niall

I think the key point here is that they clearly don’t rewrite the doco from scratch, that would just be a waste of time, they indeed copy the previous version and modify as they see fit.

Which of course lends itself to having various errors and mistakes that remain undetected for quite sometime as they pass from one doco release to another or having things that have indeed changed not being reflected in the new doco as it’s simply been missed or forgotten in the update process.

The doco is great IMHO, but certainly not perfect and just becuase it says it’s so, doesn’t necessarily mean that it’s always so.

Andy - March 15, 2010

Hmm…. that very ‘technically accurate and entertaining’ article by Mr B has been … updated, as predicted ;)

Thank god I have a copy of the ‘original acoustic version’ :p One fine day I will post all of them in a blog ;)

Richard Foote - March 15, 2010

Hi Andy

I have a wonderfully funny collection of such original Burleson articles ;)

mwidlake - March 1, 2010

Actually Norman, you are right, I was a little high-handed and I apologise for being so.
I think I reacted so because of the way you came into the discussion thread – challenged three people in quick succession, all mildy confrontational, except the one aimed at Richard, which read to me as being very confrontational towards the end and patronising.
However, I need to remind myself that communication by the written medium is not perfect and people can come across not as they intended. And even if it does turn out that the person is in fact baiting people, there is no real excuse for responding in a harsh manner, so I apologise for my tone.
However, your later posts give me the strong impression that you only joined in the thread to support DKB on a personal level and to try and score points. You constantly refuse to accept that the original point was well made and ,demonstrated.

To keep everyone happy, may I suggest you keep your web browsing for Oracle Knowledge to DKBs output in future? – it is considerable in volume and you seem very happy with it, so I’d stick with what you know and like.

23. Norman - February 28, 2010

@Andy,

“I wonder if I can use #3 to retort all technical questions thrown at me”

Before today I was no DKB fan, but after this I see I’ve been listening to the wrong people.

I now suspect that this whole thread was nothing more than a personal attack.

All I did was ask for a reproducable test case, I’m sorry I asked.

It appears that I am underqualified to ask questions of a great Oracle Ace Director.

24. Andy - February 28, 2010

Norman,

Nobody says you are under-qualified. I’m not a member of Richard’s fan club either. If he is wrong, I would not hesitate to blast him.

Richard has clearly done his work by giving his test cases. Why don’t you ask DKB do to the same to support his claim? Then we all have a balanced platform for discussion.

If I can see solid evidence from DKB’s test cases, I would then say”Richard, you should apologize to DKB!”

Fair enough?

Richard Foote - March 2, 2010

Hi Andy

Damn, I thought you were the other member of the Richard Foote fan club ;)

Andy - March 2, 2010

shhh ………. I can’t publicly announce my membership …. :p

25. Andy - March 2, 2010

The thing about documentation is, it is usually *not* written by the developers. That puts a lot of pressure on the technical writers, as they must *know* everything without actually *knowing* everything … :)

By the way, is ‘doco’ a common short form? I’ve never seen it before ….

26. Uwe Hesse - March 3, 2010

I would like to express my opinion that the Oracle Database Online Documentation is usually very well written and outperforms the very most of other available information – especially the majority of books purchasable. Nobody is perfect, of course, so mistakes in the documentation are possible. But that is a pretty rare exception – it can generally be trusted.
A sensible person will of course always try to develop test cases and will not trust anybody blindley before taking anything critical into production. Personally, I am usually satisfied if someone cites the Online Documentation to backup a claim – but I would check it myself if it contraticts my claims or expectations :-)

Richard Foote - March 3, 2010

Hi Uwe

I would totally agree with your comments. On the whole, the doco is excellent and well written. In fact, looking at the doco quotes associated with this discussion, they validate everthing I’ve been saying and clearly state that a bitmap index can be used potentially for any column that doesn’t approach too close to being unique.

That said, the point I make is that it’s not 100% perfect and noone can really expect it to be. If something doesn’t sound right based on your experiences, there’s always the possibilty it might be wrong or that it doesn’t consider a specific case.

If someone shows me a repeatable test case, I would believe it over a quote from a manual.

mwidlake - March 3, 2010

The manuals are usually my first port of call when I want to check something. They may be dry and there is the very occasional error, but I find what they say to be very reliable. However, there is often a lot they do not say, which would be my complaint about the manuals (ie a lack of good examples or mentioning drawbacks/issues).
I go to the web to get the examples and potential issues.

Knowing what the manuals say helps me spot when something on the web is probably wrong though. If some of what the pages says goes against the manuals, it is probably old or myth. It’s like the manulas give me a reliable core of information and the web fleshes it out.

27. John Wu - March 10, 2010

This research article

http://doi.acm.org/10.1145/1132863.1132864

has an in-depth analysis on the compressed size of bitmap indexes. A conclusion is that the compressed index size does increase with the number of distinct value, however the maximum size is proportional to the number of rows indexed (unrelated to the number of distinct values).

Richard Foote - March 15, 2010

Hi John

Then that would be an incomplete conclusion with regard to Oracle bitmap indexes because as I’ve demostrated, depending on the efficiency of the compression, which in turn depends on things such as the clustering factor and the possible number of rows per block, an index with far more distinct values can potentially be significantly smaller than an index with far fewer distinct values.

28. Ted - March 12, 2010

Just a word to note that Mr Burleson updated his article on March 8 ;)

Seems like he’s learning (only?) by writing incorrect things on his site and waiting for others to correct him…

Richard Foote - March 15, 2010

Hi Ted

Yes I’ve noticed.

Unfortunately, in its attempt to cling to some credability after the first dismal attempt, it continues to make basic errors with regard to saying things are “rare” when they’re not and by getting totally wrong how bitmap compression is implemented.

It doesn’t now include the silly and completely wrong “rules of thumb”, but replaces them with a series of other errors.

As there appears to be some interest in how bitmap indexes actually work and can be implemented, I might write a few more posts that might help the spoonfeeding process ;)

29. Andy - March 15, 2010

To Mr B ( You know who you are, and we all know you are reading this blog religiously),

Since you have been leeching from Richard’s technical writeup, you should probably acknowledging his contribution by giving him some credits in your blog?

Richard Foote - March 15, 2010

Hi Andy

Interestingly, while I was in the US, I received an email from Burleson Consulting (but not from Don himself) thanking me for my corrections on an article and after being checked by some “Oracle ACEs” have made changes. I was also given a $100 amazon voucher for my trouble !!

Unfortunately, the bitmap article (the actual article wasn’t mentioned in the email) is still full of errors, assuming it’s the one they’re referring to.

However, why someone would write an article on a subject matter they clearly know little about and requires some Oracle ACEs to confirm all the errors, while labeling it “latest Oracle news”, is totally and completely beyond me ??

To be honest, I really don’t want to be connected to the article in any way, in case people mistakenly think I had something to do with writing the thing :)

Andy - March 15, 2010

Ah…. you are right! Nothing is worse than a bad association ;)

But I would still think you should start billing him big time! $100 of voucher? Geez… I hope you are not exchanging that for his ‘Performance Tuning Guide’ …. :p

30. Blogroll Report 12/02/2009 – 19/02/2010 « Coskan’s Approach to Oracle - March 19, 2010

[...] 22-What is/isn’t the main rule to build bitmap index? Riachar Foote-Myth: Bitmap Indexes With High Distinct Columns (Blow Out) [...]

31. ankur jain - April 5, 2010

pls let me know about bitmap n btree selection all points need to take care . which is better in which condition i need to keep in mind na.

32. Richard Foote - April 13, 2010

@Ankur

I think my next blog entry which I hope to write tonight might answer your question (if I’ve understood it correctly).

33. Richard Foote - May 6, 2010

Oh good grief !!

Someone highlighted another Burleson shocker with regards to Bitmap indexes:

http://www.dba-oracle.com/oracle_tips_bitmapped_indexes.htm

That makes all the same old mistakes and promotes all the same old old old myths regarding Bitmap indexes as in the article I referenced in this blog piece. Such as:

“100 – 10,000 distinct key values – Over 100 distinct values, the bitmap indexes become huge and SQL performance drops off rapidly. Over 10,000 distinct key values – At this point, performance is ten times slower than an index with only 100 distinct values.”

“You will want a bitmap index when:

1 – Table column is low cardinality – As a ROUGH guide, consider a bitmap for any index with less than 100 distinct values”.

“As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values.”

Hopefully, my discussions on Bitmap indexes have helped to explain why this is all such total nonsense …

34. Btree / Bitmap « Oracle Scratchpad - June 29, 2010

[...] Feb 2010: Richard Foote has recently published a lengthy blog item about bitmap indexes that contains some useful information and [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,901 other followers

%d bloggers like this: