jump to navigation

It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ? February 13, 2008

Posted by Richard Foote in Concatenated Indexes, Index Internals, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning.
trackback

A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.

For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).

The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.

Or so the theory goes …

In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.

The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.

The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.

This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.

So the high cardinality column shouldn’t necessarily be the column given leading column status.

In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.

Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.

All good discussions for another day 🙂

Comments»

1. Vyacheslav Rasskazov - February 14, 2008

But picture changes if you want find by ID column only (SELECT * FROM ziggy_stuff WHERE id = 4242). Index by (ID, CODE) produced the same 6 CRs, but 22 CRs when index build by (CODE, ID), due Index Skip Scan. Therefore, in many cases preferable to place high cardinality columns in front.

Like

2. Richard Foote - February 14, 2008

Hi Vyacheslav

Be very careful that you don’t confuse one issue with another.

As I clearly stated, “Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know“.

I also clearly stated “Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration

The issue you’ve raised has nothing really to do with the cardinality of the column and everything to do with the fact in the “bad” performing example, the leading column is unknown and Oracle is forced into the relatively inefficient Index Skip Scan.

If the leading column is unknown, the index will be either inefficient at best or totally useless at worst, regardless of the cardinality of the columns.

In many (if not all cases) it’s preferrable to place “columns with a known value” in front …

Like

3. Vyacheslav Rasskazov - February 14, 2008

Hi Richard
Thank you for clear explanation. I really confused two different things.

Like

4. Richard Foote - February 14, 2008

Hi Vyacheslav

No worries, I confuse things all the time as well 😉

Glad it made sense.

Like

5. praveen - February 28, 2008

I will make a note of it and will see the performance difference when dealing with concatenated indexes.

Thank you

Like

6. Ketan Bakrania - March 4, 2008

Richard, why would you index a column with a low cardinality? Will it give any performance improvement. Surely just better to FTS this bit.

Like

7. Richard Foote - March 5, 2008

Hi Ketan

Not if by indexing the low cardinality column, it avoids the I/Os associated with visiting the table.

Not if the low cardinality column has some, rare, values that we may wish to also reference.

Not if the low cardinality column is required for unqiueness.

Etc.

Also, how does one just FTS “a bit” ?

There are various reasons why one might want to index a low cardinality column or include a low cardinality column in the index.

Like

8. Charles Herranz - March 21, 2008

Hello Richard

I carried some tests after seeing your blog and after suffering some composite indexes problems.

I changed your test a little bit, instead of 2 column composute I have made it 3 where the first two columns are not selective.

create table t1
tablespace noassm
as
select ‘TTT’ qname, mod(rownum,2) code, mod(rownum,500000) id, ‘ORA’ DESCRIPTION FROM dual CONNECT BY LEVEL <= 10000;

create index t1_i on t1(qname, code, id)
tablespace noassm;

update t1 set code = 3 where rownum null, tabname=>’T1′, cascade=> true, estimate_percent=> null, method_opt=> ‘FOR ALL COLUMNS SIZE 1’);

After this dump the branck block and tree dump and noticed this:

The block dump shows 31 branch rows
Tree dump shows 31 leaves

Strange isnt it? Because from your test where it says:

row#0[8047] dba: 75532512=0x48088e0 ==> pointer to the second intermediate branch block

in my test it does not point to any branch block, instead it points to a leaf. So it seems with my data there is only a branch in the index…?

Like

9. Richard Foote - March 21, 2008

Hi Charles

Your table only has 10000 rows where in my example I had 1million.

Therefore, you only have 10000 index entries which only requires 31 leaf blocks which can all be referenced by just the one branch block (which therefore happens to be the root block).

If you look in dba_indexes, you’ll find the index will only have a BLEVEL = 1, meaning the index only consists of the root block and the 31 leaf blocks. There are no intermediate branch blocks in your index.

Like

10. Tom - October 30, 2009

Hello Richard,

What about updates? for example I have a composite index with 3 columns in a 50M table. ucol1+ccol2+scol3
ucol1=unchanged column, primary key
ccol2= changed column that changes by batch 300k to 500k rows
scol3= sometimes changes, 1 row at a time by OLTP

most of the queries are using col1+col2+col3, Iam trying to tune the batch update and am thinking that we should move col2 as the last column in the index. ccol2 is a status col so goes from created to executed to completed, 90% of the rows will be completed.

Do you think moving col2 as last in the order will help the batch updates? not 100% sure as i think it will always delete/insert index block rather than update…what do you think, will it help batch updates? if so why? if not why? Thank you

Like

11. Richard Foote - October 30, 2009

Hi Tom

I’m just about to pop on a plane so sorry for the short response but I would suggest it wouldn’t make much different either way as in both scenarios, the entire index entry is having to be deleted and re-inserted elsewhere with the index structure (even if it’s logically within the same leaf block).

It’s one of those things you could always benchmark although the option of dropping / marking unusable the index and then rebuilding after the batch process, may improve the batch process by enough to justify the additional overheads of such a move.

Like

12. Richard Foote - February 23, 2011

Just when I thought these myths are slowly disappearing, Don Burleson said recently (21 Feb 2011) on his forum:

http://dbaforums.org/oracle/index.php?showtopic=20576&pid=61513&mode=threaded&start=#entry61513:

“When using a multi-column index, you want to put the most restrictive column value first (the column with the highest unique values) because this will trim-down the result set”.

Oh dear 😦

Like

Charles Hooper - February 24, 2011

Richard,

I think that you will find several index myths repeated in Mr. Burleson’s latest Definite Reference book. Could you devote a couple of articles to address those myths – which would be helpful information in the event that someone searches for supporting evidence for what is printed as fact in the book? It appears that much of the coverage of indexes in the book starts at page 906:
http://books.google.com/books?id=hiOhVSO-EFcC&pg=PA906

There is also a bit more starting at page 521.

For example:
On page 982, we find that a certain way for a DBA to get fired is to waste computing resources due to keeping an excessive number of indexes.
On page 985, we learn that a composite index cannot be used to police a primary key constraint.

Like

Richard Foote - February 24, 2011

Hi Charles

Blimey, that has got to be the funniest read I’ve had in a long time !! The number of errors and inconsistencies are just staggering, it reads like some kinda tragic comedy.

I think my favourite bit is where he recommends dropping a so-called duplicate index which could very well result in the crippling of the associated database. I love to know what a “unique foreign key” is 😉 I also love how he recommends on one hand caching all your indexes in a 32K cache and later says databases really should be as much as 60% indexes, meaning you should of course simply just cache 60% of your entire database but then says random I/Os waste memory and so store such tables in a 2K block whereas it’s perfectly OK to waste memory for the random I/Os for all your indexes !!

The bit where he has a pretty diagram listing all the factors that influences Oracle’s choice of index (such as raid level and striping) but forgets to mention index stats is a classic.

I actually dispel many of the myths listed in the book already in previous posts (large block size myth, bitmap low cardinality myth, etc. etc.) although perhaps it might be worth going back to basics with some of the elementary errors (such as how pctfree and freelists really work for indexes, PK and indexes, etc. etc.) for newbies who might read the book and know no better.

Based on what I’ve read, I give it a 1/2 star out of five, the 1/2 star only because of its comic value 😉

Like

Charles Hooper - February 25, 2011

Hi Richard,

Thank you for taking some time to examine the technical accuracy of the index related content in that book.

I agree that you have already addressed many of the false statements in previous blog articles and in Usenet posts dating at least as far back as 2003 (yes, some of those mistakes pointed out in Usenet posts from 2003/2004 were copied into the book). I assume that suggests that at least one person working with Oracle Database still believes those points to be true, and hopes to convince others of the same by printing the information in book form.

Your posts have a very fluid, logical, consistent, and easy to follow writing style. If you have the time, I think that it would be a beneficial to the Oracle community if you were to write a back to the basics series of blog articles.

Like

13. Richard Foote - February 25, 2011

Hi Charles

I can’t believe someone who had the correct explanation detailed to them years ago (such as how Oracle leaf blocks actually split and that no, it’s not possible for an index to have a greater depth in one part of the index from another part of the index) would still repeat this type of nonsense 😦

Page 727: “Hence an Oracle index may have four levels, but only in those areas of the index where the massive inserts have occured“.

And:

Note Oracle indexes will spawn to a fourth level only in areas of the index where a massive insert has occured, such that 99% of the index has three levels but the index is reported as having four levels“.

What rubbish and I explained why to him way way back in Feb 2003 when he made the same mistake:

http://www.orafaq.com/usenet/comp.databases.oracle.server/2003/02/01/0020.htm

And he still gets it wrong !!!

OK, you’ve convinced me a few basic “newbie” oriented posts might be worthwhile.

Like

14. BLKS_GETS_PER_ACCESS Index Rebuild Criteria ? (Twisted Logic) « Richard Foote’s Oracle Blog - April 20, 2011

[…] recent question on the database OTN forum and a previous request by Charles Hooper that I cover some basic indexing concepts for newbie’s who might be […]

Like

15. DV - May 11, 2012

Hi,

But according to Oracle itself, it’s better to put the column with the highest cardinality first:

http://docs.oracle.com/cd/B10500_01/server.920/a96533/data_acc.htm#2174

Ordering Keys for Composite Indexes

If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.

Like

Richard Foote - May 11, 2012

Hi DV

And of course when this was written back in 9i days, it was still wrong then 🙂

If you look at the more modern version of the Oracle manuals, this incorrect comment has now been removed:

http://docs.oracle.com/cd/E11882_01/server.112/e16638/data_acc.htm#i2773

Like

DV - May 12, 2012

You’re right, thanks!

Like

16. 组合索引中多个字段的顺序 | 数据库(database.riaos.com) - December 10, 2012

[…] 或者It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?  (轻功需好) […]

Like

17. Faktenbasierte Indexierung | - April 27, 2015

[…] Ein gängiger Mythos der Indexierung ist, dass man selektive Spalten an den Anfang des Index stellen sollte. Dies würde es erlauben, im Indexbaum schneller die relevanten Blöcke ausfindig zu machen. Meistens sind die Verbesserungen marginal. (Für eine ausführliche Diskussion durch den Indexspezialisten Richard Foote klicken Sie hier). […]

Like

18. Faktenbasierte Datenbank-Indexierung | Diso AG - July 17, 2016

[…] Ein gängiger Mythos der Indexierung ist, dass man selektive Spalten an den Anfang des Index stellen sollte. Dies würde es erlauben, im Indexbaum schneller die relevanten Blöcke ausfindig zu machen. Meistens sind die Verbesserungen marginal. (Für eine ausführliche Diskussion durch den Indexspezialisten Richard Foote klicken Sie hier). […]

Like

19. Faktenbasierte Indexierung | Diso AG - October 6, 2016

[…] Ein gängiger Mythos der Indexierung ist, dass man selektive Spalten an den Anfang des Index stellen sollte. Dies würde es erlauben, im Indexbaum schneller die relevanten Blöcke ausfindig zu machen. Meistens sind die Verbesserungen marginal. (Für eine ausführliche Diskussion durch den Indexspezialisten Richard Foote klicken Sie hier). […]

Like

20. Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) | Richard Foote's Oracle Blog - June 4, 2018

[…] about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of […]

Like

21. Oracle Database 19c Automatic Indexing: Default Index Column Order Part II (Future Legend) | Richard Foote's Oracle Blog - September 11, 2019

[…] then the ordering of columns within an index is of little consequence. I’ve discussed this point a number of times […]

Like


Leave a reply to Richard Foote Cancel reply