jump to navigation

It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ? February 13, 2008

Posted by Richard Foote in Concatenated Indexes, Index Internals, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning.
trackback

A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.

For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).

The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.

Or so the theory goes …

In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.

The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.

The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.

This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.

So the high cardinality column shouldn’t necessarily be the column given leading column status.

In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.

Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.

All good discussions for another day :)

Comments»

1. Vyacheslav Rasskazov - February 14, 2008

But picture changes if you want find by ID column only (SELECT * FROM ziggy_stuff WHERE id = 4242). Index by (ID, CODE) produced the same 6 CRs, but 22 CRs when index build by (CODE, ID), due Index Skip Scan. Therefore, in many cases preferable to place high cardinality columns in front.

2. Richard Foote - February 14, 2008

Hi Vyacheslav

Be very careful that you don’t confuse one issue with another.

As I clearly stated, “Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know“.

I also clearly stated “Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration

The issue you’ve raised has nothing really to do with the cardinality of the column and everything to do with the fact in the “bad” performing example, the leading column is unknown and Oracle is forced into the relatively inefficient Index Skip Scan.

If the leading column is unknown, the index will be either inefficient at best or totally useless at worst, regardless of the cardinality of the columns.

In many (if not all cases) it’s preferrable to place “columns with a known value” in front …

3. Vyacheslav Rasskazov - February 14, 2008

Hi Richard
Thank you for clear explanation. I really confused two different things.

4. Richard Foote - February 14, 2008

Hi Vyacheslav

No worries, I confuse things all the time as well ;)

Glad it made sense.

5. praveen - February 28, 2008

I will make a note of it and will see the performance difference when dealing with concatenated indexes.

Thank you

6. Ketan Bakrania - March 4, 2008

Richard, why would you index a column with a low cardinality? Will it give any performance improvement. Surely just better to FTS this bit.

7. Richard Foote - March 5, 2008

Hi Ketan

Not if by indexing the low cardinality column, it avoids the I/Os associated with visiting the table.

Not if the low cardinality column has some, rare, values that we may wish to also reference.

Not if the low cardinality column is required for unqiueness.

Etc.

Also, how does one just FTS “a bit” ?

There are various reasons why one might want to index a low cardinality column or include a low cardinality column in the index.

8. Charles Herranz - March 21, 2008

Hello Richard

I carried some tests after seeing your blog and after suffering some composite indexes problems.

I changed your test a little bit, instead of 2 column composute I have made it 3 where the first two columns are not selective.

create table t1
tablespace noassm
as
select ‘TTT’ qname, mod(rownum,2) code, mod(rownum,500000) id, ‘ORA’ DESCRIPTION FROM dual CONNECT BY LEVEL <= 10000;

create index t1_i on t1(qname, code, id)
tablespace noassm;

update t1 set code = 3 where rownum null, tabname=>’T1′, cascade=> true, estimate_percent=> null, method_opt=> ‘FOR ALL COLUMNS SIZE 1′);

After this dump the branck block and tree dump and noticed this:

The block dump shows 31 branch rows
Tree dump shows 31 leaves

Strange isnt it? Because from your test where it says:

row#0[8047] dba: 75532512=0×48088e0 ==> pointer to the second intermediate branch block

in my test it does not point to any branch block, instead it points to a leaf. So it seems with my data there is only a branch in the index…?

9. Richard Foote - March 21, 2008

Hi Charles

Your table only has 10000 rows where in my example I had 1million.

Therefore, you only have 10000 index entries which only requires 31 leaf blocks which can all be referenced by just the one branch block (which therefore happens to be the root block).

If you look in dba_indexes, you’ll find the index will only have a BLEVEL = 1, meaning the index only consists of the root block and the 31 leaf blocks. There are no intermediate branch blocks in your index.

10. Tom - October 30, 2009

Hello Richard,

What about updates? for example I have a composite index with 3 columns in a 50M table. ucol1+ccol2+scol3
ucol1=unchanged column, primary key
ccol2= changed column that changes by batch 300k to 500k rows
scol3= sometimes changes, 1 row at a time by OLTP

most of the queries are using col1+col2+col3, Iam trying to tune the batch update and am thinking that we should move col2 as the last column in the index. ccol2 is a status col so goes from created to executed to completed, 90% of the rows will be completed.

Do you think moving col2 as last in the order will help the batch updates? not 100% sure as i think it will always delete/insert index block rather than update…what do you think, will it help batch updates? if so why? if not why? Thank you

11. Richard Foote - October 30, 2009

Hi Tom

I’m just about to pop on a plane so sorry for the short response but I would suggest it wouldn’t make much different either way as in both scenarios, the entire index entry is having to be deleted and re-inserted elsewhere with the index structure (even if it’s logically within the same leaf block).

It’s one of those things you could always benchmark although the option of dropping / marking unusable the index and then rebuilding after the batch process, may improve the batch process by enough to justify the additional overheads of such a move.