jump to navigation

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018

Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.
trackback

day in day out bowie

I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:

ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.

Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.

I’ve written a number of times about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns in an index, if all index columns are referenced equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.

I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:

SQL> CREATE TABLE ziggy AS
SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name
FROM dual CONNECT BY LEVEL <= 2000000;

Table created.

I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.

If we look at a partial block dump of a Branch block:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM

We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)
AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We note for now the number of consistent gets (5 and 19) for each of these queries.

If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks are almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, just 0.06%.

The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM

With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.

The branch entry with both the CODE and ID columns has ranges sufficiently selective enough to ensure any indexed value can be found within leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.

So on the surface, it looks as if this index is not as efficient as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.

If we run the same queries as we did when using the previous index:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR
"ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242))

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.

So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.

In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.

Advertisements

Comments»

1. Gabriel - June 6, 2018

One thing though : having the low cardinality columns in front means we can compress better and the index will became smaller . Doesn’t that make him more attractive/optimum ?

Like

Richard Foote - June 7, 2018

HI Gabriel

That’s correct and the next topic I’ll cover.

Like

2. Sekar - June 7, 2018

Hi Richard,
Can you explain the following in detail
“during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case)”
Do you mean that each access to a leaf block will need to access only one branch block regardless of the number of branch blocks present?
We can only see improvement if we reduce the blevel to 2 i.e there is no branch block ?
Thanks

Like

Richard Foote - June 7, 2018

Hi Sekar

You only need to access a branch block at each level they exist. So for an index with a HEIGHT of 3, that’s just the root branch block and an intermediate branch block. The number of branch blocks at each level is therefore of trivial consequence.

Being able to reduce the number of leaf blocks is where it can possibly matter (which of course in turn results in few necessary branch blocks).

Like

3. Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song) | Richard Foote's Oracle Blog - July 5, 2018

[…] Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of […]

Like

4. Shri - July 5, 2018

Hi Rich,
I am not getting how it will be better to do index skip scan where leading index column has less distinct value.
Plz explain it.
Plz correct if I am wrong
When there is composite index it will scan the leading column and for every value of leading column it will scan the second column like for leading column for value 1 it will search the value of second column, similarly it will do for all values untill and unless it doesn’t get the value

Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

%d bloggers like this: