Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018
Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.15 comments
I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:
“ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.
Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.
I’ve written a number of times about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns in an index, if all index columns are referenced equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.
I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:
SQL> CREATE TABLE ziggy AS SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name FROM dual CONNECT BY LEVEL <= 2000000; Table created.
I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:
SQL> create index ziggy_id_code_i ON ziggy(id, code); Index created. SQL> analyze index ziggy_id_code_i validate structure; Index analyzed. SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats; HEIGHT LF_BLKS BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE ---------- ---------- ---------- ----------- ----------- ---------- 3 14135 23 176612 113264736 101146313
So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.
If we look at a partial block dump of a Branch block:
Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM
…
We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.
SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2'; Execution Plan ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 45 | 4(0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | ZIGGY | 1 | 45 | 4(0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | ZIGGY_ID_CODE_I | 1 | | 3(0) | 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 713 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2'; Execution Plan ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 45 | 9 (0) | 00:00:01 | | 1 | INLIST ITERATOR | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID | ZIGGY | 1 | 45 | 9 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | ZIGGY_ID_CODE_I | 1 | | 8 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242) AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 19 consistent gets 0 physical reads 0 redo size 861 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
We note for now the number of consistent gets (5 and 19) for each of these queries.
If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:
SQL> create index ziggy_code_id_i on ziggy(code,id); Index created. SQL> analyze index ziggy_code_id_i validate structure; Index analyzed. SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats; HEIGHT LF_BLKS BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE ---------- ---------- ---------- ----------- ----------- ---------- 3 14125 83 656341 113666656 101626042
So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks are almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, just 0.06%.
The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:
Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM
…
With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.
The branch entry with both the CODE and ID columns has ranges sufficiently selective enough to ensure any indexed value can be found within leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.
So on the surface, it looks as if this index is not as efficient as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.
If we run the same queries as we did when using the previous index:
SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2'; Execution Plan ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 45 | 4(0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | ZIGGY | 1 | 45 | 4(0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | ZIGGY_CODE_ID_I | 1 | | 3(0) | 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 713 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2'; Execution Plan ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 45 | 9 (0) | 00:00:01 | | 1 | INLIST ITERATOR | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID | ZIGGY | 1 | 45 | 9 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN | ZIGGY_CODE_ID_I | 1 | | 8 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 19 consistent gets 0 physical reads 0 redo size 861 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.
So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.
In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.