jump to navigation

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018

Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.
trackback

day in day out bowie

I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:

ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.

Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.

I’ve written a number of times about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns in an index, if all index columns are referenced equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.

I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:

SQL> CREATE TABLE ziggy AS
SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name
FROM dual CONNECT BY LEVEL <= 2000000;

Table created.

I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.

If we look at a partial block dump of a Branch block:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM

We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)
AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We note for now the number of consistent gets (5 and 19) for each of these queries.

If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks are almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, just 0.06%.

The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM

With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.

The branch entry with both the CODE and ID columns has ranges sufficiently selective enough to ensure any indexed value can be found within leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.

So on the surface, it looks as if this index is not as efficient as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.

If we run the same queries as we did when using the previous index:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR
"ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242))

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.

So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.

In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.

Comments»

1. Gabriel - June 6, 2018

One thing though : having the low cardinality columns in front means we can compress better and the index will became smaller . Doesn’t that make him more attractive/optimum ?

Like

Richard Foote - June 7, 2018

HI Gabriel

That’s correct and the next topic I’ll cover.

Like

2. Sekar - June 7, 2018

Hi Richard,
Can you explain the following in detail
“during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case)”
Do you mean that each access to a leaf block will need to access only one branch block regardless of the number of branch blocks present?
We can only see improvement if we reduce the blevel to 2 i.e there is no branch block ?
Thanks

Like

Richard Foote - June 7, 2018

Hi Sekar

You only need to access a branch block at each level they exist. So for an index with a HEIGHT of 3, that’s just the root branch block and an intermediate branch block. The number of branch blocks at each level is therefore of trivial consequence.

Being able to reduce the number of leaf blocks is where it can possibly matter (which of course in turn results in few necessary branch blocks).

Like

3. Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song) | Richard Foote's Oracle Blog - July 5, 2018

[…] Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of […]

Like

4. Shri - July 5, 2018

Hi Rich,
I am not getting how it will be better to do index skip scan where leading index column has less distinct value.
Plz explain it.
Plz correct if I am wrong
When there is composite index it will scan the leading column and for every value of leading column it will scan the second column like for leading column for value 1 it will search the value of second column, similarly it will do for all values untill and unless it doesn’t get the value

Like

Richard Foote - July 18, 2018
5. Vishnu Potukanuma - January 18, 2020

Hi Richard,

the actual use of TERM appears to be different and is actually used specifically when the block split results in subsequent entries of a particular key is split across different leaf blocks due to 50-50 split. consider the following trace of Branch block:

row#24[4120] dba: 29373640=0x1c034c8
col 0; len 2; (2): c1 0d
col 1; len 3; (3): 01 c0 1c
row#25[4816] dba: 29361275=0x1c0047b
col 0; len 2; (2): c1 0e
col 1; TERM ==> Term USED
row#26[4824] dba: 29371785=0x1c02d89 –> leaf block dump trace shown next
col 0; len 2; (2): c1 0e
col 1; len 3; (3): 01 c0 19
row#27[4835] dba: 29365479=0x1c014e7
col 0; len 2; (2): c1 0f
col 1; len 4; (4): 01 c0 05 4b

for any entry such as
row#24[4120] dba: 29373640=0x1c034c8
col 0; len 2; (2): c1 0d
col 1; len 3; (3): 01 c0 1c
we know that 0x1c034c8 leaf block contains values incremented from c1 0d…. and c1 0d is the starting point…. of entries in the leaf blocks… I mean c1 0d is the least value in that leaf block 0x1c034c8 and values increase from there…

It so happens that due to a 50-50 block split that entries of a particular value in our case c1 0f split across two different leaf block… TERM actually signals that the values start from the previous leaf block….

leaf block dump of 0x1c02d89
kdxlenxt 29365479=0x1c014e7
kdxleprv 29361275=0x1c0047b

kdxlenxt 29372021=0x1c02e75
kdxleprv 29371785=0x1c02d89 –> Term USED….
kdxledsz 0
kdxlebksz 7816
row#0[4420] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 0f
col 1; len 6; (6): 01 c0 05 4b 00 13
row#1[4432] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 0f
col 1; len 6; (6): 01 c0 05 57 00 4b
row#2[4444] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 0f
col 1; len 6; (6): 01 c0 05 5f 00 38
row#3[4456] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 0f
col 1; len 6; (6): 01 c0 05 73 00 45

Use of TERM indicates that a actual value of interest starts from a middle of the previous leaf block…

it is the same case with multi-column indexes as well:

row#219[4913] dba: 29380028=0x1c04dbc
col 0; len 2; (2): c1 18
col 1; len 2; (2): c1 1c
col 2; TERM
row#220[4898] dba: 29380029=0x1c04dbd
col 0; len 2; (2): c1 18
col 1; len 3; (3): c2 02 4e
col 2; len 3; (3): 01 c0 1c
row#221[4883] dba: 29380030=0x1c04dbe
col 0; len 2; (2): c1 18
col 1; len 3; (3): c2 04 15
col 2; len 3; (3): 01 c0 19

leaf block dump –> 0x1c04dbd
kdxlenxt 29380030=0x1c04dbe
kdxleprv 29380028=0x1c04dbc

leaf block dump of 0x1c04dbc….
row#407[1707] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 18
col 1; len 3; (3): c2 02 4d
col 2; len 6; (6): 01 c0 3b 65 00 35
row#408[1691] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 18
col 1; len 3; (3): c2 02 4e
col 2; len 6; (6): 01 c0 12 73 00 01
row#409[1675] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 18
col 1; len 3; (3): c2 02 4e
col 2; len 6; (6): 01 c0 16 3a 00 35
—– end of leaf block Logical dump —–
—– end of leaf block dump —–
End dump data blocks tsn: 4 file#: 7 minblk 19900 maxblk 19900

Thanks,
Vishnu

Like

6. Vishnu Potukanuma - January 18, 2020

Regarding the use of additional cols in the branch block… it appears that Oracle can include either column value or portion of the ROWID of the depending on whether if it can uniquely traverse the B-Tree structure given the cardinality or NDV of the column value…

Considering a multi-column index (A, B, C):
A has low NDV, B has medium NDV and C has High NDV and the values are populated randomly
If there are only 10 rows in the table for column A say 1, for that portion it add only A column in the branch block… as rows are inserted if it detects that given the predicates A = ? and B=? if it cannot reach the index leaf block just with A it adds B to the Branch blocks… and this process repeats…

So finally if considering the case A = ? and B = ? and C=? if suppose the values are less unique i mean a = ? and b=? and c=? results in multiple values… and can span across multiple leaf blocks, in this case it also adds rowid or even portion of the ROWID if it determines that it is enough…

row#407[951] dba: 29369816=0x1c025d8
col 0; len 2; (2): c1 05
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 0c 86 00 55
row#408[934] dba: 29369817=0x1c025d9
col 0; len 2; (2): c1 05
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 4; (4): 01 c0 19 62

it looks like even though rowid addition is redundant and appears silly even when all the columns are have same value..

the addition of ROWID appears to be to far more complex than actually anticipated.. it looks like it is to support partition execution and lower fetch sizes to reduce the number of gets required…
I have to perform additional tests to see if this is actual the case …

Like

Richard Foote - January 18, 2020

Hi Vishnu

Thank you for your comments and valuable demos, it’s always nice to see people interested in how Oracle works and using block dumps to help illustrate internal behavior.

Now I suspect you’re coming to these conclusions because of one piece of missing understanding in how Oracle uses indexes. Your comment “it looks like even though rowid addition is redundant and appears silly even when all the columns are have same value..” suggests this.

So let me quickly explain why this isn’t silly but critical and why this also explains the behavior you see in your first example.

The humble Delete statement.

Oracle does all you describe above, because it’s critical to not only find the first leaf block that contains an indexed VALUE of interest, but also because it needs to find the leaf block that contains a specific index ENTRY of interest, classically when performing a Delete operation.

If you delete a row from a table that is indexed on a non-unique value (say a column contains millions of rows with the same value), then it’s not sufficient to only determine the leaf block in which these values begin, because then Oracle would have to perhaps trawl through millions of index entries and maybe millions of index leaf blocks searching for the specific index entry that contains the Rowid of the row being deleted.

For this reason, Oracle stores the Rowid as part of the index entry and ensures there’s enough information in the index branches to navigate to the leaf block that contains both the index value AND the associated Rowid, so when performing a delete, you are guaranteed to navigate to the SPECIFIC leaf block that contains the index entry Oracle wants to delete.

This explains the behavior you see in both your examples.

Hopes this makes sense.

Richard

Like

7. Vishnu Potukanuma - January 18, 2020

as my previous comment can create confusion and why i was incorrect in assuming that the TERM is the thing that triggers the subsequent leaf block read…

I will elaborate more on my entire investigation and why I came to the conclusion that split is the reason… (VERY BAD ON MY PART DIDN’T CONSIDER ALL THE VARIABLES and investigated only the indexes where values in the leaf blocks were not contained….)

use of TERM in the branch block entries is valid only when entries in the leaf block are contained and not split across multiple leaf blocks and happens only when
1. Column values are unique and we create a non-unique index.
2. unique indexes does not use TERM (single column).
3. Unique indexes multiple column when leading column is almost unique and entries in the leaf block are 100% contained based on the leading or leading set of columns
4. When the values in the leaf blocks are contained….

The following explains more precisely…

create table students(roll number);
insert into students select rownum from dual connect by level < 1000;
insert into students select rownum from dual connect by level also explains why ROW#0 entry in the branch block contains TERM:

1st leaf block
0x1c00164
row#490[1851] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 03 2f
col 1; len 6; (6): 01 c0 01 5e 00 f5
row#491[1838] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 03 2f
col 1; len 6; (6): 01 c0 01 5f 02 48
—– end of leaf block Logical dump —–
—– end of leaf block dump —–
End dump data blocks tsn: 4 file#: 7 minblk 356 maxblk 356

As column values are sorted in the index leaf blocks, and the tailing section of the 1st leaf block do indicate 2 values meaning that a column value is not split and all the entries in the leaf block are contained.. use of TERM is justified…. same is the case with the 2nd leaf block

2nd leaf block:
0x1c00165
row#476[1835] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 05 56
col 1; len 6; (6): 01 c0 01 5b 00 a3
row#477[1822] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 05 56
col 1; len 6; (6): 01 c0 01 5e 01 e4
—– end of leaf block Logical dump —–
—– end of leaf block dump —–
End dump data blocks tsn: 4 file#: 7 minblk 357 maxblk 357

Things go south from 3rd block and the last use of TERM:
3rd leaf block:
0x1c00166 . precisely when the TERM stopped:
row#476[1837] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 19
col 1; len 6; (6): 01 c0 01 5b 01 92
row#477[1824] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 19
col 1; len 6; (6): 01 c0 01 5f 00 3f
row#478[1811] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 1a
col 1; len 6; (6): 01 c0 01 5b 01 93
—– end of leaf block Logical dump —–
—– end of leaf block dump —–
End dump data blocks tsn: 4 file#: 7 minblk 358 maxblk 358

Third leaf block if we observe entry c2 08 1a has only one value, and the value c2 08 1a is the starting point of the 4th leaf block…. as shown in the Branch block dump:

row#2[8025] dba: 29360487=0x1c00167
col 0; len 3; (3): c2 08 1a
col 1; len 4; (4): 01 c0 01 5f

and we can confirm this with 4th leaf block dump as well…..

header address 2098339940=0x7d122064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 478
kdxcofbo 992=0x3e0
kdxcofeo 1822=0x71e
kdxcoavs 830
kdxlespl 0
kdxlende 0
kdxlenxt 29360488=0x1c00168
kdxleprv 29360486=0x1c00166
kdxledsz 0
kdxlebksz 8032
row#0[8019] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 1a
col 1; len 6; (6): 01 c0 01 5f 00 40



row#1[8006] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 1b
col 1; len 6; (6): 01 c0 01 5b 01 94
row#2[7993] flag: ——-, lock: 0, len=13
col 0; len 3; (3): c2 08 1b
col 1; len 6; (6): 01 c0 01 5f 00 41
row#3[7980] flag: ——-, lock: 0, len=13

we can see the ROW#0 is the starting point…. starting here oracle simply used ROWID or added ROWID column….

So does it keep on Adding ROWIDs all the time?
No.. it wont…. only if the entries are split across multiple leaf blocks…. that too added for the subsequent entry in the branch blocks as the previous index branch block entry – or index leaf entries are self contained when it comes to this block alone….

Now we insert data as follows:

SQL> insert into students select * from (select rownum “ROLL” from dual connect by level 999;

2000 rows created.

SQL> commit;

dump the branch block again…
Here we are basically inserting unique data using single session resulting in 90-10 splits.. which is good…

The tree dump and branch block dump is as follows:
branch: 0x1c00163 29360483 (0: nrow: 8, level: 1)
leaf: 0x1c00164 29360484 (-1: row:492.492 avs:818)
leaf: 0x1c00165 29360485 (0: row:478.478 avs:830)
leaf: 0x1c00166 29360486 (1: row:479.479 avs:817)
leaf: 0x1c00167 29360487 (2: row:478.478 avs:830)
leaf: 0x1c00168 29360488 (3: row:533.533 avs:6)
leaf: 0x1c0016d 29360493 (4: row:533.533 avs:6)
leaf: 0x1c0016e 29360494 (5: row:533.533 avs:7)
leaf: 0x1c0016f 29360495 (6: row:472.472 avs:920)

row#0[8047] dba: 29360485=0x1c00165
col 0; len 3; (3): c2 03 30
col 1; TERM
row#1[8038] dba: 29360486=0x1c00166
col 0; len 3; (3): c2 05 57
col 1; TERM
row#2[8025] dba: 29360487=0x1c00167
col 0; len 3; (3): c2 08 1a
col 1; len 4; (4): 01 c0 01 5f
row#3[8012] dba: 29360488=0x1c00168
col 0; len 3; (3): c2 0a 41
col 1; len 4; (4): 01 c0 01 5f
row#4[8003] dba: 29360493=0x1c0016d
col 0; len 3; (3): c2 0f 3f
col 1; TERM
row#5[7994] dba: 29360494=0x1c0016e
col 0; len 3; (3): c2 14 60
col 1; TERM
row#6[7985] dba: 29360495=0x1c0016f
col 0; len 3; (3): c2 1a 1d
col 1; TERM
—– end of branch block dump —–

and the use of TERM is also justified as the subsequent entries are all unique and ROWID is not required…

all i did now is reinsert the data back again:
insert into students select * from (select rownum “ROLL” from dual connect by level 999;
commit;

and the dump is as follows:

branch: 0x1c0016b 29360491 (0: nrow: 13, level: 1)
leaf: 0x1c0016c 29360492 (-1: row:492.492 avs:818)
leaf: 0x1c0016d 29360493 (0: row:478.478 avs:830)
leaf: 0x1c0016e 29360494 (1: row:479.479 avs:817)
leaf: 0x1c0016f 29360495 (2: row:478.478 avs:830)
leaf: 0x1c00188 29360520 (3: row:479.479 avs:817)
leaf: 0x1c00189 29360521 (4: row:478.478 avs:830)
leaf: 0x1c0018a 29360522 (5: row:478.478 avs:830)
leaf: 0x1c0018b 29360523 (6: row:479.479 avs:817)
leaf: 0x1c0018c 29360524 (7: row:478.478 avs:830)
leaf: 0x1c0018d 29360525 (8: row:478.478 avs:830)
leaf: 0x1c0018e 29360526 (9: row:479.479 avs:817)
leaf: 0x1c0018f 29360527 (10: row:478.478 avs:830)
leaf: 0x1c00191 29360529 (11: row:244.244 avs:4338)

and the branch block dump now is as follows:
kdxbrlmc 29360492=0x1c0016c
kdxbrsno 0
kdxbrbksz 8056
kdxbr2urrc 0
row#0[8047] dba: 29360493=0x1c0016d
col 0; len 3; (3): c2 03 30
col 1; TERM
row#1[8038] dba: 29360494=0x1c0016e
col 0; len 3; (3): c2 05 57
col 1; TERM
row#2[8025] dba: 29360495=0x1c0016f
col 0; len 3; (3): c2 08 1a
col 1; len 4; (4): 01 c0 01 5f
row#3[8012] dba: 29360520=0x1c00188
col 0; len 3; (3): c2 0a 41
col 1; len 4; (4): 01 c0 01 5f
row#4[8003] dba: 29360521=0x1c00189
col 0; len 3; (3): c2 0d 05
col 1; TERM
row#5[7994] dba: 29360522=0x1c0018a
col 0; len 3; (3): c2 0f 2c
col 1; TERM
row#6[7985] dba: 29360523=0x1c0018b
col 0; len 3; (3): c2 11 53
col 1; TERM
row#7[7972] dba: 29360524=0x1c0018c
col 0; len 3; (3): c2 14 16
col 1; len 4; (4): 01 c0 01 70
row#8[7959] dba: 29360525=0x1c0018d
col 0; len 3; (3): c2 16 3d
col 1; len 4; (4): 01 c0 01 70
row#9[7946] dba: 29360526=0x1c0018e
col 0; len 3; (3): c2 18 64
col 1; len 4; (4): 01 c0 01 76
row#10[7937] dba: 29360527=0x1c0018f
col 0; len 3; (3): c2 1b 28
col 1; TERM
row#11[7928] dba: 29360529=0x1c00191
col 0; len 3; (3): c2 1d 4f
col 1; TERM

we see more blocks with TERM, even though 2 rows only for each value…. So basically by now we know that the size of the index column value and the NDV plays a vital role… and even 2 entries in the index leaf block can signal a read of two index leaf blocks…

If Oracle considered Only TERM as to save entries in branches… and has no other meaning or not used in the processing…. we would get wrong results or as the SQL statement results in only one row as indicated in the entries in the branch blocks…. if it went strictly with the branch block entries… .which could have been a serious flaw in the implementation itself… but the situation is not the case….

So is it precisely ROWID in the subsequent branch block entry immediately following the TERM that triggers reading the previous leaf block… we are not 100% sure unless we read the code….but we can find out that it is not the TERM that triggers the mechanism by performing a trace or looking at the buffer cache immediately following the SQL statement execution… and it confirms that the other leaf blocks are not read or accessed…
Performed tests using:
1. Checking the latch gets on the blocks…
2. flushing buffer cache / restarting database…

I came to this conclusion by investigating only the indexes when the values were not contained.. which I should have looked at previously or even considered the case….

So what other factors can trigger ROWID to be inserted…
– 50-50 split
– 90-10 split
– increased column lengths…
– way the data is populated…

Now that TERM part is done… next the ROWID part or only the leading bytes of the ROWID in the branch blocks….
1. ROWID in the branch block can trigger an additional or previous leaf block entry read.
2. indicate start and stop.
3. still investigating the extent

Thanks,
Vishnu

Like

8. Vishnu Potukanuma - January 18, 2020

looks like i didn’t paste properly… the scenario:

The following explains more precisely…

create table students (roll number);
insert into students select rownum from dual connect by level < 1000;
insert into students select rownum from dual connect by level < 1000;
commit;
create index idx on students(roll);

we have precisely only 2 row for each value…

now that we use the treedump…

—– begin tree dump
branch: 0x1c00163 29360483 (0: nrow: 5, level: 1)
leaf: 0x1c00164 29360484 (-1: row:492.492 avs:818)
leaf: 0x1c00165 29360485 (0: row:478.478 avs:830)
leaf: 0x1c00166 29360486 (1: row:479.479 avs:817)
leaf: 0x1c00167 29360487 (2: row:478.478 avs:830)
leaf: 0x1c00168 29360488 (3: row:71.71 avs:6931)

and the subsequent branch block dump..
kdxbrlmc 29360484=0x1c00164
kdxbrsno 0
kdxbrbksz 8056
kdxbr2urrc 0
row#0[8047] dba: 29360485=0x1c00165
col 0; len 3; (3): c2 03 30
col 1; TERM
row#1[8038] dba: 29360486=0x1c00166
col 0; len 3; (3): c2 05 57
col 1; TERM
row#2[8025] dba: 29360487=0x1c00167
col 0; len 3; (3): c2 08 1a
col 1; len 4; (4): 01 c0 01 5f
row#3[8012] dba: 29360488=0x1c00168
col 0; len 3; (3): c2 0a 41
col 1; len 4; (4): 01 c0 01 5f
—– end of branch block dump —–

Yes Richard… your explanation is perfect…..

If you delete a row from a table that is indexed on a non-unique value (say a column contains millions of rows with the same value), then it’s not sufficient to only determine the leaf block in which these values begin, because then Oracle would have to perhaps trawl through millions of index entries and maybe millions of index leaf blocks searching for the specific index entry that contains the Rowid of the row being deleted.

I was looking at the partial SQL execution and low fetch sizes especially when executing statements that involve simple index scans and nested loops….

Thanks,
Vishnu

Like

Richard Foote - January 19, 2020

Hi Vishnu

The use of TERM in a leaf block is simply to denote that there are no more column(s) that need be stored within the branch entry in order for Oracle to guarantee the starting location of any required index entry.

If a single indexed column value spreads across 2 leaf blocks, even in the scenario such as yours where there are only 2 occurrences, then Oracle MUST store the rowid within the second branch entry to let Oracle know that you must visit the preceding leaf block (i.e. any values of interest less than the columns denoted in the branch) if you want the first column value occurrence as in a range scan or you want the column value with a rowid less than that denoted in the branch as in the case of a delete operation.

So it’s not the TERM that’s critical as such, but the listed column values themselves within the branch to let Oracle know the required path down the index structure one must follow in order to find the index entry of interest. The same pattern occurs with TERM for multi-column indexes where specific leading column(s) value occurs in multiple leaf blocks, an additional column at least must also be included in the second branch entry.

So yes, I agree with everything you’ve stated, this is entirely expected and necessary Oracle indexing behavior.

Regards

Richard

Like

9. Vishnu Potukanuma - January 20, 2020

Things got even more interesting when I looked even further into this TERM… It appears that this optimization does meddle with the index leaf block splits especially 50-50, a bit… and a 50-50 block split is not always 50-50 precisely..

but i could see only upto 1-2% variation in the size of the leaf block entries or upto 11 entries…. but there are too many variables…. further this point i found its not worth investigating…

drop table students;
create table students (roll number);
create index idx on students(roll);
insert into students select 1 from dual connect by level < 292;
insert into students select 2 from dual connect by level < 281;
commit;
alter system flush buffer_cache;
insert into students select 1 from dual;
commit;
Here the leaf block splits….

If we observe value 1 is span across multiple leaf blocks, and for the following TERM is used…

drop table students;
create table students (roll number);
create index idx on students(roll);
insert into students select 1 from dual connect by level < 291;
insert into students select 2 from dual connect by level < 282;
commit;
alter system flush buffer_cache;
insert into students select 1 from dual;
commit;
Here the leaf block splits….

The same applies to the branch blocks as well.. even when there are 2 rows (depends on the size of the indexed column and if we add any junk as the tailing column), following an index creation or rebuild.. the same value can span multiple branch blocks… but doesn't incur any additional reads… as the reads go through the leaf block chain..

Looks like Oracle strictly follows PCTFREE 10….

Like

10. mgogala - August 19, 2021

Well, what happens if there is a need for a range scan instead of equality predicate? In my experience, multi-column indexes are usually an attempt to solve several conditions at once:
1) Leading colum(s) range scan (between, =)
2) All columns equality

I am not quite sure what do you mean by “index is not efficient” but the application response is usually good.
Regards

Like


Leave a comment