but i could see only upto 1-2% variation in the size of the leaf block entries or upto 11 entries…. but there are too many variables…. further this point i found its not worth investigating…

drop table students;

create table students (roll number);

create index idx on students(roll);

insert into students select 1 from dual connect by level < 292;

insert into students select 2 from dual connect by level < 281;

commit;

alter system flush buffer_cache;

insert into students select 1 from dual;

commit;

Here the leaf block splits….

If we observe value 1 is span across multiple leaf blocks, and for the following TERM is used…

drop table students;

create table students (roll number);

create index idx on students(roll);

insert into students select 1 from dual connect by level < 291;

insert into students select 2 from dual connect by level < 282;

commit;

alter system flush buffer_cache;

insert into students select 1 from dual;

commit;

Here the leaf block splits….

The same applies to the branch blocks as well.. even when there are 2 rows (depends on the size of the indexed column and if we add any junk as the tailing column), following an index creation or rebuild.. the same value can span multiple branch blocks… but doesn't incur any additional reads… as the reads go through the leaf block chain..

Looks like Oracle strictly follows PCTFREE 10….

LikeLike

]]>The use of TERM in a leaf block is simply to denote that there are no more column(s) that need be stored within the branch entry in order for Oracle to guarantee the starting location of any required index entry.

If a single indexed column value spreads across 2 leaf blocks, even in the scenario such as yours where there are only 2 occurrences, then Oracle MUST store the rowid within the second branch entry to let Oracle know that you must visit the preceding leaf block (i.e. any values of interest less than the columns denoted in the branch) if you want the first column value occurrence as in a range scan or you want the column value with a rowid less than that denoted in the branch as in the case of a delete operation.

So it’s not the TERM that’s critical as such, but the listed column values themselves within the branch to let Oracle know the required path down the index structure one must follow in order to find the index entry of interest. The same pattern occurs with TERM for multi-column indexes where specific leading column(s) value occurs in multiple leaf blocks, an additional column at least must also be included in the second branch entry.

So yes, I agree with everything you’ve stated, this is entirely expected and necessary Oracle indexing behavior.

Regards

Richard

LikeLike

]]>The following explains more precisely…

create table students (roll number);

insert into students select rownum from dual connect by level < 1000;

insert into students select rownum from dual connect by level < 1000;

commit;

create index idx on students(roll);

we have precisely only 2 row for each value…

now that we use the treedump…

—– begin tree dump

branch: 0x1c00163 29360483 (0: nrow: 5, level: 1)

leaf: 0x1c00164 29360484 (-1: row:492.492 avs:818)

leaf: 0x1c00165 29360485 (0: row:478.478 avs:830)

leaf: 0x1c00166 29360486 (1: row:479.479 avs:817)

leaf: 0x1c00167 29360487 (2: row:478.478 avs:830)

leaf: 0x1c00168 29360488 (3: row:71.71 avs:6931)

and the subsequent branch block dump..

kdxbrlmc 29360484=0x1c00164

kdxbrsno 0

kdxbrbksz 8056

kdxbr2urrc 0

row#0[8047] dba: 29360485=0x1c00165

col 0; len 3; (3): c2 03 30

col 1; TERM

row#1[8038] dba: 29360486=0x1c00166

col 0; len 3; (3): c2 05 57

col 1; TERM

row#2[8025] dba: 29360487=0x1c00167

col 0; len 3; (3): c2 08 1a

col 1; len 4; (4): 01 c0 01 5f

row#3[8012] dba: 29360488=0x1c00168

col 0; len 3; (3): c2 0a 41

col 1; len 4; (4): 01 c0 01 5f

—– end of branch block dump —–

Yes Richard… your explanation is perfect…..

If you delete a row from a table that is indexed on a non-unique value (say a column contains millions of rows with the same value), then it’s not sufficient to only determine the leaf block in which these values begin, because then Oracle would have to perhaps trawl through millions of index entries and maybe millions of index leaf blocks searching for the specific index entry that contains the Rowid of the row being deleted.

I was looking at the partial SQL execution and low fetch sizes especially when executing statements that involve simple index scans and nested loops….

Thanks,

Vishnu

LikeLike

]]>I will elaborate more on my entire investigation and why I came to the conclusion that split is the reason… (VERY BAD ON MY PART DIDN’T CONSIDER ALL THE VARIABLES and investigated only the indexes where values in the leaf blocks were not contained….)

use of TERM in the branch block entries is valid only when entries in the leaf block are contained and not split across multiple leaf blocks and happens only when

1. Column values are unique and we create a non-unique index.

2. unique indexes does not use TERM (single column).

3. Unique indexes multiple column when leading column is almost unique and entries in the leaf block are 100% contained based on the leading or leading set of columns

4. When the values in the leaf blocks are contained….

The following explains more precisely…

create table students(roll number);

insert into students select rownum from dual connect by level < 1000;

insert into students select rownum from dual connect by level also explains why ROW#0 entry in the branch block contains TERM:

1st leaf block

0x1c00164

row#490[1851] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 03 2f

col 1; len 6; (6): 01 c0 01 5e 00 f5

row#491[1838] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 03 2f

col 1; len 6; (6): 01 c0 01 5f 02 48

—– end of leaf block Logical dump —–

—– end of leaf block dump —–

End dump data blocks tsn: 4 file#: 7 minblk 356 maxblk 356

As column values are sorted in the index leaf blocks, and the tailing section of the 1st leaf block do indicate 2 values meaning that a column value is not split and all the entries in the leaf block are contained.. use of TERM is justified…. same is the case with the 2nd leaf block

2nd leaf block:

0x1c00165

row#476[1835] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 05 56

col 1; len 6; (6): 01 c0 01 5b 00 a3

row#477[1822] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 05 56

col 1; len 6; (6): 01 c0 01 5e 01 e4

—– end of leaf block Logical dump —–

—– end of leaf block dump —–

End dump data blocks tsn: 4 file#: 7 minblk 357 maxblk 357

Things go south from 3rd block and the last use of TERM:

3rd leaf block:

0x1c00166 . precisely when the TERM stopped:

row#476[1837] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 19

col 1; len 6; (6): 01 c0 01 5b 01 92

row#477[1824] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 19

col 1; len 6; (6): 01 c0 01 5f 00 3f

row#478[1811] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 1a

col 1; len 6; (6): 01 c0 01 5b 01 93

—– end of leaf block Logical dump —–

—– end of leaf block dump —–

End dump data blocks tsn: 4 file#: 7 minblk 358 maxblk 358

Third leaf block if we observe entry c2 08 1a has only one value, and the value c2 08 1a is the starting point of the 4th leaf block…. as shown in the Branch block dump:

row#2[8025] dba: 29360487=0x1c00167

col 0; len 3; (3): c2 08 1a

col 1; len 4; (4): 01 c0 01 5f

and we can confirm this with 4th leaf block dump as well…..

header address 2098339940=0x7d122064

kdxcolev 0

KDXCOLEV Flags = – – –

kdxcolok 0

kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y

kdxconco 2

kdxcosdc 0

kdxconro 478

kdxcofbo 992=0x3e0

kdxcofeo 1822=0x71e

kdxcoavs 830

kdxlespl 0

kdxlende 0

kdxlenxt 29360488=0x1c00168

kdxleprv 29360486=0x1c00166

kdxledsz 0

kdxlebksz 8032

row#0[8019] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 1a

col 1; len 6; (6): 01 c0 01 5f 00 40

…

…

…

row#1[8006] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 1b

col 1; len 6; (6): 01 c0 01 5b 01 94

row#2[7993] flag: ——-, lock: 0, len=13

col 0; len 3; (3): c2 08 1b

col 1; len 6; (6): 01 c0 01 5f 00 41

row#3[7980] flag: ——-, lock: 0, len=13

we can see the ROW#0 is the starting point…. starting here oracle simply used ROWID or added ROWID column….

So does it keep on Adding ROWIDs all the time?

No.. it wont…. only if the entries are split across multiple leaf blocks…. that too added for the subsequent entry in the branch blocks as the previous index branch block entry – or index leaf entries are self contained when it comes to this block alone….

Now we insert data as follows:

SQL> insert into students select * from (select rownum “ROLL” from dual connect by level 999;

2000 rows created.

SQL> commit;

dump the branch block again…

Here we are basically inserting unique data using single session resulting in 90-10 splits.. which is good…

The tree dump and branch block dump is as follows:

branch: 0x1c00163 29360483 (0: nrow: 8, level: 1)

leaf: 0x1c00164 29360484 (-1: row:492.492 avs:818)

leaf: 0x1c00165 29360485 (0: row:478.478 avs:830)

leaf: 0x1c00166 29360486 (1: row:479.479 avs:817)

leaf: 0x1c00167 29360487 (2: row:478.478 avs:830)

leaf: 0x1c00168 29360488 (3: row:533.533 avs:6)

leaf: 0x1c0016d 29360493 (4: row:533.533 avs:6)

leaf: 0x1c0016e 29360494 (5: row:533.533 avs:7)

leaf: 0x1c0016f 29360495 (6: row:472.472 avs:920)

row#0[8047] dba: 29360485=0x1c00165

col 0; len 3; (3): c2 03 30

col 1; TERM

row#1[8038] dba: 29360486=0x1c00166

col 0; len 3; (3): c2 05 57

col 1; TERM

row#2[8025] dba: 29360487=0x1c00167

col 0; len 3; (3): c2 08 1a

col 1; len 4; (4): 01 c0 01 5f

row#3[8012] dba: 29360488=0x1c00168

col 0; len 3; (3): c2 0a 41

col 1; len 4; (4): 01 c0 01 5f

row#4[8003] dba: 29360493=0x1c0016d

col 0; len 3; (3): c2 0f 3f

col 1; TERM

row#5[7994] dba: 29360494=0x1c0016e

col 0; len 3; (3): c2 14 60

col 1; TERM

row#6[7985] dba: 29360495=0x1c0016f

col 0; len 3; (3): c2 1a 1d

col 1; TERM

—– end of branch block dump —–

and the use of TERM is also justified as the subsequent entries are all unique and ROWID is not required…

all i did now is reinsert the data back again:

insert into students select * from (select rownum “ROLL” from dual connect by level 999;

commit;

and the dump is as follows:

branch: 0x1c0016b 29360491 (0: nrow: 13, level: 1)

leaf: 0x1c0016c 29360492 (-1: row:492.492 avs:818)

leaf: 0x1c0016d 29360493 (0: row:478.478 avs:830)

leaf: 0x1c0016e 29360494 (1: row:479.479 avs:817)

leaf: 0x1c0016f 29360495 (2: row:478.478 avs:830)

leaf: 0x1c00188 29360520 (3: row:479.479 avs:817)

leaf: 0x1c00189 29360521 (4: row:478.478 avs:830)

leaf: 0x1c0018a 29360522 (5: row:478.478 avs:830)

leaf: 0x1c0018b 29360523 (6: row:479.479 avs:817)

leaf: 0x1c0018c 29360524 (7: row:478.478 avs:830)

leaf: 0x1c0018d 29360525 (8: row:478.478 avs:830)

leaf: 0x1c0018e 29360526 (9: row:479.479 avs:817)

leaf: 0x1c0018f 29360527 (10: row:478.478 avs:830)

leaf: 0x1c00191 29360529 (11: row:244.244 avs:4338)

and the branch block dump now is as follows:

kdxbrlmc 29360492=0x1c0016c

kdxbrsno 0

kdxbrbksz 8056

kdxbr2urrc 0

row#0[8047] dba: 29360493=0x1c0016d

col 0; len 3; (3): c2 03 30

col 1; TERM

row#1[8038] dba: 29360494=0x1c0016e

col 0; len 3; (3): c2 05 57

col 1; TERM

row#2[8025] dba: 29360495=0x1c0016f

col 0; len 3; (3): c2 08 1a

col 1; len 4; (4): 01 c0 01 5f

row#3[8012] dba: 29360520=0x1c00188

col 0; len 3; (3): c2 0a 41

col 1; len 4; (4): 01 c0 01 5f

row#4[8003] dba: 29360521=0x1c00189

col 0; len 3; (3): c2 0d 05

col 1; TERM

row#5[7994] dba: 29360522=0x1c0018a

col 0; len 3; (3): c2 0f 2c

col 1; TERM

row#6[7985] dba: 29360523=0x1c0018b

col 0; len 3; (3): c2 11 53

col 1; TERM

row#7[7972] dba: 29360524=0x1c0018c

col 0; len 3; (3): c2 14 16

col 1; len 4; (4): 01 c0 01 70

row#8[7959] dba: 29360525=0x1c0018d

col 0; len 3; (3): c2 16 3d

col 1; len 4; (4): 01 c0 01 70

row#9[7946] dba: 29360526=0x1c0018e

col 0; len 3; (3): c2 18 64

col 1; len 4; (4): 01 c0 01 76

row#10[7937] dba: 29360527=0x1c0018f

col 0; len 3; (3): c2 1b 28

col 1; TERM

row#11[7928] dba: 29360529=0x1c00191

col 0; len 3; (3): c2 1d 4f

col 1; TERM

we see more blocks with TERM, even though 2 rows only for each value…. So basically by now we know that the size of the index column value and the NDV plays a vital role… and even 2 entries in the index leaf block can signal a read of two index leaf blocks…

If Oracle considered Only TERM as to save entries in branches… and has no other meaning or not used in the processing…. we would get wrong results or as the SQL statement results in only one row as indicated in the entries in the branch blocks…. if it went strictly with the branch block entries… .which could have been a serious flaw in the implementation itself… but the situation is not the case….

So is it precisely ROWID in the subsequent branch block entry immediately following the TERM that triggers reading the previous leaf block… we are not 100% sure unless we read the code….but we can find out that it is not the TERM that triggers the mechanism by performing a trace or looking at the buffer cache immediately following the SQL statement execution… and it confirms that the other leaf blocks are not read or accessed…

Performed tests using:

1. Checking the latch gets on the blocks…

2. flushing buffer cache / restarting database…

I came to this conclusion by investigating only the indexes when the values were not contained.. which I should have looked at previously or even considered the case….

So what other factors can trigger ROWID to be inserted…

– 50-50 split

– 90-10 split

– increased column lengths…

– way the data is populated…

Now that TERM part is done… next the ROWID part or only the leading bytes of the ROWID in the branch blocks….

1. ROWID in the branch block can trigger an additional or previous leaf block entry read.

2. indicate start and stop.

3. still investigating the extent

Thanks,

Vishnu

LikeLike

]]>Thank you for your comments and valuable demos, it’s always nice to see people interested in how Oracle works and using block dumps to help illustrate internal behavior.

Now I suspect you’re coming to these conclusions because of one piece of missing understanding in how Oracle uses indexes. Your comment “it looks like even though rowid addition is redundant and appears silly even when all the columns are have same value..” suggests this.

So let me quickly explain why this isn’t silly but critical and why this also explains the behavior you see in your first example.

The humble Delete statement.

Oracle does all you describe above, because it’s critical to not only find the first leaf block that contains an indexed VALUE of interest, but also because it needs to find the leaf block that contains a specific index ENTRY of interest, classically when performing a Delete operation.

If you delete a row from a table that is indexed on a non-unique value (say a column contains millions of rows with the same value), then it’s not sufficient to only determine the leaf block in which these values begin, because then Oracle would have to perhaps trawl through millions of index entries and maybe millions of index leaf blocks searching for the specific index entry that contains the Rowid of the row being deleted.

For this reason, Oracle stores the Rowid as part of the index entry and ensures there’s enough information in the index branches to navigate to the leaf block that contains both the index value AND the associated Rowid, so when performing a delete, you are guaranteed to navigate to the SPECIFIC leaf block that contains the index entry Oracle wants to delete.

This explains the behavior you see in both your examples.

Hopes this makes sense.

Richard

LikeLike

]]>Considering a multi-column index (A, B, C):

A has low NDV, B has medium NDV and C has High NDV and the values are populated randomly

If there are only 10 rows in the table for column A say 1, for that portion it add only A column in the branch block… as rows are inserted if it detects that given the predicates A = ? and B=? if it cannot reach the index leaf block just with A it adds B to the Branch blocks… and this process repeats…

So finally if considering the case A = ? and B = ? and C=? if suppose the values are less unique i mean a = ? and b=? and c=? results in multiple values… and can span across multiple leaf blocks, in this case it also adds rowid or even portion of the ROWID if it determines that it is enough…

row#407[951] dba: 29369816=0x1c025d8

col 0; len 2; (2): c1 05

col 1; len 1; (1): 80

col 2; len 2; (2): c1 02

col 3; len 6; (6): 01 c0 0c 86 00 55

row#408[934] dba: 29369817=0x1c025d9

col 0; len 2; (2): c1 05

col 1; len 1; (1): 80

col 2; len 2; (2): c1 02

col 3; len 4; (4): 01 c0 19 62

it looks like even though rowid addition is redundant and appears silly even when all the columns are have same value..

the addition of ROWID appears to be to far more complex than actually anticipated.. it looks like it is to support partition execution and lower fetch sizes to reduce the number of gets required…

I have to perform additional tests to see if this is actual the case …

LikeLike

]]>the actual use of TERM appears to be different and is actually used specifically when the block split results in subsequent entries of a particular key is split across different leaf blocks due to 50-50 split. consider the following trace of Branch block:

row#24[4120] dba: 29373640=0x1c034c8

col 0; len 2; (2): c1 0d

col 1; len 3; (3): 01 c0 1c

row#25[4816] dba: 29361275=0x1c0047b

col 0; len 2; (2): c1 0e

col 1; TERM ==> Term USED

row#26[4824] dba: 29371785=0x1c02d89 –> leaf block dump trace shown next

col 0; len 2; (2): c1 0e

col 1; len 3; (3): 01 c0 19

row#27[4835] dba: 29365479=0x1c014e7

col 0; len 2; (2): c1 0f

col 1; len 4; (4): 01 c0 05 4b

for any entry such as

row#24[4120] dba: 29373640=0x1c034c8

col 0; len 2; (2): c1 0d

col 1; len 3; (3): 01 c0 1c

we know that 0x1c034c8 leaf block contains values incremented from c1 0d…. and c1 0d is the starting point…. of entries in the leaf blocks… I mean c1 0d is the least value in that leaf block 0x1c034c8 and values increase from there…

It so happens that due to a 50-50 block split that entries of a particular value in our case c1 0f split across two different leaf block… TERM actually signals that the values start from the previous leaf block….

leaf block dump of 0x1c02d89

kdxlenxt 29365479=0x1c014e7

kdxleprv 29361275=0x1c0047b

kdxlenxt 29372021=0x1c02e75

kdxleprv 29371785=0x1c02d89 –> Term USED….

kdxledsz 0

kdxlebksz 7816

row#0[4420] flag: ——-, lock: 0, len=12

col 0; len 2; (2): c1 0f

col 1; len 6; (6): 01 c0 05 4b 00 13

row#1[4432] flag: ——-, lock: 0, len=12

col 0; len 2; (2): c1 0f

col 1; len 6; (6): 01 c0 05 57 00 4b

row#2[4444] flag: ——-, lock: 0, len=12

col 0; len 2; (2): c1 0f

col 1; len 6; (6): 01 c0 05 5f 00 38

row#3[4456] flag: ——-, lock: 0, len=12

col 0; len 2; (2): c1 0f

col 1; len 6; (6): 01 c0 05 73 00 45

Use of TERM indicates that a actual value of interest starts from a middle of the previous leaf block…

it is the same case with multi-column indexes as well:

row#219[4913] dba: 29380028=0x1c04dbc

col 0; len 2; (2): c1 18

col 1; len 2; (2): c1 1c

col 2; TERM

row#220[4898] dba: 29380029=0x1c04dbd

col 0; len 2; (2): c1 18

col 1; len 3; (3): c2 02 4e

col 2; len 3; (3): 01 c0 1c

row#221[4883] dba: 29380030=0x1c04dbe

col 0; len 2; (2): c1 18

col 1; len 3; (3): c2 04 15

col 2; len 3; (3): 01 c0 19

leaf block dump –> 0x1c04dbd

kdxlenxt 29380030=0x1c04dbe

kdxleprv 29380028=0x1c04dbc

leaf block dump of 0x1c04dbc….

row#407[1707] flag: ——-, lock: 0, len=16

col 0; len 2; (2): c1 18

col 1; len 3; (3): c2 02 4d

col 2; len 6; (6): 01 c0 3b 65 00 35

row#408[1691] flag: ——-, lock: 0, len=16

col 0; len 2; (2): c1 18

col 1; len 3; (3): c2 02 4e

col 2; len 6; (6): 01 c0 12 73 00 01

row#409[1675] flag: ——-, lock: 0, len=16

col 0; len 2; (2): c1 18

col 1; len 3; (3): c2 02 4e

col 2; len 6; (6): 01 c0 16 3a 00 35

—– end of leaf block Logical dump —–

—– end of leaf block dump —–

End dump data blocks tsn: 4 file#: 7 minblk 19900 maxblk 19900

Thanks,

Vishnu

LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>Cheers,

Norm.