jump to navigation

Indexing Foreign Key Constraints With Invisible Indexes (Invisible People) April 22, 2014

Posted by Richard Foote in 12c, Block Dumps, Foreign Keys, Invisible Indexes, Oracle Indexes.
1 comment so far

In my previous post I discussed when deleting rows from parent tables, how Bitmap Indexes based on the FK constraint can prevent the expensive Full Tables Scans (FTS) on the child tables but not the associated exclusive table locks.

Last year, I discussed how it was possible in Oracle Database 12c to have multiple indexes on the same column list.

Quite some time ago, I discussed how so-called Invisible Indexes can indeed still be visible in various scenarios, including when policing FK constraints.

Well, lets put all these three topics together :)

First, let use the same basic setup as the last post:

SQL> create table bowie_dad (id number, dad_name varchar2(30)); 

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE'); 

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST'); 

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level <=1000000; 

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

 

We’re now going to create two indexes concurrently on the FK constraint on the DAD_ID column, a Bitmap Index and an invisible B-Tree Index as is now possible since Oracle Database 12c:

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id); 

Index created.

SQL> create index bowie_kid_fk2_i on bowie_kid(dad_id) invisible;

Index created.

Oracle Database 12c allows us to now create multiple indexes on the same column list, providing only one index is visible at a time.

Let’s look at a partial block dump of the first leaf block of each index. First the Bitmap Index:

Block header dump:  0x0180805c
 Object id on Block? Y
 seg/obj: 0x16f45  csc: 0x00.36bc54  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1808058 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc54
Leaf block dump
===============
header address 32801380=0x1f48264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 959=0x3bf
kdxcoavs 919
kdxlespl 0
kdxlende 0
kdxlenxt 25198685=0x180805d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——-, lock: 0, len=3537
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 00
col 2; len 6; (6):  01 80 80 2c 00 3f
col 3; len 3516; (3516):
 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cb 92 24 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24
 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49
 92 24 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49

 

Note the indexed value is c1 03, denoting the lowest DAD_ID=2 currently in the table.

Now the  partial block dump of the invisible B-Tree Index:

 

Block header dump:  0x0181b724
 Object id on Block? Y
 seg/obj: 0x16f46  csc: 0x00.36bc78  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x181b720 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc78
Leaf block dump
===============
header address 32801380=0x1f48264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25278245=0x181b725
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 01
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 04
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 07

Again as expected the first index entry is C1 03.

With only a visible Bitmap Index in place, does that mean we’ll have table locking issues if we delete a parent row with current transactions in place ? Let’s check it out.

In one session, we have a current transaction on the child table:

SQL> insert into bowie_kid values (1000001, 'LOW', 4); 

1 row created.

 

In another session, we attempt to delete a parent row (with an ID = 1 which doesn’t currently exist with the child table):

SQL> delete bowie_dad where id = 1; 

1 row deleted.

We note the DML was successful and didn’t hang. This means the B-Tree index is clearly being used to police this constraint, even though it’s currently invisible.

In a third session, we now attempt to insert a child row using a FK value that’s in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

 

As expected, it hangs as it’s currently effectively waiting on the row level lock made possible by the index entry in the B-Tree index as invisible indexes are still maintained behind the scenes. If we look at a fresh block dump of both indexes, beginning with the Bitmap Index:

Block header dump:  0x0180805c
 Object id on Block? Y
 seg/obj: 0x16f45  csc: 0x00.36bc54  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1808058 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc54
Leaf block dump
===============
header address 402948708=0x18048264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 959=0x3bf
kdxcoavs 919
kdxlespl 0
kdxlende 0
kdxlenxt 25198685=0x180805d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——-, lock: 0, len=3537
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 00
col 2; len 6; (6):  01 80 80 2c 00 3f
col 3; len 3516; (3516):
 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cb 92 24 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24
 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49
 92 24 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49

 

We note the Bitmap Index has not been updated. It still lists the C1 03 value as the minimum indexed value.

However, if we look at the invisible B-Tree index:

Block header dump:  0x0181b724
 Object id on Block? Y
 seg/obj: 0x16f46  csc: 0x00.36bc78  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x181b720 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0x0008.015.00000b86  0x014316ab.01c5.42  —-    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 402948708=0x18048264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25278245=0x181b725
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: ——-, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 81 b6 f3 00 00
row#1[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 01
row#2[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 04
row#3[8000] flag: ——-, lock: 0, len=12

It has been updated and lists a new index entry C1 02 as the minimum value now in the index.

So the B-Tree index can be used to successfully police the FK index and prevent the possible table level locking issues associated with deleting parent rows, even though it’s invisible and there is an equivalent visible Bitmap index in place. Invisible indexes are simply not considered as viable execution paths by the Cost Based Optimizer, but may still be “visible” in a variety of scenarios such as quietly policing constraints behind the scenes.

Do I recommend creating two such indexes in Oracle Database 12c. Well, no as the costs of maintaining both indexes need to be considered. But I certainly do caution simply making indexes invisible and expecting the database to behave in exactly the same manner if the index were to be subsequently dropped.

Because rolling back all the above and then dropping the invisible index:

SQL> drop index bowie_kid_fk2_i;

Index dropped.

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

 

Means in another session the parent delete operation will now hang without the B-Tree index being in place:

SQL> delete bowie_dad where id = 1;

 

 

 

 

 

 

Indexing Foreign Key Constraints With Bitmap Indexes (Locked Out) April 17, 2014

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Foreign Keys, Index Internals, Oracle Indexes.
4 comments

Franck Pachot made a very valid comment in my previous entry on Indexing Foreign Keys (FK) that the use of a Bitmap Index on the FK columns does not avoid the table locks associated with deleting rows from the parent table. Thought I might discuss why this is the case and why only a B-Tree index does the trick.

Let’s first setup some very simple Parent-Child tables:

SQL> create table bowie_dad (id number, dad_name varchar2(30));

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST');

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level >=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

OK, so we have a small parent table (BOWIE_DAD) and a much larger child table (BOWIE_KID) with all the necessary constraints in place. Note we don’t actually have a child row with a  FK DAD_ID = 1. So we can potentially delete this row from the BOWIE_DAD table (where ID = 1).

Let’s begin by creating a B-Tree  index on the FK column (DAD_ID) and have a look a partial block dump of the first leaf block in the index:

SQL> create index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06
…..

 

We’ll compare future block dumps with this one but for now just note that the first index entry has a value of (hex) C1 03, which corresponds to the minimum value for DAD_ID = 2 we currently have in this table/index.

If we insert a new child record in one session (but not yet commit);

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we can delete (but not yet commit) the unwanted parent row without any locking implications thanks to this index on the FK column:

SQL> delete bowie_dad where id = 1;

1 row deleted.

In a third session, we can insert another child record again with no locking implications, providing we don’t attempt to use the parent value the second session is in the process of deleting:

SQL> insert into bowie_kid values (1000002, 'LOW', 3);

1 row created.

But if we do try to insert a new child row with a FK value for which the parent is in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

The statement hangs and it will do so until the transaction deleting the parent record commits (in which case it will receive an ORA-02291 integrity constraint error) or the transaction rolls back (in which case the insert will succeed).

If we take a fresh dump of the first leaf block (which must contain the associated index entry as it’s the minimum value now in the table):

 Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0x0008.004.00000b8a  0x01431602.01c5.14  —-    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: ——-, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 80 7f 38 00 00
row#1[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#2[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03

 

We notice we indeed do have a new index entry (highlighted above), with all the associated locking information in ITL slot 2 for the new row in which the session is locked. So the key point here is that the index is indeed updated and Oracle can proceed or not depending on what happens with the transaction on the parent table. The overhead of this new index entry is minimal and locking can be easily policed and restricted to just the index entries with this specific value (hex) C1 02 which corresponds to DAD_ID = 1.

If we do indeed proceed with the delete on the parent table:

SQL> commit;

Commit complete.

 

The session attempting to insert the now deleted parent FK value indeed fails:

 

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);
insert into bowie_kid values (1000002, 'HEROES', 1)
*
ERROR at line 1:
ORA-02291: integrity constraint (BOWIE.BOWIE_KID_FK) violated - parent key not
found

 

And we notice with a fresh block dump that the index entry has been removed by the now unlocked session:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06

Everything is back to the way it was previously.

 

OK, let’s now re-insert the parent row, drop the FK index and replace it with a Bitmap Index instead:

 

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> commit;

Commit complete.

SQL> drop index bowie_kid_fk_i;

Index dropped.

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

If we take a look at a partial block dump of the first leaf block of this Bitmap Index:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

….

 

We notice the first key difference here in that these Bitmap Index entries are potentially HUGE, with just the 2 index entries in this block. The other thing to note is the combination of Bitmap indexes and DMLs can result in locking hell because if an index entry needs to be modified (resulting in a change in the compressed bitmap string), all rows between the rowid ranges specified within the Bitmap Index entry are effectively locked. So Bitmap Indexes introduce severe locking issues, regardless of the Parent/Child update issue highlighted above.

If we insert a child row in one session:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

And in another session insert another row with the same FK value:

SQL> insert into bowie_kid values (1000002, 'HEROES', 4);

The session hangs until the transaction in the first session completes because of the locking implications introduced with the Bitmap Index.

 

Therefore, with a Bitmap Index in place, the last of our worries will be locking issues associated with deleting a parent row. After rolling back the above, we attempt the following. In one session, we insert a child record:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we delete the unwanted parent row:

SQL> delete bowie_dad where id = 1;

and it hangs. The Bitmap Index is not effective in preventing this lock as it was with the B-Tree Index.

In a third session, we attempt to insert a child row with the soon to be deleted parent key:

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);

and it hangs as well. So the Bitmap Index on the FK does not prevent the locking hell such parent deletes can introduce into our environments.

If we roll all this back and simply have one session delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

And in another session insert a child row with the FK about to be deleted, the insert hangs as expected with an exclusive transaction lock:

SQL> insert into bowie_kid values (1000001, 'BOWIE', 1);

 

However, if we look at a fresh partial block dump of the first Bitmap Index leaf block:

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

…..

 

Unlike the B-Tree index which was updated, the Bitmap index has remained unchanged. No attempt was made by Oracle at this stage to insert the index entry as such a new Bitmap Index entry would likely generate too much overheads and not appreciably reduce the locking implications of these DML statements with these Bitmap Indexes in place anyways. The actual index update is delayed until such as change is possible with the rollback of the parent deletion.

However, in a third session, an insert into the child table with a FK that’s not to be deleted is successful:

SQL> insert into bowie_kid values (1000002, 'BOWIE', 4);

1 row created.

Bitmap indexes are simply not designed with concurrency in mind and have efficiencies that make it easier for single sessions to load data in Data Warehouses environments where they are indeed suitable.

One advantage of the Bitmap index is that at least Oracle doesn’t have to perform a FTS on the (potentially huge) child table when checking for the existence of any associated child FK values. Oracle can quickly use the index to determine whether the parent delete can proceed or not. If we roll everything back and just attempt to delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

       
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
          3  consistent gets
          0  physical reads
        676  redo size
        862  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice at just 3 consistent gets, the potentially expensive FTS on the child table has been avoided. Drop the Bitmap index and the FTS must be performed to ensure no current FK values would violate the constraint when the parent row is deleted:

SQL> drop index bowie_kid_fk_i;

Index dropped.

   
SQL> delete bowie_dad where id = 1;

1 row deleted.

    
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
       3629  consistent gets
          0  physical reads
        676  redo size
        863  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice without the Bitmap Index in place, we are now performing many more (3629) consistent gets due to the necessary FTS.

So using a Bitmap Index to police a FK constraint doesn’t reduce the locking implications associated with deleting parent rows (with Bitmap indexes, we have locking hell regardless if there’s much DML) but it does at least reduce the overheads of checking the associated child table.

IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012

Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.
7 comments

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of  row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT :)

Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK  index. However, with an IOT Secondary index, we must store the PK columns again. In fact, we have to re-store the PK columns again for as many IOT Secondary indexes we have defined for the IOT.

So the overall additional overheads we’re talking about here becomes a product of two important factors. The actual overall size of the PK column(s) and the number of Secondary Indexes we have defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.

If we look at the size of the IOT Secondary Index I created in my previous introductory post:

SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728

If however we compare this with a secondary index associated with a conventional heap table containing identical data:

SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171

We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.

If we compare block dumps of the same index entry from both Secondary Indexes:

row#0[8016] flag: K—–, lock: 0, len=20
col 0; len 2; (2):  c1 06
col 1; len 3; (3):  c2 15 16
col 2; len 2; (2):  c1 5f
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 f1

Above is the IOT Secondary Index example, which is 20 bytes in length.

row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 01 68 7a 00 b4

Above is the Heap Table Secondary Index example, which is only 12 bytes in length.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.

Obviously, the larger the PK, the greater the associated overheads. Obviously, the greater the number of IOT Secondary indexes, again the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a lit bit cleverer. Following, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):

SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

row#0[8020] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#1[8004] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#2[7988] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5

So we have 3 index entries listed here. In each one:

col 0 represents  the indexed column (COUNTRY_ID) which happens to be part of the PK

col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)

col 0 (with a length of 4) represents the physical “guess”.

So Oracle still has defined within the index entry the full PK to access the IOT as necessary if the “guess” proves to be wrong.

The key points to take from this post is to fully consider the consequences of a large PK on any defined secondary index on an IOT and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes to be defined on the table.

More on this “guess” component in my next post …

Indexed Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.
14 comments

Man, its been ages since I had free time to update the blog, what with birthday parties to organise, Roger Water concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem, the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK and the necessary data can be efficiently accessed.

But what if we want to access the data efficiently via Non-PK columns or without specify the leading column of the PK ? Can we create secondary indexes on a IOT ?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure, that being what to do when indexing rows that potentially move around all the time ?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few examples of when this occurs, for example updating the partitioned column of a row such that it needs to be stored in another partition. This is recognised as a rather expensive thing to do as not only do at least two blocks need to be accessed and modified but it also requires associated indexes to be updated as well. As such, it generally requires explicitly allowing such activities to occur (by enabling row movement and the such). Note, when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row which points to the new block/location of the row.

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if  Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

  • The indexed column values
  • The PK columns of the associated IOT
  • A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the  row within the IOT. If it finds the required row in the IOT, great. The index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained with the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similar to that of a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK but at least this has the potential to be much more efficient that a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is  not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK which is also stored within the secondary index is used as a backup method of accessing the required row.

If we start with a traditionally simple little demo, let’s first create and populate an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now run a query based on the non-PK TOTAL_SALES column:

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive FAST FULL INDEX SCAN, even though it correctly knows relatively few rows are to be retrieved.

If we create a secondary index on the IOT however:

SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is accessed subsequently not via Index ROWIDs but by a INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:

row#0[8015] flag: K—–, lock: 0, len=21
col 0; len 3; (3):  c2 1f 28
col 1; len 3; (3):  c2 15 37
col 2; len 2; (2):  c1 1b
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 da

col 0 represents the indexed value (TOTAL_SALES)

col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)

Following the 3 byte table header overhead required for the “guess”, we have the second col 0, which represents the 4 byte  “guess” to the last known physical location of the row.

Much more to follow shortly …

Index Organized Tables – PCTTHRESHOLD (The Wedding Song) February 8, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, PCTTHRESHOLD.
7 comments

I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog :)

In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.

In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.

PCTTHRESHOLD gives us more flexibility in what is actually stored within the IOT index structure by storing  the non-PK columns up to the INCLUDING clause within the IOT but only if the row length to be stored inside the IOT is below a specified percentage threshold of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT but only if the resultant row size is less than 5% of the blocksize. If a row size were to be greater than the specified percentage threshold of the block size, then any non-PK columns that would violate this length threshold would not be included within the IOT and stored instead within the Overflow segment.

In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize:

SQL> CREATE TABLE album_sales_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING description PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         if mod(c,2) = 1 then
  5              INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6         else INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A short description');
  7         end if;
  8      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the size of the resultant IOT:

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      2176          3          5       2052

The IOT is only of a moderate size, with 5 branch blocks and 2,052 leaf blocks.

If we look at the size of the Overflow segment:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_IOT';

 OBJECT_ID
----------
     74209

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74209';

TABLE_NAME         IOT_NAME         IOT_TYPE         BLOCKS
------------------ ---------------- ------------ ----------
SYS_IOT_OVER_74209 ALBUM_SALES_IOT  IOT_OVERFLOW      35715

We see that the vast majority of the storage is still allocated to the Overflow segment, at 35,715 blocks in size.

If look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 461972060=0x1b89225c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 1
kdxcoopc 0x97: opcode=7: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 251
kdxcofbo 538=0x21a
kdxcofeo 561=0x231
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21053971=0x1414213
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[561] flag: K—S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01811901.0
col  0: [ 5]  c4 04 57 1d 44
row#1[584] flag: K—S-, lock: 2, len=36
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 28 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 4]  c3 1d 2a 2e
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
row#2[620] flag: K—S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01811901.1
col  0: [ 5]  c4 04 22 2d 07
row#3[643] flag: K—S-, lock: 2, len=37
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 05
tl: 29 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 36 17 52
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e

We notice the leaf block contains 251 row entries. Half the rows with a Description of 19 bytes have the Description value stored within the IOT leaf block, while the other half of rows with the larger Description values contain a nrid that refers to the corresponding Description within the Overflow segment.

If we analyze the table:

SQL> ANALYZE TABLE album_sales_iot COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_IOT';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_IOT                    500000     250000

We notice that only half the rows are now “chained rows”.

If we run a query that only references the rows with a small Description that are stored within the IOT structure:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=0;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=0)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       2211  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

The query at 7 consistent gets is relatively efficient as all the required data can be found within the IOT.

If however we run a query that references the larger Description rows:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=1;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4147  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

We see that it’s less efficient at 18 consistent gets as it needs to obviously access a larger volume of data and requires additional I/Os to access the corresponding Overflow segment.

So, with a combination of the INCLUDING and PCTTHRESHOLD clauses, one can control what data is and is not included within the IOT index structure.

Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
3 comments

In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …

In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK are in many ways the “key” column(s) in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns and for that reason alone, it’s probably good practice to consistently define the PK columns in this manner.

However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. Therefore, it’s somewhat inefficient to then have the PK columns the first columns defined in the table as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.

I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.

The same however can not really be said for IOTs and I would strongly recommend defining the PK columns first in an IOT …

I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2;

Table created.

So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.

OK, let’s populate this table with the same data we used previously:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we describe this table, we get the expected listing:


SQL> desc album_sales_iot
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 TOTAL_SALES                                        NUMBER
 ALBUM_ID                                  NOT NULL NUMBER
 COUNTRY_ID                                NOT NULL NUMBER
 DESCRIPTION                                        VARCHAR2(1000)

With the columns listed in the order as we defined them in the table.

If we query the column details from dba_tab_columns:

SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID COLUMN_NAME
---------- ------------------------------
         1 TOTAL_SALES
         2 ALBUM_ID
         3 COUNTRY_ID
         4 DESCRIPTION

We again find the column order is as we defined them in the table.

When we run the same query we ran last time that returned the data with 5 consistent gets:

SQL> set arraysize 100
SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1300 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       2394  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???

The first clue on what’s going on here can be found by looking at dba_tab_cols:

SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME
---------- ----------------- ------------------------------
         1                 3 TOTAL_SALES
         2                 1 ALBUM_ID
         3                 2 COUNTRY_ID
         4                 4 DESCRIPTION

The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.

As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, following by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT table where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.

This now has consequences on the INCLUDING clause if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as it’s a PK column, Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately the INCLUDING clause is only applied after the re-ordering of the columns and as such, the total_sales column which is now logically listed third and now after the country_id column, is not therefore actually included in the IOT index structure as (perhaps) intended.

A partial block dump of an IOT leaf block will confirm his:

Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K—–, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 9 fb: –H-F— lb: 0x0  cc: 0
nrid:  0x01811911.0
row#1[896] flag: K—–, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 9 fb: –H-F— lb: 0x0  cc: 0
nrid:  0x01811911.1

As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).

As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.

My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.

Jonathan Lewis, a great source of information on indexes and Oracle in general has previously discussed this same IOT Trap on his blog.

Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
14 comments

In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.

I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:

SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Sorry for the unimaginative manner of loading the description field but you get the point :)

OK, let’s have a look at the size of the IOT:

SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     71680          3        116      71429

As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.

The application doesn’t generally access the Description column with the following query typical (Note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):

SQL> set arraysize 100
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 521866300

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    17   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK |   100 |  1300 |    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       2387  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The query requires 19 consistent gets to retrieve the 100 rows because even though the data is extremely well clustered, there are very few rows per leaf block.

If we look at a partial block dump of one of these IOT leaf blocks:

Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K—–, lock: 0, len=1004
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 996 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 05 3b 03
col  1: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.

Considering the Description column is so large and/or that it’s rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself.

Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns when the specific values might be too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we’re back to having a separate “table” like structure, but the Overflow segment will only hold those columns that we don’t necessarily want to store within the index structure. Unlike a normal Heap table, in which all columns are stored within the table segment.

There are a number of different methods we could use (to be explored further in future posts), for now I’ll use the INCLUDING clause:

SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

Table created.

So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case the Description column) will be store in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.

If we now populate this table with the identical data as before:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      1664          3          4       1613

We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.

If we now re-run our query:

SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 |  1300 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       2390  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Things are now much more efficient, having reduced the required consistent gets down from 19 to just 5 consistent gets.

If we now look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K—–, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01800081.0
col  0: [ 5]  c4 02 5e 0d 25
row#1[726] flag: K—–, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01800081.1
col  0: [ 5]  c4 04 41 13 43

We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of the index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have a nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. We only therefore have a table column count of 1 (cc:1).

To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2';

 OBJECT_ID
----------
     74116

This OBJECT_ID is used to name the corresponding Overflow segment which we can determine from DBA_TABLES as it has a format of SYS_IOT_OVER_object_id:

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116';

TABLE_NAME         IOT_NAME                 IOT_TYPE      BLOCKS
------------------ ------------------------ ------------ -------
SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW   71430

We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.

Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:

SQL> select * from SYS_IOT_OVER_74116;
select * from SYS_IOT_OVER_74116
              *
ERROR at line 1:
ORA-25191: cannot reference overflow table of an index-organized table

If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:

Block header dump:  0x01800081
 Object id on Block? Y
 seg/obj: 0x12185  csc: 0x00.17482cc  itc: 1  flg: –  typ: 1 – DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.0000a919  0x00c24a2e.03d2.2a  C—    0  scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
     76543210
flag=——–
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0x486
tosp=0x486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: —–L– lb: 0x0  cc: 1
col  0: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.

The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.

Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the (say) nrid:  0x01800081.0 within the IOT index entry for the row. This in turns points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say) 0x1bda to the actual location of the Description value within the Overflow block. Easy !!

If we Analyze the IOT table:

SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_DETAILS_IOT2           500000     500000

We notice all the rows are listed as “Chained Rows“. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.

However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:

SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 | 99400 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 | 99400 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       5541  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The above query which returns the Description column results in the consistent gets increasing to 32 consistent gets, from the 5 consistent gets when the Description wasn’t accessed and from the 19 consistent gets from when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed while the frequently executed queries which don’t access the Description column are more efficient.

The Overflow segment gives us in a manner “the best of both worlds”. The ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns) and those that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it’s an index and table relationship except the table doesn’t have to store again the columns that are already stored within the index.

It’s all good news so far for IOTs …

Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012

Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.
14 comments

Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.

As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.

To start, let’s cover a very basic little example.

Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note using an 8K blocksize in a MSSM tablespace):

SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20),
  2  CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id));

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c IN 1..100 LOOP
  4        INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We have a natural Primary Key that consists of two columns and an additional two columns of information.

Let’s look at some basic sizing information on the table and associated Primary Key index:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
      1570            0

SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1152          3       1062

So the table segment consists of 1570 blocks and the index segment 1152, with a total of 1062 leaf blocks.

OK, let’s run a basic query looking for all albums with an album_id=42:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3244723662

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   100 |  1800 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALBUM_SALES    |   100 |  1800 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ALBUM_SALES_PK |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4084  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.

However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.

We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:

SQL> ALTER TABLE album_sales DROP PRIMARY KEY;

Table altered.

SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS;

Index created.

SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id);

Table altered.

OK, so the index now contains all the columns in the table and is now used to police the PK constraint:

SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK';

CONSTRAINT_NAME                C INDEX_NAME
------------------------------ - ------------------------------
ALBUM_SALES_PK                 P ALBUM_SALES_PK_I

Let’s now look at the size of the index:

SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      2048          5       2006

OK, as expected the index is now somewhat larger as it now needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with leaf blocks increasing from 1062  to 2006 leaf blocks.

If we now re-run the query:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1126128764

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_PK_I |   100 |  1800 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       3568  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice things have indeed improved and we have reduced the number consistent gets from 18 down to just 11. Not a bad improvement !!

If look at a partial block dump of one of the index leaf blocks:

Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ——, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 12
col 2; len 5; (5):  c4 04 15 31 59
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b7
row#1[7984] flag: ——, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 13
col 2; len 5; (5):  c4 03 19 2c 3d
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b8

We notice that each leaf entry is 26 bytes in length. The length of the four columns adds up to 13 bytes. The remaining 13 bytes is basically overhead required for each index entry:

2 bytes for flag and lock information in the index entry header

5 x 1 byte for each of the length bytes for each column

6 bytes for the 5th index column which is the index rowid

So that’s 13 bytes of overhead per index entry in this example index.

Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.

Wouldn’t it be nice if we can somehow just have the index, but without the underlining table. Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlining table. The index structure itself is the table by which we can store and retrieve the necessary data.

OK, let’s now create a new version of this table with the same data, but this time as an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sals number, album_colour varchar2(20),
     CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4        INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

The key clause is here ORGANIZATION INDEX. I’ll discuss other options and syntax in coming posts.

If we look now at the table segment:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables
  2  WHERE table_name = 'ALBUM_SALES_IOT';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
                        IOT

We see there is an IOT segment listed but consists of no blocks as it doesn’t physically exist …

If we look at the size of the corresponding index:

SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes
  2  WHERE table_name = 'ALBUM_SALES_IOT';

INDEX_NAME           TABLE_NAME       BLEVEL LEAF_BLOCKS
-------------------- --------------- ------- -----------
ALBUM_SALES_IOT_PK   ALBUM_SALES_IOT       2        1550

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1664          4       1550

We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.

If we take a look at a partial block dump of a leaf block from the IOT:

Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K—S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 49
tl: 14 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 2f 10 59
col  1: [ 4]  47 4f 4c 44
row#1[732] flag: K—S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 4a
tl: 14 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 03 01 03 46
col  1: [ 4]  47 4f 4c 44

Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.

The structure of the index entry is somewhat different here. It basically consists of:

2 bytes for lock and flag info in the index header as previously

Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.

Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).

Followed finally by the 2 Non-PK columns and their corresponding length bytes.

Note the big missing component here from the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there’s no corresponding table to point down to …

So the overall overhead has been reduced to:

2 byes for the index header

3 bytes for the table header

4 bytes for the 4 column lengths

for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.

So we have saved 1570 table blocks and 384 index blocks in total.

If we now re-run the same query:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1800 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)

 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       3575  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …

Enough to start with for now and yes the pun in the title is fully intended :)

Index Block Dumps: Final Demo (Come Together) November 4, 2010

Posted by Richard Foote in Block Dumps, Leaf Blocks, Oracle Indexes.
1 comment so far

The intent of this blog piece is just to bring together the whole discussion of block dumps and how we can use block dumps to demonstrate Oracle behaviour.

First, let’s start with a fresh little demo, creating an index on a NAME column with 500 entries (note this specific demo uses an 11.2.0.1 database running on windows). The column all have a value of ‘BOWIE’ with a distinct number concatenated on the end.

 

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> create index bowie_name_i on bowie(name);

Index created.

SQL> insert into bowie select rownum, 'BOWIE' || rownum from dual connect by level <= 500;

500 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We notice this index is a blevel 1 index, consisting of a root block pointing down to just 2 leaf blocks:


SQL> select blevel, leaf_blocks from dba_indexes where index_name = 'BOWIE_NAME_I';

    BLEVEL LEAF_BLOCKS
---------- -----------
         1           2

 

I’m just going to show selected portions from the different block dumps, focusing on the dump from disk section (hence flush the buffer cache before each block dump):

SQL> alter system flush buffer_cache;

System altered.

SQL> select header_file, header_block from dba_segments where segment_name='BOWIE_NAME_I';

HEADER_FILE HEADER_BLOCK
----------- ------------
          6          168

  

The specific block of interest will be the second (or last) index leaf block, so I just add 3 to the header block value (note index is in a non ASSM LMT):

SQL> alter system dump datafile 6 block 171;

System altered.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0x0000.003bb7e9 seq: 0x01 flg: 0x04 tail: 0xb7e90601
frmt: 0x02 chkval: 0x285e type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0x00.3bb7e9  itc: 2  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0x0000.003bb7e1
0x02   0x0003.017.00000d3e  0x00c049a3.021a.03  C—    0  scn 0x0000.003bb7e3
Leaf block dump
===============
header address 211493468=0xc9b225c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 300
kdxcofbo 636=0x27c
kdxcofeo 2722=0xaa2
kdxcoavs 2086
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8036
row#0[4414] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4431] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4449] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31
col 1; len 6; (6):  01 80 00 a1 01 18

We currently have 2 ITL entries in the index leaf block, the first entry used by Oracle to deal with the leaf block split required when loading the data, the second entry for the actual transaction loading the table/index. The kdxcronro count is 300 meaning we currently have 300 index entries in this block. Note the kdxlenxt value is 0 meaning there is no next pointer, ensuring we are indeed looking at the second (or last) index leaf block within the index structure. We’re now going to add a couple of new index entries that will have greater values than all our BOWIEs guaranteeing they’ll be inserted into this leaf block. We’re going to do this by running a couple of separate concurrent transactions running in different sessions:

In one session:

SQL> insert into bowie values (501, 'MAJOR TOM');

1 row created.

In another session:

SQL> insert into bowie values (502, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

 
Back in the first session:

SQL> commit;

Commit complete.

 
So there were 2 concurrent transactions inserting index entries, with the transaction inserting the value “MAJOR TOM” committing last. Looking at a dump of the index block now:

     
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0x0000.003bb95e seq: 0x01 flg: 0x06 tail: 0xb95e0601
frmt: 0x02 chkval: 0x1f40 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0x0000.003bb7e1
0x02   0x0004.016.00000d65  0x00c00e88.029f.03  –U-    1  fsc 0x0000.003bb95e
0x03   0x0007.00a.00000d5c  0x00c02578.0261.02  –U-    1  fsc 0x0000.003bb95a
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 302
kdxcofbo 640=0x280
kdxcofeo 2655=0xa5f
kdxcoavs 2015
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31


row#300[2679] flag: ——, lock: 2, len=19
col 0; len 9; (9):  4d 41 4a 4f 52 20 54 4f 4d
col 1; len 6; (6):  01 80 00 a2 00 55
row#301[2655] flag: ——, lock: 3, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

     
We notice we now have an additional ITL entry. The first entry is reserved for Oracle service operations (such as block splits). The second entry was therefore grabbed by the first transaction (which inserted “MAJOR TOM”) while a new third ITL entry had to be added to accommodate the second concurrent transaction. At the bottom of the block we can see the 2 new index entries, one currently marked as locked by the transaction in ITL 2 and the other entry containing “ZIGGY STARDUST” locked by the second transaction in ITL 3. These lock bytes (which are no longer required as the transactions have now completed) will be subsequently cleaned out as we shall see …

As the transaction in ITL 2 was the last to commit, its corresponding Scn/fsc (0x0000.003bb95e) is the last transaction to have changed the block and hence is also stored in the block header (scn: 0x0000.003bb95e).

Let’s now add another index entry:

SQL> insert into bowie values (503, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0x0000.003c327c seq: 0x02 flg: 0x06 tail: 0x327c0602
frmt: 0x02 chkval: 0xb367 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0x00.3c327b  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0x0000.003bb7e1
0x02   0x0004.016.00000d65  0x00c00e88.029f.03  C—    0  scn 0x0000.003bb95e
0x03   0x0003.010.00000d6c  0x00c015a9.0221.08  –U-    1  fsc 0x0000.003c327c
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 303
kdxcofbo 642=0x282
kdxcofeo 2630=0xa46
kdxcoavs 1988
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31
col 1; len 6; (6):  01 80 00 a1 01 18


row#300[2679] flag: ——, lock: 0, len=19
col 0; len 9; (9):  4d 41 4a 4f 52 20 54 4f 4d
col 1; len 6; (6):  01 80 00 a2 00 55
row#301[2630] flag: ——, lock: 3, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#302[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

We notice the previous lock information has now been cleaned out with only this last transaction (reusing the ITL entry of the previously oldest transaction, ITL 3) now having a lock byte set for its corresponding row (“THIN WHITE DUKE”). This transaction’s scn/fsc (0x0000.003c327c) is now the scn marking the block header.

Let’s delete a few rows now, firstly the row containing “MAJOR TOM”:

SQL> delete bowie where name = 'MAJOR TOM';

1 row deleted.

SQL> commit;

Commit complete.

 

And now all the rows that start with BOWIE as a separate transaction:

SQL> delete bowie where name like 'BOWIE%';

500 rows deleted.

SQL> commit;

Commit complete.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0x0000.003c3e8a seq: 0x01 flg: 0x06 tail: 0x3e8a0601
frmt: 0x02 chkval: 0x139e type: 0x06=trans data

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0x00.3c3e85  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0x0000.003bb7e1
0x02   0x0005.01a.00000d73  0x00c011bf.0268.05  C-U-    0  scn 0x0000.003c3b42
0x03   0x0004.01f.00000d72  0x00c01f0a.02a1.25  –U-  300  fsc 0x171a.003c3e8a
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 302
kdxcofbo 640=0x280
kdxcofeo 2630=0xa46
kdxcoavs 2009
kdxlespl 0
kdxlende 300
kdxlenxt 0=0x0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: —D–, lock: 3, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: —D–, lock: 3, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: —D–, lock: 3, len=18


row#299[7995] flag: —D–, lock: 3, len=17
col 0; len 7; (7):  42 4f 57 49 45 39 39
col 1; len 6; (6):  01 80 00 a1 00 62
row#300[2630] flag: ——, lock: 0, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#301[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56

  
The first transaction used the now oldest ITL slot 2. The second transaction then went on to use ITL slot 3, cleaning out the lock information of the first transaction in ITL 2. It deleted all 300 index entries within the block starting with BOWIE, marking them all as deleted with the D flag in all the index entries and with a 3 lock byte set. Note however the index entry for MAJOR TOM as deleted in the first transaction has already been physically removed from the leaf block …

Again, the transaction in ITL 3 being the last transaction now has its scn/fsc (0x171a.003c3e8a) in the block header (scn: 0x0000.003c3e8a).

Let’s add a couple new rows with 2 transactions to cycle through both ITL entries …

SQL> insert into bowie values (504, 'DAVID JONES');

1 row created.

SQL> commit;

Commit complete.

SQL> insert into bowie values (505, 'SCREAMING LORD BYRON');

1 row created.

SQL> commit;

Commit complete.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0x0000.003c42b0 seq: 0x02 flg: 0x06 tail: 0x42b00602
frmt: 0x02 chkval: 0x0191 type: 0x06=trans data

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0x00.3c42ae  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0x0000.003bb7e1
0x02   0x0009.001.00000d80  0x00c044f5.029a.03  C—    0  scn 0x0000.003c418d
0x03   0x0001.013.00000e05  0x00c0423b.0267.02  –U-    1  fsc 0x0000.003c42b0
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 4
kdxcofbo 44=0x2c
kdxcofeo 2579=0xa13
kdxcoavs 7868
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[2609] flag: ——, lock: 0, len=21
col 0; len 11; (11):  44 41 56 49 44 20 4a 4f 4e 45 53
col 1; len 6; (6):  01 80 00 a2 00 55
row#1[2579] flag: ——, lock: 3, len=30
col 0; len 20; (20):  53 43 52 45 41 4d 49 4e 47 20 4c 4f 52 44 20 42 59 52 4f 4e
col 1; len 6; (6):  01 80 00 a2 00 58
row#2[2630] flag: ——, lock: 0, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#3[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

      
We now notice all the 300 BOWIE entries have now been physically cleaned out of the block as well, cleaned out as part of the block changes required for these final transactions. The leaf block now only contains these 4 index entries, as shown with a kdxconro 4.  The last transaction (inserting “SCREAMING LORD BYRON”) using ITL 3 is the only transaction with its lock byte still set and has its scn/fsc (0x0000.003c42b0) in the block header (scn: 0x0000.003c42b0).

So each concurrent transaction within the index block requires an ITL entry (and Oracle will add them as necessary providing there’s sufficient free space within the block). A transaction will not only make its necessary changes, locking just those index entries associated with the transaction but will also clean out data from previous transactions if present (including index entries marked as deleted by a previous transaction). Finally, it will generally stamp the block header with the corresponding transaction scn.

Hopefully, this highlights how block dumps can be useful to both see and demonstrated Oracle behaviour.

Next, time to look at a number of 11g index related new features …

Index Block Dump: Index Only Section Part I (TVC 15) August 10, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes.
2 comments

Having already covered general block header details relevant to several different types of Oracle blocks (Block Dumps Part I and Part II), the next part of the block dump is relevant only to index blocks.

Below is a dump of the index only section of an index leaf block dump:

Leaf block dump
===============
header address 200514140=0xbf39a5c
kdxcolev 0
KDXCOLEV Flags = – – -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 3
kdxcofbo 42=0x2a
kdxcofeo 7987=0x1f33
kdxcoavs 7945
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036

 
First a note identifies the block as being a Leaf block dump or a Branch block dump.

All index block dumps (whether an index leaf or branch) have the details listed above up to and including kdxcoavs. Specific points of interest for details included in any index block dump:

kdxcolev 0 – index level (a value of 0 denotes this as a leaf block)

kdxconco 2 – number of columns in the index. Note: although the index was only defined on the single NAME column, the index actually includes the rowid as an additional column, as the index was defined as Non-Unique (hence the value 2). All Oracle indexes are effectively unique as Oracle makes them so by adding the rowid to the column list for all Non-Unique defined indexes.

kdxconro 3 – number of index entries in the index block (this block dump was taken when the index only had the 3 entries, prior to the insertion of the additional rows in Part II in this series)

kdxcofbo 42=0x2a – offset to the beginning of free space within the block

kdxcofeo 7987=0x1f33 – offset to the end of free space within the block. Note index entries get added “from the bottom” of the index free space and so it’s from this offset that any subsequent index entries will be added.

kdxcoavs 7945 – available free space within the block (effectively the space between kdxcofbo and kdxcofbe)

The following details are only included if the block is an index leaf block.

kdxlende 0 – number of index entries that have been marked as deleted within the block (this will be discussed more fully later). When an index entry is deleted (or indeed updated which is effectively a delete of an index entry followed by an insert), the entry is not physically deleted but is only marked as deleted. Generally these deleted index entries are subsequently cleaned out by Oracle but kdxlende keeps a count of those deleted index entries that have yet to be cleaned out.

kdxlenxt 0=0x0 – pointer to the next index leaf block within the logical index structure (as this is the only leaf block within the index, there is no pointer set although this is usually only unset for the last or “right most” leaf block within the index structure). During a larger index range scan, all the required index entries may not be found within a single index leaf block and Oracle may need to quickly find the next logical leaf block within the index structure. This pointer basically contains the Relative Block Address (RBA) of the next logical leaf block which can be accessed directly as necessary during such index range scans.

kdxleprv 0=0x0 – pointer to the previous index leaf block within the logical index structure (as this is the first and only leaf block within the index, again there is no pointer set although this is usually only unset for the first or “left most” leaf block within the index structure). Again, during a larger index range scan in which the data is required in descending order, all the required index entries may not be found within a single index leaf block and Oracle may need to quickly find the previous logical leaf block within the index structure. This pointer basically contains the RBA of the previous logical leaf block which can be accessed directly as necessary. As such, it’s used by Oracle for index range scans that require the data to be extracted in logically reverse order (eg. to avoid an order by desc). Note only a leaf block associated with a blevel 0 index has neither the kdxlenxt or kdxleprv assigned.

These two pointers provide the linked list mechanism for all such index range scan operations involving more than one leaf block. Note that a branch block doesn’t have such pointers between branch blocks but it does have a pointer to the first block it references below itself in the index structure (kdxbrlmc).

kdxlebksz 8036 – actual maximum useable space within the index block (basically the block size less the block header “overheads”). Note that an index branch block has an equivalent value called kdxbrbksz which typically has more useable space as a branch block only has the one ITL entry by default.

  
Next, we’ll look at the index entries themselves.

Index Block Dump: Block Header Part II and Read Consistency (I Can’t Read) July 28, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Read Consistency.
3 comments

OK, let’s look at the next portion of the index block dump.
 
Following the hex dump of the block (as we ended Part I of the series) is the second part of the block header (see below):   

Block header dump:  0x0201490a
 Object id on Block? Y
 seg/obj: 0x1c205  csc: 0x00.2d11214  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02d11214

  

The following details are listed initially:
 
0x0201490a – rdba again
seg/obj: 0x1c205 – data_object_id again
csc: 0x00.2d11214 – initial scn of block
itc: 2 – number of Interested Transaction List (ITL) entries in the block
typ: 2 – INDEX  – denotes this specific block as indeed an Index block
 
Then follows the Interested Transaction List (Itl) which is used by Oracle to store critical information of any transaction making changes within the block. An index leaf block is basically assigned 2 ITL entries by default (but only 1 if it’s an index Branch block). However the number of pre-assigned ITL entries is set by the INITRANS physical attribute parameter of the index. 

A ITL entry basically consists of:
 
A unique slot ID (starting with 01)
Transaction ID
Undo Block Address (Uba) which denotes the Undo segment assigned to the specific transaction
Flag and Lock Bytes
System Change Number (SCN) 
 

Any transaction wishing to make a change to the block must first be allocated to an ITL entry, with the first ITL entry (01) reserved by Oracle for internal recursive operations. If there’s an ITL entry that’s not being used by a current transaction, great, a new transaction will simply use a free ITL entry. However, if all ITL slots are already assigned to current transactions within the index block, then an additional ITL entry is created and allocated to the new transaction by Oracle, providing there’s sufficient free space within the block or the 255 limit hasn’t been reached (or the MAXTRANS parameter is not reached in older versions of Oracle).
 
As a simple demo to see this in action, I’m going to insert two additional rows into the table (and so create two new index entries in the index): 

 
SQL> insert into bowie values (4, 'THIN WHITE DUKE');
 
1 row created.
 
SQL> insert into bowie values (5, 'ALADDIN SANE');
 
1 row created.
 
SQL> commit;
 
Commit complete.

         
If we now dump the index block and look at the ITL slots:
 


Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x000a.006.000065d4  0x00c00441.2fb5.09  --U-    2  fsc 0x0000.02d120c8

   

We see the transaction has used ITL slot number 02 as relevant details are now assigned.
 
Points of interest includes the Flag is now set to U (Unclean) and that the Lock attribute is assigned the value 2. This means the transaction has been committed but the index block has not been cleaned out and is currently “unclean” in that the 2 new index rows associated with the transaction still have the lock byte set (as we’ll see in a later post).
 
The other important piece of information is the value of the SCN within the ITL (0x0000.02d120c8). The SCN is basically a timestamp that denotes when the transaction made the associated changes to the index block. As this is the last transaction to have changed the block, this SCN effectively denotes the point of time when the block was last changed.
 
If we now also look at the start of the block dump and the block header details discussed in Part I of this series:
 

Start dump data blocks tsn: 8 file#:8 minblk 84234 maxblk 84234
Block dump from cache:
Dump of buffer cache at level 4 for tsn=8, rdba=33638666
BH (0x157E78EC) file#: 8 rdba: 0x0201490a (8/84234) class: 1 ba: 0x1548C000
  set: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 0 lid: 0x00000000,0x00000000
  dbwrid: 0 obj: 115205 objn: 115205 tsn: 8 afn: 8
  hash: [0x292CCE8C,0x292CCE8C] lru: [0x19FFB3BC,0x2930A050]
  obj-flags: object_ckpt_list
  ckptq: [0x1A3EA500,0x18FF0A90] fileq: [0x293178F0,0x19FFB398] objq: [0x2183AA3C,0x2183AA3C]
  st: XCURRENT md: NULL tch: 2
  flags: buffer_dirty redo_since_read gotten_in_current_mode
  LRBA: [0x627.e9dd.0] LSCN: [0x0.2d120c8] HSCN: [0x0.2d120c8] HSUB: [1]
  cr pin refcnt: 0 sh pin refcnt: 0
  buffer tsn: 8 rdba: 0x0201490a (8/84234)
  scn: 0x0000.02d120c8 seq: 0x03 flg: 0x02 tail: 0x20c80603
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
 

We notice the header has also been changed and now has the SCN from this transaction stamped in the header. Therefore any process needing this index block to be at a specific point in time (say a Select statement during a consistent read operation) can now quickly check the block header to see “when” it was last changed.
 
If the block has indeed changed since the (say) Select statement started, it need now only check the ITL slots to see which transaction has matching SCN details and so find the specific transaction that last changed the index block. Knowing the transaction in question, Oracle can then reference the Undo Block Address in the ITL entry and so determine the specific undo segment that contains details of the previous state of the block and rollback all changes made by this transaction and create a new consistent image of the block as it was prior to the transaction making the block changes. The Select statement can now check the previous SCN in the header and repeat the same process as necessary until the index block is at a point in time prior to the Select statement starting.
 
In short,  basically how consistent reads are implemented in Oracle.

Everything I’ve discussed so far in Parts I & II are just as applicable to tables (and other segments) as they are to indexes. In the next post, we’ll look at the next portion of an index block dump which is unique to index segments …

Index Block Dump: Block Header Part I (Editions Of You) July 20, 2010

Posted by Richard Foote in 11g, Block Dumps, Index Internals, Oracle Indexes.
1 comment so far

I’ve previously looked at how to generate an Oracle block dump, time to now go into a little more detail.

As I mentioned, a block dump is a formatted representation of the actual contents of an Oracle block. Producing strategic block dumps can be an extremely useful method of determining what might be going on in Oracle under the covers and over the years I’ve found them to be an invaluable aid in helping me understand Oracle behaviour, troubleshoot issues, investigate the contents of corrupted blocks, etc. The focus in this series will be Oracle block dumps from the perspective of indexes, although the contents any Oracle block can be dumped and investigated.
 
To setup the demo, I’m going to initially create a simple little table and associated index that has only 3 rows to begin with. As the index is so tiny, all the contents can fit within the one index leaf block resulting in an index with a blevel of 0 (height of 1). Note I’m using a 11.1.0.6.0 database running on Windows for this specific demo. The actual format and content of a block dump differs between releases and continually changes. However, much of the useful content which I’ll focus on remains relatively consistent.
 

SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie values (1, 'BOWIE');
 
1 row created.
 
SQL> insert into bowie values (2, 'ZIGGY');
 
1 row created.
 
SQL> insert into bowie values (3, 'MAJOR TOM');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_name_i on bowie(name);
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks from dba_indexes where index_name = 'BOWIE_NAME_I';
 
INDEX_NAME          BLEVEL LEAF_BLOCKS
--------------- ---------- -----------
BOWIE_NAME_I             0           1

 

OK, as this is a simple blevel 0 index, all the contents of the index can be found in the block immediately following the index segment header.
 

SQL> select header_file, header_block from dba_segments where segment_name='BOWIE_NAME_I';
 
HEADER_FILE HEADER_BLOCK
----------- ------------
          8        84233

 

So the block we want to dump is the block in datafile 8, block 84234 (1 more than the segment header block). Before we dump the block though, let’s just note a few other points about this index. Firstly, given we know the datafile and block of interest, we can determine its relative data block address (rdba) by plugging in these details into the dbms_utility.make_data_block_address function:

SQL> select dbms_utility.make_data_block_address(8, 84234) from dual;
 
DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(8,84234)
---------------------------------------------
                                     33638666

  
We’ll note the rdba of 33638666 for future reference. The other thing we’ll just record is the Object ID of the index:
 

SQL> select object_id, data_object_id from dba_objects where object_name = 'BOWIE_NAME_I';
 
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
    115205         115205

 

Again, we’ll note an Object_Id and Data_Object_Id of 115205 for future reference. Finally, I’m just going to flush the buffer cache to ensure the current contents of the block is written to disk. This is useful in 11g when the block dump differentiates between the block contents in the buffer cache and on disk.

SQL> alter system flush buffer_cache;
 
System altered.

 

OK, let’s now dump the block associated with this index leaf block:
 

SQL> alter system dump datafile 8 block 84234;
 
System altered.

 

Below is just the block header portion of the resultant block dump:
 

Start dump data blocks tsn: 8 file#:8 minblk 84234 maxblk 84234
Block dump from cache:
Dump of buffer cache at level 4 for tsn=8, rdba=33638666
BH (0x17BF3D8C) file#: 8 rdba: 0x0201490a (8/84234) class: 1 ba: 0x17A70000
  set: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 0 lid: 0x00000000,0x00000000
  dbwrid: 0 obj: 115205 objn: 115205 tsn: 8 afn: 8
  hash: [0x292CCE8C,0x15BFB42C] lru: [0x167EA1EC,0x15BFB48C]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL]
  st: FREE md: NULL tch: 0
  flags:
  cr pin refcnt: 0 sh pin refcnt: 0
  Buffer contents not dumped
Block dump from disk:
buffer tsn: 8 rdba: 0x0201490a (8/84234)
scn: 0x0000.02d11215 seq: 0x01 flg: 0x04 tail: 0x12150601
frmt: 0x02 chkval: 0x9ae6 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0BF39A00 to 0x0BF3BA00

BF39A00 0000A206 0201490A 02D11215 04010000  [.....I..........]
BF39A10 00009AE6 001A0002 0001C205 02D11214  [................]
BF39A20 1FE80000 00021F02 00000000 00000000  [................]
BF39A30 00000000 00000000 00000000 00000000  [................]
BF39A40 00000000 0000FFFF 00000000 00000000  [................]
 
… Snip rest of memory dump …
 

As I mentioned, the actual format and content of a block dump differs between releases and continually changes so your formatted block dump may differ somewhat. However, the main points which I’ll discuss should be found in most currently supported versions of Oracle. This post should only be considered as a basic introduction on the subject. More depth and details to come.

As this is an 11g block dump, the block header consists of three distinct sections:

1) Dump of the buffer cache details associated with the index block

2) Dump of index block from disk

3) Full raw hex dump of the associated block
 

The first thing to check is that one is looking at the correct block dump and that the correct segment block was indeed dumped. The start of the formated block dump states the details of the dumped block(s):
 
file#:8 minblk 84234 maxblk 84234
 
so indeed, we’re looking at the correct dump file. These details are also listed along with the hex representation of relative block address (rdba) information:
 
file#: 8 rdba: 0x0201490a (8/84234)
 
and we can also confirm that the rdba is also correct and consistent with the block we have dumped:
 
rdba=33638666
 
as this matches the results of the rdba from the rdba as previously displayed via the MAKE_DATA_BLOCK_ADDRESS function.
 
We can also confirm that the correct index segment was dumped as the object id of the index:
 
obj: 115205 objn: 115205
 
matches the data_object_id and object_id from dba_objects as listed previously.
 

Other details of interest found in block dumps in most Oracle versions include:
 
scn: 0x0000.02d11215 - The System Change Number (SCN) of the block when it was last modified (I’ll show you how this gets populated in a later post). This is the effective “timestamp” of the block and is used by Oracle to determine crucial information such as effectively “when” the block was last modified and by “what” transaction.
 
type: 0x06=trans data - denotes the type of block. 06 represents a transactional block (table/index/cluster)
 
seq: 0x01 - number of the block changes within the the current scn (we’ll see how this increments in a later post)
 
tail: 0x12150601 – which consists of the last 2 bytes of the SCN, type and seq
 
frmt: 0x02 - denotes the block format with 02 representing a post Oracle8 block format. An A2 block (as shown in the raw block dump) denotes a post 10g block format.
 
chkval: 0x9ae6 – checksum value of block as used by Oracle in part to check the consistency and validity of the block 
 

In later versions of Oracle (10g and beyond), the block dump includes a full hex dump of the associated memory buffer. I’ve snipped most of this in the above dump extract. However, one can see where the details of the block I’ve listed above can be found within the memory dump (hopefully, the colour code will help to highlight where each distinct piece of information can be found). Note also in 11g and beyond, more details of the buffer cache are listed and detailed as defined in the buffer cache section.
 
I’ll look at the following portion of the index block header, the Interested Transaction Slots, in the next post. We’ve only just begun …

Bitmap Index Degradation After DML Prior To 10g (Beauty and the Beast) May 25, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Index Internals, Oracle Indexes.
5 comments

Bitmap Indexes have a bad reputation with regard to being problematic and suffering from severe degradation following heavy DML operations, especially larger amounts of insert activity. Bitmap indexes have been known to grow substantially and require periodic rebuilds following such insert activity.

While this was certainly the case in previous versions of Oracle, things have dramatically improved since version 10g. I thought it might be worthwhile explaining why Bitmap Indexes had such issues prior to 10g and how things are now so much better, in many cases making Bitmap index rebuilds unnecessary.

To start with, a demo on an Oracle 9.2.0.7 database. I’m first going to create a simple little table with 1M rows, with a Bitmap index on a CODE column with 1,000 distinct values. The values are effectively inserted into the table in an evenly distributed manner throughout the entire table.

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

We notice that the Bitmap index has 1000 index entries, one for each index value.  There are 500 leaf blocks which means we can only fit 2 index entries in each leaf block on average. If we look at a partial block dump of the first leaf block:

row#0[5013] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 cc 0c 00 e0
col 2; len 6; (6):  02 02 d6 2d 01 7f
col 3; len 2998; (2998):
03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...
row#1[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

I’ve listed just the first portions of each of the 2 index entries in the leaf block. Each indexed value is different (col 0)  and each have a bitmap string of 2998 bytes in size (col3). Therefore in an 8K block, Oracle can’t indeed fit a 3rd index entry in a block, hence why there are only 2 index entries per block. The bitmap string column is quite large because in a 1M row table, there are 1000 occurences of each indexed value littered throughout the table that need to be referenced.

OK, next I’m going to insert just 1 additional row. It has a CODE value of 1, which is the minimum existing CODE value and so should be indexed in the first index leaf block. Let’s see what impact this has on the index statistics:

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

The crucial point here is that although the value (1) already existed within the index, Oracle has nonetheless created an additional index entry (1001 NUM_ROWS up from 1000). If look at a partial dump of the leaf block now:

row#1[1973] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 1; (1):  00
row#2[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

We notice there is indeed a new index entry (row#1) which only has a rowid range (col 1 and col 2) of just 8 rows (between 02 02 d6 2e 00 00 and 02 02 d6 2e 00 07). So rather than somehow modify the existing index entry, Oracle has created a new index entry with a rowid range of just 8 consecutive rowids.

This means we should now be able to add an additional 7 rows with a CODE value of 1, which providing they can all fit within the same table block, will be associated with this allocated rowid range.

SQL> begin
2  for i in 1..7 loop
3  insert into bowie values (1, 1, 'ZIGGY');
4  commit;
5  end loop;
6  end;
7  /

PL/SQL procedure successfully completed.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

Indeed, after inserting these additional rows, no new index entries were required as all the new rows fall within the rowid range of the new index entry. But add one more row …

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1002

Indeed, we now have yet another additional index entry (1002 NUM_ROWS up from 1001).  If we now look at a partial block dump of the index leaf block:

row#1[1819] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 2; (2):  c8 ff
row#2[1798] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 08
col 2; len 6; (6):  02 02 d6 2e 00 0f
col 3; len 1; (1):  00
row#3[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 

We notice we  have yet another index entry (row#2) covering another narrow 8 row range.

Basically prior to 10g, if a new index value didn’t fit within an existing bitmap index rowid range for the corresponding indexed value, a new index entry is added and all the overheads associated with it, with a default range of just 8 rowids. This is an extremely costly process as the new index entry not only has to store the index value itself again, but additionally 2 rowids and corresponding length bytes and the necessary bitmap string column as well. Additionally, in all likelihood, the rows in the specific rowid range are quite likely to contain differing values in the index columns and so will in turn require new index entries. I’ve just inserted a whole bunch of CODE values of 1 in the above example. In a randomly inserted column value, this might be quite a rare event as each subsequent indexed value is quite likely to differ for each new consecutive row.

If we create a table and index first and then populate the table with the 1 million rows, the number of index entries and size of index itself becomes huge …

SQL> create table radiohead (id number, code number, name char(5));

Table created.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> begin
2  for i in 1..1000 loop
3    for j in 1..1000 loop
4     insert into radiohead values (j, j, 'BOWIE');
5     commit;
6    end loop;
7  end loop;
8  end;
9  /

PL/SQL procedure successfully completed.

SQL> analyze index radiohead_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2        5347    1000000

Because the clustering is so bad, each new row for a given index value isn’t within an existing 8 rowid window and so each and every row in the table requires its own bitmap index entry. As a result, the Bitmap index is huge at 5,347 leaf blocks and has a full1M NUM_ROWS. A partital block dump shows just how inefficient this index is:

row#0[4273] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8a 00 00
col 2; len 6; (6):  02 02 d8 8a 00 07
col 3; len 1; (1):  00
row#1[4294] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8c 00 e0
col 2; len 6; (6):  02 02 d8 8c 00 e7
col 3; len 1; (1):  01
row#2[4315] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8f 00 40
col 2; len 6; (6):  02 02 d8 8f 00 47
col 3; len 1; (1):  04
row#3[4336] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 91 01 20
col 2; len 6; (6):  02 02 d8 91 01 27
col 3; len 1; (1):  05
row#4[4357] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 94 00 88
col 2; len 6; (6):  02 02 d8 94 00 8f
col 3; len 1; (1):  00

We can see that each and every indexed value (col 1) in this partial leaf block dump is the same (hex c1 02) with each only having a rowid range (col 1 and col 2) that covers the bare minimum 8 row range. It’s about as inefficient a Bitmap index as you can get with 1000 distinct values in a 1M row table.

If we now however rebuild the Bitmap index:

SQL> alter index radiohead_code_i rebuild compute statistics;

Index altered.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        1         500       1000

We get back to our nice, efficient Bitmap Index structure with just the bare minimum 1000 index entries, all fitting into a fraction of the index leaf blocks (just 500 down from 5,347).

Fortunately, Oracle has got a lot more cleverer since 10g with regard to how it maintains its Bitmap Index structures during DML as I’ll cover soon in Part II.

Concatenated Bitmap Indexes Part I (Two Of Us) May 6, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Clustering Factor, Concatenated Indexes, Oracle Indexes.
6 comments

Although Bitmap Indexes are commonly created on one column, you can create multi-column, concatenated Bitmap indexes as well.
 
Many of the same issues and factors in deciding to create a single, multi-column index vs. several, single column indexes apply to Bitmap indexes as they do with B-Tree indexes, although there are a number of key differences to consider as well.
 
The first difference to note is that a bitmap index doesn’t have as many index entries as there are rows in the table (with not null values), as with B-Tree indexes. A Bitmap index can potentially just have only as many index entries as there are distinct values for the indexed columns. This is one of the main reasons why Bitmap indexes can be considerably smaller than equivalent B-Tree indexes. A newly created Bitmap index only needs to have multiple index entries for the same column value if the associated index entry is greater than 1/2 a block size. If an index entry were to be larger than 1/2 a block size, Oracle creates another Bitmap index “piece” for the same indexed value, with a bitmap column covering a different range of rowids. (Note: additional Bitmap index pieces can be created based on subsequent DML, especially in earlier versions of Oracle. To be discussed at a later point in time).
 
Another thing to note regarding a concatenated Bitmap index is that the potential number of index entries is a product of distinct combinations of data of the indexed columns. For example, if two columns have 100 distinct values each, then as separate Bitmap indexes, they potentially may have as few as 100 index entries each. However, when combined as a concatenated Bitmap index, there may be a minimum of just 100 index entries only if there are just 100 different combinations of data between the columns (ie. there is a 1 to 1 relationship between column values). If there are however say 100 x 100 = 10,000 maximum combinations of data, there will be 10,000 index entries as a minimum in the associated concatenated Bitmap index.
 
A key point though is that if there are many more combinations of data when stored in a concatenated index, the occurrence of each distinct value will be far less within the table, meaning there will be many more “0” (not true) values in the corresponding bitmap column for each index entry and so can be compressed more effectively and likely use substantially less space than a corresponding index entry in a single column Bitmap index.
 
A simple little example to illustrate. To start, I’m going to create a 1M row table that has an ID and a CODE Colum, each with 100 distinct values. 

In this first example, there’s an implicit 1 to 1 relationship between these 2 columns (eg. they always have the same corresponding values) such that there’s also only 100 distinct combinations of ID and CODE. Additionally, the values are distributed evenly throughout the table so the effective clustering of the data in relation to the index is awful.
  

SQL> create table bowie as select mod(rownum,100)+1 id, mod(rownum,100)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_1_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_2_i on bowie(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_3_i on bowie(id,code) pctfree 0;
 
Index created.

  

If we look at the size and characteristics of these indexes we notice a couple of interesting points:  

 
SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE_1_I', 'BOWIE_2_I', 'BOWIE_3_I');

INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE_1_I          200        400
BOWIE_2_I          200        400
BOWIE_3_I          200        400

   
Firstly, even though there are only 100 distinct values for each indexed column or columns, there are actually 400 index entries in these indexes. This means there are on average 4 Bitmap index pieces for each distinct indexed value. Oracle can’t fit the associated index entry for a specific value within 1/2 a block (in this example, the block size is 8k). In fact, it takes approximately two 8K index leaf blocks it fit all the index data for a specific value and it therefore requires 4 Bitmap index pieces per indexed value.
 
If we look at a partial block dump of a leaf block from say the BOWIE_1_I:
 

  
row#0[4089] flag: ------, lock: 0, len=3947
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b5 09 00 60
col 2; len 6; (6):  01 c2 b8 f3 00 3f
col 3; len 3926; (3926):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
... 
   
row#1[142] flag: ------, lock: 0, len=3946
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b8 f3 00 a0
col 2; len 6; (6):  02 03 62 5d 00 1f
col 3; len 3925; (3925):
 01 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7
 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c5 18 61 5d 61 c3 1b 5f 63 5f c1
 ...
 

 

We notice there are actually 2 index entries within the block. Each index entry is for the same indexed value (both col 0 have identical values) but because the associated bitmap column (col 3) is so large (3926 and 3925 byes respectively), we need 4 index entries on average to store the bitmap data for each specific indexed value in order for each piece to not exceed the 1/2 block size limit.
 
Remember, for 100 distinct values in a 1M row table, that’s approximately 10,000 occurrences for each distinct value that need to somehow be mapped within the index. Oracle needs 4 index entries per value to fit the necessary bitmap information with the index such that no single index entry exceeds the 1/2 block size limit.  

The next thing to note is that the size of the concatenated Bitmap index is actually about the same size of each of the individual single column Bitmap index (200 leaf blocks). Therefore, the overall storage required to store the two columns in one Bitmap is only half of that required to store the columns as separate Bitmap indexes.
 
If we look at a partial dump of the concatenated index:

    
row#0[4091] flag: ------, lock: 0, len=3945
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 b5 09 00 60
col 3; len 6; (6):  01 c2 b8 f2 00 57
col 4; len 3921; (3921):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
...
 

 

We notice the overall length of an index entry is practically identical to those of the single column index. The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length. Considering the length of the bitmap column (col 4) is in the order of 3920 bytes for each index entry, an additional 3 bytes here or there is trivial and so doesn’t impact the overall size of the Bitmap index.  

OK, let’s look at a slightly different example. The table is again 1M rows with the overall data being similar. However, I’m making a few subtle differences. Firstly, the data for the ID is actually perfectly clustered and is ordered in exactly the same manner as the index. Additionally, the distribution of data between the columns is such that there are now 100 x 100 = 10,000 distinct combinations of ID and CODE values.
 
  

 
SQL> create table bowie2 (id number, code number, name char(5));
 
Table created.
 
SQL> begin
  2  for i in 1..100 loop
  3    for x in 1..100 loop
  4      for y in 1..100 loop
  5        insert into bowie2 values (x, y, 'BOWIE');
  6      end loop;
  7    end loop;
  8  end loop;
  9  commit;
 10  end;
 11  /
PL/SQL procedure successfully completed.
 

SQL> create bitmap index bowie2_1_i on bowie2(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_2_i on bowie2(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;
 
Index created.

        

If we look at the size and characteristics of the these Bitmap indexes:
     

SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE2_1_I', 'BOWIE2_2_I', 'BOWIE2_3_I');
 
INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE2_1_I          25        100
BOWIE2_2_I         200        400
BOWIE2_3_I         417      10000

    
   
We see a number of key differences when compared to the Bitmap indexes in the first example. Firstly, the Bitmap index for the ID column is tiny, just 25 leaf blocks compared to the 200 leaf blocks required previously. Additionally, there are only 100 index entries, rather than the 400 previous index entries. This is due to the fact the data is perfectly clustered within the table and as such, all the “1”s (true) are all grouped together and all the “0”s (false) are likewise grouped together and can be compressed very efficiently. The overall size of the bitmap has now reduced such that it can all fit comfortably within 1/2 a block and so just the 1 index entry is necessary for each indexed value.
 
If we look at a partial block dump of the ID Bitmap index:

  
   

 
row#0[6218] flag: ------, lock: 0, len=1818
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 c6 16 00 d8
col 2; len 6; (6):  02 03 78 18 01 3f
col 3; len 1797; (1797):
 cf f0 ff ff ff ff ff ff ff cc ff ff ff ff ff fc de 10 80 ff ff ff 07 ff 21
 ff ff ff ff ff ff ff ff c8 ff ff de 10 80 ff ff ff ff ff ff ff cd ff ff ff
 ff ff 07 ff de 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 07 f8 21 07 ff de
 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 3f ff fd 19 c0 ff ff ff ff ff ff
 ff cd ff ff ff ff ff 03 ff de 10 fe ff ff ff ff ff ff ff cc ff ff ff ff 1f
...

 
  

We see that the bitmap column (col 3) is now only 1797 bytes and the overall index entry is 1818 bytes, which comfortably fits within 1/2 an 8K block.
  

The Bitmap index for the CODE remains unchanged because its values are still poorly clustered and distributed throughout the entire table.
 
The concatenated Bitmap index is now significantly larger than is was previously (417 leaf blocks, up from 200 leaf blocks), primarily because we now have 10,000 distinct values to deal with rather than just 100. However, as the occurrences of each of these 10,000 distinct values is so much rarer (only 100 occurrences per distinct combination of values), the associated bitmap column is going to be relatively small and well compressed for each Bitmap index entry.
 
If we look at a complete index entry from a partial block dump of the concatenated Bitmap index:

  
  

 
row#0[7710] flag: ------, lock: 0, len=326
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 c6 16 00 d8
col 3; len 6; (6):  02 03 78 18 00 d7
col 4; len 302; (302):
 04 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c4 d8 10 c4 80 11
 c7 d8 10 c3 f8 19 c3 80 11 c6 d8 10 c1 d9 10 c1 80 11 c5 f7 19 c0 d9 10 c0
 80 11 c3 d8 10 c3 80 11 c2 d0 19 c2 80 11 c5 d8 10 c5 80 11 c0 d9 10 c4 f7
 19 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c1 80 11 c4 d8 10
 c7 d8 10 c0 87 09 c3 d8 10 c3 80 11 c6 d8 10 c6 80 11 c1 d9 10 c5 de 08 c5
 80 11 c0 d9 10 c0 80 11 c3 d8 10 c3 80 11 c0 a8 f4 ec bb 01 c3 d8 10 c6 d8
 10 c6 80 11 c1 d9 10 c1 80 11 c5 de e4 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80
 11 c6 d8 10 c7 86 09 c2 d9 10 c2 80 11 c5 d8 10 c0 d9 10 c0 80 11 c4 de 08
 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c5 d8 10 c6 86 09 c1 d9 10 c1 80 11 c4
 d8 10 c4 80 11 c7 d8 10 c0 87 09 c3 d8 10 c6 d8 10 c6 80 11 c1 d9 10 c1 80
 11 c5 de 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80 11 c6 d8 10 c7 86 09 c2 d9 10
 c2 80 11 c5 d8 10 c0 d9 10 c4 f7 19 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c6
 f7 19

 
  

We notice the index entries are relatively small at just 326 bytes even though we have 2 indexed columns (col 0 and col 1). The size of the bitmap column (col 4) is just 302 bytes, only a fraction of that of the single column Bitmap indexes. With so few 1s (true) to contend with, the resultant bitmap is far more efficiently compressed than with the single Bitmap column indexes as it has far more 0s (false) than can be effectively compressed.

A concatenated Bitmap index can potentially use less or more space than corresponding single column indexes, it depends on the number of index entries that are derived and the distribution of the data with the table.

I’ll post Part II in the next few days.

Index Block Dumps and Index Tree Dumps Part I: (Knock On Wood) February 8, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Tree Dumps.
15 comments

I thought before I jump into a topic that requires looking at a number of index block dumps, it might be worth briefly recapping how one goes about dumping index blocks in Oracle.
 
A block dump is simply a formatted representation of the contents of a particular Oracle database block.  Although I’ll be focusing specifically on index related blocks, any Oracle data block type can potentially be dumped and investigated.
 
The basic command to dump a specific block is:
 
ALTER SYSTEM DUMP DATAFILE 5 BLOCK 42;
 
where the block 42 in datafile 5 is dumped.
 
To dump a number of consecutive blocks with the one command you can also:
 
ALTER SYSTEM DUMP DATAFILE 5 BLOCK MIN 42 BLOCK MAX 50;
 
The resultant representation of the dumped block(s) are written to a trace file in the user_dump_dest directory.
 
Although these commands are not in the official Oracle documentation (the last time I had a real good look, it was only briefly mentioned in the Database Vault Administration Guide) and are not officially supported, there are enough references in Metalink/MOS and various writings for these commands to be widely known and used. I’ve been dumping the contents of Oracle blocks since the mid 1990’s and although they can sometimes take some time to decipher, I find them a vital source of information on determining how Oracle actually works under the covers.
 
From an index perspective, the question is how can one figure which specific blocks to dump for a given index. There are a couple of useful little tips. 

The first thing to point out with an index is that the critical Root Block of an index is always the block after the index segment header. This is always the case regardless of the database version, platform or type of tablespace. I’ve discussed how the index root block is always the block after the index segment header in this earlier post:
 
Therefore, to start exploring a specific index, we first find the root block details after the index segment header:

 
SQL> SELECT header_file, header_block+1 FROM dba_segments WHERE segment_name='BOWIE_I';
 
HEADER_FILE HEADER_BLOCK+1
----------- --------------
          7         219274

 
The following command will then dump the associated index root block:

 
SQL> ALTER SYSTEM DUMP DATAFILE 7 BLOCK 219274;
 
System altered.

As we’ll see in a later post, from the dump of the index root block, we can then find what other index blocks the root block points to and references.
 
However, another useful method of determining which index blocks might be worth dumping is to do a “treedump” of an index.
 
One first needs to find the object_id of the index in question:


 
SQL> SELECT object_id FROM dba_objects WHERE object_name = 'BOWIE_I';
 
 OBJECT_ID
----------
    106315

 
To then do a treedump of the index:


 
SQL> ALTER SESSION SET EVENTS 'immediate trace name treedump level 106315';
 
Session altered.

 
where level 106315 is the object_id of the index.
 

A partial listing from a treedump follows:
 
branch: 0x1c3588a 29579402 (0: nrow: 222, level: 1)
   leaf: 0x1c3588b 29579403 (-1: nrow: 485 rrow: 485)
   leaf: 0x1c3588c 29579404 (0: nrow: 479 rrow: 479)
   leaf: 0x1c3588d 29579405 (1: nrow: 479 rrow: 479)
   leaf: 0x1c3588e 29579406 (2: nrow: 479 rrow: 479)
   leaf: 0x1c3588f 29579407 (3: nrow: 479 rrow: 479)

A treedump simply lists each index block in the logical order of the index structure. Starting always with the index root block at the top, we notice that it’s simply listed as a branch (albeit a rather important one). The characters after the branch keyword represent a hex (0x1c3588a) and decimal (29579402) version of the Relative Block Address (RBA), which is used by Oracle to find the actual physical location of the block.  As there’s only ever the one root block, it starts from position 0, the nrow: 222 denotes the root block points to 222 distinct index blocks in the level below it and level 1 denotes this is a level 1 index (height 2) so the blocks below the root block must all be leaf blocks (there are no intermediate branch levels in this case).

The first leaf block listed is the first block being referenced within the parent root block and must therefore be the  “left-most” leaf block in the index structure. It has a RBA of hex (0x1c3588b) decimal(29579403), the -1 denotes it’s the first leaf block (as the counter starts at -1), the nrow: 485 denotes the leaf block has 485 index entries and the rrow: 485 denotes that 485 of the index entries are non-deleted entries (meaning there are no deleted index entries in this specific leaf block).

The next leaf block in the treedump corresponds to the second block (number 0) referenced in the root block and is the second left-hand most leaf block in the index structure, followed by its specific details. The third leaf block (number 1) in the treedump is the third leaf block in the index structure and so on for all 222 leaf blocks in the index (the last leaf block numbered 220).

The RBA of any of these blocks in the treedump can be then used to determine which block of interest to block dump. The DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK functions can be used to covert the RBA into the corresponding DATAFILE ID and BLOCK ID in which to dump the block.

For example, to determine the DATAFILE and BLOCK of the third leaf block in the index:


SQL> SELECT DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(29579405),
  2         DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(29579405)
  3  FROM dual;

DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(29579405)
----------------------------------------------
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(29579405)
-----------------------------------------------
                                             7
                                         219277

So now we can confirm the index block of interest here is specifically located at datafile 7, block 219277.

OK, we now have the necessary basics to start block dumping a few index blocks and having a look at what they might show us.

Follow

Get every new post delivered to your Inbox.

Join 1,861 other followers