
Indexing Foreign Key Constraints With Bitmap Indexes (Locked Out) April 17, 2014

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Foreign Keys, Index Internals, Oracle Indexes.
2 comments

Franck Pachot made a very valid comment in my previous entry on Indexing Foreign Keys (FK) that the use of a Bitmap Index on the FK columns does not avoid the table locks associated with deleting rows from the parent table. Thought I might discuss why this is the case and why only a B-Tree index does the trick.

Let’s first setup some very simple Parent-Child tables:

SQL> create table bowie_dad (id number, dad_name varchar2(30));

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST');

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

OK, so we have a small parent table (BOWIE_DAD) and a much larger child table (BOWIE_KID) with all the necessary constraints in place. Note we don’t actually have a child row with a FK DAD_ID = 1, so we can potentially delete that row from the BOWIE_DAD table (where ID = 1).

Let’s begin by creating a B-Tree index on the FK column (DAD_ID) and have a look at a partial block dump of the first leaf block in the index:

SQL> create index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.
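(As an aside, and not something shown in the original post: if you want to generate this sort of dump yourself, the following sketch is one way to go about it. The object id, datafile and block numbers are purely illustrative; a treedump of your own index will list its actual branch and leaf blocks.)

-- Find the object_id of the index, then request a treedump of it
-- (the treedump "level" is the object_id)
SELECT object_id FROM user_objects WHERE object_name = 'BOWIE_KID_FK_I';

ALTER SESSION SET EVENTS 'immediate trace name treedump level 94987';

-- Dump the leaf block of interest into the session trace file
-- (datafile and block numbers below are illustrative only)
ALTER SYSTEM DUMP DATAFILE 6 BLOCK 28412;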

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06
…..

 

We’ll compare future block dumps with this one, but for now just note that the first index entry has a value of (hex) C1 03, which corresponds to DAD_ID = 2, the minimum value we currently have in this table/index.
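(If you want to confirm the mapping between a number and its internal representation for yourself, the DUMP function shows it. For example, the value 2 should come back as the two bytes C1 03:)

SQL> select dump(2, 16) from dual;

DUMP(2,16)
-------------------------
Typ=2 Len=2: c1,3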

If we insert a new child record in one session (but don’t yet commit):

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we can delete (but not yet commit) the unwanted parent row without any locking implications thanks to this index on the FK column:

SQL> delete bowie_dad where id = 1;

1 row deleted.

In a third session, we can insert another child record, again with no locking implications, provided we don’t attempt to use the parent value the second session is in the process of deleting:

SQL> insert into bowie_kid values (1000002, 'LOW', 3);

1 row created.

But if we do try to insert a new child row with a FK value for which the parent is in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

The statement hangs, and it will do so until the transaction deleting the parent record either commits (in which case the insert will receive an ORA-02291 integrity constraint error) or rolls back (in which case the insert will succeed).
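(While the insert is hanging, the blocking relationship can be confirmed from yet another session. A minimal sketch, assuming you have access to V$SESSION; the waiting insert will typically show up waiting on an enq: TX style event:)

SELECT sid, blocking_session, event, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;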

If we take a fresh dump of the first leaf block (which must contain the associated index entry, as DAD_ID = 1 is now the minimum value in the table):

 Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x0008.004.00000b8a  0x01431602.01c5.14  ----    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: -------, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 80 7f 38 00 00
row#1[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#2[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03

 

We notice we do indeed have a new index entry (row#0 above, with lock: 2), with the associated locking information in ITL slot 2 for the new row belonging to the now waiting session. So the key point here is that the index is indeed updated and Oracle can proceed or not depending on what happens with the transaction on the parent table. The overhead of this new index entry is minimal and the locking can easily be policed and restricted to just the index entries with this specific value, (hex) C1 02, which corresponds to DAD_ID = 1.

If we do indeed proceed with the delete on the parent table:

SQL> commit;

Commit complete.

 

The session attempting to insert the now deleted parent FK value indeed fails:

 

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);
insert into bowie_kid values (1000003, 'HEROES', 1)
*
ERROR at line 1:
ORA-02291: integrity constraint (BOWIE.BOWIE_KID_FK) violated - parent key not
found

 

And we notice with a fresh block dump that the index entry has been removed by the now unblocked session:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: -------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06

Everything is back to the way it was previously.

 

OK, let’s now re-insert the parent row, drop the FK index and replace it with a Bitmap Index instead:

 

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> commit;

Commit complete.

SQL> drop index bowie_kid_fk_i;

Index dropped.

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

If we take a look at a partial block dump of the first leaf block of this Bitmap Index:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: -------, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

….

 

We notice the first key difference here: these Bitmap Index entries are potentially HUGE, with just the 2 index entries in this block. The other thing to note is that the combination of Bitmap Indexes and DML can result in locking hell, because if an index entry needs to be modified (resulting in a change to the compressed bitmap string), all rows within the rowid range specified by the Bitmap Index entry are effectively locked. So Bitmap Indexes introduce severe locking issues, regardless of the Parent/Child update issue highlighted above.

If we insert a child row in one session:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

And in another session insert another row with the same FK value:

SQL> insert into bowie_kid values (1000002, 'HEROES', 4);

The session hangs until the transaction in the first session completes because of the locking implications introduced with the Bitmap Index.

 

Therefore, with a Bitmap Index in place, the last of our worries will be locking issues associated with deleting a parent row. After rolling back the above, we attempt the following. In one session, we insert a child record:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we delete the unwanted parent row:

SQL> delete bowie_dad where id = 1;

and it hangs. The Bitmap Index is not effective in preventing this lock in the manner the B-Tree Index was.

In a third session, we attempt to insert a child row with the soon to be deleted parent key:

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);

and it hangs as well.

A fresh partial block dump of the first Bitmap Index leaf block:

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: -------, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

…..

 

This shows that the index has remained unchanged. No attempt was made by Oracle to insert the index entry, as such a new Bitmap Index entry would likely generate too much overhead and would not appreciably reduce the locking implications of these DML statements with these Bitmap Indexes in place anyway.

One advantage of the Bitmap Index is that at least Oracle doesn’t have to perform a Full Table Scan (FTS) on the (potentially huge) child table when checking for the existence of any associated child FK values. Oracle can quickly use the index to determine whether the parent delete can proceed or not. If we roll everything back and just attempt to delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

       
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
          3  consistent gets
          0  physical reads
        676  redo size
        862  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice that at just 3 consistent gets, the potentially expensive FTS on the child table has been avoided. Drop the Bitmap Index, and the FTS must be performed to ensure no current FK values would violate the constraint when the parent row is deleted:

SQL> drop index bowie_kid_fk_i;

Index dropped.

   
SQL> delete bowie_dad where id = 1;

1 row deleted.

    
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
       3629  consistent gets
          0  physical reads
        676  redo size
        863  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice that without the Bitmap Index in place, we are now performing many more (3629) consistent gets due to the necessary FTS.

So using a Bitmap Index to police a FK constraint doesn’t reduce the locking implications associated with deleting parent rows (with Bitmap Indexes, we have locking hell regardless if there’s much DML), but it does at least reduce the overhead of checking the associated child table.
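(As a final aside, not part of the original post: a rough starting point for finding single-column FK constraints that have no index with a matching leading column. Multi-column FKs need a more careful column-by-column comparison, so treat this purely as a sketch.)

SELECT c.constraint_name, cc.table_name, cc.column_name
FROM   user_constraints c
       JOIN user_cons_columns cc ON cc.constraint_name = c.constraint_name
WHERE  c.constraint_type = 'R'
AND    NOT EXISTS (SELECT 1
                   FROM   user_ind_columns ic
                   WHERE  ic.table_name      = cc.table_name
                   AND    ic.column_name     = cc.column_name
                   AND    ic.column_position = 1);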

Indexed Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.
14 comments

Man, it’s been ages since I had free time to update the blog, what with birthday parties to organise, Roger Waters concerts to attend and Radiohead concerts in the planning!! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem: the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK index and the necessary data can be efficiently accessed.

But what if we want to access the data efficiently via non-PK columns, or without specifying the leading column of the PK? Can we create secondary indexes on an IOT?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure: what to do when indexing rows that potentially move around all the time?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few examples of when this occurs, one being an update to the partitioning column of a row such that it needs to be stored in another partition. This is recognised as a rather expensive thing to do, as not only do at least two blocks need to be accessed and modified, but associated indexes need to be updated as well. As such, it generally requires such activities to be explicitly allowed (by enabling row movement and the like). Note that when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row, which points to the new block/location of the row.
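(For illustration only; the partitioned table and column below are hypothetical, not objects from this demo. Without row movement explicitly enabled, an update that would change a row’s partition fails with ORA-14402:)

ALTER TABLE album_sales_by_region ENABLE ROW MOVEMENT;

-- This update can now physically relocate the row into a different partition
UPDATE album_sales_by_region
SET    sales_region = 'EUROPE'
WHERE  album_id = 42;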

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations can be relatively common in an IOT due to these 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

  • The indexed column values
  • The PK columns of the associated IOT
  • A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the row within the IOT. If it finds the required row there, great; the index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained within the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform, as it requires navigating down the branch structure of the IOT, but it is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similarly to a normal secondary index. In the worst case scenario, where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK, but at least this has the potential to be much more efficient than a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale, and the PK, which is also stored within the secondary index, is used as a backup method of accessing the required row.
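(A small aside, not from the original post: the staleness of these “guesses” can be gauged via the PCT_DIRECT_ACCESS column of USER_INDEXES once statistics have been gathered, and stale guesses can be refreshed without rebuilding the index. The index name below is the secondary index created a little further down.)

SELECT index_name, pct_direct_access
FROM   user_indexes
WHERE  table_name = 'ALBUM_SALES_IOT';

-- Refresh the stale guesses in place rather than rebuilding the index
ALTER INDEX album_sales_iot_total_sales_i UPDATE BLOCK REFERENCES;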

If we start with a traditionally simple little demo, let’s first create and populate an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now run a query based on the non-PK TOTAL_SALES column:

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive INDEX FAST FULL SCAN, even though it correctly knows relatively few rows are to be retrieved.

If we create a secondary index on the IOT however:

SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |     4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is subsequently accessed not via index ROWIDs but by an INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:

row#0[8015] flag: K-----, lock: 0, len=21
col 0; len 3; (3):  c2 1f 28
col 1; len 3; (3):  c2 15 37
col 2; len 2; (2):  c1 1b
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 da

col 0 represents the indexed value (TOTAL_SALES)

col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)

Following the 3 byte table header overhead required for the “guess” (the fb, lb and cc bytes), we have the second col 0, which represents the 4 byte “guess” pointing to the last known physical location of the row.

Much more to follow shortly …

Index Organized Tables – PCTTHRESHOLD (The Wedding Song) February 8, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, PCTTHRESHOLD.
7 comments

I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog :)

In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.

In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.

PCTTHRESHOLD gives us more flexibility in what is actually stored within the IOT index structure. The non-PK columns up to the INCLUDING clause are stored within the IOT, but only if the resultant row length to be stored inside the IOT is below a specified percentage threshold of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT, but only if the resultant row size is less than 5% of the blocksize. If a row size were to be greater than the specified percentage threshold of the block size, then any non-PK columns that would violate this length threshold would not be included within the IOT and would instead be stored within the Overflow segment.

In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize, which equates to a threshold of roughly 400 bytes per row:

SQL> CREATE TABLE album_sales_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING description PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         if mod(c,2) = 1 then
  5              INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6         else INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A short description');
  7         end if;
  8      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the size of the resultant IOT:

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      2176          3          5       2052

The IOT is only of a moderate size, with 5 branch blocks and 2,052 leaf blocks.

If we look at the size of the Overflow segment:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_IOT';

 OBJECT_ID
----------
     74209

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74209';

TABLE_NAME         IOT_NAME         IOT_TYPE         BLOCKS
------------------ ---------------- ------------ ----------
SYS_IOT_OVER_74209 ALBUM_SALES_IOT  IOT_OVERFLOW      35715

We see that the vast majority of the storage is still allocated to the Overflow segment, at 35,715 blocks in size.

If we look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 461972060=0x1b89225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x97: opcode=7: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 251
kdxcofbo 538=0x21a
kdxcofeo 561=0x231
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21053971=0x1414213
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[561] flag: K---S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01811901.0
col  0: [ 5]  c4 04 57 1d 44
row#1[584] flag: K---S-, lock: 2, len=36
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 28 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 4]  c3 1d 2a 2e
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
row#2[620] flag: K---S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01811901.1
col  0: [ 5]  c4 04 22 2d 07
row#3[643] flag: K---S-, lock: 2, len=37
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 05
tl: 29 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 36 17 52
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e

We notice the leaf block contains 251 row entries. Half the rows, with a Description of just 19 bytes, have the Description value stored within the IOT leaf block, while the other half, with the larger Description values, contain an nrid that refers to the corresponding Description within the Overflow segment.

If we analyze the table:

SQL> ANALYZE TABLE album_sales_iot COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_IOT';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_IOT                    500000     250000

We notice that only half the rows are now “chained rows”.

If we run a query that only references the rows with a small Description that are stored within the IOT structure:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=0;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=0)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       2211  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

The query at 7 consistent gets is relatively efficient as all the required data can be found within the IOT.

If however we run a query that references the larger Description rows:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=1;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4147  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

We see that it’s less efficient at 18 consistent gets, as it obviously needs to access a larger volume of data and requires additional I/Os to visit the corresponding Overflow segment.

So, with a combination of the INCLUDING and PCTTHRESHOLD clauses, one can control what data is and is not included within the IOT index structure.
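(A quick way to confirm these settings after the event, not shown in the original post: the PCTTHRESHOLD value and the column id of the last included column are exposed in USER_INDEXES against the IOT’s primary key index.)

SELECT index_name, pct_threshold, include_column
FROM   user_indexes
WHERE  index_name = 'ALBUM_SALES_IOT_PK';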

Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
3 comments

In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …

In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK columns are in many ways the “key” columns in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people intuitively expect the first columns in the table to be the PK columns, and for that reason alone it’s probably good practice to consistently define the PK columns in this manner.

However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. Therefore, it’s somewhat inefficient to then have the PK columns the first columns defined in the table as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.

I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.

The same however cannot really be said for IOTs, and I would strongly recommend defining the PK columns first in an IOT …

I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2;

Table created.

So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.

OK, let’s populate this table with the same data we used previously:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we describe this table, we get the expected listing:


SQL> desc album_sales_iot
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 TOTAL_SALES                                        NUMBER
 ALBUM_ID                                  NOT NULL NUMBER
 COUNTRY_ID                                NOT NULL NUMBER
 DESCRIPTION                                        VARCHAR2(1000)

The columns are listed in the order in which we defined them in the table.

If we query the column details from dba_tab_columns:

SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID COLUMN_NAME
---------- ------------------------------
         1 TOTAL_SALES
         2 ALBUM_ID
         3 COUNTRY_ID
         4 DESCRIPTION

We again find the column order is as we defined them in the table.

When we run the same query we ran last time, the query that previously returned the data with just 5 consistent gets:

SQL> set arraysize 100
SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1300 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       2394  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???

The first clue on what’s going on here can be found by looking at dba_tab_cols:

SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME
---------- ----------------- ------------------------------
         1                 3 TOTAL_SALES
         2                 1 ALBUM_ID
         3                 2 COUNTRY_ID
         4                 4 DESCRIPTION

The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.

As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, followed by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.

This has consequences for the INCLUDING clause, if specified. In the above table creation statement, the INCLUDING clause specified the COUNTRY_ID column. Although defined as the third column, it’s a PK column, so Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately, the INCLUDING clause is only applied after this re-ordering of the columns. As such, the TOTAL_SALES column, which is now logically listed third, after the COUNTRY_ID column, is not actually included in the IOT index structure as (perhaps) intended.

A partial block dump of an IOT leaf block confirms this:

Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K-----, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 9 fb: --H-F--- lb: 0x0  cc: 0
nrid:  0x01811911.0
row#1[896] flag: K-----, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 9 fb: --H-F--- lb: 0x0  cc: 0
nrid:  0x01811911.1

As we can see, the IOT row entries consist only of the PK columns and the row reference (nrid) to the corresponding Overflow segment. None of the non-PK columns (such as TOTAL_SALES) are co-located within the IOT segment, as the table column count is 0 (cc: 0).

As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.

My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.
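(To make that last point concrete, here’s a sketch of the same table definition with the non-PK TOTAL_SALES column named in the INCLUDING clause, so it survives the automatic re-ordering and remains within the IOT segment. This is my reading of the advice above rather than DDL taken from the original post.)

CREATE TABLE album_sales_iot(
  total_sales NUMBER,
  album_id    NUMBER,
  country_id  NUMBER,
  description VARCHAR2(1000),
  CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id))
ORGANIZATION INDEX
  INCLUDING total_sales
  OVERFLOW TABLESPACE bowie2;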

Jonathan Lewis, a great source of information on indexes and Oracle in general, has previously discussed this same IOT trap on his blog.

Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
12 comments

In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.

I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:

SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Sorry for the unimaginative manner of loading the description field but you get the point :)

OK, let’s have a look at the size of the IOT:

SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     71680          3        116      71429

As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.

The application doesn’t generally access the Description column, with the following query being typical (note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):

SQL> set arraysize 100
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 521866300

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    17   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK |   100 |  1300 |    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       2387  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The query requires 19 consistent gets to retrieve the 100 rows because even though the data is extremely well clustered, there are very few rows per leaf block.

If we look at a partial block dump of one of these IOT leaf blocks:

Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K-----, lock: 0, len=1004
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 996 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 05 3b 03
col  1: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.

Considering the Description column is so large and/or so rarely accessed, wouldn't it be nice if we didn't have to store this column directly within the IOT index structure itself?

Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don't necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just those occurrences when the specific values are too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we're back to having a separate "table" like structure, but unlike a normal Heap table, in which all columns are stored within the table segment, the Overflow segment only holds those columns that we don't necessarily want to store within the index structure.

There are a number of different methods we could use (to be explored further in future posts); for now, I'll use the INCLUDING clause:

SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

Table created.

So in the above example, all columns up to and "including" the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case the Description column) will be stored in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.
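
As an aside (not used in this demo), another method is the PCTTHRESHOLD clause, which instead pushes the trailing non-key columns of any row that would exceed the nominated percentage of an index block out to the Overflow segment. A rough sketch, using a hypothetical ALBUM_SALES_DETAILS_IOT3 table, might look like this:

SQL> CREATE TABLE album_sales_details_iot3(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk3 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;

With an 8K block, the large Description values would comfortably exceed 5% of the block and so would end up in the Overflow segment, much as with the INCLUDING clause above.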

If we now populate this table with the identical data as before:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      1664          3          4       1613

We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.

If we now re-run our query:

SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 |  1300 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       2390  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Things are now much more efficient, having reduced the required consistent gets from 19 down to just 5.

If we now look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K-----, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01800081.0
col  0: [ 5]  c4 02 5e 0d 25
row#1[726] flag: K-----, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01800081.1
col  0: [ 5]  c4 04 41 13 43

We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of each index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have an nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. The table portion of the index entry therefore has a column count of just 1 (cc: 1).

To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2';

 OBJECT_ID
----------
     74116

This OBJECT_ID is used to name the corresponding Overflow segment, which we can see in USER_TABLES (or DBA_TABLES) as it has the format SYS_IOT_OVER_object_id:

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116';

TABLE_NAME         IOT_NAME                 IOT_TYPE      BLOCKS
------------------ ------------------------ ------------ -------
SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW   71430

We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.
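
As an aside, rather than deriving the segment name from the OBJECT_ID, we could arguably have located the Overflow segment directly via its IOT_NAME (a quick sketch, output not shown in the original demo):

SQL> SELECT table_name, blocks FROM user_tables WHERE iot_name = 'ALBUM_SALES_DETAILS_IOT2' AND iot_type = 'IOT_OVERFLOW';

This should return the same SYS_IOT_OVER_74116 entry without the need for the USER_OBJECTS lookup.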

Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:

SQL> select * from SYS_IOT_OVER_74116;
select * from SYS_IOT_OVER_74116
              *
ERROR at line 1:
ORA-25191: cannot reference overflow table of an index-organized table

If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:

Block header dump:  0x01800081
 Object id on Block? Y
 seg/obj: 0x12185  csc: 0x00.17482cc  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.0000a919  0x00c24a2e.03d2.2a  C---    0  scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
     76543210
flag=--------
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0×486
tosp=0×486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: -----L-- lb: 0x0  cc: 1
col  0: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.

The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.

Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the nrid (say 0x01800081.0) within the IOT index entry for the row. This in turn points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (in this case 0x1bda) to the actual location of the Description value within the Overflow block. Easy !!
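
If you wanted to follow the pointer manually, the hex rdba within the nrid can be translated into a relative file and block number via DBMS_UTILITY (a sketch only, using the hex value from the dump above):

SQL> SELECT dbms_utility.data_block_address_file(to_number('01800081', 'XXXXXXXX')) rel_fno,
  2         dbms_utility.data_block_address_block(to_number('01800081', 'XXXXXXXX')) block_no
  3  FROM dual;

The resulting file and block could then be dumped with ALTER SYSTEM DUMP DATAFILE ... BLOCK ... to view the Overflow block directly.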

If we Analyze the IOT table:

SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_DETAILS_IOT2           500000     500000

We notice all the rows are listed as “Chained Rows“. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.

However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:

SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 | 99400 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 | 99400 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       5541  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The above query, which returns the Description column, increases the consistent gets to 32, up from the 5 consistent gets when the Description wasn't accessed and the 19 consistent gets when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn't frequently executed, while the frequently executed queries which don't access the Description column remain more efficient.

The Overflow segment therefore gives us, in a manner, "the best of both worlds": the ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns), while those columns that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it's an index and table relationship, except the table doesn't have to store again the columns that are already stored within the index.

It’s all good news so far for IOTs …

Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012

Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.
14 comments

Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.

As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.

To start, let’s cover a very basic little example.

Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note using an 8K blocksize in a MSSM tablespace):

SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20),
  2  CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id));

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c IN 1..100 LOOP
  4        INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We have a natural Primary Key that consists of two columns and an additional two columns of information.

Let’s look at some basic sizing information on the table and associated Primary Key index:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
      1570            0

SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1152          3       1062

So the table segment consists of 1570 blocks and the index segment 1152 blocks, of which 1062 are leaf blocks.

OK, let’s run a basic query looking for all albums with an album_id=42:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3244723662

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   100 |  1800 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALBUM_SALES    |   100 |  1800 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ALBUM_SALES_PK |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4084  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.
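
If you want to confirm the clustering for yourself, the Clustering Factor is visible in USER_INDEXES (a quick check, output not shown in the original demo):

SQL> SELECT index_name, clustering_factor, num_rows FROM user_indexes WHERE index_name = 'ALBUM_SALES_PK';

Because the rows were inserted in ALBUM_ID order, the Clustering Factor should be much closer to the number of table blocks than to the number of rows, which is why the 100 rows above can be retrieved with so few consistent gets.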

However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.

We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:

SQL> ALTER TABLE album_sales DROP PRIMARY KEY;

Table altered.

SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS;

Index created.

SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id);

Table altered.

OK, so the index now contains all the columns in the table and is now used to police the PK constraint:

SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK';

CONSTRAINT_NAME                C INDEX_NAME
------------------------------ - ------------------------------
ALBUM_SALES_PK                 P ALBUM_SALES_PK_I
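
Note that instead of relying on Oracle implicitly picking up the existing index when the constraint is re-added (as was done above), we could arguably have nominated the index explicitly with the USING INDEX clause, an alternative sketch being:

SQL> ALTER TABLE album_sales ADD CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id) USING INDEX album_sales_pk_i;

Either way, the overloaded (and Non-Unique) index ends up policing the PK constraint.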

Let’s now look at the size of the index:

SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      2048          5       2006

OK, as expected the index is now somewhat larger as it now needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with the leaf blocks increasing from 1062 to 2006.

If we now re-run the query:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1126128764

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_PK_I |   100 |  1800 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       3568  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice things have indeed improved and we have reduced the number of consistent gets from 18 down to just 11. Not a bad improvement !!

If we look at a partial block dump of one of the index leaf blocks:

Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ------, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 12
col 2; len 5; (5):  c4 04 15 31 59
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b7
row#1[7984] flag: ------, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 13
col 2; len 5; (5):  c4 03 19 2c 3d
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b8

We notice that each leaf entry is 26 bytes in length. The length of the four columns adds up to 13 bytes. The remaining 13 bytes are basically overhead required for each index entry:

2 bytes for flag and lock information in the index entry header

5 x 1 byte for each of the length bytes for each column

6 bytes for the 5th index column which is the index rowid

So that’s 13 bytes of overhead per index entry in this example index.

Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.

Wouldn't it be nice if we could somehow just have the index, but without the underlying table? Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It's basically an index structure that can exist without the need for an underlying table. The index structure itself is the table in which we store and retrieve the necessary data.

OK, let’s now create a new version of this table with the same data, but this time as an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20),
     CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4        INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

The key clause here is ORGANIZATION INDEX. I'll discuss other options and syntax in coming posts.

If we look now at the table segment:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables
  2  WHERE table_name = 'ALBUM_SALES_IOT';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
                        IOT

We see the IOT is listed but consists of no blocks, as a separate table segment doesn't physically exist …
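
We can arguably confirm this at the segment level as well, as only the index segment should be reported with no corresponding table segment (a quick sketch, output not shown in the original demo):

SQL> SELECT segment_name, segment_type, blocks FROM user_segments WHERE segment_name IN ('ALBUM_SALES_IOT', 'ALBUM_SALES_IOT_PK');

Only the ALBUM_SALES_IOT_PK index segment should be listed.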

If we look at the size of the corresponding index:

SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes
  2  WHERE table_name = 'ALBUM_SALES_IOT';

INDEX_NAME           TABLE_NAME       BLEVEL LEAF_BLOCKS
-------------------- --------------- ------- -----------
ALBUM_SALES_IOT_PK   ALBUM_SALES_IOT       2        1550

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1664          4       1550

We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.

If we take a look at a partial block dump of a leaf block from the IOT:

Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K---S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 49
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 2f 10 59
col  1: [ 4]  47 4f 4c 44
row#1[732] flag: K---S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 4a
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 03 01 03 46
col  1: [ 4]  47 4f 4c 44

Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.

The structure of the index entry is somewhat different here. It basically consists of:

2 bytes for lock and flag info in the index header as previously

Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.

Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).

Followed finally by the 2 Non-PK columns and their corresponding length bytes.

Note the big missing component here compared to the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there's no corresponding table to point down to …

So the overall overhead has been reduced to:

2 bytes for the index header

3 bytes for the table header

4 bytes for the 4 column lengths

for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.

So we have saved 1570 table blocks and 384 index blocks in total.

If we now re-run the same query:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1800 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)

 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       3575  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …

Enough to start with for now and yes the pun in the title is fully intended :)

Curious Case Of The Ever Increasing Index Quiz (She’ll Drive The Big Car) January 4, 2012

Posted by Richard Foote in Index Internals, Indexing Myth, Oracle Indexes, Quiz.
22 comments

I received an email recently that had a nice example of what can potentially go wrong with an index.

Let’s first create a simple table with a unique index and populate it with 200,000 rows (following demo run on 11.2.0.1):

SQL> create table bowie (id number constraint bowie_pk primary key, name varchar2(100));

Table created.

SQL> insert into bowie select rownum, 'BOWIE' from dual connect by level <= 200000;

200000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              2     200000           0        374          1

So far, everything is as expected. We have an index with 200,000 rows that currently has 374 leaf blocks.

OK, what we want to do is basically gradually delete the current set of rows and replace them with 200,000 new rows, with ever-increasing Primary Key values. To this end, we create the following procedure:

SQL> create or replace procedure delete_insert_rows
  2  as
  3       n number;
  4       m number;
  5  begin
  6       select min(id),max(id) into n,m from bowie;
  7       for i in 1..200000 loop
  8           delete from bowie where id=n+i-1;
  9           insert into bowie values(m+i,'David Bowie');
 10           if (i mod 1000=0) then
 11                commit;
 12           end if;
 13       end loop;
 14       commit;
 15  end;
 16  /

Procedure created.

So the procedure basically determines the current MIN and MAX values of our PK column and gradually deletes the current rows and then inserts new ones. Every 1000 iterations, we commit the changes. Nothing too complex here.

When we run this procedure for the first time:

SQL> exec delete_insert_rows

PL/SQL procedure successfully completed.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              2     293820       93820        619          1

We notice we now have a whole bunch of deleted leaf entries and that the index has grown from 374 to 619 leaf blocks.

If we run the procedure again:

SQL> exec delete_insert_rows

PL/SQL procedure successfully completed.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              3     347841      147841        994          3

Things have got even worse. We still only have 200,000 rows in the table but the index now has an additional 147,841 deleted entries and the number of leaf blocks has again increased substantially to 994 leaf blocks.

If we have a look at a partial treedump of the index:

SQL> select object_id from dba_objects where object_name = 'BOWIE_PK';

 OBJECT_ID
----------
     74060

SQL> alter session set events 'immediate trace name treedump level 74060';

Session altered.

----- begin tree dump
branch: 0x100378b 16791435 (0: nrow: 2, level: 2)
   branch: 0x1003ce0 16792800 (-1: nrow: 733, level: 1)
      leaf: 0x100378e 16791438 (-1: nrow: 149 rrow: 0)
      leaf: 0x100378f 16791439 (0: nrow: 571 rrow: 0)
      leaf: 0x100378c 16791436 (1: nrow: 291 rrow: 0)
      leaf: 0x1003795 16791445 (2: nrow: 571 rrow: 0)
      leaf: 0x1003796 16791446 (3: nrow: 433 rrow: 0)
      leaf: 0x1003797 16791447 (4: nrow: 4 rrow: 0)
      leaf: 0x1003790 16791440 (5: nrow: 571 rrow: 0)
      leaf: 0x1003791 16791441 (6: nrow: 146 rrow: 0)
      leaf: 0x1003792 16791442 (7: nrow: 571 rrow: 0)
      leaf: 0x1003793 16791443 (8: nrow: 288 rrow: 0)
      leaf: 0x1003794 16791444 (9: nrow: 571 rrow: 0)
      leaf: 0x10037a9 16791465 (10: nrow: 430 rrow: 0)

… (most of the treedump has been cut out, following is the last portion of the dump)

  
     leaf: 0x1003e70 16793200 (248: nrow: 533 rrow: 533)
      leaf: 0x1003e74 16793204 (249: nrow: 533 rrow: 533)
      leaf: 0x1003e78 16793208 (250: nrow: 533 rrow: 533)
      leaf: 0x1003e7c 16793212 (251: nrow: 533 rrow: 533)
      leaf: 0x1003e41 16793153 (252: nrow: 533 rrow: 533)
      leaf: 0x1003e45 16793157 (253: nrow: 533 rrow: 533)
      leaf: 0x1003e49 16793161 (254: nrow: 533 rrow: 533)
      leaf: 0x1003e4d 16793165 (255: nrow: 533 rrow: 533)
      leaf: 0x1003e51 16793169 (256: nrow: 533 rrow: 533)
      leaf: 0x1003e3e 16793150 (257: nrow: 533 rrow: 533)
      leaf: 0x1003e03 16793091 (258: nrow: 533 rrow: 533)
      leaf: 0x1003e07 16793095 (259: nrow: 236 rrow: 236)
----- end tree dump

We notice that the first portion of the index contains leaf blocks with nothing but deleted index entries. The number of rrows is 0 for a vast number of leaf blocks. We also notice that the root block has an rba of 0x100378b 16791435, which is only a few values below some of the rba values of the left-most leaf blocks in the index structure (for example, 0x100378e 16791438). This highlights that even though these left-most blocks in the index structure contain nothing but deleted index entries, Oracle is not recycling them as it should. Oracle is simply adding new blocks to the index structure rather than recycling empty leaf blocks, resulting in the index growing bigger and bigger.

The leaf blocks at the right-most end of the index structure (the second portion of the partial treedump), however, show a nice compact set of leaf blocks with lots of index entries per block (most with 533 per leaf block) and with no deleted index entries (the rrow value matches the nrow value).

If we run the procedure 10 times in total, we get an index that looks like the following:

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              3    1325132     1125132       4136          7

We now have 1,125,132 deleted index entries and the index is now over 10 times the original size, up from 374 to a massive 4,136 leaf blocks, even though the table only contains 200,000 rows.

There are a number of contributing factors here :)

The question is why. Why is the index behaving in this fashion, and what can we do to ensure the index doesn't grow in this manner and remains basically the same size as we delete and insert new rows into the table?

Index Block Dump: Index Only Section Part I (TVC 15) August 10, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes.
2 comments

Having already covered general block header details relevant to several different types of Oracle blocks (Block Dumps Part I and Part II), the next part of the block dump is relevant only to index blocks.

Below is a dump of the index only section of an index leaf block dump:

Leaf block dump
===============
header address 200514140=0xbf39a5c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 3
kdxcofbo 42=0x2a
kdxcofeo 7987=0x1f33
kdxcoavs 7945
kdxlespl 0
kdxlende 0
kdxlenxt 0=0x0
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036

 
First a note identifies the block as being a Leaf block dump or a Branch block dump.

All index block dumps (whether an index leaf or branch) have the details listed above up to and including kdxcoavs. Specific points of interest for details included in any index block dump:

kdxcolev 0 – index level (a value of 0 denotes this as a leaf block)

kdxconco 2 - number of columns in the index. Note: although the index was only defined on the single NAME column, the index actually includes the rowid as an additional column, as the index was defined as Non-Unique (hence the value 2). All Oracle indexes are effectively unique as Oracle makes them so by adding the rowid to the column list for all Non-Unique defined indexes.

kdxconro 3 - number of index entries in the index block (this block dump was taken when the index only had the 3 entries, prior to the insertion of the additional rows in Part II in this series)

kdxcofbo 42=0x2a – offset to the beginning of free space within the block

kdxcofeo 7987=0x1f33 – offset to the end of free space within the block. Note index entries get added “from the bottom” of the index free space and so it’s from this offset that any subsequent index entries will be added.

kdxcoavs 7945 - available free space within the block (effectively the space between kdxcofbo and kdxcofeo)

The following details are only included if the block is an index leaf block.

kdxlende 0 - number of index entries that have been marked as deleted within the block (this will be discussed more fully later). When an index entry is deleted (or indeed updated which is effectively a delete of an index entry followed by an insert), the entry is not physically deleted but is only marked as deleted. Generally these deleted index entries are subsequently cleaned out by Oracle but kdxlende keeps a count of those deleted index entries that have yet to be cleaned out.

kdxlenxt 0=0x0 - pointer to the next index leaf block within the logical index structure (as this is the only leaf block within the index, there is no pointer set although this is usually only unset for the last or “right most” leaf block within the index structure). During a larger index range scan, all the required index entries may not be found within a single index leaf block and Oracle may need to quickly find the next logical leaf block within the index structure. This pointer basically contains the Relative Block Address (RBA) of the next logical leaf block which can be accessed directly as necessary during such index range scans.

kdxleprv 0=0x0 - pointer to the previous index leaf block within the logical index structure (as this is the first and only leaf block within the index, again there is no pointer set although this is usually only unset for the first or “left most” leaf block within the index structure). Again, during a larger index range scan in which the data is required in descending order, all the required index entries may not be found within a single index leaf block and Oracle may need to quickly find the previous logical leaf block within the index structure. This pointer basically contains the RBA of the previous logical leaf block which can be accessed directly as necessary. As such, it’s used by Oracle for index range scans that require the data to be extracted in logically reverse order (eg. to avoid an order by desc). Note only a leaf block associated with a blevel 0 index has neither the kdxlenxt nor kdxleprv assigned.

These two pointers provide the linked list mechanism for all such index range scan operations involving more than one leaf block. Note that a branch block doesn’t have such pointers between branch blocks but it does have a pointer to the first block it references below itself in the index structure (kdxbrlmc).

kdxlebksz 8036 – actual maximum useable space within the index block (basically the block size less the block header “overheads”). Note that an index branch block has an equivalent value called kdxbrbksz which typically has more useable space as a branch block only has the one ITL entry by default.

  
Next, we’ll look at the index entries themselves.

Index Block Dump: Block Header Part II and Read Consistency (I Can’t Read) July 28, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Read Consistency.
3 comments

OK, let’s look at the next portion of the index block dump.
 
Following the hex dump of the block (as we ended Part I of the series) is the second part of the block header (see below):   

Block header dump:  0x0201490a
 Object id on Block? Y
 seg/obj: 0x1c205  csc: 0x00.2d11214  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.02d11214

  

The following details are listed initially:
 
0x0201490a - rdba again
seg/obj: 0x1c205 – data_object_id again
csc: 0x00.2d11214 - initial scn of block
itc: 2 - number of Interested Transaction List (ITL) entries in the block
typ: 2 – INDEX  – denotes this specific block as indeed an Index block
 
Then follows the Interested Transaction List (Itl) which is used by Oracle to store critical information of any transaction making changes within the block. An index leaf block is basically assigned 2 ITL entries by default (but only 1 if it’s an index Branch block). However the number of pre-assigned ITL entries is set by the INITRANS physical attribute parameter of the index. 

An ITL entry basically consists of:
 
A unique slot ID (starting with 01)
Transaction ID
Undo Block Address (Uba) which denotes the Undo segment assigned to the specific transaction
Flag and Lock Bytes
System Change Number (SCN) 
 

Any transaction wishing to make a change to the block must first be allocated to an ITL entry, with the first ITL entry (01) reserved by Oracle for internal recursive operations. If there’s an ITL entry that’s not being used by a current transaction, great, a new transaction will simply use a free ITL entry. However, if all ITL slots are already assigned to current transactions within the index block, then an additional ITL entry is created and allocated to the new transaction by Oracle, providing there’s sufficient free space within the block and the 255 ITL limit hasn’t been reached (or the MAXTRANS parameter hasn’t been reached in older versions of Oracle).
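
If an application regularly has many concurrent transactions hitting the same index blocks, the number of pre-allocated ITL entries can be increased via the INITRANS attribute (a sketch only, not part of this demo):

SQL> ALTER INDEX bowie_name_i INITRANS 4;

Note that changing INITRANS in this manner only affects blocks formatted from that point onwards; existing blocks retain their current ITL allocation unless the index is rebuilt, and each additional ITL entry consumes space within the block.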
 
As a simple demo to see this in action, I’m going to insert two additional rows into the table (and so create two new index entries in the index): 

 
SQL> insert into bowie values (4, 'THIN WHITE DUKE');
 
1 row created.
 
SQL> insert into bowie values (5, 'ALADDIN SANE');
 
1 row created.
 
SQL> commit;
 
Commit complete.

         
If we now dump the index block and look at the ITL slots:
 


Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x02   0x000a.006.000065d4  0x00c00441.2fb5.09  --U-    2  fsc 0x0000.02d120c8

   

We see the transaction has used ITL slot number 02 as relevant details are now assigned.
 
Points of interest include that the Flag is now set to U (Unclean) and that the Lock attribute is assigned the value 2. This means the transaction has been committed but the index block has not been cleaned out and is currently “unclean” in that the 2 new index rows associated with the transaction still have the lock byte set (as we’ll see in a later post).
 
The other important piece of information is the value of the SCN within the ITL (0x0000.02d120c8). The SCN is basically a timestamp that denotes when the transaction made the associated changes to the index block. As this is the last transaction to have changed the block, this SCN effectively denotes the point of time when the block was last changed.
 
If we now also look at the start of the block dump and the block header details discussed in Part I of this series:
 

Start dump data blocks tsn: 8 file#:8 minblk 84234 maxblk 84234
Block dump from cache:
Dump of buffer cache at level 4 for tsn=8, rdba=33638666
BH (0x157E78EC) file#: 8 rdba: 0x0201490a (8/84234) class: 1 ba: 0x1548C000
  set: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 0 lid: 0x00000000,0x00000000
  dbwrid: 0 obj: 115205 objn: 115205 tsn: 8 afn: 8
  hash: [0x292CCE8C,0x292CCE8C] lru: [0x19FFB3BC,0x2930A050]
  obj-flags: object_ckpt_list
  ckptq: [0x1A3EA500,0x18FF0A90] fileq: [0x293178F0,0x19FFB398] objq: [0x2183AA3C,0x2183AA3C]
  st: XCURRENT md: NULL tch: 2
  flags: buffer_dirty redo_since_read gotten_in_current_mode
  LRBA: [0x627.e9dd.0] LSCN: [0x0.2d120c8] HSCN: [0x0.2d120c8] HSUB: [1]
  cr pin refcnt: 0 sh pin refcnt: 0
  buffer tsn: 8 rdba: 0x0201490a (8/84234)
  scn: 0x0000.02d120c8 seq: 0x03 flg: 0x02 tail: 0x20c80603
  frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
 

We notice the header has also been changed and now has the SCN from this transaction stamped in the header. Therefore any process needing this index block to be at a specific point in time (say a Select statement during a consistent read operation) can now quickly check the block header to see “when” it was last changed.
 
If the block has indeed changed since the (say) Select statement started, it need now only check the ITL slots to see which transaction has matching SCN details and so find the specific transaction that last changed the index block. Knowing the transaction in question, Oracle can then reference the Undo Block Address in the ITL entry, determine the specific undo segment that contains details of the previous state of the block, roll back all changes made by this transaction and create a new consistent image of the block as it was prior to the transaction making the block changes. The Select statement can now check the previous SCN in the header and repeat the same process as necessary until the index block is at a point in time prior to the Select statement starting.
 
In short, that's basically how consistent reads are implemented in Oracle.

Everything I've discussed so far in Parts I & II is just as applicable to tables (and other segments) as it is to indexes. In the next post, we'll look at the next portion of an index block dump which is unique to index segments …

Index Block Dump: Block Header Part I (Editions Of You) July 20, 2010

Posted by Richard Foote in 11g, Block Dumps, Index Internals, Oracle Indexes.
1 comment so far

I’ve previously looked at how to generate an Oracle block dump, time to now go into a little more detail.

As I mentioned, a block dump is a formatted representation of the actual contents of an Oracle block. Producing strategic block dumps can be an extremely useful method of determining what might be going on in Oracle under the covers and over the years I've found them to be an invaluable aid in helping me understand Oracle behaviour, troubleshoot issues, investigate the contents of corrupted blocks, etc. The focus in this series will be Oracle block dumps from the perspective of indexes, although the contents of any Oracle block can be dumped and investigated.
 
To set up the demo, I'm going to initially create a simple little table and associated index that has only 3 rows to begin with. As the index is so tiny, all the contents can fit within the one index leaf block resulting in an index with a blevel of 0 (height of 1). Note I'm using an 11.1.0.6.0 database running on Windows for this specific demo. The actual format and content of a block dump differs between releases and continually changes. However, much of the useful content which I'll focus on remains relatively consistent.
 

SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie values (1, 'BOWIE');
 
1 row created.
 
SQL> insert into bowie values (2, 'ZIGGY');
 
1 row created.
 
SQL> insert into bowie values (3, 'MAJOR TOM');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_name_i on bowie(name);
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks from dba_indexes where index_name = 'BOWIE_NAME_I';
 
INDEX_NAME          BLEVEL LEAF_BLOCKS
--------------- ---------- -----------
BOWIE_NAME_I             0           1

 

OK, as this is a simple blevel 0 index, all the contents of the index can be found in the block immediately following the index segment header.
 

SQL> select header_file, header_block from dba_segments where segment_name='BOWIE_NAME_I';
 
HEADER_FILE HEADER_BLOCK
----------- ------------
          8        84233

 

So the block we want to dump is the block in datafile 8, block 84234 (1 more than the segment header block). Before we dump the block though, let’s just note a few other points about this index. Firstly, given we know the datafile and block of interest, we can determine its relative data block address (rdba) by plugging in these details into the dbms_utility.make_data_block_address function:

SQL> select dbms_utility.make_data_block_address(8, 84234) from dual;
 
DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(8,84234)
---------------------------------------------
                                     33638666

  
We’ll note the rdba of 33638666 for future reference. The other thing we’ll just record is the Object ID of the index:
 

SQL> select object_id, data_object_id from dba_objects where object_name = 'BOWIE_NAME_I';
 
 OBJECT_ID DATA_OBJECT_ID
---------- --------------
    115205         115205

 

Again, we'll note an Object_Id and Data_Object_Id of 115205 for future reference. Finally, I'm just going to flush the buffer cache to ensure the current contents of the block are written to disk. This is useful in 11g as the block dump differentiates between the block contents in the buffer cache and on disk.

SQL> alter system flush buffer_cache;
 
System altered.

 

OK, let’s now dump the block associated with this index leaf block:
 

SQL> alter system dump datafile 8 block 84234;
 
System altered.

 

Below is just the block header portion of the resultant block dump:
 

Start dump data blocks tsn: 8 file#:8 minblk 84234 maxblk 84234
Block dump from cache:
Dump of buffer cache at level 4 for tsn=8, rdba=33638666
BH (0x17BF3D8C) file#: 8 rdba: 0x0201490a (8/84234) class: 1 ba: 0x17A70000
  set: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 0 lid: 0x00000000,0x00000000
  dbwrid: 0 obj: 115205 objn: 115205 tsn: 8 afn: 8
  hash: [0x292CCE8C,0x15BFB42C] lru: [0x167EA1EC,0x15BFB48C]
  lru-flags: on_auxiliary_list
  ckptq: [NULL] fileq: [NULL] objq: [NULL]
  st: FREE md: NULL tch: 0
  flags:
  cr pin refcnt: 0 sh pin refcnt: 0
  Buffer contents not dumped
Block dump from disk:
buffer tsn: 8 rdba: 0x0201490a (8/84234)
scn: 0x0000.02d11215 seq: 0x01 flg: 0x04 tail: 0x12150601
frmt: 0x02 chkval: 0x9ae6 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0BF39A00 to 0x0BF3BA00

BF39A00 0000A206 0201490A 02D11215 04010000  [.....I..........]
BF39A10 00009AE6 001A0002 0001C205 02D11214  [................]
BF39A20 1FE80000 00021F02 00000000 00000000  [................]
BF39A30 00000000 00000000 00000000 00000000  [................]
BF39A40 00000000 0000FFFF 00000000 00000000  [................]
 
… Snip rest of memory dump …
 

As I mentioned, the actual format and content of a block dump differs between releases and continually changes so your formatted block dump may differ somewhat. However, the main points which I’ll discuss should be found in most currently supported versions of Oracle. This post should only be considered as a basic introduction on the subject. More depth and details to come.

As this is an 11g block dump, the block header consists of three distinct sections:

1) Dump of the buffer cache details associated with the index block

2) Dump of index block from disk

3) Full raw hex dump of the associated block
 

The first thing to check is that one is looking at the correct block dump and that the correct segment block was indeed dumped. The start of the formatted block dump states the details of the dumped block(s):
 
file#:8 minblk 84234 maxblk 84234
 
so indeed, we’re looking at the correct dump file. These details are also listed along with the hex representation of relative block address (rdba) information:
 
file#: 8 rdba: 0x0201490a (8/84234)
 
and we can also confirm that the rdba is also correct and consistent with the block we have dumped:
 
rdba=33638666
 
as this matches the rdba previously derived via the MAKE_DATA_BLOCK_ADDRESS function.
 
We can also confirm that the correct index segment was dumped as the object id of the index:
 
obj: 115205 objn: 115205
 
matches the data_object_id and object_id from dba_objects as listed previously.
 

Other details of interest found in block dumps in most Oracle versions include:
 
scn: 0x0000.02d11215 - The System Change Number (SCN) of the block when it was last modified (I’ll show you how this gets populated in a later post). This is the effective “timestamp” of the block and is used by Oracle to determine crucial information such as effectively “when” the block was last modified and by “what” transaction.
 
type: 0x06=trans data - denotes the type of block. 06 represents a transactional block (table/index/cluster)
 
seq: 0x01 - number of block changes within the current scn (we'll see how this increments in a later post)
 
tail: 0x12150601 - which consists of the last 2 bytes of the SCN, type and seq
 
frmt: 0x02 - denotes the block format with 02 representing a post Oracle8 block format. An A2 block (as shown in the raw block dump) denotes a post 10g block format.
 
chkval: 0x9ae6 – checksum value of block as used by Oracle in part to check the consistency and validity of the block 
 

In later versions of Oracle (10g and beyond), the block dump includes a full hex dump of the associated memory buffer. I’ve snipped most of this in the above dump extract. However, one can see where the details of the block I’ve listed above can be found within the memory dump. Note also that in 11g and beyond, more details of the buffer cache are listed, as seen in the buffer cache section of the dump.
 
I’ll look at the next portion of the index block header, the Interested Transaction List (ITL) slots, in the next post. We’ve only just begun …

Bitmap Index Degradation Since 10g (Fix You) June 1, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes.
6 comments

As discussed in my earlier post on Bitmap Index Degradation After DML Prior To 10g, Oracle wasn’t particularly efficient in the manner it maintained Bitmap Indexes after DML operations. During insert operations, if an existing Bitmap index entry didn’t cover the rowid range of a new row to be inserted, Oracle would create a new Bitmap index entry with a default range of just 8 rowids, with all the overheads this entails. As such, Bitmap indexes prior to 10g would often explode in size.

Thankfully since 10g, these issues have been largely addressed.

To illustrate, I’m going to run the exact same demo as I did in 9i, this time specifically on a 10.2.0.4 Windows database, although you should get similar results in all versions of 10g/11g as well.

As before, I’m going to create the same table and populate it with the same data, creating a Bitmap index on the CODE column which has 1000 distinct values:

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

 

A freshly created Bitmap index has the same leaf blocks and index entries as in 9i.

Looking at a partial block dump of the first leaf block in the index:

row#0[5013] flag: ------, lock: 0, len=3019
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 42 1e 8c 00 e0
col 2; len 6; (6):  01 42 28 ad 01 7f
col 3; len 2998; (2998):
 03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...

row#1[1994] flag: ------, lock: 0, len=3019
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 42 1e 8a 00 00
col 2; len 6; (6):  01 42 28 ab 00 9f
col 3; len 2998; (2998):
 00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
 c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4
...

 

We’ll look back and compare differences within this leaf block in a moment but for now just note there are 2 index entries within the leaf block, with 1 index entry for each distinct value of the CODE column (as with the 9i example).

OK, next we insert a new row and see what happens. Remember in 9i, Oracle created a new index entry as no existing bitmap index entry had a rowid range that spanned the rowid of this newly inserted row:

SQL> insert into bowie values (1, 1, 'ZIGGY');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index bowie_code_i compute statistics;
 
Index analyzed.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

OK, we notice a significant difference here from the 9i example. Although we’ve just inserted a new row, Oracle has not created a new Bitmap index entry (still 1000 rows in the index). Clearly, Oracle has managed to reuse the existing index entry for the newly inserted CODE value “1”, rather than add a new index entry.

A partial leaf block dump reveals what has happened:

row#0[1992] flag: ------, lock: 2, len=3021
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 42 1e 8c 00 e0
col 2; len 6; (6):  01 42 28 ae 00 07
col 3; len 3000; (3000):
 03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...

 

Rather than create a new index entry for the CODE value “1”, Oracle has made a couple of key changes to the existing index entry. Firstly, we notice it has changed the end rowid range (col 2) from a value of 01 42 28 ad 01 7f to 01 42 28 ae 00 07 so that the rowid range now includes the rowid associated with the newly inserted row.

Additionally, it has modified the bitmap index string column (col 3) to incorporate the location of the newly inserted row within the increased rowid range. This has resulted in the bitmap string column increasing from 2998 to 3000 bytes, thus increasing the overall size of the index entry by the 2 additional bytes (3021 up from 3019).

So since 10g, Oracle is significantly more efficient and where possible will simply adjust the current rowid range of the Bitmap index entry and modify the bitmap string accordingly to accommodate a new row value (resulting in an increase of just 2 bytes overall in this example), rather than create a totally new index entry (which required an additional 21 bytes in the 9i example).

If we were to populate this Bitmap index from scratch as we did in the 9i example:

SQL> create table radiohead (id number, code number, name char(5));
 
Table created.
 
SQL> create bitmap index radiohead_code_i on radiohead(code);
 
Index created.
 
SQL> begin
  2  for i in 1..1000 loop
  3    for j in 1..1000 loop
  4     insert into radiohead values (j, j, 'BOWIE');
  5     commit;
  6    end loop;
  7  end loop;
  8  end;
  9  /
 
PL/SQL procedure successfully completed.
 
SQL> analyze index radiohead_code_i compute statistics;
 
Index analyzed.
 
SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2         781       1000

  

We notice the index has not deteriorated anywhere near the same degree as the 9i bitmap index example. Previously, the 9i bitmap index grew to a massive 5,347 leaf blocks but this 10g version has only grown to a moderate 781 leaf blocks, just 281 leaf blocks greater than a freshly rebuilt bitmap index. It has only grown by some 56% (of which much has to do with the free space associated with subsequent index block splits), whereas the 9i version of the index grew by a massive 969%.
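For clarity, these growth percentages come straight from the leaf block counts:

SQL> select round((781-500)/500*100) pct_growth_10g, round((5347-500)/500*100) pct_growth_9i from dual;

PCT_GROWTH_10G PCT_GROWTH_9I
-------------- -------------
            56           969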

In summary, bitmap indexes in currently “supported” versions of Oracle are maintained in a much more efficient manner than they were previously, to the point where the need for frequent rebuilds has been much reduced, even in tables in which such indexes are not dropped during heavy loads.

That said, Bitmap indexes are still unsuitable in OLTP type environments (even in 11g)  due to the locking implications associated with them. Perhaps a discussion for another day.

Bitmap Index Degradation After DML Prior To 10g (Beauty and the Beast) May 25, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Index Internals, Oracle Indexes.
5 comments

Bitmap Indexes have a bad reputation with regard to being problematic and suffering from severe degradation following heavy DML operations, especially larger amounts of insert activity. Bitmap indexes have been known to grow substantially and require periodic rebuilds following such insert activity.

While this was certainly the case in previous versions of Oracle, things have dramatically improved since version 10g. I thought it might be worthwhile explaining why Bitmap Indexes had such issues prior to 10g and how things are now so much better, in many cases making Bitmap index rebuilds unnecessary.

To start with, a demo on an Oracle 9.2.0.7 database. I’m first going to create a simple little table with 1M rows, with a Bitmap index on a CODE column with 1,000 distinct values. The values are effectively inserted into the table in an evenly distributed manner throughout the entire table.

SQL> create table bowie as select mod(rownum,1000)+1 id, mod(rownum,1000)+1 code,'BOWIE' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_code_i on bowie(code) compute statistics;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1000

We notice that the Bitmap index has 1000 index entries, one for each index value.  There are 500 leaf blocks which means we can only fit 2 index entries in each leaf block on average. If we look at a partial block dump of the first leaf block:

row#0[5013] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 cc 0c 00 e0
col 2; len 6; (6):  02 02 d6 2d 01 7f
col 3; len 2998; (2998):
03 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01
c5 bd 01 c3 eb 01 c1 eb 01 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c1 eb 01 c4
bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb
01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01
c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6
...
row#1[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

I’ve listed just the first portions of each of the 2 index entries in the leaf block. Each indexed value is different (col 0) and each has a bitmap string of 2998 bytes in size (col 3). Therefore, in an 8K block, Oracle simply can’t fit a 3rd index entry in the block, hence why there are only 2 index entries per block. The bitmap string column is quite large because in a 1M row table, there are 1000 occurrences of each indexed value littered throughout the table that need to be referenced.
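As a rough back-of-envelope check (assuming each index entry comes to something in the order of 3,020 bytes: the 2,998 byte bitmap string plus the key, the two rowids and the various flag and length bytes), two such entries fit comfortably within an 8K block but a third simply can’t:

SQL> select 2*3020 two_entries, 3*3020 three_entries, 8192 block_size from dual;

TWO_ENTRIES THREE_ENTRIES BLOCK_SIZE
----------- ------------- ----------
       6040          9060       8192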

OK, next I’m going to insert just 1 additional row. It has a CODE value of 1, which is the minimum existing CODE value and so should be indexed in the first index leaf block. Let’s see what impact this has on the index statistics:

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

The crucial point here is that although the value (1) already existed within the index, Oracle has nonetheless created an additional index entry (1001 NUM_ROWS up from 1000). If we look at a partial dump of the leaf block now:

row#1[1973] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 1; (1):  00
row#2[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd
01 c5 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 01 c5 eb 01 c3 eb 01 c6 bd 01
c4 eb 01 c0 be 01 c6 ea 01 c4 eb 01 c7 bd 01 c5 eb 01 c0 be 01 c6 ea 01 c4

We notice there is indeed a new index entry (row#1) which only has a rowid range (col 1 and col 2) of just 8 rows (between 02 02 d6 2e 00 00 and 02 02 d6 2e 00 07). So rather than somehow modify the existing index entry, Oracle has created a new index entry with a rowid range of just 8 consecutive rowids.

This means we should now be able to add an additional 7 rows with a CODE value of 1 which, provided they can all fit within the same table block, will be associated with this allocated rowid range.

SQL> begin
2  for i in 1..7 loop
3  insert into bowie values (1, 1, 'ZIGGY');
4  commit;
5  end loop;
6  end;
7  /

PL/SQL procedure successfully completed.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1001

Indeed, after inserting these additional rows, no new index entries were required as all the new rows fall within the rowid range of the new index entry. But add one more row …

SQL> insert into bowie values (1, 1, 'ZIGGY');

1 row created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_CODE_I';
INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
BOWIE_CODE_I                            1         500       1002

Indeed, we now have yet another additional index entry (1002 NUM_ROWS up from 1001).  If we now look at a partial block dump of the index leaf block:

row#1[1819] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 00
col 2; len 6; (6):  02 02 d6 2e 00 07
col 3; len 2; (2):  c8 ff
row#2[1798] flag: -----, lock: 2
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d6 2e 00 08
col 2; len 6; (6):  02 02 d6 2e 00 0f
col 3; len 1; (1):  00
row#3[1994] flag: -----, lock: 0
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 02 cc 0a 00 00
col 2; len 6; (6):  02 02 d6 2b 00 9f
col 3; len 2998; (2998):
00 c4 bd 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01
c3 eb 01 c6 bd 01 c4 eb 01 c2 eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c2
eb 01 c5 bd 01 c3 eb 01 c6 bd 01 c4 eb 01 c3 eb 01 c6 bd 01 c4 eb 01 c7 bd 

We notice we  have yet another index entry (row#2) covering another narrow 8 row range.

Basically, prior to 10g, if a new index value didn’t fit within an existing bitmap index rowid range for the corresponding indexed value, a new index entry was added with all the overheads associated with it, with a default range of just 8 rowids. This is an extremely costly process as the new index entry not only has to store the index value itself again, but additionally 2 rowids with their corresponding length bytes and the necessary bitmap string column as well. Additionally, the rows within that specific rowid range are quite likely to contain differing values in the indexed column and so will in turn require new index entries of their own. I’ve just inserted a whole bunch of CODE values of 1 in the above example. For a randomly inserted column value, being able to reuse an existing rowid range might be quite a rare event as each subsequent indexed value is quite likely to differ for each new consecutive row.

If we create a table and index first and then populate the table with the 1 million rows, the number of index entries and the size of the index itself become huge …

SQL> create table radiohead (id number, code number, name char(5));

Table created.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> begin
2  for i in 1..1000 loop
3    for j in 1..1000 loop
4     insert into radiohead values (j, j, 'BOWIE');
5     commit;
6    end loop;
7  end loop;
8  end;
9  /

PL/SQL procedure successfully completed.

SQL> analyze index radiohead_code_i compute statistics;

Index analyzed.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        2        5347    1000000

Because the clustering is so bad, each new row for a given index value isn’t within an existing 8 rowid window and so each and every row in the table requires its own bitmap index entry. As a result, the Bitmap index is huge at 5,347 leaf blocks and has a full 1M NUM_ROWS. A partial block dump shows just how inefficient this index is:

row#0[4273] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8a 00 00
col 2; len 6; (6):  02 02 d8 8a 00 07
col 3; len 1; (1):  00
row#1[4294] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8c 00 e0
col 2; len 6; (6):  02 02 d8 8c 00 e7
col 3; len 1; (1):  01
row#2[4315] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 8f 00 40
col 2; len 6; (6):  02 02 d8 8f 00 47
col 3; len 1; (1):  04
row#3[4336] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 91 01 20
col 2; len 6; (6):  02 02 d8 91 01 27
col 3; len 1; (1):  05
row#4[4357] flag: -----, lock: 0
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 02 d8 94 00 88
col 2; len 6; (6):  02 02 d8 94 00 8f
col 3; len 1; (1):  00

We can see that each and every indexed value (col 0) in this partial leaf block dump is the same (hex c1 02), with each only having a rowid range (col 1 and col 2) that covers the bare minimum 8 row range. It’s about as inefficient a Bitmap index as you can get with 1000 distinct values in a 1M row table.

If we now however rebuild the Bitmap index:

SQL> alter index radiohead_code_i rebuild compute statistics;

Index altered.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='RADIOHEAD_CODE_I';

INDEX_NAME                         BLEVEL LEAF_BLOCKS   NUM_ROWS
------------------------------ ---------- ----------- ----------
RADIOHEAD_CODE_I                        1         500       1000

We get back to our nice, efficient Bitmap Index structure with just the bare minimum 1000 index entries, all fitting into a fraction of the index leaf blocks (just 500 down from 5,347).

Fortunately, Oracle has become a lot cleverer since 10g with regard to how it maintains its Bitmap Index structures during DML, as I’ll cover soon in Part II.

Unique Bitmap Indexes Part I (Unnatural Selection) March 24, 2010

Posted by Richard Foote in Bitmap Indexes, Index Internals, Oracle Indexes, Unique Indexes.
17 comments

As I’ve discussed previously, a Bitmap index can be considered over a B-tree index (where concurrent DML is not an issue) even if there are potentially tens of millions of distinct values, in a table that has say hundreds of millions of rows.
 
However, if a column is unique or “approaches” uniqueness, then one has gone too far and the bitmap index is going to be larger and less efficient than an equivalent b-tree index. So you wouldn’t consider a bitmap index on a column with a million distinct values if the table itself only has in the vicinity of a million rows as well.
 
To understand why a column approaching uniqueness shouldn’t be considered for a bitmap index, one only needs to understand the structure and corresponding differences of index entries in both bitmap and b-tree indexes.
 
I’ll begin by creating a simple table and populating it with a million rows.


SQL> create table bowie (id number, name varchar2(20));
 
Table created.
 
SQL> insert into bowie select rownum, 'Ziggy Stardust' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

   

Note that the ID column is unique. We can therefore create a Unique b-tree index:
 


SQL> create unique index bowie_unique_i on bowie(id) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_UNIQUE_I';

INDEX_NAME          BLEVEL LEAF_BLOCKS DISTINCT_KEYS
--------------- ---------- ----------- -------------
BOWIE_UNIQUE_I           2        1875       1000000

 

Note that the unique index has 1875 leaf blocks. If we dump a leaf block and look at some (say 5) of the index entries:
 

row#0[8025] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 01
col 0; len 2; (2):  c1 03
row#2[8003] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 02
col 0; len 2; (2):  c1 04
row#3[7992] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 03
col 0; len 2; (2):  c1 05
row#4[7981] flag: ------, lock: 0, len=11, data:(6):  02 00 6b 0a 00 04

 

We notice the lengths of these first 5 index entries are all 11 bytes (len=11).
 
An index entry from this Unique index basically consists of the indexed value (col 0) which is 2 bytes in length in the above sample plus the following overhead:
 
2 bytes for flags and locks
6 bytes for the rowid
1 byte for the index column length
 
So there’s a total of 9 bytes of overhead per index entry in this index in addition to the index value itself. Note also there’s an index entry for each and every indexed value. This is always the case for a non-compressed b-tree index.
 
If we now compare this with an equivalent  Non-Unique index on the same column:

 
 
SQL> drop index bowie_unique_i;
 
Index dropped.
 
SQL> create index bowie_nonunique_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_NONUNIQUE_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_NONUNIQUE_I          2        1999       1000000
 

 

We notice the index is now somewhat larger than the equivalent Unique index, with there now being 1999 leaf blocks, an increase of 124 leaf blocks. A block dump of a leaf block reveals the key difference:
 

 
row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
row#1[8012] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 01
row#2[8000] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 02
row#3[7988] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 03
row#4[7976] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 04

 

As I’ve discussed previously, Oracle makes the Non-Unique index effectively unique by adding the rowid as an additional indexed column within the index entry (col 1 is this additional index column comprising the rowid). There are therefore 2 columns in the index entry, not just the one (denoted by col 0 and col 1). This ensures all duplicate indexed values are subsequently sorted in rowid order within the index and can be efficiently accessed as necessary.
 
The consequence of this subtle difference is that an additional byte is now required to store the length of the rowid column and so the total overhead increases from 9 bytes to 10 bytes per index entry. The overall length of an index entry has therefore increased from 11 to 12 bytes (len=12) and this results in the overall increase of 124 leaf blocks in the index, required to effectively store these additional 1 million bytes.
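As a quick sanity check (assuming the 8K block size used throughout and ignoring block header overheads), 124 extra leaf blocks lines up nicely with the roughly 1 million additional bytes (1 extra byte for each of the 1 million index entries):

SQL> select 124*8192 bytes_in_124_blocks from dual;

BYTES_IN_124_BLOCKS
-------------------
            1015808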
 
If we now create the index as an equivalent bitmap index:
 

  
 
SQL> drop index bowie_nonunique_i;
 
Index dropped.
 
SQL> create bitmap index bowie_bitmap_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> select index_name, blevel, leaf_blocks, distinct_keys from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME            BLEVEL LEAF_BLOCKS DISTINCT_KEYS
----------------- ---------- ----------- -------------
BOWIE_BITMAP_I             2        3124       1000000

  

We now notice the index has increased substantially from 1999 leaf blocks for the Non-Unique index to 3124 leaf blocks.
 
Again, a dump of an index leaf block highlights the reason for the increase:
 

 
row#0[8015] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  00
row#1[7994] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  01
row#2[7973] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  02
row#3[7952] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  03
row#4[7931] flag: ------, lock: 0, len=21
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  02 00 6b 0a 00 00
col 2; len 6; (6):  02 00 6b 0a 00 07
col 3; len 1; (1):  04
 

 

The index entry structure is now somewhat different. We now have an index that has not 1 column (as in the Unique index) or 2 columns (as in the Non-unique index) but 4 columns in the index entry. As before, we still have the index column value of 2 bytes but we now have the following overheads per index entry:
 
2 bytes for flags and locking (as before)
1 byte for the indexed column length (as before)
6 bytes for a rowid index column (col 1) stating the start of a range of rowids that are covered by the particular index entry
1 byte for the length of this start rowid index column
6 bytes for a rowid index column (col 2) stating the end of a range of rowids that are covered by the particular index entry
1 byte for the length of this end rowid index column
1 byte for the bitmap bit sequence column (col 3) required for all the bits referencing rows within the above rowid ranges
1 byte for the length of this bitmap column
 
So the total overhead for each of the 5 index entries listed above is now 19 bytes, not 9 or 10 bytes as for the equivalent b-tree indexes. The length of an index entry is therefore 21 bytes in total, not 11 or 12 bytes as for the equivalent b-tree indexes.
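As a quick sanity check, the component sizes listed above add up exactly to the entry lengths (len=) we saw in the respective block dumps:

SQL> select 2+1+2+6 unique_len, 2+1+2+1+6 nonunique_len, 2+1+2+1+6+1+6+1+1 bitmap_len from dual;

UNIQUE_LEN NONUNIQUE_LEN BITMAP_LEN
---------- ------------- ----------
        11            12         21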

A few important points to note.
 
As the column is effectively unique, the number of index entries is the same for both b-tree and bitmap indexes. A key advantage of a bitmap index over a b-tree index is that for each distinct value, a single index entry is sufficient to cater for a range of rowids, potentially covering the whole table. For example, a specific column value with 100 duplicates may only need the one index entry for the column value within a bitmap index, but would require 100 different index entries within a (non-compressed) b-tree. However, as the column in the above example is unique, there are no duplicate values and so this potential saving is not possible in this bitmap index.
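As a quick illustration of this point (a minimal sketch using a hypothetical MAJOR_TOM table and index; exact statistics will of course vary), compare the number of index entries needed when the same column with plenty of duplicates is indexed first as a bitmap and then as a b-tree:

SQL> create table major_tom as select mod(rownum,100)+1 code, 'BOWIE' name from dual connect by level <= 10000;

Table created.

SQL> create bitmap index major_tom_code_i on major_tom(code);

Index created.

SQL> analyze index major_tom_code_i compute statistics;

Index analyzed.

SQL> select num_rows from dba_indexes where index_name = 'MAJOR_TOM_CODE_I';

SQL> drop index major_tom_code_i;

Index dropped.

SQL> create index major_tom_code_i on major_tom(code);

Index created.

SQL> analyze index major_tom_code_i compute statistics;

Index analyzed.

SQL> select num_rows from dba_indexes where index_name = 'MAJOR_TOM_CODE_I';

With just 100 distinct values spread across 10,000 rows, you should see something in the order of 100 index entries for the bitmap index versus the full 10,000 for the b-tree.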
 
Notice also the size of the bitmap string for each index entry is actually tiny, just a single byte, even though there are 1 million rows in the table. It doesn’t even look like it’s using a million bits to store the necessary bitmap string information for each index entry. This is because for each index entry, only one bit is ever set to 1 (“true”), all other occurrences are logically false as only 1 row in the table ever has the specific index value. Remember, the column values are effectively unique.

Therefore, Oracle can use a very narrow rowid range for each index entry and effectively not bother storing details for the vast majority of the possible rowid ranges within the table, as there’s only one bit that’s of interest and it only corresponds to a specific part of the table. Even in cases where there might just be a duplicate here or there, most values would be zeros (false) regardless and can be compressed extremely efficiently (topic for another day).

Although many folks commonly think otherwise (see the original Burleson article for a perfect example of the following misperception), if a column which is unique or is approaching uniqueness is indexed via a bitmap index, the overheads associated with the bitmap string in the index entry are usually very minimal as by definition most bit values are logically “false” (0), with only the rare “true” (1) bit value needing to be stored and referenced.

The issue is not so much the overhead associated with the bitmap string per se, but rather the other overhead components, namely the additional rowid and column length bytes.
 
In short, the bitmap index can’t be any more efficient than using just 1 byte to store the necessary bitmap string information (plus 1 byte for the bitmap string length); however, 19 bytes of overhead is the minimum required, mainly because of the requirement to store 2 rowids instead of 1 rowid and for all the additional index column length bytes. If the bitmap index needs to cater for a wider range of rowids and for more occurrences of 1s (true) values, then the overheads associated with the bitmap sequence can of course be much more considerable than the 1 byte (again, a topic for another day).
 
Therefore, the bitmap index is going to be significantly less efficient if the indexed values are unique or near unique, as there’s all this additional overhead per index entry without the subsequent savings from not having to store separate index entries for duplicate column values. There needs to be at least some measure of duplication within a column for a bitmap index to have some chance of being the more efficient option when compared to an equivalent b-tree index.
 
However, how many duplicate values within a column are actually necessary for a bitmap index to be considered a viable alternative? The answer is far fewer than many may think (again see the original Burleson article for a common misunderstanding in this respect), although this question will be addressed in a future post on the subject.

Index Block Dumps and Index Tree Dumps Part I: (Knock On Wood) February 8, 2010

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Tree Dumps.
15 comments

I thought before I jump into a topic that requires looking at a number of index block dumps, it might be worth briefly recapping how one goes about dumping index blocks in Oracle.
 
A block dump is simply a formatted representation of the contents of a particular Oracle database block.  Although I’ll be focusing specifically on index related blocks, any Oracle data block type can potentially be dumped and investigated.
 
The basic command to dump a specific block is:
 
ALTER SYSTEM DUMP DATAFILE 5 BLOCK 42;
 
where the block 42 in datafile 5 is dumped.
 
To dump a number of consecutive blocks with the one command you can also:
 
ALTER SYSTEM DUMP DATAFILE 5 BLOCK MIN 42 BLOCK MAX 50;
 
The resultant representation of the dumped block(s) is written to a trace file in the user_dump_dest directory.
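In 11g and beyond, an easy way to locate the resultant trace file for your session is via v$diag_info; in earlier versions, simply check the user_dump_dest parameter and look for the most recently written trace file:

SQL> SELECT value FROM v$diag_info WHERE name = 'Default Trace File';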
 
Although these commands are not in the official Oracle documentation (the last time I had a real good look, it was only briefly mentioned in the Database Vault Administration Guide) and are not officially supported, there are enough references in Metalink/MOS and various writings for these commands to be widely known and used. I’ve been dumping the contents of Oracle blocks since the mid-1990s and although they can sometimes take some time to decipher, I find them a vital source of information on determining how Oracle actually works under the covers.
 
From an index perspective, the question is how one can figure out which specific blocks to dump for a given index. There are a couple of useful little tips.

The first thing to point out with an index is that the critical Root Block of an index is always the block after the index segment header. This is always the case regardless of the database version, platform or type of tablespace, as I’ve discussed in an earlier post.
 
Therefore, to start exploring a specific index, we first find the root block details after the index segment header:

 
SQL> SELECT header_file, header_block+1 FROM dba_segments WHERE segment_name='BOWIE_I';
 
HEADER_FILE HEADER_BLOCK+1
----------- --------------
          7         219274

 
The following command will then dump the associated index root block:

 
SQL> ALTER SYSTEM DUMP DATAFILE 7 BLOCK 219274;
 
System altered.

As we’ll see in a later post, from the dump of the index root block, we can then find what other index blocks the root block points to and references.
 
However, another useful method of determining which index blocks might be worth dumping is to do a “treedump” of an index.
 
One first needs to find the object_id of the index in question:


 
SQL> SELECT object_id FROM dba_objects WHERE object_name = 'BOWIE_I';
 
 OBJECT_ID
----------
    106315

 
To then do a treedump of the index:


 
SQL> ALTER SESSION SET EVENTS 'immediate trace name treedump level 106315';
 
Session altered.

 
where level 106315 is the object_id of the index.
 

A partial listing from a treedump follows:
 
branch: 0x1c3588a 29579402 (0: nrow: 222, level: 1)
   leaf: 0x1c3588b 29579403 (-1: nrow: 485 rrow: 485)
   leaf: 0x1c3588c 29579404 (0: nrow: 479 rrow: 479)
   leaf: 0x1c3588d 29579405 (1: nrow: 479 rrow: 479)
   leaf: 0x1c3588e 29579406 (2: nrow: 479 rrow: 479)
   leaf: 0x1c3588f 29579407 (3: nrow: 479 rrow: 479)

A treedump simply lists each index block in the logical order of the index structure. Starting always with the index root block at the top, we notice that it’s simply listed as a branch (albeit a rather important one). The characters after the branch keyword represent a hex (0x1c3588a) and decimal (29579402) version of the Relative Block Address (RBA), which is used by Oracle to find the actual physical location of the block.  As there’s only ever the one root block, it starts from position 0, the nrow: 222 denotes the root block points to 222 distinct index blocks in the level below it and level 1 denotes this is a level 1 index (height 2) so the blocks below the root block must all be leaf blocks (there are no intermediate branch levels in this case).

The first leaf block listed is the first block being referenced within the parent root block and must therefore be the  “left-most” leaf block in the index structure. It has a RBA of hex (0x1c3588b) decimal(29579403), the -1 denotes it’s the first leaf block (as the counter starts at -1), the nrow: 485 denotes the leaf block has 485 index entries and the rrow: 485 denotes that 485 of the index entries are non-deleted entries (meaning there are no deleted index entries in this specific leaf block).

The next leaf block in the treedump corresponds to the second block (number 0) referenced in the root block and is the second left-hand most leaf block in the index structure, followed by its specific details. The third leaf block (number 1) in the treedump is the third leaf block in the index structure and so on for all 222 leaf blocks in the index (the last leaf block numbered 220).

The RBA of any of these blocks in the treedump can then be used to determine which block of interest to dump. The DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK functions can be used to convert the RBA into the corresponding DATAFILE ID and BLOCK ID in which to dump the block.

For example, to determine the DATAFILE and BLOCK of the third leaf block in the index:


SQL> SELECT DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(29579405),
  2         DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(29579405)
  3  FROM dual;

DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(29579405)
----------------------------------------------
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(29579405)
-----------------------------------------------
                                             7
                                         219277

So now we can confirm the index block of interest here is specifically located at datafile 7, block 219277.
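The corresponding dump command for this specific leaf block would therefore simply be:

SQL> ALTER SYSTEM DUMP DATAFILE 7 BLOCK 219277;

System altered.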

OK, we now have the necessary basics to start block dumping a few index blocks and having a look at what they might show us.

Indexes And Small Tables Part V (It’s No Game) May 13, 2009

Posted by Richard Foote in Index Internals, Oracle Indexes, Small Indexes, Unique Indexes.
12 comments

So far in our little example, we’ve looked at how accessing a row of a one block table via a FTS required 4 consistent gets while accessing this same table via a Non-unique index reduced the consistent gets down to 3.

Time to take the next step and improve the efficiency yet further of accessing this small one block table.

We’re now going to replace the Non-unique index with a Unique Index instead. We can obviously do this because the values on the indexed ID column are indeed unique.

Now it’s always a good idea of course to document these table business rules (such as a column being unique) inside the database, however it’s somewhat alarming just how many applications just don’t do this. I’ve also previously discussed how a PK or Unique constraint can actually be policed via a Non-Unique index so there are many reasons why a small table might not have an associated Unique index.

Not least of course the incorrect perception that an index is not going to be much use on a small table anyways …

So let’s now replace the Non-Unique index with a Unique index instead:

SQL> drop index small_id_i;

Index dropped.

SQL> alter table small add primary key (id) using index (create unique index small_id_i on small(id));

Table altered.

 
OK, so now we have our Unique index in place. Let’s now run the same query again to see how our consistent gets related statistics might change:

 
SQL> select * from small where id = 42;

 

        ID NAME
---------- -----
        42 BOWIE

--------------------------------------------
|Id |Operation                  |Name      |
--------------------------------------------
|  1|TABLE ACCESS BY INDEX ROWID|SMALL     |
|* 2| INDEX UNIQUE SCAN         |SMALL_ID_I|
--------------------------------------------

Statistics
--------------------------------------------
  0  recursive calls
  0  db block gets
  2  consistent gets
  0  physical reads
  0  redo size
465  bytes sent via SQL*Net to client
396  bytes received via SQL*Net from client
  2  SQL*Net roundtrips to/from client
  0  sorts (memory)
  0  sorts (disk)
  1  rows processed

 

We notice our first significant change. The number of consistent gets has reduced further down to just 2.

Why ?

Because with a Unique index, there can only be a maximum of 1 row returned. It’s simply not possible to return 2 or more rows.

Therefore, when selecting this one row, Oracle doesn’t have to perform the second fetch operation to confirm there are indeed no more rows to return. The first fetch will either return the one row of interest or none at all when resolving an equality predicate. That’s it, there are no other possibilities. We return the row of interest by simply accessing the index block (1 consistent get) followed by the table block (the second consistent get).

As we have done previously, we can look at the actual consistent gets statistics of interest by running the following query in another session before/after the select statement:

SQL> SELECT n.name, s.value FROM v$sesstat s, v$statname n WHERE s.statistic# = n.statistic# AND s.sid = 141 AND n.name LIKE 'consistent%';

NAME                           VALUE
------------------------------ -----
consistent gets                31236
consistent gets - examination   5084

 
And again afterwards ..

 
SQL> SELECT n.name, s.value FROM v$sesstat s, v$statname n WHERE s.statistic# = n.statistic# AND s.sid = 141 AND n.name LIKE 'consistent%';

NAME                           VALUE
------------------------------ -----
consistent gets                31238 (+2)
consistent gets - examination   5086 (+2)

We notice we have indeed only performed the 2 consistent gets. But we also notice another significant difference, that being both consistent gets are now the “cheaper” consistent gets – examination.

This means that the number of latch gets required to perform this select statement via the Unique index is now just 2, down from 6 for the Non-unique index and 8 for the FTS.
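As a rough tally (assuming 2 latch gets per ordinary consistent get to pin and subsequently unpin the block, and just 1 latch get per consistent get – examination, as explained below):

SQL> select 4*2 fts_latch_gets, 3*2 nonunique_latch_gets, 2*1 unique_latch_gets from dual;

FTS_LATCH_GETS NONUNIQUE_LATCH_GETS UNIQUE_LATCH_GETS
-------------- -------------------- -----------------
             8                    6                 2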

Generally during a consistent get, Oracle needs to grab the cache buffers chains latch so it can pin the specific block in memory and then grab the latch again so that it can subsequently unpin the block once it’s finished processing the block. Each of these accesses to the latch and the subsequent pinning/unpinning of the block requires CPU and is a possible source of contention within the database.

For some operations that only require a very very quick “read and get out of there” type operation and/or on blocks that are unlikely to change within a given point of time, Oracle uses a cheaper consistent get operation which doesn’t actually require the block to be pinned. There’s no point in pinning the block as it’s only going to be read and accessed for a short time (shorter than might otherwise be required when processing a block in memory) and the block is unlikely to change anyways.

So for these operations, Oracle uses a cheaper consistent get called a consistent gets – examination. These consistent gets examinations only need to grab the cache buffers chains latch before quickly reading the block and releasing the latch once the read operation is complete. Therefore Oracle only needs to grab and release the cache buffers chains latch the once, without having to pin/unpin the block, which means less CPU and less latch contention overall.

Now this isn’t particularly well documented. Often discussions mention reads of undo blocks as being candidates for consistent gets examinations as these reads are likely to be both relatively quick and a specific undo block is unlikely to change as only one transaction can actually update an undo block at a given time.

Getting back to indexes, reads of index root blocks are another candidate mentioned as again a read of an index root block is going to be very quick and an index root block is unlikely to change at a given point of time.

However, what is not well documented at all is the fact that any block accessed during an index Unique Scan is accessed via a consistent get – examination, including the consistent get associated with reading the table block as well. This is because again, any such read operation is going to be relatively quick as the most that ever needs to be read is the one index related entry and the one table row.

The net result is that now accessing a row from a small table via a Unique index requires only 2 latch accesses vs. the initial FTS example which required 8 latch gets as none of the FTS consistent gets are examinations.

Now you might say that these are all very small numbers, that 4 consistent reads isn’t that much, that 8 latches isn’t really that huge a number and reducing 8 latches down to 2 latches isn’t really going to be that noticeable. Yes it is effectively a 75% reduction but it’s a 75% reduction of not very much.

And if you’re discussing a single read of a single small lookup table you would likely be right.

But what if the small table is accessed frequently by the application, perhaps many 1,000s of times per minute. What if you have many such small tables, often used in small join operations by your OLTP applications. What if you have large numbers of users in a large application with many many such small table accesses. This effectively 75% saving can potentially become very significant, both in terms of the reduction in CPU load and also in the reduction of latch contention, which in turn can further reduce CPU loads.

A small improvement multiplied by a large amount can indeed make a difference …

However, I have one more step to go yet in further improving the efficiency of these small table lookups via an index.

One which can reduce the overall overheads by yet another 50% …
