
When Does A ROWID Change? Part I (“Fearless”) December 7, 2022

Posted by Richard Foote in Autonomous Database, Block Dumps, Changing ROWID, Oracle, Oracle Blog, Oracle Cloud, Oracle General, Oracle Indexes, Oracle Table Internals, Oracle19c, Pink Floyd, Richard's Blog, ROWID.

Recently, my mate Connor McDonald caused a tad of a storm when he disclosed that the once sacred and (almost) unchangeable ROWID can now indeed change easily, potentially without the DBA doing a thing.

You can watch his excellent video on the subject here.

As the humble ROWID is a critical component of any index, I thought it worthwhile to have a deep dive discussion on when a ROWID can and can’t change and some of the key changes that have been introduced within Oracle’s Autonomous database environments.

If you’re a developer who explicitly uses the ROWID in your applications, you might want to pay extra attention to these changes.

I thought I'd begin though by first discussing how the ROWID currently doesn't generally change…

The ROWID is basically just a pointer stored in indexes that effectively points to the physical location of a row within a table that's associated with the specific indexed key. It consists of the Data Object ID (but only for a Global Index associated with a Partitioned Table, where the Relative File ID is no longer a unique value and the Data Object ID is therefore needed to determine the appropriate tablespace), a Relative File ID, a Data Block ID within the Relative File, and the Row Location ID within the Data Block.
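As a quick aside, you can easily decompose a ROWID into these components yourself via the supplied DBMS_ROWID package. A minimal sketch (run here against the BOWIE table I create below; the column aliases are just my own):

SQL> select rowid,
            dbms_rowid.rowid_object(rowid)       data_object_id,
            dbms_rowid.rowid_relative_fno(rowid) rel_fno,
            dbms_rowid.rowid_block_number(rowid) block_no,
            dbms_rowid.rowid_row_number(rowid)   row_no
     from bowie where id = 42;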

Now although it has never been a completely risk-free approach to manually store and directly use these ROWIDs to fetch a row from a table, it's a technique that has been frequently used by developers, for 3 very good reasons.

The first reason is that it’s one of the most efficient ways to fetch a required row, because the database can go directly to the physical location and directly access the required row, without having to even read a single index block.

The second reason is that Oracle has always fully supported doing so, with the required syntax well documented. Indeed, Oracle APEX includes the base functionality to store and access rows directly via their associated ROWIDs.

The third reason is that it's generally well understood that the ROWID doesn't change, except for a very few (generally) well-known scenarios, and so using the ROWID to access data within an application is viewed as being reasonably safe.
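By way of illustration, the typical pattern looks something like the following PL/SQL sketch (using the BOWIE table created below; this is illustrative only and not actually run as part of this demo): fetch and store the ROWID once, then use it to go straight back to the row for any subsequent processing:

SQL> declare
       v_rowid rowid;
       v_name  bowie.name%type;
     begin
       select rowid, name into v_rowid, v_name
       from bowie
       where id = 42;

       -- some time later, re-access the row directly via the stored ROWID
       update bowie
       set name = upper(v_name)
       where rowid = v_rowid;
     end;
     /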

When a row is inserted into a table, each corresponding index on the table (generally) has a new index entry also inserted, including its associated ROWID. Now, if the row never moves from its current physical location, there is no need to ever worry about the ROWID subsequently changing.

So the big question is, when can a row physically move AND the associated ROWID change?

Logically, a scenario that springs to mind is when a row is updated and made bigger, and there's no longer room within the current data block to store the larger row. Does the row move to another block with sufficient free space, resulting in the ROWID changing?

To check out this scenario, I’ll create and populate a basic table, with PCTFREE set to 0, so once the blocks within the table are filled, there is precious little space for rows to subsequently grow:

SQL> create table bowie (id number, name varchar2(142)) pctfree 0 enable row movement;

Table created.

SQL> insert into bowie select rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

 

For good measure, I’ve also included the ENABLE ROW MOVEMENT clause, which is usually associated with Partitioned Tables (as I’ll discuss in Part II in this series).
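If you ever need to confirm this setting, it's exposed via the ROW_MOVEMENT column of USER_TABLES, which here should report ENABLED:

SQL> select table_name, row_movement from user_tables where table_name='BOWIE';

TABLE_NAME   ROW_MOVEMENT
------------ ------------
BOWIE        ENABLED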

Let’s have a look at the ROWIDs of a few random rows:

SQL> select id, rowid from bowie where id in (42, 424, 4242) order by id;

ID         ROWID
---------- ------------------
42         AAASe4AAMAAAACHAAp
424        AAASe4AAMAAAACHAGn
4242       AAASe4AAMAAAACNAHV

If we look at a partial block dump of one of the table blocks:

tab 0, row 0, @0x1312
tl: 12 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 02
col 1: [ 5] 42 4f 57 49 45
tab 0, row 1, @0x131e
tl: 12 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 03
col 1: [ 5] 42 4f 57 49 45
tab 0, row 2, @0x132a
tl: 12 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 04
col 1: [ 5] 42 4f 57 49 45
tab 0, row 3, @0x1336
tl: 12 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 05
col 1: [ 5] 42 4f 57 49 45
tab 0, row 4, @0x1342
tl: 12 fb: --H-FL-- lb: 0x1 cc: 2
col 0: [ 2] c1 06
col 1: [ 5] 42 4f 57 49 45

The above partial block dump shows the first 5 rows within the block, with the contents of the 2 table columns listed (in hex format).
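For those wanting to play along at home: one way to produce such a dump (on a test database only, and assuming the appropriate privileges) is to derive the file and block numbers from one of the ROWIDs via DBMS_ROWID and then dump that block to the session trace file. For the example ROWID above, this decodes to relative file 12, block 135 (noting that ALTER SYSTEM DUMP DATAFILE expects the absolute file number, which in simple databases typically matches the relative number):

SQL> select dbms_rowid.rowid_relative_fno(rowid) file_no,
            dbms_rowid.rowid_block_number(rowid) block_no
     from bowie where id = 42;

SQL> alter system dump datafile 12 block 135;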

If we access a row via an index:

SQL> select * from bowie where id=42;

ID         NAME
---------- --------------------------------------------------------------
42         THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS

Execution Plan
--------------------------------------------------------------------------------------------------
|  Id | Operation                            | Name       | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |    1 |    66 |       2 (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |    1 |    66 |       2 (0)| 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |    1 |       |       1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          4 consistent gets
          0 physical reads
          0 redo size
        702 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We can see in this example that Oracle requires 4 consistent gets, 3 of which are accesses to the index.

If we access this table via its ROWID:

SQL> select * from bowie where rowid='AAASe4AAMAAAACHAAp';

ID         NAME
---------- --------------------------------------------------------------
42         THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS

Execution Plan
------------------------------------------------------------------------------------
| Id | Operation                   | Name  | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |       |    1 |    10 |       1 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY USER ROWID | BOWIE |    1 |    10 |       1 (0) | 00:00:01 |
------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          1 consistent gets
          0 physical reads
          0 redo size
        698 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We can see that this requires just the 1 consistent get (vs. 4 when using the index) to access the row.

So we can see the appeal of using the ROWID to access a row.

If we now update the rows within the table and make them substantially larger so they can no longer fit within the currently filled blocks:

SQL> update bowie set name='THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS';

10000 rows updated.

SQL> commit;

Commit complete.

And now look again at the ROWIDs of these selected rows:

SQL> select id, rowid from bowie where id in (42, 424, 4242) order by id;

ID         ROWID
---------- ------------------
42         AAASe4AAMAAAACHAAp
424        AAASe4AAMAAAACHAGn
4242       AAASe4AAMAAAACNAHV

We notice that the ROWIDs all remain the same.

This has always been the behaviour here. If we update a row and the updated row can no longer fit within the current block, the row “migrates” to another table block with sufficient free space. BUT the associated ROWIDs do NOT change and the associated indexes are NOT updated.

Rather, the row data within the updated table block is replaced with a “pointer” that points to the new physical location of the migrated row. The advantage here being that Oracle only has to update the table with this new pointer, rather than having to update the associated ROWIDs of all the (possibly many) associated indexes (noting that such an update would actually result in a delete followed by a re-insert of each index entry).

The disadvantage of course is that accessing this migrated row via an index now requires an extra hop: first reading the initial table block and then following the pointer to access the block that now actually contains the row. Note that if the row is forced to migrate yet again, because it grows further and can't be housed in its current block, the pointer in the initial block is simply updated to reflect the newer location, so at least there is only ever the one extra hop.
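As a side note, if you want to gauge how many migrated rows you have, the CHAIN_CNT column in USER_TABLES counts both migrated and chained rows, though it's only populated via the old-school ANALYZE command (a quick sketch; DBMS_STATS doesn't collect it):

SQL> analyze table bowie compute statistics;

Table analyzed.

SQL> select table_name, num_rows, chain_cnt from user_tables where table_name='BOWIE';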

If we look at a new partial block dump of the previously accessed block:

tab 0, row 0, @0x1f8f
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x030000bf.4b
tab 0, row 1, @0x1f86
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x030000bf.4c
tab 0, row 2, @0x1f7d
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x030000bf.4d
tab 0, row 3, @0x1f74
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x030000bf.4e
tab 0, row 4, @0x1f6b
tl: 9 fb: --H----- lb: 0x2 cc: 0
nrid: 0x030000b8.0

We notice that the rows with their 2 columns have been replaced with a logical nrid pointer (consisting of a relative data block address and the row location within the block) that effectively points to the new physical location of the row.

Note we can still use the same, unchanged ROWID to access the same table rows:

SQL> select * from bowie where rowid='AAASe4AAMAAAACHAAp';

ID         NAME
---------- --------------------------------------------------------------
42         THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS

Execution Plan
------------------------------------------------------------------------------------
| Id | Operation                   | Name  | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |       |    1 |    10 |       1 (0) | 00:00:01 |
|  1 |  TABLE ACCESS BY USER ROWID | BOWIE |    1 |    10 |       1 (0) | 00:00:01 |
------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
          2 consistent gets
          0 physical reads
          0 redo size
        698 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

This ROWID access still works fine, except it has now increased to 2 consistent gets, one to access the initial block referenced by the ROWID and the extra consistent get to follow the pointer and access the new physical location of the row.

So historically, we haven't had to worry about updates changing the ROWID of a row (except perhaps to look at reducing the number of these migrated rows).
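(If excessive row migration does become a problem, the classic remedy is to rebuild the table with a more appropriate PCTFREE, noting that such a reorganisation invalidates all associated indexes, which must then be rebuilt, and that a MOVE is of course precisely an operation where the ROWIDs do change. A sketch only:)

SQL> alter table bowie pctfree 20;

Table altered.

SQL> alter table bowie move;

Table altered.

SQL> alter index bowie_id_i rebuild;

Index altered.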

Well, except for one clear example, as I'll discuss in Part II…

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down) November 15, 2018

Posted by Richard Foote in Block Dumps, Index Internals, Indexing NULLs, Leaf Blocks, Oracle Indexes.


By default, Oracle doesn't index an entry if all columns within the index are NULL. However, (as I've blogged previously), it's possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all its NULL values indexed and can potentially use the index accordingly.

However, the point of this article is to simply highlight that some constant values are better to use in this scenario than others…

A simple example to illustrate. First, create a table with both the CODE and GRADE columns nullable:

SQL> create table bowie (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), mod(rownum,1000), 'DAVID BOWIE'
from dual connect by level < 1000000;

999999 rows created.

SQL> insert into bowie values (1000000, null, null, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

If we create an index on both CODE and GRADE columns:

SQL> create index bowie_code_grade_i on bowie(code, grade);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE');

PL/SQL procedure successfully completed.

And then run a query looking for any CODE with a NULL value:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       |    1 |    24 |    1115 (3) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |    1 |    24 |    1115 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE" IS NULL)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
4030 consistent gets
   0 physical reads
   0 redo size
 784 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

We notice the CBO performs a Full Table Scan even though it knows there's likely only one row that meets the criteria. No matter what we do (hint the query, beg, whatever), it's impossible for the CBO to use the index, because the NULL row is simply not indexed.
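You can verify this for yourself by attempting to force the issue with an explicit index hint; the CBO will still quite correctly perform the Full Table Scan (output omitted, as it's identical to the plan above):

SQL> select /*+ index(bowie bowie_code_grade_i) */ * from bowie where code is null;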

Now we come to the rub of the post.

A common recommendation is to simply add a constant to the index column list. Oracle will index a row provided at least one of the indexed columns has a Non-NULL value, and as a constant is by definition always present, adding one to the index column list guarantees that all NULL values for all the other index columns must always be present within the index. The CBO recognises this and can therefore potentially use the index to fetch the required NULL values.

However, a common recommendation is also to use a number as the constant. A recent tweet I saw a few days ago had the following example, using the number 1 as the constant value:

SQL> create index bowie_code_grade_i_2 on bowie (code, grade, 1);

Index created.

When we run the query again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The index is indeed used to return the row with the NULL value of interest and only 4 consistent gets are performed.

So what’s the problem?

Nothing, except that perhaps a better constant might have been used, such as, say, a single space:

SQL> create index bowie_code_grade_i_3 on bowie(code, grade, ' ');

Index created.

If we run the query yet again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We get the exact same performance. So, what’s the point?

Well, if we look at the size of the corresponding indexes:

SQL> select index_name, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME                LEAF_BLOCKS
------------------------- -----------
BOWIE_CODE_GRADE_I               2490
BOWIE_CODE_GRADE_I_2             2908
BOWIE_CODE_GRADE_I_3             2769

We notice the original index has the smallest size as expected, as it doesn't have to index the constant value. But then we notice that the index with the number constant is somewhat larger than the index with the space constant; the difference of 139 leaf blocks corresponds roughly to the one extra byte stored in each of the 1M index entries.

Why?

An index block dump of both indexes will highlight why:

First a partial leaf block dump of index with the “1” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29444101=0x1c14805
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: -------, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8004] flag: -------, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7988] flag: -------, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 0d 00 af

Next, a partial leaf block dump of index with the space ” ” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 422
kdxcofbo 880=0x370
kdxcofeo 1706=0x6aa
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 29447173=0x1c15405
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: -------, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8006] flag: -------, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7991] flag: -------, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 0d 00 af

We notice that the number constant requires 2 bytes (col 2 in the first dump), while the space requires only the 1 byte.

So storing the constant as a single byte character, rather than a 2 byte number, is a free saving that adds up with each and every index entry, achieved by simply being a little more fastidious when selecting which constant value to use in this scenario.
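You can easily confirm the internal storage requirements of candidate constant values with the DUMP function:

SQL> select dump(1) number_constant, dump(' ') space_constant from dual;

NUMBER_CONSTANT    SPACE_CONSTANT
------------------ ----------------
Typ=2 Len=2: 193,2 Typ=96 Len=1: 32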

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part III (Ricochet) October 25, 2018

Posted by Richard Foote in Block Dumps, Global Indexes, Local Indexes, Oracle Indexes, Partitioned Indexes, Partitioning.


In Part I and Part II of this series, we looked at how Global Indexes can effectively perform “Partition Pruning” when the partition keys are specified in SQL predicates, by only using those index entries that have a Data Object ID of interest stored within the index ROWIDs.

In this piece, I'll cover the key performance advantage that Global Indexes have over Local Indexes, and why I generally recommend Global Indexes from a purely performance perspective.

First, a quick recap of how the Global Index performed. The following shows the performance of a query where the table partition key is specified in the query:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
      release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 1081241859

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |    1 |    25 |      13 (0) | 00:00:01 |        |       |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |    1 |    25 |      13 (0) | 00:00:01 |      7 |     7 |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RELEASE_DATE">=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
    "RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

As discussed previously, at just 5 consistent gets it's very efficient, as only the table blocks that reside in the possible partitions of interest are accessed.

The following query selects all TOTAL_SALES values of interest, with no partition key predicate:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
    400041         42         42 28-JAN-12          42
   1800041         42         42 28-JAN-12          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
    600041         42         42 15-JUN-16          42
   1000041         42         42 20-JUL-17          42
        41         42         42 24-AUG-18          42
   1400041         42         42 24-AUG-18          42

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1761527485

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |   10 |   250 |      13 (0) | 00:00:01 |        |       |
|  1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |   10 |   250 |      13 (0) | 00:00:01 |  ROWID | ROWID |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  14 consistent gets
   0 physical reads
   0 redo size
1184 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

So the index is scanned (4 consistent gets), plus 10 consistent gets for the 10 rows accessed (as the clustering here is poor), for a total of 14 consistent gets.

Let’s now compare this to an equivalent Local Index.

SQL> create index big_bowie_total_sales_local_i
on big_bowie(total_sales) local invisible;

Index created.

SQL> alter index big_bowie_total_sales_i invisible;

Index altered.

SQL> alter index big_bowie_total_sales_local_i visible;

Index altered.

If we compare the size characteristics between the two indexes we notice a couple of important differences:

SQL> select index_name, blevel, leaf_blocks from dba_indexes where table_name='BIG_BOWIE';

INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
BIG_BOWIE_TOTAL_SALES_I                 2        5585
BIG_BOWIE_TOTAL_SALES_LOCAL_I           1        4444

SQL> select index_name, partition_name, blevel, leaf_blocks
from dba_ind_partitions where index_name='BIG_BOWIE_TOTAL_SALES_LOCAL_I';

INDEX_NAME                     PARTITION_NAME           BLEVEL LEAF_BLOCKS
------------------------------ -------------------- ---------- -----------
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2011                   1         525
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2012                   1         581
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2013                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2014                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2015                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2016                   1         581
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2017                   1         580
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2018                   1         440

8 rows selected.

The first difference is that the corresponding Local index segments have a reduced BLEVEL (just 1) when compared to the Global Index (value of 2). A reduction in BLEVEL is quite possible as instead of one “big” index segment, we now have 8 “smaller” index segments.

However, if we look at the overall size of both indexes, we notice that the Local Index (at 4444 leaf blocks) is somewhat smaller than the Global Index (5585 leaf blocks). This is due to the Rowids of Local Indexes not having to be the extended 10 byte Global Index version (which contains the 4 byte Data Object Id), but just the standard 6 byte version. A Local Index segment can only reference the one table partition, so it's unnecessary to store the corresponding Data Object Id within the Rowid. (The 4 bytes saved on each of the roughly 2M index entries largely accounts for the difference of some 1141 leaf blocks.)

A partial block dump of a Local Index leaf block:

Leaf block dump
===============
header address 924483684=0x371a8064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 483
kdxcofbo 1002=0x3ea
kdxcofeo 1823=0x71f
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 29412237=0x1c0cb8d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 47
col 1; len 6; (6): 01 c0 20 7b 00 a6
row#1[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 47
col 1; len 6; (6): 01 c0 22 3a 00 00
row#2[8000] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 48
col 1; len 6; (6): 01 c0 20 7b 00 a7

This shows that the Rowids are indeed only 6 bytes.

If we re-run the query that references the partition key in a SQL predicate:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
      release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 3499166408

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                          | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                               |    1 |    25 |       2 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE                    |                               |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE                     |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 3 | INDEX RANGE SCAN                          | BIG_BOWIE_TOTAL_SALES_LOCAL_I |    1 |       |       1 (0) | 00:00:01 |      7 |     7 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We notice that this is slightly more efficient with only 4 consistent gets, when previously the Global Index required 5 consistent gets. This is directly due to the reduction in the BLEVEL.

So this is a good thing, especially if this query is frequently executed.

If we now run the query without the partition key SQL predicate:

SQL> SELECT * FROM big_bowie WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
    400041         42         42 28-JAN-12          42
   1800041         42         42 28-JAN-12          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
    600041         42         42 15-JUN-16          42
   1000041         42         42 20-JUL-17          42
        41         42         42 24-AUG-18          42
   1400041         42         42 24-AUG-18          42

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3527547124

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                          | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                               |   10 |   250 |      15 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE ALL                       |                               |   10 |   250 |      15 (0) | 00:00:01 |      1 |     8 |
|  2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE                     |   10 |   250 |      15 (0) | 00:00:01 |      1 |     8 |
|* 3 | INDEX RANGE SCAN                          | BIG_BOWIE_TOTAL_SALES_LOCAL_I |   10 |       |       9 (0) | 00:00:01 |      1 |     8 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  27 consistent gets
   0 physical reads
   0 redo size
1088 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

We notice that consistent gets have increased more significantly, up to 27 consistent gets when it was previously 14 consistent gets.

This is because instead of accessing the one Global Index structure, we are now forced to access all 8 Local Index structures, as the required TOTAL_SALES value could potentially be found in any of the table partitions. That's a minimum of 2 consistent gets per Local Index segment (for an index with a BLEVEL of 1) that has to be accessed, even if there are actually no corresponding rows of interest in the particular table partition.

Imagine if this table had 1000+ table partitions; you can easily see how the cost of using such Local Indexes can quickly become excessive.

So Local Indexes can be very problematic if the partition key is NOT referenced in the SQL or if the range of possible table partitions is excessive. The advantage of a Non-Partitioned index is that there is only the one index structure that need be accessed, regardless of the number of table partitions.

So what if you want to protect yourself from the possible ramifications of the table partition key not being referenced in SQL predicates, but you want to take advantage of the performance benefits of smaller index structures that might have a reduced index BLEVEL?

That’s the topic of Part IV in this series 🙂

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018

Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.


I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:

“ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.

Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.

I've written a number of times about index column order, including this post that's now some 10 years old – “It's Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns, provided all index columns are referenced with equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.

I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:

SQL> CREATE TABLE ziggy AS
SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name
FROM dual CONNECT BY LEVEL <= 2000000;

Table created.

I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.

If we look at a partial block dump of a Branch block:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM

We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.

If we run a couple of queries that reference both indexed columns via equality predicates:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)
AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We note for now the number of consistent gets (5 and 19) for each of these queries.

If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks is almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, at around 0.6%.

The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM

With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.

The branch entry with both the CODE and ID columns provides ranges sufficiently selective to ensure any indexed value can be found within the leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.

So on the surface, it looks as if this index is not as efficient, as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed at each level at which index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just the one, thereby reducing the height/blevel of the index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.

If we run the same queries as we did when using the previous index:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR
"ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242))

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.

So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.

In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.

Empty Leaf Blocks After Rollback Part II (Editions of You) June 24, 2015

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Tree Dumps, Unique Indexes, Update Indexes.

In my last post, I discussed how both 1/2 empty and totally empty leaf blocks can be generated by rolling back a bulk update operation.

An important point I made within the comments of the previous post is that almost exactly the same scenario would have taken place had the transaction committed rather than rolled back. A commit would likewise have resulted in the leaf blocks being 1/2 empty in the first example (with the previous index entries now all marked as deleted) and in effectively empty leaf blocks in the second example (with the previous leaf blocks all now containing only index entries marked as deleted). The important aspect here is not the rollback, but the fact that update statements result in the deletion of the previous indexed value and the re-insertion of the new value. (BTW, it's always a useful exercise to read through the comments on this blog, as this is often where some of the best learning takes place due to some of the really nice discussions) 🙂
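(If you want to verify the commit scenario for yourself, the number of logically deleted index entries is conveniently reported in INDEX_STATS after a VALIDATE STRUCTURE, noting this locks the table, so test systems only. A sketch, using the index created below:)

SQL> analyze index ziggy_id_i validate structure;

Index analyzed.

SQL> select lf_rows, del_lf_rows from index_stats;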

That said, the previous post used a Non-Unique index. Let’s now repeat the same scenario but this time use a Unique Index instead.

So let’s start with another table with the same data but this time with a unique index on the ID column:

SQL> create table ziggy (id number, name varchar2(42));

Table created.

SQL> insert into ziggy select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index ziggy_id_i on ziggy(id);

Index created.

 

OK, let’s have a look at a tree dump of this index:

----- begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
----- end tree dump

Now we notice a bit of a difference already. Here, the index consists of 20 leaf blocks, with 513 index entries in most leaf blocks, whereas the non-unique index had 21 leaf blocks and just 479 index entries per leaf block. One of the advantages of unique indexes over non-unique, as I've discussed previously.
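(For reference, the tree dumps throughout this post can be generated on a test system with the following event, substituting the OBJECT_ID returned for the index; the dump is written to the session trace file:)

SQL> select object_id from user_objects where object_name = 'ZIGGY_ID_I';

SQL> alter session set events 'immediate trace name treedump level <object_id>';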

Let’s now perform our first bulk update where I increment the ID of each value by 1:

SQL> update ziggy set id=id+1;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now with the non-unique index, this resulted in the index doubling in size as we created an additional index entry for each and every row. After the rollback, we were effectively left with an index that not only was twice the size but had only 1/2 empty leaf blocks.

With a unique index though, things differ. The most important characteristic of a unique index of course is that each index value can only ever exist once, each index entry must be unique. So for a unique index, the rowid is not actually part of the indexed column list, but treated as additional “overhead” or metadata associated with the index entry.

When we perform our update here, we're effectively replicating each value, except for the very last one, as an ID value of 10001 doesn't previously exist. With the first row, when ID=1 becomes 2 after the update, we already have an index entry with an ID value of 2 (that of the second row). So Oracle can mark the first index entry as deleted (as ID=1 no longer exists) but, rather than insert a new index entry, simply update the rowid associated with the existing unique index entry for ID=2. Oracle then updates the rowid of the index entry with a value of 3 with the rowid of the row that previously had ID=2, and so on and so on for all the other index entries, except for index value 10001, which has to be inserted as it didn't previously exist. So Oracle nicely maintains the consistency of the index during the single update operation by effectively recycling the existing index entries.

The net result is that the index remains the same size as the index entries are not reinserted as they are for a non-unique index. The effective change that occurs during this update is that the first index entry is marked as deleted and one new index entry is added at the very end.

If we look at a partial block dump of the first leaf block before the rollback operation:

Leaf block dump
===============
header address 375991908=0x16692e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 520
kdxcofbo 1076=0x434
kdxcofeo 1900=0x76c
kdxcoavs 824
kdxlespl 0
kdxlende 1
kdxlenxt 25166205=0x180017d
kdxleprv 0=0x0
kdxledsz 6
kdxlebksz 8036
row#0[8025] flag: ---D---, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 03
row#2[8003] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 01
col 0; len 2; (2):  c1 04
row#3[7992] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 02
col 0; len 2; (2):  c1 05
row#4[7981] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 03
col 0; len 2; (2):  c1 06
row#5[7970] flag: -------, lock: 2, len=11, data:(6):  01 80 01 57 00 04
col 0; len 2; (2):  c1 07

We notice that the first index entry is marked as deleted (as we no longer have an ID=1), but all the other index entries have been “recycled” with their updated rowids. Note how the rowid of the deleted index entry (01 80 01 57 00 00) is now associated with the second index entry (which is now effectively the first index entry).

If we look at a tree dump after the rollback was performed:

----- begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
—– end tree dump

We notice that it’s exactly the same size as before and we don’t have the same issues with a bloated index as we did in the previous non-unique index example.

However, if we perform the second update which effectively changes all the ID values to those which don’t currently exist within the table:

SQL> update ziggy set id=id+10000;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now Oracle can’t recycle the existing index entries as the new values don’t currently exist within the index. So Oracle is indeed forced to mark all the existing index entries as deleted and insert new index entries into the index. These new index entries all exist in the right hand most side of the index, resulting in 90-10 block splits with additional index leaf blocks being added to the index. If we rollback this transaction, it will result in all the new index entries being removed, leaving behind these new empty leaf blocks just as with the non-unique index example.

A new tree dump will confirm this:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 47, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
leaf: 0x18004d9 25167065 (19: row:0.0 avs:8000)
leaf: 0x18004da 25167066 (20: row:0.0 avs:8000)
leaf: 0x18004df 25167071 (21: row:0.0 avs:8000)
leaf: 0x18004dd 25167069 (22: row:0.0 avs:8000)
leaf: 0x18004de 25167070 (23: row:0.0 avs:8000)
leaf: 0x18004db 25167067 (24: row:0.0 avs:8000)
leaf: 0x18004dc 25167068 (25: row:0.0 avs:8000)
leaf: 0x18004e5 25167077 (26: row:0.0 avs:8000)
leaf: 0x18004e6 25167078 (27: row:0.0 avs:8000)
leaf: 0x18004e7 25167079 (28: row:0.0 avs:8000)
leaf: 0x18004e4 25167076 (29: row:0.0 avs:8000)
leaf: 0x18004ed 25167085 (30: row:0.0 avs:8000)
leaf: 0x18004ee 25167086 (31: row:0.0 avs:8000)
leaf: 0x18004ef 25167087 (32: row:0.0 avs:8000)
leaf: 0x18004e1 25167073 (33: row:0.0 avs:8000)
leaf: 0x18004e2 25167074 (34: row:0.0 avs:8000)
leaf: 0x18004e3 25167075 (35: row:0.0 avs:8000)
leaf: 0x18004e9 25167081 (36: row:0.0 avs:8000)
leaf: 0x18004ea 25167082 (37: row:0.0 avs:8000)
leaf: 0x18004eb 25167083 (38: row:0.0 avs:8000)
leaf: 0x18004ec 25167084 (39: row:0.0 avs:8000)
leaf: 0x18004f5 25167093 (40: row:0.0 avs:8000)
leaf: 0x18004f6 25167094 (41: row:0.0 avs:8000)
leaf: 0x18004f7 25167095 (42: row:0.0 avs:8000)
leaf: 0x18004f1 25167089 (43: row:0.0 avs:8000)
leaf: 0x18004e8 25167080 (44: row:0.0 avs:8000)
leaf: 0x18004f2 25167090 (45: row:0.0 avs:8000)
—– end tree dump

 

The index has indeed bloated in size as a result of the update. Note that the index would be the same size had the transaction committed, except that the leaf blocks that currently contain data would effectively be empty and contain nothing but deleted index entries while the empty leaf blocks would all contain the new indexed values.

So depending on the update operation, a unique index can potentially reuse existing index entries if the new column values existed previously in other rows. If not, then the usual delete/insert mechanism applies.
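
If you want to verify this behaviour for yourself without resorting to block dumps, a simple sketch is to validate the index structure and query INDEX_STATS after each update/rollback (ZIGGY_ID_I is just my assumed name for the unique index in this example; note also that ANALYZE ... VALIDATE STRUCTURE locks the table while it runs):

SQL> analyze index ziggy_id_i validate structure;

Index analyzed.

SQL> select lf_blks, lf_rows, del_lf_rows from index_stats;

After the first update, LF_BLKS should be unchanged (the entries having been recycled), whereas after the second update it should show the bloated figure, just as the tree dumps above show.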

Indexing and Transparent Data Encryption Part II (Hide Away) May 20, 2015

Posted by Richard Foote in Block Dumps, Oracle Indexes, TDE.
add a comment

In Part I, I quickly ran through how to set up an encrypted tablespace using Transparent Data Encryption, and the need to take care that indexes aren't inadvertently created outside of these tablespaces.

Another method of encrypting data in the Oracle database is to just encrypt selected columns. Although the advantage here is that we can just encrypt sensitive columns of interest (and that the data remains encrypted within the buffer cache), this method has a number of major restrictions, especially in relation to indexing.

To first set the scene, I’ll start by creating and populating an unencrypted table:

SQL> create table bowie (id number, code number, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

I’ll now create an index on the (effectively unique) ID column:

SQL> create index bowie_id_i on bowie(id);

Index created.

We can effectively use this index with some basic range-based predicates:

SQL> select * from bowie where id between 42 and 43;

ID       CODE     SALARY TEXT
---------- ---------- ---------- ------------------------------
42         42         42 BOWIE
43         42         43 BOWIE
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     2 |   112 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     2 |   112 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     2 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

SQL> select * from bowie where id > 10000000000;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1405654398

--------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name       | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |     1 |    56 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE      |     1 |    56 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_ID_I |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

So all is well with our index here; it's used as expected.

If we perform a block dump and have a look at the first table block containing rows (following are excerpts from the block dump):

Block header dump:  0x020001dc
Object id on Block? Y
seg/obj: 0x16cda  csc: 0x00.2b3abc  itc: 2  flg: E  typ: 1 – DATA
brn: 0  bdba: 0x20001d8 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.006.000005dd  0x014004e3.00bb.17  –U-  330  fsc 0x0000.002b3aca
0x02   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
bdba: 0x020001dc
data_block_dump,data header at 0x1c72e64
===============
tsiz: 0x1f98
hsiz: 0x2a6
pbl: 0x01c72e64
76543210
flag=——–
ntab=1
nrow=330
frre=-1
fsbo=0x2a6
fseo=0x5d6
avsp=0x330
tosp=0x330
0xe:pti[0] nrow=330 offs=0

We notice the block contains 330 rows. Note this number as we’ll check back on it later.

If we look further on in the dump to the section listing some of the rows within the block:

block_row_dump:
tab 0, row 0, @0x181a
tl: 20 fb: –H-FL– lb: 0x1  cc: 4
col  0: [ 3]  c2 07 47
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 47
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x182e
tl: 20 fb: –H-FL– lb: 0x1  cc: 4
col  0: [ 3]  c2 07 48
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 48
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1842
tl: 20 fb: –H-FL– lb: 0x1  cc: 4
col  0: [ 3]  c2 07 49
col  1: [ 2]  c1 2b
col  2: [ 3]  c2 07 49
col  3: [ 5]  42 4f 57 49 45
tab 0, row 3, @0x1856

…..

We can see the 4 columns of each row and note that the lengths of the ID and CODE columns are 3 and 2 bytes respectively. We can also see that the hex values of the CODE column (col 1) are all the same: c1 2b (as they all have a value of 42).

OK, time to encrypt some columns. I’ll re-create the table but this time encrypt both the ID and CODE columns using all the default settings:

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number encrypt, code number encrypt, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> desc bowie
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------

ID                                                 NUMBER ENCRYPT
CODE                                               NUMBER ENCRYPT
SALARY                                             NUMBER
TEXT                                               VARCHAR2(30)

 

If we look at a dump of this table we notice a number of key differences:

 

Block header dump:  0x02000464
Object id on Block? Y
seg/obj: 0x16cef  csc: 0x00.2e6e1f  itc: 2  flg: E  typ: 1 – DATA
brn: 0  bdba: 0x2000460 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0004.001.00000671  0x01409e62.00ec.19  –U-   60  fsc 0x0000.002e6e7b
0x02   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
bdba: 0x02000464
data_block_dump,data header at 0x1d82e64
===============
tsiz: 0x1f98
hsiz: 0x8a
pbl: 0x01d82e64
76543210
flag=——–
ntab=1
nrow=60
frre=-1
fsbo=0x8a
fseo=0x3dc
avsp=0x352
tosp=0x352
0xe:pti[0] nrow=60 offs=0

….

 

The first difference is that where previously we had 330 rows, the block now only contains 60 rows of data.

If we look further on to the rows themselves:

tab 0, row 0, @0x1b72
tl: 118 fb: –H-FL– lb: 0x1  cc: 4
col  0: [52]
bd 33 16 9a aa 6a 68 cf f7 a2 99 6c 3d b0 0c 1f 11 3d 42 cb 76 20 52 cc 18
8a 96 fd 0e 59 41 f2 a0 0f da 96 37 40 9b 28 dd 93 82 29 74 32 f1 53 c5 27
e8 e1
col  1: [52]
57 0c 63 c8 4f a9 42 f7 61 c3 63 b2 1b 0e f9 bc 74 5b 74 46 87 08 26 23 6b
c9 ae 52 ca 0e 31 9e ac 54 79 f7 2d f9 64 41 30 e5 6e 11 00 a3 55 d4 81 42
a4 19
col  2: [ 2]  c1 3e
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x1be8
tl: 118 fb: –H-FL– lb: 0x1  cc: 4
col  0: [52]
c3 13 d5 f7 a8 1f 46 8b bd 2b a9 e0 25 8e b3 15 2d 52 77 f5 fa 8a 52 46 f6
1d 0d 0a 58 8f 68 51 ea 2f 8c 0e 56 9d 2b cf 6f 86 7c d4 a9 d9 2a 25 fd a1
6e 7d
col  1: [52]
49 19 8f 40 da 0e ad fb 7a 79 6d 50 61 c9 9e 33 90 3f 73 c9 64 0a f0 98 c6
2f 8c c1 3f 47 fd 78 55 be 5c e5 df 1f 94 ab c0 3b a2 e3 37 65 f4 8e 80 c0
f4 66
col  2: [ 2]  c1 3f
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1c5e
tl: 118 fb: –H-FL– lb: 0x1  cc: 4
col  0: [52]
1e 9d 90 85 70 e8 50 39 c9 64 9f 85 6f c6 e6 03 44 8c 59 73 9a 0e a6 cb 3c
ff 41 91 42 2a 36 f1 98 b7 54 61 24 1a 62 87 3f b9 21 5e d7 43 f6 39 14 7a
d0 34
col  1: [52]
e5 a0 4a 82 06 12 88 08 fb df db b7 00 19 60 cc d1 da 98 34 1d 24 44 64 79
04 48 29 7a 2c 5d 26 06 0d f3 5a 42 1c 34 59 65 14 85 53 e7 07 ac ee 11 73
82 5f
col  2: [ 2]  c1 40
col  3: [ 5]  42 4f 57 49 45

….

We see the reason why we have fewer rows per block: the encrypted columns have significantly increased in size. Where previously they were just 3 and 2 bytes respectively, both the ID and CODE columns are now 52 bytes in length. The actual size would in part depend on the encryption algorithm used (some algorithms round to the next 8 bytes); in this example I used the default AES192.

With AES192, the length of the column is rounded up to the next 16 bytes. However, if we simply encrypted a column as is, a given column value would always encrypt to the same value when using the same encryption key. This means a malicious person could potentially attempt to reverse engineer a value by inserting known column values and seeing if the generated encrypted values are the same as those in the table. To prevent this, Oracle by default adds "Salt", which is basically a random string, to the column value being encrypted, making it effectively impossible to reverse engineer the inserted value this way. This adds another 16 bytes to the length of the column value. If we look at the second CODE column (col 1) in the block dump, we notice the entries all have different encrypted values, even though they all have the same actual value of 42 within the table.

So that’s 32 bytes accounted for. The remaining 20 bytes is a result of TDE adding a Message Authentication Code (MAC) to each encrypted value for integrity checking purposes.

Clearly, having columns that increase so significantly due to encryption will also have an impact on any associated indexes as they will likewise not be able to contain as many entries per index block and hence be significantly larger.

However, the more pressing issue is that by adding salt to the encryption process, there is no easy deterministic way Oracle can associate an actual indexed value with the encrypted value when accessing and modifying the index. As a result, Oracle simply doesn’t allow an index to be created on any column that has been encrypted with salt.

SQL> create index bowie_id_i on bowie(id);
create index bowie_id_i on bowie(id)
*
ERROR at line 1:
ORA-28338: Column(s) cannot be both indexed and encrypted with salt

SQL> alter table bowie add primary key(id);
alter table bowie add primary key(id)
*
ERROR at line 1:
ORA-28338: Column(s) cannot be both indexed and encrypted with salt

If we want to encrypt a column and have the column indexed, we must encrypt the column without salt. Additionally, if you want to make the index more efficient without the overheads associated with MAC integrity checks, you may also want to encrypt the columns with the NOMAC option.

Let’s re-create the table without these encryption defaults:

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number encrypt 'nomac' no salt, code number encrypt 'nomac' no salt, salary number, text varchar2(30)) tablespace unsafe;

Table created.

SQL> insert into bowie select rownum, 42, rownum, 'BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

If we look at a block dump of this table:

Block header dump:  0x020000e4
Object id on Block? Y
seg/obj: 0x16cf0  csc: 0x00.2e75a9  itc: 2  flg: E  typ: 1 – DATA
brn: 0  bdba: 0x20000e0 ver: 0x01 opc: 0
inc: 0  exflg: 0

Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0005.00a.00000872  0x0140b74e.00f0.09  –U-  148  fsc 0x0000.002e75bb
0x02   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
bdba: 0x020000e4
data_block_dump,data header at 0x1dc2e64
===============
tsiz: 0x1f98
hsiz: 0x13a
pbl: 0x01dc2e64
76543210
flag=——–
ntab=1
nrow=148
frre=-1
fsbo=0x13a
fseo=0x46d
avsp=0x333
tosp=0x333
0xe:pti[0] nrow=148 offs=0

We notice the number of rows in the block has bounced back up, to 148 rows.

If we look at some of the rows:

block_row_dump:
tab 0, row 0, @0x1b01
tl: 47 fb: –H-FL– lb: 0x1  cc: 4
col  0: [16]  de 59 69 48 5e 65 d8 21 da 49 bd ba 19 d5 1f 80
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 34
col  3: [ 5]  42 4f 57 49 45
tab 0, row 1, @0x1b30
tl: 47 fb: –H-FL– lb: 0x1  cc: 4
col  0: [16]  9e 96 cc e1 57 07 74 61 64 19 99 4a ad 08 d9 93
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 35
col  3: [ 5]  42 4f 57 49 45
tab 0, row 2, @0x1b5f
tl: 47 fb: –H-FL– lb: 0x1  cc: 4
col  0: [16]  f8 cf 58 ba f2 1b f7 a6 2d 59 1c c7 2c e4 28 86
col  1: [16]  9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61
col  2: [ 3]  c2 02 36
col  3: [ 5]  42 4f 57 49 45

 

We see the length of the encrypted columns has dropped back down to 16 bytes, still more than the unencrypted columns but less than the 52 bytes required for the encrypted columns with both salt and MAC enabled.

Note though that the CODE column values (col 1), while encrypted, all now have the same encrypted hex value (9e d8 3b 95 65 60 43 df 2c e2 b0 85 ae 5e 87 61) without the salt applied. So the encrypted data is that little bit less secure, but we can now successfully create B-Tree indexes on these encrypted columns:

SQL> create index bowie_id_i on bowie(id);

Index created.
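
As a quick sanity check (assuming you have access to the view), DBA_ENCRYPTED_COLUMNS should now confirm that both columns are encrypted without salt and with NOMAC integrity:

SQL> select column_name, encryption_alg, salt, integrity_alg
     from dba_encrypted_columns where table_name = 'BOWIE';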

 

This however doesn’t end the various restrictions associated with indexing column encrypted columns as we’ll see in the next post.

Index Compression Part VI: 12c Index Advanced Compression Block Dumps (Tumble and Twirl) October 9, 2014

Posted by Richard Foote in 12c, Advanced Index Compression, Block Dumps, Index Compression, Oracle Indexes.
5 comments

Sometimes, a few pictures (or in this case, index block dumps) are better than a whole bunch of words 🙂

In my previous post, I introduced the new Advanced Index Compression feature, whereby Oracle automatically determines how to best compress an index. I showed a simple example of an indexed column that had sections of index entries that were basically unique (and so don’t benefit from compression) and other sections with index entries that had many duplicates (that do compress well). Advanced Index Compression enables Oracle to automatically just compress those index leaf blocks where compression is beneficial.

If we look at a couple of partial block dumps from this index, first a dump from a leaf block that did have duplicate index entries:

Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 651
kdxcofbo 1346=0x542
kdxcofeo 2172=0x87c
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 25166046=0x18000de
kdxleprv 25166044=0x18000dc
kdxledsz 0
kdxlebksz 8036
kdxlepnro 1
kdxlepnco 1 (Adaptive)
prefix row#0[8031] flag: -P—–, lock: 0, len=5
col 0; len 2; (2):  c1 2b
prc 651
row#0[8022] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5c
psno 0
row#1[8013] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5d
psno 0
row#2[8004] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5e
psno 0
row#3[7995] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 5f
psno 0
row#4[7986] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 86 00 60
psno 0

row#650[2172] flag: ——-, lock: 0, len=9
col 0; len 6; (6):  01 80 1e 8d 00 10
psno 0
—– end of leaf block Logical dump —–

The red section is a portion of the index header that determines the number of rows in the prefix table of the index (kdxlepnro 1). The prefix table basically lists all the distinct column values in the leaf block that are to be compressed. The value 1 denotes there is actually only the one distinct column value in this specific leaf block (i.e. all index entries have the same indexed value). This section also denotes how many of the indexed columns are to be compressed (kdxlepnco 1). As this index only has the one column, it also has a value of 1. Note this value can potentially be anything between 0 (no columns compressed) up to the number of columns in the index. The (Adaptive) reference tells us that Index Advanced Compression has been used and that the values here can change from leaf block to leaf block, depending on the data characteristics of the index entries within each leaf block (a dump of a basic compressed index will not have the "Adaptive" reference).

The green section is the compression prefix table and details all the unique combinations of index entries to be compressed within the leaf block. As all indexed values are the same in this leaf block (value 42, internally represented as c1 2b hex), the prefix table only has the one row. prc 651 denotes that all 651 index entries in this leaf block have this specific indexed value.

Next follow all the actual index entries, which now only consist of the rowid (the 6 byte col 0 column), as they all reference psno 0, which is the unique row id of the only row within the prefix table (row#0).

So rather than storing the indexed value 651 times, we can store the index value (42) just the once within the prefix table and simply reference it from within the actual index entries. This is why index compression can save us storage: storing something once within a leaf block rather than multiple times.
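
For completeness, an adaptively compressed index such as this one is created with the new 12c COMPRESS ADVANCED LOW syntax, along the lines of the following sketch (table/index names illustrative only); the COMPRESSION column in USER_INDEXES reports the setting used:

SQL> create index bowie_stuff_i on bowie_stuff(id) compress advanced low;

Index created.

SQL> select index_name, compression from user_indexes where index_name = 'BOWIE_STUFF_I';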

If we now look at a partial block dump of another index leaf block within the index, that consists of many differing (basically unique) index entries:

Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 449
kdxcofbo 938=0x3aa
kdxcofeo 1754=0x6da
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 25168667=0x1800b1b
kdxleprv 25168665=0x1800b19
kdxledsz 0
kdxlebksz 8036
kdxlepnro 0
kdxlepnco 0 (Adaptive)
row#0[8022] flag: ——-, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2c
col 1; len 6; (6):  01 80 12 e6 00 41
row#1[8008] flag: ——-, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2d
col 1; len 6; (6):  01 80 12 e6 00 42
row#2[7994] flag: ——-, lock: 0, len=14
col 0; len 4; (4):  c3 58 3d 2e
col 1; len 6; (6):  01 80 12 e6 00 43


row#448[1754] flag: ——-, lock: 0, len=14
col 0; len 4; (4):  c3 58 41 5c
col 1; len 6; (6):  01 80 12 ee 00 1d
—– end of leaf block Logical dump —–

We notice that in the red section, both kdxlepnro 0 and kdxlepnco 0 (Adaptive) have a value of 0, meaning we have no rows and no columns within the prefix table. As such, we have no prefix table at all here; this leaf block has simply not been compressed.

If we look at the actual index entries, they all have an additional column now in blue, that being the actual indexed value, as all the index values in this leaf block are different from each other. Without some form of index entry duplication, there would be no benefit from compression, and Index Advanced Compression has automatically determined this and not bothered to compress this leaf block. An attempt to compress this block would have actually increased the necessary overall storage for these index entries, due to the additional overheads associated with the prefix table (note there are an additional 2 bytes of overhead per row within the prefix table).

I’ll next look at an example of a multi-column index and how Index Advanced Compression handles which columns in the index to compress.

Indexing Foreign Key Constraints With Invisible Indexes (Invisible People) April 22, 2014

Posted by Richard Foote in 12c, Block Dumps, Foreign Keys, Invisible Indexes, Oracle Indexes.
1 comment so far

In my previous post I discussed how, when deleting rows from parent tables, Bitmap Indexes based on the FK constraint can prevent the expensive Full Table Scans (FTS) on the child tables, but not the associated exclusive table locks.

Last year, I discussed how it was possible in Oracle Database 12c to have multiple indexes on the same column list.

Quite some time ago, I discussed how so-called Invisible Indexes can indeed still be visible in various scenarios, including when policing FK constraints.

Well, let's put all three topics together 🙂

First, let's use the same basic setup as the last post:

SQL> create table bowie_dad (id number, dad_name varchar2(30)); 

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE'); 

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST'); 

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level <=1000000; 

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

 

We’re now going to create two indexes concurrently on the FK constraint on the DAD_ID column, a Bitmap Index and an invisible B-Tree Index as is now possible since Oracle Database 12c:

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id); 

Index created.

SQL> create index bowie_kid_fk2_i on bowie_kid(dad_id) invisible;

Index created.

Oracle Database 12c allows us to now create multiple indexes on the same column list, providing only one index is visible at a time.
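
Visibility itself is easy to confirm and to swap, as per the following sketch (not actually performed here, as the rest of this demo keeps the Bitmap Index visible); note the Bitmap Index must be made invisible before the B-Tree index can be made visible, as only one may be visible at a time:

SQL> select index_name, index_type, visibility from user_indexes where table_name = 'BOWIE_KID';

SQL> alter index bowie_kid_fk_i invisible;

Index altered.

SQL> alter index bowie_kid_fk2_i visible;

Index altered.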

Let’s look at a partial block dump of the first leaf block of each index. First the Bitmap Index:

Block header dump:  0x0180805c
 Object id on Block? Y
 seg/obj: 0x16f45  csc: 0x00.36bc54  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1808058 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc54
Leaf block dump
===============
header address 32801380=0x1f48264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 959=0x3bf
kdxcoavs 919
kdxlespl 0
kdxlende 0
kdxlenxt 25198685=0x180805d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——-, lock: 0, len=3537
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 00
col 2; len 6; (6):  01 80 80 2c 00 3f
col 3; len 3516; (3516):
 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cb 92 24 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24
 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49
 92 24 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49

 

Note the indexed value is c1 03, denoting the lowest DAD_ID=2 currently in the table.

Now the partial block dump of the invisible B-Tree Index:

 

Block header dump:  0x0181b724
 Object id on Block? Y
 seg/obj: 0x16f46  csc: 0x00.36bc78  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x181b720 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc78
Leaf block dump
===============
header address 32801380=0x1f48264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25278245=0x181b725
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 01
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 04
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 07

Again as expected the first index entry is C1 03.

With only a visible Bitmap Index in place, does that mean we'll have table locking issues if we delete a parent row with current transactions in place? Let's check it out.

In one session, we have a current transaction on the child table:

SQL> insert into bowie_kid values (1000001, 'LOW', 4); 

1 row created.

 

In another session, we attempt to delete a parent row (with an ID = 1, which doesn't currently exist within the child table):

SQL> delete bowie_dad where id = 1; 

1 row deleted.

We note the DML was successful and didn’t hang. This means the B-Tree index is clearly being used to police this constraint, even though it’s currently invisible.

In a third session, we now attempt to insert a child row using a FK value that’s in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

 

As expected, it hangs, as it's effectively waiting on the row level lock made possible by the index entry in the B-Tree index; invisible indexes are still fully maintained behind the scenes. If we look at a fresh block dump of both indexes, beginning with the Bitmap Index:

Block header dump:  0x0180805c
 Object id on Block? Y
 seg/obj: 0x16f45  csc: 0x00.36bc54  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1808058 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0036bc54
Leaf block dump
===============
header address 402948708=0x18048264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 959=0x3bf
kdxcoavs 919
kdxlespl 0
kdxlende 0
kdxlenxt 25198685=0x180805d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——-, lock: 0, len=3537
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 00
col 2; len 6; (6):  01 80 80 2c 00 3f
col 3; len 3516; (3516):
 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cb 92 24 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24
 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49
 92 24 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49

 

We note the Bitmap Index has not been updated. It still lists the C1 03 value as the minimum indexed value.

However, if we look at the invisible B-Tree index:

Block header dump:  0x0181b724
 Object id on Block? Y
 seg/obj: 0x16f46  csc: 0x00.36bc78  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x181b720 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0x0008.015.00000b86  0x014316ab.01c5.42  —-    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 402948708=0x18048264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25278245=0x181b725
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: ——-, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 81 b6 f3 00 00
row#1[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 01
row#2[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 7f d3 00 04
row#3[8000] flag: ——-, lock: 0, len=12

It has been updated and lists a new index entry C1 02 as the minimum value now in the index.

So the B-Tree index can be used to successfully police the FK constraint and prevent the possible table level locking issues associated with deleting parent rows, even though it's invisible and there is an equivalent visible Bitmap Index in place. Invisible indexes are simply not considered as viable execution paths by the Cost Based Optimizer, but may still be "visible" in a variety of scenarios, such as quietly policing constraints behind the scenes.

Do I recommend creating two such indexes in Oracle Database 12c? Well, no, as the costs of maintaining both indexes need to be considered. But I certainly do caution against simply making indexes invisible and expecting the database to behave in exactly the same manner as if the index were to be subsequently dropped.

Because if we roll back all the above and then drop the invisible index:

SQL> drop index bowie_kid_fk2_i;

Index dropped.

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

 

then in another session, the parent delete operation will now hang without the B-Tree index being in place:

SQL> delete bowie_dad where id = 1;

Indexing Foreign Key Constraints With Bitmap Indexes (Locked Out) April 17, 2014

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Foreign Keys, Index Internals, Oracle Indexes.
6 comments

Franck Pachot made a very valid comment in my previous entry on Indexing Foreign Keys (FK) that the use of a Bitmap Index on the FK columns does not avoid the table locks associated with deleting rows from the parent table. Thought I might discuss why this is the case and why only a B-Tree index does the trick.

Let’s first setup some very simple Parent-Child tables:

SQL> create table bowie_dad (id number, dad_name varchar2(30));

Table created.

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> insert into bowie_dad values (2, 'ZIGGY STARDUST');

1 row created.

SQL> insert into bowie_dad values (3, 'MAJOR TOM');

1 row created.

SQL> insert into bowie_dad values (4, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

SQL> create table bowie_kid (id number, kid_name varchar2(30), dad_id number);

Table created.

SQL> insert into bowie_kid select rownum, 'ALADDIN SANE', mod(rownum,3)+2 from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> alter table bowie_dad add primary key(id);

Table altered.

SQL> alter table bowie_kid add constraint bowie_kid_fk foreign key(dad_id) references bowie_dad(id);

Table altered.

OK, so we have a small parent table (BOWIE_DAD) and a much larger child table (BOWIE_KID) with all the necessary constraints in place. Note we don't actually have a child row with a FK DAD_ID = 1. So we can potentially delete this row from the BOWIE_DAD table (where ID = 1).

Let’s begin by creating a B-Tree  index on the FK column (DAD_ID) and have a look a partial block dump of the first leaf block in the index:

SQL> create index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06
…..

 

We’ll compare future block dumps with this one but for now just note that the first index entry has a value of (hex) C1 03, which corresponds to the minimum value for DAD_ID = 2 we currently have in this table/index.

If we insert a new child record in one session (but don't yet commit):

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we can delete (but not yet commit) the unwanted parent row without any locking implications thanks to this index on the FK column:

SQL> delete bowie_dad where id = 1;

1 row deleted.

In a third session, we can insert another child record again with no locking implications, providing we don’t attempt to use the parent value the second session is in the process of deleting:

SQL> insert into bowie_kid values (1000002, 'LOW', 3);

1 row created.

But if we do try to insert a new child row with a FK value for which the parent is in the process of being deleted:

SQL> insert into bowie_kid values (1000003, 'HEROES', 1);

The statement hangs and it will do so until the transaction deleting the parent record commits (in which case it will receive an ORA-02291 integrity constraint error) or the transaction rolls back (in which case the insert will succeed).

If we take a fresh dump of the first leaf block (which must contain the associated index entry as it’s the minimum value now in the table):

 Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0x0008.004.00000b8a  0x01431602.01c5.14  —-    1  fsc 0x0000.00000000
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 514
kdxcofbo 1064=0x428
kdxcofeo 1868=0x74c
kdxcoavs 804
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1868] flag: ——-, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 80 7f 38 00 00
row#1[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#2[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03

 

We notice we do indeed have a new index entry (highlighted above), with all the associated locking information in ITL slot 2 for the new index entry belonging to the blocked session. So the key point here is that the index is indeed updated, and Oracle can proceed or not depending on what happens with the transaction on the parent table. The overhead of this new index entry is minimal, and locking can be easily policed and restricted to just the index entries with this specific value (hex) C1 02, which corresponds to DAD_ID = 1.
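
As an aside, while the third session is blocked, the associated TX enqueue wait can be observed from yet another session with a generic lock query such as the following sketch (nothing specific to this example; the blocked session will typically show a REQUEST value of 4, a share mode request on the blocking transaction's TX enqueue):

SQL> select sid, type, lmode, request, block from v$lock where type = 'TX';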

If we do indeed proceed with the delete on the parent table:

SQL> commit;

Commit complete.

 

The session attempting to insert the now deleted parent FK value indeed fails:

 

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);
insert into bowie_kid values (1000002, 'HEROES', 1)
*
ERROR at line 1:
ORA-02291: integrity constraint (BOWIE.BOWIE_KID_FK) violated - parent key not
found

 

And we notice with a fresh block dump that the index entry has been removed by the now unlocked session:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f0b  csc: 0x00.35f861  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.0035f861
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 03
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 06

Everything is back to the way it was previously.

 

OK, let’s now re-insert the parent row, drop the FK index and replace it with a Bitmap Index instead:

 

SQL> insert into bowie_dad values (1, 'DAVID BOWIE');

1 row created.

SQL> commit;

Commit complete.

SQL> drop index bowie_kid_fk_i;

Index dropped.

SQL> create bitmap index bowie_kid_fk_i on bowie_kid(dad_id);

Index created.

 

If we take a look at a partial block dump of the first leaf block of this Bitmap Index:

 

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 360809060=0x15818264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

….

 

We notice the first key difference here: these Bitmap Index entries are potentially HUGE, with just the 2 index entries in this block. The other thing to note is that the combination of Bitmap Indexes and DML can result in locking hell, because if an index entry needs to be modified (resulting in a change in the compressed bitmap string), all rows within the rowid range specified by the Bitmap Index entry are effectively locked. So Bitmap Indexes introduce severe locking issues, regardless of the Parent/Child update issue highlighted above.

If we insert a child row in one session:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

And in another session insert another row with the same FK value:

SQL> insert into bowie_kid values (1000002, 'HEROES', 4);

The session hangs until the transaction in the first session completes because of the locking implications introduced with the Bitmap Index.

 

Therefore, with a Bitmap Index in place, the last of our worries will be locking issues associated with deleting a parent row. After rolling back the above, we attempt the following. In one session, we insert a child record:

SQL> insert into bowie_kid values (1000001, 'LOW', 4);

1 row created.

In a second session, we delete the unwanted parent row:

SQL> delete bowie_dad where id = 1;

and it hangs. The Bitmap Index is not effective in preventing this lock as it was with the B-Tree Index.

In a third session, we attempt to insert a child row with the soon to be deleted parent key:

SQL> insert into bowie_kid values (1000002, 'HEROES', 1);

and it hangs as well. So the Bitmap Index on the FK does not prevent the locking hell such parent deletes can introduce into our environments.

If we roll all this back and simply have one session delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

And in another session insert a child row with the FK about to be deleted, the insert hangs as expected with an exclusive transaction lock:

SQL> insert into bowie_kid values (1000001, 'BOWIE', 1);

 

However, if we look at a fresh partial block dump of the first Bitmap Index leaf block:

Block header dump:  0x01806efc
 Object id on Block? Y
 seg/obj: 0x16f14  csc: 0x00.3602fc  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1806ef8 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  —-    0  fsc 0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C—    0  scn 0x0000.003602fc
Leaf block dump
===============
header address 225280612=0xd6d8264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 958=0x3be
kdxcoavs 918
kdxlespl 0
kdxlende 0
kdxlenxt 25194237=0x1806efd
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4498] flag: ——-, lock: 0, len=3538
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 80 52 73 00 00
col 2; len 6; (6):  01 80 6e cc 00 3f
col 3; len 3517; (3517):
 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32 92 24 49 92 24 49
 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24 49 cf 92 24 49 92
 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cb 92 24
 49 92 ff 33 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24
 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cc 92 24 49 92 24 ff 32 24 49 92
 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24
 49 92 24 49 92 24 49 cb 92 24 49 92 ff 33 92 24 49 92 24 49 92 24 cf 49 92
 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc
 49 92 24 49 02 ff 32 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24 49 cf
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24 01 ff 32
 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24 49 92 24
 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24 49
 92 24 49 cb 92 24 49 92 ff 33 49 92 24 49 92 24 49 92 cf 24 49 92 24 49 92
 24 49 cf 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cc 24 49 92 24
 01 ff 32 92 24 49 92 24 49 92 24 cf 49 92 24 49 92 24 49 92 cf 24 49 92 24
 49 92 24 49 cf 92 24 49 92 24 49 92 24 cc 49 92 24 49 02 ff 32 24 49 92 24

…..

 

Unlike the B-Tree index, which was updated, the Bitmap Index has remained unchanged. No attempt was made by Oracle at this stage to insert the index entry, as such a new Bitmap Index entry would likely generate too much overhead and not appreciably reduce the locking implications of these DML statements with these Bitmap Indexes in place anyway. The actual index update is delayed until such a change is actually possible, for example following the rollback of the parent deletion.

However, in a third session, an insert into the child table with a FK that’s not to be deleted is successful:

SQL> insert into bowie_kid values (1000002, 'BOWIE', 4);

1 row created.

Bitmap Indexes are simply not designed with concurrency in mind; they have efficiencies that make it easier for single sessions to load data in the Data Warehouse environments where they are indeed suitable.

One advantage of the Bitmap index is that at least Oracle doesn’t have to perform a FTS on the (potentially huge) child table when checking for the existence of any associated child FK values. Oracle can quickly use the index to determine whether the parent delete can proceed or not. If we roll everything back and just attempt to delete a parent row:

SQL> delete bowie_dad where id = 1;

1 row deleted.

       
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
          3  consistent gets
          0  physical reads
        676  redo size
        862  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice that, at just 3 consistent gets, the potentially expensive FTS on the child table has been avoided. Drop the Bitmap Index and the FTS must be performed, to ensure no current FK values would violate the constraint when the parent row is deleted:

SQL> drop index bowie_kid_fk_i;

Index dropped.

   
SQL> delete bowie_dad where id = 1;

1 row deleted.

    
Execution Plan
----------------------------------------------------------
Plan hash value: 2571176721

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | DELETE STATEMENT   |              |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  DELETE            | BOWIE_DAD    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| SYS_C0010356 |     1 |    13 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access('ID'=1)

    
Statistics
----------------------------------------------------------
          7  recursive calls
          8  db block gets
       3629  consistent gets
          0  physical reads
        676  redo size
        863  bytes sent via SQL*Net to client
        830  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

We notice without the Bitmap Index in place, we are now performing many more (3629) consistent gets due to the necessary FTS.

So using a Bitmap Index to police a FK constraint doesn't reduce the locking implications associated with deleting parent rows (with Bitmap Indexes, we have locking hell anyway if there's much DML), but it does at least reduce the overheads of checking the associated child table.

IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012

Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.
7 comments

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT 🙂

Remember, one of the benefits of an IOT is that we don't have to store columns twice, as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK index. However, with an IOT Secondary Index, we must store the PK columns again. In fact, we have to re-store the PK columns for as many IOT Secondary Indexes as we have defined for the IOT.

So the overall additional overheads we’re talking about here becomes a product of two important factors. The actual overall size of the PK column(s) and the number of Secondary Indexes we have defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.
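
To put some rough, purely illustrative numbers on this: with an average PK of say 10 bytes and 5 secondary indexes defined on a 1 million row IOT, that's in the order of 10 x 5 x 1,000,000 = 50MB of repeated PK data, before we even count the associated length bytes and other per-entry overheads.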

If we look at the size of the IOT Secondary Index I created in my previous introductory post:

SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728

If however we compare this with a secondary index associated with a conventional heap table containing identical data:

SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171

We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.

If we compare block dumps of the same index entry from both Secondary Indexes:

row#0[8016] flag: K—–, lock: 0, len=20
col 0; len 2; (2):  c1 06
col 1; len 3; (3):  c2 15 16
col 2; len 2; (2):  c1 5f
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 f1

Above is the IOT Secondary Index example, which is 20 bytes in length.

row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 01 68 7a 00 b4

Above is the Heap Table Secondary Index example, which is only 12 bytes in length.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.

Obviously, the larger the PK, the greater the associated overheads. Obviously, the greater the number of IOT Secondary indexes, again the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a little bit cleverer. In the following, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):

SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

row#0[8020] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#1[8004] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#2[7988] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5

So we have 3 index entries listed here. In each one:

col 0 represents the indexed column (COUNTRY_ID), which happens to be part of the PK

col 1 is the remaining PK column not already catered for by the indexed column (ALBUM_ID)

the second col 0 (within the row piece, with a length of 4) represents the physical “guess”

So Oracle still has the full PK effectively defined within the index entry, which can be used to access the IOT as necessary if the “guess” proves to be wrong.

The key points to take from this post are to fully consider the consequences of a large PK on any secondary index defined on an IOT, and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes to be defined on the table.

More on this “guess” component in my next post …

Index Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.
14 comments

Man, it’s been ages since I’ve had free time to update the blog, what with birthday parties to organise, Roger Waters concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem, the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK and the necessary data can be efficiently accessed.
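For example, a simple PK lookup such as the following (using the ALBUM_SALES_IOT table created in the demo below) would be resolved via a single INDEX UNIQUE SCAN of the IOT structure, with the entire row found within the index itself:

SQL> select * from album_sales_iot where album_id = 42 and country_id = 7;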

But what if we want to access the data efficiently via Non-PK columns, or without specifying the leading column of the PK ? Can we create secondary indexes on an IOT ?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle first had to resolve a tricky issue in relation to indexing an IOT structure: what to do when the indexed rows can potentially move around all the time ?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few scenarios in which this occurs, for example updating the partitioning key column of a row such that the row needs to be stored in another partition. This is recognised as a rather expensive thing to do, as not only do at least two blocks need to be accessed and modified, but all associated indexes need to be updated as well. As such, it generally requires such activities to be explicitly permitted (by enabling row movement and such). Note that when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row, which points to the new block/location of the row.

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if  Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

  • The indexed column values
  • The PK columns of the associated IOT
  • A “guess” that points to the physical location of the row within the IOT, accurate as at the time the index entry is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the row within the IOT. If it finds the required row there, great; the index performs in a similar manner to using a ROWID with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained within the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similarly to a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK, but at least this has the potential to be much more efficient than a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is  not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK which is also stored within the secondary index is used as a backup method of accessing the required row.
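As a quick aside, Oracle does give us some visibility and control over these “guesses”. A sketch, using the ALBUM_SALES_IOT_TOTAL_SALES_I index created in the demo below:

SQL> select index_name, pct_direct_access from dba_indexes where table_name = 'ALBUM_SALES_IOT';

SQL> alter index album_sales_iot_total_sales_i update block references;

PCT_DIRECT_ACCESS (once statistics have been collected) reports the percentage of rows for which the stored “guess” is still valid, while the UPDATE BLOCK REFERENCES clause refreshes the stale guesses in place, without the expense of a full index rebuild.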

If we start with a traditionally simple little demo, let’s first create and populate an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now run a query based on the non-PK TOTAL_SALES column:

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive INDEX FAST FULL SCAN, even though it correctly knows relatively few rows are to be retrieved.

If we create a secondary index on the IOT however:

SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |     4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is subsequently accessed not via index ROWIDs but by an INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:

row#0[8015] flag: K-----, lock: 0, len=21
col 0; len 3; (3):  c2 1f 28
col 1; len 3; (3):  c2 15 37
col 2; len 2; (2):  c1 1b
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 da

col 0 represents the indexed value (TOTAL_SALES)

col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)

Following the 3 byte table header overhead required for the “guess”, we have the second col 0, which represents the 4 byte  “guess” to the last known physical location of the row.

Much more to follow shortly …

Index Organized Tables – PCTTHRESHOLD (The Wedding Song) February 8, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, PCTTHRESHOLD.
7 comments

I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog 🙂

In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.

In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.

PCTTHRESHOLD gives us more flexibility over what is actually stored within the IOT index structure. The non-PK columns up to the INCLUDING clause are stored within the IOT, but only if the resultant row length to be stored inside the IOT is below a specified percentage threshold of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT, but only if the resultant row size is less than 5% of the blocksize. If a row size were to be greater than this threshold, then any non-PK columns that would violate the length threshold are not included within the IOT and are stored instead within the Overflow segment.
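To put a rough number on it: with an 8K block size, a PCTTHRESHOLD of 5 equates to a cut-off of approximately 5% of 8192 bytes, so in the order of 400 bytes. Any row whose IOT portion (up to the INCLUDING column) would exceed roughly 400 bytes has the offending non-PK columns pushed out to the Overflow segment, while rows comfortably below the cut-off keep those columns within the IOT structure.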

In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize:

SQL> CREATE TABLE album_sales_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING description PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         if mod(c,2) = 1 then
  5              INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6         else INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A short description');
  7         end if;
  8      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the size of the resultant IOT:

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      2176          3          5       2052

The IOT is only of a moderate size, with 5 branch blocks and 2,052 leaf blocks.

If we look at the size of the Overflow segment:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_IOT';

 OBJECT_ID
----------
     74209

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74209';

TABLE_NAME         IOT_NAME         IOT_TYPE         BLOCKS
------------------ ---------------- ------------ ----------
SYS_IOT_OVER_74209 ALBUM_SALES_IOT  IOT_OVERFLOW      35715

We see that the vast majority of the storage is still allocated to the Overflow segment, at 35,715 blocks in size.

If we look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 461972060=0x1b89225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x97: opcode=7: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 251
kdxcofbo 538=0x21a
kdxcofeo 561=0x231
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21053971=0x1414213
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[561] flag: K---S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01811901.0
col  0: [ 5]  c4 04 57 1d 44
row#1[584] flag: K---S-, lock: 2, len=36
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 28 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 4]  c3 1d 2a 2e
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
row#2[620] flag: K---S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01811901.1
col  0: [ 5]  c4 04 22 2d 07
row#3[643] flag: K---S-, lock: 2, len=37
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 05
tl: 29 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 36 17 52
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e

We notice the leaf block contains 251 row entries. Half the rows with a Description of 19 bytes have the Description value stored within the IOT leaf block, while the other half of rows with the larger Description values contain a nrid that refers to the corresponding Description within the Overflow segment.

If we analyze the table:

SQL> ANALYZE TABLE album_sales_iot COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_IOT';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_IOT                    500000     250000

We notice that only half the rows are now “chained rows”.

If we run a query that only references the rows with a small Description that are stored within the IOT structure:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=0;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=0)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       2211  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

The query at 7 consistent gets is relatively efficient as all the required data can be found within the IOT.

If however we run a query that references the larger Description rows:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=1;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4147  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

We see that it’s less efficient at 18 consistent gets, as it obviously needs to access a larger volume of data and requires additional I/Os to visit the corresponding Overflow segment.

So, with a combination of the INCLUDING and PCTTHRESHOLD clauses, one can control what data is and is not included within the IOT index structure.

Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
3 comments

In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …

In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK columns are in many ways the “key” columns in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns and for that reason alone, it’s probably good practice to consistently define the PK columns in this manner.

However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. It’s therefore somewhat inefficient to have the PK columns as the first columns defined in the table, as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.

I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.

The same however cannot really be said for IOTs, and I would strongly recommend defining the PK columns first in an IOT …

I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2;

Table created.

So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.

OK, let’s populate this table with the same data we used previously:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we describe this table, we get the expected listing:


SQL> desc album_sales_iot
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 TOTAL_SALES                                        NUMBER
 ALBUM_ID                                  NOT NULL NUMBER
 COUNTRY_ID                                NOT NULL NUMBER
 DESCRIPTION                                        VARCHAR2(1000)

With the columns listed in the order as we defined them in the table.

If we query the column details from dba_tab_columns:

SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID COLUMN_NAME
---------- ------------------------------
         1 TOTAL_SALES
         2 ALBUM_ID
         3 COUNTRY_ID
         4 DESCRIPTION

We again find the column order is as we defined them in the table.

When we run the same query we ran last time that returned the data with 5 consistent gets:

SQL> set arraysize 100
SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1300 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       2394  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???

The first clue on what’s going on here can be found by looking at dba_tab_cols:

SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME
---------- ----------------- ------------------------------
         1                 3 TOTAL_SALES
         2                 1 ALBUM_ID
         3                 2 COUNTRY_ID
         4                 4 DESCRIPTION

The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.

As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, followed by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.

This now has consequences for the INCLUDING clause, if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as country_id is a PK column, Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately, the INCLUDING clause is only applied after this re-ordering of the columns. As such, the total_sales column, now logically listed third and after the country_id column, is therefore not actually included in the IOT index structure as (perhaps) intended.

A partial block dump of an IOT leaf block will confirm this:

Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K-----, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 9 fb: --H-F--- lb: 0x0  cc: 0
nrid:  0x01811911.0
row#1[896] flag: K-----, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 9 fb: --H-F--- lb: 0x0  cc: 0
nrid:  0x01811911.1

As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).

As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.

My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.
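To illustrate the latter option, a sketch of how the above DDL would need to look, with the INCLUDING clause now naming the non-PK total_sales column so that it survives the automatic re-ordering:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

As total_sales is logically listed third after the re-ordered PK columns, an INCLUDING total_sales clause ensures it remains within the IOT segment as intended.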

Jonathan Lewis, a great source of information on indexes and Oracle in general has previously discussed this same IOT Trap on his blog.

Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
14 comments

In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.

I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:

SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Sorry for the unimaginative manner of loading the description field but you get the point 🙂

OK, let’s have a look at the size of the IOT:

SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     71680          3        116      71429

As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.

The application doesn’t generally access the Description column, with the following query being typical (Note: to make fetching the data as efficient as possible, I’ve set the arraysize to 100):

SQL> set arraysize 100
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 521866300

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    17   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK |   100 |  1300 |    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       2387  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The query requires 19 consistent gets to retrieve the 100 rows because even though the data is extremely well clustered, there are very few rows per leaf block.
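The arithmetic roughly stacks up: at an average of just 7 rows per leaf block, the 100 well clustered rows for album_id = 42 span something in the order of 15 leaf blocks, which together with the branch block reads and block revisits across the two fetches accounts for the 19 consistent gets.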

If we look at a partial block dump of one of these IOT leaf blocks:

Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K-----, lock: 0, len=1004
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 996 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 05 3b 03
col  1: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.

Considering the Description column is so large and/or that it’s rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself.

Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns when the specific values might be too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we’re back to having a separate “table” like structure, but the Overflow segment will only hold those columns that we don’t necessarily want to store within the index structure. Unlike a normal Heap table, in which all columns are stored within the table segment.

There are a number of different methods we could use (to be explored further in future posts), for now I’ll use the INCLUDING clause:

SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

Table created.

So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case the Description column) will be stored in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.

If we now populate this table with the identical data as before:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      1664          3          4       1613

We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.

If we now re-run our query:

SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 |  1300 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       2390  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Things are now much more efficient, having reduced the required consistent gets down from 19 to just 5 consistent gets.

If we now look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K-----, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01800081.0
col  0: [ 5]  c4 02 5e 0d 25
row#1[726] flag: K-----, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 15 fb: --H-F--- lb: 0x0  cc: 1
nrid:  0x01800081.1
col  0: [ 5]  c4 04 41 13 43

We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of the index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have a nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. We only therefore have a table column count of 1 (cc:1).
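Stepping through those 23 bytes: 2 bytes of row header (the flag and lock bytes), 3 bytes for each of the two PK columns (2 bytes of data plus a length byte each) and the 15 byte row piece (tl: 15), itself comprising 3 bytes of header, the 6 byte nrid and the 6 byte TOTAL_SALES column (5 bytes of data plus a length byte).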

To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2';

 OBJECT_ID
----------
     74116

This OBJECT_ID is used to name the corresponding Overflow segment which we can determine from DBA_TABLES as it has a format of SYS_IOT_OVER_object_id:

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116';

TABLE_NAME         IOT_NAME                 IOT_TYPE      BLOCKS
------------------ ------------------------ ------------ -------
SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW   71430

We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.

Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:

SQL> select * from SYS_IOT_OVER_74116;
select * from SYS_IOT_OVER_74116
              *
ERROR at line 1:
ORA-25191: cannot reference overflow table of an index-organized table

If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:

Block header dump:  0x01800081
 Object id on Block? Y
 seg/obj: 0x12185  csc: 0x00.17482cc  itc: 1  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.0000a919  0x00c24a2e.03d2.2a  C---    0  scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
     76543210
flag=--------
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0x486
tosp=0x486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: -----L-- lb: 0x0  cc: 1
col  0: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.

The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.

Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the (say) nrid:  0x01800081.0 within the IOT index entry for the row. This in turns points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say) 0x1bda to the actual location of the Description value within the Overflow block. Easy !!

If we Analyze the IOT table:

SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_DETAILS_IOT2           500000     500000

We notice all the rows are listed as “Chained Rows”. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.

However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:

SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 | 99400 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 | 99400 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       5541  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The above query, which returns the Description column, results in the consistent gets increasing to 32, up from the 5 consistent gets when the Description wasn’t accessed and from the 19 consistent gets when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed, while the frequently executed queries which don’t access the Description column are more efficient.

The Overflow segment gives us in a manner “the best of both worlds”. The ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns) and those that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it’s an index and table relationship except the table doesn’t have to store again the columns that are already stored within the index.

It’s all good news so far for IOTs …

Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012

Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.
16 comments

Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.

As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.

To start, let’s cover a very basic little example.

Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (note: using an 8K blocksize in an MSSM tablespace):

SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20),
  2  CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id));

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c IN 1..100 LOOP
  4        INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We have a natural Primary Key that consists of two columns and an additional two columns of information.

Let’s look at some basic sizing information on the table and associated Primary Key index:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
      1570            0

SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1152          3       1062

So the table segment consists of 1570 blocks, while the index segment consists of 1152 blocks, of which 1062 are leaf blocks.

OK, let’s run a basic query looking for all albums with an album_id=42:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3244723662

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   100 |  1800 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALBUM_SALES    |   100 |  1800 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ALBUM_SALES_PK |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4084  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.
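
If we wanted to confirm this, the Clustering Factor can easily be checked (an illustrative query only; a CLUSTERING_FACTOR value close to the number of table blocks, rather than close to the number of rows, suggests the indexed values are indeed well clustered within the table):

SELECT index_name, clustering_factor, num_rows FROM dba_indexes WHERE index_name = 'ALBUM_SALES_PK';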

However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we include these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of logical I/Os required to run the query.

We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:

SQL> ALTER TABLE album_sales DROP PRIMARY KEY;

Table altered.

SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS;

Index created.

SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id);

Table altered.

OK, so the index now contains all the columns in the table and is now used to police the PK constraint:

SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK';

CONSTRAINT_NAME                C INDEX_NAME
------------------------------ - ------------------------------
ALBUM_SALES_PK                 P ALBUM_SALES_PK_I

Let’s now look at the size of the index:

SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      2048          5       2006

OK, as expected the index is now somewhat larger, as it needs to accommodate the extra columns. The overall number of blocks allocated to the index is now 2048, with the number of leaf blocks increasing from 1062 to 2006.

If we now re-run the query:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1126128764

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_PK_I |   100 |  1800 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       3568  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice things have indeed improved, with the number of consistent gets reduced from 18 down to just 11. Not a bad improvement !!

If we look at a partial block dump of one of the index leaf blocks:

Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ------, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 12
col 2; len 5; (5):  c4 04 15 31 59
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b7
row#1[7984] flag: ------, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 13
col 2; len 5; (5):  c4 03 19 2c 3d
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b8

We notice that each leaf entry is 26 bytes in length. The data for the four table columns (cols 0 to 3) adds up to 13 bytes. The remaining 13 bytes are basically the overhead required for each index entry:

2 bytes for flag and lock information in the index entry header

5 x 1 byte for the length byte of each of the 5 index columns

6 bytes for the 5th index column, which is the index rowid

So that’s 13 bytes of overhead per index entry in this example index.
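
As a quick sanity check of the column data, the first dumped row decodes to ALBUM_ID=6 and COUNTRY_ID=17, so something like the following illustrative query (using vsize, which returns the number of bytes in the internal representation of a value) should add up to the 13 bytes of column data shown in the dump:

SELECT vsize(album_id) + vsize(country_id) + vsize(total_sales) + vsize(album_colour) AS column_data_bytes
FROM album_sales WHERE album_id = 6 AND country_id = 17;

Of course, with TOTAL_SALES being randomly generated, your dumped values (and hence lengths) may well differ on another run.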

Well, everything is currently pretty good. We have the application now performing approximately 40% less work than previously. But we have one little issue. With the index now containing all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources, having to be maintained for no purpose other than to exist so that the index can in turn exist.

Wouldn’t it be nice if we could somehow just have the index, but without the underlying table? Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlying table. The index structure itself is the table by which we can store and retrieve the necessary data.

OK, let’s now create a new version of this table with the same data, but this time as an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20),
     CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4        INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

The key clause here is ORGANIZATION INDEX. I’ll discuss other options and syntax in coming posts.

If we look now at the table segment:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables
  2  WHERE table_name = 'ALBUM_SALES_IOT';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
                        IOT

We see the table is indeed listed with an IOT_TYPE of IOT, but it consists of no blocks, as no corresponding table segment physically exists …
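
We can double-check this at the segment level (an illustrative query; as the PK constraint was explicitly named, the underlying index segment inherits the constraint name and should be the only segment returned):

SELECT segment_name, segment_type, blocks FROM dba_segments
WHERE segment_name IN ('ALBUM_SALES_IOT', 'ALBUM_SALES_IOT_PK');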

If we look at the size of the corresponding index:

SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes
  2  WHERE table_name = 'ALBUM_SALES_IOT';

INDEX_NAME           TABLE_NAME       BLEVEL LEAF_BLOCKS
-------------------- --------------- ------- -----------
ALBUM_SALES_IOT_PK   ALBUM_SALES_IOT       2        1550

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1664          4       1550

We notice it’s smaller than the corresponding overloaded index on the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks, but this index is somewhat smaller, at just 1664 blocks and 1550 leaf blocks.

If we take a look at a partial block dump of a leaf block from the IOT:

Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I--- is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K---S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 49
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 04 2f 10 59
col  1: [ 4]  47 4f 4c 44
row#1[732] flag: K---S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 4a
tl: 14 fb: --H-FL-- lb: 0x0  cc: 2
col  0: [ 5]  c4 03 01 03 46
col  1: [ 4]  47 4f 4c 44

Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.

The structure of the index entry is somewhat different here. It basically consists of:

2 bytes for lock and flag info in the index header as previously

Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.

Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).

Followed finally by the 2 Non-PK columns and their corresponding length bytes.

Note the big missing component here compared with the previous block dump: there is no rowid (with its corresponding length byte) defined at all. There’s no need for a rowid if there’s no corresponding table to point down to …

So the overall overhead has been reduced to:

2 bytes for the index entry header

3 bytes for the table header

4 bytes for the 4 column lengths

for a total of 9 bytes, 4 fewer than the 13 bytes of overhead required in the previous example. So the total length of an index entry has reduced from 26 bytes down to just 22 bytes. Hence the overall reduction in the size of the corresponding IOT index.

So in total we have saved the 1570 table blocks entirely, plus a further 384 index blocks (2048 - 1664).

If we now re-run the same query:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1800 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       3575  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Not only have we saved ourselves some storage and the need to maintain two physical segments, but things are a tad more efficient as well, with the number of consistent gets reduced from 11 down to 10, as the corresponding index segment we need to access is smaller …

Enough to start with for now and yes the pun in the title is fully intended 🙂