jump to navigation

IOT Secondary Indexes – The Logical ROWID Guess Component Part II (Move On) May 8, 2012

Posted by Richard Foote in Index Block Size, Index Organized Tables, IOT, ROWID, Secondary Indexes.
7 comments

Having mentioned a couple of dangers associated with IOT Secondary Indexes, thought I might discuss a couple of their nicer attributes.

In the previous post, we saw how 50-50 index block splits on the ALBUM_SALES_IOT IOT table caused rows to move to new leaf blocks, resulting in a degradation in the PCT_DIRECT_ACCESS value of the associated ALBUM_SALES_IOT_TOTAL_SALES_I secondary index, which in turn resulted in poorer performance when using this index. We had to rebuild the secondary index (or update block references) to make all the “guess” components accurate and the index efficient again and so point to the correct locations within the parent IOT.

So, if you have 50-50 block splits occurring in your IOT, this will degrade the efficiency of the associated IOT Secondary indexes over time.

However, if you don’t have 50-50 block splits and the entries in the IOT don’t move from leaf block to leaf block, then this will not be an issue. Remembering of course that many Primary Key values are based on a sequence which monotonically increases and results in 90-10 block splits rather than 50-50 block splits.  90-10 block splits don’t move data around, Oracle leaves the full blocks alone and simply adds a new block in the IOT Btree structure into which new values are added. Therefore, with IOT data not moving around, the “guess” component of the logical ROWIDS remain valid and don’t go stale over time and so the associated secondary indexes remain nice and efficient.

If we look at the current state of the ALBUM_SALES_IOT_TOTAL_SALES_I secondary index:

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice the PCT_DIRECT_ACCESS is currently nice and efficient at 100%.

If we now add a bunch of new rows into the IOT, but this time with PK values that monotonically increase:

SQL> BEGIN
  2    FOR i IN 5001..10000 LOOP
  3      FOR c IN 201..300 LOOP
  4        INSERT INTO album_sales_iot VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Yet more new rows');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

And collect fresh statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'ALBUM_SALES_IOT', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice that the PCT_DIRECT_ACCESS value remains unchanged. So, no 50-50 block split, no PCT_DIRECT_ACCESS degradation with regard the secondary indexes.

OK, another nice feature with IOT Secondary Indexes.

With a “normal” Heap table, if we were to MOVE and reorganise the table, all associated indexes become invalid as the Move results in all the rows being relocated and the indexes are not maintained during this process (as this would add considerably to the overhead in the Move process). All associated indexes have to be rebuilt after the Move operation completes, which is both expensive and adds considerably to the availability issues associated with the whole table reorg process as the table is locked during the Move operation. In short, moving a heap table is an expensive and an availability unfriendly process.

As this little demo illustrates, moving a heap table results in all indexes becoming unusable:

SQL> create table radiohead (id number constraint radiohead_pk primary key, code number, name varchar2(30));

Table created.

SQL> create index code_i on radiohead(code);

Index created.

SQL> insert into radiohead select rownum, rownum, 'OK COMPUTER' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   VALID
CODE_I                         VALID

SQL> alter table radiohead move;

Table altered.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   UNUSABLE
CODE_I                         UNUSABLE

However, moving an IOT has a number of advantages over a heap table.

Firstly, as it’s an index structure, it can be reorganised and rebuilt in much the same way as we can rebuild any btree index. Remembering, an index can be rebuilt “online” (on Enterprise Edition), overcoming many of the locking issues associated with moving heap tables.

Additionally, although the physical locations of all the rows in the IOT change following a Move operation, the PK values  themselves don’t change. Therefore, although the PCT_DIRECT_ACCESS value becomes 0, the indexes themselves are still Valid and usable as the PK component can still be used to access the relevant data.

So the syntax to move an IOT table can be expanded to be performed “Online” and all the secondary indexes will remain “Valid”:

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                  100

SQL> alter table album_sales_iot move online;

Table altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                    0
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0

So although the PCT_DIRECT_ACCESS values for the secondary indexes has gone down to 0, making them less efficient as a result, they do at least remain valid and usable by the CBO:

SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    68   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    68   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         83  consistent gets
         53  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed

The secondary index is still used by the CBO, although at 83 consistent gets in this example, it’s not as efficient as it could be.

The rebuild of the secondary index can be performed subsequently to repair the stale guesses and improve the efficiency of the index as desired:

SQL> alter index album_sales_iot_total_sales_i rebuild online;

Index altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0
SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    36   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    36   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed

Following the rebuild of the secondary index and getting the PCT_DIRECT_ACCESS back to 100%, the example query is now more efficient, with a reduction of consistent gets down from 83 to just 31.

So IOTs can be less problematic to reorganise and if 90-10 block splits are performed, the impact on the secondary indexes is minimised.

IOT Secondary Indexes – The Logical ROWID Guess Component Part I (Lucky) April 26, 2012

Posted by Richard Foote in Index Organized Tables, IOT, Oracle Indexes, Primary Key, ROWID, Secondary Indexes.
8 comments

As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) and the Logical Rowid, the PK column(s) and a “guess” to the physical block in the IOT containing the corresponding row.

Let’s discuss this “guess” component in a bit more detail.

When the Secondary Index is created, this guess is spot on and will indeed point to the correct block within the IOT structure that contains the row being referenced by the indexed entry.

When I initially created the Secondary Index on the Total_Sales column, all the physical guesses were accurate and indeed pointed to the correct blocks within the IOT structure. This can be confirmed by the following query:

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';
 INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

As we can see, the PCT_DIRECT_ACCESS value is 100, which means that 100% of all the guess components are correct. Therefore, the index behaves in a manner very similar to an ordinary Secondary Index with a rowid, in that all the initial accesses to the IOT are valid and there’s no need to subsequently re-access the IOT via the PK component. From the perspective of finding the required row entries with the IOT structure, the Secondary Index is as efficient as possible when all the guesses are valid.

If we run a little query to access a number of rows via this Secondary Index:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   102 |  1836 |   105   (0)| 00:00:02 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   102 |  1836 |   105   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   102 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
       3657  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

Note we’re returning 99 rows which requires 110 consistent gets. So a touch over 1 consistent get per row being access. Note these numbers, we’ll reference them again later …

OK, we’re now going to add some more rows to the table. This will result in 50-50 block splits occurring which will in turn cause a whole bunch of rows to move to new physical blocks within the IOT.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3       FOR c IN 101..200 LOOP
  4          INSERT INTO album_sales_iot
  5          VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Some new rows');
  6       END LOOP;
  7    END LOOP;
  8    COMMIT;
  9  END;
 10  /
 PL/SQL procedure successfully completed.

If we now collect fresh statistics and look at the index statistics again:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'ALBUM_SALES_IOT', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                 58 NO

We notice that the PCT_DIRECT_ACCESS value has dropped significantly to just 58%. This means that only 58% of the guesses are now accurate and that in the other 42% of cases, Oracle is forced to now re-access the IOT again via the PK component stored in the Secondary Indexes. This results in additional consistent gets now likely being required to access the IOT via the index, resulting in a less efficient index.

If we now re-run the original query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   376   (0)| 00:00:05 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   376   (0)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        463  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        184  rows processed

We can see that approximately doubled the number of rows are now returned (184 from 99 rows). However, the number of consistent gets has increased by approximately 4 fold (from 110 to 463). The index is now not as efficient in retrieving rows as it was previously, requiring now some 2.5 consistent gets per row being accessed.

To fix these guesses and make the index more efficient again, one can either ALTER the index with the REBUILD or the UPDATE BLOCK REFERENCES clause:

SQL> alter index album_sales_iot_total_sales_i UPDATE BLOCK REFERENCES;

Index altered.

If we now look at some fresh index statistics:

SQL> exec dbms_stats.gather_index_stats(ownname=> null, indname=> 'ALBUM_SALES_IOT_TOTAL_SALES_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice that the index now has the PCT_DIRECT_ACCESS back at a nice high 100%. If we re-run the same query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   206   (0)| 00:00:03 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   206   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        202  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)

The consistent gets have now reduced substantially from 463 to just 202, back at a touch over 1 consistent get per row retrieved …

So, IOT Secondary Indexes can be as efficient as normal secondary indexes, but if the IOT is subject to 50-50 block splits, they’ll likely need to be maintained more regularly to ensure they stay nice and efficient. Another significant disadvantage associated with IOTs that have Secondary Indexes …

There’s a bit more I’ll like to say on the subject but I’ll leave it for a Part II 🙂

IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012

Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.
7 comments

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of  row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT 🙂

Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK  index. However, with an IOT Secondary index, we must store the PK columns again. In fact, we have to re-store the PK columns again for as many IOT Secondary indexes we have defined for the IOT.

So the overall additional overheads we’re talking about here becomes a product of two important factors. The actual overall size of the PK column(s) and the number of Secondary Indexes we have defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.

If we look at the size of the IOT Secondary Index I created in my previous introductory post:

SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728

If however we compare this with a secondary index associated with a conventional heap table containing identical data:

SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171

We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.

If we compare block dumps of the same index entry from both Secondary Indexes:

row#0[8016] flag: K—–, lock: 0, len=20
col 0; len 2; (2):  c1 06
col 1; len 3; (3):  c2 15 16
col 2; len 2; (2):  c1 5f
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 f1

Above is the IOT Secondary Index example, which is 20 bytes in length.

row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 01 68 7a 00 b4

Above is the Heap Table Secondary Index example, which is only 12 bytes in length.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.

Obviously, the larger the PK, the greater the associated overheads. Obviously, the greater the number of IOT Secondary indexes, again the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a lit bit cleverer. Following, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):

SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

row#0[8020] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#1[8004] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#2[7988] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5

So we have 3 index entries listed here. In each one:

col 0 represents  the indexed column (COUNTRY_ID) which happens to be part of the PK

col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)

col 0 (with a length of 4) represents the physical “guess”.

So Oracle still has defined within the index entry the full PK to access the IOT as necessary if the “guess” proves to be wrong.

The key points to take from this post is to fully consider the consequences of a large PK on any defined secondary index on an IOT and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes to be defined on the table.

More on this “guess” component in my next post …

Indexed Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.
14 comments

Man, its been ages since I had free time to update the blog, what with birthday parties to organise, Roger Water concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem, the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK and the necessary data can be efficiently accessed.

But what if we want to access the data efficiently via Non-PK columns or without specify the leading column of the PK ? Can we create secondary indexes on a IOT ?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure, that being what to do when indexing rows that potentially move around all the time ?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few examples of when this occurs, for example updating the partitioned column of a row such that it needs to be stored in another partition. This is recognised as a rather expensive thing to do as not only do at least two blocks need to be accessed and modified but it also requires associated indexes to be updated as well. As such, it generally requires explicitly allowing such activities to occur (by enabling row movement and the such). Note, when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row which points to the new block/location of the row.

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if  Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

  • The indexed column values
  • The PK columns of the associated IOT
  • A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the  row within the IOT. If it finds the required row in the IOT, great. The index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained with the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similar to that of a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK but at least this has the potential to be much more efficient that a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is  not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK which is also stored within the secondary index is used as a backup method of accessing the required row.

If we start with a traditionally simple little demo, let’s first create and populate an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now run a query based on the non-PK TOTAL_SALES column:

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive FAST FULL INDEX SCAN, even though it correctly knows relatively few rows are to be retrieved.

If we create a secondary index on the IOT however:

SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is accessed subsequently not via Index ROWIDs but by a INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:

row#0[8015] flag: K—–, lock: 0, len=21
col 0; len 3; (3):  c2 1f 28
col 1; len 3; (3):  c2 15 37
col 2; len 2; (2):  c1 1b
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 da

col 0 represents the indexed value (TOTAL_SALES)

col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)

Following the 3 byte table header overhead required for the “guess”, we have the second col 0, which represents the 4 byte  “guess” to the last known physical location of the row.

Much more to follow shortly …

Index Organized Tables – PCTTHRESHOLD (The Wedding Song) February 8, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, PCTTHRESHOLD.
7 comments

I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog 🙂

In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.

In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.

PCTTHRESHOLD gives us more flexibility in what is actually stored within the IOT index structure by storing  the non-PK columns up to the INCLUDING clause within the IOT but only if the row length to be stored inside the IOT is below a specified percentage threshold of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT but only if the resultant row size is less than 5% of the blocksize. If a row size were to be greater than the specified percentage threshold of the block size, then any non-PK columns that would violate this length threshold would not be included within the IOT and stored instead within the Overflow segment.

In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize:

SQL> CREATE TABLE album_sales_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING description PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         if mod(c,2) = 1 then
  5              INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6         else INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A short description');
  7         end if;
  8      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we look at the size of the resultant IOT:

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      2176          3          5       2052

The IOT is only of a moderate size, with 5 branch blocks and 2,052 leaf blocks.

If we look at the size of the Overflow segment:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_IOT';

 OBJECT_ID
----------
     74209

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74209';

TABLE_NAME         IOT_NAME         IOT_TYPE         BLOCKS
------------------ ---------------- ------------ ----------
SYS_IOT_OVER_74209 ALBUM_SALES_IOT  IOT_OVERFLOW      35715

We see that the vast majority of the storage is still allocated to the Overflow segment, at 35,715 blocks in size.

If look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 461972060=0x1b89225c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 1
kdxcoopc 0x97: opcode=7: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 251
kdxcofbo 538=0x21a
kdxcofeo 561=0x231
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21053971=0x1414213
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[561] flag: K—S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01811901.0
col  0: [ 5]  c4 04 57 1d 44
row#1[584] flag: K—S-, lock: 2, len=36
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 28 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 4]  c3 1d 2a 2e
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
row#2[620] flag: K—S-, lock: 2, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01811901.1
col  0: [ 5]  c4 04 22 2d 07
row#3[643] flag: K—S-, lock: 2, len=37
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 05
tl: 29 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 36 17 52
col  1: [19]  41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e

We notice the leaf block contains 251 row entries. Half the rows with a Description of 19 bytes have the Description value stored within the IOT leaf block, while the other half of rows with the larger Description values contain a nrid that refers to the corresponding Description within the Overflow segment.

If we analyze the table:

SQL> ANALYZE TABLE album_sales_iot COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_IOT';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_IOT                    500000     250000

We notice that only half the rows are now “chained rows”.

If we run a query that only references the rows with a small Description that are stored within the IOT structure:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=0;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=0)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       2211  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

The query at 7 consistent gets is relatively efficient as all the required data can be found within the IOT.

If however we run a query that references the larger Description rows:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=1;

50 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4147  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed

We see that it’s less efficient at 18 consistent gets as it needs to obviously access a larger volume of data and requires additional I/Os to access the corresponding Overflow segment.

So, with a combination of the INCLUDING and PCTTHRESHOLD clauses, one can control what data is and is not included within the IOT index structure.

Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
3 comments

In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …

In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK are in many ways the “key” column(s) in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns and for that reason alone, it’s probably good practice to consistently define the PK columns in this manner.

However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. Therefore, it’s somewhat inefficient to then have the PK columns the first columns defined in the table as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.

I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.

The same however can not really be said for IOTs and I would strongly recommend defining the PK columns first in an IOT …

I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2;

Table created.

So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.

OK, let’s populate this table with the same data we used previously:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description');
  6      END LOOP;
  9    END LOOP;
 10    COMMIT;
 11  END;
 12  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we describe this table, we get the expected listing:


SQL> desc album_sales_iot
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 TOTAL_SALES                                        NUMBER
 ALBUM_ID                                  NOT NULL NUMBER
 COUNTRY_ID                                NOT NULL NUMBER
 DESCRIPTION                                        VARCHAR2(1000)

With the columns listed in the order as we defined them in the table.

If we query the column details from dba_tab_columns:

SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID COLUMN_NAME
---------- ------------------------------
         1 TOTAL_SALES
         2 ALBUM_ID
         3 COUNTRY_ID
         4 DESCRIPTION

We again find the column order is as we defined them in the table.

When we run the same query we ran last time that returned the data with 5 consistent gets:

SQL> set arraysize 100
SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1300 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       2394  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???

The first clue on what’s going on here can be found by looking at dba_tab_cols:

SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME
---------- ----------------- ------------------------------
         1                 3 TOTAL_SALES
         2                 1 ALBUM_ID
         3                 2 COUNTRY_ID
         4                 4 DESCRIPTION

The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.

As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, following by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT table where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.

This now has consequences on the INCLUDING clause if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as it’s a PK column, Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately the INCLUDING clause is only applied after the re-ordering of the columns and as such, the total_sales column which is now logically listed third and now after the country_id column, is not therefore actually included in the IOT index structure as (perhaps) intended.

A partial block dump of an IOT leaf block will confirm his:

Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K—–, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 9 fb: –H-F— lb: 0x0  cc: 0
nrid:  0x01811911.0
row#1[896] flag: K—–, lock: 0, len=17
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 9 fb: –H-F— lb: 0x0  cc: 0
nrid:  0x01811911.1

As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).

As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.

My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.

Jonathan Lewis, a great source of information on indexes and Oracle in general has previously discussed this same IOT Trap on his blog.

Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
14 comments

In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.

I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:

SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Sorry for the unimaginative manner of loading the description field but you get the point 🙂

OK, let’s have a look at the size of the IOT:

SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     71680          3        116      71429

As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.

The application doesn’t generally access the Description column with the following query typical (Note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):

SQL> set arraysize 100
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 521866300

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    17   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK |   100 |  1300 |    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       2387  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The query requires 19 consistent gets to retrieve the 100 rows because even though the data is extremely well clustered, there are very few rows per leaf block.

If we look at a partial block dump of one of these IOT leaf blocks:

Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K—–, lock: 0, len=1004
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 996 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 05 3b 03
col  1: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.

Considering the Description column is so large and/or that it’s rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself.

Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns when the specific values might be too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we’re back to having a separate “table” like structure, but the Overflow segment will only hold those columns that we don’t necessarily want to store within the index structure. Unlike a normal Heap table, in which all columns are stored within the table segment.

There are a number of different methods we could use (to be explored further in future posts), for now I’ll use the INCLUDING clause:

SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

Table created.

So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case the Description column) will be store in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.

If we now populate this table with the identical data as before:

SQL> BEGIN
  2    FOR i in 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4         INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description');
  5       END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      1664          3          4       1613

We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.

If we now re-run our query:

SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 |  1300 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       2390  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Things are now much more efficient, having reduced the required consistent gets down from 19 to just 5 consistent gets.

If we now look at a partial block dump of an IOT leaf block:

Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I– is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K—–, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01800081.0
col  0: [ 5]  c4 02 5e 0d 25
row#1[726] flag: K—–, lock: 0, len=23
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 15 fb: –H-F— lb: 0x0  cc: 1
nrid:  0x01800081.1
col  0: [ 5]  c4 04 41 13 43

We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of the index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have a nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. We only therefore have a table column count of 1 (cc:1).

To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:

SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2';

 OBJECT_ID
----------
     74116

This OBJECT_ID is used to name the corresponding Overflow segment which we can determine from DBA_TABLES as it has a format of SYS_IOT_OVER_object_id:

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116';

TABLE_NAME         IOT_NAME                 IOT_TYPE      BLOCKS
------------------ ------------------------ ------------ -------
SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW   71430

We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.

Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:

SQL> select * from SYS_IOT_OVER_74116;
select * from SYS_IOT_OVER_74116
              *
ERROR at line 1:
ORA-25191: cannot reference overflow table of an index-organized table

If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:

Block header dump:  0x01800081
 Object id on Block? Y
 seg/obj: 0x12185  csc: 0x00.17482cc  itc: 1  flg: –  typ: 1 – DATA
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.0000a919  0x00c24a2e.03d2.2a  C—    0  scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
     76543210
flag=——–
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0x486
tosp=0x486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: —–L– lb: 0x0  cc: 1
col  0: [984]
 41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
 73 63 72 69 70 74 69 6f 6e

We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.

The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.

Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the (say) nrid:  0x01800081.0 within the IOT index entry for the row. This in turns points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say) 0x1bda to the actual location of the Description value within the Overflow block. Easy !!

If we Analyze the IOT table:

SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_DETAILS_IOT2           500000     500000

We notice all the rows are listed as “Chained Rows“. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.

However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:

SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2379894191

----------------------------------------------------------------------------------------
| Id  | Operation        | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                     |   100 | 99400 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 |   100 | 99400 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       5541  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The above query which returns the Description column results in the consistent gets increasing to 32 consistent gets, from the 5 consistent gets when the Description wasn’t accessed and from the 19 consistent gets from when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed while the frequently executed queries which don’t access the Description column are more efficient.

The Overflow segment gives us in a manner “the best of both worlds”. The ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns) and those that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it’s an index and table relationship except the table doesn’t have to store again the columns that are already stored within the index.

It’s all good news so far for IOTs …

Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012

Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.
16 comments

Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.

As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.

To start, let’s cover a very basic little example.

Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note using an 8K blocksize in a MSSM tablespace):

SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20),
  2  CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id));

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c IN 1..100 LOOP
  4        INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We have a natural Primary Key that consists of two columns and an additional two columns of information.

Let’s look at some basic sizing information on the table and associated Primary Key index:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
      1570            0

SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1152          3       1062

So the table segment consists of 1570 blocks and the index segment 1152, with a total of 1062 leaf blocks.

OK, let’s run a basic query looking for all albums with an album_id=42:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3244723662

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   100 |  1800 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALBUM_SALES    |   100 |  1800 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ALBUM_SALES_PK |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4084  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.

However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.

We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:

SQL> ALTER TABLE album_sales DROP PRIMARY KEY;

Table altered.

SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS;

Index created.

SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id);

Table altered.

OK, so the index now contains all the columns in the table and is now used to police the PK constraint:

SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK';

CONSTRAINT_NAME                C INDEX_NAME
------------------------------ - ------------------------------
ALBUM_SALES_PK                 P ALBUM_SALES_PK_I

Let’s now look at the size of the index:

SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      2048          5       2006

OK, as expected the index is now somewhat larger as it now needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with leaf blocks increasing from 1062  to 2006 leaf blocks.

If we now re-run the query:

SQL> SELECT * FROM album_sales WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1126128764

-------------------------------------------------------------------------------------
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                  |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_PK_I |   100 |  1800 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
       3568  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

We notice things have indeed improved and we have reduced the number consistent gets from 18 down to just 11. Not a bad improvement !!

If look at a partial block dump of one of the index leaf blocks:

Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ——, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 12
col 2; len 5; (5):  c4 04 15 31 59
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b7
row#1[7984] flag: ——, lock: 0, len=26
col 0; len 2; (2):  c1 07
col 1; len 2; (2):  c1 13
col 2; len 5; (5):  c4 03 19 2c 3d
col 3; len 4; (4):  47 4f 4c 44
col 4; len 6; (6):  01 40 05 82 00 b8

We notice that each leaf entry is 26 bytes in length. The length of the four columns adds up to 13 bytes. The remaining 13 bytes is basically overhead required for each index entry:

2 bytes for flag and lock information in the index entry header

5 x 1 byte for each of the length bytes for each column

6 bytes for the 5th index column which is the index rowid

So that’s 13 bytes of overhead per index entry in this example index.

Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.

Wouldn’t it be nice if we can somehow just have the index, but without the underlining table. Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlining table. The index structure itself is the table by which we can store and retrieve the necessary data.

OK, let’s now create a new version of this table with the same data, but this time as an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sals number, album_colour varchar2(20),
     CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3      FOR c in 1..100 LOOP
  4        INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

The key clause is here ORGANIZATION INDEX. I’ll discuss other options and syntax in coming posts.

If we look now at the table segment:

SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables
  2  WHERE table_name = 'ALBUM_SALES_IOT';

    BLOCKS EMPTY_BLOCKS IOT_TYPE
---------- ------------ ------------
                        IOT

We see there is an IOT segment listed but consists of no blocks as it doesn’t physically exist …

If we look at the size of the corresponding index:

SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes
  2  WHERE table_name = 'ALBUM_SALES_IOT';

INDEX_NAME           TABLE_NAME       BLEVEL LEAF_BLOCKS
-------------------- --------------- ------- -----------
ALBUM_SALES_IOT_PK   ALBUM_SALES_IOT       2        1550

SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats;

    BLOCKS    BR_BLKS    LF_BLKS
---------- ---------- ----------
      1664          4       1550

We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.

If we take a look at a partial block dump of a leaf block from the IOT:

Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I— is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K—S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 49
tl: 14 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 04 2f 10 59
col  1: [ 4]  47 4f 4c 44
row#1[732] flag: K—S-, lock: 2, len=22
col 0; len 2; (2):  c1 08
col 1; len 2; (2):  c1 4a
tl: 14 fb: –H-FL– lb: 0x0  cc: 2
col  0: [ 5]  c4 03 01 03 46
col  1: [ 4]  47 4f 4c 44

Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.

The structure of the index entry is somewhat different here. It basically consists of:

2 bytes for lock and flag info in the index header as previously

Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.

Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).

Followed finally by the 2 Non-PK columns and their corresponding length bytes.

Note the big missing component here from the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there’s no corresponding table to point down to …

So the overall overhead has been reduced to:

2 byes for the index header

3 bytes for the table header

4 bytes for the 4 column lengths

for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.

So we have saved 1570 table blocks and 384 index blocks in total.

If we now re-run the same query:

SQL> SELECT * FROM album_sales_iot WHERE album_id = 42;

100 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1800 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1800 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)

 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       3575  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …

Enough to start with for now and yes the pun in the title is fully intended 🙂

Why Are My Indexes Still Valid Solution (A Second Face) October 20, 2011

Posted by Richard Foote in IOT, Oracle Indexes, Quiz, Secondary Indexes.
add a comment

I’ve been so busy lately, I just haven’t had any spare time to post.

For now, the quick answer to the last quiz is that the second table was indeed an Index Organized Table (IOT).

One of the nice benefits of an IOT is that when re-organised, unlike a Heap Table, all indexes remain valid, even the Secondary Indexes. I’ll explain why in my next post in the next few days. I’ll also explain why secondary indexes are one of the main disadvantages with IOTs as well.

Stay tuned !!