jump to navigation

IOT Secondary Indexes – The Logical ROWID Guess Component Part I (Lucky) April 26, 2012

Posted by Richard Foote in Index Organized Tables, IOT, Oracle Indexes, Primary Key, ROWID, Secondary Indexes.
8 comments

As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) and the Logical Rowid, the PK column(s) and a “guess” to the physical block in the IOT containing the corresponding row.

Let’s discuss this “guess” component in a bit more detail.

When the Secondary Index is created, this guess is spot on and will indeed point to the correct block within the IOT structure that contains the row being referenced by the indexed entry.

When I initially created the Secondary Index on the Total_Sales column, all the physical guesses were accurate and indeed pointed to the correct blocks within the IOT structure. This can be confirmed by the following query:

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';
 INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

As we can see, the PCT_DIRECT_ACCESS value is 100, which means that 100% of all the guess components are correct. Therefore, the index behaves in a manner very similar to an ordinary Secondary Index with a rowid, in that all the initial accesses to the IOT are valid and there’s no need to subsequently re-access the IOT via the PK component. From the perspective of finding the required row entries with the IOT structure, the Secondary Index is as efficient as possible when all the guesses are valid.

If we run a little query to access a number of rows via this Secondary Index:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   102 |  1836 |   105   (0)| 00:00:02 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   102 |  1836 |   105   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   102 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
       3657  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

Note we’re returning 99 rows which requires 110 consistent gets. So a touch over 1 consistent get per row being access. Note these numbers, we’ll reference them again later …

OK, we’re now going to add some more rows to the table. This will result in 50-50 block splits occurring which will in turn cause a whole bunch of rows to move to new physical blocks within the IOT.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3       FOR c IN 101..200 LOOP
  4          INSERT INTO album_sales_iot
  5          VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Some new rows');
  6       END LOOP;
  7    END LOOP;
  8    COMMIT;
  9  END;
 10  /
 PL/SQL procedure successfully completed.

If we now collect fresh statistics and look at the index statistics again:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'ALBUM_SALES_IOT', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                 58 NO

We notice that the PCT_DIRECT_ACCESS value has dropped significantly to just 58%. This means that only 58% of the guesses are now accurate and that in the other 42% of cases, Oracle is forced to now re-access the IOT again via the PK component stored in the Secondary Indexes. This results in additional consistent gets now likely being required to access the IOT via the index, resulting in a less efficient index.

If we now re-run the original query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   376   (0)| 00:00:05 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   376   (0)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        463  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        184  rows processed

We can see that approximately doubled the number of rows are now returned (184 from 99 rows). However, the number of consistent gets has increased by approximately 4 fold (from 110 to 463). The index is now not as efficient in retrieving rows as it was previously, requiring now some 2.5 consistent gets per row being accessed.

To fix these guesses and make the index more efficient again, one can either ALTER the index with the REBUILD or the UPDATE BLOCK REFERENCES clause:

SQL> alter index album_sales_iot_total_sales_i UPDATE BLOCK REFERENCES;

Index altered.

If we now look at some fresh index statistics:

SQL> exec dbms_stats.gather_index_stats(ownname=> null, indname=> 'ALBUM_SALES_IOT_TOTAL_SALES_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice that the index now has the PCT_DIRECT_ACCESS back at a nice high 100%. If we re-run the same query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   206   (0)| 00:00:03 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   206   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        202  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)

The consistent gets have now reduced substantially from 463 to just 202, back at a touch over 1 consistent get per row retrieved …

So, IOT Secondary Indexes can be as efficient as normal secondary indexes, but if the IOT is subject to 50-50 block splits, they’ll likely need to be maintained more regularly to ensure they stay nice and efficient. Another significant disadvantage associated with IOTs that have Secondary Indexes …

There’s a bit more I’ll like to say on the subject but I’ll leave it for a Part II :)

IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012

Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.
7 comments

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of  row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT :)

Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK  index. However, with an IOT Secondary index, we must store the PK columns again. In fact, we have to re-store the PK columns again for as many IOT Secondary indexes we have defined for the IOT.

So the overall additional overheads we’re talking about here becomes a product of two important factors. The actual overall size of the PK column(s) and the number of Secondary Indexes we have defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.

If we look at the size of the IOT Secondary Index I created in my previous introductory post:

SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728

If however we compare this with a secondary index associated with a conventional heap table containing identical data:

SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171

We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.

If we compare block dumps of the same index entry from both Secondary Indexes:

row#0[8016] flag: K—–, lock: 0, len=20
col 0; len 2; (2):  c1 06
col 1; len 3; (3):  c2 15 16
col 2; len 2; (2):  c1 5f
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 f1

Above is the IOT Secondary Index example, which is 20 bytes in length.

row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 01 68 7a 00 b4

Above is the Heap Table Secondary Index example, which is only 12 bytes in length.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.

Obviously, the larger the PK, the greater the associated overheads. Obviously, the greater the number of IOT Secondary indexes, again the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a lit bit cleverer. Following, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):

SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

row#0[8020] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#1[8004] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#2[7988] flag: K—–, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 8 fb: –H-FL– lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5

So we have 3 index entries listed here. In each one:

col 0 represents  the indexed column (COUNTRY_ID) which happens to be part of the PK

col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)

col 0 (with a length of 4) represents the physical “guess”.

So Oracle still has defined within the index entry the full PK to access the IOT as necessary if the “guess” proves to be wrong.

The key points to take from this post is to fully consider the consequences of a large PK on any defined secondary index on an IOT and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes to be defined on the table.

More on this “guess” component in my next post …

Follow

Get every new post delivered to your Inbox.

Join 1,861 other followers