IOT Secondary Indexes – The Logical ROWID Guess Component Part II (Move On) May 8, 2012
Posted by Richard Foote in Index Block Size, Index Organized Tables, IOT, ROWID, Secondary Indexes.
Having mentioned a couple of dangers associated with IOT Secondary Indexes, I thought I might now discuss a couple of their nicer attributes.
In the previous post, we saw how 50-50 index block splits on the ALBUM_SALES_IOT IOT caused rows to move to new leaf blocks, degrading the PCT_DIRECT_ACCESS value of the associated ALBUM_SALES_IOT_TOTAL_SALES_I secondary index, which in turn resulted in poorer performance when using this index. We had to rebuild the secondary index (or update its block references) to make all the “guess” components accurate again, so they once more pointed to the correct locations within the parent IOT.
So, if you have 50-50 block splits occurring in your IOT, this will degrade the efficiency of the associated IOT Secondary indexes over time.
However, if you don’t have 50-50 block splits and the entries in the IOT don’t move from leaf block to leaf block, then this will not be an issue. Remember of course that many Primary Key values are based on a sequence, which monotonically increases and so results in 90-10 rather than 50-50 block splits. 90-10 block splits don’t move data around: Oracle leaves the full blocks alone and simply adds a new block to the IOT Btree structure into which the new values are inserted. Therefore, with the IOT data not moving around, the “guess” component of the logical ROWIDs remains valid and doesn’t go stale over time, and so the associated secondary indexes remain nice and efficient.
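To picture this pattern, here’s a minimal sketch (with hypothetical object names, not part of this demo) of the classic sequence-based PK that produces 90-10 splits:

-- Each new PK value is the highest seen so far, so inserts always hit the
-- right-most IOT leaf block, triggering 90-10 (not 50-50) splits and
-- leaving existing entries exactly where they are.
CREATE SEQUENCE order_id_seq;

CREATE TABLE orders_iot (
   order_id   NUMBER,
   order_date DATE,
   CONSTRAINT orders_iot_pk PRIMARY KEY (order_id)
) ORGANIZATION INDEX;

INSERT INTO orders_iot VALUES (order_id_seq.NEXTVAL, SYSDATE);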
If we look at the current state of the ALBUM_SALES_IOT_TOTAL_SALES_I secondary index:
SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO
We notice the PCT_DIRECT_ACCESS is currently nice and efficient at 100%.
If we now add a bunch of new rows into the IOT, but this time with PK values that monotonically increase:
SQL> BEGIN
  2  FOR i IN 5001..10000 LOOP
  3    FOR c IN 201..300 LOOP
  4      INSERT INTO album_sales_iot VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Yet more new rows');
  5    END LOOP;
  6  END LOOP;
  7  COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.
And collect fresh statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'ALBUM_SALES_IOT', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO
We notice that the PCT_DIRECT_ACCESS value remains unchanged. So, no 50-50 block splits, no PCT_DIRECT_ACCESS degradation with regard to the secondary indexes.
OK, on to another nice feature of IOT Secondary Indexes.
With a “normal” Heap table, if we were to MOVE and reorganise the table, all associated indexes would become Unusable, as the Move relocates all the rows and the indexes are not maintained during this process (doing so would add considerably to the overhead of the Move). All associated indexes have to be rebuilt after the Move operation completes, which is both expensive and adds considerably to the availability issues of the whole table reorg, as the table is locked during the Move operation. In short, moving a heap table is an expensive and availability-unfriendly process.
As this little demo illustrates, moving a heap table results in all indexes becoming unusable:
SQL> create table radiohead (id number constraint radiohead_pk primary key, code number, name varchar2(30));

Table created.

SQL> create index code_i on radiohead(code);

Index created.

SQL> insert into radiohead select rownum, rownum, 'OK COMPUTER' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   VALID
CODE_I                         VALID

SQL> alter table radiohead move;

Table altered.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   UNUSABLE
CODE_I                         UNUSABLE
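To make the indexes usable again, each would then need to be rebuilt along these lines (standard commands, adding further to the cost and duration of the reorg):

alter index radiohead_pk rebuild;
alter index code_i rebuild;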
However, moving an IOT has a number of advantages over a heap table.
Firstly, as it’s an index structure, an IOT can be reorganised and rebuilt in much the same way as we can rebuild any Btree index. Remember, an index can be rebuilt “online” (on Enterprise Edition), overcoming many of the locking issues associated with moving heap tables.
Additionally, although the physical locations of all the rows in the IOT change following a Move operation, the PK values themselves don’t change. Therefore, although the PCT_DIRECT_ACCESS value becomes 0, the indexes themselves are still Valid and usable as the PK component can still be used to access the relevant data.
So the syntax to move an IOT table can be expanded so the operation is performed “Online”, and all the secondary indexes will remain “Valid”:
SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                  100

SQL> alter table album_sales_iot move online;

Table altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                    0
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0
So although the PCT_DIRECT_ACCESS values for the secondary indexes have gone down to 0, making them less efficient as a result, they do at least remain valid and usable by the CBO:
SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    68   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    68   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         83  consistent gets
         53  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed
The secondary index is still used by the CBO, although at 83 consistent gets in this example, it’s not as efficient as it could be.
The rebuild of the secondary index can be performed subsequently to repair the stale guesses and improve the efficiency of the index as desired:
SQL> alter index album_sales_iot_total_sales_i rebuild online;

Index altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0

SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    36   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    36   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed
Following the rebuild of the secondary index and getting the PCT_DIRECT_ACCESS back to 100%, the example query is now more efficient, with consistent gets reduced from 83 down to just 31.
So IOTs can be less problematic to reorganise and if 90-10 block splits are performed, the impact on the secondary indexes is minimised.
IOT Secondary Indexes – The Logical ROWID Guess Component Part I (Lucky) April 26, 2012
Posted by Richard Foote in Index Organized Tables, IOT, Oracle Indexes, Primary Key, ROWID, Secondary Indexes.
As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) plus the Logical ROWID, which itself comprises the PK column(s) and a “guess” that points to the physical block in the IOT containing the corresponding row.
Let’s discuss this “guess” component in a bit more detail.
When the Secondary Index is created, this guess is spot on and will indeed point to the correct block within the IOT structure that contains the row being referenced by the indexed entry.
When I initially created the Secondary Index on the Total_Sales column, all the physical guesses were accurate and indeed pointed to the correct blocks within the IOT structure. This can be confirmed by the following query:
SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO
As we can see, the PCT_DIRECT_ACCESS value is 100, which means that 100% of all the guess components are correct. Therefore, the index behaves in a manner very similar to an ordinary Secondary Index with a rowid, in that all the initial accesses to the IOT are valid and there’s no need to subsequently re-access the IOT via the PK component. From the perspective of finding the required row entries within the IOT structure, the Secondary Index is as efficient as possible when all the guesses are valid.
If we run a little query to access a number of rows via this Secondary Index:
SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   102 |  1836 |   105   (0)| 00:00:02 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   102 |  1836 |   105   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   102 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
       3657  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed
Note we’re returning 99 rows, which requires 110 consistent gets. So, a touch over 1 consistent get per row being accessed. Note these numbers, we’ll reference them again later …
OK, we’re now going to add some more rows to the table. This will result in 50-50 block splits occurring which will in turn cause a whole bunch of rows to move to new physical blocks within the IOT.
SQL> BEGIN
  2  FOR i IN 1..5000 LOOP
  3    FOR c IN 101..200 LOOP
  4      INSERT INTO album_sales_iot
  5      VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Some new rows');
  6    END LOOP;
  7  END LOOP;
  8  COMMIT;
  9  END;
 10  /

PL/SQL procedure successfully completed.
If we now collect fresh statistics and look at the index statistics again:
SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'ALBUM_SALES_IOT', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                 58 NO
We notice that the PCT_DIRECT_ACCESS value has dropped significantly to just 58%. This means that only 58% of the guesses are now accurate and that in the other 42% of cases, Oracle is forced to now re-access the IOT again via the PK component stored in the Secondary Indexes. This results in additional consistent gets now likely being required to access the IOT via the index, resulting in a less efficient index.
If we now re-run the original query again:
SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   376   (0)| 00:00:05 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   376   (0)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        463  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        184  rows processed
We can see that approximately double the number of rows are now returned (184 vs. 99 rows). However, the number of consistent gets has increased approximately four-fold (from 110 to 463). The index is now not as efficient in retrieving rows as it was previously, requiring some 2.5 consistent gets per row accessed.
To fix these guesses and make the index more efficient again, one can either ALTER the index with the REBUILD or the UPDATE BLOCK REFERENCES clause:
SQL> alter index album_sales_iot_total_sales_i UPDATE BLOCK REFERENCES;

Index altered.
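Alternatively, the same repair could be performed with a full rebuild, a sketch of which follows. UPDATE BLOCK REFERENCES is generally the cheaper option, as it only patches the stale guesses in place rather than recreating the entire index structure:

alter index album_sales_iot_total_sales_i rebuild online;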
If we now look at some fresh index statistics:
SQL> exec dbms_stats.gather_index_stats(ownname=> null, indname=> 'ALBUM_SALES_IOT_TOTAL_SALES_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO
We notice that the index now has the PCT_DIRECT_ACCESS back at a nice high 100%. If we re-run the same query again:
SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   206   (0)| 00:00:03 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   206   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        202  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
The consistent gets have now reduced substantially from 463 to just 202, back at a touch over 1 consistent get per row retrieved …
So, IOT Secondary Indexes can be as efficient as normal secondary indexes, but if the IOT is subject to 50-50 block splits, they’ll likely need to be maintained more regularly to ensure they stay nice and efficient. Another significant disadvantage associated with IOTs that have Secondary Indexes …
There’s a bit more I’d like to say on the subject, but I’ll leave it for a Part II 🙂
IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012
Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.
As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of row movement within the IOT itself.
This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.
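For example, a query along these lines (a sketch based on the demo schema) can be fully satisfied by the secondary index alone, as it only references the indexed column and the PK columns:

-- Only the PK columns are selected and TOTAL_SALES is the indexed column,
-- so the secondary index can answer this without visiting the IOT itself.
SELECT album_id, country_id
FROM   album_sales_iot
WHERE  total_sales > 4999000;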
However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT 🙂
Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK index. However, with an IOT Secondary Index, we must store the PK columns again. In fact, we have to re-store the PK columns for as many IOT Secondary Indexes as we have defined on the IOT.
So the overall additional overheads we’re talking about here become a product of two important factors: the overall size of the PK column(s) and the number of Secondary Indexes defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overheads can be significant, reducing the benefits of the IOT.
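For a rough feel of this overhead, a sketch along these lines (standard dictionary views, but treat it as illustrative) lists the average stored length of each PK column that gets repeated in every secondary index entry:

-- Average stored length of each PK column; multiply the total by the
-- number of secondary indexes to approximate the per-row overhead.
SELECT c.column_name, c.avg_col_len
FROM   dba_ind_columns i, dba_tab_columns c
WHERE  i.index_name  = 'ALBUM_SALES_IOT_PK'
AND    c.table_name  = i.table_name
AND    c.column_name = i.column_name;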
If we look at the size of the IOT Secondary Index I created in my previous introductory post:
SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728
If however we compare this with a secondary index associated with a conventional heap table containing identical data:
SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171
We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.
If we compare block dumps of the same index entry from both Secondary Indexes:
row#0[8016] flag: K-----, lock: 0, len=20
col 0; len 2; (2): c1 06
col 1; len 3; (3): c2 15 16
col 2; len 2; (2): c1 5f
tl: 8 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 4] 01 01 41 f1
Above is the IOT Secondary Index example, which is 20 bytes in length.
row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 01 68 7a 00 b4
Above is the Heap Table Secondary Index example, which is only 12 bytes in length.
The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.
Obviously, the larger the PK, the greater the associated overheads. Obviously, the greater the number of IOT Secondary indexes, again the greater the overall associated overheads.
If we create a secondary index on a column that forms part of the PK, Oracle can be a little bit cleverer. Below, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):
SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.
We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:
SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES
A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):
row#0[8020] flag: K-----, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 8 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 4] 01 01 38 e5
row#1[8004] flag: K-----, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 8 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 4] 01 01 38 e5
row#2[7988] flag: K-----, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 04
tl: 8 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 4] 01 01 38 e5
So we have 3 index entries listed here. In each one:
col 0 represents the indexed column (COUNTRY_ID) which happens to be part of the PK
col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)
The second col 0 (with a length of 4) represents the physical “guess”.
So Oracle still has the full PK defined within the index entry, with which to access the IOT as necessary if the “guess” proves to be wrong.
The key points to take from this post are to fully consider the consequences of a large PK on any secondary index defined on an IOT, and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes on the table.
More on this “guess” component in my next post …
Index Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.
Man, it’s been ages since I had free time to update the blog, what with birthday parties to organise, Roger Waters concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).
If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem, the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK and the necessary data can be efficiently accessed.
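For example, a lookup specifying the full PK (using the ALBUM_SALES_IOT table from the demo below) simply navigates down the IOT structure directly to the required row:

SELECT * FROM album_sales_iot
WHERE album_id = 42 AND country_id = 1;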
But what if we want to access the data efficiently via Non-PK columns, or without specifying the leading column of the PK ? Can we create secondary indexes on an IOT ?
When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure, that being what to do when indexing rows that potentially move around all the time ?
With a conventional Heap table, once a row is inserted into the table, it doesn’t generally subsequently move. There are relatively few examples of when this occurs, one being an update to the partitioning column of a row such that the row needs to be stored in another partition. This is recognised as a rather expensive operation, as not only must at least two blocks be accessed and modified, but all associated indexes must be updated as well. As such, it generally requires such activities to be explicitly permitted (by enabling row movement and the like, as sketched below). Note, when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row, which points to the new block/location of the row.
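For example, such movement must first be explicitly permitted with something along these lines (hypothetical table name):

ALTER TABLE my_partitioned_table ENABLE ROW MOVEMENT;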
But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. A relatively expensive operation would be even more expensive if Oracle had to also update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.
To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:
- The indexed column values
- The PK columns of the associated IOT
- A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created
So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the row within the IOT. If it finds the required row in the IOT, great. The index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained with the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but is at least guaranteed to find the row this time in its current location.
So in the best case scenario, the index performs similarly to a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK, but at least this has the potential to be much more efficient than a Fast Full Scan of the IOT in order to find the necessary row.
The key point to note here is that the secondary index is not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK which is also stored within the secondary index is used as a backup method of accessing the required row.
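The proportion of guesses that remain accurate can be tracked at any time via the PCT_DIRECT_ACCESS statistic, as we’ll do repeatedly in the demos that follow. A quick sketch:

-- Proportion of "guesses" still pointing at the correct IOT block,
-- per index (100 = all guesses valid).
SELECT index_name, pct_direct_access
FROM   dba_indexes
WHERE  table_name = 'ALBUM_SALES_IOT';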
Let’s start with a traditionally simple little demo and first create and populate an IOT:
SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20),
CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.
If we now run a query based on the non-PK TOTAL_SALES column:
SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive INDEX FAST FULL SCAN, even though it correctly knows relatively few rows are to be retrieved.
If we create a secondary index on the IOT however:
SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |     4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is subsequently accessed not via Index ROWIDs but by an INDEX UNIQUE SCAN via the IOT PK. More on this later …
If we look at a partial block dump of an index entry within the IOT Secondary index:
row#0[8015] flag: K-----, lock: 0, len=21
col 0; len 3; (3): c2 1f 28
col 1; len 3; (3): c2 15 37
col 2; len 2; (2): c1 1b
tl: 8 fb: --H-FL-- lb: 0x0 cc: 1
col 0: [ 4] 01 01 41 da
col 0 represents the indexed value (TOTAL_SALES)
col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)
Following the 3 byte table header overhead required for the “guess”, we have the second col 0, which represents the 4 byte “guess” to the last known physical location of the row.
Much more to follow shortly …
Index Organized Tables – PCTTHRESHOLD (The Wedding Song) February 8, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, PCTTHRESHOLD.
I’ve recently returned from a great two-week holiday, firstly at the Australian Open Tennis (what a final !!) and then up at the Gold Coast in not quite so sunny Queensland. Time now to get back to my blog 🙂
In my previous IOT examples, we had a very large column called Description which we didn’t really want to store within the Index Organized Table as it would cause the resultant index structure to get very inflated and inefficient. All the rows contained a very large Description value so it never made sense to include the Description column within the IOT.
In the following example, the Description column has values of varying lengths. Some of the values remain very large, however many of the Description values are quite moderate in size and wouldn’t be problematic to store within the IOT. Indeed, it would be quite beneficial as it wouldn’t be necessary to perform additional I/Os to the Overflow segment in cases where the Description was quite small in size and required by the application.
PCTTHRESHOLD gives us more flexibility in what is actually stored within the IOT index structure. The non-PK columns up to the INCLUDING clause are stored within the IOT, but only if the resultant row length is below the specified percentage of the block size. So with a PCTTHRESHOLD of (say) 5, the non-PK columns up to the INCLUDING clause will be included within the IOT, but only if the resultant row size is less than 5% of the blocksize (with an 8K block, that’s roughly 400 bytes). If a row size were to exceed this threshold, then the non-PK columns that would violate the length threshold are not included within the IOT and are stored instead within the Overflow segment.
In the following example, every other row is actually quite small and we would want these rows to have the Description value stored within the IOT. Therefore, we have modified the IOT table definition to include the Description column if the resultant row is less than 5% of the (8K in this case) blocksize:
SQL> CREATE TABLE album_sales_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING description PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2; Table created. SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 if mod(c,2) = 1 then 5 INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description'); 6 else INSERT INTO album_sales_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A short description'); 7 end if; 8 END LOOP; 9 END LOOP; 10 COMMIT; 11 END; 12 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we look at the size of the resultant IOT:
SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
      2176          3          5       2052
The IOT is only of a moderate size, with 5 branch blocks and 2,052 leaf blocks.
If we look at the size of the Overflow segment:
SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_IOT';

 OBJECT_ID
----------
     74209

SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74209';

TABLE_NAME         IOT_NAME         IOT_TYPE         BLOCKS
------------------ ---------------- ------------ ----------
SYS_IOT_OVER_74209 ALBUM_SALES_IOT  IOT_OVERFLOW      35715
We see that the vast majority of the storage is still allocated to the Overflow segment, at 35,715 blocks in size.
If we look at a partial block dump of an IOT leaf block:
Leaf block dump
===============
header address 461972060=0x1b89225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x97: opcode=7: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 251
kdxcofbo 538=0x21a
kdxcofeo 561=0x231
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21053971=0x1414213
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[561] flag: K---S-, lock: 2, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01811901.0
col 0: [ 5] c4 04 57 1d 44
row#1[584] flag: K---S-, lock: 2, len=36
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 28 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 4] c3 1d 2a 2e
col 1: [19] 41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
row#2[620] flag: K---S-, lock: 2, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 04
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01811901.1
col 0: [ 5] c4 04 22 2d 07
row#3[643] flag: K---S-, lock: 2, len=37
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 05
tl: 29 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 04 36 17 52
col 1: [19] 41 20 73 68 6f 72 74 20 64 65 73 63 72 69 70 74 69 6f 6e
We notice the leaf block contains 251 row entries. The half of the rows with a Description of just 19 bytes have the Description value stored within the IOT leaf block, while the other half with the larger Description values instead contain an nrid that refers to the corresponding Description within the Overflow segment.
If we analyze the table:
SQL> ANALYZE TABLE album_sales_iot COMPUTE STATISTICS;

Table analyzed.

SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_IOT';

TABLE_NAME                       NUM_ROWS  CHAIN_CNT     BLOCKS
------------------------------ ---------- ---------- ----------
ALBUM_SALES_IOT                    500000     250000
We notice that only half the rows are now “chained rows”.
If we run a query that only references the rows with a small Description that are stored within the IOT structure:
SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=0;

50 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=0)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       2211  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed
The query at 7 consistent gets is relatively efficient as all the required data can be found within the IOT.
If however we run a query that references the larger Description rows:
SQL> SELECT * FROM album_sales_iot WHERE album_id = 42 and mod(country_id,2)=1;

50 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |     1 |   510 |     5   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |     1 |   510 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)
       filter(MOD("COUNTRY_ID",2)=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       4147  bytes sent via SQL*Net to client
        557  bytes received via SQL*Net from client
          5  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         50  rows processed
We see that it’s less efficient at 18 consistent gets, as it obviously needs to access a larger volume of data and requires additional I/Os to access the corresponding Overflow segment.
So, with a combination of the INCLUDING and PCTTHRESHOLD clauses, one can control what data is and is not included within the IOT index structure.
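Note that PCTTHRESHOLD can also be specified without an INCLUDING clause, in which case only rows that would exceed the threshold have their trailing non-PK columns pushed to the Overflow segment. A sketch (hypothetical table name):

CREATE TABLE album_sales_iot2 (
   album_id    NUMBER,
   country_id  NUMBER,
   total_sales NUMBER,
   description VARCHAR2(1000),
   CONSTRAINT album_sales_iot2_pk PRIMARY KEY (album_id, country_id))
ORGANIZATION INDEX
PCTTHRESHOLD 5
OVERFLOW TABLESPACE bowie2;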
Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …
In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK columns are in many ways the “key” columns in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns, and for that reason alone it’s probably good practice to consistently define the PK columns in this manner.
However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. As many tables are “primarily” accessed via the PK columns, and so accessed directly through the associated PK index, the application already knows the PK values of the row in question. It’s therefore somewhat inefficient to have the PK columns as the first columns defined in the table, as these generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most heavily accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.
I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.
The same however cannot really be said for IOTs, and I would strongly recommend defining the PK columns first in an IOT …
I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:
SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000),
CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id))
ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2;

Table created.
So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.
OK, let’s populate this table with the same data we used previously:
SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description'); 6 END LOOP; 9 END LOOP; 10 COMMIT; 11 END; 12 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we describe this table, we get the expected listing:
SQL> desc album_sales_iot
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 TOTAL_SALES                                        NUMBER
 ALBUM_ID                                  NOT NULL NUMBER
 COUNTRY_ID                                NOT NULL NUMBER
 DESCRIPTION                                        VARCHAR2(1000)
With the columns listed in the order as we defined them in the table.
If we query the column details from dba_tab_columns:
SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID COLUMN_NAME
---------- ------------------------------
         1 TOTAL_SALES
         2 ALBUM_ID
         3 COUNTRY_ID
         4 DESCRIPTION
We again find the column order is as we defined them in the table.
When we run the same query we ran last time that returned the data with 5 consistent gets:
SQL> set arraysize 100
SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1834499174

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    18   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_IOT_PK |   100 |  1300 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         20  consistent gets
          0  physical reads
          0  redo size
       2394  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???
The first clue on what’s going on here can be found by looking at dba_tab_cols:
SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id;

 COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME
---------- ----------------- ------------------------------
         1                 3 TOTAL_SALES
         2                 1 ALBUM_ID
         3                 2 COUNTRY_ID
         4                 4 DESCRIPTION
The SEGMENT_COLUMN_ID column determines the order of the columns as they’re actually stored within the segment and we notice the column order is different. The two PK columns are listed first, with the total_sales column only listed in the 3rd position.
As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, followed by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT where new rows are inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL, Oracle will automatically re-order the columns and place the PK columns first regardless.
This now has consequences for the INCLUDING clause, if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as it’s a PK column, Oracle has automatically re-ordered the columns such that it’s physically stored as the second column within the IOT segment. Unfortunately, the INCLUDING clause is only applied after this re-ordering of the columns. As such, the total_sales column, now logically listed third after the country_id column, is not actually included in the IOT index structure as (perhaps) intended.
A partial block dump of an IOT leaf block will confirm this:
Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K-----, lock: 0, len=17
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 9 fb: --H-F--- lb: 0x0 cc: 0
nrid: 0x01811911.0
row#1[896] flag: K-----, lock: 0, len=17
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 9 fb: --H-F--- lb: 0x0 cc: 0
nrid: 0x01811911.1
As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).
As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.
My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.
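For example, here’s a sketch of the DDL from this demo rewritten the recommended way, with the PK columns listed first and the INCLUDING clause naming the non-PK total_sales column so that it genuinely ends up inside the IOT structure:

CREATE TABLE album_sales_iot (
   album_id    NUMBER,
   country_id  NUMBER,
   total_sales NUMBER,
   description VARCHAR2(1000),
   CONSTRAINT album_sales_iot_pk PRIMARY KEY (album_id, country_id))
ORGANIZATION INDEX
INCLUDING total_sales
OVERFLOW TABLESPACE bowie2;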
Jonathan Lewis, a great source of information on indexes and Oracle in general has previously discussed this same IOT Trap on his blog.
Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.
I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:
SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX; Table created. SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
Sorry for the unimaginative manner of loading the description field but you get the point 🙂
OK, let’s have a look at the size of the IOT:
SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE;

Index analyzed.

SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats;

    BLOCKS     HEIGHT    BR_BLKS    LF_BLKS
---------- ---------- ---------- ----------
     71680          3        116      71429
As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.
The application doesn’t generally access the Description column, with the following query being typical (Note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):
SQL> set arraysize 100
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 521866300

---------------------------------------------------------------------------------------
| Id  | Operation        | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                    |   100 |  1300 |    17   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| ALBUM_SALES_DET_PK |   100 |  1300 |    17   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ALBUM_ID"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         19  consistent gets
          0  physical reads
          0  redo size
       2387  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
The query requires 19 consistent gets to retrieve the 100 rows because, even though the data is extremely well clustered, there are very few rows per leaf block (at roughly 7 rows per leaf block, the 100 rows span some 15 leaf blocks, before counting branch block and re-visit overheads).
If we look at a partial block dump of one of these IOT leaf blocks:
Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K-----, lock: 0, len=1004
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 996 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 04 05 3b 03
col 1: [984]
41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
73 63 72 69 70 74 69 6f 6e
We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.
Considering the Description column is so large and/or so rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself?
Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns where the specific values are too large to store within the IOT index structure), or those columns that are rarely accessed, can be stored elsewhere. Effectively, we’re back to having a separate “table” like structure, but unlike a normal Heap table, in which all columns are stored within the table segment, the Overflow segment only holds those columns that we don’t necessarily want to store within the index structure.
There are a number of different methods we could use (to be explored further in future posts); for now I’ll use the INCLUDING clause:
SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2; Table created.
So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All subsequent columns listed in the table definition (in this case the Description column) will be stored in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.
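Another such method, flagged here only as a sketch ahead of those future posts, is the PCTTHRESHOLD clause, which moves the trailing non-key columns of any row that exceeds the specified percentage of the block out into the Overflow segment (the table and constraint names below are hypothetical):

SQL> CREATE TABLE album_sales_details_iot3(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk3 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;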
If we now populate this table with the identical data as before:
SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE; Index analyzed. SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats; BLOCKS HEIGHT BR_BLKS LF_BLKS ---------- ---------- ---------- ---------- 1664 3 4 1613
We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.
If we now re-run our query:
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2379894191 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1300 | 18 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 | 100 | 1300 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 2390 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Things are now much more efficient, with the required consistent gets reduced from 19 down to just 5.
If we now look at a partial block dump of an IOT leaf block:
Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K-----, lock: 0, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01800081.0
col 0: [ 5] c4 02 5e 0d 25
row#1[726] flag: K-----, lock: 0, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01800081.1
col 0: [ 5] c4 04 41 13 43
We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of each index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have an nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. The table column count is therefore only 1 (cc: 1), as just the TOTAL_SALES column now remains stored within the index entry.
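If we want to translate such a relative block address into an actual datafile and block number, DBMS_UTILITY can do the bit arithmetic for us. A quick sketch using the 0x01800081 rdba from the dump above (which works out to be file 6, block 129):

SQL> SELECT dbms_utility.data_block_address_file(to_number('01800081','XXXXXXXX')) file#, dbms_utility.data_block_address_block(to_number('01800081','XXXXXXXX')) block# FROM dual;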
To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:
SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2'; OBJECT_ID ---------- 74116
This OBJECT_ID is used to name the corresponding Overflow segment, which we can see via USER_TABLES (or DBA_TABLES) as it has a format of SYS_IOT_OVER_object_id:
SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116'; TABLE_NAME IOT_NAME IOT_TYPE BLOCKS ------------------ ------------------------ ------------ ------- SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW 71430
We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.
Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:
SQL> select * from SYS_IOT_OVER_74116; select * from SYS_IOT_OVER_74116 * ERROR at line 1: ORA-25191: cannot reference overflow table of an index-organized table
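We can however dump it. For those playing along at home, such a block dump can be generated using the file and block numbers decoded from the nrid earlier (file 6, block 129 in my example; your numbers will of course differ):

SQL> ALTER SYSTEM DUMP DATAFILE 6 BLOCK 129;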
If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:
Block header dump: 0x01800081
Object id on Block? Y
seg/obj: 0x12185 csc: 0x00.17482cc itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.00b.0000a919 0x00c24a2e.03d2.2a C--- 0 scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
76543210
flag=--------
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0x486
tosp=0x486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: -----L-- lb: 0x0 cc: 1
col 0: [984]
41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
73 63 72 69 70 74 69 6f 6e
We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.
The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.
Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the nrid (say 0x01800081.0) within the IOT index entry for the row. This in turn points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say 0x1bda) to the actual location of the Description value within the Overflow block. Easy !!
If we Analyze the IOT table:
SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS; Table analyzed. SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2'; TABLE_NAME NUM_ROWS CHAIN_CNT BLOCKS ------------------------------ ---------- ---------- ---------- ALBUM_SALES_DETAILS_IOT2 500000 500000
We notice all the rows are listed as “Chained Rows”. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.
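As an aside, if we actually wanted to list these chained rows, note that the standard CHAINED_ROWS table created by utlchain.sql doesn’t cater for the logical ROWIDs of an IOT. My understanding is that a compatible chained-rows table needs to be built first via the DBMS_IOT package; a sketch I haven’t run here, with the chained-rows table name being hypothetical:

SQL> exec dbms_iot.build_chain_rows_table('BOWIE', 'ALBUM_SALES_DETAILS_IOT2', 'IOT_CHAINED_ROWS');
SQL> ANALYZE TABLE album_sales_details_iot2 LIST CHAINED ROWS INTO iot_chained_rows;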
However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:
SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2379894191 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 99400 | 18 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 | 100 | 99400 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 32 consistent gets 0 physical reads 0 redo size 5541 bytes sent via SQL*Net to client 590 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
The above query, which returns the Description column, sees the consistent gets increase to 32, up from the 5 when the Description wasn’t accessed and the 19 when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed, while the frequently executed queries which don’t access the Description column remain more efficient.
The Overflow segment gives us, in a manner, “the best of both worlds”: the ability to store just those columns of interest within the IOT segment (which must always include all the Primary Key columns), while those columns that are less often accessed or too large to be stored efficiently within the IOT are stored elsewhere. Effectively, it’s an index and table relationship, except the table doesn’t have to store again the columns that are already stored within the index.
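Note too that should the Overflow segment ever need re-organising, my understanding is it can be moved without having to explicitly rebuild anything else; a sketch only, to be explored properly in future posts:

SQL> ALTER TABLE album_sales_details_iot2 MOVE OVERFLOW TABLESPACE bowie2;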
It’s all good news so far for IOTs …
Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012
Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.16 comments
Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.
As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.
To start, let’s cover a very basic little example.
Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note: using an 8K blocksize in an MSSM tablespace):
SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20), 2 CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id)); Table created. SQL> BEGIN 2 FOR i IN 1..5000 LOOP 3 FOR c IN 1..100 LOOP 4 INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
We have a natural Primary Key that consists of two columns and an additional two columns of information.
Let’s look at some basic sizing information on the table and associated Primary Key index:
SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES'; BLOCKS EMPTY_BLOCKS IOT_TYPE ---------- ------------ ------------ 1570 0 SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 1152 3 1062
So the table segment consists of 1570 blocks and the index segment 1152, with a total of 1062 leaf blocks.
OK, let’s run a basic query looking for all albums with an album_id=42:
SQL> SELECT * FROM album_sales WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3244723662 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| ALBUM_SALES | 100 | 1800 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | ALBUM_SALES_PK | 100 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 18 consistent gets 0 physical reads 0 redo size 4084 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.
However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.
We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:
SQL> ALTER TABLE album_sales DROP PRIMARY KEY; Table altered. SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS; Index created. SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id); Table altered.
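As an aside, rather than relying on Oracle picking up the suitable existing index when the constraint is added back, the choice of index can be made explicit via the USING INDEX clause; a sketch of an equivalent final step:

SQL> ALTER TABLE album_sales ADD CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id) USING INDEX album_sales_pk_i;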
OK, so the index now contains all the columns in the table and is now used to police the PK constraint:
SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK'; CONSTRAINT_NAME C INDEX_NAME ------------------------------ - ------------------------------ ALBUM_SALES_PK P ALBUM_SALES_PK_I
Let’s now look at the size of the index:
SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 2048 5 2006
OK, as expected the index is now somewhat larger as it now needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with leaf blocks increasing from 1062 to 2006 leaf blocks.
If we now re-run the query:
SQL> SELECT * FROM album_sales WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1126128764 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 3 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_PK_I | 100 | 1800 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 11 consistent gets 0 physical reads 0 redo size 3568 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
We notice things have indeed improved and we have reduced the number of consistent gets from 18 down to just 11. Not a bad improvement !!
If we look at a partial block dump of one of the index leaf blocks:
Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ------, lock: 0, len=26
col 0; len 2; (2): c1 07
col 1; len 2; (2): c1 12
col 2; len 5; (5): c4 04 15 31 59
col 3; len 4; (4): 47 4f 4c 44
col 4; len 6; (6): 01 40 05 82 00 b7
row#1[7984] flag: ------, lock: 0, len=26
col 0; len 2; (2): c1 07
col 1; len 2; (2): c1 13
col 2; len 5; (5): c4 03 19 2c 3d
col 3; len 4; (4): 47 4f 4c 44
col 4; len 6; (6): 01 40 05 82 00 b8
We notice that each leaf entry is 26 bytes in length. The lengths of the four columns add up to 13 bytes. The remaining 13 bytes are basically overhead required for each index entry:
2 bytes for flag and lock information in the index entry header
5 x 1 byte for each of the length bytes for each column
6 bytes for the 5th index column which is the index rowid
So that’s 13 bytes of overhead per index entry in this example index.
Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.
Wouldn’t it be nice if we could somehow just have the index, but without the underlying table. Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlying table. The index structure itself is the table by which we can store and retrieve the necessary data.
OK, let’s now create a new version of this table with the same data, but this time as an IOT:
SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX; Table created. SQL> BEGIN 2 FOR i IN 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
The key clause here is ORGANIZATION INDEX. I’ll discuss other options and syntax in coming posts.
If we look now at the table segment:
SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables 2 WHERE table_name = 'ALBUM_SALES_IOT'; BLOCKS EMPTY_BLOCKS IOT_TYPE ---------- ------------ ------------ IOT
We see the IOT is listed but consists of no blocks, as the table segment doesn’t physically exist …
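We can confirm this at the segment level as well; a quick check that should return just the ALBUM_SALES_IOT_PK index segment, with no table segment in sight:

SQL> SELECT segment_name, segment_type FROM user_segments WHERE segment_name LIKE 'ALBUM_SALES_IOT%';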
If we look at the size of the corresponding index:
SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes 2 WHERE table_name = 'ALBUM_SALES_IOT'; INDEX_NAME TABLE_NAME BLEVEL LEAF_BLOCKS -------------------- --------------- ------- ----------- ALBUM_SALES_IOT_PK ALBUM_SALES_IOT 2 1550 SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 1664 4 1550
We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.
If we take a look at a partial block dump of a leaf block from the IOT:
Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I--- is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K---S-, lock: 2, len=22
col 0; len 2; (2): c1 08
col 1; len 2; (2): c1 49
tl: 14 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 04 2f 10 59
col 1: [ 4] 47 4f 4c 44
row#1[732] flag: K---S-, lock: 2, len=22
col 0; len 2; (2): c1 08
col 1; len 2; (2): c1 4a
tl: 14 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 03 01 03 46
col 1: [ 4] 47 4f 4c 44
Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.
The structure of the index entry is somewhat different here. It basically consists of:
2 bytes for lock and flag info in the index header as previously
Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.
Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).
Followed finally by the 2 Non-PK columns and their corresponding length bytes.
Note the big missing component here from the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there’s no corresponding table to point down to …
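Mapping this against row#0 in the dump above (len=22), the bytes account for themselves as follows:

flag/lock index entry header                   2 bytes
col 0 (ALBUM_ID): 1 length byte + 2 bytes      3 bytes
col 1 (COUNTRY_ID): 1 length byte + 2 bytes    3 bytes
table header (fb, lb, cc)                      3 bytes
col 0 (TOTAL_SALES): 1 length byte + 5 bytes   6 bytes
col 1 (ALBUM_COLOUR): 1 length byte + 4 bytes  5 bytes
                                              --------
                                              22 bytes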
So the overall overhead has been reduced to:
2 bytes for the index header
3 bytes for the table header
4 bytes for the 4 column lengths
for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.
So we have saved 1570 table blocks and 384 index blocks in total.
If we now re-run the same query:
SQL> SELECT * FROM album_sales_iot WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1834499174 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 3 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_IOT_PK | 100 | 1800 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 10 consistent gets 0 physical reads 0 redo size 3575 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …
Enough to start with for now and yes the pun in the title is fully intended 🙂
Index Organized Tables – A Start (Star) January 3, 2012
Posted by Richard Foote in Index Organized Tables, Oracle Indexes.8 comments
Happy New Year !!
As has been mentioned to me on quite a number of occasions, I haven’t really covered the area of Index Organized Tables (IOTs) on this blog, despite covering the subject of IOTs in quite some detail in my Index Internals Seminar. Which I readily admit is a bit of a shame, as IOTs are often misunderstood little things and, I would say, generally way under-utilised in many environments, in part because they are so often misunderstood. The only reason I can really give for not discussing them more is that I simply haven’t gotten around to it 🙂
Well, the good news, as hinted at late last year, is that it’s about time I addressed this and started writing a few blog pieces on IOTs.
However, the even better news is that the subject of IOTs has already been covered by Martin Widlake in a truly excellent series of articles on his blog. I would strongly recommend giving them a read as they cover many aspects of IOTs in an interesting and informative manner.
I’ll recap some of the basic principles of IOTs here and then discuss various different aspects (such as the joys of Secondary Indexes on IOTs) in my own unique way 🙂
So, all that’s to come in the coming weeks.
However, I recently received an index question via email during the Christmas holidays which I thought was worthy of a blog piece or two. Although I often respond to emails, I much prefer to discuss index related “stuff” on the blog so everyone can share in the knowledge exchange. This is an interesting question because it relates to why an index keeps growing, even though the number of rows in the table remains constant. It’s an even more interesting question because there are a number of contributing factors for this impact on the index which I thought were worth sharing.
No more clues, I’ll post the question and the specific scenario in the next few days …
Why Are My Indexes Still Valid Quiz ? (Move On) October 11, 2011
Posted by Richard Foote in Index Organized Tables, Oracle Indexes, Quiz.6 comments
OK, this quiz is a nice easy one, the lads at work got this without too much trouble.
Normally, when you MOVE (re-org) a table, all the associated indexes become Unusable. As below:
SQL> create table pink_floyd (id number constraint pf_pk primary key, code number, name varchar2(30)); Table created. SQL> create index pf_code_i on pink_floyd(code); Index created. SQL> insert into pink_floyd select rownum, mod(rownum,100), 'The Dark Side Of The Moon' from dual connect by level <= 10000; 10000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD'; INDEX_NAME NUM_ROWS STATUS ------------------------------ ---------- -------- PF_PK 10000 VALID PF_CODE_I 10000 VALID SQL> alter table pink_floyd move; Table altered. SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD'; INDEX_NAME NUM_ROWS STATUS ------------------------------ ---------- -------- PF_PK 10000 UNUSABLE PF_CODE_I 10000 UNUSABLE
So the indexes are now all unusable ..
However, I previously created another table called BOWIE that has exactly the same columns, indexes and data but when I MOVE this table:
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE'; INDEX_NAME NUM_ROWS STATUS ------------------------------ ---------- -------- BOWIE_PK 10000 VALID BOWIE_CODE_I 10000 VALID SQL> alter table bowie move; Table altered. SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE'; INDEX_NAME NUM_ROWS STATUS ------------------------------ ---------- -------- BOWIE_PK 10000 VALID BOWIE_CODE_I 10000 VALID
All the indexes remain VALID !!
What’s so different about this table and its indexes ??????
I plan to discuss this whole topic (finally) in more detail in the coming weeks …
Oracle11g Bitmap-Join IOTs (Us and Them) January 25, 2011
Posted by Richard Foote in 11g, Bitmap Indexes, Index Organized Tables, Oracle Indexes.6 comments
With each new database release, nice little improvements and enhanced options continually get added. Since 11g R1, two index related features can finally be used in combination with each other.
To demonstrate, I’m first going to create and populate a so-called “large” Data Warehouse table.
SQL> CREATE TABLE big_dwh_table (id NUMBER PRIMARY KEY, album_id NUMBER, artist_id NUMBER, country_id NUMBER, format_id NUMBER, release_date DATE, total_sales NUMBER); Table created. SQL> CREATE SEQUENCE dwh_seq; Sequence created. SQL> create or replace procedure pop_big_dwh_table as 2 v_id number; 3 v_artist_id number; 4 begin 5 for v_album_id in 1..10000 loop 6 v_artist_id:= ceil(dbms_random.value(0,100)); 7 for v_country_id in 1..100 loop 8 select dwh_seq.nextval into v_id from dual; 9 insert into big_dwh_table values (v_id, v_album_id, v_artist_id, v_country_id, ceil(dbms_random.value(0,4)), trunc(sysdate-mod(v_id,ceil(dbms_random.value(0,1000)))), ceil(dbms_random.value(0,500000))); 10 end loop; 11 end loop; 12 commit; 13 end; 14 / Procedure created. SQL> exec pop_big_dwh_table PL/SQL procedure successfully completed.
I’ll next create a standard bitmap index on the ALBUM_ID column and collect a few statistics:
SQL> create bitmap index big_dwh_table_album_id_i on big_dwh_table(album_id); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'BIG_DWH_TABLE', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
OK, I’m now going to create and populate a “smaller” dimension/detail heap table and a few associated indexes:
SQL> CREATE TABLE albums (album_id number, album_details varchar2(30)); Table created. SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000; 10000 rows created. SQL> commit; Commit complete. SQL> alter table albums add primary key(album_id); Table altered. SQL> create index albums_details_i on albums(album_details); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we now run a little query that joins the two tables together:
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$'; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1936297994 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 125 | 4250 | 25 (0)| 00:00:01 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 125 | 4250 | 25 (0)| 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID| ALBUMS | 1 | 22 | 2 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | ALBUMS_DETAILS_I | 1 | | 1 (0)| 00:00:01 | | 5 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 6 | BITMAP INDEX SINGLE VALUE | BIG_DWH_TABLE_ALBUM_ID_I | | | | | | 7 | TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE | 100 | 1200 | 25 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("A"."ALBUM_DETAILS"='TAB$') 6 - access("B"."ALBUM_ID"="A"."ALBUM_ID") Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 10 consistent gets 0 physical reads 0 redo size 1648 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
The resultant execution plan is pretty good and efficient and what we would expect. It performs a nested loop join to join the tables together, which makes sense based on the relatively small number of rows returned, using the b-tree index to get the specific album details from the dimension table and the bitmap index to find the matching album rows from the larger table.
However, as this is a very frequently executed join condition, we can potentially improve things and reduce the 10 consistent gets by introducing a bitmap-join index. A bitmap-join index performs the “join” operation just once, when the index is created (and during subsequent DML operations), by creating an index based on column(s) of the smaller dimension table that directly references rows in the larger fact table.
SQL> drop index albums_details_i; Index dropped. SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details) FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id; Index created.
So the bitmap-join index is based on the ALBUM_DETAILS column from the smaller ALBUMS table, but it references and has rowids associated with the larger BIG_DWH_TABLE table, with the bitmap-join definition containing details on how the join between the two tables needs to be performed. If we want to know which rows within the larger table have ALBUM_DETAILS of interest, the corresponding bitmap-join index will find all such rows without having to access the smaller ALBUMS table that actually contains this column.
If we now run the same query as before:
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$'; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 950886520 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 125 | 1500 | 26 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE | 125 | 1500 | 26 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I | | | | | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("B"."SYS_NC00008$"='TAB$') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 1648 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
We notice the nested loop operation is no longer required. In fact, we don’t need to reference the smaller ALBUMS table at all as all the required information can now be obtained by using the bitmap-join index and direct accesses to the larger table. The number of consistent gets has therefore reduced from 10 down to just 6.
Note in our example, there is no actual Foreign Key (FK) constraint in the larger table (in a Data Warehouse, such constraints may not be necessary and/or may get in the way). The bitmap-join index doesn’t require a FK constraint to be in place; however, it’s necessary that the join column referencing the detail table be unique, else there could be a many-to-many join condition which wouldn’t make sense when attempting to populate the bitmap-join index.
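In our case, it’s the ALBUMS table’s Primary Key on ALBUM_ID that takes care of this requirement, which we can quickly confirm (the system-generated constraint name will vary):

SQL> SELECT constraint_name, constraint_type FROM user_constraints WHERE table_name = 'ALBUMS' AND constraint_type = 'P';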
However, make one of the tables in the Bitmap-Join index an Index Organized Table (IOT), in this case the smaller detail table …
SQL> drop table albums; Table dropped. SQL> CREATE TABLE albums (album_id number primary key, album_details varchar2(30)) organization index; Table created. SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000; 10000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details) 2 FROM big_dwh_table b, albums a 3 WHERE b.album_id = a.album_id; CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details) * ERROR at line 1: ORA-25966: join index cannot be based on an index organized table
and we get the above error as prior to 11g R1, there was a restriction that no table within a Bitmap-Join index could be an Index Organized Table.
Now, if we run exactly the same demo but in an Oracle11g database:
SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details) FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id; Index created. SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$'; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 950886520 -------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 125 | 1500 | 26 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE | 125 | 1500 | 26 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 3 | BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I | | | | | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("B"."SYS_NC00008$"='TAB$') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 1648 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
It all now works fine.
So since Oracle 11g R1, there’s one less reason not to use Index Organized Tables and/or Bitmap-Join indexes in your Data Warehouse 🙂
Indexes And Small tables Part VII (Cluster One) May 27, 2009
Posted by Richard Foote in Index Organized Tables, Oracle Indexes, Small Indexes.19 comments
OK, almost getting to the end here 😉
As discussed previously, despite popular opinion, an index can be just that little bit more efficient than a FTS when accessing very small tables, even if all rows in the table exist in the one table block. And a small efficiency multiplied by a large number can potentially add up and make a noticeable difference.
As we’ve seen, a unique index on such a small table accessing a specific row of interest need only perform one consistent read on the index block and one consistent read on the table block for a grand total of 2 consistent reads, with both consistent gets being the cheaper examinations variety. Not bad, not too bad at all and somewhat cheaper than an equivalent FTS.
However, as I hinted and as many of you have already commented, we can go one step further still in reducing the overheads of such queries on small tables by potentially storing all the columns in the one, non-heap table structure.
One option is to create an Index Organized Table (IOT), storing all columns within a single index structure and thereby eliminating the need to visit a table segment at all.
Following on from the previous demo, let’s recreate the table as an IOT and populate it with the same data:
SQL> drop table small;
Table dropped.
SQL> create table small (id number primary key, name varchar2(10)) organization index;
Table created.
SQL> insert into small select rownum, 'BOWIE' from dual connect by level <=100;
100 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'SMALL', method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
If we now run our query again:
SQL> select * from small where id = 42;
ID NAME ---------- ---------- 42 BOWIE Execution Plan ------------------------------------------ |Id|Operation |Name | ------------------------------------------ | 0|SELECT STATEMENT | | |*1| INDEX UNIQUE SCAN|SYS_IOT_TOP_68376 | ------------------------------------------ Statistics ------------------------------------------- 0 recursive calls 0 db block gets 1 consistent gets 0 physical reads 0 redo size 465 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We notice that we have now reduced the number of consistent gets down to just one.
Not only is it just one consistent get but if we look at the type of consistent get by running the following query in another session before/after our SELECT above:
SQL> select name, value from v$sesstat s, v$statname n where s.statistic#=n.statistic# and sid = 141 and name like 'consistent gets%';
NAME VALUE ----------------------------- ------ consistent gets 32842 consistent gets - examination 6694
SQL> select name, value from v$sesstat s, v$statname n where s.statistic#=n.statistic# and sid = 141 and name like 'consistent gets%';
NAME VALUE ----------------------------- ------ consistent gets 32843 (+1) consistent gets - examination 6695 (+1)
We also notice that it’s the cheaper, one latch consistent gets examination.
So we’ve now reduced our overheads down to just one consistent get and just the one latch get as well. It doesn’t really get much cheaper than that.
IOTs are one of those under-used options in Oracle that really should be considered and used a lot more than they are. Yes, they can be problematic when used inappropriately (especially if you need to create several secondary indexes), but for scenarios such as this they can be very useful.
I plan to discuss the advantages and disadvantages of IOTs in future posts.
Another possible option to improve things in our little demo is to create a Hash Cluster (as commented by Piet):
SQL> create cluster small_cluster (id number) size 100 single table hashkeys 200;
Cluster created.
SQL> create table small_tab (id number, name varchar2(100)) cluster small_cluster(id);
Table created.
SQL> insert into small_tab select rownum, 'BOWIE' from dual connect by level <=100;
100 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'SMALL_TAB', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SQL> select * from small_tab where id = 42;
ID NAME ---- ---------- 42 BOWIE Execution Plan --------------------------------------- | Id | Operation | Name | --------------------------------------- | 0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS HASH| SMALL_TAB | --------------------------------------- Statistics ------------------------------------------- 0 recursive calls 0 db block gets 1 consistent gets 0 physical reads 0 redo size 465 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Note again we have got our consistent gets down to just one, as Oracle can simply determine the correct hash key based on the specified ID value and go directly to the table block containing the row of interest. Note however, this will not be a consistent get - examination, but the more expensive 2-latch, pin-the-block variety.
However, if you now create a unique index on the ID column:
SQL> create unique index small_tab_i on small_tab(id);
Index created.
And re-run the query:
SQL> select * from small_tab where id = 42;
ID NAME ---- ---------- 42 BOWIE Execution Plan --------------------------------------- | Id | Operation | Name | --------------------------------------- | 0 | SELECT STATEMENT | | |* 1 | TABLE ACCESS HASH| SMALL_TAB | --------------------------------------- Statistics ------------------------------------------- 0 recursive calls 0 db block gets 1 consistent gets 0 physical reads 0 redo size 465 bytes sent via SQL*Net to client 396 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Oracle will still use the hash to determine the location of the row of interest, but because it now knows it will only retrieve the one row, it will do so with only a consistent get - examination. Back to one consistent get and only one latch get as well.
Again, clusters and all their various types and forms will be discussed in future blog entries.
Perhaps our good old familiar heap table might not always be the best and most efficient option when creating these small tables (or even larger tables for that matter).
But for now, the key message from this series is that any table, no matter how small, can potentially benefit from being indexed. There really is no such thing as a table that’s too small to benefit from an index. Yes, the difference might be small and of no real consequence, but then again for larger database environments the overall savings might all add up and surprise. Note that the associated costs of having such indexes are also likely to be relatively small, so perhaps it might just be worthwhile indexing those small tables after all 😉