Index Organized Tables – Overflow Segment Part II (The Loneliest Guy) January 18, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment, Primary Key.
In my previous post on Index Organized Tables (IOT), I introduced the concept of the IOT Overflow Segment, where we can store columns that we may not want to include within the actual IOT index structure. Before we move on, I just wanted to cover off a few additional points that could be a trap for the unwary …
In my experience, the Primary Key (PK) columns of a table are typically the first columns defined in the table. This has certainly been standard practice in most environments I’ve seen. This makes sense in that the PK columns are in many ways the “key” column(s) in the table and are identified as such by having the prestigious honour of being the first column(s) defined within the table. Most people look at and intuitively expect the first columns in the table to be the PK columns and for that reason alone, it’s probably good practice to consistently define the PK columns in this manner.
However, there’s also a good argument why having the PK columns as the leading columns in the table is precisely the wrong location for them. Many tables are “primarily” accessed via the PK columns and hence directly through the associated PK index, meaning the application already knows the PK values of the row in question. It’s therefore somewhat inefficient to have the PK columns as the first columns defined in the table, as they generally have to be read through and ignored before we get to the non-PK columns that are of direct interest and the reason for visiting the table block in the first place. By placing the PK columns after the most accessed non-PK columns, we avoid having to unnecessarily read through these PK columns again when accessing the table via the PK index.
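For illustration, the two approaches on a simple heap table might look as follows (a sketch only; the table names are hypothetical):

SQL> CREATE TABLE album_sales_conv(album_id NUMBER, country_id NUMBER, total_sales NUMBER, CONSTRAINT album_sales_conv_pk PRIMARY KEY(album_id, country_id));

SQL> CREATE TABLE album_sales_alt(total_sales NUMBER, album_id NUMBER, country_id NUMBER, CONSTRAINT album_sales_alt_pk PRIMARY KEY(album_id, country_id));

The first follows the common convention of defining the PK columns first; the second places the heavily accessed non-PK total_sales column ahead of the PK columns, per the argument above.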
I personally prefer to define the PK columns first in a standardised manner, with the advantages of avoiding possible confusion and misunderstandings outweighing any possible performance improvements. However, I can at least see the logic and merit of not following this standard with Heap tables.
The same however cannot really be said for IOTs and I would strongly recommend defining the PK columns first in an IOT …
I’m going to run the same demo as I did in my last post on the Overflow Segment, but with one subtle change. I’m not going to define the two PK columns first but rather have them defined after my heavily accessed non-PK column:
SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING country_id OVERFLOW TABLESPACE bowie2; Table created.
So in this example, my leading column is the non-PK total_sales column, followed then by the two PK columns. I still only want these 3 columns to be included in the actual IOT structure, so I have my INCLUDING clause only including columns up to the country_id column. I want the remaining large description column to be stored separately in an Overflow segment.
OK, let’s populate this table with the same data we used previously:
SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_iot VALUES(ceil(dbms_random.value(1,5000000)), i, c, 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really long description'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we describe this table, we get the expected listing:
SQL> desc album_sales_iot Name Null? Type ----------------------------------------- -------- ---------------------------- TOTAL_SALES NUMBER ALBUM_ID NOT NULL NUMBER COUNTRY_ID NOT NULL NUMBER DESCRIPTION VARCHAR2(1000)
With the columns listed in the order as we defined them in the table.
If we query the column details from dba_tab_columns:
SQL> select column_id, column_name from dba_tab_columns where table_name = 'ALBUM_SALES_IOT' order by column_id; COLUMN_ID COLUMN_NAME ---------- ------------------------------ 1 TOTAL_SALES 2 ALBUM_ID 3 COUNTRY_ID 4 DESCRIPTION
We again find the column order is as we defined them in the table.
When we run the same query we ran last time that returned the data with 5 consistent gets:
SQL> set arraysize 100 SQL> select album_id, country_id, total_sales from album_sales_iot where album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1834499174 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1300 | 18 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_IOT_PK | 100 | 1300 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 20 consistent gets 0 physical reads 0 redo size 2394 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
We notice that performance is a lot worse, with 20 consistent gets now required. Obviously, something has changed unexpectedly ???
The first clue on what’s going on here can be found by looking at dba_tab_cols:
SQL> select column_id, segment_column_id, column_name from dba_tab_cols where table_name = 'ALBUM_SALES_IOT' order by column_id; COLUMN_ID SEGMENT_COLUMN_ID COLUMN_NAME ---------- ----------------- ------------------------------ 1 3 TOTAL_SALES 2 1 ALBUM_ID 3 2 COUNTRY_ID 4 4 DESCRIPTION
The SEGMENT_COLUMN_ID column denotes the order of the columns as they’re actually stored within the segment, and we notice the column order is different. The two PK columns are listed first, with the total_sales column now only listed in third position.
As discussed in the IOT Introduction post, the structure of an index entry in an IOT has the PK columns as the leading columns, followed by the non-PK columns in the table portion. This is critical because the PK columns determine the location within the IOT where new rows need to be inserted and the subsequent ordering of the rows in the table. As such, the PK columns must always be the leading columns of an IOT, despite how the table is actually defined at creation time. If the PK columns are not listed first in the table creation DDL statement, Oracle will automatically re-order the columns and place the PK columns first regardless.
This has consequences for the INCLUDING clause if specified. In the above table creation statement, the INCLUDING clause specified the country_id column. Although defined as the third column, as it’s a PK column, Oracle has automatically re-ordered the columns such that it’s physically listed as the second column within the IOT segment. Unfortunately, the INCLUDING clause is only applied after this re-ordering of the columns and, as such, the total_sales column, which is now physically listed third and after the country_id column, is not actually included in the IOT index structure as (perhaps) intended.
A partial block dump of an IOT leaf block will confirm this:
Leaf block dump
===============
header address 298590812=0x11cc225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 421
kdxcofbo 878=0x36e
kdxcofeo 879=0x36f
kdxcoavs 1
kdxlespl 0
kdxlende 0
kdxlenxt 21052811=0x1413d8b
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[879] flag: K-----, lock: 0, len=17
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 9 fb: --H-F--- lb: 0x0 cc: 0
nrid: 0x01811911.0
row#1[896] flag: K-----, lock: 0, len=17
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 9 fb: --H-F--- lb: 0x0 cc: 0
nrid: 0x01811911.1
As we can see, the IOT row entries only consist of the PK columns and the row reference to the corresponding Overflow segment. None of the non-PK columns (such as total_sales) are co-located within the IOT segment as the table column count is 0 (cc: 0).
As a result, additional consistent gets are now required to fetch the total_sales column from the Overflow segment to satisfy the query. This explains why the query is now less efficient than it was previously.
My recommendation with regard to defining IOTs is to simply list the PK columns first. This will ensure the INCLUDING clause is applied as intended and will generally reduce confusion and misunderstandings. Otherwise, the INCLUDING clause needs to specify a Non-PK column to ensure more than just the PK columns are actually included in the IOT segment, the consequences of which may not be obvious to the casual observer of the DDL or describer of the table.
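For example, a sketch of how the demo table could be recreated to preserve the original intent while retaining the original column order, simply by naming the non-PK total_sales column in the INCLUDING clause:

SQL> CREATE TABLE album_sales_iot(total_sales NUMBER, album_id NUMBER, country_id NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2;

As total_sales is physically re-ordered to sit after the two PK columns, INCLUDING total_sales now captures all three columns within the IOT segment, leaving just the description column in the Overflow segment as intended.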
Jonathan Lewis, a great source of information on indexes and Oracle in general, has previously discussed this same IOT Trap on his blog.
Index Organized Tables – Overflow Segment (Shadow Man) January 13, 2012
Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Overflow Segment.
In my previous introductory IOT post, I illustrated how an Index Organized Table (IOT) might be worth consideration if most or all columns in a table were to be included within an index.
I’m going to use a slightly different demo this time, replacing one of the columns with a much larger DESCRIPTION column, one which is rarely accessed by the application:
SQL> CREATE TABLE album_sales_details_iot(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX; Table created. SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_details_iot VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
Sorry for the unimaginative manner of loading the description field but you get the point 🙂
OK, let’s have a look at the size of the IOT:
SQL> ANALYZE INDEX album_sales_det_pk VALIDATE STRUCTURE; Index analyzed. SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats; BLOCKS HEIGHT BR_BLKS LF_BLKS ---------- ---------- ---------- ---------- 71680 3 116 71429
As expected, the IOT is quite large as it has to accommodate the very large Description field within the IOT index structure. At 71,429 leaf blocks for the 500,000 rows in the table, that’s just 7 rows on average per leaf block.
The application doesn’t generally access the Description column, with the following query being typical (Note: to make fetching data as efficient as possible, I’ve set the arraysize to 100):
SQL> set arraysize 100 SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 521866300 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1300 | 17 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_DET_PK | 100 | 1300 | 17 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 19 consistent gets 0 physical reads 0 redo size 2387 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
The query requires 19 consistent gets to retrieve the 100 rows because even though the data is extremely well clustered, there are very few rows per leaf block.
If we look at a partial block dump of one of these IOT leaf blocks:
Leaf block dump
===============
header address 548373084=0x20af825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 7
kdxcofbo 50=0x32
kdxcofeo 1011=0x3f3
kdxcoavs 961
kdxlespl 0
kdxlende 0
kdxlenxt 20978307=0x1401a83
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1011] flag: K-----, lock: 0, len=1004
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 996 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 04 05 3b 03
col 1: [984]
41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
73 63 72 69 70 74 69 6f 6e
We can see the leaf block only has 7 rows, with the vast majority of space taken up by the very large Description column.
Considering the Description column is so large and/or so rarely accessed, wouldn’t it be nice if we didn’t have to store this column directly within the IOT index structure itself?
Enter the IOT Overflow segment. The IOT Overflow segment enables us to store in another physical location those columns that we don’t necessarily want to store directly within the IOT index structure. So those columns that might be particularly large (or just the occurrences of those columns when the specific values are too large to store within the IOT index structure) or those columns that are rarely accessed can be stored elsewhere. Effectively, we’re back to having a separate “table”-like structure, but unlike a normal Heap table in which all columns are stored within the table segment, the Overflow segment only holds those columns that we don’t necessarily want to store within the index structure.
There are a number of different methods we could use (to be explored further in future posts), for now I’ll use the INCLUDING clause:
SQL> CREATE TABLE album_sales_details_iot2(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk2 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX INCLUDING total_sales OVERFLOW TABLESPACE bowie2; Table created.
So in the above example, all columns up to and “including” the total_sales column will be included in the IOT index structure. All the following columns listed in the table definition (in this case just the Description column) will be stored in the Overflow segment, which in the above example will be created within the BOWIE2 tablespace.
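As an aside, another method (a sketch only here, to be explored properly in a future post) is the PCTTHRESHOLD clause, which moves the trailing portion of any row that would consume more than the nominated percentage of an index block into the Overflow segment (the table and constraint names below are hypothetical):

SQL> CREATE TABLE album_sales_details_iot3(album_id NUMBER, country_id NUMBER, total_sales NUMBER, description VARCHAR2(1000), CONSTRAINT album_sales_det_pk3 PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX PCTTHRESHOLD 5 OVERFLOW TABLESPACE bowie2;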
If we now populate this table with the identical data as before:
SQL> BEGIN 2 FOR i in 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_details_iot2 VALUES(i, c, ceil(dbms_random.value(1,5000000)), 'A really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really rlly really really really really really long description'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_DETAILS_IOT2', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> ANALYZE INDEX album_sales_det_pk2 VALIDATE STRUCTURE; Index analyzed. SQL> SELECT blocks, height, br_blks, lf_blks FROM index_stats; BLOCKS HEIGHT BR_BLKS LF_BLKS ---------- ---------- ---------- ---------- 1664 3 4 1613
We notice the IOT index structure is now significantly smaller, down from 71,429 to just 1,613 leaf blocks. All the “clutter” has now been removed and is stored elsewhere.
If we now re-run our query:
SQL> SELECT album_id, country_id, total_sales FROM album_sales_details_iot2 WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2379894191 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1300 | 18 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 | 100 | 1300 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 2390 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Things are now much more efficient, with the required consistent gets reduced from 19 down to just 5.
If we now look at a partial block dump of an IOT leaf block:
Leaf block dump
===============
header address 441197148=0x1a4c225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 322
kdxcofbo 680=0x2a8
kdxcofeo 703=0x2bf
kdxcoavs 23
kdxlespl 0
kdxlende 0
kdxlenxt 21049987=0x1413283
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[703] flag: K-----, lock: 0, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 02
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01800081.0
col 0: [ 5] c4 02 5e 0d 25
row#1[726] flag: K-----, lock: 0, len=23
col 0; len 2; (2): c1 02
col 1; len 2; (2): c1 03
tl: 15 fb: --H-F--- lb: 0x0 cc: 1
nrid: 0x01800081.1
col 0: [ 5] c4 04 41 13 43
We can see the number of index entries in the leaf block has increased from 7 to 322, with the size of each index entry decreasing from 1004 to just 23 bytes. Instead of the Description column being stored within the leaf block, we now have an nrid entry consisting of a 6 byte relative block address and row directory number (0x01800081.0), which effectively points to the actual location of the remaining portion of the row within the Overflow segment. The table column count is therefore only 1 (cc: 1).
To find out more about the corresponding Overflow segment, we first must determine the OBJECT_ID of the IOT:
SQL> SELECT object_id FROM user_objects WHERE object_name = 'ALBUM_SALES_DETAILS_IOT2'; OBJECT_ID ---------- 74116
This OBJECT_ID is used to name the corresponding Overflow segment, which we can see in USER_TABLES as it has a name in the format SYS_IOT_OVER_object_id:
SQL> SELECT table_name, iot_name, iot_type, blocks FROM user_tables WHERE table_name = 'SYS_IOT_OVER_74116'; TABLE_NAME IOT_NAME IOT_TYPE BLOCKS ------------------ ------------------------ ------------ ------- SYS_IOT_OVER_74116 ALBUM_SALES_DETAILS_IOT2 IOT_OVERFLOW 71430
We notice this Overflow segment (at 71,430 blocks) is where the majority of our storage has been allocated.
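Alternatively, rather than first looking up the OBJECT_ID, a quick sketch of how we could identify the Overflow segment directly via its IOT_NAME:

SQL> SELECT table_name, blocks FROM user_tables WHERE iot_name = 'ALBUM_SALES_DETAILS_IOT2' AND iot_type = 'IOT_OVERFLOW';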
Although it’s listed as a table, the Overflow segment can’t be directly accessed or manipulated. Any attempt to do so will result in an error:
SQL> select * from SYS_IOT_OVER_74116; select * from SYS_IOT_OVER_74116 * ERROR at line 1: ORA-25191: cannot reference overflow table of an index-organized table
If we look at a partial block dump of the Overflow segment block referenced in the previous IOT block dump:
Block header dump: 0x01800081
Object id on Block? Y
seg/obj: 0x12185 csc: 0x00.17482cc itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.00b.0000a919 0x00c24a2e.03d2.2a C--- 0 scn 0x0000.01748279
bdba: 0x01800081
data_block_dump,data header at 0x1a4c2244
===============
tsiz: 0x1fb8
hsiz: 0x20
pbl: 0x1a4c2244
76543210
flag=--------
ntab=1
nrow=7
frre=-1
fsbo=0x20
fseo=0x4a6
avsp=0x486
tosp=0x486
0xe:pti[0] nrow=7 offs=0
0x12:pri[0] offs=0x1bda
0x14:pri[1] offs=0x17fc
0x16:pri[2] offs=0x141e
0x18:pri[3] offs=0x1040
0x1a:pri[4] offs=0xc62
0x1c:pri[5] offs=0x884
0x1e:pri[6] offs=0x4a6
block_row_dump:
tab 0, row 0, @0x1bda
tl: 990 fb: -----L-- lb: 0x0 cc: 1
col 0: [984]
41 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20
72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c
6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72
65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c
79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65
61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79
20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61
6c 6c 79 20 72 65 61 6c 6c 79 20 72 65 61 6c 6c 79 20 6c 6f 6e 67 20 64 65
73 63 72 69 70 74 69 6f 6e
We notice the Overflow block contains 7 rows as we would expect, as this was all the IOT segment could previously manage when it had to store the large Description column values.
The table row directory contains 7 rows, with the first row (#0) having an offset at address 0x1bda, which is the actual location of the first row within the Overflow block.
Therefore, in order to find a specific Description column value of interest from the IOT, Oracle references the (say) nrid: 0x01800081.0 within the IOT index entry for the row. This in turn points to the relative block address (0x01800081) of the Overflow block containing the description and the corresponding row directory number (0), which in turn specifies the offset (say) 0x1bda to the actual location of the Description value within the Overflow block. Easy !!
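If we wanted to confirm exactly which datafile and block the nrid refers to, a quick sketch using DBMS_UTILITY, converting the hex relative block address to decimal first:

SQL> SELECT dbms_utility.data_block_address_file(to_number('01800081', 'XXXXXXXX')) file#, dbms_utility.data_block_address_block(to_number('01800081', 'XXXXXXXX')) block# FROM dual;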
If we Analyze the IOT table:
SQL> ANALYZE TABLE album_sales_details_iot2 COMPUTE STATISTICS; Table analyzed. SQL> SELECT table_name, num_rows, chain_cnt, blocks from user_tables WHERE table_name = 'ALBUM_SALES_DETAILS_IOT2'; TABLE_NAME NUM_ROWS CHAIN_CNT BLOCKS ------------------------------ ---------- ---------- ---------- ALBUM_SALES_DETAILS_IOT2 500000 500000
We notice all the rows are listed as “Chained Rows”. This is because all the rows have a corresponding Description value stored in the Overflow segment and so the rows are not stored wholly within the one block. As the previous query illustrated, this is no bad thing if we don’t need to reference these additional columns stored in the Overflow segment. It makes the resultant IOT table more compact and efficient to access.
However, on those (hopefully) rarer occasions when we do need to access the columns in the Overflow segment, this will clearly require additional block accesses:
SQL> SELECT * FROM album_sales_details_iot2 WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2379894191 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 99400 | 18 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_DET_PK2 | 100 | 99400 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 32 consistent gets 0 physical reads 0 redo size 5541 bytes sent via SQL*Net to client 590 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
The above query, which returns the Description column, results in the consistent gets increasing to 32, up from the 5 consistent gets when the Description wasn’t accessed and from the 19 consistent gets when the Description column was co-located within the IOT segment. But this is a price we might be willing to pay if this query isn’t frequently executed, while the frequently executed queries which don’t access the Description column are now more efficient.
The Overflow segment gives us, in a manner, “the best of both worlds”: the ability to store just those columns of interest within the IOT segment (although these must always include all the Primary Key columns), while those columns that are less often accessed or too large to be efficiently stored within the IOT can be stored elsewhere. Effectively, it’s an index and table relationship, except the table doesn’t have to store again the columns that are already stored within the index.
It’s all good news so far for IOTs …
Index Organized Tables – An Introduction Of Sorts (Pyramid Song) January 10, 2012
Posted by Richard Foote in Block Dumps, CBO, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Primary Key.
Thought it was high time that I covered in a little detail the subject of Index Organized Tables (IOTs). When used appropriately, they can be an extremely useful method of storing and accessing data. Hopefully by the end of this series, you’ll have a better understanding of IOTs, their respective strengths and weaknesses and so perhaps be in a better position to take advantage of them when appropriate.
As I mentioned in a previous post, Martin Widlake has recently written an excellent series on IOTs, which I highly recommend. I’ll try to cover differing aspects of IOTs that will hopefully be of interest.
To start, let’s cover a very basic little example.
Let’s begin by creating and populating a simple Heap Table that holds information about musical albums (Note: using an 8K blocksize in an MSSM tablespace):
SQL> CREATE TABLE album_sales(album_id number, country_id number, total_sales number, album_colour varchar2(20), 2 CONSTRAINT album_sales_pk PRIMARY KEY(album_id, country_id)); Table created. SQL> BEGIN 2 FOR i IN 1..5000 LOOP 3 FOR c IN 1..100 LOOP 4 INSERT INTO album_sales VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
We have a natural Primary Key that consists of two columns and an additional two columns of information.
Let’s look at some basic sizing information on the table and associated Primary Key index:
SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables WHERE table_name = 'ALBUM_SALES'; BLOCKS EMPTY_BLOCKS IOT_TYPE ---------- ------------ ------------ 1570 0 SQL> ANALYZE INDEX album_sales_pk VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 1152 3 1062
So the table segment consists of 1570 blocks and the index segment 1152, with a total of 1062 leaf blocks.
OK, let’s run a basic query looking for all albums with an album_id=42:
SQL> SELECT * FROM album_sales WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3244723662 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| ALBUM_SALES | 100 | 1800 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | ALBUM_SALES_PK | 100 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 18 consistent gets 0 physical reads 0 redo size 4084 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
As we can see, things are pretty good. 18 consistent gets in order to return 100 rows isn’t bad at all. Clearly, the index has a good Clustering Factor and can retrieve the 100 required rows in a relatively efficient manner.
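(As a quick aside, a sketch of how we could confirm this: the Clustering Factor is visible in USER_INDEXES, with a value close to the number of table blocks, rather than the number of table rows, indicating well clustered data.)

SQL> SELECT index_name, clustering_factor FROM user_indexes WHERE index_name = 'ALBUM_SALES_PK';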
However, this is a very frequently executed query and we want to do even better. One thing we notice is that we only have a couple of columns in the table which are not part of the index. Perhaps if we included these columns in the index as well, we can then use the index to extract all the required data and thus eliminate the need to visit the table segment at all. Overloading an index in this manner is a common tuning technique and will hopefully reduce the number of required logical I/Os to run the query.
We can do this by dropping and recreating the index with all the columns, making sure the PK columns remain the leading columns. This will ensure the index can still be used to police the PK constraint:
SQL> ALTER TABLE album_sales DROP PRIMARY KEY; Table altered. SQL> CREATE INDEX album_sales_pk_i ON album_sales(album_id, country_id, total_sales, album_colour) COMPUTE STATISTICS; Index created. SQL> ALTER TABLE album_sales ADD constraint album_sales_pk PRIMARY KEY(album_id, country_id); Table altered.
OK, so the index now contains all the columns in the table and is now used to police the PK constraint:
SQL> select constraint_name, constraint_type, index_name from dba_constraints where constraint_name = 'ALBUM_SALES_PK'; CONSTRAINT_NAME C INDEX_NAME ------------------------------ - ------------------------------ ALBUM_SALES_PK P ALBUM_SALES_PK_I
Let’s now look at the size of the index:
SQL> ANALYZE INDEX album_sales_pk_i VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 2048 5 2006
OK, as expected the index is now somewhat larger as it now needs to accommodate the extra columns. The number of overall blocks allocated to the index is 2048, with leaf blocks increasing from 1062 to 2006 leaf blocks.
If we now re-run the query:
SQL> SELECT * FROM album_sales WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1126128764 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 3 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_PK_I | 100 | 1800 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 11 consistent gets 0 physical reads 0 redo size 3568 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
We notice things have indeed improved and we have reduced the number of consistent gets from 18 down to just 11. Not a bad improvement !!
If we look at a partial block dump of one of the index leaf blocks:
Leaf block dump
===============
header address 484409948=0x1cdf825c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 5
kdxcosdc 0
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 1373=0x55d
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 20972941=0x140058d
kdxleprv 20972939=0x140058b
kdxledsz 0
kdxlebksz 8036
row#0[8010] flag: ------, lock: 0, len=26
col 0; len 2; (2): c1 07
col 1; len 2; (2): c1 12
col 2; len 5; (5): c4 04 15 31 59
col 3; len 4; (4): 47 4f 4c 44
col 4; len 6; (6): 01 40 05 82 00 b7
row#1[7984] flag: ------, lock: 0, len=26
col 0; len 2; (2): c1 07
col 1; len 2; (2): c1 13
col 2; len 5; (5): c4 03 19 2c 3d
col 3; len 4; (4): 47 4f 4c 44
col 4; len 6; (6): 01 40 05 82 00 b8
We notice that each leaf entry is 26 bytes in length. The lengths of the four columns add up to 13 bytes. The remaining 13 bytes are basically overhead required for each index entry:
2 bytes for flag and lock information in the index entry header
5 x 1 byte for each of the length bytes for each column
6 bytes for the 5th index column which is the index rowid
So that’s 13 bytes of overhead per index entry in this example index.
Well, everything is currently pretty good. We have the application now performing approximately 40% less work than it was previously. But we have one little issue. With the index now consisting of all the columns in the table and with the application using the index exclusively, what’s the point of now having the table? It’s wasting storage and wasting resources in having to be maintained for no purpose other than having to exist so that the index can in turn exist.
Wouldn’t it be nice if we could somehow just have the index, but without the underlying table. Enter the Index Organized Table (IOT), first introduced way back in Oracle 8.0. It’s basically an index structure that can exist without the need for an underlying table. The index structure itself is the table by which we can store and retrieve the necessary data.
OK, let’s now create a new version of this table with the same data, but this time as an IOT:
SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX; Table created. SQL> BEGIN 2 FOR i IN 1..5000 LOOP 3 FOR c in 1..100 LOOP 4 INSERT INTO album_sales_IOT VALUES (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD'); 5 END LOOP; 6 END LOOP; 7 COMMIT; 8 END; 9 / PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
The key clause here is ORGANIZATION INDEX. I’ll discuss other options and syntax in coming posts.
If we look now at the table segment:
SQL> SELECT blocks, empty_blocks, IOT_TYPE FROM dba_tables 2 WHERE table_name = 'ALBUM_SALES_IOT'; BLOCKS EMPTY_BLOCKS IOT_TYPE ---------- ------------ ------------ IOT
We see there is an IOT segment listed, but it consists of no blocks as it doesn’t physically exist …
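We can verify this with a quick sketch against USER_SEGMENTS, where only the index segment should be reported:

SQL> SELECT segment_name, segment_type FROM user_segments WHERE segment_name IN ('ALBUM_SALES_IOT', 'ALBUM_SALES_IOT_PK');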
If we look at the size of the corresponding index:
SQL> SELECT index_name, table_name, blevel, leaf_blocks FROM dba_indexes 2 WHERE table_name = 'ALBUM_SALES_IOT'; INDEX_NAME TABLE_NAME BLEVEL LEAF_BLOCKS -------------------- --------------- ------- ----------- ALBUM_SALES_IOT_PK ALBUM_SALES_IOT 2 1550 SQL> ANALYZE INDEX album_sales_iot_pk VALIDATE STRUCTURE; Index analyzed. SQL> SELECT BLOCKS, BR_BLKS, LF_BLKS FROM index_stats; BLOCKS BR_BLKS LF_BLKS ---------- ---------- ---------- 1664 4 1550
We notice it’s smaller than the corresponding overloaded index for the Heap Table. The previous index consisted of 2048 blocks and 2006 leaf blocks but this index is somewhat smaller at just 1664 blocks and 1550 leaf blocks.
If we take a look at a partial block dump of a leaf block from the IOT:
Leaf block dump
===============
header address 483926620=0x1cd8225c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 1
kdxcoopc 0x90: opcode=0: iot flags=I-- is converted=Y
kdxconco 2
kdxcosdc 2
kdxconro 336
kdxcofbo 708=0x2c4
kdxcofeo 710=0x2c6
kdxcoavs 2
kdxlespl 0
kdxlende 0
kdxlenxt 20976645=0x1401405
kdxleprv 20976643=0x1401403
kdxledsz 0
kdxlebksz 8036
row#0[710] flag: K---S-, lock: 2, len=22
col 0; len 2; (2): c1 08
col 1; len 2; (2): c1 49
tl: 14 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 04 2f 10 59
col 1: [ 4] 47 4f 4c 44
row#1[732] flag: K---S-, lock: 2, len=22
col 0; len 2; (2): c1 08
col 1; len 2; (2): c1 4a
tl: 14 fb: --H-FL-- lb: 0x0 cc: 2
col 0: [ 5] c4 03 01 03 46
col 1: [ 4] 47 4f 4c 44
Firstly, we notice it’s definitely an IOT block dump as the IOT flag is set.
The structure of the index entry is somewhat different here. It basically consists of:
2 bytes for lock and flag info in the index header as previously
Next come the two Primary Key columns with their corresponding length bytes. Note an IOT must have a PK defined.
Following are 3 bytes for the table header consisting of a lock byte, flag byte and a byte to denote the number of table (non PK) columns (in this case 2).
Followed finally by the 2 Non-PK columns and their corresponding length bytes.
Note the big missing component here from the previous block dump is that there is no rowid defined with its corresponding length byte. No need for a rowid if there’s no corresponding table to point down to …
So the overall overhead has been reduced to:
2 bytes for the index header
3 bytes for the table header
4 bytes for the 4 column lengths
for a total of 9 bytes, 4 less than the 13 bytes overhead required in the previous example. So the total length of an index entry has reduced down from 26 bytes to just 22 bytes. Hence, the overall reduction in the size of the corresponding IOT index.
So we have saved 1570 table blocks and 384 index blocks in total.
If we now re-run the same query:
SQL> SELECT * FROM album_sales_iot WHERE album_id = 42; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1834499174 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 1800 | 3 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| ALBUM_SALES_IOT_PK | 100 | 1800 | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 10 consistent gets 0 physical reads 0 redo size 3575 bytes sent via SQL*Net to client 589 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Not only have we saved ourselves some storage and having to maintain two physical segments, but things are a tad more efficient as well, reducing the number of consistent gets down from 11 to 10 as the corresponding index segment we need to access is smaller …
Enough to start with for now and yes the pun in the title is fully intended 🙂
Curious Case Of The Ever Increasing Index Solution (A Big Hurt) January 5, 2012
Posted by Richard Foote in ASSM, Indexing Myth, Oracle Indexes, Quiz.
Based on the excellent comments in the Quiz post, we have some clever cookies out there 🙂
I guess the first thing to point out is that based on the basic scenario provided, the index shouldn’t ordinarily be continually growing in this fashion. Although the index values are monotonically increasing, the deletions are leaving behind fully emptied leaf blocks which can generally be recycled in future block splits.
OK, so why is this index behaving in this fashion, continuing to increase in size while the number of rows in the table remains relatively constant ?
Well, there are a number of contributing factors.
As stated, the index values are indeed monotonically increasing, so all inserts into the index are hitting the right-most index leaf block within the index structure, while the deletions, which all occur on the “left-side” of the index, are leaving behind leaf blocks that contain nothing but deleted index entries. As Marcus and David mentioned in the comments, the use of a Reverse Key index would therefore alleviate this problem as subsequent inserts will be performed within the same leaf blocks in the index structure, automatically cleaning out and reusing space used by previously deleted index entries. This solution though may create as many problems as it solves (if, say, range predicates relied on a non-reverse index).
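Converting the existing index is then simply a matter of rebuilding it (a sketch):

SQL> ALTER INDEX bowie_pk REBUILD REVERSE;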
Additionally, the processing is being performed with a PL/SQL block. Oracle has a whole lot of smarts to make PL/SQL as efficient as possible and so the manner in which code runs within PL/SQL compared to other native languages can vary significantly. Unfortunately at times, these smarts might not be quite so smart after all.
The tablespace used to store the index is a Locally Managed Tablespace (LMT) with Automatic Segment Space Management (ASSM). Instead of freelists (or freelist groups), Oracle uses a set of bitmap blocks within the index segment to determine the amount of free space available within its blocks and whether a block is available for inserts. As Vyacheslav and Alberto highlighted in the comments, there are a number of “issues” in the manner in which these bitmap blocks are maintained within PL/SQL processing. This is effectively locking out the vast number of these now effectively empty leaf blocks from being recycled and reconsidered for subsequent inserts. Rebuilding the index once in a Manual Segment Space Management (MSSM) tablespace would also alleviate this issue.
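A sketch of this approach (assuming Oracle Managed Files are in use; the tablespace name and sizing are illustrative only):

SQL> CREATE TABLESPACE bowie_mssm DATAFILE SIZE 100M SEGMENT SPACE MANAGEMENT MANUAL;

SQL> ALTER INDEX bowie_pk REBUILD TABLESPACE bowie_mssm;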
The actual processing involved within the PL/SQL block can also have an impact. The procedure contained the following important statement:
select min(id),max(id) into n,m from bowie;
Removing this statement from the PL/SQL block, and either manually defining the values to be processed or passing them through to the procedure, can impact how PL/SQL manages the freespace bitmaps within the segment. For example, if one used something similar to the following:
SQL> declare 2 n number:= 1; 3 m number:= 200000; 4 begin 5 for i in 1..200000 loop 6 delete from bowie where id=n+i-1; 7 insert into bowie values(m+i,'David Bowie'); 8 if (i mod 1000=0) then 9 commit; 10 end if; 11 end loop; 12 commit; 13 end; 14 /
The number of leaf blocks allocated is then nowhere near as significant as before, stabilising after a few runs at approximately:
SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;
NAME HEIGHT LF_ROWS DEL_LF_ROWS LF_BLKS BR_BLKS ------------ ---------- ---------- ----------- ---------- ---------- BOWIE_PK 3 289040 89040 744 4
Finally, the PL/SQL procedure only performed a Commit after 1000 iterations, meaning 1000 deletions were performed during a single logical transaction. As Greg mentioned in the comments, Unique Key index values can be reused within a single transaction, BUT only if the same actual values are reused. The procedure introduces new values for those that have been deleted, and so the deleted entries can’t be reused during the same transaction. This means there will be at least 1000 deleted index entries that can’t be reused while the procedure runs, and sufficient additional leaf blocks to accommodate these 1000 index entries will need to be allocated, even if we use some of the solutions mentioned, such as Reverse Key indexes or MSSM tablespaces. By performing either all the necessary deletions within one transaction followed by the inserts, or by having a Commit for each delete/insert pairing, these additional blocks won’t be required. For example:
SQL> declare 2 n number:= 1; 3 m number:= 200000; 4 begin 5 for i in 1..200000 loop 6 delete from bowie where id=n+i-1; 7 commit; 8 insert into bowie values(m+i,'David Bowie'); 9 commit; 10 end loop; 11 end; 12 /
Although of course, the inefficiencies in the processing or the potential breaking of business rules may not make the index savings worthwhile.
So in summary, there are a number of things we could do to fix this scenario, rather than simply periodically rebuilding the index all the time. Depending on applicability, we could convert the index to a Reverse Key index (or say Hash Partition it), move the index to an MSSM tablespace, modify our procedure logic to remove the reference to fetching the MIN/MAX values, not use PL/SQL, or, to make the index as efficient as possible, change the transactional logic to not perform large numbers of inserts and deletes within the same transaction.
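By way of example, a hash partitioned alternative might look like the following (a sketch only; the partition count is illustrative):

SQL> ALTER TABLE bowie DROP PRIMARY KEY;

SQL> CREATE UNIQUE INDEX bowie_pk ON bowie(id) GLOBAL PARTITION BY HASH(id) PARTITIONS 8;

SQL> ALTER TABLE bowie ADD CONSTRAINT bowie_pk PRIMARY KEY(id);

By spreading the index entries across multiple partitions, the insertion activity is spread across the right-most leaf block of each partition rather than concentrated on a single hot leaf block.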
So there’s quite a lot happening within what on the surface looks like a simple piece of PL/SQL 🙂
Curious Case Of The Ever Increasing Index Quiz (She’ll Drive The Big Car) January 4, 2012
Posted by Richard Foote in Index Internals, Indexing Myth, Oracle Indexes, Quiz.
I received an email recently that had a nice example of what can potentially go wrong with an index.
Let’s first create a simple table with a unique index and populate it with 200,000 rows (following demo run on 11.2.0.1):
SQL> create table bowie (id number constraint bowie_pk primary key, name varchar2(100)); Table created. SQL> insert into bowie select rownum, 'BOWIE' from dual connect by level <= 200000; 200000 rows created. SQL> commit; Commit complete. SQL> analyze index bowie_pk validate structure; Index analyzed. SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats; NAME HEIGHT LF_ROWS DEL_LF_ROWS LF_BLKS BR_BLKS ------------ ---------- ---------- ----------- ---------- ---------- BOWIE_PK 2 200000 0 374 1
So far, everything is as expected. We have an index with 200,000 rows that currently has 374 leaf blocks.
OK, what we want to do is basically gradually delete the current set of rows and replace them with 200,000 new rows, with ever-increasing Primary Key values. To this end, we create the following procedure:
SQL> create or replace procedure delete_insert_rows 2 as 3 n number; 4 m number; 5 begin 6 select min(id),max(id) into n,m from bowie; 7 for i in 1..200000 loop 8 delete from bowie where id=n+i-1; 9 insert into bowie values(m+i,'David Bowie'); 10 if (i mod 1000=0) then 11 commit; 12 end if; 13 end loop; 14 commit; 15 end; 16 / Procedure created.
So the procedure basically determines the current MIN and MAX values of our PK column and gradually deletes the current rows and then inserts new ones. Every 1000 iterations, we commit the changes. Nothing too complex here.
When we run this procedure for the first time:
SQL> exec delete_insert_rows PL/SQL procedure successfully completed. SQL> analyze index bowie_pk validate structure; Index analyzed. SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats; NAME HEIGHT LF_ROWS DEL_LF_ROWS LF_BLKS BR_BLKS ------------ ---------- ---------- ----------- ---------- ---------- BOWIE_PK 2 293820 93820 619 1
We notice we now have a whole bunch of deleted leaf entries and that the index has grown from 374 to 619 leaf blocks.
If we run the procedure again:
SQL> exec delete_insert_rows PL/SQL procedure successfully completed. SQL> analyze index bowie_pk validate structure; Index analyzed. SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats; NAME HEIGHT LF_ROWS DEL_LF_ROWS LF_BLKS BR_BLKS ------------ ---------- ---------- ----------- ---------- ---------- BOWIE_PK 3 347841 147841 994 3
Things have gotten even worse. We still only have 200,000 rows in the table, but the index now has an additional 147,841 deleted entries and the number of leaf blocks has again increased substantially, now to 994 leaf blocks.
If we have a look at a partial treedump of the index:
SQL> select object_id from dba_objects where object_name = 'BOWIE_PK'; OBJECT_ID ---------- 74060 SQL> alter session set events 'immediate trace name treedump level 74060'; Session altered.
----- begin tree dump
branch: 0x100378b 16791435 (0: nrow: 2, level: 2)
branch: 0x1003ce0 16792800 (-1: nrow: 733, level: 1)
leaf: 0x100378e 16791438 (-1: nrow: 149 rrow: 0)
leaf: 0x100378f 16791439 (0: nrow: 571 rrow: 0)
leaf: 0x100378c 16791436 (1: nrow: 291 rrow: 0)
leaf: 0x1003795 16791445 (2: nrow: 571 rrow: 0)
leaf: 0x1003796 16791446 (3: nrow: 433 rrow: 0)
leaf: 0x1003797 16791447 (4: nrow: 4 rrow: 0)
leaf: 0x1003790 16791440 (5: nrow: 571 rrow: 0)
leaf: 0x1003791 16791441 (6: nrow: 146 rrow: 0)
leaf: 0x1003792 16791442 (7: nrow: 571 rrow: 0)
leaf: 0x1003793 16791443 (8: nrow: 288 rrow: 0)
leaf: 0x1003794 16791444 (9: nrow: 571 rrow: 0)
leaf: 0x10037a9 16791465 (10: nrow: 430 rrow: 0)
… (most of the treedump has been cut out, following is the last portion of the dump)
leaf: 0x1003e70 16793200 (248: nrow: 533 rrow: 533)
leaf: 0x1003e74 16793204 (249: nrow: 533 rrow: 533)
leaf: 0x1003e78 16793208 (250: nrow: 533 rrow: 533)
leaf: 0x1003e7c 16793212 (251: nrow: 533 rrow: 533)
leaf: 0x1003e41 16793153 (252: nrow: 533 rrow: 533)
leaf: 0x1003e45 16793157 (253: nrow: 533 rrow: 533)
leaf: 0x1003e49 16793161 (254: nrow: 533 rrow: 533)
leaf: 0x1003e4d 16793165 (255: nrow: 533 rrow: 533)
leaf: 0x1003e51 16793169 (256: nrow: 533 rrow: 533)
leaf: 0x1003e3e 16793150 (257: nrow: 533 rrow: 533)
leaf: 0x1003e03 16793091 (258: nrow: 533 rrow: 533)
leaf: 0x1003e07 16793095 (259: nrow: 236 rrow: 236)
----- end tree dump
We notice that the first portion of the index contains leaf blocks with nothing but deleted index entries; the rrow count is 0 for a vast number of leaf blocks. We also notice that the root block has an rba of 0x100378b 16791435, which is only a few values below some of the rba values of the left-most leaf blocks in the index structure (say 0x100378e 16791438). This highlights that even though these left-most blocks in the index structure contain nothing but deleted index entries, Oracle is not recycling them as it should. Oracle is simply adding new blocks to the index structure rather than recycling empty leaf blocks, resulting in the index growing bigger and bigger.
The right-most end of the index structure (the second portion of the partial treedump), however, shows a nice compact set of leaf blocks with lots of index entries per block (most with 533 per leaf block) and with no deleted index entries (the rrow count matches the nrow value).
If we run the procedure 10 times in total, we get an index that looks like the following:
SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats; NAME HEIGHT LF_ROWS DEL_LF_ROWS LF_BLKS BR_BLKS ------------ ---------- ---------- ----------- ---------- ---------- BOWIE_PK 3 1325132 1125132 4136 7
We now have 1,125,132 deleted index entries and the index is now over 10 times the original size, up from 374 to a massive 4,136 leaf blocks, even though the table only contains 200,000 rows.
There are a number of contributing factors here 🙂
The question is: why is the index behaving in this fashion, and what can we do to ensure the index doesn’t grow in this manner and instead remains basically the same size as we delete and insert new rows into the table ?
Index Organized Tables – A Start (Star) January 3, 2012
Posted by Richard Foote in Index Organized Tables, Oracle Indexes.
Happy New Year !!
As has been mentioned to me on quite a number of occasions, I haven’t really covered the area of Index Organized Tables (IOTs) on this blog, despite covering the subject of IOTs in quite some detail in my Index Internals Seminar. Which I readily admit is a bit of a shame, as IOTs are often misunderstood little things and are generally way under-utilised in many environments, I would suggest in part because they are so often misunderstood. The only reason I can really give for not discussing them more is that I simply haven’t gotten around to it 🙂
Well, the good news, as hinted at late last year, is that it’s about time I addressed this and started writing a few blog pieces on IOTs.
However, the even better news is that the subject of IOTs has already been covered by Martin Widlake in a truly excellent series of articles on his blog. I would strongly recommend giving them a read as they cover many aspects of IOTs in an interesting and informative manner.
I’ll recap some of the basic principles of IOTs here and then discuss various different aspects (such as the joys of Secondary Indexes on IOTs) in my own unique way 🙂
So, all that’s to come in the coming weeks.
However, I recently received an index question via email during the Christmas holidays which I thought was worthy of a blog piece or two. Although I often respond to emails, I much prefer to discuss index related “stuff” on the blog so everyone can share in the knowledge exchange. This is an interesting question because it relates to why an index keeps growing, even though the number of rows in the table remains constant. It’s an even more interesting question because there are a number of contributing factors for this impact on the index which I thought were worth sharing.
No more clues, I’ll post the question and the specific scenario in the next few days …