Exadata Storage Indexes Part IV – Fast Full Table Scans (Speed of Life) November 8, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

OK, let’s look at Storage Indexes in action.

But first, following is the setup for the various demos to come. I basically create one table called BIG_BOWIE that’s about 1GB in size and then create another table called DWH_BOWIE, loaded with this data and then re-inserted into itself a few times to get to about a 60GB table. The various columns have differing numbers of distinct values and distributions of data.

I used an X2-2 1/2 rack as my toy and yes, once people saw the table names, they instantly knew who created them :)

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null,
format_id number, release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
  2  begin
  3     for v_album_id in 1..100 loop
  4         for v_artist_id in 1..100000 loop
  5             insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2,
  6               trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)),
'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
  7         end loop;
  8         commit;
  9     end loop;
 10     commit;
 11  end;
 12  /

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

I modified some of the data to have a few occurrences of some specific values. This will be used in a later post.

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.
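
As a quick cross-check of the setup above, the row counts and FORMAT_ID values can be modelled outside the database. This is only a sketch in Python, not part of the original demo: it assumes BOWIE_SEQ starts at 1 and hands out gap-free ids, and it uses Python's random generator as a stand-in for dbms_random.value(0,5), taken here to return a value of at least 0 and less than 5.

```python
import math
import random

# Row count implied by the procedure: 100 albums x 100,000 artists.
total_rows = 100 * 100_000

# Assumption: BOWIE_SEQ starts at 1 and produces gap-free ids 1..total_rows.
every_10000th = range(10_000, total_rows + 1, 10_000)   # rows where mod(id,10000)=0
id_block = range(424_242, 425_241 + 1)                  # BETWEEN is inclusive
overlap = [i for i in id_block if i % 10_000 == 0]      # ids hit by both updates

# FORMAT_ID as generated by the procedure: ceil(value in [0,5)) * 2.
rng = random.Random(0)
format_ids = {math.ceil(5 * rng.random()) * 2 for _ in range(100_000)}

print(len(every_10000th), len(id_block), overlap, sorted(format_ids))
```

Under those assumptions each update touches 1,000 rows, the two updates do not overlap, and the generated FORMAT_ID values are all even, so the values 3 and 5 only exist in the rows that were explicitly updated.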

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name = 'BIG_BOWIE';

    BLOCKS
----------
    134809

Like I said, another DWH_BOWIE table was basically created with this data re-inserted into itself 6 times to create roughly a 60GB table.

OK, let’s now run an “expensive” query on this bigger table without any of the Exadata smart scan magic enabled:

SQL> alter session set cell_offload_processing=false;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:04:43.38

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    8644512  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

So it takes about 4 mins and 43 secs to return the necessary 266,176 rows from an approx. 60GB table (that’s 8.6 million 8K blocks). Not bad really.
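
The block count can be turned into a size with a little arithmetic. The sketch below (Python, purely illustrative) uses only the physical reads figure from the statistics listing and the 8K block size quoted in the text:

```python
# Figure taken from the statistics listing; 8K block size as stated in the text.
physical_reads = 8_625_479
block_size = 8 * 1024

table_bytes = physical_reads * block_size
table_mb = table_bytes / 1024**2
table_gb = table_bytes / 1024**3

print(f"{table_bytes:,} bytes = {table_mb:,.0f} MB = {table_gb:.1f} GB")
```

That works out at a little over 65GB read, in the same ballpark as the "approx. 60GB" quoted for the table.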

However, if we run the same query with the Exadata smarts enabled:

SQL> alter session set cell_offload_processing=true;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:00:03.97

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS STORAGE FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)
       filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    8626963  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

We see that execution times reduce significantly, down to just 4 secs. So we manage to read via a FTS a 60GB table in under 4 seconds, not bad at all.

A hint that something different might have occurred here is the appearance of the “storage” predicate listing. This basically tells us that a smart scan “might” have occurred. Note though that the physical reads and consistent gets as reported by the database are basically the same as in the previous run. More on this in later posts.

This improvement is due in large part to the use of Exadata Storage Indexes. As discussed previously, a Storage Index can potentially automatically avoid having to read significant portions of data by determining areas of storage that can’t possibly contain data of interest. This is a classic example of Storage Indexes in operation, putting into practice the notion that the quickest way to do something is to avoid doing as much work as possible.

But how can we tell whether a Storage Index really did kick in, and how much work it actually saved?

The V$SQL view now has a number of additional columns that provide useful information:

SQL> select sql_text, io_cell_offload_eligible_bytes, io_interconnect_bytes, io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
     from v$sql where sql_id = 'admdhphbh1a1c';

SQL_TEXT
--------------------------------------------------------------------------------
IO_CELL_OFFLOAD_ELIGIBLE_BYTES IO_INTERCONNECT_BYTES IO_CELL_UNCOMPRESSED_BYTES
------------------------------ --------------------- --------------------------
IO_CELL_OFFLOAD_RETURNED_BYTES
------------------------------
select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200
                    1.4132E+11              55217792                 1.6718E+10
                      55217792

There are also numerous new statistics which begin with ‘cell ‘ that I generally capture before and after a particular operation to see the storage related workloads. Two statistics I find particularly useful are:

SQL> select name , value/1024/1024 MB from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                MB
----------------------------------------------------------- ----------
cell physical IO bytes saved by storage index               59414.9453
cell physical IO interconnect bytes returned by smart scan   26.329895

The cell physical IO bytes saved by storage index statistic denotes how much storage has not had to be read due to Storage Indexes. So this tells us just how useful Storage Indexes have been in reducing physical IO operations. The cell physical IO interconnect bytes returned by smart scan statistic denotes how much data has actually been returned to the database servers as a result of a smart scan.

As the above numbers highlight (note this last query had been the only activity within the session), the Storage Indexes were highly effective: they were able to physically skip reading the vast majority of the table (59,414MB) during the Full Table Scan operation, and only a relatively small amount of data (26MB) had to be returned to the database servers.

By not having to read most of the data, the resultant Full Table Scan on this relatively large table was completed not in minutes as previously, but in a matter of a few seconds.
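
To put rough numbers on "the vast majority" and "a matter of a few seconds", the figures reported above can be combined as follows. This is a back-of-the-envelope sketch in Python. It assumes the reported physical reads (at 8K per block) approximate the size of the table, and that the session statistics reflect only this one query, as noted above.

```python
# All figures are copied from the outputs shown above.
table_mb = 8_625_479 * 8 / 1024      # physical reads x 8K blocks, expressed in MB
saved_mb = 59414.9453                # cell physical IO bytes saved by storage index
returned_mb = 26.329895              # cell physical IO interconnect bytes returned by smart scan

elapsed_without = 4 * 60 + 43.38     # 00:04:43.38 with cell_offload_processing=false
elapsed_with = 3.97                  # 00:00:03.97 with cell_offload_processing=true

saved_fraction = saved_mb / table_mb
speedup = elapsed_without / elapsed_with

print(f"skipped by storage index: {saved_fraction:.0%}")
print(f"elapsed time improvement: {speedup:.0f}x")
```

On these figures, the Storage Indexes avoided reading close to 90% of the table and the elapsed time dropped by a factor of about 70.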

The potential power of Storage Indexes …

Exadata Storage Indexes Part III – Similarities With Database Indexes (Same Old Scene) October 15, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
2 comments

As discussed previously, there are quite a number of differences between Storage Indexes (SIs) and Database Indexes (DIs). However, there are also a number of similarities between them as well.

The obvious one is that they’re both designed specifically to reduce the overheads associated with retrieving the required data out of the database. Both index structures provide a method by which Oracle can avoid having to read every row/block in a table when searching for data of interest. It’s just the actual implementation of this mechanism that differs between the two general index types as I’ve previously discussed.

The efficiency of both index types is very largely dependent upon the clustering of the indexed data within the table (i.e. the index Clustering Factor). The better the clustering of the related indexed data within the table, the more tightly coupled the required data is likely to be and so the more efficient both index types would be in avoiding accessing unnecessary data. If the required data were to be less well clustered and randomly distributed throughout the table, both index types would be less efficient in retrieving the necessary data. Some actual examples of this to come in future posts.
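
The effect of clustering on a min/max style structure can be illustrated with a small simulation. The Python sketch below is a toy model only: the table is a plain list, `rows_per_region` is a hypothetical stand-in for a 1MB storage region, and the function name is made up for the illustration.

```python
import random

def regions_to_read(values, rows_per_region, target):
    """Count the regions whose [min, max] range could contain target."""
    needed = 0
    for start in range(0, len(values), rows_per_region):
        region = values[start:start + rows_per_region]
        if min(region) <= target <= max(region):
            needed += 1
    return needed

# 100 distinct codes, 1,000 rows each, stored in code order (well clustered).
clustered = [code for code in range(1, 101) for _ in range(1000)]

# The same rows scattered randomly through the table (poorly clustered).
scattered = clustered[:]
random.Random(42).shuffle(scattered)

rows_per_region = 500    # hypothetical stand-in for a 1MB storage region
total_regions = len(clustered) // rows_per_region

clustered_reads = regions_to_read(clustered, rows_per_region, target=42)
scattered_reads = regions_to_read(scattered, rows_per_region, target=42)

print(f"clustered: read {clustered_reads} of {total_regions} regions")
print(f"scattered: read {scattered_reads} of {total_regions} regions")
```

With the data in code order, only the couple of regions holding code 42 need to be read. With the same rows scattered, practically every region has a min/max range that spans 42, so nothing can be skipped.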

Both index types have a period of “warming up” before being fully effective. It’s simply the manner in which this warming up process occurs that differs between the two. Database indexes on a freshly bounced database server initially incur substantial physical I/Os (PIOs) until the related index/table data is cached within the database buffer cache (and indeed in the flash cache). These PIOs can significantly reduce the performance of the SQL plans utilising database indexes. Storage Indexes on a freshly bounced storage server need to be recreated and can’t immediately eliminate accessing unnecessary storage regions. This can significantly reduce the performance of Full Table Scans until the associated SIs are fully created. Again, some actual demos on all this to come in future posts.
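
The warming up of a Storage Index can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not a description of how the storage server software actually builds its summaries: the class `ToyStorageIndex` and its methods are hypothetical, and it simply learns a (min, max) pair for each region the first time that region is read.

```python
class ToyStorageIndex:
    """Toy model: per-region (min, max) summaries held only in memory."""

    def __init__(self, regions):
        self.regions = regions      # list of lists of column values
        self.summaries = {}         # region number -> (min, max)

    def restart(self):
        # Memory-only structure: a restart loses every summary.
        self.summaries.clear()

    def scan_equals(self, target):
        """Return (matching values, number of regions physically read)."""
        rows, regions_read = [], 0
        for n, region in enumerate(self.regions):
            summary = self.summaries.get(n)
            if summary is not None and not (summary[0] <= target <= summary[1]):
                continue            # summary proves the region cannot match: skip it
            regions_read += 1
            self.summaries[n] = (min(region), max(region))   # learn while reading
            rows.extend(v for v in region if v == target)
        return rows, regions_read


si = ToyStorageIndex([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
_, cold_reads = si.scan_equals(5)            # nothing known yet: every region is read
_, warm_reads = si.scan_equals(5)            # summaries exist: only one region is read
si.restart()
_, after_restart_reads = si.scan_equals(5)   # summaries lost: every region is read again
```

The first scan and the first scan after a restart both read all four regions, while the warmed-up scan in between reads just the one region that can contain the value.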

Both index types can use “Index Combine” like logic and use multiple indexes in combination with each other to further reduce the actual number of table blocks that need to be accessed to retrieve the data of interest. Storage and Bitmap database indexes are especially suited to these types of index combine operations, although B-Tree indexes can also be used in this manner.

Both Oracle index types are really quite interesting and often misunderstood and so meet the general theme of this blog, meaning I can quite happily blog about them without shocking too many people in the process :)

Like I said, more to come …

Exadata Storage Indexes Part II – Differences With Database Indexes (Space Dementia) October 9, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
14 comments

Let’s explore some of the key differences between Storage Indexes (SI) and Database Indexes (DI). In no particular order, they include:

SIs are structures that exist only within the storage servers of an Exadata box, while DIs logically exist and can be accessed within the database servers.

SIs are purely memory only structures while DIs are physical segments that take up storage. As such, DIs are relatively expensive to both create and subsequently maintain as they generate considerable undo and redo within the database, can cause concurrency issues and require storage resources. SIs meanwhile require no physical storage and have little impact on DML operations.

SIs are generated automatically and transparently while DIs generally need to be explicitly created (except in some scenarios such as in the creation of Primary/Unique keys when they can be implicitly created).

SIs being memory only structures are transient in that if a storage server were to be restarted, the corresponding SIs are lost and need to be re-created. Additionally, Oracle may decide to drop a SI for a particular column and create one on a different column depending on current load and conditions. DIs are permanent objects that need to be explicitly dropped (except in some scenarios such as the dropping/disabling of Primary/Unique Key constraints when they can be implicitly dropped).

SIs can be stored in memory as they contain very brief summary information, just the min/max value and a null flag for each 1MB storage region. A corresponding DI (especially a B-Tree) would generally be significantly larger as it needs to store all indexed values from the table with associated rowids (unless compressed but still likely much larger even so).

SIs can index only a portion of a table at a specific point in time as they get generated and dropped (see above). DIs index the entire table/partition (prior to 12c), unless using smarts such as decode function-based indexes (which also index the entire table but based only on the results of the function).

SIs are limited to only 8 columns whereas DIs have no such limitations per table.

SIs reference a 1MB storage region whereas a DI references a specific database block (say 8K). Therefore, a DI is more “focused” in terms of the minimum amount of data that needs to be accessed.

SIs basically work by determining which areas of storage can not possibly contain data of interest, accessing just those storage regions that might contain data of interest. Therefore, it’s quite possible for a SI to generate false positives by having to access storage that might in the end not actually contain data of interest after all. A DI meanwhile via the rowid explicitly points to the exact location of an indexed value and does not generate false positives. A bitmap index is a little different in the manner in which rowids are stored and generated (and can have 0 bits set for rows that don’t actually exist) but again does not generate false positives when actually accessing the table blocks.

SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not.

As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.

SIs are used, even if the majority of the data needs to be accessed regardless. As SIs are only used during a FTS, the concept of only using an index when it’s the cheaper alternative doesn’t apply to SIs. If a SI can save (say) just 5% of physical I/Os during a FTS, it’s better than no savings at all. DIs meanwhile are only used when the Cost Based Optimizer (CBO) considers it the cheapest option when accessing data.

As SIs are storage based structures, the CBO has no knowledge of their existence and they play no part in the CBO cost calculations. DIs are fully known to the CBO and the DI related statistics are an important factor in the CBO calculations. The CBO only determines whether a FTS is the cheaper alternative; however, the decision to perform a Direct-Read operation and so potentially enable the use of SIs is a run-time decision not made by the CBO.

SIs can be effectively used for IS NULL predicates, thanks to the null existence flag component of the SI. B-Tree Indexes can’t if all indexed columns are null (as such entries are not indexed) although Bitmap indexes can.

SIs can not be used to police Primary/Unique constraints. DIs can.

SIs can not be used to avoid performance issues in relation to Foreign Keys (such as locking implications and FTS requirements on child tables when deleting parent rows). DIs can.

SIs can not avoid sort operations. DIs can as data read via an index range scan is guaranteed to be returned in the order of the index entries.

SIs can not provide additional statistical information to the CBO, such as accurate selectivity information in multi-column predicates available in concatenated index distinct keys statistics. DIs can.

SIs can not be used to efficiently access the MIN/MAX of a column. DIs can.

Function-Based SIs are not supported. Function-Based DIs are supported.

SIs can not be treated as smaller tables and used as an alternative by the CBO to access just index related data, eg. select count(*), select indexed_column, etc. as SIs do not contain all the required data and are not visible to the CBO anyways. DIs can be treated as smaller tables and accessed accordingly if appropriate.
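
Two of the points in the list above, false positives and IS NULL predicates, follow directly from what a SI stores: a min value, a max value and a null flag per storage region. The Python sketch below models just that summary. The function names and the dictionary layout are made up for the illustration and do not reflect any actual Exadata structure.

```python
def region_summary(values):
    """Min, max and a null-existence flag for one region (None models NULL)."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "has_nulls": any(v is None for v in values),
    }

def may_match_equals(summary, target):
    """True if the region might contain target; never a definite 'yes'."""
    if summary["min"] is None:       # region holds nothing but NULLs
        return False
    return summary["min"] <= target <= summary["max"]

def may_match_is_null(summary):
    """True if the region contains at least one NULL."""
    return summary["has_nulls"]

region_a = [2, 3, 9, 10]       # range 2..10 covers 5, but no row actually has 5
region_b = [7, None, 12]       # contains a NULL

summary_a = region_summary(region_a)
summary_b = region_summary(region_b)

# Region A has to be read for "= 5" even though it holds no such row.
false_positive = may_match_equals(summary_a, 5) and 5 not in region_a
```

Region A is a false positive for a predicate of = 5, whereas for an IS NULL predicate the null flag allows region A to be skipped and only region B to be read.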

OK, that’s enough of a list for now to get one thinking about some of these differences :)

In the following posts, I’ll go through the benefits of SIs and show examples of how they’re implemented and used by Oracle.

250 Posts !! October 5, 2012

Posted by Richard Foote in Richard's Musings.
7 comments

WordPress reported that my last post on Exadata Storage Indexes was my 250th !!

Who would have thought …

:)

Exadata Storage Indexes – Part I (Beginning To See The Light) October 4, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
12 comments

Thought I might discuss Exadata Storage Indexes, explore what they are, how they work and discuss their advantages and disadvantages. Following is but a brief introduction on this very cool Exadata feature.

A Storage Index basically maintains summary information about database table data down on the Exadata storage servers. The information consists of the minimum value, the maximum value and a flag to denote whether any Nulls exist for up to 8 different columns within each 1M storage region of disk.

My little diagram above attempts to describe this.

The Storage Indexes are created automatically and transparently based on the SQL predicate information executed by Oracle and passed down to the storage servers from the database servers. Storage Indexes take up no physical storage of themselves and are built and maintained entirely in memory. As only this very basic summary information is stored for a maximum of 8 columns for each 1M of storage, Storage Indexes are very lightweight and can be created and maintained with minimal general overheads.

So how are they used ?

During an Exadata Smart Scan, Oracle can perform predicate filtering down at the storage layer and so return just the rows of interest back up to the database. As part of this process, Oracle can use the Storage Indexes to visit just the 1M storage regions that can potentially contain rows of interest. Those storage regions that can’t possibly contain data of interest can be eliminated and not accessed at all during the Smart Scan operation.

So, in a very simple example, if we have an SQL predicate such as WHERE CODE = 5, if a corresponding Storage Index on the CODE column of the first 1M region of the table has MIN=2 and MAX=10, Oracle would need to access this portion of the table as the CODE value of interest could potentially exist here. However, if the next 1M storage region had a CODE Storage Index with a MIN=7 and MAX=12, then the CODE value of 5 can’t possibly exist within this portion of the table and can be ignored and not accessed at all during the Smart Scan.

So depending on the column, predicates and data distribution, a Storage Index can potentially eliminate having to physically access significant portions of a table during a Smart Scan. In an extreme example, on a search of a CODE value = 42 where the maximum CODE value that actually exists is say 35, a Smart Scan can perform a so-called Full Table Scan (FTS) via a Storage Index that doesn’t actually have to perform any physical I/O at all and can ignore the entire table.
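
The two examples above can be written down as a small check. The Python sketch below is a minimal model only: `RegionSummary` and `regions_to_visit` are hypothetical names, and the region values in the second example are invented to match the "maximum CODE value is 35" scenario.

```python
from dataclasses import dataclass

@dataclass
class RegionSummary:
    """MIN and MAX of one column within one storage region."""
    min_value: int
    max_value: int

def regions_to_visit(summaries, value):
    """Numbers of the regions that could contain value and so must be read."""
    return [n for n, s in enumerate(summaries) if s.min_value <= value <= s.max_value]

# WHERE CODE = 5: the first region (MIN=2, MAX=10) must be read,
# the second (MIN=7, MAX=12) can be skipped.
code_summaries = [RegionSummary(2, 10), RegionSummary(7, 12)]
visit_for_5 = regions_to_visit(code_summaries, 5)

# WHERE CODE = 42 when no region has a MAX above 35: nothing needs to be read.
capped_summaries = [RegionSummary(1, 20), RegionSummary(15, 35)]
visit_for_42 = regions_to_visit(capped_summaries, 42)

print(visit_for_5, visit_for_42)
```

The first search visits only region 0, and the second visits no regions at all, which is the "Full Table Scan with no physical I/O" case described above.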

The less physical I/O performed, the less work required and the faster the response time. If a FTS of say a 500GB table only had to physically read and access a few MBs here and there AND just return the data of interest back to the database servers, that would significantly improve the overall performance and overheads associated with the FTS.

Storage Indexes can be very cool indeed.

Much more to come …

Indexes: Oracle11g New Features Presentation (Get Back) September 19, 2012

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes.
5 comments

I’m in the early stages of compiling an Oracle12c Indexes New Features presentation so I thought I might make available the 11g version I’ve presented previously at Oracle OpenWorld and InSync conferences:

http://richardfoote.files.wordpress.com/2012/09/oracle-indexing-new-features-oracle-11g-release-1-and.pdf

Enjoy :)

InSync12 Conference: Melbourne 21-22 August 2012 July 10, 2012

Posted by Richard Foote in InSync12.
2 comments

I’ll again be presenting at this year’s InSync12 Conference, this year held in Melbourne on 21-22 August at the Sebel Albert Park Hotel. It’s Australia’s main Oracle conference of the year and should again be an excellent event with lots of great speakers.

I’ll be presenting my paper on “Indexing In Exadata”, which should be of interest even if you don’t have an Exadata server at hand. The conference program has just been released and I’ll be presenting at 9:00am on Wednesday, 22 August 2012.

The day following the conference (Thursday, 23 August 2012), I’ll also be presenting the InSync12 Technology Workshop which is included in the registration fee for full delegates of InSync12. I’ll be presenting a mini version of my Oracle Index Internals seminar. You can find more details of this workshop here.

Hope to catch up with you at InSync12 :)

E4 2012 Enkitec Extreme Exadata Expo (Blue Jean) June 14, 2012

Posted by Richard Foote in Advert, Exadata, Oracle Indexes.
6 comments

I’m very pleased to have been invited to speak at the E4 2012 Enkitec Exadata Expo to be held in Dallas, USA on 13-14 August.

It’s basically the first ever conference that’s dedicated exclusively to Oracle’s exciting Exadata platform. It should be a fabulous event, featuring some of the best Oracle talent going around, including Jonathan Lewis, Tanel Poder, Kerry Osborne, Cary Millsap, Arup Nanda, Frits Hoogland, Doug Burns, Karen Morton, Maria Colgan and Peter Bach to name but a few. The full list of speakers is listed here.

My talk will naturally be on “Indexing In Exadata”, the abstract being:

“There’s often confusion regarding how indexing requirements may change when moving to Exadata, with some even suggesting that indexes are perhaps no longer required at all. Considering indexes can consume a considerable proportion of total storage within a database and can be crucial to general database performance, care needs to be taken to fully consider indexing requirements when moving to Exadata. This presentation will discuss the indexing structures unique to Exadata, how indexing considerations change (and don’t change), how database usage is critical to indexing requirements and how to implement safely an appropriate indexing strategy when migrating to Exadata that will ensure indexes get used when appropriate without compromising Exadata specific features such as Smart Scans and Storage Indexes.”

I’ve had the opportunity to research and have a good play with Exadata since re-joining the Oracle mothership so I’m looking forward to sharing some of what I’ve learnt.

Hopefully, you can make this historic event. If not in person, you might still be able to attend the event in a virtual capacity.

I’m hoping to confirm my attendance at another conference in the very near future. More details soon.

And yes, I’m planning to blog at some point about the challenges of having an appropriate indexing strategy on Exadata, busting a few myths in the making while I’m at it :)

Indexes vs. Full Table Scan: Picture vs. 1000 Words (Pictures Of Lily) June 8, 2012

Posted by Richard Foote in CBO, Clustering Factor, Oracle Indexes.
19 comments

I’m in the process of writing a number of new presentations and in one I’ve included a favorite little graph of mine that I’ve used over the years to help illustrate the relationship between the cost of using an index vs. the cost of using a Full Table Scan (FTS). It’s occurred to me that I’ve never actually shared this graph on this blog, so I thought it about time I did.

The Cost Based Optimizer (CBO) when choosing between an index scan and a FTS will simply go for the cheapest option. The more rows that are retrieved (or the greater the percentage of rows retrieved), the more expensive the index option as it needs to perform more logical I/Os. There will generally be a point when the selectivity of a query is such that so many rows are retrieved that the index costs increase beyond those of the FTS and the FTS becomes the cheaper option.

The cost of a FTS meanwhile is pretty well constant regardless of the number of rows retrieved. It needs to read all the blocks in the table, whatever the selectivity of the query.

Although I’ve not quite reached 1000 words, the below graph illustrates this point:

The red line represents the constant cost of the FTS. The green lines represent the cost of using various indexes, which increases as more rows are retrieved. The “steepness” of the green line and the subsequent increase in cost of the index as more rows are retrieved is due entirely to the Clustering Factor of the index. The steeper the line, the worse (higher) the Clustering Factor, the less efficient the index and the quicker we get to the point when the FTS becomes cheaper. The less steep the line, the better (lower) the Clustering Factor, the more efficient the index and the longer it takes for the FTS to become the cheaper option.

In some rarer cases, the index might be so efficient (or the FTS so inefficient) that the index cost never reaches that of the FTS, and the CBO decides it’s cheaper overall to use the index to potentially access 100% of all rows in a table rather than perform a FTS.
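
The shape of the graph can be reproduced with a few lines of arithmetic. The Python sketch below uses the commonly quoted simplified index range scan costing (blevel, plus the fraction of leaf blocks, plus the fraction of the Clustering Factor) against a constant FTS cost. It is a simplification rather than the CBO's full costing model, and every number in it is hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point rounding."""
    return -(-a // b)

def index_cost(pct_rows, blevel, leaf_blocks, clustering_factor):
    """Simplified index range scan cost for a whole percentage of rows."""
    return (blevel
            + ceil_div(pct_rows * leaf_blocks, 100)
            + ceil_div(pct_rows * clustering_factor, 100))

def crossover_percent(fts_cost, blevel, leaf_blocks, clustering_factor):
    """Smallest whole percentage of rows at which the index costs more than the FTS."""
    for pct in range(1, 101):
        if index_cost(pct, blevel, leaf_blocks, clustering_factor) > fts_cost:
            return pct
    return None    # the index never becomes more expensive than the FTS

FTS_COST = 1000    # hypothetical constant cost of the FTS (the red line)

good_cf = crossover_percent(FTS_COST, blevel=2, leaf_blocks=2000, clustering_factor=3500)
poor_cf = crossover_percent(FTS_COST, blevel=2, leaf_blocks=2000, clustering_factor=1_000_000)
never = crossover_percent(FTS_COST, blevel=2, leaf_blocks=200, clustering_factor=500)

print(good_cf, poor_cf, never)
```

With these made-up figures, the index with the low Clustering Factor stays cheaper until 19% of rows are retrieved, the one with the high Clustering Factor is already more expensive at 1%, and the small, very efficient index never crosses the FTS line at all.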

Ok, so now you have almost 1000 words and the picture :)

Cost of Virtual Indexes (Little Lies) May 30, 2012

Posted by Richard Foote in CBO, Clustering Factor, Fake Indexes, Oracle Indexes, Virtual Indexes.
5 comments

I’ve previously discussed Virtual Indexes and how they can be used to do basic “what if” analysis if such an index really existed. However, a recent comment on the OTN forums regarding using them to compare index costs made me think a follow-up post regarding the dangers of Virtual Indexes might be warranted.

The big advantage of a Virtual Index of course is that it doesn’t really exist and so consumes no storage and can be created extremely quickly/cheaply. The disadvantage of a Virtual index is that it doesn’t really exist and so Oracle can not collect segment level statistics. Without statistics however, the CBO has a very tough time of doing its job properly …

To illustrate, a simple little demo. I begin by creating a table in which the data in the table is stored in CODE column order. An index on the CODE column would therefore have an excellent (very low) Clustering Factor. Note that the Clustering Factor is the most important index related statistic regarding the efficiency and potential cost of using the index.

SQL> create table bowie_ordered (id number, code number, name varchar2(30));

Table created.

SQL> create sequence bowie_seq;

Sequence created.

SQL> declare
  2  begin
  3  for i in 1..100 loop
  4     for j in 1..10000 loop
  5        insert into bowie_ordered values (bowie_seq.nextval, i, 'DAVID BOWIE');
  6     end loop;
  7  end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

OK, I’m now going to create a Virtual Index on the CODE column and collect 100% accurate statistics on the table:

SQL> create index bowie_ordered_i on bowie_ordered(code) nosegment;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_ORDERED', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

Oracle allows statistics to be collected on the table and associated Virtual Index (so that existing statistic gathering jobs won’t now fail), however without an associated segment, no statistics can actually be derived for the index itself.

SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I';

no rows selected

I’ll list the system statistics so anyone who wants to replicate the demo can get similar results (the database blocksize is 8K):

SQL> SELECT pname, pval1 FROM SYS.AUX_STATS$
     WHERE pname IN ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED');

PNAME                               PVAL1
------------------------------ ----------
SREADTIM                                2
MREADTIM                               10
CPUSPEED                             1000
MBRC                                   20

If we run the following query:

SQL> set arraysize 5000
SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1678744259

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_ORDERED | 10000 |   195K|  1005  (13)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We note that the CBO uses a Full Table Scan as the CBO has no real choice here as Virtual Indexes are not considered by default.

However, if we change the following hidden parameter and re-run:

SQL> alter session set "_use_nosegment_indexes" = true;

Session altered.

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|        9(0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|        9(0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |        1(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3471  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice a few interesting details. Firstly, the CBO has decided to use the Virtual Index. However, the number of consistent gets (3471) remains the same as in the previous run, so we can clearly see that behind the covers the Full Table Scan is still performed. The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.

If we look at the execution plan costs, both the estimated row (10000) and byte values are spot on, as these statistics are based on the underlying table/column statistics and the 100 distinct CODE values are evenly distributed. However, the index related costs look remarkably low. Just a cost of 1 to read the index and extract 10,000 index entries (that means an index entry is less than 1 byte in length on average !!). Just a cost of 9 to visit the table and read 10,000 rows. Even with the most efficient of physical indexes, these costings are not realistic and are based on highly questionable default metrics.
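To put the “less than 1 byte” remark into numbers, here is a quick back-of-the-envelope sketch. It assumes a cost of 1 equates to reading a single leaf block and that the block size is 8KB; neither is stated in the plan, so treat both as assumptions:

```python
BLOCK_SIZE_BYTES = 8192   # ASSUMPTION: 8KB database block size
LEAF_BLOCKS_READ = 1      # ASSUMPTION: an index cost of 1 means one leaf block visited
INDEX_ENTRIES = 10_000    # index entries needed for code = 42

# Space available per index entry if they all had to fit in the blocks read
bytes_per_entry = BLOCK_SIZE_BYTES * LEAF_BLOCKS_READ / INDEX_ENTRIES
print(f"{bytes_per_entry:.2f} bytes per index entry")
```

A real index entry has to carry at least the rowid and the key value, so it can't possibly fit in under a byte.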

Basically, the creation of this Virtual Index is telling us that there is no reason why the index couldn’t potentially be used, IF (that’s a big IF in case no-one noticed) the actual index related statistics are such that the CBO determines the index to be the cheaper option. But it depends on the actual characteristics of the index which can’t be accurately determined until it’s been physically created.

As the Virtual Index suggests the index might be used if it existed, let’s now create it for real:

SQL> drop index bowie_ordered_i;

Index dropped.

SQL> create index bowie_ordered_i on bowie_ordered(code);

Index created.

SQL> select index_name, num_rows, clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_ORDERED_I                   1000000              3546

As predicted, a Clustering Factor of 3546 on an index with 1M index entries is indeed nice and low.

If we now re-run the query again:

SQL> select * from bowie_ordered where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1860500051

-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 | 10000 |   195K|       60(4)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED   | 10000 |   195K|       60(4)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ORDERED_I | 10000 |       |       23(5)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         61  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

Indeed, the index has been used by the CBO. However, note that the costs are substantially higher (and more accurate) than previously suggested with the Virtual Index. Indeed the final cost of 60 is very close to the number of consistent gets (61) required by the execution plan and so suggests the CBO is making reasonable index based calculations here.
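These costs line up reasonably well with the commonly quoted approximation for the I/O cost of an index range scan followed by table access: blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor). Below is a minimal sketch of that arithmetic. The Clustering Factor (3546) and the 1 in 100 selectivity come from the demo; the BLEVEL of 2 and the 1950 leaf blocks are assumptions, as neither value is shown for this index, and CPU costing is ignored:

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def index_range_scan_io_cost(blevel, leaf_blocks, clustering_factor, num_distinct):
    """Return (index_cost, total_cost) for an equality predicate on an
    evenly distributed column with num_distinct values."""
    index_cost = blevel + ceil_div(leaf_blocks, num_distinct)
    table_cost = ceil_div(clustering_factor, num_distinct)
    return index_cost, index_cost + table_cost

index_cost, total_cost = index_range_scan_io_cost(
    blevel=2,                 # ASSUMPTION: not shown in the demo
    leaf_blocks=1950,         # ASSUMPTION: not shown in the demo
    clustering_factor=3546,   # from dba_indexes above
    num_distinct=100,         # 100 evenly distributed CODE values
)
print(index_cost, total_cost)
```

Under those assumptions the sketch gives 22 for the index and 58 overall, close to the 23 and 60 in the plan once a few percent of CPU cost is added.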

OK, another demo, but this time with a table in which the CODE values are distributed throughout the whole table (rather than being perfectly clustered together as in the first example):

SQL> create table bowie_random (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie_random select rownum, mod(rownum,100)+1, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_RANDOM', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

When we now create a Virtual Index based on the CODE column and re-run the same query:

SQL> create index bowie_random_i on bowie_random(code) nosegment;

Index created.

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We notice that both the execution plan and all the associated costs are identical to those of the previous example. So although the actual Clustering Factor of the index is likely to be dramatically greater here than it was in the previous example and so likely dramatically impact the costs associated with using this index, the Virtual Index is treated and costed identically. This is the simple consequence of not having the physical index structure by which to calculate the appropriate segment statistics.

If we now physically create this index for real:

SQL> drop index bowie_random_i;

Index dropped.

SQL> create index bowie_random_i on bowie_random(code);

Index created.

SQL> select index_name, num_rows, clustering_factor from dba_indexes where index_name='BOWIE_RANDOM_I';

INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
BOWIE_RANDOM_I                    1000000            344700

We can see that indeed the Clustering Factor is dramatically worse than before, increasing here from 3546 to 344700.
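To see where such different values come from, the following sketch computes a Clustering Factor the way it is usually described: walk the index entries in key (then rowid) order and count how often the table block changes from one entry to the next. It's a scaled-down toy model rather than the demo itself; the 100,000 rows and the 290 rows per block are assumptions chosen for illustration:

```python
def clustering_factor(codes, rows_per_block):
    """codes[i] is the indexed value of the row stored at position i in the table."""
    # Index entries are ordered by key, then by rowid (here: the table block number)
    entries = sorted((code, pos // rows_per_block) for pos, code in enumerate(codes))
    cf, prev_block = 0, None
    for _, block in entries:
        if block != prev_block:       # next entry points at a different table block
            cf += 1
            prev_block = block
    return cf

ROWS = 100_000          # ASSUMPTION: scaled down from the 1M rows in the demo
ROWS_PER_BLOCK = 290    # ASSUMPTION: illustrative rows-per-block figure

ordered = [pos // 1000 + 1 for pos in range(ROWS)]    # each CODE value stored together
scattered = [pos % 100 + 1 for pos in range(ROWS)]    # mod(rownum,100)+1 style layout

cf_ordered = clustering_factor(ordered, ROWS_PER_BLOCK)
cf_scattered = clustering_factor(scattered, ROWS_PER_BLOCK)
print(cf_ordered, cf_scattered)
```

In the toy model the ordered layout gives a Clustering Factor equal to the number of table blocks (345), while the round-robin layout gives 100 times that (34500), as each of the 100 CODE values has to visit every block. The real figure of 344700 is exactly 100 x 3447, which would be consistent with the same pattern on a table of roughly 3447 blocks.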

If we now re-run the query:

SQL> select * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1983602984

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   195K|  1005  (13)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE_RANDOM | 10000 |   195K|  1005  (13)| 00:00:03 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3470  consistent gets
          0  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

We see the CBO has decided to perform the now cheaper Full Table Scan. Although the Virtual Index on this column was used, once the actual characteristics of the index are determined via the index statistics, the CBO has decided the actual physical index was just too expensive to use to retrieve the 1% of rows.

If we re-run the query with an index hint:

SQL> select /*+ index (bowie_random) */ * from bowie_random where code = 42;

10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 711259049

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 10000 |   195K|  3483   (1)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM   | 10000 |   195K|  3483   (1)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_RANDOM_I | 10000 |       |    23   (5)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3472  consistent gets
         21  physical reads
          0  redo size
     100802  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

The index is used but we can see why at a cost of 3483, the Full Table Scan at a cost of only 1005 was selected by the CBO.
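A related question is how low the Clustering Factor would need to be before the index became cheaper than the Full Table Scan. The sketch below uses the commonly quoted I/O approximation for an index range scan with table access (blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor)). The BLEVEL of 2 and the 1950 leaf blocks are assumed values that are not shown in the demo, and CPU costing is ignored:

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def index_access_io_cost(clustering_factor, blevel=2, leaf_blocks=1950, num_distinct=100):
    # ASSUMPTION: blevel and leaf_blocks defaults are illustrative, not from the demo
    return (blevel
            + ceil_div(leaf_blocks, num_distinct)
            + ceil_div(clustering_factor, num_distinct))

def break_even_clustering_factor(fts_cost, blevel=2, leaf_blocks=1950, num_distinct=100):
    """Largest clustering factor for which the index I/O cost does not exceed fts_cost."""
    table_budget = fts_cost - blevel - ceil_div(leaf_blocks, num_distinct)
    return table_budget * num_distinct

FTS_COST = 1005                               # from the Full Table Scan plan
random_cost = index_access_io_cost(344700)    # actual Clustering Factor of BOWIE_RANDOM_I
limit = break_even_clustering_factor(FTS_COST)
print(random_cost, limit)
```

That puts the hinted plan at about 3469 (3483 in the plan with CPU included) and suggests the index would only become competitive with a Clustering Factor below roughly 98,000, far below the actual 344700.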

Virtual Indexes can be useful to quickly determine whether an index is a viable option if it were to be actually created. However, caution needs to be exercised if Virtual Indexes are used for cost comparison purposes. Although a Virtual Index might be used by the CBO, it might be another story entirely once the index is physically created and the actual index related statistics determined.

Index Rebuild – Does it use the Index or the Table ? (Nothing Touches Me) May 15, 2012

Posted by Richard Foote in Index Rebuild, Oracle Indexes, Secondary Indexes.

A common question that gets asked is does Oracle access the index itself or the parent table during an index rebuild to extract the necessary data for the index ? Thought it might be worth a blog post to discuss.

Now if the index is currently in an UNUSABLE state, then Oracle clearly can’t use the existing index during the index rebuild operation. So we’ll assume both table and index are hunky dory.

OK, to set up the first demo (using 11.2.0.1), we create and populate a table and index with the index being somewhat smaller than the parent table as is most common:

SQL> create table bowie (id number, code number, name1 varchar2(30), name2 varchar2(30), name3 varchar2(30), name4 varchar2(30), name5 varchar2(30), name6 varchar2(30), name7 varchar2(30), name8 varchar2(30), name9 varchar2(30), name10 varchar2(30));

Table created.

SQL> insert into bowie select rownum, mod(rownum, 100), 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE','DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

If we look at the corresponding size of table and index:

SQL> select table_name, blocks from dba_tables where table_name = 'BOWIE';

TABLE_NAME                         BLOCKS
------------------------------ ----------
BOWIE                               19277

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE_CODE_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE_CODE_I                          1948

As is common, the table is somewhat larger than the corresponding index.

Now in my first demo, I’m just going to perform a normal offline Index Rebuild. I’ll however trace the session to see what might be happening behind the scenes (the good old alter session set events '10046 trace name context forever, level 12'; still does the job). I’ll also flush the buffer cache to ensure the trace file shows me which blocks from which object get accessed.

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

There’s lots of information of interest in the resultant trace file, well, for someone with an unhealthy interest in Oracle indexes anyways :) However, the portion that’s of direct interest in this discussion is to see which object Oracle accesses in order to read the necessary data for the index rebuild. The trace file will contain a relatively extensive section with the following wait events (the following is just a short sample):

WAIT #6: nam='db file scattered read' ela= 933 file#=4 block#=79339 blocks=5 obj#=75737 tim=20402099526
WAIT #6: nam='db file scattered read' ela= 1016 file#=4 block#=79344 blocks=8 obj#=75737 tim=20402102334
WAIT #6: nam='db file scattered read' ela= 978 file#=4 block#=79353 blocks=7 obj#=75737 tim=20402106904
WAIT #6: nam='db file scattered read' ela= 9519 file#=4 block#=80000 blocks=8 obj#=75737 tim=20402119605
WAIT #6: nam='db file scattered read' ela= 2800 file#=4 block#=80009 blocks=7 obj#=75737 tim=20402131869

….

If we query the database for the identity of object 75737:

SQL> select object_name from dba_objects where object_id = 75737;

OBJECT_NAME
-----------------------

BOWIE_CODE_I

We can see that Oracle has accessed the data from the Index itself, using multi-block reads. As the index is the smallest segment that contains the necessary data, Oracle can very efficiently read all the required data (the expensive bit) from the index itself, perform a sort of all the data (as a multi-block read will not return the data in a sorted format) and complete the rebuild process relatively quickly. Note the table is locked throughout the entire index rebuild operation preventing DML operations on the table/index and so for an offline index rebuild, Oracle can access the Index segment without complication.
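When the trace file is large, eyeballing the WAIT lines gets tedious. The following is a small sketch that tallies the blocks read per obj# from 'db file scattered read' wait lines. The sample lines are the ones shown above; the function name, and feeding it a list of lines rather than the actual trace file, are just for illustration:

```python
import re
from collections import Counter

# The quote characters around the event name are matched loosely with "."
WAIT_RE = re.compile(r"nam=.db file scattered read.\s.*?blocks=(\d+)\s+obj#=(-?\d+)")

def blocks_read_per_object(lines):
    """Sum the blocks read by multi-block reads, grouped by obj#."""
    totals = Counter()
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            totals[int(match.group(2))] += int(match.group(1))
    return totals

sample = """\
WAIT #6: nam='db file scattered read' ela= 933 file#=4 block#=79339 blocks=5 obj#=75737 tim=20402099526
WAIT #6: nam='db file scattered read' ela= 1016 file#=4 block#=79344 blocks=8 obj#=75737 tim=20402102334
WAIT #6: nam='db file scattered read' ela= 978 file#=4 block#=79353 blocks=7 obj#=75737 tim=20402106904
WAIT #6: nam='db file scattered read' ela= 9519 file#=4 block#=80000 blocks=8 obj#=75737 tim=20402119605
WAIT #6: nam='db file scattered read' ela= 2800 file#=4 block#=80009 blocks=7 obj#=75737 tim=20402131869
""".splitlines()

totals = blocks_read_per_object(sample)
for obj_id, blocks in totals.most_common():
    print(obj_id, blocks)
```

The resulting obj# values can then be looked up in dba_objects, as in the query above.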

I’m going to repeat the same process but this time perform an Online index rebuild operation:

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie_code_i rebuild online;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

We notice this time there are many more wait events than previously and that another object is referenced:

WAIT #5: nam='db file scattered read' ela= 8259 file#=4 block#=5635 blocks=5 obj#=75736 tim=4520179453
WAIT #5: nam='db file scattered read' ela= 1656 file#=4 block#=5640 blocks=8 obj#=75736 tim=4520181368
WAIT #5: nam='db file scattered read' ela= 891 file#=4 block#=5649 blocks=7 obj#=75736 tim=4520182459
WAIT #5: nam='db file scattered read' ela= 886 file#=4 block#=5656 blocks=8 obj#=75736 tim=4520183544
WAIT #5: nam='db file scattered read' ela= 827 file#=4 block#=5665 blocks=7 obj#=75736 tim=4520184579

SQL> select object_name from dba_objects where object_id = 75736;

OBJECT_NAME
-------------------------

BOWIE

This time, the much larger BOWIE parent table has been accessed. So with an Online rebuild, Oracle is forced to use the parent table to access the data for the rebuild operation due to the concurrency issues associated with changes being permitted to the underlying table/index during the rebuild process. So although an online index rebuild has availability advantages, it comes at the cost of having to access the parent table, which can result in many additional I/O operations. So if you don’t have availability concerns, an offline index rebuild is probably going to be the more efficient option.

In fact, Oracle can be quite clever in deciding which object to access with an offline rebuild …

In this next example, I’m going to create another table/index, only this time the index is somewhat larger than the parent table. This scenario is less common but certainly possible depending on circumstances:

SQL> create table bowie2 (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie2 select rownum, mod(rownum,100), 'DAVID BOWIE' from dual connect by level<= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie2_code_i on bowie2(code) pctfree 90;

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

SQL> select table_name, blocks from dba_tables where table_name = 'BOWIE2';

TABLE_NAME                         BLOCKS
------------------------------ ----------
BOWIE2                               3520

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE2_CODE_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE2_CODE_I                        21726

So the index is indeed much larger than the table. Which object will Oracle access now during an offline rebuild ?

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie2_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #15: nam='db file scattered read' ela= 2278 file#=4 block#=81723 blocks=5 obj#=75744 tim=8570990574
WAIT #15: nam='db file scattered read' ela= 2733 file#=4 block#=81728 blocks=8 obj#=75744 tim=8570994765
WAIT #15: nam='db file scattered read' ela= 2398 file#=4 block#=81737 blocks=7 obj#=75744 tim=8570999057
WAIT #15: nam='db file scattered read' ela= 2661 file#=4 block#=81744 blocks=8 obj#=75744 tim=8571003369
WAIT #15: nam='db file scattered read' ela= 1918 file#=4 block#=81753 blocks=7 obj#=75744 tim=8571006709

SQL> select object_name from dba_objects where object_id = 75744;

OBJECT_NAME
----------------------------

BOWIE2

In this case, the smaller table segment is accessed. So during an offline rebuild, Oracle will access either the table or index, depending on which one is smaller and cheaper to read.

What if we now create another index that also contains the CODE column and which is smaller than both the table and the existing index?

SQL> create index bowie2_code_id_i on bowie2(code, id);

Index created.

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE2_CODE_ID_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE2_CODE_ID_I                      2642

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie2_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #6: nam='db file scattered read' ela= 2070 file#=4 block#=85179 blocks=5 obj#=75747 tim=8925949081
WAIT #6: nam='db file scattered read' ela= 2864 file#=4 block#=85184 blocks=8 obj#=75747 tim=8925957161
WAIT #6: nam='db file scattered read' ela= 2605 file#=4 block#=85193 blocks=7 obj#=75747 tim=8925969901
WAIT #6: nam='db file scattered read' ela= 10636 file#=4 block#=85536 blocks=8 obj#=75747 tim=8925989726
WAIT #6: nam='db file scattered read' ela= 2188 file#=4 block#=85545 blocks=7 obj#=75747 tim=8925996890

SQL> select object_name from dba_objects where object_id = 75747;

OBJECT_NAME
------------------------------

BOWIE2_CODE_ID_I

In this case, the smaller alternative index is actually accessed. So it might not be the table or the index being rebuilt that gets accessed, but the smallest segment that contains the data of interest, which in this case is another index entirely.
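The behaviour observed across these offline rebuild demos can be summarised with a toy model: of all the segments that contain the indexed column, the rebuild appears to read the smallest one. This sketch simply encodes that observation using the block counts reported above; it is not a description of Oracle's internal logic, the function name is made up for illustration, and table BLOCKS versus index LEAF_BLOCKS are only roughly comparable measures of size:

```python
def pick_rebuild_source(candidates):
    """candidates: dict of segment name -> size in blocks.
    Returns the name of the smallest segment (observed behaviour, not Oracle internals)."""
    return min(candidates, key=candidates.get)

# Block counts as reported by dba_tables / dba_indexes in the demos
demo1 = {"BOWIE": 19277, "BOWIE_CODE_I": 1948}
demo2 = {"BOWIE2": 3520, "BOWIE2_CODE_I": 21726}
demo3 = dict(demo2, BOWIE2_CODE_ID_I=2642)   # after adding the (code, id) index

for demo in (demo1, demo2, demo3):
    print(pick_rebuild_source(demo))
```

The three results match the objects identified in the respective trace files: the index itself, then the table, then the alternative index.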

My final little demo brings me back to the subject of secondary indexes on Index Organized Tables (IOTs) I’ve been recently discussing. In this example, I create an IOT and a much smaller secondary index:

SQL> create table bowie3 (id number constraint bowie_pk primary key, code number, name1 varchar2(30), name2 varchar2(30), name3 varchar2(30), name4 varchar2(30), name5 varchar2 (30), name6 varchar2(30), name7 varchar2(30), name8 varchar2(30), name9 varchar2(30), name10 varchar2(30)) organization index;

Table created.

SQL> insert into bowie3 select rownum, mod(rownum, 100), 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE', 'DAVID BOWIE','DAVID BOWIE','DAVID BOWIE', 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie3_code_i on bowie3(code);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE3', estimate_percent=>null, cascade=> true);

PL/SQL procedure successfully completed.

SQL> select index_name, leaf_blocks from dba_indexes where table_name = 'BOWIE3';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE_PK                             16950
BOWIE3_CODE_I                         2782

So the secondary index is much smaller. However, if I rebuild it offline:

SQL> alter system flush buffer_cache;

System altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> alter index bowie3_code_i rebuild;

Index altered.

SQL> alter session set events '10046 trace name context off';

Session altered.

A look at the trace file reveals:

WAIT #5: nam='db file scattered read' ela= 13019 file#=4 block#=217856 blocks=4 obj#=75733 tim=8949436015
WAIT #5: nam='db file scattered read' ela= 1869 file#=4 block#=72915 blocks=5 obj#=75733 tim=8949438360
WAIT #5: nam='db file scattered read' ela= 3023 file#=4 block#=72920 blocks=8 obj#=75733 tim=8949442877
WAIT #5: nam='db file scattered read' ela= 2381 file#=4 block#=72929 blocks=7 obj#=75733 tim=8949448410
WAIT #5: nam='db file scattered read' ela= 2613 file#=4 block#=72936 blocks=8 obj#=75733 tim=8949453521

SQL> select object_name from dba_objects where object_id = 75733;

OBJECT_NAME
---------------------------

BOWIE_PK

In this case, we see that the much larger IOT PK segment is accessed and not the smaller secondary index. When rebuilding the secondary index of an IOT, Oracle has no choice but to access the parent IOT PK segment itself as of course the secondary index doesn’t contain all the necessary information required for the index rebuild operation. The physical guess component within the secondary index might be stale and the only way for Oracle to determine the correct current address of all the rows is to access the IOT PK segment. This is another disadvantage of secondary indexes associated with IOTs: even offline index rebuilds must access the potentially much larger IOT PK segment in order to ensure the correctness of the physical guess components of the logical rowids.

So the general answer of whether an index rebuild accesses the table or index is that it depends and that it could very well be neither of them …

IOT Secondary Indexes – The Logical ROWID Guess Component Part II (Move On) May 8, 2012

Posted by Richard Foote in Index Block Size, Index Organized Tables, IOT, ROWID, Secondary Indexes.
7 comments

Having mentioned a couple of dangers associated with IOT Secondary Indexes, thought I might discuss a couple of their nicer attributes.

In the previous post, we saw how 50-50 index block splits on the ALBUM_SALES_IOT IOT table caused rows to move to new leaf blocks, resulting in a degradation in the PCT_DIRECT_ACCESS value of the associated ALBUM_SALES_IOT_TOTAL_SALES_I secondary index, which in turn resulted in poorer performance when using this index. We had to rebuild the secondary index (or update block references) to make all the “guess” components accurate and the index efficient again and so point to the correct locations within the parent IOT.

So, if you have 50-50 block splits occurring in your IOT, this will degrade the efficiency of the associated IOT Secondary indexes over time.

However, if you don’t have 50-50 block splits and the entries in the IOT don’t move from leaf block to leaf block, then this will not be an issue. Remember of course that many Primary Key values are based on a sequence which monotonically increases and so results in 90-10 block splits rather than 50-50 block splits. 90-10 block splits don’t move data around: Oracle leaves the full blocks alone and simply adds a new block in the IOT Btree structure into which new values are added. Therefore, with IOT data not moving around, the “guess” components of the logical ROWIDs remain valid and don’t go stale over time, and so the associated secondary indexes remain nice and efficient.
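The difference between the two types of block split can be illustrated with a small simulation. This is a deliberately simplified model (a flat list of leaf blocks with a tiny, made-up capacity of 4 entries and no branch blocks), not how Oracle actually implements its Btree, but it shows why monotonically increasing keys leave existing entries where they are while randomly arriving keys relocate them:

```python
import bisect
import random

def simulate_inserts(keys, capacity=4):
    """Insert unique keys into a toy chain of sorted leaf blocks.
    Returns how many existing entries were relocated to another block by splits."""
    blocks = [[]]      # leaf blocks, kept in key order
    moved = 0
    for key in keys:
        lows = [b[0] for b in blocks[1:]]       # lowest key of every block after the first
        i = bisect.bisect_right(lows, key)      # block whose key range covers this key
        block = blocks[i]
        if len(block) < capacity:
            bisect.insort(block, key)
        elif i == len(blocks) - 1 and key > block[-1]:
            # 90-10 style split: new maximum value, the full block is left untouched
            blocks.append([key])
        else:
            # 50-50 style split: the upper half of the entries relocates to a new block
            half = capacity // 2
            upper = block[half:]
            del block[half:]
            moved += len(upper)
            blocks.insert(i + 1, upper)
            bisect.insort(upper if key >= upper[0] else block, key)
    return moved

monotonic_keys = list(range(1, 1001))        # sequence-style PK values
random_keys = monotonic_keys[:]
random.Random(42).shuffle(random_keys)       # same values, arriving in random order

moved_monotonic = simulate_inserts(monotonic_keys)
moved_random = simulate_inserts(random_keys)
print(moved_monotonic, moved_random)
```

In this model no entry ever moves with the monotonic keys, so any “guess” taken at insert time would stay accurate, whereas the randomly ordered keys relocate entries every time a full block in the middle of the structure has to split.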

If we look at the current state of the ALBUM_SALES_IOT_TOTAL_SALES_I secondary index:

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice the PCT_DIRECT_ACCESS is currently nice and efficient at 100%.

If we now add a bunch of new rows into the IOT, but this time with PK values that monotonically increase:

SQL> BEGIN
  2    FOR i IN 5001..10000 LOOP
  3      FOR c IN 201..300 LOOP
  4        INSERT INTO album_sales_iot VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Yet more new rows');
  5      END LOOP;
  6    END LOOP;
  7    COMMIT;
  8  END;
  9  /

PL/SQL procedure successfully completed.

And collect fresh statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'ALBUM_SALES_IOT', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice that the PCT_DIRECT_ACCESS value remains unchanged. So, no 50-50 block splits, no PCT_DIRECT_ACCESS degradation with regard to the secondary indexes.

OK, another nice feature with IOT Secondary Indexes.

With a “normal” Heap table, if we were to MOVE and reorganise the table, all associated indexes become invalid as the Move results in all the rows being relocated and the indexes are not maintained during this process (as this would add considerably to the overhead in the Move process). All associated indexes have to be rebuilt after the Move operation completes, which is both expensive and adds considerably to the availability issues associated with the whole table reorg process as the table is locked during the Move operation. In short, moving a heap table is an expensive and an availability unfriendly process.

As this little demo illustrates, moving a heap table results in all indexes becoming unusable:

SQL> create table radiohead (id number constraint radiohead_pk primary key, code number, name varchar2(30));

Table created.

SQL> create index code_i on radiohead(code);

Index created.

SQL> insert into radiohead select rownum, rownum, 'OK COMPUTER' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   VALID
CODE_I                         VALID

SQL> alter table radiohead move;

Table altered.

SQL> select index_name, status from dba_indexes where table_name = 'RADIOHEAD';

INDEX_NAME                     STATUS
------------------------------ --------
RADIOHEAD_PK                   UNUSABLE
CODE_I                         UNUSABLE

However, moving an IOT has a number of advantages over a heap table.

Firstly, as it’s an index structure, it can be reorganised and rebuilt in much the same way as we can rebuild any btree index. Remembering, an index can be rebuilt “online” (on Enterprise Edition), overcoming many of the locking issues associated with moving heap tables.

Additionally, although the physical locations of all the rows in the IOT change following a Move operation, the PK values themselves don’t change. Therefore, although the PCT_DIRECT_ACCESS value becomes 0, the indexes themselves are still Valid and usable as the PK component can still be used to access the relevant data.

So the syntax to move an IOT table can be expanded to be performed “Online” and all the secondary indexes will remain “Valid”:

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                  100

SQL> alter table album_sales_iot move online;

Table altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                    0
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0

So although the PCT_DIRECT_ACCESS values for the secondary indexes have gone down to 0, making them less efficient as a result, they do at least remain valid and usable by the CBO:

SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    68   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    68   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         83  consistent gets
         53  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed

The secondary index is still used by the CBO, although at 83 consistent gets in this example, it’s not as efficient as it could be.

The rebuild of the secondary index can be performed subsequently to repair the stale guesses and improve the efficiency of the index as desired:

SQL> alter index album_sales_iot_total_sales_i rebuild online;

Index altered.

SQL> select index_name, status, PCT_DIRECT_ACCESS from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     STATUS   PCT_DIRECT_ACCESS
------------------------------ -------- -----------------
ALBUM_SALES_IOT_PK             VALID                    0
ALBUM_SALES_IOT_TOTAL_SALES_I  VALID                  100
ALBUM_SALES_IOT_COUNTRY_ID_I   VALID                    0

SQL> select * from album_sales_iot where total_sales between 424242 and 424343;

26 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |    33 |   858 |    36   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |    33 |   858 |    36   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |    33 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
   2 - access("TOTAL_SALES">=424242 AND "TOTAL_SALES"<=424343)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         31  consistent gets
          0  physical reads
          0  redo size
       1655  bytes sent via SQL*Net to client
        534  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         26  rows processed

Following the rebuild of the secondary index and getting the PCT_DIRECT_ACCESS back to 100%, the example query is now more efficient, with a reduction of consistent gets down from 83 to just 31.

So IOTs can be less problematic to reorganise and if 90-10 block splits are performed, the impact on the secondary indexes is minimised.
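After a move like the one above, a data dictionary query along the following lines can list which IOT secondary indexes are carrying stale guesses. This is only a sketch: the cut-off of 100 is an arbitrary choice for illustration, and PCT_DIRECT_ACCESS only reflects the last time it was set (by gathering statistics or, as seen above, by the move itself).

```sql
-- Sketch: list secondary indexes on IOTs whose guesses are not all valid.
-- The cut-off of 100 is an illustrative assumption; pick whatever suits.
SELECT i.owner,
       i.table_name,
       i.index_name,
       i.pct_direct_access
FROM   dba_indexes i
       JOIN dba_tables t
         ON  t.owner      = i.table_owner
         AND t.table_name = i.table_name
WHERE  t.iot_type   = 'IOT'          -- only index organized tables
AND    i.index_type <> 'IOT - TOP'   -- skip the IOT's own PK structure
AND    i.pct_direct_access < 100
ORDER  BY i.pct_direct_access, i.index_name;
```

Any index listed is a candidate for a rebuild such as the one shown above.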

IOT Secondary Indexes – The Logical ROWID Guess Component Part I (Lucky) April 26, 2012

Posted by Richard Foote in Index Organized Tables, IOT, Oracle Indexes, Primary Key, ROWID, Secondary Indexes.

As discussed previously, an index entry within a Secondary Index on an Index Organized Table (IOT) basically consists of the indexed column(s) and the Logical Rowid, the latter being made up of the PK column(s) and a “guess” at the physical block in the IOT containing the corresponding row.

Let’s discuss this “guess” component in a bit more detail.

When the Secondary Index is created, this guess is spot on and will indeed point to the correct block within the IOT structure that contains the row being referenced by the indexed entry.

When I initially created the Secondary Index on the Total_Sales column, all the physical guesses were accurate and indeed pointed to the correct blocks within the IOT structure. This can be confirmed by the following query:

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

As we can see, the PCT_DIRECT_ACCESS value is 100, which means that 100% of the guess components are correct. Therefore, the index behaves in a manner very similar to an ordinary Secondary Index with a rowid, in that all the initial accesses to the IOT are valid and there’s no need to subsequently re-access the IOT via the PK component. From the perspective of finding the required row entries within the IOT structure, the Secondary Index is as efficient as possible when all the guesses are valid.

If we run a little query to access a number of rows via this Secondary Index:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   102 |  1836 |   105   (0)| 00:00:02 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   102 |  1836 |   105   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   102 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
       3657  bytes sent via SQL*Net to client
        590  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

Note we’re returning 99 rows, which requires 110 consistent gets: a touch over 1 consistent get per row accessed. Note these numbers, we’ll reference them again later …

OK, we’re now going to add some more rows to the table. This will result in 50-50 block splits occurring which will in turn cause a whole bunch of rows to move to new physical blocks within the IOT.

SQL> BEGIN
  2    FOR i IN 1..5000 LOOP
  3       FOR c IN 101..200 LOOP
  4          INSERT INTO album_sales_iot
  5          VALUES(i,c,ceil(dbms_random.value(1,5000000)), 'Some new rows');
  6       END LOOP;
  7    END LOOP;
  8    COMMIT;
  9  END;
 10  /

PL/SQL procedure successfully completed.
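
To check what kind of block splits a load like this actually causes, the session statistics can be compared before and after running the inserts. The following is a sketch rather than part of the original demo, and assumes the session can query the V$MYSTAT and V$STATNAME views:

```sql
-- Run once before and once after the inserts, then compare the values.
-- 'leaf node splits' counts all leaf block splits in this session, so the
-- number of 50-50 splits is that figure minus 'leaf node 90-10 splits'.
SELECT n.name,
       s.value
FROM   v$mystat s
       JOIN v$statname n
         ON n.statistic# = s.statistic#
WHERE  n.name IN ('leaf node splits', 'leaf node 90-10 splits')
ORDER  BY n.name;
```

Bear in mind these are session-wide counters, so splits in any secondary indexes maintained by the same inserts are included in the totals.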

If we now collect fresh statistics and look at the index statistics again:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'ALBUM_SALES_IOT', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                 58 NO

We notice that the PCT_DIRECT_ACCESS value has dropped significantly to just 58%. This means that only 58% of the guesses are now accurate and that in the other 42% of cases, Oracle is forced to re-access the IOT via the PK component stored in the Secondary Index. This means additional consistent gets are now likely to be required to access the IOT via the index, resulting in a less efficient index.

If we now re-run the original query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   376   (0)| 00:00:05 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   376   (0)| 00:00:05 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        463  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        184  rows processed

We can see that approximately double the number of rows are now returned (184, up from 99). However, the number of consistent gets has increased approximately fourfold (from 110 to 463). The index is now not as efficient in retrieving rows as it was previously, requiring some 2.5 consistent gets per row accessed.

To fix these guesses and make the index more efficient again, one can either ALTER the index with the REBUILD or the UPDATE BLOCK REFERENCES clause:

SQL> alter index album_sales_iot_total_sales_i UPDATE BLOCK REFERENCES;

Index altered.

If we now look at some fresh index statistics:

SQL> exec dbms_stats.gather_index_stats(ownname=> null, indname=> 'ALBUM_SALES_IOT_TOTAL_SALES_I', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT index_name, pct_direct_access, iot_redundant_pkey_elim
  2  FROM dba_indexes WHERE index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

INDEX_NAME                     PCT_DIRECT_ACCESS IOT
------------------------------ ----------------- ---
ALBUM_SALES_IOT_TOTAL_SALES_I                100 NO

We notice that the index now has the PCT_DIRECT_ACCESS back at a nice high 100%. If we re-run the same query again:

SQL> SELECT * FROM album_sales_iot
  2  WHERE total_sales BETWEEN 2742000 and 2743000;

184 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |   202 |  4646 |   206   (0)| 00:00:03 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |   202 |  4646 |   206   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |   202 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
   2 - access("TOTAL_SALES">=2742000 AND "TOTAL_SALES"<=2743000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        202  consistent gets
          0  physical reads
          0  redo size
       7144  bytes sent via SQL*Net to client
        656  bytes received via SQL*Net from client
         14  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)

The consistent gets have now reduced substantially from 463 to just 202, back at a touch over 1 consistent get per row retrieved …
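
The per-row figures quoted in this post come straight from dividing the consistent gets by the rows returned in each of the three runs. As a quick sanity check, using only the numbers from the autotrace output above:

```sql
-- Consistent gets per row for the three runs of the example query.
SELECT ROUND(110 / 99,  2) AS gets_per_row_initial,  -- guesses all valid, about 1.11
       ROUND(463 / 184, 2) AS gets_per_row_stale,    -- 58% valid guesses, about 2.52
       ROUND(202 / 184, 2) AS gets_per_row_fixed     -- after UPDATE BLOCK REFERENCES, about 1.1
FROM   dual;
```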

So, IOT Secondary Indexes can be as efficient as normal secondary indexes, but if the IOT is subject to 50-50 block splits, they’ll likely need to be maintained more regularly to ensure they stay nice and efficient. This is another significant disadvantage associated with IOTs that have Secondary Indexes …

There’s a bit more I’d like to say on the subject but I’ll leave it for a Part II :)

IOT Secondary Indexes: Primary Key Considerations (Beauty And The Beast) April 11, 2012

Posted by Richard Foote in Block Dumps, Index Organized Tables, IOT, Oracle Indexes, Primary Key, Secondary Indexes.

As discussed previously, one of the nice features of an IOT Secondary Index is that it contains the mandatory Primary Key of the IOT, which is always maintained and can be used to access the necessary rows of the IOT regardless of row movement within the IOT itself.

This can also be beneficial if only the PK columns of the table are required by the query (perhaps as part of a join) as a visit to the actual IOT table would be unnecessary.

However, one of the main disadvantages of an IOT Secondary Index is that it contains the PK of the IOT :)

Remember, one of the benefits of an IOT is that we don’t have to store columns twice as we would with a conventional Heap table, that being once within the table structure and again within the (often overloaded) PK index. However, with an IOT Secondary index, we must store the PK columns again. In fact, we have to store the PK columns again for as many IOT Secondary indexes as we have defined for the IOT.

So the overall additional overhead we’re talking about here is a product of two important factors: the actual overall size of the PK column(s) and the number of Secondary Indexes we have defined on the IOT. If the average size of the PK is large and/or we have a number of Secondary Indexes, then the overall overhead can be significant, reducing the benefits of the IOT.

If we look at the size of the IOT Secondary Index I created in my previous introductory post:

SQL> select leaf_blocks from dba_indexes where index_name = 'ALBUM_SALES_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1728

If however we compare this with a secondary index associated with a conventional heap table containing identical data:

SQL> create table not_an_iot as select * from album_sales_IOT;

Table created.

SQL> create index not_an_iot_total_sales_i on not_an_iot(total_sales);

Index created.

SQL> select leaf_blocks from dba_indexes where index_name = 'NOT_AN_IOT_TOTAL_SALES_I';

LEAF_BLOCKS
-----------
       1171

We notice that the IOT Secondary index is significantly larger, 1728 leaf blocks vs. 1171.

If we compare block dumps of the same index entry from both Secondary Indexes:

row#0[8016] flag: K-----, lock: 0, len=20
col 0; len 2; (2):  c1 06
col 1; len 3; (3):  c2 15 16
col 2; len 2; (2):  c1 5f
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 f1

Above is the IOT Secondary Index example, which is 20 bytes in length.

row#0[8024] flag: ------, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 01 68 7a 00 b4

Above is the Heap Table Secondary Index example, which is only 12 bytes in length.

The 8 bytes required for the table header and physical “guess” overheads within the IOT Secondary Index are almost cancelled out by the 7 bytes of overhead required for the ROWID column within the Heap Table Secondary index. However, most of the difference in length (20 bytes vs. 12 bytes) can be attributed to the 7 bytes required to store the PK columns and their associated length bytes in this example.
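
The two entry lengths can be reproduced with a little arithmetic. The sketch below assumes that each index entry carries 2 bytes for its flag and lock bytes and that each column costs 1 length byte plus its data; these are assumptions read off the dumps above rather than something the dumps state explicitly.

```sql
-- IOT secondary index entry:
--   2 (flag + lock) + indexed column (1 + 2)
--   + PK columns (1 + 3) and (1 + 2) + 8 (table header and guess)
-- Heap table index entry:
--   2 (flag + lock) + indexed column (1 + 2) + rowid (1 + 6)
SELECT 2 + (1 + 2) + (1 + 3) + (1 + 2) + 8 AS iot_secondary_len,   -- 20
       2 + (1 + 2) + (1 + 6)               AS heap_secondary_len   -- 12
FROM   dual;
```

The (1 + 3) + (1 + 2) term is the 7 bytes of PK overhead, which accounts for nearly all of the 8 byte difference.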

Obviously, the larger the PK, the greater the associated overheads. Likewise, the greater the number of IOT Secondary indexes, the greater the overall associated overheads.

If we create a secondary index on a column that forms part of the PK, Oracle can be a little bit cleverer. Below, we create an index on the COUNTRY_ID column, which is the second column of our PK (album_id, country_id):

SQL> create index album_sales_iot_country_id_i on album_sales_iot(country_id);

Index created.

We notice that for this new index, Oracle has eliminated “redundant” PK columns from the secondary index, as there’s no need to store the entire PK again as the indexed column itself already forms part of the PK:

SQL> select index_name, iot_redundant_pkey_elim from dba_indexes where table_name = 'ALBUM_SALES_IOT';

INDEX_NAME                     IOT_REDUNDANT_PKEY_ELIM
------------------------------ ------------------------
ALBUM_SALES_IOT_PK             NO
ALBUM_SALES_IOT_TOTAL_SALES_I  NO
ALBUM_SALES_IOT_COUNTRY_ID_I   YES

A quick look at a block dump of this secondary index will confirm that the PK portion of the index entry only contains the PK columns that are not included in the indexed column list (i.e. just the ALBUM_ID column):

row#0[8020] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#1[8004] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 03
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5
row#2[7988] flag: K-----, lock: 0, len=16
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 04
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 38 e5

So we have 3 index entries listed here. In each one:

col 0 represents the indexed column (COUNTRY_ID), which happens to be part of the PK

col 1 is the remaining PK column yet to be defined in the index entry (ALBUM_ID)

the second col 0 (with a length of 4) represents the physical “guess”.

So Oracle still has defined within the index entry the full PK to access the IOT as necessary if the “guess” proves to be wrong.
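
Which PK columns still have to be appended to a given secondary index can be worked out from the data dictionary. The following is a sketch only, assuming the table and index names from this demo and that the table and index live in the same schema:

```sql
-- PK columns of the IOT that are NOT already part of the secondary index,
-- i.e. the ones that still need to be stored in each index entry.
SELECT cc.column_name,
       cc.position
FROM   dba_constraints c
       JOIN dba_cons_columns cc
         ON  cc.owner           = c.owner
         AND cc.constraint_name = c.constraint_name
WHERE  c.table_name      = 'ALBUM_SALES_IOT'
AND    c.constraint_type = 'P'
AND    cc.column_name NOT IN (
         SELECT ic.column_name
         FROM   dba_ind_columns ic
         WHERE  ic.index_owner = c.owner
         AND    ic.index_name  = 'ALBUM_SALES_IOT_COUNTRY_ID_I')
ORDER  BY cc.position;
```

For the COUNTRY_ID index this should list just ALBUM_ID, whereas swapping in the TOTAL_SALES index name should list both PK columns.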

The key points to take from this post are to fully consider the consequences of a large PK on any secondary index defined on an IOT, and to fully consider the suitability of having the table defined as an IOT if you require many secondary indexes on the table.

More on this “guess” component in my next post …

Indexed Organized Tables – An Introduction to IOT Secondary Indexes (A Second Face) March 19, 2012

Posted by Richard Foote in Block Dumps, Index Internals, Index Organized Tables, IOT, Oracle Indexes, Secondary Indexes.

Man, it’s been ages since I had free time to update the blog, what with birthday parties to organise, Roger Waters concerts to attend and Radiohead concerts in the planning !! OK, time to take an initial look at Secondary Indexes for Index Organized Tables (IOTs).

If the IOT needs to be accessed via the Primary Key (PK) column(s), then no problem, the IOT structure must have a PK defined and the logical structure of the IOT ensures that data within the IOT is ordered based on the PK. Therefore, the IOT can be navigated like any conventional PK and the necessary data can be efficiently accessed.

But what if we want to access the data efficiently via Non-PK columns or without specifying the leading column of the PK ? Can we create secondary indexes on an IOT ?

When IOTs were first introduced way back in Oracle8, secondary indexes weren’t supported (they came later in 8i). That’s likely due to the fact Oracle had to resolve a tricky issue in relation to indexing an IOT structure, that being what to do when indexing rows that potentially move around all the time ?

With a conventional Heap table, once a row is inserted into the table, it doesn’t generally move subsequently. There are relatively few cases where this occurs, for example when updating the partition key column of a row such that it needs to be stored in another partition. This is recognised as a rather expensive thing to do, as not only do at least two blocks need to be accessed and modified, but the associated indexes need to be updated as well. As such, it generally requires explicitly allowing such activities to occur (by enabling row movement and the like). Note, when rows migrate to another block due to an increase in row size, indexes are not impacted and still reference the original block and the remaining stub of the row, which points to the new block/location of the row.

But with IOTs, the story can be very different. When a 50-50 index block split occurs, roughly half the rows in the leaf block move to a new block. An already relatively expensive operation would be even more expensive if Oracle also had to update the index entries of all secondary indexes that referenced all these moved rows. Although rare with Heap tables, rows moving to new locations could be relatively common in an IOT due to associated 50-50 block split operations.

To deal with the difficulties of frequently moving rows within an IOT, Oracle created the IOT Secondary Index structure. It has three main components:

  • The indexed column values
  • The PK columns of the associated IOT
  • A “guess” that points to the physical location of the rows within the IOT, initially at the time the index is created

So the IOT Secondary Index is used in the following fashion. During an index scan, Oracle attempts to use the “guess” to access the block that was the last known physical location of the row within the IOT. If it finds the required row in the IOT, great. The index performs in a similar manner to using a rowid with a conventional secondary index. However, if the required row is nowhere to be seen within the referenced block, Oracle tries again, this time using the PK value contained within the IOT Secondary Index to perform a Unique Scan of the IOT. This is a little more expensive to perform as it requires navigating down the branch structures of the IOT, but is at least guaranteed to find the row this time in its current location.

So in the best case scenario, the index performs similarly to a normal secondary index. In the worst case scenario where the row has moved, the index is forced to perform an additional Unique Scan of the IOT using the PK, but at least this has the potential to be much more efficient than a Fast Full Scan of the IOT in order to find the necessary row.

The key point to note here is that the secondary index is not updated when a block split on the parent IOT occurs. The “guess” via the physical pointer reference simply becomes stale and the PK, which is also stored within the secondary index, is used as a backup method of accessing the required row.

If we start with a traditionally simple little demo, let’s first create and populate an IOT:

SQL> CREATE TABLE album_sales_IOT(album_id number, country_id number, total_sales number, album_colour varchar2(20), CONSTRAINT album_sales_iot_pk PRIMARY KEY(album_id, country_id)) ORGANIZATION INDEX;

Table created.

SQL> begin
  2  for i in 1..5000 loop
  3    for c in 1..100 loop
  4      insert into album_sales_iot values (i, c, ceil(dbms_random.value(1,5000000)), 'GOLD');
  5    end loop;
  6  end loop;
  7  commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=> 'ALBUM_SALES_IOT', cascade=> true, estimate_percent=> null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

If we now run a query based on the non-PK TOTAL_SALES column:

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1789589470

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    |     1 |    18 |   425   (1)| 00:00:06 |
|*  1 |  INDEX FAST FULL SCAN| ALBUM_SALES_IOT_PK |     1 |    18 |   425   (1)| 00:00:06 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1586  consistent gets
          0  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We see that Oracle has no real choice (the PK is of no direct use here) but to perform an expensive INDEX FAST FULL SCAN, even though it correctly knows relatively few rows are to be retrieved.

If we create a secondary index on the IOT however:

SQL> create index album_sales_IOT_total_sales_i on album_sales_iot(total_sales);

Index created.

SQL> select * from album_sales_iot where total_sales = 2000;

  ALBUM_ID COUNTRY_ID TOTAL_SALES ALBUM_COLOUR
---------- ---------- ----------- --------------------
      1764         56        2000 GOLD

 
Execution Plan
----------------------------------------------------------
Plan hash value: 1433198708

---------------------------------------------------------------------------------------------------
| Id  | Operation         | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                               |     1 |    18 |     4   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| ALBUM_SALES_IOT_PK            |     1 |    18 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN| ALBUM_SALES_IOT_TOTAL_SALES_I |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("TOTAL_SALES"=2000)
   2 - access("TOTAL_SALES"=2000)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          5  physical reads
          0  redo size
        757  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

 
We notice that the index is used as expected and the number of consistent gets has dropped significantly. Notice also that the IOT is subsequently accessed not via Index ROWIDs but by an INDEX UNIQUE SCAN via the IOT PK. More on this later …

If we look at a partial block dump of an index entry within the IOT Secondary index:

row#0[8015] flag: K-----, lock: 0, len=21
col 0; len 3; (3):  c2 1f 28
col 1; len 3; (3):  c2 15 37
col 2; len 2; (2):  c1 1b
tl: 8 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 4]  01 01 41 da

col 0 represents the indexed value (TOTAL_SALES)

col 1 and col 2 represent the PK columns (ALBUM_ID and COUNTRY_ID)

Following the 3 byte table header overhead required for the “guess”, we have the second col 0, which represents the 4 byte “guess” to the last known physical location of the row.
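
The 4 byte guess can be decoded into a file and block number. The sketch below assumes the guess is stored as a standard relative data block address, which the DBMS_UTILITY package can split into its two parts; the hex string is simply the “01 01 41 da” bytes from the dump above written as one number.

```sql
-- Split the guess into its file and block components.
SELECT dbms_utility.data_block_address_file(
         TO_NUMBER('010141DA', 'XXXXXXXX'))  AS file_no,
       dbms_utility.data_block_address_block(
         TO_NUMBER('010141DA', 'XXXXXXXX'))  AS block_no
FROM   dual;

-- Optionally dump the guessed IOT block to the session trace file.
-- 4 and 82394 are what the query above should return for this guess;
-- this assumes the relative and absolute file numbers match, and the
-- command needs suitable privileges.
ALTER SYSTEM DUMP DATAFILE 4 BLOCK 82394;
```

If the row has not moved since the index was created, the dumped block should contain the IOT row with the matching PK values.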

Much more to follow shortly …
