jump to navigation

Indexing A Column With Just One Distinct Value (All The Madmen) August 10, 2011

Posted by Richard Foote in CBO, Oracle Indexes.
11 comments

When one thinks about a column that might benefit from being indexed, one generally considers columns with lots of different values so that the selectivity of the column is such that relatively few rows get selected, making the index appealing to the cost based optimizer.
 
There are of course many exceptions to this generalisation and there are times when an index is the perfect and most efficient method when selecting 100% of all data, even if it involves a normal index range scan reading every row out of the table.
 
There are also times when a column that has very few distinct values should be indexed, even with a standard B-tree index.
 
In this example, I’ll index a column that only has just the 1 single value and the CBO will choose to use the index gladly.
 
Just going to create my standard little table and create an index on the CODE column:
 

 
SQL> create table bowie (id number, code number, name varchar2(50));
 
Table created.
 
SQL> create index bowie_code_i on bowie(code);
 
Index created.

 

 

I’ll now load the table with 1 million rows but all the CODE column will remain unpopulated and consist only of NULLS:
 

 
SQL> insert into bowie select rownum, null, 'Ziggy Stardust and the Spiders From Mars' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

 

 

I’ll only populate one in every 10,000 rows that a CODE value of 42 (of course):
 

 
SQL> update bowie set code = 42 where mod(id,10000) = 0;
 
100 rows updated.
 
SQL> commit;
 
Commit complete.

 

 

Let’s collect accurate statistics, however note I’m not collecting histograms even though there are relatively few rows that have a CODE value of 42:
 

 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 

 

Indeed, the column only has the one value (42) which occurs relatively infrequently in my data, most rows however remain as NULL:
 

 
SQL> select code, count(*) from bowie group by code;
 
      CODE   COUNT(*)
---------- ----------
               999900
        42        100

 

 

Note that the index statistics clearly shows it only has the 1 distinct value. Remember, NULLS are not indexed by default in B-Tree indexes and as a result the index is tiny with just the one leaf block:
 

 
SQL> select blevel, leaf_blocks, distinct_keys from dba_indexes where index_name='BOWIE_CODE_I';
 
    BLEVEL LEAF_BLOCKS DISTINCT_KEYS
---------- ----------- -------------
         0           1             1

 

 

Note also that the column statistics clearly highlight there’s just the one distinct value, however it also records that there are many rows (999900) that are NULL:

 
 

 
SQL> select column_name, num_distinct, num_nulls from dba_tab_columns where table_name = 'BOWIE' and column_name = 'CODE';
 
COLUMN_NAME  NUM_DISTINCT  NUM_NULLS
------------ ------------ ----------
CODE                    1     999900

 

 

Therefore, Oracle with accurate statistics and without requiring any histograms has all the information it needs to know when selecting rows that contain the one and only distinct value of our CODE column, that it will only actually be selecting 100 rows out of the 1 million rows in the table.

 
 

 
SQL> select * from bowie where code = 42;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1602289932
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   100 |  4700 |   101   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   100 |  4700 |   101   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_CODE_I |   100 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("CODE"=42)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        102  consistent gets
          0  physical reads
          0  redo size
       1423  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 

 

The CBO has the rows estimate/cardinality spot on (100) and has decided that the index is indeed the most efficient access path to select the 100 rows of interest.

With flag and bollean type columns, it might be worth some consideration simply storing the one value when one value is relatively rare, such that the resultant index can take advantage of not having to store any of the corresponding NULL values.
 
Even columns with just the one distinct value can potentially benefit from being indexed …

Bitmap Indexes & Minimize Records_Per_Block (Little Wonder) July 19, 2011

Posted by Richard Foote in Bitmap Indexes, MINIMIZE RECORDS_PER_BLOCK, Oracle Indexes.
Tags:
6 comments

As mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids that determine the range of potential rows spanned by the corresponding bitmap sequence. However, Oracle has no way of determining how many rows might actually be in each specific data block and so must make an assumption that all blocks might hold the maximum number of rows that could potentially fit within a block and assign a bitmap bit accordingly. If a row doesn’t actually exist, then it’s simply a “phantom” row and is assigned a 0 to signify that it doesn’t contain the value of the index entry.
 
This maximum number of possible rows that could potentially fit in a block is called the “Hakan Factor” and is determined at the creation of the table based on the definition of the table (such as number of columns, type of columns, whether they’re nullable, etc.) and of course the block size. The smaller the possible size of the row, the more rows that could fit in a block and the more bits that need to be assigned by the Bitmap Index to cover all possible rowids within the rowid range within a Bitmap Index entry. As an example within an 8K block, taking into consideration block overheads, the maximum number of rows within a block that Oracle can potentially estimate can be as many as 736 rows.
 
These additional 0s that get assigned to cater for rows that might not actually exist, although compressed to some degree, still takes up some space within the index. This additional space can be very significant if:
 
- The difference between the minimum possible size of a row and the actual average size of a row is large (or to put it another way, if the difference between the estimated number of rows per blocks and the actual number of rows per block is large)
- The effective clustering of the indexed data is poor within the table as this will limit the effective compression of the additional 0 bits
 
To highlight how all this can make a significant difference to the size of a Bitmap Index, a simple demo as usual to illustrate.
 
First, I’m going to create a table that has a number of nullable VARCHAR2(100) fields, so they might contain up to 100 characters or perhaps no value at all. The potential size of a row might be tiny or it might be quite large.

 
SQL> create table muse (id number, code number, name1 varchar2(100), name2 varchar2(100), name3 varchar2(100), name4 varchar2(100), name5 varchar2(100), name6 varchar2(100), name7 varchar2(100), name8 varchar2(100), name9 varchar2(100), name10 varchar2(100));
 
Table created.

 
OK, time to populate the table. A couple of key points with the data I’m going to use.
 
Firstly, the CODE column is going to have 100 distinct values but these values will be evenly distributed throughout the entire table. So the clustering associated with this column will be terrible, as bad as it gets.
 
Secondly, although all the VARCHAR2(100) columns might not contain much data (or indeed any at all), in actual fact they’re going to be almost fully populated with data. So although the potential average size of a row could have been quite tiny, in actual fact all the rows are quite large. Although we could potentially have fitted many rows within our (8K) block, in actual fact we’re only going to be able to fit just 7 rows per block. There isn’t actually a single block within our table that contains more than 7 rows.

 
SQL> insert into muse select rownum, mod(rownum,100), 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia','Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select table_name, num_rows, blocks, avg_row_len from dba_tables where table_name='MUSE';
 
TABLE_NAME   NUM_ROWS     BLOCKS AVG_ROW_LEN
---------- ---------- ---------- -----------
MUSE          1000000     145549         998

 
 
Let’s now create a Bitmap Index on the CODE column. I’ll set the PCTFREE to 0 to build the smallest possible index structure:

 
SQL> create bitmap index muse_code_i on muse(code) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I';
 
INDEX_NAME  LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY   NUM_ROWS
----------- ----------- ----------------------- ----------
MUSE_CODE_I         400                       4        800

 
 
So the Bitmap Index currently consists of 400 leaf blocks.
 
As we now know this table has rows that on average are considerably larger than the minimum possible row size, the Bitmap Index has had to cater for the possible existence of many rows that don’t actually exist. Additionally, as the clustering of the indexed data is very poor, the Bitmap Index will not be able to effectively compress these additional 0 bits as much as it might, as there are bits set to 1 littered all over the place that will hamper the effective compression capabilities of the index (I’ve discuss the impact of the Clustering Factor on the effectiveness of Bitmap Index compression previously).
 
Therefore, it might well be beneficial to more accurately determine the number of rows that really exist within a block. We can change the Hakan Factor by altering the table with the MINIMIZE RECORDS_PER_BLOCK clause. Effectively this results in Oracle performing a full table scan, checking for the number of rows per block (a quick check of the nrow count in the block header suffices) and keeping track of the block that currently contains the most number of rows. The highest value of the nrow count within the table becomes the new Hakan Factor.
 
Let’s give it a go:

 
SQL> alter table muse minimize records_per_block;
alter table muse minimize records_per_block
*
ERROR at line 1:
ORA-28602: statement not permitted on tables containing bitmap indexes

 
Unfortunately, this statement is not permitted if there are already any Bitmap indexes assigned to the table as they have already been based on the current Hakan Factor. All current Bitmap Indexes assigned to the table must first be dropped.

 
SQL> drop index muse_code_i;
 
Index dropped.
 
SQL> alter table muse minimize records_per_block;
 
Table altered.

 
 
OK, so now Oracle has a much more accurate picture of the actual number of rows that exist within a block in this table. The new Hakan Factor is based on the maximum number of rows that actually currently exist within a block in the table (just 7 in this specific example), which is significantly less than was defined previously. Oracle ensures the integrity of the new Hakan Factor from here on in by now limiting the number of rows that can be inserted into blocks within the table to this new value, even if in the future additional rows could potentially have fitted within a block. Once the Hakan Factor is reached, the block is taken off the freelist or marked as full in an ASSM segment.
 
Now any Bitmap Index on this table only has to cater for a relatively small number of rows per block, vastly reducing the number of bits that need to be considered and stored.
 
This can significantly reduce the overall size of associated bitmap indexes:
 

 
SQL> create bitmap index muse_code_i on muse(code) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I';
 
INDEX_NAME  LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY   NUM_ROWS
----------- ----------- ----------------------- ----------
MUSE_CODE_I         150                       1        300

The new Bitmap Index is now only 150 leaf blocks in size, substantially smaller than the previous 400 leaf blocks.

Bitmap Indexes and Not Equal Part II (Sheep) July 7, 2011

Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.
11 comments

An excellent comment/question by mdinh made me realise my demos in Part I might be a little extreme in returning 0 rows and perhaps give the false impression that Not Equal conditions are only considered or applicable if no rows are returned. This is not the case and with the bitmap index now considered with Not Equal conditions, the choice of whether or not to actually use the index as usual comes down to the comparative costs associated with the available plans. 

So, I’ll expand on my demo a tab by introducing a new value for the FLAG column: 

SQL> update radiohead
  2  set flag = 1
  3  where rownum < 101;

 100 rows updated.

 SQL> commit;

 Commit complete.

 SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD',cascade=>true, estimate_percent=>null, method_opt=> 'FOR COLUMNS FLAG SIZE 5');

 PL/SQL procedure successfully completed.

 

 
OK, so now we have some 100 rows which have a value of FLAG which are not equal to 42, which are evenly distributed among all 5 CODE values. I’ve created a histogram however on the FLAG column as the 2 values (1 and 42) are not evenly distributed.

Let’s run the query now:

SQL> select * from radiohead where code = 1 and flag <> 42;
20 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2786215024

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    20 |   300 |     46  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |     46  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP MINUS              |                  |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I |       |       |            |          |
-------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CODE"=1)
   5 - access("FLAG"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

 

 
We notice a few key points. Firstly, as we have a histogram on the FLAG column and the data is perfectly evenly distributed among the CODE values, the CBO has got the estimated cardinality of 20 rows spot on. So all things being equal, we can have some confidence the CBO has done the right thing and selected the most efficient execution plan.

We also notice that the cost has now gone up considerably to 46 (up from 3) but it’s still significantly less than the cost of 761 associated with a Full Table Scan. Therefore, the CBO has still chosen the same execution plan with the two bitmap indexes returning the 20 rows, as it did when it returned none in the previous example.

 
In answer to another comment/question by SJ12345, regarding the use of unbounded predicates, if we now try the following:

SQL> select * from radiohead where code = 1 and flag > 42;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 2939001425

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |   150 |      6  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    10 |   150 |      6  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP AND                |                  |       |       |            |          |
|   4 |     BITMAP MERGE             |                  |       |       |            |          |
|*  5 |      BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("FLAG">42)
       filter("FLAG">42)
   6 - access("CODE"=1)

 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Yep, cheap with Bitmap Indexes says the CBO. Note the difference here though is that the CBO uses a BITMAP MERGE to first get all possible rowid values of FLAG that are > 42 and then uses a BITMAP AND operation in combination with the CODE Bitmap index to get all rowids that match from both Bitmap indexes. However, as it evaluates the Bitmap Index on the FLAG index first and there are no index entries with a value > 42, it doesn’t have to actually worry about the CODE condition as no rows can possibly be returned. Therefore a very tiny 2 consistent gets are all that are necessary. 

The following will looking for anything < than 42, remembering we now have 20 rows that meet this condition:

SQL> select * from radiohead where code = 1 and flag < 42;

20 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2939001425

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    20 |   300 |      8  (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |      8  (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                  |       |       |            |          |
|   3 |    BITMAP AND                |                  |       |       |            |          |
|   4 |     BITMAP MERGE             |                  |       |       |            |          |
|*  5 |      BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("FLAG"<42)
       filter("FLAG"<42)
   6 - access("CODE"=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

 
Yep, also cheap with Bitmap Indexes, using the same plan as the previous > than example but using more consistent gets as there are a number of rows that need to be accessed this time (although all in the same data block).

To now complete the picture:

SQL> select * from radiohead where code = 1 and (flag < 42 or flag > 42);

20 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 3720408756

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |    30 |   450 |   14   (0)| 00:00:01 |
|   1 |  CONCATENATION                |                  |       |       |           |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    10 |   150 |    6   (0)| 00:00:01 |
|   3 |    BITMAP CONVERSION TO ROWIDS|                  |       |       |           |          |
|   4 |     BITMAP AND                |                  |       |       |           |          |
|   5 |      BITMAP MERGE             |                  |       |       |           |          |
|*  6 |       BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |           |          |
|*  7 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |           |          |
|   8 |   TABLE ACCESS BY INDEX ROWID | RADIOHEAD        |    20 |   300 |    8   (0)| 00:00:01 |
|   9 |    BITMAP CONVERSION TO ROWIDS|                  |       |       |           |          |
|  10 |     BITMAP AND                |                  |       |       |           |          |
|  11 |      BITMAP MERGE             |                  |       |       |           |          |
|* 12 |       BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I |       |       |           |          |
|* 13 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |           |          |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("FLAG">42)
       filter("FLAG">42)
   7 - access("CODE"=1)
  12 - access("FLAG"<42)
       filter(LNNVL("FLAG">42) AND "FLAG"<42)
  13 - access("CODE"=1)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        775  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

We now get a combination of both previous plans, concatenated together. Note in this case, it’s actually a cheaper and more efficient alternative to the first Not Equal example. Having got the setup for this demo, you can of course create the same demo yourselves and have a play and experiment. Me, I’m now off to watch Cadel Evans win the Tour De France :)

Bitmap Indexes and Not Equal (Holy Holy) July 5, 2011

Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.
14 comments

Way back, I previously discussed how the CBO will simply ignore any possible indexes when determining the best execution plan involving a NOT EQUAL(<>) condition, even if an index might in theory provide the most efficient access path. Oracle just assumes that the vast majority of rows are likely to be returned and so doesn’t even bother to cost and consider any potential indexes. The previous discussion was aimed specifically at B-Tree indexes, but as a comment at the time by Christian Antognini highlighted, things are a little different for Bitmap indexes. Thought it might be worth exploring this point a little further.

To start and to recap, I’ll begin by creating a simple little table, populated with 1,000,000 rows. It has 2 columns of interest for now, one called CODE which has just 5 distinct values and another called FLAG that only has the 1 distinct value (a value of ’42′ naturally):

SQL> create table radiohead (code number not null, type number not null, flag number not null, name varchar2(30));

 Table created.

 SQL> insert into radiohead select mod(rownum,5)+1, mod(rownum,20)+1, 42, 'ZIGGY' from dual connect by level <= 1000000;

 1000000 rows created.

 SQL> commit;

 Commit complete. 

 I’ll begin by creating standard B-Tree indexes on these columns:

SQL> create index radiohead_code_i on radiohead(code);

 Index created.

 SQL> create index radiohead_type_i on radiohead(type);

 Index created.

 SQL> create index radiohead_flag_i on radiohead(flag);

 Index created.

 SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

 PL/SQL procedure successfully completed. 

If we run a query that returns all rows that don’t have a FLAG value of 42 (of which there are none):

SQL> select * from radiohead where flag <> 42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Note that although the CBO has estimated it will likely only return just the 1 row, it has opted to go for a Full Table Scan. A 10053 trace would show that the index on the FLAG column wasn’t even considered by the CBO. The use of the Not Equal (<>) condition has totally negated the use of the available index.

If we look at a query now on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

As there are only 5 evenly distributed values, the CBO has got the cardinality estimate spot on and has decided that visiting the table 200,000 times via the index is just too expensive and that the Full Table Scan is the more efficient method. Fair enough.

If we now run a query that looks for all values of a specific CODE but only if the FLAG is not 42 (which again is going to return 0 rows):


SQL> alter session set events '10053 trace name context forever';

Session altered.

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42 AND "CODE"=1)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Again, the Full Table Scan is the way to go says the CBO. The index on the FLAG column is not considered and the index on the CODE column is just too expensive. A 10053 trace confirms this:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: RADIOHEAD  Alias: RADIOHEAD
    #Rows: 1000000  #Blks:  2764  AvgRowLen:  15.00
Index Stats::
  Index: RADIOHEAD_CODE_I  Col#: 1
    LVLS: 2  #LB: 1950  #DK: 5  LB/K: 390.00  DB/K: 2755.00  CLUF: 13775.00
  Index: RADIOHEAD_FLAG_I  Col#: 3
    LVLS: 2  #LB: 1950  #DK: 1  LB/K: 1950.00  DB/K: 2755.00  CLUF: 2755.00
  Index: RADIOHEAD_TYPE_I  Col#: 2
    LVLS: 2  #LB: 1950  #DK: 20  LB/K: 97.00  DB/K: 2755.00  CLUF: 55100.00
Access path analysis for RADIOHEAD
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for RADIOHEAD[RADIOHEAD]

  Table: RADIOHEAD  Alias: RADIOHEAD
    Card: Original: 1000000.000000  Rounded: 1  Computed: 0.20  Non Adjusted: 0.20
  Access Path: TableScan
    Cost:  762.05  Resp: 762.05  Degree: 0
      Cost_io: 750.00  Cost_cpu: 259683730
      Resp_io: 750.00  Resp_cpu: 259683730
  Access Path: index (AllEqRange)
    Index: RADIOHEAD_CODE_I
    resc_io: 3147.00  resc_cpu: 114411172
    ix_sel: 0.200000  ix_sel_with_filters: 0.200000 
    Cost: 3152.31  Resp: 3152.31  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 762.05  Degree: 1  Resp: 762.05  Card: 0.20  Bytes: 0

***************************************

Note that the index on the FLAG column is not even mentioned within the possible execution plans and the index on the CODE column has a cost of 3152.31 which is way more than the Full Table Scan cost of 762. So the Full Table Scan is selected, even though no rows are returned and the CBO estimates that just 1 row is likely to be returned. OK, let’s now drop the B-Tree indexes and replace them with Bitmap indexes:


SQL> drop index radiohead_code_i;

Index dropped.

SQL> drop index radiohead_type_i;

Index dropped.

SQL> drop index radiohead_flag_i;

Index dropped.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> create bitmap index radiohead_type_i on radiohead(type);

Index created.

SQL> create bitmap index radiohead_flag_i on radiohead(flag);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 
If we now run the same query again on the FLAG column:

SQL> select * from radiohead where flag <> 42;

 no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

 ------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
---------------------------------------------------

    1 - filter("FLAG"<>42)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 

We notice that again the index is not chosen, even though the bitmap index stores references to those rowids where this condition is not true (a bitmap value of 0) and even though the CBO estimates only the 1 row is likely to be returned. To see why this is the case, let’s look at a partial bitmap index entry via a block dump of the bitmap index:

 
Block header dump:  0x01c01d1c
 Object id on Block? Y
 seg/obj: 0x13e38  csc: 0×00.1234e2a  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1c01d18 ver: 0×01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0000.000.00000000  0×00000000.0000.00  —-    0  fsc 0×0000.00000000
0×02   0xffff.000.00000000  0×00000000.0000.00  C—    0  scn 0×0000.01234e2a
Leaf block dump
===============
header address 214311524=0xcc62264
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0×28
kdxcofeo 960=0x3c0
kdxcoavs 920
kdxlespl 0
kdxlende 0
kdxlenxt 29367581=0x1c01d1d
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——, lock: 0, len=3537
col 0; len 2; (2):  c1 2b
col 1; len 6; (6):  01 40 2c 82 00 00
col 2; len 6; (6):  01 40 2c c4 00 7f
col 3; len 3516; (3516):
 cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff
 ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff ff ff ff
 ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff
 ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff
 ff ff ff ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf
 …

We notice the bitmap index entry consists of the indexed value (c1 2b), a start rowid (01 40 2c 82 00 00), an end rowid (01 40 2c c4 00 7f) and a bitmap string for which a bit corresponds to every row within the rowid range, set to either 1 (for true) or 0 (for false). The 0s are compressed and represented by a value based on the actual number of compressed bits.

However, if the bitmap entry only has a start and end rowid range, how does it actually know the location of all the corresponding rows, as there could be differing number of rows for any of the given data blocks. How does it know just how many rows actually exist within the rowid range ?

The answer is that it can’t possibly know. Therefore, Oracle makes a very important assumption and based on the definition of the table, determines the maximum number of rows that could potentially fit within a block and assigns a bit for every possible rowid that could in theory exist within the specified rowid range (I’ll expand on this point in my next post).

If the rowid actually corresponds to an existing row, then the bit is set accordingly depending on the value of the indexed column for that row. If the rowid doesn’t exist (or doesn’t exist yet), then the corresponding bit is simply set to a 0. If there are a whole bunch of consecutive 0s for rows that don’t exist, they get compressed and the overheads are minimised.

However, the value of a bit set to 0 can therefore potentially mean one of two things. It could mean that the row exists but doesn’t have the value represented by the index entry or it could mean that the row simply doesn’t exist at all. There is no way for Oracle to tell the difference between these two scenarios.

If one is after rows for which the column has a specific value, then no problem, all the bits with a value of 1 must correspond to rows that really do exist and have the column value of interest. However, if one is after all rows for which the column value is not the one represented by a bitmap index entry (as in a <> condition), then referencing all the bits that have a 0 won’t be sufficient as they could potentially point at rows that don’t actually exist and accessing a table looking up rows that don’t exist will open up a can of worms.

Therefore, just like a B-Tree index, the CBO will not consider a Bitmap index for a query that exclusively contains a not equal or not in condition.

If we now look at the second query based on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

We notice that the CBO again chooses the Full Table Scan as again, the query is returning 20% of all rows and deems it too expensive to visit the table 200,000 times to retrieve the data via the index, even if the Bitmap index structure itself is relatively small and efficient. So again, no difference to the B-Tree index example.

However, if we run the third query based on both the CODE column and the <> condition on the FLAG column:

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1712231689

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |     1 |    15 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | RADIOHEAD        |     1 |    15 |     3   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS |                  |       |       |            |          |
|   3 |    BITMAP MINUS               |                  |       |       |            |          |
|   4 |     BITMAP MINUS              |                  |       |       |            |          |
|*  5 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I |       |       |            |          |
|*  7 |     BITMAP INDEX SINGLE VALUE | RADIOHEAD_FLAG_I |       |       |            |          |
--------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("CODE"=1)
   6 - access("FLAG" IS NULL)
   7 - access("FLAG"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Now we see a distinct difference from the B-Tree index example as both Bitmap Indexes have been used rather than a Full Table Scan.

In conjunction with another index that returns rowids of interest that obviously exist, a bitmap index can be used successfully to determine a Not Equal condition. By logically subtracting all the matching rowids of one bitmap index (that contains rowids than aren’t of interest) from the other bitmap index (which contains rowids that are of interest), a list of actual rowids of interest can be determined to access the table. Note this can also potentially be performed by looking up the 0 bits, as corresponding rows do not have the indexed value and any matching rowids can be proven to exist by their appearance within the other Bitmap index.

As most of this processing only involves simple bit comparisons via accesses to relatively small, efficient Bitmap index structures, the relative overheads can be significantly reduced from that of the Full Table Scan (eg. in this example, consistent gets reduced from 2770 to just 47).

So a Not Equal/Not In can be serviced via a Bitmap Index, providing another index is also accessed that returns rowids of interest.

DEL_LF_ROWS Index Rebuild Criteria ? (Codex) May 22, 2011

Posted by Richard Foote in DEL_LF_ROWS, Indexing Myth, Oracle Indexes.
19 comments

In the same article and book where the erroneous claims regarding the BLKS_GETS_PER_ACCESS were made, we also find the following recommendation:
 
Another rebuild condition would be cases where deleted leaf nodes comprise more than 20% of the index nodes“.
 
This is a very common claim however no matter how often it might be stated, it’s fundamentally flawed. Note My Oracle Support previously made these claims as well, such as with Metalink note 122008.1, although this has now been totally revised with the recommendation withdrawn.
 
It’s a flawed index rebuild criteria for two very good reasons.

1) It assumes this deleted space is somehow “wasted” or “deadwood” and needs to be removed. However, in the majority of scenarios, it’s nothing more than free space which will be subsequently reused by later inserts into the index. Therefore, basing a rebuild criteria on the percentage of deleted space will likely include indexes that will not benefit from a rebuild.

2) It assumes the DEL_LF_ROWS index statistic somehow accounts for all the “deleted” space within an index. However this statistic can significantly under-estimate the potential “wasted” space within an index. Therefore basing a rebuild criteria on the percentage of  “currently marked” deleted space can possibly miss indexes that might actually benefit from a rebuild (or coalesce).
 
Any criteria which selects indexes that don’t need to be rebuilt but misses out on those that do, is as I say, fundamentally flawed. Basing a rebuild criteria on the percentage of currently marked deleted index entries suggests a lack of understanding of both how Oracle indexes work and the actual meaning behind DEL_LF_BLKS.
 
This of course has been discussed many times before, but a quick demo to illustrate. First, I’m going to create a table with 10 rows with a unique ID column.

SQL> create table bowie (id number, code number, name varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, rownum, 'ZIGGY STARDUST' from dual connect by level <=10;
 
10 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.

 

If we now delete four of these rows:


SQL> delete bowie where id in (2,4,6,8);
 
4 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index bowie_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_delete from index_stats;
 
LF_ROWS DEL_LF_ROWS PCT_DELETE
------- ----------- ----------
     10           4         40

 
Indeed, we currently have 4 index entries marked as deleted. If we look at a dump of this index block:
 

Block header dump:  0x01c0521b
 Object id on Block? Y
 seg/obj: 0×13039  csc: 0×00.e30e53  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1c05218 ver: 0×01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0000.000.00000000  0×00000000.0000.00  —-    0  fsc 0×0000.00000000
0×02   0×0005.017.00004219  0x00c002b4.0a36.07  –U-    4  fsc 0×0038.00e30e79
Leaf block dump
===============
header address 220340836=0xd222264
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 10
kdxcofbo 56=0×38
kdxcofeo 7916=0x1eec
kdxcoavs 7860
kdxlespl 0
kdxlende 4
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c0 52 13 00 00
row#1[8012] flag: —D–, lock: 2, len=12
col 0; len 2; (2):  c1 03
col 1; len 6; (6):  01 c0 52 13 00 01
row#2[8000] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  01 c0 52 13 00 02
row#3[7988] flag: —D–, lock: 2, len=12
col 0; len 2; (2):  c1 05
col 1; len 6; (6):  01 c0 52 13 00 03
row#4[7976] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 c0 52 13 00 04
row#5[7964] flag: —D–, lock: 2, len=12
col 0; len 2; (2):  c1 07
col 1; len 6; (6):  01 c0 52 13 00 05
row#6[7952] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 08
col 1; len 6; (6):  01 c0 52 13 00 06
row#7[7940] flag: —D–, lock: 2, len=12
col 0; len 2; (2):  c1 09
col 1; len 6; (6):  01 c0 52 13 00 07
row#8[7928] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 0a
col 1; len 6; (6):  01 c0 52 13 00 08
row#9[7916] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 0b
col 1; len 6; (6):  01 c0 52 13 00 09
—– end of leaf block dump —–
 

We note there are currently 4 rows with the delete flag set (eg. row#1[8012] flag: —D–). In order to make the transaction more efficient, Oracle doesn’t physically remove the deleted index entries but marks them as logically deleted. However, any subsequent DML within the index block will result in all the deleted entries being physically removed from the leaf block.

For example, here we insert a new index entry with a value of 42. Note this value didn’t exist previously and is outside all the current values within the index leaf block:

  
SQL> insert into bowie values (42, 1, 'MAJOR TOM');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index bowie_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_delete from index_stats;
 
   LF_ROWS DEL_LF_ROWS PCT_DELETE
---------- ----------- ----------
         7           0          0

 

Note that this has resulted in the removal of the previously logically deleted index entries. There are currently no deleted index entries within the leaf block. A block dump now clearly shows the deleted index entries are now all gone:

 
 
Block header dump:  0x01c0521b
 Object id on Block? Y
 seg/obj: 0×13039  csc: 0×00.e31029  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1c05218 ver: 0×01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0000.000.00000000  0×00000000.0000.00  —-    0  fsc 0×0000.00000000
0×02   0×0005.012.00004213  0x00c002b4.0a36.31  –U-    1  fsc 0×0000.00e3102b
Leaf block dump
===============
header address 220340836=0xd222264
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 7
kdxcofbo 50=0×32
kdxcofeo 7904=0x1ee0
kdxcoavs 7902
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c0 52 13 00 00
row#1[8000] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 04
col 1; len 6; (6):  01 c0 52 13 00 02
row#2[7976] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 06
col 1; len 6; (6):  01 c0 52 13 00 04
row#3[7952] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 08
col 1; len 6; (6):  01 c0 52 13 00 06
row#4[7928] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 0a
col 1; len 6; (6):  01 c0 52 13 00 08
row#5[7916] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 0b
col 1; len 6; (6):  01 c0 52 13 00 09
row#6[7904] flag: ——, lock: 2, len=12
col 0; len 2; (2):  c1 2b
col 1; len 6; (6):  01 c0 52 15 00 00
—– end of leaf block dump —–
 

So deleted space in most scenarios is nothing more than free space that can be cleaned out and reused by subsequent DML operations.
 
Lets look at another example, this time on a larger table with 1M rows with two indexes. The first index is effectively unique while the second index has 100 evenly distributed values (and so 10000 occurrences of each value):
 

SQL> create table radiohead (id number, code number, name varchar2(30));
 
Table created.
 
SQL> insert into radiohead select rownum, mod(rownum, 100), 'OK COMPUTER' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index radiohead_code_i on radiohead(code);
 
Index created.
 
SQL> create index radiohead_id_i on radiohead(id);
 
Index created.

 

If we now delete 40% of the table, those with an ID up to 400000, this has 2 different effects on our indexes. On the ID index, it will result in the first 40% of leaf blocks containing nothing but deleted index entries. On the CODE index, it will result in all the leaf blocks containing approximately 40% of deleted index entries. Note this delete is being performed within a single very large logical transaction, not via lots of small discrete transactions, so that none the deleted entries are cleaned out. Note this also requires a sufficiently large buffer cache to cache all this data to prevent read operations (including the validate structure command itself) from subsequently cleaning out the deleted index entries as well via subsequent block cleanout operations.
 

SQL> delete radiohead where id between 1 and 400000;
 
400000 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index radiohead_code_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
 1000000      400000          40
 
SQL> analyze index radiohead_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
 1000000      400000          40

 

So both have a “large” percentage of deleted index entries, although the distribution of such deleted entries differs between the two indexes. A rebuild criteria simply based on the percentage of deleted entries would suggest both indexes need a rebuild. However, if we were to re-insert a similar amount of data again (note, even with a new range of values for the ID column):
 

SQL> insert into radiohead select rownum+1000000, mod(rownum,100), 'THE KING OF LIMBS' from dual connect by level <= 400000;
 
400000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index radiohead_code_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
 1000002           2       .0002

SQL> analyze index radiohead_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
 1007633        7633  .757517866
 

We notice that the vast majority of the deleted entries have automatically been cleaned out, without an index rebuild in sight. In the case of the ID index with 40% of index leaf blocks containing nothing but deleted index entries, even though a different range of values were inserted, the effectively “empty” leaf blocks were recycled within the index, removing all the associated deleted index entries in the process.
 
In the case of the CODE index with all the leaf blocks containing approximately 40% of deleted index entries, subsequent inserts into these leaf blocks cleaned out all the deleted index entries and reused the subsequently freed space.
  
Once you begin to understand how Oracle reuses deleted index space and how the free space related to previously deleted entries may not actually be reflected in the DEL_LF_ROWS statistic anyways, one begins to understand why basing an index rebuild criteria on the so-called ratio of deleted index entries is so flawed.

In fact, if you want to avoid some nonsensical  index rebuild criteria based on DEL_LF_ROWS, all you need to do is simply delete yet more rows, as so wonderfully here explained by Jonathan Lewis.

To illustrate, we create a table and index and populate it with 100000 rows:

SQL> create table pink_floyd (id number, code number, name varchar2(30));
 
Table created.
 
SQL> insert into pink_floyd select rownum, mod(rownum, 100), 'WISH YOU WERE HERE' from dual connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index pink_floyd_id_i on pink_floyd(id);
 
Index created.
 
SQL> analyze index pink_floyd_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
  100000           0           0

 

So far, so good. Let’s now deleted 40% of the rows throughout the table:


SQL> delete pink_floyd where mod(id,100) between 20 and 29 or mod(id, 100) between 40 and 49 or mod(id, 100) between 60 and 69 or mod(id, 100) between 80 and 89;
 
40000 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index pink_floyd_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
  100000       40000          40
 

Oh dear, we have an index with 40% of the rows deleted. However, rather than rebuild the index, let’s simply delete a few more rows …


SQL> delete pink_floyd where mod(id, 100) = 1;
 
1000 rows deleted.
 
SQL> commit;
 
Commit complete.
 
SQL> analyze index pink_floyd_id_i validate structure;
 
Index analyzed.
 
SQL> select lf_rows, del_lf_rows, del_lf_rows/lf_rows*100 pct_deleted from index_stats;
 
 LF_ROWS DEL_LF_ROWS PCT_DELETED
-------- ----------- -----------
   60000        1000  1.66666667

What do you know, now the ratio of deleted rows is just 1.67%, no need to rebuild it now !!

If you have a rebuild criteria which selects indexes that don’t actually need to be rebuilt and it misses out on those that potentially could benefit from a rebuild, while at the same time makes indexes look less in need of a rebuild the more you actually potentially fragment the index with further deletions, well as I said at the start, such a rebuild criteria is fundamentally flawed …

BLKS_GETS_PER_ACCESS Index Rebuild Criteria ? (Twisted Logic) April 20, 2011

Posted by Richard Foote in Oracle Indexes, Validate Structure, Indexing Myth.
17 comments

A recent question on the database OTN forum and a previous request by Charles Hooper that I cover some basic indexing concepts for newbie’s who might be confused by “dubious” information out there in internet land has prompted me to discuss the BLKS_GETS_PER_ACCESS metric, available in INDEX_STATS after an analyze validate structure operation.
 
The OTN thread had someone questioning why after freshly rebuilding an index, the index still had a particularly “high” BLKS_GETS_PER_ACCESS value of 110 and wouldn’t reduce down below a value of 5. They requested if someone could please explain why the BLKS_GETS_PER_ACCESS was not getting reduced, as they wanted it to be below 5.
 
The two obvious questions I had in return were why of earth would anyone want the BLKS_GETS_PER_ACCESS to be below 5 and why on earth would they think rebuilding the index would reduce the BLKS_GETS_PER_ACCESS from a value of 110 down to 5.
 
However a quick Google search confirmed my suspicions, subsequently confirmed by the OP. This “Identifying Which Indexes to Rebuild” article by Don Burleson states:
 
Gets per index access
 
The number of “gets” per access refers to the amount of logical I/O that is required to fetch a row with the index. As you may know, a logical “get” is not necessarily a physical I/O since much of the index may reside in the Oracle buffer cache. However, any SAP index with a number greater than 10 would probably benefit from an index rebuild.”
 
and
 
We might want to rebuild an index if the “block gets” per access is greater than five, since excessive “blocks gets” indicate a fragmented b-tree structure.”
 
The same claims are made on page 727 in “Oracle Tuning: The Definitive Reference” .
 
The problem with all this of course is that it’s a complete nonsense. The very idea of having a rebuild criteria based on BLKS_GETS_PER_ACCESS being greater than value “X” and that a subsequent index rebuild will reduce BLKS_GETS_PER_ACCESS down to less than value “X” shows a complete lack of understanding of what BLKS_GETS_PER_ACCESS actually represents.
 
BLKS_GETS_PER_ACCESS is basically the number of blocks required to get a randomly chosen row of interest via an index.
 
The Oracle Reference Manual describes BLKS_GETS_PER_ACCESS somewhat ambiguously as:

Expected number of consistent mode block reads per row, assuming that a randomly chosen row is accessed using the index. Used to calculate the number of consistent reads that will occur during an index scan.”
 
The key point here is that it’s a “randomly” chosen row when accessed via the specific index.
 
Now in the case of a Unique index, the number of blocks needed to access a random row can easily be determined as simply being the height of the index plus one additional block to access the associated table block. If the index is Unique, there can only be one row per index value which requires precisely one visit to the table.
 
In the following example, we create a table with 1M rows with two indexes. The index on the ID column is effectively unique as there are 1M distinct values while the index on the CODE column is Non-Unique with only 100 distinct values and so with 10000 occurrences of each indexed value.

  
SQL> create table bowie (id number, code number, name varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, mod(rownum,100), 'ZIGGY STARDUST' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create index bowie_code_i on bowie(code);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, num_rows, distinct_keys from dba_indexes where table_name='BOWIE';
 
INDEX_NAME     NUM_ROWS DISTINCT_KEYS
------------ ---------- -------------
BOWIE_CODE_I    1000000           100
BOWIE_ID_I      1000000       1000000

 
 

If we now collect INDEX_STATS on the effectively Unique index:

  
SQL> analyze index bowie_id_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000       1000000            1                    4
 

 
 

We can see that the BLKS_GETS_PER_ACCESS is 4 as expected. 3 due to the index height and additional 1 to access the specific row in the associated table block.
 
The formula Oracle uses here is basically:

HEIGHT + (LF_ROWS/DISTINCT_KEYS + 1)/2 = HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (1 + 1)/2 = 3 + 2/2 = 3 + 1 = 4.
 
So basically to calculate the BLKS_GETS_PER_ACCESS, we simply take the ROWS_PER_KEY, add 1 to it, then divide the total by 2 and finally add the index height to get the final value. The reason for this exact formula makes more sense when we look at a Non-Unique index.
 
How do we cost the access to a specific row via a non-unique index when there could be multiple occurrences of the specific index value ?
If there are say 100 occurrences of an indexed value, if we want the first of these within the index, then we need to access the same blocks as per the unique index (index height plus 1). However, if we want the last of these 100 occurrences referenced within the index, then we might need to access index height + the 100 blocks until we reach the last occurrence of the indexed value. If we’re simply interested in an “average” row, then on average we might need to access 1/2 the ROWS_PER_KEY in addition to the index height.

So the formula is now basically HEIGHT + ROWS_PER_KEY/2. But as this is only an average guesstimate to begin with and so as we don’t ruin the perfect value we can derive for Unique Indexes, the formula is adjusted a tad by adding 1 to the ROWS_PER_KEY/2 figure so that the result makes sense and is accurate for Unique indexes as well.
 
Hence the final formula of HEIGHT + (ROWS_PER_KEY + 1)/2.
 
If we now look at the index on the CODE column, which has only 100 distinct values (and so 10000 occurrences per distinct value):

 

SQL> analyze index bowie_code_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000           100        10000               5003.5

 
 

HEIGHT + (ROWS_PER_KEY + 1)/2 = 3 + (10000 + 1)/2 = 3 + 5000.5 = 5003.5.
 
We can see how this 5003.5 figure is actually derived.
 
The actual blocks needed to be accessed when using the index is of course also very highly dependant on the Clustering Factor (CF) of the index but the CF is not calculated as part of Index_stats. The BLKS_GETS_PER_KEY can therefore be viewed as being a guesstimate of the number of blocks required to read a specific indexed value, based on an “average” CF. As the maximum CF is LF_ROWS, an “average” CF can therefore be viewed as simply being 1/2 of LF_ROWS.
 
In the above example, an “average” CF would therefore be 1000000/2 = 500000.
 
To access all table blocks for a specific indexed value would therefore basically be:

500000/distinct keys = 500000/100 = 5000.

If we then add the index height = 5000 + 3 = 5003.
 
5003 is almost identical to 5003.5, the 0.5 difference simply attributed to the additional 1 that’s added in the previous formula.
 
So BLKS_GETS_PER_ACCESS can effectively be viewed as being the either the number of blocks required to access a specific row “on average” within the table or the number of blocks required to read all occurrences of a specific index value IF the index had an average CF.
 
Note that both definitions are nothing but wild guesstimates of the blocks that might actually need to be accessed as both are making very broad assumptions in that the CF is indeed “average”, that the accessed row of interest is indeed “somewhere near the middle” of the index range scan, that the data is indeed evenly distributed, etc. etc. The actual blocks that might need to be accessed when using the index could therefore be significantly different if these basic assumptions are not correct.
 
Now here come a few interesting points.
 
Firstly, note the formula used only takes into consideration the index height and the average expected accesses to the table. The possible accesses to additional index leaf blocks is not considered at all.
 
Why ?
 
Likely because the vast majority of accesses involving an index range scan actually involves the block reads associated with accessing the table, not reading the index itself. As the figure is only a very rough estimate to begin with, it’s somewhat pointless adding a little here or there for the trivial additional leaf blocks that might need to be accessed during a scan . So in order to keep things simple, only the index height is considered in the actual BLKS_GETS_PER_ACCESS costings.
 
Therefore, an index rebuild is likewise going to have a trivial impact on the derived BLKS_GETS_PER_ACCESS as only the index height is considered in its calculation. In the vast majority of cases, rebuilding an index will have absolutely no impact at all as most index heights are not impacted by an index rebuild. In extremely rare occasions, an index might reduce its height but then the final BLKS_GETS_PER_INDEX is only going to reduced by 1. The absolute maximum amount that an index rebuild can impact the BLKS_GETS_PER_ACCESS is just HEIGHT-1.
 
In the above example, the BLKS_GETS_PER_ACCESS is 5003.5, substantially greater than 5 or 10 or even 42. However, rebuilding the index:

  
SQL> alter index bowie_code_i rebuild;
 
Index altered.
 
SQL> analyze index bowie_code_i validate structure;
 
Index analyzed.
 
SQL> select height, lf_rows, distinct_keys, rows_per_key, blks_gets_per_access from index_stats;
 
HEIGHT    LF_ROWS DISTINCT_KEYS ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------ ---------- ------------- ------------ --------------------
     3    1000000           100        10000               5003.5

 

  
makes precisely no difference at all.
 
So what do we do, just keep rebuilding this same index week after week after week as it continually meets the ill-considered index rebuild guidelines …
 
Note also that the actual value of BLKS_GETS_PER_ACCESS could be anything, as it’s primarily based on the number of rows per keys. For a very very large table on an index with relatively few distinct values, this figure could likewise be “very large”, even with the most perfectly compact, defragmented index.
 
Therefore, having an index rebuild criteria based if some nominal value of BLKS_GETS_PER_ACCESS “is excessive” (be it 5 or 10 or 42 or 5000 or whatever) is simply nonsensical as this value has practically nothing to do with the efficiency of an index but is directly proportional to the average number of rows per index key.

Additionally, suggesting an index rebuild will have a significant impact on BLKS_GETS_PER_ACCESS is likewise nonsensical as the only cost directly attributed to the index itself is the index height, which very rarely changes following an index rebuild. In fact, the BLKS_GETS_PER_ACCESS formula specifically negates the impact of any potential index rebuilds by implicitly excluding index block leafs in its calculations.
 
In short, basing an index rebuild criteria on the value of BLKS_GETS_PER_ACCESS is totally ludicrous. It’s actually similar to the silly myth that one should rebuild an index if the clustering factor is “too large”, where in actual fact there is no such threshold value and an index rebuild doesn’t impact the subsequent CF anyways.
 
I hate to think how many indexes have been repeatedly rebuilt unnecessarily based on a rebuild criteria where BLKS_GETS_PER_ACCESS is greater than value “X”, when such an index rebuild has made absolutely no difference to the original rebuild criteria :(

Oracle11g: Analyze Table Validate Structure Cascade “FAST” (Slow Burn) March 31, 2011

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes, Validate Structure.
6 comments

I always take notice when Oracle introduces a new “FAST” option, so it was with some excitement when I first noticed in Oracle 11g Rel 1 there was a new FAST option when running the ANALYZE TABLE CASCADE VALIDATE STRUCTURE command.
 
This was described in the manuals as introducing a hashing scheme that was significantly faster than the traditional cascade validation method.
 
However, when I tested this new feature, the results were rather disappointing to say the least (I’ve tested this on various 11g versions, both R1 and R2 and on various platforms, AIX and Windows). In the example below, the PERSONS table is a rather large table that has a number of associated indexes:

 
SQL> set timing on
SQL> analyze table persons validate structure cascade;
 
Table analyzed.
Elapsed: 00:06:01.84
 
SQL> analyze table persons validate structure cascade fast;
 
Table analyzed.
Elapsed: 00:15:20.27
 
SQL> analyze table persons validate structure cascade;
 
Table analyzed.
Elapsed: 00:02:22.27
 
SQL> analyze table persons validate structure cascade fast;
 
Table analyzed.
Elapsed: 00:15:46.28
 
SQL> analyze table persons validate structure cascade;
 
Table analyzed.
Elapsed: 00:02:23.78
 
SQL> analyze table persons validate structure cascade fast;
 
Table analyzed.
Elapsed: 00:14:58.00

 
When using the so-called “FAST” option, the performance was consistently significantly slower, not faster, when compared to using the default method. Perhaps the default option is “FASTER-STILL” ? (the default is actually “COMPLETE”). Additionally, the results of subsequent executions of the default method often resulted in improved times (eg: elapsed times reduced from 6 mins to around 2.5 mins), whereas the FAST option resulted in  uniformly slower elapsed times.
 
I thought this warranted further investigation and so decided to trace what was actually going on behind the scenes and see where the extra time was being spent.
 
With the Complete default method, a significant number of I/Os were being performing as Oracle had to effectively perform an Index Full Scan on each index, reading the table primarily using single block reads. However, CPU related overheads were relatively low, with most of the elapsed times attributed to the related I/Os. Following is an extract from a formatted trace file:
  

 
analyze table promis.persons validate structure cascade
 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.11         17        915          0           0
Execute      1    145.62     352.22     134061   58166955          3           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    145.63     352.33     134078   58167870          3           0

 

The total results for the trace were as follows:
 

 
 
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.11         17        915          0           0
Execute      1    145.62     352.22     134061   58166955          3           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    145.63     352.33     134078   58167870          3           0
 

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       87      0.03       0.02          0          0          0           0
Execute    243      0.04       0.07          0          0          0         100
Fetch      577      0.10       0.31        110       1114          0         766
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      907      0.17       0.41        110       1114          0         866

 
  

We’ll compare these results later but for now note that elapsed times were 352 seconds in total, with the CPU only contributing about 165 seconds of that time.

If we now look behind the scenes of a “FAST” VALIDATE STRUCTURE CASCADE, we notice that statements such as the following are now recursively executed:
 

 
select /*+ full(PROMIS.PERSONS) */ ORA_HASH(ID || rowid)
from
 PROMIS.PERSONS MINUS select /*+ index_ffs(PROMIS.PER_PK) */ ORA_HASH(ID ||
  rowid) from PROMIS.PERSONS
 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     35.23      37.83       7055      10402          6           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     35.23      37.83       7055      10402          6           0

  
 
We notice that Oracle now uses a new method, basically performing a Full Table Scan of the table and processing the indexed column and associated rowid through the ORA_HASH function which in turn returns a computed hash value. The full table scan is a very efficient method of reading all related values from a table but putting each column value and rowid through the ORA-HASH function is a relatively expensive CPU intensive operation. Oracle then does the same for the index by performing an Index Fast Full Scan of the index and processing again the index column value and associated rowid through the ORA-HASH function. Again, this is a very efficient method of reading all values from an index but again relatively CPU intensive putting every indexed value and associated rowid through the ORA-HASH function.
 
Having performed this for both table and index, in theory if the data structures are indeed valid and non-corrupt, they should both return exactly the same results. So by performing a MINUS of both row sets, if no data is returned, we can be sure indeed all is well with both table/index. If however differences are detected, then there’s some disparity between the data in the index and in the table and hence something is wrong. We’ll have no idea exactly what the differences might be as we only have the resultant hash values to go by, but the analyze validate structure command can at least raise an error and say something is indeed wrong when validating the table and its associated indexes.
 
We would actually now need to perform an Analyze Validate Structure again (without the FAST option this time) to determine the exact cause of the issue, but assuming detecting an error is a very rare event, it’s not an additional step we would ordinarily need to perform. So if by reading both table and index via multi-block Full Scans and processing the index data via the ORA-HASH function is ”faster” and more efficient to determine nothing is actually wrong, then it’s a reasonable strategy to take.
 
This processing is then repeated for every index associated with the table we’re analyzing. So in theory, because we’re only performing Full Table and Fast Full Index Scans, we end up performing far fewer multi-block I/Os calls and so can complete this whole Validate Structure processing “Faster”.
 
However, Oracle needs to perform the ORA-HASH function operation for every column in every index, both within the table and associated indexes. So although we end up doing less I/Os, we end up burning much more CPU due to having to call the ORA-HASH function so often. If we look at the total resources when performing the “FAST” validate structure on exactly the same table:

 
 
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          1        145          0           0
Execute      1      0.03       0.02     190294    3086489          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.04       0.03     190295    3086634          2           0
 
 
 
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       49      0.06       0.07          0          0          0           0
Execute     49      0.00       0.00          0          0          0           2
Fetch       46    830.63     878.62     190295    3086514        138          21
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      144    830.69     878.69     190295    3086514        138          23

 
 
We notice the query values are significantly reduced (down to just 3M from 58M). This of course is a good thing as this translates to reduced I/O related overheads.
 
However, if we look at the corresponding CPU related costs, they have increased significantly (up to 830 seconds from just 146 seconds). It’s really expensive putting all the indexed column values through the ORA-HASH function. Additionally, because multi-block Full Scans which aren’t generally cached are being performed during the Fast option, the results remain consistently poor with subsequent executions.

The net result is that the final elapsed times are consistently greater with the FAST option than without (878 seconds up from 352 seconds). Using the FAST option has resulted in slower response times, not faster ones.

Now your mileage may vary based on your specific environments, but my results have not been particularly impressive to say the least. Substantially increased elapsed times (during which the structures remain locked let’s not forget) is not exactly my idea of a “FAST” option.

Oracle11g: Zero Sized Unusable Indexes Part II (Nathan Adler) February 27, 2011

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes, Unusable Indexes.
12 comments

In my previous post, I discussed how Oracle from 11g R2 onwards will automatically drop the segment and associated storage from unusable index objects. Mohamend Houri asked in the comments section the excellent question of just how useful this feature will be in real life cases when typically indexes are not left in an unusuable state for a long time, perhaps only when performing large data loads when such indexes would ordinarily be rebuilt anyways.

Thought the question was worth a seperate blog entry to provide a worthy answer.

The first point I would make is that we need to think a little outside the box and consider how such change in behaviour can open up new possibilities and flexibilities in how we index our tables.

For example, previously a Local Partitioned Index must have the same number of index partitions as the parent table. But what if an index is only useful for the “current” partition, where accessing newish data makes sense via an index. However, historical data in “older” partitions might only be accessed in batch processes via full partition scans. Why have a local index for older partitions when such indexes are never used. Previously, we had no choice, it was a case of if one or some of the partitions needed an index, then all the partitions needed to be indexed. If we made such unnecessary partitioned indexes unusable, we still needed to allocate storage for the index segment. Now, we can make any unnecessary index partition unusable and no storage at all is allocated to such index partitions.

Taking this a step further, we now have a really nice method of potentially indexing only portions of a table that need indexing, values which don’t have any benefit of being indexed (perhaps because the values are too numerous to ever be accessed efficiently via an index) no longer need to be indexed at all.

Here’s a classic example. Following is a table with a flag  in which the vast number of rows in the data have been “processed”. However, we have a few rows, those current rows which are of interest to us, which have not yet been processed (they may have a status of another value). We need an index in order to find the few rows which have not yet been processed but the index needs to also include all the values which are not of interest and have been processed.

 
SQL> create table bowie_stuff (id number, processed varchar2(10));
Table created.

SQL> insert into bowie_stuff select rownum, 'YES' from dual connect by level <= 1000000;
1000000 rows created.

SQL> commit;
Commit complete.

SQL> update bowie_stuff set processed = ‘NO’ where id in (999990, 999992, 999994, 999996, 999998);
5 rows updated.

SQL> commit;
Commit complete.

SQL> create index bowie_stuff_i on bowie_stuff(processed) pctfree 0;
Index created.

SQL> select index_name, leaf_blocks from dba_indexes where index_name = 'BOWIE_STUFF_I';

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
BOWIE_STUFF_I                         1877

SQL> select segment_name, blocks from dba_segments where segment_name = 'BOWIE_STUFF_I';

SEGMENT_NAME             BLOCKS
-------------------- ----------
BOWIE_STUFF_I              1920

 
 

Notice how the index is quite large (1,877 leaf blocks) as it needs to hold values for all 1M rows, even though only a relative handful of values within the index are ultimately of any use.

If we now gather stats (note we need to collect histograms as the column value distribution is very skewed) and run a query to select just the 5 rows that have not actually been processed:

 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_STUFF', estimate_percent=>null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_STUFF', estimate_percent=>null, method_opt=> 'FOR COLUMNS PROCESSED SIZE 5');

PL/SQL procedure successfully completed.

SQL> select * from bowie_stuff where processed = 'NO';

Execution Plan
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     5 |    40 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE_STUFF   |     5 |    40 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_STUFF_I |     5 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        540  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

 
 

Note the CBO uses the index but it requires a total of 6 consistent reads.

Not bad but we can do somewhat better and perform less I/O , significantly reduce storage overheads and significantly reduce index maintenance operations, if only we didn’t store the unnecessary index values within the index.

One method could be to create a function-based index based on the decode function and only store non-null values that are of interest. However, this requires the application to likewise use the decode function in order to make use of the index.

Another method is to use a partitioned index and now with this new Oracle11g feature of zero sized unusable indexes, we don’t need any storage at all for the unwanted indexed values.

Let’s now re-create the index as a globally partitioned index, with one partition defined to contain all values of interest and another partition defined to contain the vast number of processed values. Initially, the index is created in an unusable state so no segments and no storage is allocated to any of the partitions:

 
SQL> drop index bowie_stuff_i;

Index dropped.

SQL> create index bowie_stuff_i on bowie_stuff(processed)
  2  global partition by range (processed)
  3  (partition not_processed_part values less than ('YES'),
  4   partition processed_part values less than (MAXVALUE))
  5  unusable;

Index created.

 
 

Next, we’re only going to rebuild the partition containing just the relatively few rows of interest. The partition containing the values that are not of interest is left in an unusable state and so continues to occupy no storage at all:

 
SQL> alter index bowie_stuff_i rebuild partition not_processed_part;

Index altered.

SQL> select index_name, partition_name, leaf_blocks from dba_ind_partitions where index_name = 'BOWIE_STUFF_I';

INDEX_NAME           PARTITION_NAME       LEAF_BLOCKS
-------------------- -------------------- -----------
BOWIE_STUFF_I        PROCESSED_PART                 0
BOWIE_STUFF_I        NOT_PROCESSED_PART             1

SQL> select segment_name, partition_name, blocks from dba_segments where segment_name = 'BOWIE_STUFF_I';

SEGMENT_NAME         PARTITION_NAME           BLOCKS
-------------------- -------------------- ----------
BOWIE_STUFF_I        NOT_PROCESSED_PART            8

 
 

Note how the index is now tiny (reduced from 1,877 leaf blocks to just 1) as it is only now just storing the index entries that are of interest. We have just saved ourselves heaps of storage as the other partition remains unusable and uses no storage at all.

If we now run our query again:

 
SQL> select * from bowie_stuff where processed = 'NO';

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name          | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |               |     5 |    45 |     1   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE      |               |     5 |    45 |     1   (0)| 00:00:01 |     1 |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| BOWIE_STUFF   |     5 |    45 |     1   (0)| 00:00:01 |       |       |
|*  3 |    INDEX RANGE SCAN          | BOWIE_STUFF_I |     5 |       |     1   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

 
 

We notice that the execution plan is just using the tiny index partition and as a result we have reduced our consistent gets down from 6 to just 4. Additionally, we have not had to change our application at all to use the improved index, it was the exact same query as executed previously.

This method can of course be used in Oracle prior to 11g R2 but now with  zero sized unusable indexes, we do not have to allocate any storage at all to those indexes that we may wish to remain in an unusable state for extended or indefinite periods of time. So yes, zero sized unusable indexes can be extremely useful in many real life scenarios :)

Oracle11g: Zero Sized Unusable Indexes (Zeroes) February 25, 2011

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes, Unusable Indexes.
2 comments

Following on from my previous discussion on “Create On Demand” segments, Oracle 11g R2 has also introduced storage saving initiatives in relation to useable indexes.  Starting with a simple Oracle 10g example, we create a table and associated index:
 

 
SQL> create table bowie as select rownum id, 'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
  

If we now make the index unusable:
 

 
SQL> alter index bowie_id_i unusable;
 
Index altered.
 
SQL> select index_name, blevel, leaf_blocks, num_rows, status, dropped from dba_indexes where index_name = 'BOWIE_ID_I';
 
INDEX_NAME     BLEVEL LEAF_BLOCKS   NUM_ROWS STATUS   DRO
---------- ---------- ----------- ---------- -------- ---
BOWIE_ID_I          2        2226    1000000 UNUSABLE NO
 

SQL> select segment_name, bytes, blocks, extents from dba_segments where segment_name = 'BOWIE_ID_I';
 
SEGMENT_NAME      BYTES     BLOCKS    EXTENTS
------------ ---------- ---------- ----------
BOWIE_ID_I     18874368       2304         18

 
  

We notice that the storage associated with the segment remains, even though the data within the index is now totally useless to us now. The index definition is of course vital but why bother continuing to assign 18 extents of storage (in this example) to the index ?  Oracle 11g Release 2 has now by default changed this behaviour. 
 
Using the same demo as before but running Oracle11g R2:
 

 
SQL> create table bowie as select rownum id, 'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> alter index bowie_id_i unusable;
 
Index altered.
 
SQL> select index_name, blevel, leaf_blocks, num_rows, status, dropped from dba_indexes where index_name = 'BOWIE_ID_I';
 
INDEX_NAME     BLEVEL LEAF_BLOCKS   NUM_ROWS STATUS   DRO
---------- ---------- ----------- ---------- -------- ---
BOWIE_ID_I          2        2226    1000000 UNUSABLE NO
 
SQL> select segment_name, bytes, blocks, extents from dba_segments where segment_name = 'BOWIE_ID_I';
 
no rows selected

 
  

We notice that the storage associated with the object is all gone. The index object remains but the underling segment and storage have been automatically dropped.

If we now look at a partitioning example, where we create 3 types of indexes:
 

 
SQL> CREATE TABLE big_album_sales(id number, album_id number, country_id number,
          release_date date, total_sales number)  PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2008 VALUES LESS THAN (TO_DATE('01-JAN-2009', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2009 VALUES LESS THAN (TO_DATE('01-JAN-2010', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2010 VALUES LESS THAN (MAXVALUE));
 
Table created.
  

SQL> INSERT INTO big_album_sales SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2000), ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.

 
  

We first create a Non-Partitioned Index:
 

 
SQL> CREATE INDEX big_album_tot_sales_i ON big_album_sales(total_sales);
 
Index created.

 
  

Next a Global Partitioned Index:
 

 
SQL> CREATE INDEX big_album_country_id_i  ON big_album_sales(country_id)
     GLOBAL PARTITION BY RANGE (country_id)
     (PARTITION TS1 VALUES LESS THAN (26),
      PARTITION TS2 VALUES LESS THAN (51),
      PARTITION TS3 VALUES LESS THAN (76),
      PARTITION TS4 VALUES LESS THAN (MAXVALUE));
 
Index created.

  

Finally,  a Local Partitioned index:
  

 
SQL> CREATE INDEX big_album_album_id_i ON big_album_sales(album_id) local;
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'BIG_ALBUM_SALES', estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
  

If we now split the last table partition, this will effectively make the:
 
1) Non-Partitioned Unusable
2) All partitions of the Global Partitioned index unusable
3) Just the last 2 partitions of the Local Partitioned Index unusable
  

 
SQL> ALTER TABLE big_album_sales SPLIT PARTITION ALBUMS_2010
     AT (TO_DATE('01-JAN-2011', 'DD-MON-YYYY'))
     INTO (PARTITION ALBUMS_2010, PARTITION ALBUMS_2011);
 
Table altered.
 
SQL> select index_name, status from dba_indexes where table_name = 'BIG_ALBUM_SALES';
 
INDEX_NAME               STATUS
------------------------ --------
BIG_ALBUM_TOT_SALES_I    UNUSABLE
BIG_ALBUM_COUNTRY_ID_I   N/A
BIG_ALBUM_ALBUM_ID_I     N/A
 
SQL> select index_name, partition_name, status, leaf_blocks from dba_ind_partitions where index_name like 'BIG_ALBUM_%';
 
INDEX_NAME              PARTITION_NAME STATUS   LEAF_BLOCKS
----------------------- -------------- -------- -----------
BIG_ALBUM_ALBUM_ID_I    ALBUMS_2007    USABLE           807
BIG_ALBUM_ALBUM_ID_I    ALBUMS_2008    USABLE           381
BIG_ALBUM_ALBUM_ID_I    ALBUMS_2009    USABLE           383
BIG_ALBUM_ALBUM_ID_I    ALBUMS_2010    UNUSABLE
BIG_ALBUM_ALBUM_ID_I    ALBUMS_2011    UNUSABLE
BIG_ALBUM_COUNTRY_ID_I  TS1            UNUSABLE         629
BIG_ALBUM_COUNTRY_ID_I  TS2            UNUSABLE         629
BIG_ALBUM_COUNTRY_ID_I  TS3            UNUSABLE         629
BIG_ALBUM_COUNTRY_ID_I  TS4            UNUSABLE         629
 
SQL> select segment_name, partition_name, bytes, blocks from dba_segments where segment_name like 'BIG_ALBUM_%' and segment_type like 'INDEX%';
 
SEGMENT_NAME          PARTITION_NAME    BYTES BLOCKS
--------------------- -------------- -------- ------
BIG_ALBUM_ALBUM_ID_I  ALBUMS_2007     7340032    896
BIG_ALBUM_ALBUM_ID_I  ALBUMS_2008     3145728    384
BIG_ALBUM_ALBUM_ID_I  ALBUMS_2009     4194304    512
BIG_ALBUM_TOT_SALES_I                23068672   2816

 
  

We notice that all segments associated with the Global Partitioned index which are now unusable have been dropped. As have both unusable partitions from the Local Partitioned Index. However, the segment and storage associated with the unusable Non-Partitioned index still remains. Perhaps a missing feature for another time …
 
It’s a nice little touch that the unusable and somewhat useless index segments now get automatically cleaned out in Oracle11g R2, although they did previously act as “placeholders” in that nothing else within the tablespace could come along and use the occupied storage.

Oracle11g Creation On Demand Indexes (Invisible Touch) February 17, 2011

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes.
11 comments

Prior to Oracle11g Release 2, the default and minimum size of a segment is one extent. So in the below example, where we create a table and five associated indexes: 

 
SQL> create table empty (a number, b number, c number, d number, e number);
 
Table created.
 
SQL> create index empty_a_i on empty(a);
 
Index created.
 
SQL> create index empty_b_i on empty(b);
 
Index created.
 
SQL> create index empty_c_i on empty(c);
 
Index created.
 
SQL> create index empty_d_i on empty(d);
 
Index created.
 
SQL> create index empty_e_i on empty(e);
 
Index created.
 

SQL> select segment_name, blocks, bytes, extents from dba_segments where segment_name like 'EMPTY%';
 
SEGMENT_NAME     BLOCKS      BYTES    EXTENTS
------------ ---------- ---------- ----------
EMPTY               128    1048576          1
EMPTY_A_I           128    1048576          1
EMPTY_B_I           128    1048576          1
EMPTY_C_I           128    1048576          1
EMPTY_D_I           128    1048576          1
EMPTY_E_I           128    1048576          1
 
6 rows selected.

  

Each of the segments has been allocated an extent, including each of the indexes.
 
However, since Oracle11g Release 2, this default behaviour has changed. Running exactly the same demo:
 

 
SQL> create table empty (a number, b number, c number, d number, e number);
 
Table created.
 
SQL> create index empty_a_i on empty(a);
 
Index created.
 
SQL> create index empty_b_i on empty(b);
 
Index created.
 
SQL> create index empty_c_i on empty(c);
 
Index created.
 
SQL> create index empty_d_i on empty(d);
 
Index created.
 
SQL> create index empty_e_i on empty(e);
 
Index created.
 
SQL> select segment_name, blocks, bytes, extents from dba_segments where segment_name like 'EMPTY%';
 
no rows selected

 

We can see that no actual segments have been allocated. The default number of extents when creating an object is now effectively zero. Oracle now defers the creation of the segment and the actual allocation of extents and storage until the point in time when the first row gets inserted.
 
This means for those packaged applications where a large number of objects get created of which relatively few are actually ever used by the specific deployment of the application (eg. SAP) , a substantial amount of storage could potentially be saved. It also can save a significant amount of time deploying such applications as the overheads associated with actually creating the never to be used segments can be avoided.
 
There also some subtle performance implications when the application attempts to access some of these “empty” tables. I’m just going to create another table, but this one populated with a bunch of rows and run a query that joins this with the “empty” table:
 

 
SQL> create table bowie as select rownum id, 'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created. 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'EMPTY', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

SQL> select * from bowie, empty where bowie.id=empty.a and bowie.id = 42;
 
no rows selected
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1098778158
 
--------------------------------------------------------------------------------------------
| Id  | Operation                     | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |            |     1 |    76 |     2   (0)| 00:00:01 |
|   1 |  MERGE JOIN CARTESIAN         |            |     1 |    76 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | EMPTY      |     1 |    65 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | EMPTY_A_I  |     1 |       |     1   (0)| 00:00:01 |
|   4 |   BUFFER SORT                 |            |     1 |    11 |     1   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| BOWIE      |     1 |    11 |     1   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | BOWIE_ID_I |     1 |       |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("EMPTY"."A"=42)
   6 - access("BOWIE"."ID"=42)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          0  consistent gets
          0  physical reads
          0  redo size
        303  bytes sent via SQL*Net to client
        239  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 

With a table which has yet to be populated, the CBO knows there can be no rows associated with this table (and hence no returns return from this query). You can’t get any less than 0 consistent gets. Note previously, the CBO would be forced to perform at the very least one read (if not more) as it can’t “know” there are indeed no rows, even with the table statistics set at zero (as they may no longer be accurate) until it actually physically accesses a segment.
 
The segment is not actually created and storage allocated to the object until the first row is inserted into the table. This means that this first insert will be a relatively expensive operation as the work associated with creating the segment and allocating the initial extent needs to be implicitly performed in the background at this time. Not only for the table itself, but for any dependant object as well (eg. indexes).

 
SQL> insert into empty (a, b) values (1,1);
 
1 row created.
 

Execution Plan
----------------------------------------------------------
 
--------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |       |     1 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | EMPTY |       |            |          |
--------------------------------------------------------------------------
 

Statistics
----------------------------------------------------------
       3535  recursive calls
        178  db block gets
        830  consistent gets
         26  physical reads
      27272  redo size
        367  bytes sent via SQL*Net to client
        321  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         59  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
SQL> commit;
 
Commit complete.

   

Although it’s only a single row insert, note the high number of recursive calls and logical I/Os due in large part to the segment(s) being implicitly created in the background at this time. Subsequent inserts are nowhere near as expensive:
 

 
SQL> insert into empty (a, b) values (1,1);
 
1 row created.
 

Execution Plan
----------------------------------------------------------
 
--------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |       |     1 |     1   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL | EMPTY |       |            |          |
--------------------------------------------------------------------------
 
Statistics
----------------------------------------------------------
          0  recursive calls
          5  db block gets
          1  consistent gets
          0  physical reads
        692  redo size
        381  bytes sent via SQL*Net to client
        321  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

Although, the table was populated with values that only directly impacted 2 of the indexes, once the table segment is created, all dependent segments are created at the same time.

 
SQL> select segment_name, blocks, bytes, extents from dba_segments where segment_name like 'EMPTY%';
 
SEGMENT_NAME     BLOCKS      BYTES    EXTENTS
------------ ---------- ---------- ----------
EMPTY               128    1048576          1
EMPTY_A_I           128    1048576          1
EMPTY_B_I           128    1048576          1
EMPTY_C_I           128    1048576          1
EMPTY_D_I           128    1048576          1
EMPTY_E_I           128    1048576          1
 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'EMPTY', cascade=>true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, blevel, leaf_blocks, status from dba_indexes where index_name like 'EMPTY%';
 
INDEX_NAME     BLEVEL LEAF_BLOCKS STATUS
---------- ---------- ----------- --------
EMPTY_A_I           0           1 VALID
EMPTY_B_I           0           1 VALID
EMPTY_C_I           0           0 VALID
EMPTY_D_I           0           0 VALID
EMPTY_E_I           0           0 VALID

 

 
Note all the index segments have been created although only 2 of them actually contain data.

Finally, there are some implications with regard to how quotas are enforced that potentially may cause issues. Prior to Oracle11g Release 2, if a user tried to create a segment in a tablespace for which they don’t have sufficient privileges, the operation will fail during the creation of the object:

 
SQL> create user muse identified by muse default tablespace user_data temporary tablespace temp;
 
User created.
 
SQL> grant create session, create table to muse;
 
Grant succeeded.
 
SQL> connect muse/muse;
Connected.
SQL> create table fred (id number primary key using index (create index fred_pk on fred(id) tablespace user_data), name varchar2(20));
create table fred (id number primary key using index (create index fred_pk on fred(id) tablespace user_data), name varchar2(20))
*
ERROR at line 1:
ORA-01950: no privileges on tablespace 'USER_DATA'

 

However with Oracle11g Release 2, as no segment is actually created and hence storage allocated when an object is created, such creation of objects will now initially succeed. No quotas can be violated if no storage is actually used during the creation of the object:
 

 
SQL> create user muse identified by muse default tablespace user_data temporary tablespace temp;
 
User created.
 
SQL> grant create session, create table to muse;
 
Grant succeeded.
 
SQL> connect muse/muse
Connected.
 
SQL> create table fred (id number primary key using index (create index fred_pk on fred(id) tablespace user_data), name varchar2(20));
 
Table created.

  

It’s only when the table is first populated and hence when the segment is actually created and storage allocated, will quota related issues generate errors:

 
SQL> insert into fred values (1, 'BOWIE');
insert into fred values (1, 'BOWIE')
            *
ERROR at line 1:
ORA-01950: no privileges on tablespace 'USER_DATA'

 

An interesting change in default behaviour introduced in Oracle11g Release 2 that can potentially be quite beneficial but perhaps also a little dangerous for the unwary.

Oracle11g Bitmap-Join IOTs (Us and Them) January 25, 2011

Posted by Richard Foote in 11g, Bitmap Indexes, Index Organized Tables, Oracle Indexes.
6 comments

With each new database release, nice little improvements and enhanced options continually get added. Since 11g R1, two index related features can finally be used in combination with each other.
 
To demonstrate, I’m first going to create and populate a so-called “large” Data Warehouse table. 
 

 
SQL> CREATE TABLE big_dwh_table (id NUMBER PRIMARY KEY, album_id NUMBER, artist_id NUMBER, country_id NUMBER, format_id NUMBER, release_date DATE, total_sales NUMBER);
 
Table created.
 
SQL> CREATE SEQUENCE dwh_seq;
 
Sequence created.
 
SQL> create or replace procedure pop_big_dwh_table as
  2  v_id          number;
  3  v_artist_id   number;
  4  begin
  5    for v_album_id in 1..10000 loop
  6        v_artist_id:= ceil(dbms_random.value(0,100));
  7        for v_country_id in 1..100 loop
  8          select dwh_seq.nextval into v_id from dual;
  9          insert into big_dwh_table values (v_id, v_album_id, v_artist_id, v_country_id, ceil(dbms_random.value(0,4)), trunc(sysdate-mod(v_id,ceil(dbms_random.value(0,1000)))), ceil(dbms_random.value(0,500000)));
 10       end loop;
 11    end loop;
 12 commit;
 13 end;
 14 /
 
Procedure created.
 
SQL> exec pop_big_dwh_table
 
PL/SQL procedure successfully completed.

 

I’ll next create a standard bitmap index on the ALBUM_ID column and collect a few statistics:
 

  
SQL> create bitmap index big_dwh_table_album_id_i on big_dwh_table(album_id);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'BIG_DWH_TABLE', estimate_percent=> null, cascade=> true, method_opt=>'FOR  ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 

OK, I’m now going to create and populate a “smaller” dimension/detail heap table and a few associated indexes:

 
SQL> CREATE TABLE albums (album_id number, album_details varchar2(30));
 
Table created.
 
SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> alter table albums add primary key(album_id);
 
Table altered.
 
SQL> create index albums_details_i on albums(album_details);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 

If we now run a little query that joins the two tables together:
 

 
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1936297994
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                          |   125 |  4250 |    25   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |                          |       |       |            |          |
|   2 |   NESTED LOOPS                |                          |   125 |  4250 |    25   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| ALBUMS                   |     1 |    22 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | ALBUMS_DETAILS_I         |     1 |       |     1   (0)| 00:00:01 |
|   5 |    BITMAP CONVERSION TO ROWIDS|                          |       |       |            |          |
|*  6 |     BITMAP INDEX SINGLE VALUE | BIG_DWH_TABLE_ALBUM_ID_I |       |       |            |          |
|   7 |   TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE            |   100 |  1200 |    25   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("A"."ALBUM_DETAILS"='TAB$')
   6 - access("B"."ALBUM_ID"="A"."ALBUM_ID")
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 

The resultant execution plan is pretty good and efficient and what we would expect. It performs a nested loop join to join the tables together which based on the relatively small number of rows returned makes sense and uses the b-tree index to get the specific album details from the dimension table and the bitmap index to find the matching albums details from the larger table.
 
However, as this is a very frequently executed join condition, we can potentially improve things and reduce the 10 consistent gets by introducing a bitmap-join index. A bitmap-join index performs the “join” operation once, when the index is created and during subsequent DML operations by creating an index based on column(s) on the smaller dimension tables that directly references rows in the larger fact table.
    

 
SQL> drop index albums_details_i;
 
Index dropped.
 

SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
     FROM big_dwh_table b, albums a
     WHERE b.album_id = a.album_id;
 
Index created.

 

So the bitmap-join index is based on the ALBUM_DETAILS column from the smaller ALBUMS table, but it references and has rowids associated with the larger BIG_DWH_TABLE table, with the bitmap-join definition containing details on how the join between the two tables needs to be performed. It if want to know what rows within the larger table have ALBUM_DETAILS of interest, the corresponding bitmap-join index will find all such rows without having to access the smaller ALBUMS table that contains this column.
 
If we now run the same query as before:
  

 
SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 950886520
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   125 |  1500 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE           |   125 |  1500 |    26   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("B"."SYS_NC00008$"='TAB$')
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
 
   

We notice the nested loop operation is no longer required. In fact, we don’t need to reference the smaller ALBUMS table at all as all the required information can now be obtained by using the bitmap-join index and direct accesses to the larger table. The number of consistent gets has therefore reduced from 10 down to just 6.
 
Note in our example, there is no actual Foreign Key (FK) constraint in the larger table (in a Data Warehouse, such constraints may not be necessary and/or get in the way). The bitmap-join index doesn’t require a FK constraint to be in place however it’s necessary that the column in the join condition referencing the detail table be Unique else there could be a many-to-many join condition which wouldn’t make sense when attempting to populate the bitmap-join index.
 
However, make one of the tables in the Bitmap-Join index an Index Organized Table (IOT), in this case the smaller detail table …
 

 
SQL> drop table albums;
 
Table dropped.
 
SQL> CREATE TABLE albums (album_id number primary key, album_details varchar2(30)) organization index;
 
Table created.
 
SQL> INSERT INTO albums SELECT rownum, substr(object_name,1,30) FROM dba_objects WHERE rownum <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=> 'BOWIE', tabname=> 'ALBUMS', estimate_percent=> null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 

SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
  2       FROM big_dwh_table b, albums a
  3       WHERE b.album_id = a.album_id;
CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
                                               *
ERROR at line 1:
ORA-25966: join index cannot be based on an index organized table

 

 
   
 
and we get the above error as prior to 11g R1, there was a restriction that no table within a Bitmap-Join index could be an Index Organized Table.
 

Now, if we run exactly the same demo but in an Oracle11g database:
 
 

 
SQL> CREATE BITMAP INDEX big_dwh_album_details_i ON big_dwh_table(a.album_details)
     FROM big_dwh_table b, albums a
     WHERE b.album_id = a.album_id;
 
Index created.
 

SQL> SELECT b.id, b.album_id, b.format_id FROM big_dwh_table b, albums a WHERE b.album_id = a.album_id and a.album_details = 'TAB$';
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 950886520
 
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |   125 |  1500 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BIG_DWH_TABLE           |   125 |  1500 |    26   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|                         |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BIG_DWH_ALBUM_DETAILS_I |       |       |            |          |
--------------------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("B"."SYS_NC00008$"='TAB$')
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1648  bytes sent via SQL*Net to client
        396  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 

It all now works fine.
 
So since Oracle 11g R1, there’s one less reason not use Index Organized Tables and/or Bitmap-Join indexes in your Data Warehouse :)

Oracle11g IGNORE_ROW_ON_DUPKEY_INDEX Hint (Micro Cuts) December 20, 2010

Posted by Richard Foote in 11g, 11g New features, Oracle Indexes.
17 comments

An interesting new hint was introduced in Oracle11g which provides an alternative approach when inserting data where duplicate values might be an issue.
 
To illustrate, I’m going to create a little table with just the 10 rows with a unique ID column containing values 1 – 10 policed by a Unique index:

 

SQL> create table radiohead (id number constraint radiohead_pk_i primary key using index (create unique index radiohead_pk_i on radiohead(id)), name varchar2(20));
 
Table created.
 
SQL> select index_name, uniqueness, table_name from dba_indexes where index_name='RADIOHEAD_PK_I';
 
INDEX_NAME                     UNIQUENES TABLE_NAME
------------------------------ --------- ------------------------------
RADIOHEAD_PK_I                 UNIQUE    RADIOHEAD
 
SQL> insert into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 10;
 
10 rows created.
 
SQL> commit;
 
Commit complete.

 
 

If we now attempt to add 12 more rows, but including the values 1 – 10:

 
SQL> insert into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 12;
insert into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 12
*
ERROR at line 1:
ORA-00001: unique constraint (BOWIE.RADIOHEAD_PK_I) violated

  
We obviously get a unique constraint violation error.
 
However, Oracle11g allows us to use the IGNORE_ROW_ON_DUPKEY_INDEX hint, which will silently deal with the unique constraint violation errors by simply ignoring and not inserting any row in which the unique values already exist in the table.
 
The hint comes in 2 formats, the first allows us to specify the unique index which contains the unique values to be ignored:
 

 
SQL> insert /*+ ignore_row_on_dupkey_index(radiohead,radiohead_pk_i) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 12;
 
2 rows created.
 
SQL> commit;
 
Commit complete.

 
 

  
Note the 10 duplicate values (1 – 10) have been ignored and have not been inserted into the table but the values 11 and 12 which didn’t previously exist have successfully been inserted into the table.
 
Here we attempt to insert values 1 – 13 into the table, although now values 1 – 12 currently already exist. This time, we’ll use the second format of the hint which allows us to stipulate the column which contains unique values which are to be ignored if they already exist:
 

 
SQL> insert /*+ ignore_row_on_dupkey_index(radiohead(id)) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 13;
 
1 row created.
 
SQL> commit;
 
Commit complete.
 

SQL> select * from radiohead;
 
        ID NAME
---------- --------------------
         1 OK COMPUTER
         2 OK COMPUTER
         3 OK COMPUTER
         4 OK COMPUTER
         5 OK COMPUTER
         6 OK COMPUTER
         7 OK COMPUTER
         8 OK COMPUTER
         9 OK COMPUTER
        10 OK COMPUTER
        11 OK COMPUTER
        12 OK COMPUTER
        13 OK COMPUTER
 
13 rows selected.

 
 

Note in this case, the values 1 – 12 have all been silently ignored with just the value 13 inserted this time into the table.
 
You can’t however use this hint within an update statement …

 
SQL> update /*+ ignore_row_on_dupkey_index(radiohead,radiohead_pk_i) */ radiohead set id = 13 where id = 3;
update /*+ ignore_row_on_dupkey_index(radiohead,radiohead_pk_i) */ radiohead set id = 13 where id = 3
*
ERROR at line 1:
ORA-38917: IGNORE_ROW_ON_DUPKEY_INDEX hint disallowed for this operation

 
 

Interesting difference in the behaviour of this hint. Usually “invalid” hints are just ignored and treated as comments but here if an illegal operation is attempted with the use of this “hint”, an error is invoked.
 
Going to now set up the same demo again, but this time police the Primary Key constraint via a Non-Unique index:

 
SQL> drop table radiohead;
 
Table dropped.
 
SQL> create table radiohead (id number constraint radiohead_pk_i primary key using index (create index radiohead_pk_i on radiohead(id)), name varchar2(20));
 
Table created.
 
SQL> insert into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 10;
 
10 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> select index_name, uniqueness, table_name from dba_indexes where index_name='RADIOHEAD_PK_I';
 
INDEX_NAME                     UNIQUENES TABLE_NAME
------------------------------ --------- ------------------------------
RADIOHEAD_PK_I                 NONUNIQUE RADIOHEAD
 

SQL> insert /*+ ignore_row_on_dupkey_index(radiohead,radiohead_pk_i) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 12;
insert /*+ ignore_row_on_dupkey_index(radiohead,radiohead_pk_i) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 12
                                                                        *
ERROR at line 1:
ORA-38913: Index specified in the index hint is invalid
 

SQL> insert /*+ ignore_row_on_dupkey_index(radiohead(id)) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 13;
insert /*+ ignore_row_on_dupkey_index(radiohead(id)) */ into radiohead select rownum, 'OK COMPUTER' from dual connect by level <= 13
                                                             *
ERROR at line 1:
ORA-38913: Index specified in the index hint is invalid

 
 

Note again an error is invoked here as well as this hint can only be applied via a constraint policed via a Unique index. A Non-Unique index, even with a Unique or PK constraint in place is not sufficient and will generate the above error.
 
Yet another reason to use Unique indexes to police constraints whenever possible and practical …

Oracle11g: New Locking Modes When Policing FK Constraints (A Wolf at the Door) November 10, 2010

Posted by Richard Foote in 11g, Foreign Keys, Locking Issues, Oracle Indexes.
13 comments

As I’ve been focusing mainly with Oracle 11g at work these days, thought I might look at a number of Oracle 11g related topics in the coming weeks.
 
To start with, there’s been a subtle but potentially significant change introduced in Oracle 11g (since 11.1.0.6) with regard to the manner in which locks are held in relation to policing Foreign Key constraints. The following has been tested on both 11.2.0.1 and 11.2.0.2.
 
To set the scene and replicate the issue we hit at work, I’m just going to create a little table (ALBUMS) that has 2 FK constraints pointing to two parent tables (ARTISTS and FORMATS) and populate them with a few rows.
 

 
SQL> CREATE TABLE artists (id NUMBER PRIMARY KEY, artist_name VARCHAR2(30));
 
Table created.
 
SQL> CREATE TABLE formats (id NUMBER PRIMARY KEY, format_name varchar2(30));
 
Table created.
 
SQL> CREATE TABLE albums (id NUMBER, album_name VARCHAR2(30), artist_id NUMBER CONSTRAINT artist_fk REFERENCES artists(id), format_id number
 
CONSTRAINT format_fk REFERENCES formats(id));
 
Table created.
 
SQL> INSERT INTO artists VALUES (1, 'DAVID BOWIE');
 
1 row created.
 
SQL> INSERT INTO artists VALUES (2, 'PINK FLOYD');
 
1 row created.
 
SQL> INSERT INTO formats VALUES (1, 'CD');
 
1 row created.
 
SQL> INSERT INTO formats VALUES (2, 'DVD');
 
1 row created.
 
SQL> INSERT INTO albums VALUES (1, 'LOW', 1, 1);
 
1 row created.
 
SQL> INSERT INTO albums VALUES (2, 'DIAMOND DOGS', 1, 1);
 
1 row created.
 
SQL> COMMIT;
 
Commit complete.

    

OK, when running the following insert statement on the ARTISTS table in 10.2.0.3:
 

 
SQL> insert into artists values (3, 'MUSE');
 
1 row created.

 

A check in the v$lock view will show the transaction holds a TM (DML Enqueue) lock in row-S (SS) mode 2 on the child ALBUMS table due to the FK relationship between these tables.

If another session were to either say delete a row or update the PK from the other parent FORMATS table:

 
SQL> update formats set id = 2 where id = 2;
 
1 row updated.

 
 
It will succeed with no problem for when it temporarily requires a TM share (S) mode 4 lock on the ALBUMS table, it can successfully grab it as the concurrent SS lock does not prevent this from occurring. It requires access to this mode 4 Share lock to ensure there are no transactions currently impacting the ALBUMS table that could potentially violate the constraint following the DML operations on the parent FORMATS table.

However, repeating the same exercise in Oracle 11g and we hit a subtle difference. When running the insert statement again in the ARTISTS table:

 
SQL> insert into artists values (3, 'MUSE');
 
1 row created.

 

A check in the v$lock view will now show the transaction holds a TM (DML Enqueue) lock in row-X (SX) LMODE 3 on the child ALBUMS table, not a LMODE 2 SS level lock as it did in 10g. This is a “higher” level lock mode which has the following consequence on the other session now attempting to either delete or update the PK in the FORMATS table:

 SQL> update formats set id = 2 where id = 2;

 

The session now hangs as it has to wait for the other session to release the DML Enqueue LMODE 3 SX lock before it can in turn grab the required TM mode 4 Share table lock it’s requesting. This is precisely the issue we hit with a somewhat poorly written application trying to perform something akin to the above series of updates from within two different sessions.

This change was introduced by Oracle to eliminate an ORA-600 issue that could occur when deleting a row from a table with a PK while rebuilding an associated FK index that referenced the PK.

However, introducing a more restrictive level of lock in this manner has the side-effect of increasing the likelihood of encountering new locking issues such as this, increasing the likelihood of hitting deadlock scenarios (as discussed here previously by Charles Hooper) and can therefore potentially reduce the overall concurrency capabilities of an application. 
 

The “fix” in this case is to simply create an index on the formats_id FK column (which probably should exist anyways in this case to prevent locking issues on the child table when updating the parent FORMAT table):

  
SQL> CREATE INDEX albums_format_i on albums(format_id);
 
Index created.
 
SQL> insert into artists values (3, 'MUSE');
 
1 row created.

 

In which case the table share lock is no longer required on the ALBUMS table (as Oracle can now use the associated index to effectively police the integrity of the child table following such an operation on a parent table) and the statement no longer hangs in the other session:

 
SQL> update formats set id = 2 where id = 2;
 
1 row updated.

 

This change in the locking behaviour of policing FK constraints is certainly something to be aware of when migrating to Oracle 11g if you potentially have FK constraints that don’t have associated indexes.

Index Block Dumps: Final Demo (Come Together) November 4, 2010

Posted by Richard Foote in Block Dumps, Leaf Blocks, Oracle Indexes.
1 comment so far

The intent of this blog piece is just to bring together the whole discussion of block dumps and how we can use block dumps to demonstrate Oracle behaviour.

First, let’s start with a fresh little demo, creating an index on a NAME column with 500 entries (note this specific demo uses an 11.2.0.1 database running on windows). The column all have a value of ‘BOWIE’ with a distinct number concatenated on the end.

 

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> create index bowie_name_i on bowie(name);

Index created.

SQL> insert into bowie select rownum, 'BOWIE' || rownum from dual connect by level <= 500;

500 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We notice this index is a blevel 1 index, consisting of a root block pointing down to just 2 leaf blocks:


SQL> select blevel, leaf_blocks from dba_indexes where index_name = 'BOWIE_NAME_I';

    BLEVEL LEAF_BLOCKS
---------- -----------
         1           2

 

I’m just going to show selected portions from the different block dumps, focusing on the dump from disk section (hence flush the buffer cache before each block dump):

SQL> alter system flush buffer_cache;

System altered.

SQL> select header_file, header_block from dba_segments where segment_name='BOWIE_NAME_I';

HEADER_FILE HEADER_BLOCK
----------- ------------
          6          168

  

The specific block of interest will be the second (or last) index leaf block, so I just add 3 to the header block value (note index is in a non ASSM LMT):

SQL> alter system dump datafile 6 block 171;

System altered.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0×0000.003bb7e9 seq: 0×01 flg: 0×04 tail: 0xb7e90601
frmt: 0×02 chkval: 0x285e type: 0×06=trans data
Hex dump of block: st=0, typ_found=1
Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0×00.3bb7e9  itc: 2  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0×0 ver: 0×01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0×0000.003bb7e1
0×02   0×0003.017.00000d3e  0x00c049a3.021a.03  C—    0  scn 0×0000.003bb7e3
Leaf block dump
===============
header address 211493468=0xc9b225c
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 300
kdxcofbo 636=0x27c
kdxcofeo 2722=0xaa2
kdxcoavs 2086
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8036
row#0[4414] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4431] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4449] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31
col 1; len 6; (6):  01 80 00 a1 01 18

We currently have 2 ITL entries in the index leaf block, the first entry used by Oracle to deal with the leaf block split required when loading the data, the second entry for the actual transaction loading the table/index. The kdxcronro count is 300 meaning we currently have 300 index entries in this block. Note the kdxlenxt value is 0 meaning there is no next pointer, ensuring we are indeed looking at the second (or last) index leaf block within the index structure. We’re now going to add a couple of new index entries that will have greater values than all our BOWIEs guaranteeing they’ll be inserted into this leaf block. We’re going to do this by running a couple of separate concurrent transactions running in different sessions:

In one session:

SQL> insert into bowie values (501, 'MAJOR TOM');

1 row created.

In another session:

SQL> insert into bowie values (502, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

 
Back in the first session:

SQL> commit;

Commit complete.

 
So there were 2 concurrent transactions inserting index entries, with the transaction inserting the value “MAJOR TOM” committing last. Looking at a dump of the index block now:

     
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0×0000.003bb95e seq: 0×01 flg: 0×06 tail: 0xb95e0601
frmt: 0×02 chkval: 0x1f40 type: 0×06=trans data
Hex dump of block: st=0, typ_found=1

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0×0000.003bb7e1
0×02   0×0004.016.00000d65  0x00c00e88.029f.03  –U-    1  fsc 0×0000.003bb95e
0×03   0×0007.00a.00000d5c  0x00c02578.0261.02  –U-    1  fsc 0×0000.003bb95a
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 302
kdxcofbo 640=0×280
kdxcofeo 2655=0xa5f
kdxcoavs 2015
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31


row#300[2679] flag: ——, lock: 2, len=19
col 0; len 9; (9):  4d 41 4a 4f 52 20 54 4f 4d
col 1; len 6; (6):  01 80 00 a2 00 55
row#301[2655] flag: ——, lock: 3, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

     
We notice we now have an additional ITL entry. The first entry is reserved for Oracle service operations (such as block splits). The second entry was therefore grabbed by the first transaction (which inserted “MAJOR TOM”) while a new third ITL entry had to be added to accommodate the second concurrent transaction. At the bottom of the block we can see the 2 new index entries, one currently marked as locked by the transaction in ITL 2 and the other entry containing “ZIGGY STARDUST” locked by the second transaction in ITL 3. These lock bytes (which are no longer required as the transactions have now completed) will be subsequently cleaned out as we shall see …

As the transaction in ITL 2 was the last to commit, its corresponding Scn/fsc (0×0000.003bb95e) is the last transaction to have changed the block and hence is also stored in the block header (scn: 0×0000.003bb95e).

Let’s now add another index entry:

SQL> insert into bowie values (503, 'THIN WHITE DUKE');

1 row created.

SQL> commit;

Commit complete.

Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0×0000.003c327c seq: 0×02 flg: 0×06 tail: 0x327c0602
frmt: 0×02 chkval: 0xb367 type: 0×06=trans data
Hex dump of block: st=0, typ_found=1

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0×00.3c327b  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0×0 ver: 0×01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0×0000.003bb7e1
0×02   0×0004.016.00000d65  0x00c00e88.029f.03  C—    0  scn 0×0000.003bb95e
0×03   0×0003.010.00000d6c  0x00c015a9.0221.08  –U-    1  fsc 0×0000.003c327c
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 303
kdxcofbo 642=0×282
kdxcofeo 2630=0xa46
kdxcoavs 1988
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: ——, lock: 0, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: ——, lock: 0, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 31
col 1; len 6; (6):  01 80 00 a1 01 18


row#300[2679] flag: ——, lock: 0, len=19
col 0; len 9; (9):  4d 41 4a 4f 52 20 54 4f 4d
col 1; len 6; (6):  01 80 00 a2 00 55
row#301[2630] flag: ——, lock: 3, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#302[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

We notice the previous lock information has now been cleaned out with only this last transaction (reusing the ITL entry of the previously oldest transaction, ITL 3) now having a lock byte set for its corresponding row (“THIN WHITE DUKE”). This transaction’s scn/fsc (0×0000.003c327c) is now the scn marking the block header.

Let’s delete a few rows now, firstly the row containing “MAJOR TOM”:

SQL> delete bowie where name = 'MAJOR TOM';

1 row deleted.

SQL> commit;

Commit complete.

 

And now all the rows that start with BOWIE as a separate transaction:

SQL> delete bowie where name like 'BOWIE%';

500 rows deleted.

SQL> commit;

Commit complete.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0×0000.003c3e8a seq: 0×01 flg: 0×06 tail: 0x3e8a0601
frmt: 0×02 chkval: 0x139e type: 0×06=trans data

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0×00.3c3e85  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0×0 ver: 0×01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0×0000.003bb7e1
0×02   0×0005.01a.00000d73  0x00c011bf.0268.05  C-U-    0  scn 0×0000.003c3b42
0×03   0×0004.01f.00000d72  0x00c01f0a.02a1.25  –U-  300  fsc 0x171a.003c3e8a
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 302
kdxcofbo 640=0×280
kdxcofeo 2630=0xa46
kdxcoavs 2009
kdxlespl 0
kdxlende 300
kdxlenxt 0=0×0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[4390] flag: —D–, lock: 3, len=17
col 0; len 7; (7):  42 4f 57 49 45 32 38
col 1; len 6; (6):  01 80 00 a1 00 1b
row#1[4407] flag: —D–, lock: 3, len=18
col 0; len 8; (8):  42 4f 57 49 45 32 38 30
col 1; len 6; (6):  01 80 00 a1 01 17
row#2[4425] flag: —D–, lock: 3, len=18


row#299[7995] flag: —D–, lock: 3, len=17
col 0; len 7; (7):  42 4f 57 49 45 39 39
col 1; len 6; (6):  01 80 00 a1 00 62
row#300[2630] flag: ——, lock: 0, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#301[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56

  
The first transaction used the now oldest ITL slot 2. The second transaction then went on to use ITL slot 3, cleaning out the lock information of the first transaction in ITL 2. It deleted all 300 index entries within the block starting with BOWIE, marking them all as deleted with the D flag in all the index entries and with a 3 lock byte set. Note however the index entry for MAJOR TOM as deleted in the first transaction has already been physically removed from the leaf block …

Again, the transaction in ITL 3 being the last transaction now has its scn/fsc (0x171a.003c3e8a) in the block header (scn: 0×0000.003c3e8a).

Let’s add a couple new rows with 2 transactions to cycle through both ITL entries …

SQL> insert into bowie values (504, 'DAVID JONES');

1 row created.

SQL> commit;

Commit complete.

SQL> insert into bowie values (505, 'SCREAMING LORD BYRON');

1 row created.

SQL> commit;

Commit complete.

 
Block dump from disk:
buffer tsn: 6 rdba: 0x018000ab (6/171)
scn: 0×0000.003c42b0 seq: 0×02 flg: 0×06 tail: 0x42b00602
frmt: 0×02 chkval: 0×0191 type: 0×06=trans data

Block header dump:  0x018000ab
 Object id on Block? Y
 seg/obj: 0x12ad3  csc: 0×00.3c42ae  itc: 3  flg: O  typ: 2 – INDEX
     fsl: 0  fnx: 0×0 ver: 0×01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0005.013.00000d48  0x00c01082.0263.02  CB–    0  scn 0×0000.003bb7e1
0×02   0×0009.001.00000d80  0x00c044f5.029a.03  C—    0  scn 0×0000.003c418d
0×03   0×0001.013.00000e05  0x00c0423b.0267.02  –U-    1  fsc 0×0000.003c42b0
Leaf block dump
===============
header address 211493492=0xc9b2274
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 4
kdxcofbo 44=0x2c
kdxcofeo 2579=0xa13
kdxcoavs 7868
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 25165994=0x18000aa
kdxledsz 0
kdxlebksz 8012
row#0[2609] flag: ——, lock: 0, len=21
col 0; len 11; (11):  44 41 56 49 44 20 4a 4f 4e 45 53
col 1; len 6; (6):  01 80 00 a2 00 55
row#1[2579] flag: ——, lock: 3, len=30
col 0; len 20; (20):  53 43 52 45 41 4d 49 4e 47 20 4c 4f 52 44 20 42 59 52 4f 4e
col 1; len 6; (6):  01 80 00 a2 00 58
row#2[2630] flag: ——, lock: 0, len=25
col 0; len 15; (15):  54 48 49 4e 20 57 48 49 54 45 20 44 55 4b 45
col 1; len 6; (6):  01 80 00 a2 00 57
row#3[2655] flag: ——, lock: 0, len=24
col 0; len 14; (14):  5a 49 47 47 59 20 53 54 41 52 44 55 53 54
col 1; len 6; (6):  01 80 00 a2 00 56
—– end of leaf block dump —–

      
We now notice all the 300 BOWIE entries have now been physically cleaned out of the block as well, cleaned out as part of the block changes required for these final transactions. The leaf block now only contains these 4 index entries, as shown with a kdxconro 4.  The last transaction (inserting “SCREAMING LORD BYRON”) using ITL 3 is the only transaction with its lock byte still set and has its scn/fsc (0×0000.003c42b0) in the block header (scn: 0×0000.003c42b0).

So each concurrent transaction within the index block requires an ITL entry (and Oracle will add them as necessary providing there’s sufficient free space within the block). A transaction will not only make its necessary changes, locking just those index entries associated with the transaction but will also clean out data from previous transactions if present (including index entries marked as deleted by a previous transaction). Finally, it will generally stamp the block header with the corresponding transaction scn.

Hopefully, this highlights how block dumps can be useful to both see and demonstrated Oracle behaviour.

Next, time to look at a number of 11g index related new features …

Index Block Dump: Index Only Section Part II (Station To Station) October 7, 2010

Posted by Richard Foote in Oracle Indexes.
add a comment

Finally, we look at the last portion of the index block dump which refers to the actual 3 index entries in our demo that currently reside within the index leaf block we dumped previously.

row#0[8021] flag: ——, lock: 0, len=15
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 01 48 8a 00 00
row#1[8002] flag: ——, lock: 0, len=19
col 0; len 9; (9):  4d 41 4a 4f 52 20 54 4f 4d
col 1; len 6; (6):  02 01 48 8a 00 02
row#2[7987] flag: ——, lock: 0, len=15
col 0; len 5; (5):  5a 49 47 47 59
col 1; len 6; (6):  02 01 48 8a 00 01
—– end of leaf block dump —–
End dump data blocks tsn: 8 file#: 8 minblk 84234 maxblk 84234

Note this is a Non-Unique index and so the index entries have a specific format that enables each index entry to still be defined in a unique manner. This is effectively achieved by including the rowid as an additional column within each index entry. This is necessary because Oracle still needs to have an efficient mechanism by which it can find any specific index entry as required. Including the rowid as the last column within an index entry ensures all duplicate index entries are subsequently ordered based on the rowid. Oracle can therefore use the index structure to navigate to the specific index entry of interest if for example an index entry needs to be deleted due to the corresponding row in the table being likewise deleted.

Using the first entry as an example, an index entry basically consists of:

row#0 – A unique row number (starting at 0)
[8021] – Location of the index entry starting at this offset within the block (noting that index blocks are filled “bottom-up”)
flag: ——, lock: 0 - two bytes for flags and locking information (to be discussed more fully later)
len=15 – overall length of the index entry
col 0: – A unique column number for the index entry (again starting at 0)
len 5; (5): – Length of the indexed column
42 4f 57 49 45 – Value of the indexed column (note hex representation of its ASCII value). In this specific case, the value ‘BOWIE’.

 
Just briefly, if we look at a dump of these records:

SQL> select name, dump(name), rawtohex(name) from bowie;
NAME       DUMP(NAME)                               RAWTOHEX(NAME)
---------- ---------------------------------------- --------------------
BOWIE      Typ=1 Len=5: 66,79,87,73,69              424F574945
ZIGGY      Typ=1 Len=5: 90,73,71,71,89              5A49474759
MAJOR TOM  Typ=1 Len=9: 77,65,74,79,82,32,84,79,77  4D414A4F5220544F4D
 

 

We can see that “BOWIE” is represented by database character values 66,79,87,73,69. If we convert these to Hex, we get 42 4f 57 49 45 as shown in the block dump.

 
This indexed column data is then repeated for each column within the index entry. The second (and in this demo last) column in this example:

col 1; len 6; (6):  02 01 48 8a 00 00

actually represents the 6 bytes for the rowid. As discussed previously, all index entries are effectively made unique by Oracle by including the rowid as an additional column for non-unique indexes.

The index entry portion is then all repeated for each index entry within the index leaf block.

If we to look at the index entries of a unique index as seen in this simple example:

SQL> create table thin_white_duke as select rownum id, 'BOWIE' name from dual connect by level <= 1000;
Table created.
SQL> create unique index thin_white_duke_i on thin_white_duke(id);
Index created.

 
row#0[8024] flag: ——, lock: 0, len=12, data:(6):  02 01 52 8b 00 1f
col 0; len 3; (3):  c2 06 16
row#1[8012] flag: ——, lock: 0, len=12, data:(6):  02 01 52 8b 00 20
col 0; len 3; (3):  c2 06 17
row#2[8000] flag: ——, lock: 0, len=12, data:(6):  02 01 52 8b 00 21
col 0; len 3; (3):  c2 06 18

 
We notice a subtle difference in the format of the index entries.

Rather than be stored as an additional column, the rowid is simply an overhead attribute of the index entry. The unique index value itself is sufficient for Oracle to navigate the index structure to efficiently locate any specific index entry of interest and as such, there’s no requirement to specifically include the rowid as an additional index column. This also saves Oracle having to store a length byte associated with the rowid column definition.

Next, we’ll put all this together and see how the distinct sections within an Oracle block are impacted during standard DML operations, in order to implement both Oracle row level locking and read consistency.

Follow

Get every new post delivered to your Inbox.

Join 1,250 other followers