jump to navigation

Concatenated Bitmap Indexes Part II (Everybody’s Got To Learn Sometime) May 12, 2010

Posted by Richard Foote in Bitmap Indexes, CBO, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle Indexes.
2 comments

A basic little post to conclude this discussion.

The issues regarding whether to go for single column indexes vs. concatenated indexes are similar for Bitmap indexes as they are for B-Tree indexes.
 
It’s generally more efficient to access a concatenated index as it’s only the one index with less processing and less throwaway rowids/rows to contend with.  However it’s more flexible to have single column indexes, especially for Bitmap indexes that are kinda designed to be used concurrently, as concatenated indexes are heavily dependant on the leading column being known in queries.
 
If we look at the second table from Part I which had the concatenated index being significantly larger than the sum of the single column indexes, we notice that it can still have a part to play with the CBO. When we run a query that references both columns in predicates:

SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4165488265
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    21   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_3_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("ID"=42 AND "CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
         26  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
    

 
The CBO favours the concatenated index with the total number of consistent gets at 103. This despite the fact the concatenated index has some 10,000 distinct entries and is somewhat larger than the sum of the single column indexes. If we now drop the concatenated index and re-run the same query:
 
  
 

SQL> drop index bowie2_3_i;
 
Index dropped.
 
SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2338088592
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    22   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    22   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|   3 |    BITMAP AND                |            |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| BOWIE2_1_I |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("ID"=42)
   5 - access("CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 

 

The CBO can use a BITMAP AND operation by accessing and ANDing the associated bitmap columns from both single column indexes. However this is little less efficient than using the single concatenated index (105 vs 103 consistent gets) even though the concatenated index is somewhat larger than the other 2 indexes combined as Oracle needs to access and process two Bitmap index segments, not one. However as is very common, note in both examples, most of the consistent gets are in relation to getting the 100 rows out of the table, not so much with regard to the indexes themselves.
 
However, it we just reference the CODE column in a predicate:

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2522233487

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            | 10000 |   117K|   489   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     | 10000 |   117K|   489   (1)| 00:00:03 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2861  consistent gets
          0  physical reads
          0  redo size
     257130  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
 

   

Providing it’s cheaper than other alternatives, the single column bitmap index can be considered and used by the CBO. However, if we only had the previous concatenated index:

SQL> drop index bowie2_1_i;

Index dropped.

SQL> drop index bowie2_2_i;

Index dropped.

SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;

Index created.

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1495904576

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 10000 |   117K|   497   (6)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE2 | 10000 |   117K|   497   (6)| 00:00:03 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3011  consistent gets
       2343  physical reads
          0  redo size
     165134  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

 

As the leading column is not specified, the concatenated Bitmap index is ineffective and the CBO decides to use a FTS. So it’s a similar scenario as with B-tree indexes.

A concatenated Bitmap index can potentially use less or more space than corresponding single column Bitmap indexes, it depends on the number of index entries that are derived and the distribution of the data with the table. However regardless, a concatenated Bitmap index can still be a viable alternative if at least the leading column is specified and be the more efficient option if all columns are generally specified, even if the overall size of the index is somewhat greater than the sum of the alternative single column Bitmap indexes. Then again, it’s less flexible and may not be considered if the leading column is not referenced.

If columns are generally all specified in SQL predicates, then combining them all in a single concatenated Bitmap index is a viable option. It all depends. Understanding why it depends is of course important in making the correct decision with regard which way to go with Bitmap indexes …

Concatenated Bitmap Indexes Part I (Two Of Us) May 6, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Clustering Factor, Concatenated Indexes, Oracle Indexes.
6 comments

Although Bitmap Indexes are commonly created on one column, you can create multi-column, concatenated Bitmap indexes as well.
 
Many of the same issues and factors in deciding to create a single, multi-column index vs. several, single column indexes apply to Bitmap indexes as they do with B-Tree indexes, although there are a number of key differences to consider as well.
 
The first difference to note is that a bitmap index doesn’t have as many index entries as there are rows in the table (with not null values), as with B-Tree indexes. A Bitmap index can potentially just have only as many index entries as there are distinct values for the indexed columns. This is one of the main reasons why Bitmap indexes can be considerably smaller than equivalent B-Tree indexes. A newly created Bitmap index only needs to have multiple index entries for the same column value if the associated index entry is greater than 1/2 a block size. If an index entry were to be larger than 1/2 a block size, Oracle creates another Bitmap index “piece” for the same indexed value, with a bitmap column covering a different range of rowids. (Note: additional Bitmap index pieces can be created based on subsequent DML, especially in earlier versions of Oracle. To be discussed at a later point in time).
 
Another thing to note regarding a concatenated Bitmap index is that the potential number of index entries is a product of distinct combinations of data of the indexed columns. For example, if two columns have 100 distinct values each, then as separate Bitmap indexes, they potentially may have as few as 100 index entries each. However, when combined as a concatenated Bitmap index, there may be a minimum of just 100 index entries only if there are just 100 different combinations of data between the columns (ie. there is a 1 to 1 relationship between column values). If there are however say 100 x 100 = 10,000 maximum combinations of data, there will be 10,000 index entries as a minimum in the associated concatenated Bitmap index.
 
A key point though is that if there are many more combinations of data when stored in a concatenated index, the occurrence of each distinct value will be far less within the table, meaning there will be many more “0” (not true) values in the corresponding bitmap column for each index entry and so can be compressed more effectively and likely use substantially less space than a corresponding index entry in a single column Bitmap index.
 
A simple little example to illustrate. To start, I’m going to create a 1M row table that has an ID and a CODE Colum, each with 100 distinct values. 

In this first example, there’s an implicit 1 to 1 relationship between these 2 columns (eg. they always have the same corresponding values) such that there’s also only 100 distinct combinations of ID and CODE. Additionally, the values are distributed evenly throughout the table so the effective clustering of the data in relation to the index is awful.
  

SQL> create table bowie as select mod(rownum,100)+1 id, mod(rownum,100)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_1_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_2_i on bowie(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_3_i on bowie(id,code) pctfree 0;
 
Index created.

  

If we look at the size and characteristics of these indexes we notice a couple of interesting points:  

 
SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE_1_I', 'BOWIE_2_I', 'BOWIE_3_I');

INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE_1_I          200        400
BOWIE_2_I          200        400
BOWIE_3_I          200        400

   
Firstly, even though there are only 100 distinct values for each indexed column or columns, there are actually 400 index entries in these indexes. This means there are on average 4 Bitmap index pieces for each distinct indexed value. Oracle can’t fit the associated index entry for a specific value within 1/2 a block (in this example, the block size is 8k). In fact, it takes approximately two 8K index leaf blocks it fit all the index data for a specific value and it therefore requires 4 Bitmap index pieces per indexed value.
 
If we look at a partial block dump of a leaf block from say the BOWIE_1_I:
 

  
row#0[4089] flag: ------, lock: 0, len=3947
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b5 09 00 60
col 2; len 6; (6):  01 c2 b8 f3 00 3f
col 3; len 3926; (3926):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
... 
   
row#1[142] flag: ------, lock: 0, len=3946
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b8 f3 00 a0
col 2; len 6; (6):  02 03 62 5d 00 1f
col 3; len 3925; (3925):
 01 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7
 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c5 18 61 5d 61 c3 1b 5f 63 5f c1
 ...
 

 

We notice there are actually 2 index entries within the block. Each index entry is for the same indexed value (both col 0 have identical values) but because the associated bitmap column (col 3) is so large (3926 and 3925 byes respectively), we need 4 index entries on average to store the bitmap data for each specific indexed value in order for each piece to not exceed the 1/2 block size limit.
 
Remember, for 100 distinct values in a 1M row table, that’s approximately 10,000 occurrences for each distinct value that need to somehow be mapped within the index. Oracle needs 4 index entries per value to fit the necessary bitmap information with the index such that no single index entry exceeds the 1/2 block size limit.  

The next thing to note is that the size of the concatenated Bitmap index is actually about the same size of each of the individual single column Bitmap index (200 leaf blocks). Therefore, the overall storage required to store the two columns in one Bitmap is only half of that required to store the columns as separate Bitmap indexes.
 
If we look at a partial dump of the concatenated index:

    
row#0[4091] flag: ------, lock: 0, len=3945
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 b5 09 00 60
col 3; len 6; (6):  01 c2 b8 f2 00 57
col 4; len 3921; (3921):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
...
 

 

We notice the overall length of an index entry is practically identical to those of the single column index. The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length. Considering the length of the bitmap column (col 4) is in the order of 3920 bytes for each index entry, an additional 3 bytes here or there is trivial and so doesn’t impact the overall size of the Bitmap index.  

OK, let’s look at a slightly different example. The table is again 1M rows with the overall data being similar. However, I’m making a few subtle differences. Firstly, the data for the ID is actually perfectly clustered and is ordered in exactly the same manner as the index. Additionally, the distribution of data between the columns is such that there are now 100 x 100 = 10,000 distinct combinations of ID and CODE values.
 
  

 
SQL> create table bowie2 (id number, code number, name char(5));
 
Table created.
 
SQL> begin
  2  for i in 1..100 loop
  3    for x in 1..100 loop
  4      for y in 1..100 loop
  5        insert into bowie2 values (x, y, 'BOWIE');
  6      end loop;
  7    end loop;
  8  end loop;
  9  commit;
 10  end;
 11  /
PL/SQL procedure successfully completed.
 

SQL> create bitmap index bowie2_1_i on bowie2(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_2_i on bowie2(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;
 
Index created.

        

If we look at the size and characteristics of the these Bitmap indexes:
     

SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE2_1_I', 'BOWIE2_2_I', 'BOWIE2_3_I');
 
INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE2_1_I          25        100
BOWIE2_2_I         200        400
BOWIE2_3_I         417      10000

    
   
We see a number of key differences when compared to the Bitmap indexes in the first example. Firstly, the Bitmap index for the ID column is tiny, just 25 leaf blocks compared to the 200 leaf blocks required previously. Additionally, there are only 100 index entries, rather than the 400 previous index entries. This is due to the fact the data is perfectly clustered within the table and as such, all the “1”s (true) are all grouped together and all the “0”s (false) are likewise grouped together and can be compressed very efficiently. The overall size of the bitmap has now reduced such that it can all fit comfortably within 1/2 a block and so just the 1 index entry is necessary for each indexed value.
 
If we look at a partial block dump of the ID Bitmap index:

  
   

 
row#0[6218] flag: ------, lock: 0, len=1818
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 c6 16 00 d8
col 2; len 6; (6):  02 03 78 18 01 3f
col 3; len 1797; (1797):
 cf f0 ff ff ff ff ff ff ff cc ff ff ff ff ff fc de 10 80 ff ff ff 07 ff 21
 ff ff ff ff ff ff ff ff c8 ff ff de 10 80 ff ff ff ff ff ff ff cd ff ff ff
 ff ff 07 ff de 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 07 f8 21 07 ff de
 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 3f ff fd 19 c0 ff ff ff ff ff ff
 ff cd ff ff ff ff ff 03 ff de 10 fe ff ff ff ff ff ff ff cc ff ff ff ff 1f
...

 
  

We see that the bitmap column (col 3) is now only 1797 bytes and the overall index entry is 1818 bytes, which comfortably fits within 1/2 an 8K block.
  

The Bitmap index for the CODE remains unchanged because its values are still poorly clustered and distributed throughout the entire table.
 
The concatenated Bitmap index is now significantly larger than is was previously (417 leaf blocks, up from 200 leaf blocks), primarily because we now have 10,000 distinct values to deal with rather than just 100. However, as the occurrences of each of these 10,000 distinct values is so much rarer (only 100 occurrences per distinct combination of values), the associated bitmap column is going to be relatively small and well compressed for each Bitmap index entry.
 
If we look at a complete index entry from a partial block dump of the concatenated Bitmap index:

  
  

 
row#0[7710] flag: ------, lock: 0, len=326
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 c6 16 00 d8
col 3; len 6; (6):  02 03 78 18 00 d7
col 4; len 302; (302):
 04 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c4 d8 10 c4 80 11
 c7 d8 10 c3 f8 19 c3 80 11 c6 d8 10 c1 d9 10 c1 80 11 c5 f7 19 c0 d9 10 c0
 80 11 c3 d8 10 c3 80 11 c2 d0 19 c2 80 11 c5 d8 10 c5 80 11 c0 d9 10 c4 f7
 19 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c1 80 11 c4 d8 10
 c7 d8 10 c0 87 09 c3 d8 10 c3 80 11 c6 d8 10 c6 80 11 c1 d9 10 c5 de 08 c5
 80 11 c0 d9 10 c0 80 11 c3 d8 10 c3 80 11 c0 a8 f4 ec bb 01 c3 d8 10 c6 d8
 10 c6 80 11 c1 d9 10 c1 80 11 c5 de e4 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80
 11 c6 d8 10 c7 86 09 c2 d9 10 c2 80 11 c5 d8 10 c0 d9 10 c0 80 11 c4 de 08
 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c5 d8 10 c6 86 09 c1 d9 10 c1 80 11 c4
 d8 10 c4 80 11 c7 d8 10 c0 87 09 c3 d8 10 c6 d8 10 c6 80 11 c1 d9 10 c1 80
 11 c5 de 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80 11 c6 d8 10 c7 86 09 c2 d9 10
 c2 80 11 c5 d8 10 c0 d9 10 c4 f7 19 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c6
 f7 19

 
  

We notice the index entries are relatively small at just 326 bytes even though we have 2 indexed columns (col 0 and col 1). The size of the bitmap column (col 4) is just 302 bytes, only a fraction of that of the single column Bitmap indexes. With so few 1s (true) to contend with, the resultant bitmap is far more efficiently compressed than with the single Bitmap column indexes as it has far more 0s (false) than can be effectively compressed.

A concatenated Bitmap index can potentially use less or more space than corresponding single column indexes, it depends on the number of index entries that are derived and the distribution of the data with the table.

I’ll post Part II in the next few days.

Index Monitoring and Index Statistics (The Great Gig In The Sky) September 16, 2008

Posted by Richard Foote in 11g, Concatenated Indexes, Extended Statistics, Index Monitoring, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
8 comments

I write this post whilst listening to Pink Floyd’s masterpiece “The Dark Side Of The Moon” while sadly lamenting the passing away of Richard Wright. RIP and thank you for all the great musical gifts you’ve given me over the years.

In my last post I highlighted an example of where Index Monitoring doesn’t show how indeed Oracle does indeed use an index when checking for the existence of Foreign Key values. Thought I might discuss yet another example of where Oracle does indeed use an index but it’s again not picked up by index monitoring, this time with a slight 11g flavour.

This specific example involves using the statistics associated with the indexes to provide the CBO with useful additional information, although the index itself is not used directly within the execution plan. Dropping the index means losing this information which could possibly result in a different, non optimal execution plans.

Prior to 11g, Oracle can have a hard time of accurately determining the correct selectively where there is a correlation between two (or more) columns in a table. By default, Oracle assumes the selectivity of two distinct columns to be the density of both columns multiplied together. So for example, if one column (say “A”) had 10 distinct values and the other column (say “B” also had 10 distinct values, Oracle assumes the selectivity of both columns combined to be 10 x 10 = 100 distinct values. A predicate such as:

WHERE A = 5 and B = 2

would assume 1% of data would be retrieved if both columns A and B both had 10 distinct possible values.

However, what if there’s a special relationship between the columns and the actual number of distinct combinations was somewhat different ? What if B always equals 2 when A equals 5, what if instead of the theoretical 100 different combinations there were only 10 combinations (or some such) because most of the other possible combination don’t actually exist …

This 9i and 10g demo shows how there is indeed only 10 distinct values for each of two different columns, however there is a direct relationship between these columns such that there is actually only 10 distinct combinations of both these columns (and not the 100 combinations which are possible and which Oracle assumes in it’s selectivity calculations).

Instead of the actual 10,000 rows (or 10% of all data) being selected, Oracle is incorrectly assuming only 1000 rows (or 1%) will be selected. This is a significant error by a order of magnitude which in many cases can result in a less efficient execution plan.

With 11g, Oracle can use the statistics associated with an index to give Oracle some vital extra information. Because if there’s an index based on the two columns, then the number of distinct key values recorded for the index can provide Oracle with a much more accurate estimation of the true selectivity based on the two columns. If a concatenated index based on the two columns only has say 10 distinct values, then Oracle can assume that a specific combination of the two columns is likely to also retrieve 1/10 of all the values and not the 1/100 that are theoretically possible.

This identical 11g demo to the one above shows by having an index on the two columns that have a correlation, Oracle is using the DISTINCT_KEYS statistic for the index to determine the correct selectivity and associated cardinality for the query.

However, the demo clearly shows index monitoring is still not showing the index as being “USED”. If you were to hence drop the index, the CBO loses potentially vital information and the cardinality estimates revert back to being the product of the two column densities as with pre 11g.

By dropping the index which appears to not be used, we can potentially impact other execution plans, even though they don’t directly use the index within the execution plan. The correct cardinality estimates of a table can for example potential drive the order in which the table is subsequently joined or the manner in which it’s joined.

This demo on the possible impact of dropping an “unused” index shows how an execution plan can change to be sub-optimal, even though neither execution actually directly uses the associated index. It’s not a particularly “clever” example, but it does illustrate the potential impact of dropping these so called unused indexes.

Of course, with 11g, we now have the capability of collecting extended statistics. We can potentially determine these same level of statistics by generating statistics on both columns combined. Oracle can determine the actual distinct combinations of columns that are somehow correlated and produce more accurate and detailed statistics with hence make the CBO determine more accurate and reliable cardinality estimates.

This final demo on extended statistics shows how we can recreate the more efficient execution plan and provide the CBO with more detailed extended statistics so that it can accurately determine the correct cardinality estimates without the need to recreate the “unused” index.

Extended statistics can be extremely useful in determining correct cardinality values for column combinations that exist however it still falls somewhat short when it attempts to estimate the expected cardinality for combinations that don’t actually exist, even with histograms.

But that’s a tale for another day.

Now it’s time for “Wish You Were Here” …

Clustering Factor: A Consideration in Concatenated Index Leading Column Decision (Sweet Thing) February 15, 2008

Posted by Richard Foote in Clustering Factor, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
14 comments

Short but sweet today.

I last discussed how high cardinality columns shouldn’t necessarily be in the leading column of a concatenated index as  they don’t provide the performance benefit as sometimes claimed.

If all things are equal and the columns in the concatenated index are all likely to be referenced, a simple consideration that is often forgotten when deciding which column to have as the leading index column is the Clustering Factor of the corresponding columns.

As previously discussed, the Clustering Factor  determines how well aligned or ordered the index entries are in relation to the rows in the parent table. So if the rows are ordered within the table on a particular column or columns (such as a sequential ID column, a monotonically increasing date or time-stamp, etc), then an index on these columns is likely to have a very good Clustering Factor. Consequently less IOs will be required to retrieve all the required rows via the index as all the required rows will be housed in relatively few, well clustered data blocks.

It therefore makes sense to at least consider the Clustering Factor of the various columns in a concatenated index. Why ? Because if the leading column has a very good Clustering Factor, the concatenated index by definition must also have a very good Clustering Factor as all indexes are primarily ordered based on the leading indexed column. A concatenated index with a good Clustering Factor is going to be more efficient in retrieving rows from the table and more importantly, will be considered more desirably by the CBO when costing access path options.

Of course, the opposite is also true. By having a leading column with a poor Clustering Factor will mean the concatenated index will have a poor Clustering Factor, making it less efficient and less likely to be considered by the CBO.

As such, the Clustering Factor of each corresponding column in a concatenated index is at least worthy of some consideration when making the decision on how best to order the indexed columns.

This demo on Index Column Order and Clustering Factor  shows how the order of columns in a concatenated index has a big impact on the Clustering Factor of the resultant index.

UPDATE: However as Tom Kyte has stated in the comments, in virtually all cases, the Clustering Factor is not really a factor (yes, pun fully intended) as subsequently columns are generally going to impact the CF anyways or the selectivity of the index is such that the improved CF is not relevant anyways.

More relevant considerations regarding the ordering of columns in an index coming I promise :)

It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ? February 13, 2008

Posted by Richard Foote in Concatenated Indexes, Index Internals, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning.
21 comments

A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.

For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).

The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.

Or so the theory goes …

In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.

The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.

The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.

This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.

So the high cardinality column shouldn’t necessarily be the column given leading column status.

In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.

Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.

All good discussions for another day :)

Do ROWID Index Row Entry Columns Impact Index Block Splits ? December 20, 2007

Posted by Richard Foote in Concatenated Indexes, Index Block Splits, Index Internals, Oracle Indexes, Richard's Musings, ROWID.
11 comments

Based on a great question by Alberto Dell’Era  in my “Differences Between Unique/Non-Unique” blog entry (comment 9), I thought it might be a useful exercise to show how I go about confirming my understanding of a specific concept by trying to develop a little test case or demo that can illustrate the concept. My “magic incarnation” if you like ;)

The basic question was does the ROWID that constitutes an additional column in a Non-Unique index determine whether a particular row entry is the maximum or equivalent entry or not. Because by implication, this can determine and influence whether Oracle performs the generally preferred 90-10 splits rather than 50-50 block splits for indexed column values that at least equal the maximum value.

The answer is yes because the ROWID column is just another column in the index row entry and is simply treated the same. But how to actually “illustrate” and show this ?

I needed a way therefore to insert a ROWID that was always going to be the maximum ROWID value for a Non-Unique index. Then insert a whole bunch of subsequent ROWIDs of a lesser value than the maximum and inspect via index statistics whether the type of block splits changed from 90-10 to 50-50 block splits. Remember with the Object Number being equal (if it’s there at all), the next significant portion of the ROWID is the Relative File Number.

The plan was (reasonably) simple. Create a tablespace with one data file and fill it with something. Then add a second data file and use this to store the start of my table of interest (and of course create the index). This will create a whole bunch of rows with ROWIDs of a higher Relative File Number than those in the first data file. Then drop the first table and ensure the second table uses the free space created in the first data file. That way, a whole bunch of ROWIDs can be created that are less than existing ROWIDs because it would be using ROWIDs from the first data file, which has a lesser Relative File Number.

It’s the usual process I go through with these things. Find something that’s of interest, have some idea on how I think things work, come up with plans or strategies that will illustrate whether or not what I think is true (ensuring that somewhere in the process I include at least one reference to David Bowie ;). I can then later take the initial strategies and expand them for all applicable database options and features. Then see if anything changes between database versions and platforms.

Hopefully this demo shows you how I went about proving this: Do ROWID Index Row Entry Columns Impact Index Block Splits Demo.

The benefit of then showing these demos is that others can see exactly how I came to a conclusion, potentially try them out for oneself and perhaps see holes or flaws or shortfalls in the strategy or expand or tailor them for individual requirements or environments.

Follow

Get every new post delivered to your Inbox.

Join 1,894 other followers