jump to navigation

Index Advanced Compression: Multi-Column Index Part II (Blow Out) September 24, 2015

Posted by Richard Foote in Advanced Index Compression, Concatenated Indexes, Index Column Order, Index Rebuild, Oracle Indexes.
add a comment

I previously discussed how Index Advanced Compression can automatically determine not only the correct number of columns to compress, but also the correct number of columns to compress within specific leaf blocks of the index.

However, this doesn’t mean we can just order the columns within the index without due consideration from a “compression” perspective. As I’ve discussed previously, the column order within an index can be very important (especially with regard the use of the index if the leading column of the index is not specified in the SQL), including with regard to the possible compression capabilities of an index.

Advanced Index Compression does not change this and if we order columns inappropriately, one of the consequences can be the index simply can’t be compressed.

To illustrate, back to my simple little example:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit;

Commit complete.

But this time, I’m going to create the index with the columns the other way around than I had in the previous post. The effectively unique ID column is now the leading column in the index, followed by the CODE column that indeed has many duplicate values. There is a “myth” that suggests this is actually a more “efficient” way to order an index, put the column with most distinct values first in the index. This is of course not true (yes, I’ve covered this one before as well).

SQL> create index bowie_idx on bowie(id, code) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index without compression has 2363 leaf blocks.

As the leading column is effectively unique, we simply can’t now compress this index effectively. That’s because compression requires there to be duplicate index entries starting with at least the leading column. If the leading column has few (or no) duplicates, then by compressing the index Oracle is effectively creating a prefixed entry within the leaf block for each and every index entry. The whole point of index (or table) compression is to effectively de-duplicate the index values but there’s nothing to de-duplicate if there are no repeating values in at least the leading column of the index.

If we attempt to just compress fully the index anyways:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

It actually results in a bigger, not smaller index. The leaf blocks has gone up from 2363 to 3115.

Unlike the previous post where the columns in the index were the other way around, if we attempt to just compress the first column, it makes no difference to the inefficiency of the index compression because the number of prefix entries we create remains exactly the same:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

So the index remains at the higher 3115 leaf blocks.

The good thing with Advanced Index Compression is that we can “give it a go”, but it will not result in a larger index structure. If there’s nothing to compress within a leaf block, Oracle just ignores it and moves on to the next leaf block. If there’s nothing to compress at all within the index, the index remains the same as if it’s not been compressed:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index is now back to 2363 leaf blocks, the same as if it wasn’t compressed at all. No it hasn’t helped, but at least it hasn’t made things worse.

So the order of the columns still plays a vital part in the “compress-ability” of the index, even with Index Advanced Compression at your disposal. If both the ID and CODE columns are referenced in your code, then having CODE as the leading column of the index would both improve the manner in which the index can be compressed and make a Skip-Scan index scan viable in the case when the CODE column might not occasionally be specified.

Now, if we change the leading column and create some duplicates (in this case, we update about 10% of the rows to now have duplicate values in the leading ID column):

SQL> update bowie set id=42 where id between 442000 and 542000;

100001 rows updated.

SQL> commit;

Commit complete.

SQL> alter index bowie_idx rebuild nocompress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2338          2

With a whole bunch of IDs with a value of 42, the non-compressed index now has 2338 leaf blocks. Yes, 10% of the leading columns have duplicates, but 90% of the index doesn’t and remains effectively unique. So if we try and compress this index now:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2941          2

The compressed index now has 2941 leaf blocks and is still larger than the 2338 non-compressed index. Yes, it’s compressed the 10% of the index that it could, but the inefficiencies in dealing with the other 90% has resulted in an overall larger index. So not too good really.

Again, compressing just the leading ID column doesn’t improve matters:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2977          2

In fact, at 2977 it’s even worse than compressing all the index because by compressing both columns, we could also effectively compress the duplicate CODE columns as well within that 10% of the index where we had duplicate ID values. With compressing just the ID column, we don’t get the benefit of compressing the duplicate CODE values. So not very good either.

In either case, compressing the index is ineffective as we end up with a bigger, not smaller index.

But not with Index Advanced Compression:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2265          2

We now have a index structure at just 2265 leaf blocks that is indeed smaller than the non-compressed index (2338 leaf blocks) because Oracle can now compress just the 10% of index where compression is effective and just ignore the rest of the index (90%) where compression is ineffective.

The best of both worlds, where Index Advanced Compression can compress just the part of an index where it effectively can and ignore and not make matters worse in any parts of the index where index compression is ineffective.

An indexing no-brainer …

Index Advanced Compression: Multi-Column Index Part I (There There) September 17, 2015

Posted by Richard Foote in 12c, Advanced Index Compression, Concatenated Indexes, Index Rebuild, Oracle Indexes.
add a comment

I’ve discussed Index Advanced Compression here a number of times previously. It’s the really cool additional capability introduced to the Advanced Compression Option with 12.1.0.2, that not only makes compressing indexes a much easier exercise but also enables indexes to be compressed more effectively than previously possible.

Thought I might look at a multi-column index to highlight just how truly cool this new feature is in automatically managing the compression of indexes.

First, let’s create a little table and multi-column index:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(code, id) pctfree 0;

Index created.

OK, the key thing to note here is that the leading CODE column in the index only has 10 distinct values and so is repeated very frequently. However, the second ID column is effectively unique such that the index entry overall is also likewise effectively unique. I’ve created this index initially with no compression, but with a PCTFREE 0 to make the non-compressed index as small as possible.

If we look at the size of the index:

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2361          2

We notice the index currently has 2361 leaf blocks.

I’ve previously discussed how index compression basically de-duplicates the indexed values by storing them in a pre-fixed table within the index leaf block. These pre-fixed entries are them referenced in the actual index entries, meaning it’s only now necessary to store repeated values once within a leaf block. Only repeated index values within an index leaf block can therefore be effectively compressed.

In this example, it would be pointless in compressing both indexed columns as this would only result in a unique pre-fixed entry for each any every index entry, given that the ID column is unique. In fact, the overhead of having the pre-fixed table for each and every index entry would actually result in a larger, not small overall index structure.

To show how compressing the whole index would be a really dumb idea for this particular index:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3120          2

The COMPRESS option basically compresses the whole index and we note that rather than creating a smaller, compressed index structure, the index is in fact bigger at 3120 leaf blocks.

However, as the leading CODE column in the index only has 10 distinct values and so is heavily repeated, it would make sense to just compress this first CODE column only in the index. This of course requires us to fully understand the data associated with the index.

We can do this by specifying just how many leading columns to compress (in this case just 1):

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2002          2

We note the index is indeed smaller than it was originally, now at just 2002 leaf blocks.

So this requires us to make the correct decision in how many columns in the index to compress. Getting this wrong can result in a worse, not better overall index structure.

Now with Advanced Index Compression, we don’t have to make this decision, we can simply let Oracle do it for us. As discussed previously, Oracle can go through each leaf block and decide how to best compress each leaf block. In this case, it can automatically determine that it’s only beneficial to compress the CODE column throughout the index.

If we compress this index with the new COMPRESS ADVANCED LOW clause:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2002          2

We note we get the index at the nice, small 2002 leaf blocks, as if we used the correct COMPRESS 1 decision.

However, the story gets a little better than this …

Let’s now modify the contents of the table so that we create some duplicates also for the second ID column:

SQL> update bowie set id=42 where id between 442000 and 542000;

100001 rows updated.

SQL> commit;

Commit complete.

OK, so for about 10% of rows, the ID column value is indeed repeated with the value 42. However, for the remaining 90% of rows (and hence index entries), the ID column remains effectively unique. So we have this 10% section of the index where ID is indeed heavily repeated with the value 42, but everywhere else within the index the ID remain unique.

If we rebuild this index again with no compression:

SQL> alter index bowie_idx rebuild nocompress pctfree 0;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2336          2

We now end up with 2336 leaf blocks (a little smaller than before the update as we’re replacing 10% of the IDs with a smaller value of just 42).

However, the vast majority (90%) of the index entries are still unique, so attempting to compress the entire index is again unlikely to be beneficial:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2946          2

Indeed, the index is again now bigger at 2946 than it was when it wasn’t compressed.

We can again effectively compress just the CODE column in the index:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         1977          2

OK, just compressing the CODE column has indeed resulted in a smaller index structure (just 1977 leaf blocks) as it did before.

Without Advanced Index Compression we have the option to not compress the index (the result is average), compress both columns (the result is worse) or compress just the leading column (the result is better). It’s an all or nothing approach to index compression with the best method decided at the overall index level.

We don’t have the option to compress just the leading column when it makes sense to do so, but to also compress both columns in just the 10% portion of the index where it also makes sense to do so (when we have lots of repeating 42 values for ID).

We do have this option though with Advanced Index Compression and indeed this is performed automatically by Oracle in just those leaf blocks where it’s beneficial because the decision on how to compress an index is not performed at the overall index level but at the leaf block level. As such, Advanced Index Compression has the potential to compress an index in a manner that was simply not possible previously:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         1941          2

We notice the index is now even smaller at just 1941 leaf blocks than it was when just compressing the leading column as we now also compress the CODE column in just that 10% of the table where we also had repeating ID values.

I can’t emphasise enough just how cool this feature is !!

In fact, I would recommend something I don’t usually recommend and that is rebuilding all your indexes at least once (where you know the leading column has some repeated values) with the Advanced Index Compression option, so that all indexes can be compressed to their optimal manner.

Note though that this does require the Advanced Compression Option !!

More later🙂

Concatenated Bitmap Indexes Part II (Everybody’s Got To Learn Sometime) May 12, 2010

Posted by Richard Foote in Bitmap Indexes, CBO, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle Indexes.
2 comments

A basic little post to conclude this discussion.

The issues regarding whether to go for single column indexes vs. concatenated indexes are similar for Bitmap indexes as they are for B-Tree indexes.
 
It’s generally more efficient to access a concatenated index as it’s only the one index with less processing and less throwaway rowids/rows to contend with.  However it’s more flexible to have single column indexes, especially for Bitmap indexes that are kinda designed to be used concurrently, as concatenated indexes are heavily dependant on the leading column being known in queries.
 
If we look at the second table from Part I which had the concatenated index being significantly larger than the sum of the single column indexes, we notice that it can still have a part to play with the CBO. When we run a query that references both columns in predicates:

SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4165488265
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    21   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_3_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("ID"=42 AND "CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
         26  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
    

 
The CBO favours the concatenated index with the total number of consistent gets at 103. This despite the fact the concatenated index has some 10,000 distinct entries and is somewhat larger than the sum of the single column indexes. If we now drop the concatenated index and re-run the same query:
 
  
 

SQL> drop index bowie2_3_i;
 
Index dropped.
 
SQL> select * from bowie2 where id = 42 and code = 42;
 
100 rows selected.
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2338088592
 
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100 |  1200 |    22   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     |   100 |  1200 |    22   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|   3 |    BITMAP AND                |            |       |       |            |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| BOWIE2_1_I |       |       |            |          |
|*  5 |     BITMAP INDEX SINGLE VALUE| BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("ID"=42)
   5 - access("CODE"=42)
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        105  consistent gets
          0  physical reads
          0  redo size
       3030  bytes sent via SQL*Net to client
        482  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
 

 

The CBO can use a BITMAP AND operation by accessing and ANDing the associated bitmap columns from both single column indexes. However this is little less efficient than using the single concatenated index (105 vs 103 consistent gets) even though the concatenated index is somewhat larger than the other 2 indexes combined as Oracle needs to access and process two Bitmap index segments, not one. However as is very common, note in both examples, most of the consistent gets are in relation to getting the 100 rows out of the table, not so much with regard to the indexes themselves.
 
However, it we just reference the CODE column in a predicate:

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2522233487

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            | 10000 |   117K|   489   (1)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE2     | 10000 |   117K|   489   (1)| 00:00:03 |
|   2 |   BITMAP CONVERSION TO ROWIDS|            |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE | BOWIE2_2_I |       |       |            |          |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2861  consistent gets
          0  physical reads
          0  redo size
     257130  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed
 

   

Providing it’s cheaper than other alternatives, the single column bitmap index can be considered and used by the CBO. However, if we only had the previous concatenated index:

SQL> drop index bowie2_1_i;

Index dropped.

SQL> drop index bowie2_2_i;

Index dropped.

SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;

Index created.

SQL> select * from bowie2 where code = 42;

10000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1495904576

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 10000 |   117K|   497   (6)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| BOWIE2 | 10000 |   117K|   497   (6)| 00:00:03 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3011  consistent gets
       2343  physical reads
          0  redo size
     165134  bytes sent via SQL*Net to client
       7742  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10000  rows processed

 

As the leading column is not specified, the concatenated Bitmap index is ineffective and the CBO decides to use a FTS. So it’s a similar scenario as with B-tree indexes.

A concatenated Bitmap index can potentially use less or more space than corresponding single column Bitmap indexes, it depends on the number of index entries that are derived and the distribution of the data with the table. However regardless, a concatenated Bitmap index can still be a viable alternative if at least the leading column is specified and be the more efficient option if all columns are generally specified, even if the overall size of the index is somewhat greater than the sum of the alternative single column Bitmap indexes. Then again, it’s less flexible and may not be considered if the leading column is not referenced.

If columns are generally all specified in SQL predicates, then combining them all in a single concatenated Bitmap index is a viable option. It all depends. Understanding why it depends is of course important in making the correct decision with regard which way to go with Bitmap indexes …

Concatenated Bitmap Indexes Part I (Two Of Us) May 6, 2010

Posted by Richard Foote in Bitmap Indexes, Block Dumps, Clustering Factor, Concatenated Indexes, Oracle Indexes.
6 comments

Although Bitmap Indexes are commonly created on one column, you can create multi-column, concatenated Bitmap indexes as well.
 
Many of the same issues and factors in deciding to create a single, multi-column index vs. several, single column indexes apply to Bitmap indexes as they do with B-Tree indexes, although there are a number of key differences to consider as well.
 
The first difference to note is that a bitmap index doesn’t have as many index entries as there are rows in the table (with not null values), as with B-Tree indexes. A Bitmap index can potentially just have only as many index entries as there are distinct values for the indexed columns. This is one of the main reasons why Bitmap indexes can be considerably smaller than equivalent B-Tree indexes. A newly created Bitmap index only needs to have multiple index entries for the same column value if the associated index entry is greater than 1/2 a block size. If an index entry were to be larger than 1/2 a block size, Oracle creates another Bitmap index “piece” for the same indexed value, with a bitmap column covering a different range of rowids. (Note: additional Bitmap index pieces can be created based on subsequent DML, especially in earlier versions of Oracle. To be discussed at a later point in time).
 
Another thing to note regarding a concatenated Bitmap index is that the potential number of index entries is a product of distinct combinations of data of the indexed columns. For example, if two columns have 100 distinct values each, then as separate Bitmap indexes, they potentially may have as few as 100 index entries each. However, when combined as a concatenated Bitmap index, there may be a minimum of just 100 index entries only if there are just 100 different combinations of data between the columns (ie. there is a 1 to 1 relationship between column values). If there are however say 100 x 100 = 10,000 maximum combinations of data, there will be 10,000 index entries as a minimum in the associated concatenated Bitmap index.
 
A key point though is that if there are many more combinations of data when stored in a concatenated index, the occurrence of each distinct value will be far less within the table, meaning there will be many more “0” (not true) values in the corresponding bitmap column for each index entry and so can be compressed more effectively and likely use substantially less space than a corresponding index entry in a single column Bitmap index.
 
A simple little example to illustrate. To start, I’m going to create a 1M row table that has an ID and a CODE Colum, each with 100 distinct values. 

In this first example, there’s an implicit 1 to 1 relationship between these 2 columns (eg. they always have the same corresponding values) such that there’s also only 100 distinct combinations of ID and CODE. Additionally, the values are distributed evenly throughout the table so the effective clustering of the data in relation to the index is awful.
  

SQL> create table bowie as select mod(rownum,100)+1 id, mod(rownum,100)+1 code,'BOWIE' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_1_i on bowie(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_2_i on bowie(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie_3_i on bowie(id,code) pctfree 0;
 
Index created.

  

If we look at the size and characteristics of these indexes we notice a couple of interesting points:  

 
SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE_1_I', 'BOWIE_2_I', 'BOWIE_3_I');

INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE_1_I          200        400
BOWIE_2_I          200        400
BOWIE_3_I          200        400

   
Firstly, even though there are only 100 distinct values for each indexed column or columns, there are actually 400 index entries in these indexes. This means there are on average 4 Bitmap index pieces for each distinct indexed value. Oracle can’t fit the associated index entry for a specific value within 1/2 a block (in this example, the block size is 8k). In fact, it takes approximately two 8K index leaf blocks it fit all the index data for a specific value and it therefore requires 4 Bitmap index pieces per indexed value.
 
If we look at a partial block dump of a leaf block from say the BOWIE_1_I:
 

  
row#0[4089] flag: ------, lock: 0, len=3947
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b5 09 00 60
col 2; len 6; (6):  01 c2 b8 f3 00 3f
col 3; len 3926; (3926):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
... 
   
row#1[142] flag: ------, lock: 0, len=3946
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 b8 f3 00 a0
col 2; len 6; (6):  02 03 62 5d 00 1f
col 3; len 3925; (3925):
 01 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7
 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c5 18 61 5d 61 c3 1b 5f 63 5f c1
 ...
 

 

We notice there are actually 2 index entries within the block. Each index entry is for the same indexed value (both col 0 have identical values) but because the associated bitmap column (col 3) is so large (3926 and 3925 byes respectively), we need 4 index entries on average to store the bitmap data for each specific indexed value in order for each piece to not exceed the 1/2 block size limit.
 
Remember, for 100 distinct values in a 1M row table, that’s approximately 10,000 occurrences for each distinct value that need to somehow be mapped within the index. Oracle needs 4 index entries per value to fit the necessary bitmap information with the index such that no single index entry exceeds the 1/2 block size limit.  

The next thing to note is that the size of the concatenated Bitmap index is actually about the same size of each of the individual single column Bitmap index (200 leaf blocks). Therefore, the overall storage required to store the two columns in one Bitmap is only half of that required to store the columns as separate Bitmap indexes.
 
If we look at a partial dump of the concatenated index:

    
row#0[4091] flag: ------, lock: 0, len=3945
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 b5 09 00 60
col 3; len 6; (6):  01 c2 b8 f2 00 57
col 4; len 3921; (3921):
 01 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61
 c3 1b 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f
 63 c5 1b 61 5d 61 c3 1b 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d
 61 5d c7 1b 63 5f 63 5f c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 c3 1b
 5f 63 5f 63 c5 1b 61 5d 61 c3 1b 5f 63 5f c1 1c 5d 61 5d 61 c3 1b 5f 63 5f
 c1 1c 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d 61 5d c7 1b 63 5f 63 c5 1b 61 5d
...
 

 

We notice the overall length of an index entry is practically identical to those of the single column index. The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length. Considering the length of the bitmap column (col 4) is in the order of 3920 bytes for each index entry, an additional 3 bytes here or there is trivial and so doesn’t impact the overall size of the Bitmap index.  

OK, let’s look at a slightly different example. The table is again 1M rows with the overall data being similar. However, I’m making a few subtle differences. Firstly, the data for the ID is actually perfectly clustered and is ordered in exactly the same manner as the index. Additionally, the distribution of data between the columns is such that there are now 100 x 100 = 10,000 distinct combinations of ID and CODE values.
 
  

 
SQL> create table bowie2 (id number, code number, name char(5));
 
Table created.
 
SQL> begin
  2  for i in 1..100 loop
  3    for x in 1..100 loop
  4      for y in 1..100 loop
  5        insert into bowie2 values (x, y, 'BOWIE');
  6      end loop;
  7    end loop;
  8  end loop;
  9  commit;
 10  end;
 11  /
PL/SQL procedure successfully completed.
 

SQL> create bitmap index bowie2_1_i on bowie2(id) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_2_i on bowie2(code) pctfree 0;
 
Index created.
 
SQL> create bitmap index bowie2_3_i on bowie2(id,code) pctfree 0;
 
Index created.

        

If we look at the size and characteristics of the these Bitmap indexes:
     

SQL> select index_name,leaf_blocks,num_rows from dba_indexes where index_name in ('BOWIE2_1_I', 'BOWIE2_2_I', 'BOWIE2_3_I');
 
INDEX_NAME LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
BOWIE2_1_I          25        100
BOWIE2_2_I         200        400
BOWIE2_3_I         417      10000

    
   
We see a number of key differences when compared to the Bitmap indexes in the first example. Firstly, the Bitmap index for the ID column is tiny, just 25 leaf blocks compared to the 200 leaf blocks required previously. Additionally, there are only 100 index entries, rather than the 400 previous index entries. This is due to the fact the data is perfectly clustered within the table and as such, all the “1”s (true) are all grouped together and all the “0”s (false) are likewise grouped together and can be compressed very efficiently. The overall size of the bitmap has now reduced such that it can all fit comfortably within 1/2 a block and so just the 1 index entry is necessary for each indexed value.
 
If we look at a partial block dump of the ID Bitmap index:

  
   

 
row#0[6218] flag: ------, lock: 0, len=1818
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c2 c6 16 00 d8
col 2; len 6; (6):  02 03 78 18 01 3f
col 3; len 1797; (1797):
 cf f0 ff ff ff ff ff ff ff cc ff ff ff ff ff fc de 10 80 ff ff ff 07 ff 21
 ff ff ff ff ff ff ff ff c8 ff ff de 10 80 ff ff ff ff ff ff ff cd ff ff ff
 ff ff 07 ff de 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 07 f8 21 07 ff de
 10 fc ff ff ff ff ff ff ff cc ff ff ff ff 3f ff fd 19 c0 ff ff ff ff ff ff
 ff cd ff ff ff ff ff 03 ff de 10 fe ff ff ff ff ff ff ff cc ff ff ff ff 1f
...

 
  

We see that the bitmap column (col 3) is now only 1797 bytes and the overall index entry is 1818 bytes, which comfortably fits within 1/2 an 8K block.
  

The Bitmap index for the CODE remains unchanged because its values are still poorly clustered and distributed throughout the entire table.
 
The concatenated Bitmap index is now significantly larger than is was previously (417 leaf blocks, up from 200 leaf blocks), primarily because we now have 10,000 distinct values to deal with rather than just 100. However, as the occurrences of each of these 10,000 distinct values is so much rarer (only 100 occurrences per distinct combination of values), the associated bitmap column is going to be relatively small and well compressed for each Bitmap index entry.
 
If we look at a complete index entry from a partial block dump of the concatenated Bitmap index:

  
  

 
row#0[7710] flag: ------, lock: 0, len=326
col 0; len 2; (2):  c1 02
col 1; len 2; (2):  c1 02
col 2; len 6; (6):  01 c2 c6 16 00 d8
col 3; len 6; (6):  02 03 78 18 00 d7
col 4; len 302; (302):
 04 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c4 d8 10 c4 80 11
 c7 d8 10 c3 f8 19 c3 80 11 c6 d8 10 c1 d9 10 c1 80 11 c5 f7 19 c0 d9 10 c0
 80 11 c3 d8 10 c3 80 11 c2 d0 19 c2 80 11 c5 d8 10 c5 80 11 c0 d9 10 c4 f7
 19 c7 d8 10 c7 80 11 c2 d9 10 c2 80 11 c6 f7 19 c1 d9 10 c1 80 11 c4 d8 10
 c7 d8 10 c0 87 09 c3 d8 10 c3 80 11 c6 d8 10 c6 80 11 c1 d9 10 c5 de 08 c5
 80 11 c0 d9 10 c0 80 11 c3 d8 10 c3 80 11 c0 a8 f4 ec bb 01 c3 d8 10 c6 d8
 10 c6 80 11 c1 d9 10 c1 80 11 c5 de e4 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80
 11 c6 d8 10 c7 86 09 c2 d9 10 c2 80 11 c5 d8 10 c0 d9 10 c0 80 11 c4 de 08
 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c5 d8 10 c6 86 09 c1 d9 10 c1 80 11 c4
 d8 10 c4 80 11 c7 d8 10 c0 87 09 c3 d8 10 c6 d8 10 c6 80 11 c1 d9 10 c1 80
 11 c5 de 08 c5 80 11 c0 d9 10 c3 d8 10 c3 80 11 c6 d8 10 c7 86 09 c2 d9 10
 c2 80 11 c5 d8 10 c0 d9 10 c4 f7 19 c4 80 11 c7 d8 10 c7 80 11 c2 d9 10 c6
 f7 19

 
  

We notice the index entries are relatively small at just 326 bytes even though we have 2 indexed columns (col 0 and col 1). The size of the bitmap column (col 4) is just 302 bytes, only a fraction of that of the single column Bitmap indexes. With so few 1s (true) to contend with, the resultant bitmap is far more efficiently compressed than with the single Bitmap column indexes as it has far more 0s (false) than can be effectively compressed.

A concatenated Bitmap index can potentially use less or more space than corresponding single column indexes, it depends on the number of index entries that are derived and the distribution of the data with the table.

I’ll post Part II in the next few days.

Index Monitoring and Index Statistics (The Great Gig In The Sky) September 16, 2008

Posted by Richard Foote in 11g, Concatenated Indexes, Extended Statistics, Index Monitoring, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes.
8 comments

I write this post whilst listening to Pink Floyd’s masterpiece “The Dark Side Of The Moon” while sadly lamenting the passing away of Richard Wright. RIP and thank you for all the great musical gifts you’ve given me over the years.

In my last post I highlighted an example of where Index Monitoring doesn’t show how indeed Oracle does indeed use an index when checking for the existence of Foreign Key values. Thought I might discuss yet another example of where Oracle does indeed use an index but it’s again not picked up by index monitoring, this time with a slight 11g flavour.

This specific example involves using the statistics associated with the indexes to provide the CBO with useful additional information, although the index itself is not used directly within the execution plan. Dropping the index means losing this information which could possibly result in a different, non optimal execution plans.

Prior to 11g, Oracle can have a hard time of accurately determining the correct selectively where there is a correlation between two (or more) columns in a table. By default, Oracle assumes the selectivity of two distinct columns to be the density of both columns multiplied together. So for example, if one column (say “A”) had 10 distinct values and the other column (say “B” also had 10 distinct values, Oracle assumes the selectivity of both columns combined to be 10 x 10 = 100 distinct values. A predicate such as:

WHERE A = 5 and B = 2

would assume 1% of data would be retrieved if both columns A and B both had 10 distinct possible values.

However, what if there’s a special relationship between the columns and the actual number of distinct combinations was somewhat different ? What if B always equals 2 when A equals 5, what if instead of the theoretical 100 different combinations there were only 10 combinations (or some such) because most of the other possible combination don’t actually exist …

This 9i and 10g demo shows how there is indeed only 10 distinct values for each of two different columns, however there is a direct relationship between these columns such that there is actually only 10 distinct combinations of both these columns (and not the 100 combinations which are possible and which Oracle assumes in it’s selectivity calculations).

Instead of the actual 10,000 rows (or 10% of all data) being selected, Oracle is incorrectly assuming only 1000 rows (or 1%) will be selected. This is a significant error by a order of magnitude which in many cases can result in a less efficient execution plan.

With 11g, Oracle can use the statistics associated with an index to give Oracle some vital extra information. Because if there’s an index based on the two columns, then the number of distinct key values recorded for the index can provide Oracle with a much more accurate estimation of the true selectivity based on the two columns. If a concatenated index based on the two columns only has say 10 distinct values, then Oracle can assume that a specific combination of the two columns is likely to also retrieve 1/10 of all the values and not the 1/100 that are theoretically possible.

This identical 11g demo to the one above shows by having an index on the two columns that have a correlation, Oracle is using the DISTINCT_KEYS statistic for the index to determine the correct selectivity and associated cardinality for the query.

However, the demo clearly shows index monitoring is still not showing the index as being “USED”. If you were to hence drop the index, the CBO loses potentially vital information and the cardinality estimates revert back to being the product of the two column densities as with pre 11g.

By dropping the index which appears to not be used, we can potentially impact other execution plans, even though they don’t directly use the index within the execution plan. The correct cardinality estimates of a table can for example potential drive the order in which the table is subsequently joined or the manner in which it’s joined.

This demo on the possible impact of dropping an “unused” index shows how an execution plan can change to be sub-optimal, even though neither execution actually directly uses the associated index. It’s not a particularly “clever” example, but it does illustrate the potential impact of dropping these so called unused indexes.

Of course, with 11g, we now have the capability of collecting extended statistics. We can potentially determine these same level of statistics by generating statistics on both columns combined. Oracle can determine the actual distinct combinations of columns that are somehow correlated and produce more accurate and detailed statistics with hence make the CBO determine more accurate and reliable cardinality estimates.

This final demo on extended statistics shows how we can recreate the more efficient execution plan and provide the CBO with more detailed extended statistics so that it can accurately determine the correct cardinality estimates without the need to recreate the “unused” index.

Extended statistics can be extremely useful in determining correct cardinality values for column combinations that exist however it still falls somewhat short when it attempts to estimate the expected cardinality for combinations that don’t actually exist, even with histograms.

But that’s a tale for another day.

Now it’s time for “Wish You Were Here” …

Clustering Factor: A Consideration in Concatenated Index Leading Column Decision (Sweet Thing) February 15, 2008

Posted by Richard Foote in Clustering Factor, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
14 comments

Short but sweet today.

I last discussed how high cardinality columns shouldn’t necessarily be in the leading column of a concatenated index as  they don’t provide the performance benefit as sometimes claimed.

If all things are equal and the columns in the concatenated index are all likely to be referenced, a simple consideration that is often forgotten when deciding which column to have as the leading index column is the Clustering Factor of the corresponding columns.

As previously discussed, the Clustering Factor  determines how well aligned or ordered the index entries are in relation to the rows in the parent table. So if the rows are ordered within the table on a particular column or columns (such as a sequential ID column, a monotonically increasing date or time-stamp, etc), then an index on these columns is likely to have a very good Clustering Factor. Consequently less IOs will be required to retrieve all the required rows via the index as all the required rows will be housed in relatively few, well clustered data blocks.

It therefore makes sense to at least consider the Clustering Factor of the various columns in a concatenated index. Why ? Because if the leading column has a very good Clustering Factor, the concatenated index by definition must also have a very good Clustering Factor as all indexes are primarily ordered based on the leading indexed column. A concatenated index with a good Clustering Factor is going to be more efficient in retrieving rows from the table and more importantly, will be considered more desirably by the CBO when costing access path options.

Of course, the opposite is also true. By having a leading column with a poor Clustering Factor will mean the concatenated index will have a poor Clustering Factor, making it less efficient and less likely to be considered by the CBO.

As such, the Clustering Factor of each corresponding column in a concatenated index is at least worthy of some consideration when making the decision on how best to order the indexed columns.

This demo on Index Column Order and Clustering Factor  shows how the order of columns in a concatenated index has a big impact on the Clustering Factor of the resultant index.

UPDATE: However as Tom Kyte has stated in the comments, in virtually all cases, the Clustering Factor is not really a factor (yes, pun fully intended) as subsequently columns are generally going to impact the CF anyways or the selectivity of the index is such that the improved CF is not relevant anyways.

More relevant considerations regarding the ordering of columns in an index coming I promise🙂

It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ? February 13, 2008

Posted by Richard Foote in Concatenated Indexes, Index Internals, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning.
23 comments

A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.

For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).

The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.

Or so the theory goes …

In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.

The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.

The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.

This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.

So the high cardinality column shouldn’t necessarily be the column given leading column status.

In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.

Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.

All good discussions for another day🙂

Do ROWID Index Row Entry Columns Impact Index Block Splits ? December 20, 2007

Posted by Richard Foote in Concatenated Indexes, Index Block Splits, Index Internals, Oracle Indexes, Richard's Musings, ROWID.
11 comments

Based on a great question by Alberto Dell’Era  in my “Differences Between Unique/Non-Unique” blog entry (comment 9), I thought it might be a useful exercise to show how I go about confirming my understanding of a specific concept by trying to develop a little test case or demo that can illustrate the concept. My “magic incarnation” if you like😉

The basic question was does the ROWID that constitutes an additional column in a Non-Unique index determine whether a particular row entry is the maximum or equivalent entry or not. Because by implication, this can determine and influence whether Oracle performs the generally preferred 90-10 splits rather than 50-50 block splits for indexed column values that at least equal the maximum value.

The answer is yes because the ROWID column is just another column in the index row entry and is simply treated the same. But how to actually “illustrate” and show this ?

I needed a way therefore to insert a ROWID that was always going to be the maximum ROWID value for a Non-Unique index. Then insert a whole bunch of subsequent ROWIDs of a lesser value than the maximum and inspect via index statistics whether the type of block splits changed from 90-10 to 50-50 block splits. Remember with the Object Number being equal (if it’s there at all), the next significant portion of the ROWID is the Relative File Number.

The plan was (reasonably) simple. Create a tablespace with one data file and fill it with something. Then add a second data file and use this to store the start of my table of interest (and of course create the index). This will create a whole bunch of rows with ROWIDs of a higher Relative File Number than those in the first data file. Then drop the first table and ensure the second table uses the free space created in the first data file. That way, a whole bunch of ROWIDs can be created that are less than existing ROWIDs because it would be using ROWIDs from the first data file, which has a lesser Relative File Number.

It’s the usual process I go through with these things. Find something that’s of interest, have some idea on how I think things work, come up with plans or strategies that will illustrate whether or not what I think is true (ensuring that somewhere in the process I include at least one reference to David Bowie😉. I can then later take the initial strategies and expand them for all applicable database options and features. Then see if anything changes between database versions and platforms.

Hopefully this demo shows you how I went about proving this: Do ROWID Index Row Entry Columns Impact Index Block Splits Demo.

The benefit of then showing these demos is that others can see exactly how I came to a conclusion, potentially try them out for oneself and perhaps see holes or flaws or shortfalls in the strategy or expand or tailor them for individual requirements or environments.