jump to navigation

Index Advanced Compression vs. Bitmap Indexes (Candidate) October 31, 2014

Posted by Richard Foote in 12c, Advanced Index Compression, Bitmap Indexes, Oracle Indexes.
trackback

A good question from Robert Thorneycroft I thought warranted its own post. He asked:

I have a question regarding bitmapped indexes verses index compression. In your previous blog titled ‘So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2)’ you came to the conclusion that ‘500,000 distinct values in a 1 million row table’ would still be a viable scenario for deploying bitmapped indexes over non-compressed b-tree indexes.

Now b-tree index compression is common, especially with the release of Advanced Index Compression how does this affect your conclusion? Are there still any rules of thumb which can be used to determine when to deploy bitmapped indexes instead of compressed b-tree indexes or has index compression made bitmapped indexes largely redundant?”

 

If you’re not familiar with Bitmap Indexes, it might be worth having a read of my previous posts on the subject.

Now Advanced Index Compression introduced in 12.1.0.2 has certainly made compressing indexes a lot easier and in many scenarios, more efficient than was previously possible. Does that indeed mean Bitmap Indexes, that are relatively small and automatically compressed, are now largely redundant ?

The answer is no, Bitmap Indexes are still highly relevant in Data Warehouse environments as they have a number of key advantages in the manner they get compressed over B-Tree Indexes.

Compression of a B-Tree index is performed within a leaf block where Oracle effectively de-duplicates the index entries (or parts thereof). This means that a highly repeated index value might need to be stored repeatedly in each leaf block. Bitmap index entries on the other hand can potentially span the entire table and only need to be split if the overall size of the index entries exceeds 1/2 a block. Therefore, the number of indexed values stored in a Bitmap Index can be far less than with a B-tree.

However, it’s in the area of storing the associated rowids where Bitmap Indexes can have the main advantage. With a B-tree index, even when highly compressed, each and every index entry must have an associated rowid stored in the index. If you have say 1 million index entries, that’s 1 million rowids that need to be stored, regardless of the compression ratio. With a Bitmap Index, an index entry has 2 rowids to specify the range of rows covered by the index entry, but this might be sufficient to cover the entire table. So depending on the number of distinct values being indexed in say a million row table, there may be dramatically fewer than 1 million rowids stored in the Bitmap Index.

To show how Bitmap Indexes are generally much smaller than corresponding compressed B-Tree indexes, a few simple examples.

In example 1, I’m going to create a B-Tree Index that is perfect candidate for compression. This index has very large indexed values that are all duplicates and so will compress very effectively:

SQL> create table ziggy (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'
     from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        9175    1000000

SQL> drop index ziggy_weird_i2;

Index dropped.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        1389    1000000

 

So this index has compressed down from 9175 leaf blocks to just 1389. That’s impressive.

However, this scenario is also the perfect case for a Bitmap Index with large, highly repeated index entries. If we compare the compressed B-Tree Index with a corresponding Bitmap index:

SQL> create bitmap index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          1          21         42

 

At just a tiny 21 leaf blocks, the Bitmap Index wins by a mile.

In example 2, I’m going to create an index that still almost a perfect case for compressing a B-Tree Index, but far less so for a Bitmap Index. I’m going to create enough duplicate entries to just about fill a specific leaf block, so that each leaf block only has 1 or 2 distinct index values. However, as we’ll have many more distinct indexed values overall, this means we’ll need more index entries in the corresponding Bitmap Index.

SQL> create table ziggy2 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy2 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,1385)
     from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.
SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        9568    1000000

SQL> drop index ziggy2_weird_i;

Index dropped.

SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        1401    1000000

 

So we have a relatively large indexed column that has some 1385 distinct values but each value just about fills out a compress leaf block. If we look at the compression of the index, we have reduced the index down from 9568 leaf blocks to just 1401 leaf blocks. Again, a very impressive compression ratio.

Unlike the previous example where we had just the one value, we now have some 1385 index entries that need to be created as a minimum for our Bitmap Index. So how does it compare now ?

SQL> drop index ziggy2_weird_I;

Index dropped.

SQL> create bitmap index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2         462       1385

 

Although the Bitmap Index is much larger than it was in the previous example, at just 464 leaf blocks it’s still significantly smaller than the corresponding compressed 1401 leaf block B-Tree index.

OK, example 3, we’re going to go into territory where no Bitmap Index should tread (or so many myths would suggest). We going to index a column in which each value only has the one duplicate. So for our 1 million row table, the column will have some 500,000 distinct values.

With relatively few duplicate column values, the compression of our B-Tree Indexes is not going to be as impressive. However, because the indexed values are still relatively large, any reduction here would likely have some overall impact:

SQL> create table ziggy3 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy3 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,500000)
     from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        9891    1000000

SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        6017    1000000

 

So the compression ratio is not as good now, coming down to 6017 leaf blocks from 9891. However, this will surely be better than a Bitmap Index with 500,000 distinct values …

 

SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create bitmap index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        5740     500000

 

So even in this extreme example, the Bitmap Index at 5740 leaf blocks is still smaller than the corresponding compressed B-Tree Index at 6017 leaf blocks.

In this last example 4, it’s a scenario similar to the last one, except the index entries themselves are going to be much smaller (a few byte number column vs. the 60 odd byte varchar2). Therefore, the rowids of the index entries will be a much larger proportion of the overall index entry size. Reducing the storage of index values via compression will be far less effective, considering the prefix table in a compressed index comes with some overhead.

SQL> create table ziggy4 (id number, weird number);

Table created.

SQL> insert into ziggy4 select rownum, mod(rownum,500000) from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000

 

So Index Advanced Compression has decided against compressing this index, it’s just not worth the effort. If we force compression:

 

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        2065    1000000

 

We notice the index has actually increased in size, up to 2065 leaf blocks from 1998. The overheads of the prefix table over-ride the small efficiencies of reducing the duplicate number indexed values.

Meanwhile the corresponding Bitmap Index:

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create bitmap index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1817     500000

 

Is still smaller at 1817 leaf blocks than the best B-Tree index has to offer.

So the answer is no, Bitmap Indexes are not now redundant now we have Index Advanced Compression. In Data Warehouse environments, as long as they don’t reference column values that are approaching uniqueness,  Bitmap Indexes are likely going to be smaller than corresponding compressed B-Tree indexes.

Comments»

1. Abu Fazal Md Abbas - October 31, 2014

Thanks for sharing. Very informative article.

2. Martin Preiss - October 31, 2014

Hi Richard,
looking at your first example I asked myself: why would I like to have an index on a column with just one distinct value – and then answered this question with something in the direction of “index only access” and “counting of values”. The second point created the following idea: if I had a table for which the knowledge of the exact row count was essential, then could it perhaps be a good idea to add a small constant column with a bitmap index on it:

drop table t;

create table t (
    id not null
  , padding
  , count_helper not null
)  
as
select rownum id
     , lpad('*', 50, '*') padding
	 , 0 count_helper
  from dual
connect by level <= 100000;


create bitmap index t_bix on t(count_helper);

create unique index t_ix on t(id);

set autot trace

select count(*) from t;

-------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |       |     1 |            |          |
|   2 |   BITMAP CONVERSION COUNT     |       |   100K|     3   (0)| 00:00:01 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T_BIX |       |            |          |
-------------------------------------------------------------------------------


Statistiken
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          3  physical reads
          0  redo size
        365  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

alter index t_bix invisible;

select count(*) from t;

----------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    30  (10)| 00:00:01 |
|   1 |  SORT AGGREGATE       |      |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T_IX |   100K|    30  (10)| 00:00:01 |
----------------------------------------------------------------------


Statistiken
----------------------------------------------------------
         32  recursive calls
          0  db block gets
        267  consistent gets
        208  physical reads
          0  redo size
        365  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

So the bitmap index is indeed as tiny as expected and the bitmap conversion count seems to be a cheap operation (and the optimizer also thinks so).

Of course maybe this could rather be a job for an additional MView – and the bitmap locking effects could make the idea much less interesting for a table with much concurrent DML – and we would pay with a larger row length.

But do you see a bigger flaw in my considerations?

Regards

Martin

Richard Foote - November 1, 2014

Hi Martin

Yes skinny indexes are great for such queries and bitmap indexes are as skinny as you can get.

You just wouldn’t consider bitmap indexes though with concurrent DML, that’s the path to locking hell.

Yes MVs are an alternative and so is the new APPROX_COUNT_DISTINCT function that provides cheap as good as counts.

Martin Preiss - November 2, 2014

Hi Richard,
thank you for the confirmation. APPROX_COUNT_DISTINCT is certainly an interesting option – if an approximation is sufficient.

3. Robert Thorneycroft - November 4, 2014

Hi Richard,

Great blog again but I have some observations regarding the performance of the various indexes.

In your examples you based the performance of the indexes on the leaf blocks the index requires and made the assumption that smaller will be faster. I thought I would test this so I repeated your third test above and then performed a simple count using non-compressed, compressed and bitmap indexes. Each test was repeated 4 times (I removed some of the output to keep the post a little shorter).

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0;

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name=’ZIGGY3_WEIRD_I’;

INDEX_NAME BLEVEL LEAF_BLOCKS NUM_ROWS
—————————— ———- ———– ———-
ZIGGY3_WEIRD_I 2 9891 1000000

SQL> select count(*) from ziggy3 where weird like ‘THE RISE%’;
Elapsed: 00:00:01.07
Elapsed: 00:00:00.22
Elapsed: 00:00:00.22
Elapsed: 00:00:00.22

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0 compress;

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name=’ZIGGY3_WEIRD_I’;

INDEX_NAME BLEVEL LEAF_BLOCKS NUM_ROWS
—————————— ———- ———– ———-
ZIGGY3_WEIRD_I 2 6017 1000000

SQL> select count(*) from ziggy3 where weird like ‘THE RISE%’;
Elapsed: 00:00:00.76
Elapsed: 00:00:00.24
Elapsed: 00:00:00.25
Elapsed: 00:00:00.26

SQL> create bitmap index ziggy3_weird_i on ziggy3(weird) pctfree 0;

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name=’ZIGGY3_WEIRD_I’;

INDEX_NAME BLEVEL LEAF_BLOCKS NUM_ROWS
—————————— ———- ———– ———-
ZIGGY3_WEIRD_I 2 5937 500000

SQL> select count(*) from ziggy3 where weird like ‘THE RISE%’;
Elapsed: 00:00:00.90
Elapsed: 00:00:00.36
Elapsed: 00:00:00.36
Elapsed: 00:00:00.35

Looking at these results it appears that when the index is cached after the first execution the performance of the query is not only dictated by the number of leaf blocks. The uncompressed index is almost twice as fast completing the query despite being close on double the size.

Also of some interest are the execution times for the first runs. I repeated the dropping and creating of the indexes a total of 4 times and cleared the buffer cache each time and the averages matched almost exactly the first execution times shown above with very little variation from run to run. This seems to indicate that when the index is not well cached the results are closer to the expected behaviour of less leaf blocks delivering faster results, however notice that even in this case the compressed index is still outperforming the slightly smaller bitmap by about 15%.

The explain plans of the initial run of each of the above scenarios gives some further information which might help explain. It appears that the CPU time for the BITMAP is noticably higher than either of the other two scenarios so once the wait time is reduced on subsequent runs due to caching the BITMAP is then at a disadvantage.

SELECT STATEMENT ( Estimated Costs = 329 , Estimated #Rows = 0 )
2 SORT AGGREGATE
1 INDEX FAST FULL SCAN ZIGGY3_WEIRD_I
( Estim. Costs = 329 , Estim. #Rows = 1,112,713 )
Estim. CPU-Costs = 58,221,516 Estim. IO-Costs = 324
Filter Predicates
Performed 9984 disk reads and 10094 buffer gets
Elapsed Time 1,068,091.0
CPU Time 420,000
Wait Time 648,091

SELECT STATEMENT ( Estimated Costs = 202 , Estimated #Rows = 0 )
2 SORT AGGREGATE
1 INDEX FAST FULL SCAN ZIGGY3_WEIRD_I
( Estim. Costs = 201 , Estim. #Rows = 1,112,713 )
Estim. CPU-Costs = 52,651,125 Estim. IO-Costs = 197
Filter Predicates
Performed 5992 disk reads and 6092 buffer gets
Elapsed Time 748,103.0
CPU Time 360,000
Wait Time 388,103

SELECT STATEMENT ( Estimated Costs = 1,082 , Estimated #Rows = 0 )
3 SORT AGGREGATE
2 BITMAP CONVERSION COUNT
( Estim. Costs = 1,081 , Estim. #Rows = 1,112,713 )
Estim. CPU-Costs = 29,682,182 Estim. IO-Costs = 1,079

1 BITMAP INDEX FAST FULL SCAN ZIGGY3_WEIRD_I
Filter Predicates
Performed 5992 disk reads and 6092 buffer gets
Elapsed Time 933,355.0
CPU Time 480,000
Wait Time 453,355

Richard Foote - November 21, 2014

Hi Robert

Great observations, thanks for the demo.

I totally agree, extracting data out of the bitmap indexes takes more CPU than from b-tree because of the work required to determine the rowids.

However, your demo is the worst case scenario for this with all the work being performed within the indexes via the fast full index scans. In most scenarios, the work actually being performed when reading indexes is relatively low compared to that necessary to access the tables so the overall differences in total CPU costs is relatively low. Add to that the flexibility associated with reading from multiple bitmap indexes and they become viable when often not considered.

But absolutely, the additional CPU effort in reading bitmap indexes is a valid consideration.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: