Index Advanced Compression vs. Bitmap Indexes (Candidate) October 31, 2014
Posted by Richard Foote in 12c, Advanced Index Compression, Bitmap Indexes, Oracle Indexes.
A good question from Robert Thorneycroft I thought warranted its own post. He asked:
“I have a question regarding bitmapped indexes versus index compression. In your previous blog titled ‘So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2)’ you came to the conclusion that ‘500,000 distinct values in a 1 million row table’ would still be a viable scenario for deploying bitmapped indexes over non-compressed b-tree indexes.
Now b-tree index compression is common, especially with the release of Advanced Index Compression how does this affect your conclusion? Are there still any rules of thumb which can be used to determine when to deploy bitmapped indexes instead of compressed b-tree indexes or has index compression made bitmapped indexes largely redundant?”
If you’re not familiar with Bitmap Indexes, it might be worth having a read of my previous posts on the subject.
Now Advanced Index Compression, introduced in 12.1.0.2, has certainly made compressing indexes a lot easier and in many scenarios more efficient than was previously possible. Does that mean Bitmap Indexes, which are relatively small and automatically compressed, are now largely redundant?
The answer is no, Bitmap Indexes are still highly relevant in Data Warehouse environments as they have a number of key advantages in the manner they get compressed over B-Tree Indexes.
Compression of a B-Tree index is performed within a leaf block, where Oracle effectively de-duplicates the index entries (or parts thereof). This means that a highly repeated index value might still need to be stored repeatedly, once in each leaf block. Bitmap index entries on the other hand can potentially span the entire table and only need to be split if the overall size of an index entry exceeds half a block. Therefore, the number of indexed values stored in a Bitmap Index can be far less than with a B-Tree.
However, it’s in the area of storing the associated rowids where Bitmap Indexes can have the main advantage. With a B-tree index, even when highly compressed, each and every index entry must have an associated rowid stored in the index. If you have say 1 million index entries, that’s 1 million rowids that need to be stored, regardless of the compression ratio. With a Bitmap Index, an index entry has 2 rowids to specify the range of rows covered by the index entry, but this might be sufficient to cover the entire table. So depending on the number of distinct values being indexed in say a million row table, there may be dramatically fewer than 1 million rowids stored in the Bitmap Index.
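The rowid arithmetic above can be sketched with a toy model (Python, purely illustrative; the counts and entry structure are simplified assumptions, not Oracle's actual on-disk format):

```python
# Toy model: how many rowids must each index type store?
# A B-Tree index stores one rowid per index entry, one entry per row,
# regardless of how well the keys themselves compress.
def btree_rowid_count(num_rows):
    return num_rows

# A Bitmap index entry stores a start and end rowid delimiting the range
# of rows its bitmap covers; a single entry can span the whole table.
# entries_per_value is an assumption (entries split when they get too big).
def bitmap_rowid_count(num_distinct, entries_per_value=1):
    return num_distinct * entries_per_value * 2

rows = 1_000_000
print(btree_rowid_count(rows))      # 1000000 rowids, whatever the key data
print(bitmap_rowid_count(1))        # one distinct value: just 2 rowids
print(bitmap_rowid_count(500_000))  # near-unique data: the advantage is gone
```

The crossover is exactly the near-uniqueness scenario explored in example 3 below: as the number of distinct values approaches the row count, the bitmap's rowid saving evaporates.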
To show how Bitmap Indexes are generally much smaller than corresponding compressed B-Tree indexes, here are a few simple examples.
In example 1, I’m going to create a B-Tree Index that is a perfect candidate for compression. This index has very large indexed values that are all duplicates and so will compress very effectively:
SQL> create table ziggy (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        9175    1000000

SQL> drop index ziggy_weird_i;

Index dropped.

SQL> create index ziggy_weird_i on ziggy(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          2        1389    1000000
So this index has compressed down from 9175 leaf blocks to just 1389. That’s impressive.
However, this scenario is also the perfect case for a Bitmap Index with large, highly repeated index entries. If we compare the compressed B-Tree Index with a corresponding Bitmap index:
SQL> create bitmap index ziggy_weird_i on ziggy(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY_WEIRD_I';

INDEX_NAME        BLEVEL LEAF_BLOCKS   NUM_ROWS
------------- ---------- ----------- ----------
ZIGGY_WEIRD_I          1          21         42
At just a tiny 21 leaf blocks, the Bitmap Index wins by a mile.
In example 2, I’m going to create an index that is still almost a perfect case for compressing a B-Tree Index, but far less so for a Bitmap Index. I’m going to create just enough duplicate entries to about fill a specific leaf block, so that each leaf block only has 1 or 2 distinct index values. However, as we’ll have many more distinct indexed values overall, this means we’ll need more index entries in the corresponding Bitmap Index.
SQL> create table ziggy2 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy2 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,1385) from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        9568    1000000

SQL> drop index ziggy2_weird_i;

Index dropped.

SQL> create index ziggy2_weird_i on ziggy2(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2        1401    1000000
So we have a relatively large indexed column with some 1385 distinct values, but each value just about fills out a compressed leaf block. If we look at the compression of the index, we have reduced it from 9568 leaf blocks down to just 1401. Again, a very impressive compression ratio.
Unlike the previous example where we had just the one value, we now have some 1385 index entries that need to be created as a minimum for our Bitmap Index. So how does it compare now ?
SQL> drop index ziggy2_weird_i;

Index dropped.

SQL> create bitmap index ziggy2_weird_i on ziggy2(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY2_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY2_WEIRD_I          2         462       1385
Although the Bitmap Index is much larger than it was in the previous example, at just 462 leaf blocks it’s still significantly smaller than the corresponding compressed B-Tree index at 1401 leaf blocks.
OK, example 3: we’re going to go into territory where no Bitmap Index should tread (or so many myths would suggest). We’re going to index a column in which each value has only the one duplicate. So for our 1 million row table, the column will have some 500,000 distinct values.
With relatively few duplicate column values, the compression of our B-Tree Indexes is not going to be as impressive. However, because the indexed values are still relatively large, any reduction here would likely have some overall impact:
SQL> create table ziggy3 (id number, weird varchar2(100));

Table created.

SQL> insert into ziggy3 select rownum, 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'||mod(rownum,500000) from dual connect by level<=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        9891    1000000

SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create index ziggy3_weird_i on ziggy3(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        6017    1000000
So the compression ratio is not as good now, coming down to 6017 leaf blocks from 9891. However, this will surely be better than a Bitmap Index with 500,000 distinct values …
SQL> drop index ziggy3_weird_i;

Index dropped.

SQL> create bitmap index ziggy3_weird_i on ziggy3(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY3_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY3_WEIRD_I          2        5740     500000
So even in this extreme example, the Bitmap Index at 5740 leaf blocks is still smaller than the corresponding compressed B-Tree Index at 6017 leaf blocks.
In this last example 4, the scenario is similar to the previous one, except the index entries themselves are going to be much smaller (a few-byte NUMBER column vs. the 60-odd byte VARCHAR2). Therefore, the rowids will make up a much larger proportion of the overall index entry size, and reducing the storage of the indexed values via compression will be far less effective, considering the prefix table in a compressed index comes with some overhead.
SQL> create table ziggy4 (id number, weird number);

Table created.

SQL> insert into ziggy4 select rownum, mod(rownum,500000) from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000

SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress advanced low;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1998    1000000
So Index Advanced Compression has decided against compressing this index; it’s just not worth the effort. If we force compression:
SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create index ziggy4_weird_i on ziggy4(weird) pctfree 0 compress;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        2065    1000000
We notice the index has actually increased in size, up to 2065 leaf blocks from 1998. The overheads of the prefix table outweigh the small savings from de-duplicating the indexed number values.
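This trade-off can be captured in a simple toy model (Python; the byte costs here are illustrative assumptions only, not Oracle's exact on-disk format):

```python
# Toy model of prefix (de-duplication) compression for one distinct key
# within a leaf block. prefix_overhead and ref_cost are assumed byte costs,
# not Oracle internals.
def compressed_delta(key_len, dup_count, prefix_overhead=2, ref_cost=1):
    """Change in block bytes when one distinct key is moved to the prefix
    table: pay the prefix row once (key + overhead) plus a small reference
    per entry, and save the key bytes from each duplicated entry.
    Negative result = compression shrinks the block."""
    cost = (key_len + prefix_overhead) + ref_cost * dup_count
    saving = key_len * dup_count
    return cost - saving

# A 61-byte key repeated 651 times per block (example 1): huge saving.
print(compressed_delta(61, 651) < 0)   # True
# A ~3-byte NUMBER key with only 2 duplicates (example 4): net inflation.
print(compressed_delta(3, 2) > 0)      # True
```

Long keys with many duplicates dominate the fixed prefix cost; tiny keys with a single duplicate don't, which is exactly why forcing compression grew this index.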
Meanwhile the corresponding Bitmap Index:
SQL> drop index ziggy4_weird_i;

Index dropped.

SQL> create bitmap index ziggy4_weird_i on ziggy4(weird) pctfree 0;

Index created.

SQL> select index_name, blevel, leaf_blocks, num_rows from dba_indexes where index_name='ZIGGY4_WEIRD_I';

INDEX_NAME         BLEVEL LEAF_BLOCKS   NUM_ROWS
-------------- ---------- ----------- ----------
ZIGGY4_WEIRD_I          2        1817     500000
Is still smaller at 1817 leaf blocks than the best B-Tree index has to offer.
So the answer is no, Bitmap Indexes are not redundant now that we have Index Advanced Compression. In Data Warehouse environments, as long as they don’t reference column values that are approaching uniqueness, Bitmap Indexes are likely to be smaller than corresponding compressed B-Tree indexes.
12.1.0.2 Introduction to Zone Maps Part II (Changes) October 30, 2014
Posted by Richard Foote in 12c, Exadata, Oracle Indexes, Zone Maps.
In Part I, I discussed how Zone Maps are new index-like structures, similar to Exadata Storage Indexes, that enable the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table, a zone being a range of contiguous (by default 8MB worth of) blocks.
I showed how a Zone Map was relatively tiny but very effective in reducing the number of consistent gets for a well clustered column (ALBUM_ID).
In this post, we’re going to continue with the demo and look at what happens when we update data in the table with a Zone Map in place.
So let’s update the ALBUM_ID column (which currently has a Zone Map defined) for a few rows. The value of ALBUM_ID was previously 1 for all these rows (the full range of values is currently between 1 and 100), but we’re going to update them to 142:
SQL> update big_bowie set album_id=142 where id between 1 and 100;

100 rows updated.

SQL> commit;

Commit complete.
So the maximum value of ALBUM_ID is now 142, not 100. If we look at the maximum value as currently listed in the Zone Map:
SQL> select max(max_1_album_id) from big_bowie_album_id_zm;

MAX(MAX_1_ALBUM_ID)
-------------------
                100
We notice the maximum is still defined as being 100, so the update on the table has not actually updated the contents of the Zone Map. This is a big difference between Zone Maps and conventional indexes: indexes are automatically updated during DML operations, while Zone Maps are not (unless the REFRESH ON COMMIT option is specified).
If we look at the state of Zone Map entries that have a minimum of 1 (the previous values of ALBUM_ID before the update):
SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

  ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1              2           0           1      65787
3.8586E+11              1              2           0           0      66223
We notice that one of the entries has a status of 1, meaning that a specific zone has been marked as stale. However, all the other zones are still OK.
If we look at the status of the overall Zone Map:
SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME          PRUNING  REFRESH_MODE      INVALID STALE UNUSABLE
--------------------- -------- ----------------- ------- ----- --------
BIG_BOWIE_ALBUM_ID_ZM ENABLED  LOAD DATAMOVEMENT NO      NO    NO
We notice that the Zone Map is still “hunky dory” after the update.
If we now re-run the query we ran in Part I:
SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42)
       filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
       BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
       zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
       WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
       "ALBUM_ID"=42)

Statistics
----------------------------------------------------------
        141  recursive calls
          0  db block gets
       3238  consistent gets
          0  physical reads
          0  redo size
    3130019  bytes sent via SQL*Net to client
        761  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
We see the Zone Map was still used by the CBO. The number of consistent gets has increased (up from 2364 to 3238) as we now have to additionally access all the blocks associated with the stale zone, but it’s still more efficient than reading all the blocks from the entire table.
If we want to remove the stale zone entries, we can refresh the Zone Map or rebuild it (for ON DEMAND refresh):
SQL> alter materialized zonemap big_bowie_album_id_zm rebuild;

Materialized zonemap altered.
If we now look at the Zone Map entry:
SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

  ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1            142           0           0      65787
3.8586E+11              1              2           0           0      66223
We see that the entry is no longer stale and now correctly reflects the actual maximum value within the zone (142).
If we now re-run the query:
SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.30

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42)
       filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
       BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
       zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
       WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
       "ALBUM_ID"=42)

Statistics
----------------------------------------------------------
        141  recursive calls
          0  db block gets
       3238  consistent gets
          0  physical reads
          0  redo size
    3130019  bytes sent via SQL*Net to client
        761  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
We notice nothing has appreciably changed: the Zone Map is still being used but the number of consistent gets remains the same as before. Why haven’t we returned to our previous 2364 consistent gets?
Well, as the range of possible values within the updated zone is now between 1 and 142, the required value of 42 could potentially be found within this zone and so still needs to be accessed just in case. We know that the value of 42 doesn’t exist within this zone, but Oracle has no way of knowing this based on the possible 1 to 142 range.
Hence Zone Maps work best when the data is well clustered and the min/max ranges of each zone can be used to limit which zones need to be accessed. If the data were not well clustered and the values within each zone spanned most of the range between the overall min and max values, then Oracle wouldn’t be able to effectively prune many (or any) zones and the Zone Map would be useless.
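The pruning idea can be simulated in a few lines (Python sketch; the zone size and data distributions are made up purely for illustration, not taken from the demo table):

```python
# Build a minimal "zone map": just (min, max) per fixed-size zone of rows.
def build_zone_map(values, zone_size):
    zones = []
    for i in range(0, len(values), zone_size):
        chunk = values[i:i + zone_size]
        zones.append((min(chunk), max(chunk)))
    return zones

# A zone must be scanned only if the target could lie within its range.
def zones_to_scan(zone_map, target):
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= target <= hi]

# Well clustered: rows ordered by the column, so each zone has a tight range.
clustered = [i // 1000 for i in range(100_000)]
# Badly clustered: every zone contains the full 0-99 spread of values.
scattered = [i % 100 for i in range(100_000)]

zm_c = build_zone_map(clustered, 1000)
zm_s = build_zone_map(scattered, 1000)
print(len(zones_to_scan(zm_c, 42)))  # 1   (99 of 100 zones pruned)
print(len(zones_to_scan(zm_s, 42)))  # 100 (no zone can be pruned)
```

Same table size, same zone map structure; only the clustering of the data decides whether pruning does anything at all.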
As we’ll see in Part III 🙂
Index Compression Part VI: 12c Index Advanced Compression Block Dumps (Tumble and Twirl) October 9, 2014
Posted by Richard Foote in 12c, Advanced Index Compression, Block Dumps, Index Compression, Oracle Indexes.
Sometimes, a few pictures (or in this case, index block dumps) are better than a whole bunch of words 🙂
In my previous post, I introduced the new Advanced Index Compression feature, whereby Oracle automatically determines how to best compress an index. I showed a simple example of an indexed column that had sections of index entries that were basically unique (and so don’t benefit from compression) and other sections with index entries that had many duplicates (that do compress well). Advanced Index Compression enables Oracle to automatically just compress those index leaf blocks where compression is beneficial.
If we look at a couple of partial block dumps from this index, first a dump from a leaf block that did have duplicate index entries:
Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 651
kdxcofbo 1346=0x542
kdxcofeo 2172=0x87c
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 25166046=0x18000de
kdxleprv 25166044=0x18000dc
kdxledsz 0
kdxlebksz 8036
kdxlepnro 1
kdxlepnco 1 (Adaptive)
prefix row#0[8031] flag: -P-----, lock: 0, len=5
col 0; len 2; (2): c1 2b
prc 651
row#0[8022] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 86 00 5c
psno 0
row#1[8013] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 86 00 5d
psno 0
row#2[8004] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 86 00 5e
psno 0
row#3[7995] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 86 00 5f
psno 0
row#4[7986] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 86 00 60
psno 0
…
row#650[2172] flag: -------, lock: 0, len=9
col 0; len 6; (6): 01 80 1e 8d 00 10
psno 0
----- end of leaf block Logical dump -----
The red section is a portion of the index header that determines the number of rows in the prefix table of the index (kdxlepnro 1). The prefix table basically lists all the distinct column values in the leaf blocks that are to be compressed. The value 1 denotes there is actually only just the 1 distinct column value in this specific leaf block (i.e. all index entries have the same indexed value). This section also denotes how many of the indexed columns are to be compressed (kdxlepnco 1). As this index only has the one column, it also has a value of 1. Note this value can potentially be anything between 0 (no columns compressed) up to the number of columns in the index. The (Adaptive) reference tells us that Index Advanced Compression has been used and that the values here can change from leaf block to leaf block depending on the data characteristics of the index entries within each leaf block (a dump of a basic compressed index will not have the “Adaptive” reference).
The green section is the compression prefix table and details all the unique combinations of index entries to be compressed within the leaf block. As all indexed values are the same in this index (value 42, internally represented as c1 2b hex), the prefix table only has the one row. prc 651 denotes that all 651 index entries in this leaf block have this specific indexed value.
Next follows all the actual index entries, which now only consist of the rowid (the 6 byte col 0 column) as they all reference psno 0, which is the unique row id of the only row within the prefix table (row#0).
So rather than storing the indexed value 651 times, we can just store the index value (42) just the once within the prefix table and simply reference it from within the actual index entries. This is why index compression can save us storage, storing something once within a leaf block rather than multiple times.
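As an aside, the c1 2b in the prefix row really is the value 42: Oracle stores NUMBERs in an internal base-100 exponent/mantissa format. A minimal decoder for positive integers (a Python sketch based on the documented format, not a complete implementation; negatives, decimals, zero and NULL are ignored):

```python
# Decode Oracle's internal NUMBER format (positive integers only).
# Byte 0 holds the base-100 exponent (biased by 65 after masking the sign
# bit); the remaining bytes are base-100 mantissa digits stored as digit+1.
def decode_oracle_number(hex_bytes):
    b = [int(h, 16) for h in hex_bytes.split()]
    exponent = (b[0] & 0x7f) - 65
    digits = [d - 1 for d in b[1:]]
    value = 0
    for d in digits:
        value = value * 100 + d
    # scale by the base-100 exponent relative to the mantissa length
    return value * 100 ** (exponent - len(digits) + 1)

print(decode_oracle_number("c1 2b"))       # 42  (the prefix row above)
print(decode_oracle_number("c2 02 33"))    # 150
print(decode_oracle_number("c3 58 3d 2c")) # 876043 (a key from the next dump)
```

So the whole leaf block carries that c1 2b exactly once, in the prefix table, while 651 entries reference it.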
If we now look at a partial block dump of another index leaf block within the index, that consists of many differing (basically unique) index entries:
Leaf block dump
===============
header address 216542820=0xce82e64
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 449
kdxcofbo 938=0x3aa
kdxcofeo 1754=0x6da
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 25168667=0x1800b1b
kdxleprv 25168665=0x1800b19
kdxledsz 0
kdxlebksz 8036
kdxlepnro 0
kdxlepnco 0 (Adaptive)
row#0[8022] flag: -------, lock: 0, len=14
col 0; len 4; (4): c3 58 3d 2c
col 1; len 6; (6): 01 80 12 e6 00 41
row#1[8008] flag: -------, lock: 0, len=14
col 0; len 4; (4): c3 58 3d 2d
col 1; len 6; (6): 01 80 12 e6 00 42
row#2[7994] flag: -------, lock: 0, len=14
col 0; len 4; (4): c3 58 3d 2e
col 1; len 6; (6): 01 80 12 e6 00 43
…
row#448[1754] flag: -------, lock: 0, len=14
col 0; len 4; (4): c3 58 41 5c
col 1; len 6; (6): 01 80 12 ee 00 1d
----- end of leaf block Logical dump -----
We notice that in the red section, both kdxlepnro 0 and kdxlepnco 0 (Adaptive) have a value of 0, meaning we have no rows and no columns within the prefix table. As such, we have no prefix table at all here and this leaf block has simply not been compressed.
If we look at the actual index entries, they all now have an additional column (in blue), that being the actual indexed value, as all the index values in this leaf block differ from each other. Without some form of index entry duplication, there would be no benefit from compression, and Index Advanced Compression has automatically determined this and not bothered to compress this leaf block. An attempt to compress this block would actually have increased the necessary overall storage for these index entries, due to the additional overheads associated with the prefix table (note it has an additional 2 bytes of overhead per row within the prefix table).
I’ll next look at an example of a multi-column index and how Index Advanced Compression handles which columns in the index to compress.
Index Compression Part V: 12c Advanced Index Compression (Little Wonder) October 2, 2014
Posted by Richard Foote in 12c, Advanced Index Compression, Index Compression, Oracle Indexes.
I’ve finally managed to find some free time in the evening to write a new blog piece 🙂
This will have to be the record for the longest time between parts in a series, having written Part IV of this Index Compression series way way back in February 2008 !! Here are the links to the previous articles in the series:
Index Compression Part I (Low)
Index Compression Part II (Down Is The New Up)
Index Compression Part III (2+2=5)
Index Compression Part IV (Packt Like Sardines In a Crushd Tin Box)
As I’ve previously discussed, compressing an index can be an excellent way to permanently reduce the size of an index in a very cost effective manner. Index entries with many duplicate values (or duplicate leading columns within the index) can be “compressed” by Oracle to reduce both storage overheads and potentially access overheads for large index scans. Oracle basically de-duplicates repeated indexed column values within each individual leaf block by storing each unique occurrence in a prefix section within the block, as I explain in the above links.
But it’s important to compress the right indexes in the right manner. If indexes do not have enough repeated data, it’s quite possible to make certain indexes larger rather than smaller when using compression (as the overheads of having the prefix section in the index block outweigh the benefits of the limited reduction of repeated values). So one needs to be very selective about which indexes to compress and take care to compress the correct number of columns within the index. Oracle will only protect you from yourself if you attempt to compress all columns in a unique index, as in this scenario there can be no duplicate values to compress. This is all discussed in Parts II and III of the series.
So wouldn’t it be nice if Oracle made it all a lot easier for us and automatically decided which indexes to compress, which columns within the index to compress, and which indexes to simply not bother compressing at all? Additionally, rather than an all-or-nothing approach in which all index leaf blocks are compressed in the same manner, wouldn’t it be nice if Oracle decided, for each and every individual leaf block within the index, how best to compress it? For those leaf blocks that have no duplicate entries, do nothing; for those with some repeated columns, compress just those columns; and for those leaf blocks with lots of repeated columns and values, compress all of them as efficiently as possible.
Well, wish no more 🙂
With the recent release of Oracle Database 12.1.0.2, one of the really cool new features that got introduced was Advanced Index Compression. Now a warning from the get-go. The use of Advanced Index Compression requires the Advanced Compression Option, a separately licensed option of the Enterprise Edition. So only use this feature if you are licensed to do so 🙂
The best way as always to see this new feature in action is via a simple little demo.
To begin, I’ll create a table with a CODE column that is populated with unique values:
SQL> create table bowie (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie select rownum, rownum, 'ZIGGY STARDUST' from dual connect by level <= 1000000;

1000000 rows created.
I’ll now create a section of data within the table in which we have many repeated values:
SQL> update bowie set code = 42 where id between 250000 and 499999;

250000 rows updated.

SQL> commit;

Commit complete.
So I’ve fabricated the data such that the values in the CODE column are effectively unique within 75% of the table but the other 25% consists of repeated values.
From an index compression perspective, an index on this CODE column really isn’t a good candidate for normal compression, as most of the CODE data is unique and doesn’t compress. However, it’s a shame that we can’t easily compress just the 25% of the index that would benefit from compression (without using partitioning or some such).
If we create a normal B-Tree index on the CODE column without compression:
SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME           LEAF_BLOCKS COMPRESSION
-------------------- ----------- -------------
BOWIE_CODE_I                2157 DISABLED
We notice the index consists of 2157 leaf blocks.
If we now try to use normal compression on the index:
SQL> alter index bowie_code_i rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME           LEAF_BLOCKS COMPRESSION
-------------------- ----------- -------------
BOWIE_CODE_I                2684 ENABLED
We notice that, rather than decreasing in size, the compressed index has actually increased in size, up to 2684 leaf blocks. So the index has grown by some 25%, because the index predominantly contains unique values which don’t compress at all, and the resultant prefix section in the leaf blocks becomes nothing more than additional overhead. The 25% of the index containing all the repeated values has indeed compressed effectively, but these savings are more than offset by the increase in size across the other 75% of the index where the index entries had no duplication.
However, if we use the new advanced index compression capability via the COMPRESS ADVANCED LOW clause:
SQL> alter index bowie_code_i rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME           LEAF_BLOCKS COMPRESSION
-------------------- ----------- -------------
BOWIE_CODE_I                2054 ADVANCED LOW
We notice the index has now indeed decreased in size from the original 2157 leaf blocks down to 2054. Oracle has effectively ignored all those leaf blocks where compression wasn’t viable and compressed just the 25% of the index where compression was effective. Obviously, the larger the key values (remembering the rowids associated with the index entries can’t be compressed) and the larger the percentage of repeated data, the larger the overall compression returns.
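The per-leaf-block decision can be sketched as a simple cost test (Python; the byte costs are illustrative assumptions, and real Advanced Index Compression does considerably more than this):

```python
# Toy per-block decision: compress a leaf block only if de-duplication
# saves more bytes than the prefix table costs. prefix_overhead and
# ref_cost are assumed constants, not Oracle internals.
def block_should_compress(key_len, num_entries, num_distinct,
                          prefix_overhead=2, ref_cost=1):
    cost = num_distinct * (key_len + prefix_overhead) + num_entries * ref_cost
    saving = num_entries * key_len
    return saving > cost

# A block from the unique 75% of CODE: every entry distinct, never worth it.
print(block_should_compress(3, 500, 500))  # False -> leave uncompressed
# A block from the repeated 25%: one value duplicated 500 times.
print(block_should_compress(3, 500, 1))    # True  -> compress this block
```

Summing that choice over every leaf block is what lets the whole index shrink even though most of it is incompressible, instead of the all-or-nothing outcome of plain COMPRESS.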
With Advanced Index Compression, it’s viable to simply enable it for all your B-Tree indexes and let Oracle automatically compress each individual leaf block of each and every index as effectively as it can, for the life of the index.