jump to navigation

Basic Index Compression Made Simple (It Ain’t Easy) August 2, 2017

Posted by Richard Foote in Index Compression, Index statistics, Oracle Indexes, Validate Structure.
add a comment

DavidBowie3

I’ve discussed Index Compression a number of times as it’s an excellent way of minimizing the size of indexes without unnecessary, expensive index rebuilds.

One of the nice features of Advanced Index Compression is that not only does it potentially compress indexes more effectively than possible with Basic Index Compression, but that it also eliminates the needs to understand the indexed data in order to configure the appropriate prefix column count.

As I’ve discussed previously, index compression works by effectively deduplicating repeated indexed values within a leaf block by storing each unique indexed value in a prefix-table in the leaf block and referencing the prefixed value within the actual index entry. This means repeated values need only be stored once per leaf block, hence saving space. However, if there are few or no repeated values, there are no deduplication benefits and the overheads associated with the prefix table can exceed the potential savings (if any), making indexes potentially larger not smaller.

It’s possible however to only compress the leading portion of an index entry, such that only indexed columns that are actually replicated are compressed, leaving less replicated indexed columns uncompressed within the index entry. The decision therefore on how many columns within the index to actually compress is crucial to the compression effectiveness.

It’s this uncertainty and fear of actually making indexes worse with basic compression that puts off a lot of DBAs from implementing index compression and indeed why the “no-brainer” capabilities of Advanced Index Compression is so appealing.

For those that do not have access to the Advanced Compression database option or in the new Oracle Cloud world, access to at least the “High Performance” database package, there is a method that can assist in determining the best manner in which to use basic compression to compress your indexes.

To illustrate, a simple example. We begin by creating a little table that has two columns of interest, an ID column that is effectively unique and a CODE column that only has 10 distinct values and so plenty of duplication:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'ZIGGY STARDUST'
     from dual connect by level >=2000000;

2000000 rows created.

SQL> commit;

Commit complete.

We next create a concatenated index with the CODE column leading, followed by the unique ID column. The index entries as a whole are therefore effectively unique and so compressing the whole index would be ineffective. However, as the leading CODE column has many replicated values, there would be benefit in just compressing this first leading column. However, we need to fully understand the data within the index to correctly determine we need to compress just the first column to effectively compress this index.

SQL> create index bowie_code_id_i on bowie(code, id) pctfree 0;

Index created.

SQL> select num_rows, blevel, leaf_blocks from user_indexes
     where index_name='BOWIE_CODE_ID_I';

  NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
   2000000          2        4848

Currently the uncompressed index has 4848 leaf blocks.

But how to effectively compress this index, especially if we don’t really understand the data profile of the indexed columns ?

One possible method is to ANALYZE with VALIDATE STRUCTURE the index and explore a couple of useful columns within INDEX_STATS:

SQL> analyze index bowie_code_id_i validate structure;

Index analyzed.

SQL> select name, height, lf_blks, opt_cmpr_count, opt_cmpr_pctsave
     from index_stats;

NAME             HEIGHT    LF_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
--------------- ------- ---------- -------------- ----------------
BOWIE_CODE_ID_I       3       4848              1               14

OPT_CMPR_COUNT tells us how many columns to compress to get optimal benefit from basic index compression. In this example, we should only compress 1 column.

OPT_CMPR_PCTSAVE tells us how much benefit we would likely achieve if we were to compress just this 1 column of the index. In this example, the index will reduce by some 14%.

So let’s go ahead and implement this recommendation:

SQL> alter index bowie_code_id_i rebuild compress 1;

Index altered.

SQL> select num_rows, blevel, leaf_blocks
     from user_indexes where index_name='BOWIE_CODE_ID_I';

  NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
   2000000          2        4133

We notice the index is now just 4133 leaf blocks and has actually reduced in size by some 14.75%, not bad compared to the 14% estimate.

If we disregard this advice and just compress the entire index:

SQL> alter index bowie_code_id_i rebuild compress;

Index altered.

SQL> select num_rows, blevel, leaf_blocks from user_indexes
     where index_name='BOWIE_CODE_ID_I';

  NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
   2000000          2        6363

We notice the index is now substantially larger at 6363 leaf blocks than it was previously (4848 leaf blocks) when the index was uncompressed.

If we create another index, but this time with ID as the leading column:

SQL> create index bowie_id_code_i on bowie(id, code) pctfree 0;

Index created.

SQL> select num_rows, blevel, leaf_blocks from user_indexes
     where index_name='BOWIE_ID_CODE_I';

  NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
   2000000          2        4851

With the leading column effectively unique, there wouldn’t be any benefit in using basic compression on this index as there are no replicated values from the leading column onwards to deduplicate:

SQL> analyze index bowie_id_code_i validate structure;

Index analyzed.

SQL> select name, height, lf_blks, opt_cmpr_count, opt_cmpr_pctsave
     from index_stats;

NAME             HEIGHT    LF_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
--------------- ------- ---------- -------------- ----------------
BOWIE_ID_CODE_I       3       4851              0                0

Analyzing the index with VALIDATE STRUCTURE confirms that 0 columns are worth compressing with this index.

An attempt to just compress the leading column would indeed be counter-productive:

SQL> alter index bowie_id_code_i rebuild compress 1;

Index altered.

SQL> select num_rows, blevel, leaf_blocks from user_indexes
     where index_name='BOWIE_ID_CODE_I';

  NUM_ROWS     BLEVEL LEAF_BLOCKS
---------- ---------- -----------
   2000000          2        6361

The index is again much larger at 6361 leaf blocks than it was previously (4851 leaf blocks) when uncompressed.

So the order of the columns within the index is crucial in determining the potential benefit of index compression.

I don’t particularly like using ANALYZE VALIDATE STRUCTURE, not least because it locks the parent table during the analyze operation, but if there’s available downtime or a full copy of the database where locking is not an issue, then this is an effective way to determine how to best compress indexes with basic compression.

Advertisements