jump to navigation

Index Compression Part IV (Packt Like Sardines In a Crushd Tin Box) February 29, 2008

Posted by Richard Foote in Index Compression, Index Internals, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
17 comments

A great question by Brian Tkatch has convinced me there needs to be at least a Part IV in this particular mini-series. The basic question is what size does the column need to be for it to be effectively compressed ?

If a column is only just the one byte, then it can’t really be compressed as the prefix related overheads would likely make the exercise pointless, right ?

Well not quite …

True a larger column that is heavily repeated will yield a better return from compression than a smaller column, but even a single byte column could be effectively compressed, IF there are sufficient repeated values per index leaf block.

Previously, I illustrated a logical representation of how Oracle compresses a block. The following is more of a physical representation of a compressed block:

Prefix 0: David Bowie

0:0

  : ROWID

1: 0

  : ROWID

2: 0

  : ROWID

3: 0

  : ROWID

4: 0

  : ROWID

Prefix 1: David Jones

 

 

5: 1

 

  : ROWID

6: 1

  : ROWID

7: 1

  : ROWID

Remember, only the leading columns in an index can be compressed and index row entries must be sorted by these leading columns. Therefore each prefix entry is referenced by one or more logically consecutive index row entries and each index row entry can only have the one index prefix. The corresponding prefix for a specific index row entry is therefore simply the prefix entry that precedes it within the leaf block.

If we look at sample prefix entries:

prefix row#0[8030] flag: -P—-, lock: 0, len=6
col 0; len 3; (3):  c2 08 43
prc 1
prefix row#1[8015] flag: -P—-, lock: 0, len=6
col 0; len 3; (3):  c2 08 44
prc 1

Note that the prefix row#0 starts at address 8030 while the next prefix entry row#1 starts at address 8015. That’s a total of 15 byes and yet the length of the prefix entry is just 6. Where’s the other 9 bytes ?

Well note the prc value for prefix row#0 is 1 meaning this prefix entry is only referenced by the one index row, as is the next prefix entry as well.  The leading column in this particular example is actually unique meaning each and every prefix entry only has one corresponding index row.

If we look at the corresponding row entry, guess what it’s length will be …

row#0[8021] flag: ——, lock: 0, len=9
col 0; len 6; (6):  04 81 aa 8b 00 69
psno 0
row#1[8006] flag: ——, lock: 0, len=9
col 0; len 6; (6):  04 81 aa 8b 00 6a
psno 1

If you guessed 9, well done. Note that it’s starting address is 8021, which is the starting address of the previous entry 3015 + 6 bytes for the prefix entry which equals 8021 (as Oracle actually fills index blocks from the “bottom up”). Note that index entry row#0 has a psno of 0 meaning it’s corresponding prefix entry references prefix row#0.

Note that the index row length of 9 is made up of 2 byes for locks and flags info, 6 byes for the rowid and 1 byte for the rowid length (as this index is a non-unique index, albeit with unique index values).

Therefore the psno value actually uses 0 bytes as its value can be purely derived from it’s position within the leaf block in relation to its parent prefix value.

Let’s now look at an index row entry with a leading column that is heavily repeated but only one byte in size.

row#0[8025] flag: ——, lock: 0, len=11
col 0; len 1; (1):  41
col 1; len 6; (6):  04 81 12 8a 02 67
row#1[8014] flag: ——, lock: 0, len=11
col 0; len 1; (1):  41
col 1; len 6; (6):  04 81 12 8a 02 68
row#2[8003] flag: ——, lock: 0, len=11
col 0; len 1; (1):  41
col 1; len 6; (6):  04 81 12 8a 02 69

Note the same leading column (hex 41) is repeated again and again within the leaf block and uses a total of 2 bytes of storage, 1 byte for the value and 1 byte for the column length. That means for each and every index row entry, this column uses 2 bytes. Note the length of the index row entries is currently 11 bytes.

Let’s look at the same index compressed.

prefix row#0[8032] flag: -P—-, lock: 0, len=4
col 0; len 1; (1):  41
prc 726
row#0[8023] flag: ——, lock: 0, len=9
col 0; len 6; (6):  04 81 12 8b 00 42
psno 0
row#1[8014] flag: ——, lock: 0, len=9
col 0; len 6; (6):  04 81 12 8b 00 43
psno 0
row#2[8005] flag: ——, lock: 0, len=9
col 0; len 6; (6):  04 81 12 8b 00 44
psno 0

First thing to note is that there is only the one prefix row entry for this particular leaf block and that all 727 index row entries all share the same prefix row entry.

The length of the prefix entry is 4, 2 bytes for these flags and locking info, 1 byte for the prefix column value and 1 bye for the prefix column length. That’s just 4, tiny little  byes.

Note that each index row entry is now only 9 bytes reduced from 11 as we no longer need to store the 2 bytes for the repeated column entry. That means for a total overhead of 4 byes, we are able to save 2 bytes in each and every index row entry.

Not bad at all and as such represents a useful saving. In this particular example, the index reduced from 163 leaf blocks down to 138 leaf blocks, just by compressing a single byte column as the column has very low cardinality.

Note however if this leading column were unique for every row, we would increase the storage of this column from 2 bytes for every index row entry up to 4 bytes for every index row entry as every index entry would have it’s own associated prefix entry. That’s why compression can potentially increase the size of an index if the compressed column(s) has too high a cardinality due to the additional 2 byte overhead required to store all prefix row entries.

The following Index Compression Part IV demo shows all this in a little more detail.

There’s more to index (and indeed table) compression than simply compressing an index (or table) for the sake of it …

BTW, the spelling mistakes in the post title are fully intentional 😉