Index Compression Part IV (Packt Like Sardines In a Crushd Tin Box) February 29, 2008Posted by Richard Foote in Index Compression, Index Internals, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
A great question by Brian Tkatch has convinced me there needs to be at least a Part IV in this particular mini-series. The basic question is what size does the column need to be for it to be effectively compressed ?
If a column is only just the one byte, then it can’t really be compressed as the prefix related overheads would likely make the exercise pointless, right ?
Well not quite …
True a larger column that is heavily repeated will yield a better return from compression than a smaller column, but even a single byte column could be effectively compressed, IF there are sufficient repeated values per index leaf block.
Previously, I illustrated a logical representation of how Oracle compresses a block. The following is more of a physical representation of a compressed block:
Prefix 0: David Bowie
Prefix 1: David Jones
Remember, only the leading columns in an index can be compressed and index row entries must be sorted by these leading columns. Therefore each prefix entry is referenced by one or more logically consecutive index row entries and each index row entry can only have the one index prefix. The corresponding prefix for a specific index row entry is therefore simply the prefix entry that precedes it within the leaf block.
If we look at sample prefix entries:
prefix row#0 flag: -P—-, lock: 0, len=6
col 0; len 3; (3): c2 08 43
prefix row#1 flag: -P—-, lock: 0, len=6
col 0; len 3; (3): c2 08 44
Note that the prefix row#0 starts at address 8030 while the next prefix entry row#1 starts at address 8015. That’s a total of 15 byes and yet the length of the prefix entry is just 6. Where’s the other 9 bytes ?
Well note the prc value for prefix row#0 is 1 meaning this prefix entry is only referenced by the one index row, as is the next prefix entry as well. The leading column in this particular example is actually unique meaning each and every prefix entry only has one corresponding index row.
If we look at the corresponding row entry, guess what it’s length will be …
row#0 flag: ——, lock: 0, len=9
col 0; len 6; (6): 04 81 aa 8b 00 69
row#1 flag: ——, lock: 0, len=9
col 0; len 6; (6): 04 81 aa 8b 00 6a
If you guessed 9, well done. Note that it’s starting address is 8021, which is the starting address of the previous entry 3015 + 6 bytes for the prefix entry which equals 8021 (as Oracle actually fills index blocks from the “bottom up”). Note that index entry row#0 has a psno of 0 meaning it’s corresponding prefix entry references prefix row#0.
Note that the index row length of 9 is made up of 2 byes for locks and flags info, 6 byes for the rowid and 1 byte for the rowid length (as this index is a non-unique index, albeit with unique index values).
Therefore the psno value actually uses 0 bytes as its value can be purely derived from it’s position within the leaf block in relation to its parent prefix value.
Let’s now look at an index row entry with a leading column that is heavily repeated but only one byte in size.
row#0 flag: ——, lock: 0, len=11
col 0; len 1; (1): 41
col 1; len 6; (6): 04 81 12 8a 02 67
row#1 flag: ——, lock: 0, len=11
col 0; len 1; (1): 41
col 1; len 6; (6): 04 81 12 8a 02 68
row#2 flag: ——, lock: 0, len=11
col 0; len 1; (1): 41
col 1; len 6; (6): 04 81 12 8a 02 69
Note the same leading column (hex 41) is repeated again and again within the leaf block and uses a total of 2 bytes of storage, 1 byte for the value and 1 byte for the column length. That means for each and every index row entry, this column uses 2 bytes. Note the length of the index row entries is currently 11 bytes.
Let’s look at the same index compressed.
prefix row#0 flag: -P—-, lock: 0, len=4
col 0; len 1; (1): 41
row#0 flag: ——, lock: 0, len=9
col 0; len 6; (6): 04 81 12 8b 00 42
row#1 flag: ——, lock: 0, len=9
col 0; len 6; (6): 04 81 12 8b 00 43
row#2 flag: ——, lock: 0, len=9
col 0; len 6; (6): 04 81 12 8b 00 44
First thing to note is that there is only the one prefix row entry for this particular leaf block and that all 727 index row entries all share the same prefix row entry.
The length of the prefix entry is 4, 2 bytes for these flags and locking info, 1 byte for the prefix column value and 1 bye for the prefix column length. That’s just 4, tiny little byes.
Note that each index row entry is now only 9 bytes reduced from 11 as we no longer need to store the 2 bytes for the repeated column entry. That means for a total overhead of 4 byes, we are able to save 2 bytes in each and every index row entry.
Not bad at all and as such represents a useful saving. In this particular example, the index reduced from 163 leaf blocks down to 138 leaf blocks, just by compressing a single byte column as the column has very low cardinality.
Note however if this leading column were unique for every row, we would increase the storage of this column from 2 bytes for every index row entry up to 4 bytes for every index row entry as every index entry would have it’s own associated prefix entry. That’s why compression can potentially increase the size of an index if the compressed column(s) has too high a cardinality due to the additional 2 byte overhead required to store all prefix row entries.
The following Index Compression Part IV demo shows all this in a little more detail.
There’s more to index (and indeed table) compression than simply compressing an index (or table) for the sake of it …
BTW, the spelling mistakes in the post title are fully intentional
Index Compression Part III (2+2=5) February 22, 2008Posted by Richard Foote in Index Compression, Index Internals, Oracle General, Oracle Indexes, Performance Tuning.
1 comment so far
As previously discussed, compression is only beneficial when the compressed columns have repeated values within an index leaf block. If a compressed column has repeated values, Oracle will basically store the one copy of the compressed column (or columns) as a prefix entry within a leaf block and each index row entry will then be associated with it’s corresponding prefix entry.
That being the case, it makes no sense in compressing a single column Unique index. Each index entry must be unique, there can’t be any repeatable, duplicate column values. Therefore compression will be not only totally ineffective, but would actually result in an index structure that increases rather than decreases in size due to the extra overheads associated with having a prefix index entry for each and every index row entry.
It also makes no sense in compressing every column in a concatenated, multi-column Unique index. For exactly the same reasons. Each compressed index column combination must be unique and would result in a prefix entry for each and every index row entry. Compression would be worse than useless and result in an increased index structure.
However, Oracle does its best to protect ourselves from ourselves.
For a start, it does not allow one to create a compressed index on a single column Unique index. Attempt to do so and Oracle generates an “ORA-25193: cannot use COMPRESS option for a single column key”. Hey, even the error message is nice and meaningful.
If the Unique index has multiple columns, the default prefix length value (number of compressed columns) is the number of indexed columns minus one, not all columns as it is for a Non-Unique index. See, Oracle is doing its best here to prevent a useless attempt at index compression.
If you specify a prefix length value equal or greater than the number of columns in a Unique index, Oracle generates an “ORA-25194: invalid COMPRESS prefix length value”. These are not restrictions but designed to stop the creation of inefficient compressed indexes.
Note however that Non-Unique indexes can be used to police primary Key (PK) and Unique Key (UK) constraints. I’ve discussed all this previously. The constraints might be Unique, the data might be unique but the index is Non-Unique and so these “protections” fly out the window. There is nothing stopping one creating a single column compressed Non-Unique index to police a PK or UK constraint. There’s nothing preventing you from creating a concatenated, Non-Unique index with all columns compressed, from policing a PK or UK constraint. In fact, you can even create a fully compressed Non-Unique index at the same time as the PK or UK constraint with Oracle’s extended constraint and index creation syntax.
Nothing stopping you, except perhaps the realisation that it would be a very bad and futile thing to implement as the resultant index will be guaranteed to be less efficient than an equivalent nocompress index.
Yes, it might make sense to compress just the leading columns of a Unique index or a Non-Unique index that’s policing a PK or UK constraint, if such leading columns have sufficient repeating values to make compression effective. But it would simply not make sense to compress all columns from such indexes.
See this demo Index Compression Part III on how compression works and doesn’t work for Unique Indexes.
In Part IV, we’ll look at the issue of what specific columns might benefit from compression and look a little closer at how storage is actually saved.
Whoever said there can’t possibly be enough things to discuss on a Blog that focuses mainly on Oracle indexes
Index Compression Part II (Down Is The New Up) February 20, 2008Posted by Richard Foote in Index Compression, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
Compression can be very beneficial when the compressed columns of an index have many repeated values within a leaf block as these repeated values are only stored once and subsequently referred to within the index row entries themselves.
However, when the leading column is very selective or the compressed columns are very selective and have very few or possibly no repeating values in an index, then we have a problem.
In these scenarios, Oracle will create a massive prefix table, housing most if not all the individual column values anyways, as the prefix table stores all unique combinations of compressed column values within a leaf block.
The index row entries will point or refer to a prefix entry which is hardly shared (if at all) by other index row entries.
Compression in these cases is not only pointless but potentially worse than pointless as we now not only have to store a similar amount of index column data anyways, but additional compression related overheads such as those associated with the prefix table entries.
Net result can be no effective compression or indeed leaf blocks that can no longer store as many index values due to these associated compression overheads, resulting in indexes increasing rather than decreasing in size as a consequence.
For concatenated, multi-column indexes, the order of the indexed columns can make a huge difference to the compression of an index. If the leading column or columns (depending on how many columns are compressed) have many repeated column value combinations, not a problem as compression will be effective.
However if the leading column is very selective, then compression by definition will be ineffective, as there must be many distinct column values in the prefix table as a result which are less likely to shared by the index row entries.
For compression to be effective, the prefix table should have considerably fewer entries than index row entries in most leaf blocks. If there are approximately (say) 200 index entries per no-compressed leaf block, you want the number of distinct column combinations within the leaf block to be substantially less than 200.
For index compression to be feasible, ensure low cardinality columns are the leading columns in a concatenated index, else compression may be futile or worse.
See this demo on Index Compression Part II to see how compressing an index can be either effective or plain terrible and how the order of columns in a concatenated index can make all the difference to the effectiveness of index compression.
Next I’ll cover index compression with Unique Indexes …
Index Compression Part I (Low) February 17, 2008Posted by Richard Foote in Index Compression, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
Index compression is perhaps one of the most under used and neglected index options available. It has the potential to substantially reduce the overall size of Non-Unique indexes and multi-column Unique indexes, in some scenarios dramatically so. A smaller index, especially if it stays permanently smaller without any subsequent expensive maintenance operations, it always a nice thing. Not only will it potentially save storage, but if the resultant index contains fewer leaf blocks, that’s potentially fewer LIOs and from the Cost Based Optimizer’s point of view, potentially a cheaper execution plan option.
All possible without a single index rebuild in sight …
However like most things Oracle, index compression also has the potential to be misused and to cause rather than solve problems. So it really helps to understand how index compression works and how it’s actually implemented before we rush into anything.
The first point to understand is that index compression is implemented at the index block level. Basically, Oracle stores each distinct combination of compressed column values found within a specific index leaf block in a “Prefix” table within the leaf block and assigns each combination a unique prefix number.
The more numbers of distinct compressed column values, the more entries in the prefix table and the larger the prefixed related data. The fewer numbers of distinct compressed column values, the fewer entries in the prefix table and the smaller the prefix related data. Generally, the fewer entries in the prefix table, the better the compression.
Oracle now no longer stores these compressed columns within the actual index row entries. These compressed columns are substituted and referenced with the details stored in the prefix table entry, which are shared by all index row entries with the same corresponding prefix value.
Only leading columns (which in a Non-Unique Index can potentially be all the columns in an index, in a Unique Index can potentially be all but the last column in the index) can be compressed. Therefore, the prefix table is in the same logical order as they would appear in the index row entries. The values of the prefix values always appear within the index row entries in a sequential manner, noting that (hopefully) several row entries share the same prefix number.
Let’s say we currently have a nocompress Non-Unique index on a NAME column with the following entries:
0: David Bowie
1: David Bowie
2: David Bowie
3: David Bowie
4: David Bowie
5: David Jones
6: David Jones
7: David Jones
After the index is compressed, the leaf block entries would look logically something like this:
0: David Bowie
1: David Jones
Importantly, each distinct combination of compressed column values is now stored just the once within an individual index leaf block. In the example above, we previously stored “David Bowie” 5 times. In the compressed index leaf block, we now only store “David Bowie” once, with the prefix value now being referenced by each index entry instead.
The net result being we can now (hopefully) store many more index entries per leaf block meaning we don’t need as many leaf blocks in our index.
To compress an index, simply specify the COMPRESS option:
CREATE INDEX bowie_table_i ON bowie_table(col1, col2) COMPRESS 2;
The number after the COMPRESS keyword denotes how many columns to compress. The default is all columns in a Non-Unique index and all columns except the last column in a Unique index.
This demo, Index Compression Part I shows how an appropriately compressed index can substantially reduce the overall size of an index. It also shows a couple of index leaf block dumps to highlight how index compression is implemented.
In Part II, I’ll show you how you can really stuff things up by implementing index compression totally inappropriately …