jump to navigation

Index Compression Part II (Down Is The New Up) February 20, 2008

Posted by Richard Foote in Index Compression, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
trackback

Compression can be very beneficial when the compressed columns of an index have many repeated values within a leaf block as these repeated values are only stored once and subsequently referred to within the index row entries themselves.

However, when the leading column is very selective or the compressed columns are very selective and have very few or possibly no repeating values in an index, then we have a problem.

In these scenarios, Oracle will create a massive prefix table, housing most if not all the individual column values anyways, as the prefix table stores all unique combinations of compressed column values within a leaf block.

The index row entries will point or refer to a prefix entry which is hardly shared (if at all) by other index row entries.

Compression in these cases is not only pointless but potentially worse than pointless as we now not only have to store a similar amount of index column data anyways, but additional compression related overheads such as those associated with the prefix table entries.

Net result can be no effective compression or indeed leaf blocks that can no longer store as many index values due to these associated compression overheads, resulting in indexes increasing rather than decreasing in size as a consequence.

For concatenated, multi-column indexes, the order of the indexed columns can make a huge difference to the compression of an index. If the leading column or columns (depending on how many columns are compressed) have many repeated column value combinations, not a problem as compression will be effective.

However if the leading column is very selective, then compression by definition will be ineffective, as there must be many distinct column values in the prefix table as a result which are less likely to shared by the index row entries.

For compression to be effective, the prefix table should have considerably fewer entries than index row entries in most leaf blocks. If there are approximately (say) 200 index entries per no-compressed leaf block, you want the number of distinct column combinations within the leaf block to be substantially less than 200.

For index compression to be feasible, ensure low cardinality columns are the leading columns in a concatenated index, else compression may be futile or worse.

See this demo on Index Compression Part II to see how compressing an index can be either effective or plain terrible and how the order of columns in a concatenated index can make all the difference to the effectiveness of index compression.

Next I’ll cover index compression with Unique Indexes …

Comments»

1. Avirup Sen - June 27, 2009

Hi Richard,
I posted 2 questions in PART 1 of your blog.

Q1) If a non unique index involving multiple columns is created, then to get the maximum benefits of key compression should one use the column with the lowest cardinality (lowest number of distinct values) to compress?

I got the Answer to that question1 by reading PART 2 of your blog.

Howver, I still have the following questions

1) Need to create non-unique index on a table involving 2 columns (C1, C2)
C2 has the lower cardinality between C1 and C2. But the data in the table is frequently queried using a filter in C1
From Index perspective, I should keep C1 as the first column in the index and from Compression Perspective I should keep C2 as the first column in the index
How to decide on this?

Also another question I have

2) Table T_1 : Having 4 Columns C1, C2, C3, C4 is List Partitioned by C1
And there is a local non unique index on T_1_IDX01 on C1, C2, C3.

C1 in this case has the lowest cardinality and the table is also most frequently accessed by C1. Will it make sense to compress this index using compress 1 since the index is already partitioned? So, for a given partition there will be only 1 distinct value of C1.
Should I be using something like compress 2 in this case?

Please advise.

Richard Foote - June 30, 2009

Hi Avirup

I’ve already answered 1) elsewhere.

2). If C1 is the list partitioned key and you only have the only value per partition, then I wouldn’t bother with having C1 in the index at all.

Each local index will only have the one value for C1 and Oracle will automatically be able to use partition pruning if C1 is specified in the query and know precisely which partitions to access (and which local indexes to use. Therefore having C1 in the index has no practical benefit as the remaining columns can be used to determine the specific rows of interest.

Avirup Sen - July 7, 2009

Hi Richard,
Thanks a lot for the explanation. I will not have C1 as part of the index.

2. Justis Durkee - October 22, 2009

I am seeing the case where the partitioned key is not in the index, and the optimizer is choosing to access to the table by rowid even though the query could be satisfied by the index alone. This is the case when the partition key is a bind variable predicate. It seems the partition key can be helpful in a local index to prevent unnecessary table access.

SQL > define table_name = np
SQL > define percent = 100
SQL >
SQL > drop table np purge;
SQL >
SQL > create table np(
2 type varchar2(1),
3 id number,
4 ref number
5 ) partition by list(type) (
6 partition np_a values(‘a’),
7 partition np_b values(‘b’)
8 )
9 tablespace scratch
10 /
SQL > insert into np values(‘a’,1,2);
SQL > insert into np values(‘b’,3,4);
SQL > create index np_idx on np(id,ref) local(
2 partition np_a,
3 partition np_b
4 ) tablespace scratch
5 /
SQL >
SQL > begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname => ‘&table_name’,
5 cascade => true,
6 estimate_percent => &percent
7 );
8 end;
9 /
SQL > explain plan for select id, ref from np where type = :type and id = 1;
SQL > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
————————————————————————————————————-
Plan hash value: 1548701032

————————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
————————————————————————————————————-
| 0 | SELECT STATEMENT | | 1 | 8 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 1 | 8 | 2 (0)| 00:00:01 | KEY | KEY |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| NP | 1 | 8 | 2 (0)| 00:00:01 | KEY | KEY |
|* 3 | INDEX RANGE SCAN | NP_IDX | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
————————————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

3 – access(“ID”=1)
SQL > drop index np_idx;
SQL > create index np_idx on np(type,id,ref) local(
2 partition np_a,
3 partition np_b
4 ) compress tablespace scratch
5 /
SQL >
SQL > begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname => ‘&table_name’,
5 cascade => true,
6 estimate_percent => &percent
7 );
8 end;
9 /
SQL > explain plan for select id, ref from np where type = :type and id = 1;
SQL > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
————————————————————————————————
Plan hash value: 548459601

————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
————————————————————————————————
| 0 | SELECT STATEMENT | | 1 | 8 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE| | 1 | 8 | 1 (0)| 00:00:01 | KEY | KEY |
|* 2 | INDEX RANGE SCAN | NP_IDX | 1 | 8 | 1 (0)| 00:00:01 | KEY | KEY |
————————————————————————————————

Predicate Information (identified by operation id):
—————————————————

2 – access(“TYPE”=:TYPE AND “ID”=1)

Richard Foote - October 25, 2009

Hi Justis

I suspect the issue is simply that Oracle isn’t clever enough to pick up the fact that there’s only just the 1 value per list partition. In this case, in theory yes, it doesn’t need to visit the table but if you take the general case where there could possibly be several values in a particular list partition, then Oracle would be forced to visit the table to ensure you only select a row with the required value of type.

By having the type in the index, now there’s no problems as all the required info is now within the index.

It’s a check (that only one value in each and every list partition) that the CBO doesn’t bother to make as it’s a rather specific case not worth the extra lines of logic to execute.

3. Justis Durkee - October 26, 2009

Hello Richard,

Thanks for the reply. Makes sense. Normally, it is impossible for the optimizer to know the value of a partition key for a particular row just based on the partition it resides in. It would be a keen optimization though :)

Richard Foote - October 28, 2009

Hi Justis

There are times when I’m surprised just how clever the CBO is determining little efficiencies and tricks.

Who knows, detecting just one unique value per partition might be a 12g improvement yet.