
Index Compression Part II (Down Is The New Up) February 20, 2008

Posted by Richard Foote in Index Compression, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.

Compression can be very beneficial when the compressed columns of an index have many repeated values within a leaf block, as these repeated values are stored only once and subsequently referenced by the index row entries themselves.

However, when the leading column is very selective, or the compressed columns as a whole are very selective with very few or possibly no repeating values within the index, then we have a problem.

In these scenarios, Oracle will create a massive prefix table, housing most if not all the individual column values anyway, as the prefix table stores every unique combination of compressed column values within a leaf block.

The index row entries will point or refer to a prefix entry which is hardly shared (if at all) by other index row entries.

Compression in these cases is not only pointless but potentially worse than pointless, as we not only still have to store a similar amount of index column data, but also incur additional compression-related overheads such as those associated with the prefix table entries.

The net result can be no effective compression, or indeed leaf blocks that can no longer store as many index values due to these associated compression overheads, with indexes increasing rather than decreasing in size as a consequence.
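As a rough sketch of how one might measure this (the table and index names below are purely hypothetical and not taken from the demo linked later in this post), the INDEX_STATS view populated by ANALYZE INDEX … VALIDATE STRUCTURE shows the leaf block count before and after compression, along with Oracle’s own estimate of the optimal prefix column count and expected saving:

-- Hypothetical demo table: SALE_ID is effectively unique (very selective),
-- REGION_ID has just 50 distinct values (many repeats).
create table sales_demo as
select rownum          as sale_id,
       mod(rownum, 50) as region_id,
       rownum * 10     as amount
from   dual connect by level <= 100000;

create index sales_demo_i on sales_demo (sale_id, region_id);

-- Leaf blocks and Oracle's compression estimate for the uncompressed index.
analyze index sales_demo_i validate structure;
select name, lf_blks, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

-- Compress only the (unique) leading column: each prefix entry is shared by
-- just one index row entry, so the index will not shrink and may even grow.
alter index sales_demo_i rebuild compress 1;
analyze index sales_demo_i validate structure;
select name, lf_blks, opt_cmpr_count, opt_cmpr_pctsave from index_stats;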

For concatenated, multi-column indexes, the order of the indexed columns can make a huge difference to the compression of an index. If the leading column or columns (depending on how many columns are compressed) have many repeated column value combinations, there’s no problem, as compression will be effective.

However, if the leading column is very selective, then compression by definition will be ineffective, as there must as a result be many distinct column values in the prefix table, which are less likely to be shared by the index row entries.

For compression to be effective, the prefix table should have considerably fewer entries than there are index row entries in most leaf blocks. If there are approximately (say) 200 index entries per non-compressed leaf block, you want the number of distinct column combinations within the leaf block to be substantially less than 200.

For index compression to be worthwhile, ensure low cardinality columns are the leading columns in a concatenated index, else compression may be futile or worse.
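As a rough illustration (again using the hypothetical sales_demo table sketched above, not the linked demo), the same two columns compressed in opposite orders behave very differently:

-- Drop the index from the previous sketch so the same column list can be re-created.
drop index sales_demo_i;

-- Low cardinality column (REGION_ID) leading: prefix entries are shared by
-- many index row entries, so COMPRESS 1 should noticeably reduce LF_BLKS.
create index sales_demo_region_i on sales_demo (region_id, sale_id) compress 1;
analyze index sales_demo_region_i validate structure;
select name, lf_blks from index_stats;

-- Selective column (SALE_ID) leading: nearly every prefix entry is unique,
-- so COMPRESS 1 is futile or worse for this column order.
create index sales_demo_id_i on sales_demo (sale_id, region_id) compress 1;
analyze index sales_demo_id_i validate structure;
select name, lf_blks from index_stats;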

See this demo on Index Compression Part II for an example of how compressing an index can be either effective or plain terrible, and how the order of columns in a concatenated index can make all the difference to the effectiveness of index compression.

Next I’ll cover index compression with Unique Indexes …

Comments»

1. Avirup Sen - June 27, 2009

Hi Richard,
I posted 2 questions in PART 1 of your blog.

Q1) If a non-unique index involving multiple columns is created, then to get the maximum benefit from key compression, should one compress the column with the lowest cardinality (lowest number of distinct values)?

I got the answer to question 1 by reading PART 2 of your blog.

However, I still have the following questions:

1) I need to create a non-unique index on a table involving 2 columns (C1, C2).
C2 has the lower cardinality between C1 and C2, but the data in the table is frequently queried using a filter on C1.
From an index perspective, I should keep C1 as the first column in the index, and from a compression perspective I should keep C2 as the first column in the index.
How do I decide on this?

Also another question I have

2) Table T_1, having 4 columns C1, C2, C3, C4, is list partitioned by C1.
And there is a local non-unique index T_1_IDX01 on (C1, C2, C3).

C1 in this case has the lowest cardinality and the table is also most frequently accessed by C1. Will it make sense to compress this index using COMPRESS 1, since the index is already partitioned and so for a given partition there will be only 1 distinct value of C1?
Or should I be using something like COMPRESS 2 in this case?

Please advise.


Richard Foote - June 30, 2009

Hi Avirup

I’ve already answered 1) elsewhere.

2). If C1 is the list partition key and you only have the one value per partition, then I wouldn’t bother with having C1 in the index at all.

Each local index will only have the one value for C1, and Oracle will automatically be able to use partition pruning if C1 is specified in the query, knowing precisely which partitions to access (and which local indexes to use). Therefore, having C1 in the index has no practical benefit, as the remaining columns can be used to determine the specific rows of interest.
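A minimal sketch of that suggestion, using the (hypothetical) names from the question: with only one C1 value per list partition, partition pruning on C1 already restricts the scan to a single local index segment, so the index itself only needs the remaining columns.

create table t_1 (c1 varchar2(10), c2 number, c3 number, c4 number)
partition by list (c1) (
  partition t_1_p_a values ('A'),
  partition t_1_p_b values ('B')
);

-- Local index without the partition key; queries filtering on C1 are pruned
-- to a single partition and hence to a single local index segment.
create index t_1_idx01 on t_1 (c2, c3) local;

-- If C1 had to remain in the index (e.g. a partition could hold several values),
-- COMPRESS 1 would at least deduplicate the repeated leading C1 values
-- within each local index partition:
-- create index t_1_idx01 on t_1 (c1, c2, c3) local compress 1;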


Avirup Sen - July 7, 2009

Hi Richard,
Thanks a lot for the explanation. I will not have C1 as part of the index.


2. Justis Durkee - October 22, 2009

I am seeing a case where the partition key is not in the index, and the optimizer is choosing to access the table by rowid even though the query could be satisfied by the index alone. This is the case when the partition key is a bind variable predicate. It seems the partition key can be helpful in a local index to prevent unnecessary table access.

SQL > define table_name = np
SQL > define percent = 100
SQL >
SQL > drop table np purge;
SQL >
SQL > create table np(
2 type varchar2(1),
3 id number,
4 ref number
5 ) partition by list(type) (
6 partition np_a values('a'),
7 partition np_b values('b')
8 )
9 tablespace scratch
10 /
SQL > insert into np values('a',1,2);
SQL > insert into np values('b',3,4);
SQL > create index np_idx on np(id,ref) local(
2 partition np_a,
3 partition np_b
4 ) tablespace scratch
5 /
SQL >
SQL > begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname => '&table_name',
5 cascade => true,
6 estimate_percent => &percent
7 );
8 end;
9 /
SQL > explain plan for select id, ref from np where type = :type and id = 1;
SQL > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Plan hash value: 1548701032

------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 2 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE | | 1 | 8 | 2 (0)| 00:00:01 | KEY | KEY |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| NP | 1 | 8 | 2 (0)| 00:00:01 | KEY | KEY |
|* 3 | INDEX RANGE SCAN | NP_IDX | 1 | | 1 (0)| 00:00:01 | KEY | KEY |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("ID"=1)
SQL > drop index np_idx;
SQL > create index np_idx on np(type,id,ref) local(
2 partition np_a,
3 partition np_b
4 ) compress tablespace scratch
5 /
SQL >
SQL > begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname => '&table_name',
5 cascade => true,
6 estimate_percent => &percent
7 );
8 end;
9 /
SQL > explain plan for select id, ref from np where type = :type and id = 1;
SQL > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
Plan hash value: 548459601

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8 | 1 (0)| 00:00:01 | | |
| 1 | PARTITION LIST SINGLE| | 1 | 8 | 1 (0)| 00:00:01 | KEY | KEY |
|* 2 | INDEX RANGE SCAN | NP_IDX | 1 | 8 | 1 (0)| 00:00:01 | KEY | KEY |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TYPE"=:TYPE AND "ID"=1)


Richard Foote - October 25, 2009

Hi Justis

I suspect the issue is simply that Oracle isn’t clever enough to pick up the fact that there’s only the 1 value per list partition. In this case, in theory, yes, it doesn’t need to visit the table, but if you take the general case where there could possibly be several values in a particular list partition, then Oracle would be forced to visit the table to ensure you only select rows with the required value of type.

By having the type in the index, there’s now no problem, as all the required info is within the index.

It’s a check (that there’s only one value in each and every list partition) that the CBO doesn’t bother to make, as it’s a rather specific case not worth the extra lines of logic to execute.


3. Justis Durkee - October 26, 2009

Hello Richard,

Thanks for the reply. Makes sense. Normally, it is impossible for the optimizer to know the value of a partition key for a particular row just based on the partition it resides in. It would be a keen optimization though 🙂


Richard Foote - October 28, 2009

Hi Justis

There are times when I’m surprised just how clever the CBO is at determining little efficiencies and tricks.

Who knows, detecting just one unique value per partition might be a 12g improvement yet.


4. A few words on Index Compression « The Dutch Prutser's Blog - April 6, 2010

[…] thorough discussion on index compression see Richard Foote’s fabulous Oracle blog. ( part 1 , part 2 , part 3 And part […]


5. briankp - February 16, 2012

Hi Richard,

Since index compression is beneficial as per your article, I am interested to do this. However I have one concern on performance perspective.

Generally, it’s recommended to create a new index with the most selective fields as the leading columns, followed by the least selective fields. For instance, for a select statement accessing table A via 2 fields, F1 (very selective, with over 100K distinct values) and F2 (not selective, with only 500 distinct values), creating an index on fields F1 and F2 in that sequence will help to improve the performance of this select statement.

I understand that to implement index compression, the leading column should be the unselective field (fewer distinct values). My doubt is: if an index is built this way (F2, compressed, as the leading column, followed by F1, uncompressed), will the performance be comparatively faster than with an uncompressed index?

Hope you can shed some light on this.

Thanks,
Brian.
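One way to explore this kind of question empirically (all object names below are hypothetical, and this is only a sketch rather than a definitive answer) is to build both index variants against a copy of the data, then compare their sizes and how the optimizer costs the critical query with each:

-- Hypothetical stand-in for table A from the question.
create table a_demo as
select rownum           as f1,   -- ~100K distinct values (very selective)
       mod(rownum, 500) as f2,   -- 500 distinct values
       rpad('x', 50)    as padding
from   dual connect by level <= 100000;

-- Variant 1: selective column leading, no compression.
create index a_demo_f1_f2_i on a_demo (f1, f2);

-- Variant 2: unselective column leading, compressed on the leading column.
create index a_demo_f2_f1_i on a_demo (f2, f1) compress 1;

-- Compare the size of each variant ...
analyze index a_demo_f1_f2_i validate structure;
select name, lf_blks, br_blks from index_stats;
analyze index a_demo_f2_f1_i validate structure;
select name, lf_blks, br_blks from index_stats;

-- ... and check the plan and cost of the critical query (drop or make invisible
-- one index at a time to see how each variant is used).
explain plan for select * from a_demo where f1 = :f1 and f2 = :f2;
select * from table(dbms_xplan.display);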


6. antony - November 14, 2012

I am confused by the definition of lowest cardinality (lowest number of distinct values).

By definition, cardinality is the number of rows returned by an operation (cardinality = selectivity * number of rows), and selectivity is 1/NDV (assuming no histogram on the column).

If we have a table with 100 rows, and column C1’s NDV is 1 and C2’s NDV is 4:

cardinality(C1) = 1/1 * 100 = 100
cardinality(C2) = 1/4 * 100 = 25

So the lowest cardinality in this case has the highest number of distinct values.

Correct me,if i am wrong.

Thanks


7. Richard Foote - December 4, 2012

Hi Antony

From my Uni days (which are now but a fading memory !!), cardinality is basically the number of elements in a set. From an execution plan, your understanding is correct: it’s the number of rows returned by an operation.

From a column perspective, a low cardinality column has few members in a set or a column with lots of repeated values and thus few distinct values.

So an operation that returns a high cardinality in an execution plan is potentially based on a low cardinality column.


antony - December 14, 2012

Thanks Richard!
I am very clear now.


