jump to navigation

Index Only Values Of Interest: (Little Wonder) January 28, 2008

Posted by Richard Foote in Function Based Indexes, Indexing Tricks, Oracle General, Oracle Indexes.

Thought I might expand a little on the discussion and comments on how NULLs can be indexed and address point #6 on my list of those things you may not have known about indexes

“It’s possible and potentially very useful to just index some column values and not all column values within a table”.

as well as touching on point #4 that “B-Tree Indexes can be extremely useful and beneficial even if the column contains very few distinct values (as low as 1)”.

As previously discussed, index entries which are fully NULL are not indexed by Oracle. We can however use this fact to our advantage.

There are many scenarios whereby we may only search for a rowset based on a subset of the possible values in a column or group of columns. The classic scenario is where we may have a flag or status field denoting “current”, “live”, “not yet processed”, etc. rows and our main transactional queries are only interested in these relatively few rows.

Most rows are “historical”, “processed”, etc. rows and are not generally of interest and when they are of interest represent such a large proportion of the overall table that an index would be inappropriate for these batch jobs or long running reports to access them anyways. Often, (but not always) we might need a histogram to let the CBO know that those column values of interest actually represents a small, non-uniform proportion of the overall rowset.

Because we need to efficiently access those few rows of interest, we generally index the column but in the process also index all the other column values that aren’t of interest as well. It’s all or nothing, right ?

Not necessarily. A possible solution is to use an appropriate function-based index in combination with our understanding that fully null index entries are not actually indexed. For example, let assume we have a very large table that has a STATUS code column. The only column value of interest are those with a status value of ‘BOWIE’, all other values are simply not of direct interest with our OLTP queries. By creating an index such as:

CREATE INDEX index_some_stuff_i ON
index_some_stuff(DECODE(status, ‘BOWIE’, ‘BOWIE’, NULL)) COMPUTE STATISTICS;

the decode function only returns a non-null value for the specific status of “BOWIE”. All other values are converted to nulls and so are not indexed.

We now have an index that consists of nothing but “BOWIE” values. As a result, the index is tiny because the vast majority of column values are simply not indexed. But because the percentage of rows that actually have a status of “BOWIE” is very small, the CBO looks at this index very favourably. By now writing our queries in a manner such as this:

SELECT * FROM index_some_stuff
WHERE(DECODE(status, ‘BOWIE’, ‘BOWIE’, null)) = ‘BOWIE’;

It will hopefully use our nice, small, efficient function-based index.

Not only will this index save us potentially large storage overheads, but if it may be small enough to reduce the height of the index on a permanent basis, thus making the index access more efficient.

See this demo for an example of how we reduced an index with 2924 leaf blocks and a height of 3 down to a height of 1 and just the 1 leaf block.