Index Only Values Of Interest: (Little Wonder)

Index Only Values Of Interest: (Little Wonder) January 28, 2008

Posted by Richard Foote in Function Based Indexes, Indexing Tricks, Oracle General, Oracle Indexes.
trackback

Thought I might expand a little on the discussion and comments on how NULLs can be indexed and address point #6 on my list of those things you may not have known about indexes

“It’s possible and potentially very useful to just index some column values and not all column values within a table”.

as well as touching on point #4 that “B-Tree Indexes can be extremely useful and beneficial even if the column contains very few distinct values (as low as 1)”.

As previously discussed, index entries which are fully NULL are not indexed by Oracle. We can however use this fact to our advantage.

There are many scenarios whereby we may only search for a rowset based on a subset of the possible values in a column or group of columns. The classic scenario is where we may have a flag or status field denoting “current”, “live”, “not yet processed”, etc. rows and our main transactional queries are only interested in these relatively few rows.

Most rows are “historical”, “processed”, etc. rows and are not generally of interest and when they are of interest represent such a large proportion of the overall table that an index would be inappropriate for these batch jobs or long running reports to access them anyways. Often, (but not always) we might need a histogram to let the CBO know that those column values of interest actually represents a small, non-uniform proportion of the overall rowset.

Because we need to efficiently access those few rows of interest, we generally index the column but in the process also index all the other column values that aren’t of interest as well. It’s all or nothing, right ?

Not necessarily. A possible solution is to use an appropriate function-based index in combination with our understanding that fully null index entries are not actually indexed. For example, let assume we have a very large table that has a STATUS code column. The only column value of interest are those with a status value of ‘BOWIE’, all other values are simply not of direct interest with our OLTP queries. By creating an index such as:

CREATE INDEX index_some_stuff_i ON
index_some_stuff(DECODE(status, ‘BOWIE’, ‘BOWIE’, NULL)) COMPUTE STATISTICS;

the decode function only returns a non-null value for the specific status of “BOWIE”. All other values are converted to nulls and so are not indexed.

We now have an index that consists of nothing but “BOWIE” values. As a result, the index is tiny because the vast majority of column values are simply not indexed. But because the percentage of rows that actually have a status of “BOWIE” is very small, the CBO looks at this index very favourably. By now writing our queries in a manner such as this:

SELECT * FROM index_some_stuff
WHERE(DECODE(status, ‘BOWIE’, ‘BOWIE’, null)) = ‘BOWIE’;

It will hopefully use our nice, small, efficient function-based index.

Not only will this index save us potentially large storage overheads, but if it may be small enough to reduce the height of the index on a permanent basis, thus making the index access more efficient.

See this demo for an example of how we reduced an index with 2924 leaf blocks and a height of 3 down to a height of 1 and just the 1 leaf block.

Comments»

1. H.Tonguç Yılmaz - January 28, 2008: After indexing nulls another very useful tip for me Richard, thank you 🙂

LikeLike

Reply
2. Robert - January 28, 2008: Are there any known advantages of using DECODE over CASE or vice versa other than the number of characters to type? Somehow I end up using CASE most of the time, probably because I am sure that it is SQL standard compliant where I am not so sure about DECODE.

LikeLike

Reply
3. Richard Foote - January 28, 2008: Hi Tonguc

Glad it’s useful.

BTW, I notice I’m currently running a solid third out of the 10 proposed seminars you posted. I guess a bronze medal isn’t too bad 😉

I’m hoping to hear around late Feb if another country will join the party with Turkey regarding the Index Internals seminar. I’ll let you know once I hear any news.

LikeLike

Reply
4. Richard Foote - January 28, 2008: Hi Robert

There’s no real reason other than I’ve been using decode for a lot longer than the Case Statement !!

Mind you, considering my typing skills, number of characters is important 😉

LikeLike

Reply
5. Brian Tkatch - January 28, 2008: In demos until this one stats were gathered via:

exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>’BOWIE’, tabname=>’TEST_NULLS’, estimate_percent=>NULL, cascade=>true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

But in this one:

exec dbms_stats.gather_table_stats(ownname=>’BOWIE’, tabname=> �INDEX_SOME_STUFF’, method_opt=>’FOR ALL COLUMNS SIZE 1′, cascade=> true);

skipping “estimate_percent=>NULL”. Is there a particular reason it was skipped this time?

LikeLike

Reply
6. Alberto Dell'Era - January 28, 2008: quote:
As previously mentioned, if the “BOWIE” value is not representative of the general distribution of other STATUS values, we may need a histogram on the STATUS column for the CBO to cost the index access favourably.

You probably meant “a histogram on the hidden (and virtual) column decode(…)” …

LikeLike

Reply
7. Richard Foote - January 29, 2008: Hi Brian

Ummm, would you believe because I intentionally wanted to see if anyone would spot the difference …

Thought not 😉

For demos such as these, I’m generally more comfortable computing statistics to reduce risks of other factors (such as bad stats) coming into play.

I’ve changed the demo to include the estimate_percent parameter to reduce my sense of unease.

Thanks !!

LikeLike

Reply
8. Richard Foote - January 29, 2008: Hi Alberto

Yes I did mean the computed virtual column. I’ve decided to actually remove the second reference altogether as it’s not applicable in the example where I only have one status of interest and it would be a pretty a rare scenario where the “selected” status values were unevenly distributed to make a difference else you wouldn’t have included them in the first place.

It kinda defeats the purpose of the example and so only confuses the issue, so the reference is gone.

Thanks !!

LikeLike

Reply
9. Jeffrey Kemp - January 29, 2008: Thanks Richard for another great article.

We’ve used this function-based index trick to great effect in our automated email application. Whenever the system needs to send an email it inserts a row into a table, which is then queried by a regular job. Each email gets a DATE_TO_SEND, allowing emails to be “scheduled” for a future time, and a DATE_SENT, which is updated to sysdate when the email is sent.

The job needs to run very quickly, we don’t want it doing full table scans to find the emails that haven’t been sent yet and where the DATE_TO_SEND is now or in the past, and we want to keep a fairly long history in the table; so we have used an index based on (CASE WHEN DATE_SENT IS NULL THEN DATE_TO_SEND ELSE NULL END).

The query is something like:
SELECT * FROM emails
WHERE (CASE WHEN DATE_SENT IS NULL THEN DATE_TO_SEND ELSE NULL END) <= SYSDATE
ORDER BY (CASE WHEN DATE_SENT IS NULL THEN DATE_TO_SEND ELSE NULL END);

The nice thing about this index is that not only are the unsent emails returned very quickly, they are returned in priority order without the database performing an additional sort operation.

LikeLike

Reply
10. Brian Tkatch - January 29, 2008: Richard,

ah, thanx. I see that estimate_percent on NULL will compute. Is that the default?

LikeLike

Reply
11. Richard Foote - January 30, 2008: Hi Jeffrey

Very nice !! A perfect example of where this sort of thing can be very useful.

LikeLike

Reply
12. Richard Foote - January 30, 2008: Hi Brian

No, the default is auto_sample_size where Oracle decides when it’s seen enough. Appears to work a lot better in 11g although I’ve yet to fully explore all the changes yet.

LikeLike

Reply
13. Joaquin Gonzalez - March 25, 2008: Hi everybody,

what would be the advantages/disadvantages between to index only values of interest (only ‘S’ value) versus to index the whole column + histogram?

I Can see:

index only values of interest:
————————
– index is smaller
– the change would only affect queries I include the “where decode(col,’S’,’S’,null) = ‘S'”. So I know the total impact of my changes.

index the whole column + histogram:
——————————-
– you can still use the index in querys like select count(*) where col = ‘N’
– the change should increase performance of all querys with predicates on this column (when none of the querys use bind on this column)

I need your help if you can see more advantages for each strategy, in order to decide which to use.

Thanks!!

LikeLike

Reply
14. Richard Foote - March 26, 2008: Hi Joaquin

You have it pretty well sorted out. The only other issue I would add is that the function-based index can only be used by those queries that explicitly use the decode function whereas the second index could be used by any query.

Possibly the more important consideration.

LikeLike

Reply
15. Vladimir - August 21, 2008: Hi Richard,

You use statements
“CREATE INDEX index_some_stuff_i_1 ON index_some_stuff(status);”
and
“CREATE INDEX index_some_stuff_i_2 ON index_some_stuff(DECODE(status, ‘BOWIE’, ‘BOWIE’, NULL)) COMPUTE STATISTICS;”
in demo.

But later you get indexes info from table dba_indexes. Please explain me, from where appear info of the first index there?
(Or there are expenses of copy-paste? 🙂 )

Thanks.

LikeLike

Reply
16. Richard Foote - August 25, 2008: Hi Vladimir

Sorry, I don’t understand the question ??

DBA_INDEXES contains details of all indexes in a database, why shouldn’t the first index be listed as it is ?

LikeLike

Reply
17. Vladimir - August 25, 2008: Hi Richard,

Ok, Ok, you know best!

Because of the my scientific past (unfortunately not the “computer” past) I need the experiment must be repeated with the same conditions..

Thanks.

LikeLike

Reply
18. Richard Foote - August 25, 2008: Hi Vladimir

My father is a scientist (nuclear physics) so I know where you’re coming from.

Hopefully when you run the various demos, you’ll get those same results. Then you can expand and enlarge on the demos to see what else might be learnt.

The demos are but seeds, not the whole tree in itself 🙂

LikeLike

Reply
19. Vladimir - August 25, 2008: Hi, Richard.

..and “Carpent tua poma nepotes”.. 🙂

(Excuse me for small off-topic – I’m in good spirits.)

Thank you for attention and your answers!

LikeLike

Reply

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

Richard Foote's Oracle Blog