How Are NULLs Actually Indexed ? (Fascination) January 30, 2008Posted by Richard Foote in Index Internals, Indexing NULLs, Oracle General, Oracle Indexes.
A nice question by Jeff regarding how Oracle goes about indexing NULLs has prompted me to show how one could go about actually determining the answer. The basic question is are NULLs treated as just another column value and grouped accordingly or does Oracle have to somehow search through all the leaf blocks looking for all occurrences of these mysterious NULLs.
The answer is that NULLs are basically considered to be potentially the largest value possible by Oracle and so are all grouped and sorted together at the “end” of the index structure (assuming the column is the leading column in the concatenated index, else they’ll be listed last for each distinct column that precedes it in the index).
The fact that index range scans are just as efficient when searching for NULLs values as for any other value strongly supports this assumption, but how does one actually prove it ?
The first obvious thing to check would be to create a little table and associated index with a few rows and a few NULL column values thrown in and see the results of a SELECT … ORDER BY. One would expect the order of an ascending index to match the order of the resulting output. Indeed, NULL values are by default listed last in ORDER BY ascending listings suggesting they would likewise be grouped and sorted last within an index.
The next thing to check would possibly be to use the DUMP function to again see what Oracle is likely to do with NULL values. The DUMP function displays the raw decimal representation of the specific character (depending of course on the character-set) . For NULL values however, there’s actually nothing to display other than a NULL text to represent there’s nothing actually there.
The best place to check of course is within the actual index itself. By determining the actual block that stores our example index, we can perform an index block dump and look at the resultant trace file that describes a representation of the index block to see precisely how Oracle deals with NULLs within indexes.
A quick check of the HEADER_FILE and HEADER_BLOCK in DBA_SEGMENTS will give us the index segment header location.To find the associated index root/leaf block simply add 1 to the HEADER_BLOCK.
Dump the block via the
ALTER SYSTEM DUMP DATAFILE a BLOCK b
command and look at the trace file in USER_DUMP_DEST (where ‘a‘ represents the datafile id and ‘b‘ represents the block id determined from dba_segments).
The resultant output clearly shows that yes:
Leading column NULLs values are all grouped together
They are all listed at the “end” of the index structure
Any NULLs in the non-leading indexed columns are listed “last” for each distinct value in the leading columns in which they appear
Any index entry consisting of nothing but NULLs are not actually stored within the index
This NULLs Index Dump demo goes through this entire process with a little working example and describes the relevant section of the index block dump.
I spend some time discussing block dumps in my seminar as it’s an extremely useful tool when determining and learning how Oracle actually works.
Index Only Values Of Interest: (Little Wonder) January 28, 2008Posted by Richard Foote in Function Based Indexes, Indexing Tricks, Oracle General, Oracle Indexes.
Thought I might expand a little on the discussion and comments on how NULLs can be indexed and address point #6 on my list of those things you may not have known about indexes
“It’s possible and potentially very useful to just index some column values and not all column values within a table”.
as well as touching on point #4 that “B-Tree Indexes can be extremely useful and beneficial even if the column contains very few distinct values (as low as 1)”.
As previously discussed, index entries which are fully NULL are not indexed by Oracle. We can however use this fact to our advantage.
There are many scenarios whereby we may only search for a rowset based on a subset of the possible values in a column or group of columns. The classic scenario is where we may have a flag or status field denoting “current”, “live”, “not yet processed”, etc. rows and our main transactional queries are only interested in these relatively few rows.
Most rows are “historical”, “processed”, etc. rows and are not generally of interest and when they are of interest represent such a large proportion of the overall table that an index would be inappropriate for these batch jobs or long running reports to access them anyways. Often, (but not always) we might need a histogram to let the CBO know that those column values of interest actually represents a small, non-uniform proportion of the overall rowset.
Because we need to efficiently access those few rows of interest, we generally index the column but in the process also index all the other column values that aren’t of interest as well. It’s all or nothing, right ?
Not necessarily. A possible solution is to use an appropriate function-based index in combination with our understanding that fully null index entries are not actually indexed. For example, let assume we have a very large table that has a STATUS code column. The only column value of interest are those with a status value of ‘BOWIE’, all other values are simply not of direct interest with our OLTP queries. By creating an index such as:
CREATE INDEX index_some_stuff_i ON
index_some_stuff(DECODE(status, ‘BOWIE’, ‘BOWIE’, NULL)) COMPUTE STATISTICS;
the decode function only returns a non-null value for the specific status of “BOWIE”. All other values are converted to nulls and so are not indexed.
We now have an index that consists of nothing but “BOWIE” values. As a result, the index is tiny because the vast majority of column values are simply not indexed. But because the percentage of rows that actually have a status of “BOWIE” is very small, the CBO looks at this index very favourably. By now writing our queries in a manner such as this:
SELECT * FROM index_some_stuff
WHERE(DECODE(status, ‘BOWIE’, ‘BOWIE’, null)) = ‘BOWIE';
It will hopefully use our nice, small, efficient function-based index.
Not only will this index save us potentially large storage overheads, but if it may be small enough to reduce the height of the index on a permanent basis, thus making the index access more efficient.
See this demo for an example of how we reduced an index with 2924 leaf blocks and a height of 3 down to a height of 1 and just the 1 leaf block.
Radiohead “In Rainbows” Box-Set January 26, 2008Posted by Richard Foote in Music, Radiohead.
“In Rainbows” is the latest offering by one of my favourite bands, Radiohead. For Christmas, I was lucky enough to get the “In Rainbows” Box-set, which is a real treat for any Radiohead fan.
The box-set is wonderfully packaged. In a LP sized slip case, the hard outside-case has black and white artwork at odds to the bright “rainbow” colours of the album proper.
Inside, is a LP sized, foldout case that holds the 2 vinyl 12″ 45 RPM records that feature the 10 songs in the album proper, one stored in each side of the foldout sleave.
In the inside left-hand side of the foldout sits a removable LP sized 16 page booklet, featuring different colourful digital images and artwork by of course Stanley Donwood, in a similar style to the album cover proper.
In the right-hand side, taking up half the space is another non-removable booklet that features the album lyrics and credits. Unfortunately, it doesn’t include the lyrics to the bonus disc. The other half is taken up by the 2 CDs of the album, the 10 song album proper and an 8 song bonus disc, which of course is the real feature of the box-set, consisting of:
MK1 is a one minute piece based on the middle section of Videotape that ends the album proper. It has a very quiet, eerie sort of feel to it that nicely links the main “In Rainbows” album to the bonus disc.
It suddenly jumps into “Down Is The New Up” which is such a perfect title for a Radiohead song. It has a somewhat mellow piano based sound in keeping with the general “In Rainbows” feel, with a catchy little backbeat. However, the atmospherics in the background suggest not all is well. About being given the flick, “Your services are not required. Your future’s bleak. You’re so last week.”, the message is being dampened (unsuccessfully) by the fact that hey “down is the new up”. How putting spin and modern catch phrases on bad news doesn’t change the fact it’s bad news nonetheless. Perhaps a better alternative to that suggested in “No surprises” however.
The next song “Go slowly” is an acoustic guitar based number, which indeed has a slow, melodically soothing rhythm. It’s the least “wordy” of the songs, a simple sort of love song that pleads given time and patients, things will work out if we just go slowly. It’s has a wonderful, optimistic feel with typically beautiful Radiohead arrangements. Thom in his choirboy sounding best.
The next piece MK2 is another 1 minute “filler” (if there’s such a thing with Radiohead), a weird electronic piece that reminds me of the last scene in some old fashioned, science fiction movie. Perhaps the Tardis referred to in “Up On The Ladder” ?
The next song “Last Flowers To The Hospital” is my favourite in this collection. A quiet, piano and acoustic guitar based arrangement that dates all the way back to “OK Computer” where it would have fitted in perfectly (“Appliances have gone berserk”), it’s vintage Radiohead. It just sounds so beautiful with Thom’s vocals perfectly matching the “mood” evoked by the music. The title comes from a sign outside a hospital in Oxford, it’s basically about someone who can no longer cope and just needs someone to listen to them. Radiohead are one of the few bands that have a song of this quality sitting on an extras CD.
Next comes “Up On The Ladder”, which has a guitar riff and a throbbing beat that just gets stuck in your head. Dropped from the “Hail To The Thief” and dating as far back as “Kid A”, its basic theme is alienation both literally and figuratively. Sounds great live.
As does the next song “Bangers & Mash”, the most energetic piece in the collection. About corruption and the poison of power and position, it’s has a typically biting edge, especially with Thom’s sarcastic vocal delivery, it serves as a warning of what can happen when people turn on you. You can easily dance to this, especially if you’ve just ripped somebody off !!
The last song “4 Minute Warning” reminds me of the end to Pink Floyd’s “The Final Cut” both in its mood and theme. It’s bleak, dark and about someone desperately wishing what’s about to happen to them was only a dream. But it’s not a dream and in 4 minutes something really bad is going to happen and trying to ignore it and wish things were different isn’t going to change things. Perhaps down is the new down after all. A quiet, sombre ending, which of course lasts exactly 4 minutes.
Overall, the bonus disc is a great collection of songs, most of which could easily have made the album proper.
You can’t buy this in shops although it’s still available from here. Highly recommended !!
Indexing NULLs: (Empty Spaces) January 23, 2008Posted by Richard Foote in Indexing NULLs, Indexing Tricks, Oracle General, Oracle Indexes, Performance Tuning.
There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.
Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.
However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.
The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.
If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.
So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.
But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?
Well not quite.
Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.
Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.
However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.
See this demo on Indexing Null Values for examples on how this all works.
Introduction To Reverse Key Indexes: Part IV (Cluster One) January 21, 2008Posted by Richard Foote in Clustering Factor, Index statistics, Oracle Indexes, Oracle Myths, Reverse Key Indexes.
There’s a myth that suggests if the Clustering Factor (CF) of an index is greater than a certain ratio when compared to the number of rows in the table, the CF is poor and an index rebuild would be beneficial.
The slight problem with this advice is that the CF actually measures how well aligned the order of the column values are in the table as compared to the order of the index entries in the index. Generally, a table in which the column values are ordered in a similar manner to the index will have a CF closer to the number of blocks in the table. A table in which the column values are ordered in a random manner when compared to the index will have a CF closer to the number of rows in the table.
An index rebuild doesn’t change the ordering of the index row entries and an index rebuild has no impact on the table so therefore the comparative ordering of both remains unchanged. Therefore the CF of an index will be identical after the rebuild as it was before.
Well actually, there is one slight exception to this rule. Reverse Key Indexes.
Generally, rows with monotonically increasing column values are physically inserted in the order of the monotonically increasing columns. This may not be the case however with tables in ASSM tablespaces or tables with multiple freelists or freelist groups as concurrent inserts will be directed to differing blocks. In these cases we may actually have data that is quite well clustered but may have quite poor CF values due to the manner in which the CF is calculated.
Assuming a Non-ASSM, single freelist/freelist group table, the CF of monotonically increasing indexed values would ordinarily be quite good. As a non-reverse index must also have its values in the monotonically column order, the CF of the index is likely to be nice and low.
However, if you were to rebuild the index as a Reverse Key Index, the index values get reversed and “randomly” redistributed within the index structure, totally changing the order of the index entries within the index. As a result, the index values are no longer aligned with those of the table and the CF is likely to now be quite appalling.
Rebuilding an index generally has no impact on the CF as the index row values retain the same logically order. Rebuilding an index to be reverse (or visa-versa) is the exception to the rule as it will physically (and logically) change the index row order.
A Reverse Key Index is likely therefore to have a much worse CF than it’s non-reverse equivalent.
A Reverse Key Index will be ignored for range predicates (as already discussed in Part I) so a poor CF may not have an impact. However, as also discussed, index range scans are still viable in some scenarios so an increased CF may impact execution plans detrimentally.
See this demo on how a Reverse Index Rebuild turns a “perfect” CF into a shocker.
Introduction To Reverse Key Indexes: Part III (A Space Oddity) January 18, 2008Posted by Richard Foote in Index Block Splits, Index Internals, Oracle Indexes, Performance Tuning, Reverse Key Indexes.
A possibly significant difference between a Reverse and a Non-Reverse index is the manner in which space is used in each index and the type of block splitting that takes place.
Most Reverse Key Indexes are created to resolve contention issues as a result of monotonically increasing values. As monotonically increasing values get inserted, each value is greater than all previous values (providing there are no outlier values present) and so fill the “right-most” leaf block. If the “right-most” block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure. Assuming no deletes or updates, the index should have virtually 100% used space.
However, it’s equivalent Reverse Key index will have the values reversed and dispersed evenly throughout the index structure. As index blocks fill, there will be a very remote chance of it being due to the maximum indexed value and 50-50 block splits will result. The PCT_USED is likely therefore to be significantly less, averaging approximately 70-75% over time.
Therefore, for indexes with no deletions, a Reverse Key index is likely to be less efficient from a space usage point of view.
However, if there are deletions, the story may differ.
Deleted space can be reused if an insert is subsequently made into an index block with deleted entries or if a leaf block is totally emptied. However, if a leaf block contains any non-deleted entries and if subsequent inserts don’t hit the leaf block, then the deleted space can not reused. As monotonically increasing values in a non-reverse index only ever insert into the “right-most” leaf block, it won’t be able to reuse deleted space if leaf blocks are not totally emptied. Overtime, the number of such “almost but not quite empty” index leaf blocks may in some scenarios increase to significant levels and the index may continue to grow at a greater proportional rate than the table (where the reuse of space is set and controlled by the PCTUSED physical property).
However, Reverse Key indexes will be able to reuse any deleted space as they evenly distribute inserts throughout the index structure. Overtime, the index is likely to grow at a similar proportional rate as the table.
For indexes that have deletions resulting in many sparsely (but not totally emptied) leaf blocks, a Reverse Key index could be more efficient from a space usage point of view.
See this demo Differences in Space Usage Between a Reverse and a Non-Reverse Index for further details.
In Part I, we saw how with Reverse Key Indexes, Oracle will basically take the indexed value, reverse it and then index the reversed value. As a result, data that would ordinarily be logically sorted within an index structure will now be randomly distributed. This therefore negates the use of Reverse Key Indexes with range predicates, with the CBO not even considering them in its costings.
This is all the information we need to dispel a rather bizarre suggestion that has been doing the rounds regarding using Reverse Key Indexes to deal with LIKE predicates that have a leading wildcard. For example, such a suggestion can be found here and within an OTN discussion here.
Basically the suggestion is to:
1) Create a Reverse Key Index on the column to be searched with a LIKE predicate having a leading wildcard (such %, _).
2) Instead of writing the query as usual, e.g.
SELECT * FROM bowie_table WHERE name LIKE ‘%BOWIE’
rewrite the query programmatically such as:
SELECT * FROM bowie_table WHERE name LIKE ‘EIWOB%';
by reversing the required text and now having the wildcard at the end.
The Reverse Key Index stores the data in a reversed format identical to say ‘EIWOB’, so Oracle should be able to use the Reverse Key Index to efficiently find all rows that start with ‘EIWOB’ as they’re all grouped together within the index structure, right ?
Ignoring the fact the example in the above link is somewhat meaningless as it uses a leading and a trailing wildcard in both queries and so assuming the first query only has a leading wildcard and the second query only has a trailing wildcard, this suggested use of a Reverse Key Index can not possibly work on any current version of Oracle.
There are a few fundamental problems with this suggestion but in summary not only will it not work but worse, it will actually return the wrong results.
The suggestion is correct as far as indeed, using a normal index to return data with a LIKE statement containing a leading wildcard will negate the use of an index range scan, the CBO doesn’t even consider it. An index hint may push Oracle to use a Full Index Scan, but not an Index Range Scan.
However using a Reverse Index Key to solve this is unfortunately doomed to failure for two very simple reasons.
One, as we have already seen, Oracle also ignores Index Range Scans for Reverse Key Indexes with range predicates and unfortunately, a query such as WHERE name LIKE ‘EIWOB%’ is a range scan. The CBO simply doesn’t consider the Reverse Key Index in it’s deliberations.
Two, is of course that Oracle has no possible way of knowing that when you say LIKE ‘EIWOB%’, what you really mean is search for all records ending with BOWIE, LIKE ‘%BOWIE’. How can Oracle possibly know this ? If it could use the index (which it can’t) Oracle would only reverse the search string around anyways and use the index to look for indexed entries beginning with ‘BOWIE’ within the index structure, remembering everything is of course stored in reverse within the index.
So Oracle is actually searching for all records starting with ‘EIWOB’, not ending with ‘BOWIE’ which are two entirely different things.
The net result of using this suggested strategy is not good.
1) Oracle ignores the Reverse Key Index anyways as a LIKE ‘EIWOB%’ is a range predicate
2) Oracle therefore performs a Full Table Scan anyways
3) As the query is effectively searching for all records that start with ‘EIWOB’, not as expected all records that end with ‘BOWIE’, the two queries in the example will actually return completely different results
The Reverse Key Indexes Part II Demo shows how this suggested use of a Reverse Key Index is a very very bad idea …
However, if you want to solve the issue of efficiently finding the results of a LIKE ‘%BOWIE’, there are some possible approaches one could take that will use an index and return correct results.
One possible solution (as mentioned in the OTN link listed at the beginning) is to create a Function-Based Index using the REVERSE Function, (Warning: this function is undocumented and unsupported):
CREATE INDEX bowie_reverse_func_i ON bowie(REVERSE(name));
A query such as WHERE REVERSE(name) LIKE ‘EIWOB%’ or better still WHERE REVERSE(name) LIKE REVERSE(‘%BOWIE’) can now both potentially use the index.
The reverse function will reverse the name column (from say ‘DAVID BOWIE’ to ‘EIWOB DIVAD’) and the LIKE range predicate can work with the index as it’s a Function-Based index rather than a Reverse Key Index and it’s not using a LIKE with a leading wildcard. A column containing ‘DAVID BOWIE’, but stored as ‘EIWOB DIVAD’ within the index, can be found efficiently via an index range scan using this Function-Based Index.
I’ve included an example on effectively using a Function-Based Index with the Reverse Function at the end of the above demo. There’s also a discussion and other alternatives at Gints Plivna’s Blog.
Another alternative is to use an Oracle Text Index, which also has the capability of dealing logically with queries such as %BOWIE% but as they say, that’s a topic for another day.
More on Reverse Key Indexes to come as well.
Introduction To Reverse Key Indexes: Part I January 14, 2008Posted by Richard Foote in Index Access Path, Oracle Cost Based Optimizer, Oracle Indexes, Reverse Key Indexes.
Following on from the “8 things You May Not Know About Indexes”, #7 regarding Reverse Key Indexes requires a number of posts to do the subject justice. However, Part I will focus of the specific issue related to point # 7, namely:
“A REVERSE index can quite happily be used by the CBO to perform index range scans within an execution plan”.
Reverse Key Indexes are designed to resolve a specific issue, that being index block contention. Many indexes in busy database environments with lots of concurrent inserts (and in some scenarios updates and deletes as well) can suffer from index block contention (as highlighted by high levels of “buffer busy waits” and “read by other session” wait events for the index segments). Monotonically increasing indexes, such as Primary Keys generated by a sequence, are especially prone to contention as all inserts need to access the maximum “right-most” leaf block. This is of particular concern in RAC environments, where this “hot” index block needs to be accessed by all the instances and is being bounced around the various SGAs causing expensive block transfers between instances.
A solution is make the index a Reverse Key Index.
CREATE INDEX bowie_reverse_idx ON bowie(id) REVERSE;
A Reverse Key Index simply takes the index column values and reverses them before inserting into the index. “Conceptually”, say the next generated ID is 123456, Oracle will reverse it to 654321 before inserting into the index. It will then take the next generated ID 123457 and reverse it to 754321 and insert it into the index and so on. By doing this, inserts are spread across the whole index structure, ensuring the right most block is no longer the only index leaf block being hammered. Index contention is dramatically reduced or eliminated entirely.
Reverse Key Indexes address a specific problem but may in turn introduce a number of problems themselves.
One problem is the simple fact index entries are no longer sorted in their natural order. Value 123456 is no longer adjacent to value 123457 in the index structure, they’re likely to be found in completely different leaf blocks. Therefore a range predicate (such as BETWEEN 123450 and 123460) can no longer be found by a single index probe, Oracle would be forced to search for each specific index value separately as each value in the range is likely to be in differing leaf blocks.
This makes it all just too difficult and troublesome for the Cost Based Optimizer (CBO). As a result, the CBO totally ignores Reverse Key Indexes when processing Range Predicates (eg. BETWEEN, <, >, <=, >=, LIKE etc.). Even innocent looking range predicates such as “BETWEEN 123456 and 123457″, with just the 2 values of interest are ignored by the CBO. A 10053 trace shows how the CBO totally ignores Reverse Key Indexes and doesn’t even bother to cost such accesses when processing Range Predicate conditions.
In the above example and in scenarios where it’s possible and practical to convert range predicates, use an IN clause instead, e.g. “IN (123456, 123457)” as Oracle will convert this into an OR clause with each equality condition usable with the Reverse Key Index.
Oracle is also clever enough to identify equality conditions that may be written as a range scan (e.g. BETWEEN 123456 and 123456) and use a Reverse Key Index accordingly.
Hints won’t work either. You may be able to force Oracle into performing a Full Index Scan but it will not perform an Index Range Scan with a Range Predicate.
But doesn’t all this mean I’m wrong when I suggested a Reverse Key Index can be used by the CBO to use Index Range Scans.
I’ve only described how Oracle ignores the use of a Reverse Key Index for Range Predicates, however Index Range Scans are quite possible.
Remember, a Reverse Key Index will reverse all values and if two values happen to have the same value or two index entries happen to have the same leading column, then all such values are indeed stored together and are logically adjacent to one another.
For example, if the Reverse Key Index is Non-Unique, Oracle must perform an Index Range Scan, even for equality predicates. I discussed this in some detail when discussing the differences between a Unique and a Non-Unique Index. Even if the column or columns have a PK or a Unique Key Constraint, Oracle will still check the next index entry “just in case” there are indeed duplicate values. Also, although usually used for monotonically index columns, there’s nothing preventing you from creating a Reverse Key Index on a Non-Unique column and all duplicate values must reside together in the index structure. Therefore an equality search that uses any Non-Unique Reverse Key Index will generate an Index Range Scan access
But even Unique indexes can be used to perform an Index Range Scan.
If you have a multi-column Unique Index but not all columns are being searched (although the leading column must be known), then again, all index values with the same leading column (or columns) must be stored together in the Reverse Key Index and an Index range Scan can be performed for such equality conditions.
For some examples of what I’ve discussed see this Reverse Key Part I Demo.
So yes, a Reverse Key Index can indeed be used by the CBO to perform Index Range Scans.
There are also a number of other issues (and indeed myths) associated with Reverse Key Indexes that will be discussed in the fullness of time.
Introduction to Fake / Virtual / NOSEGMENT Indexes January 11, 2008Posted by Richard Foote in Fake Indexes, Index Access Path, NOSEGMENT Option, Oracle Cost Based Optimizer, Oracle Indexes, Virtual Indexes.
OK, as promised, answer to index fact #5 you may not have known:
“It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist”.
Before I start, please note this feature is not officially documented other than the odd Metalink note and requires the setting of an undocumented parameter to work, so please exercise caution.
Fake Indexes (also known as Virtual or Nosegment Indexes) have been around for a long time, since 8i days. They’re used primarily by Oracle Enterprise Manager and its Tuning Pack which has various wizards that can do “what if” type analysis. One of these is the Index Wizard which can kinda “pretend” to create an index and see what the Cost Based Optimizer might do if such an index really existed.
It’s possible to create these Fake indexes manually by using the NOSEGMENT clause when creating an index:
CREATE INDEX Bowie_idx ON Bowie_Table(Ziggy) NOSEGMENT;
This will populate some (but not many) DD related tables but will not actually create an index segment or consume any actual storage. It’s not maintained in any way by DML operations on the parent table and it can’t be altered or rebuilt as can a conventional, “real” index (it will generate an ORA-08114 error if you try to do so). You can analyze or run dbms_stats over the index but the index is not treated as analyzed as such (as can be seen via a 10053 trace).
It’s also only visible to the CBO, if and only if a session has the following parameter set:
ALTER SESSION SET “_use_nosegment_indexes” = true;
The CBO will now consider the index and potentially include it within an execution plan. However, at execution time Oracle can of course not use the index and will revert to the next best thing.
A Fake index is basically an index you have when you don’t really have an index in order to see if it could be useful if it really existed.
This Fake Indexes Demo shows how they work and can be used.
1 down, 7 to go … ;)
8 Things You May Not Know About Indexes January 10, 2008Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Myths, Richard's Musings.
There’s a somewhat bizarre wave washing the Blogshere at the moment whereby everyone is being asked to reveal 8 things about themselves that not many people know. It all sounds like a bit of fun, although as a craze, it does seem to have gone a bit berserk.
Now I’m a pretty private sort of person so this isn’t quite my thing. However, in the spirit of it all, I’ve decided to bend the rules a tad and give it an “Oracle Index” theme.
So here are 8 things (from the top of my head) that you may not know about Oracle Indexes:
Index compression can actually make indexes substantially larger, not smaller
It’s not true you can’t index NULL values. A single column or a set of columns containing nothing but NULLs can easily be indexed
Bitmap Indexes can be extremely useful and beneficial even if the column contains thousands of distinct values.
B-Tree Indexes can be extremely useful and beneficial even if the column contains very few distinct values (as low as 1)
It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist
It’s possible and potentially very useful to just index some column values and not all column values within a table
A REVERSE index can quite happily be used by the CBO to perform index range scans within an execution plan
An index can potentially be the most efficient and effective way to retrieve anything between 0% and 100% of data from a table
Now if there’s anything in the list you indeed didn’t know, don’t worry, that’s what this Blog is for :)
10,000 Hits Already !! January 9, 2008Posted by Richard Foote in Oracle Blog, Richard's Musings.
I’ve just noticed that the hits counter has just reached 10,000 hits.
I’m quite excited as I actually saw it displaying the 10000, being one of those weird people who would drive around the block a couple of times just to see the mileometer reading in the car reach some “special” number. I screamed with frustration once when I just missed 12345 scroll past, 123456 just seemed such a long way away ….
I must admit I had absolutely no idea how many people would read this Blog when I started it almost a month ago. Starting a Blog was something that was mentioned and suggested to me a few times when I attended and presented at the Unconference at Oracle OpenWorld last year.
Now I must admit, I don’t really know whether 10,000 visits in less than a month, including the Christmas Holidays, is actually a lot or not, but if someone asked me a month ago how long I thought it would take me to reach 10,000 hits, I would have guessed (and hoped somewhat) in 6 months or so. So it’s certainly a lot more hits than I ever thought I would get at this early stage.
So to have generated so much interest, so many comments, there’s over 150 although lots are mine I know :) and so many emails is really really, umm what’s the right word, I guess “nice”.
So thank you all for your interest and involvement !!
If someone can give me a bit of warning just before 20,000 is reached, I would appreciate it !!
Introduction To Linguistic Indexes – Part II January 9, 2008Posted by Richard Foote in Indexing Tricks, Linguistic Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
As previously discussed, Linguistic Indexes can potentially be useful with case-insensitive searches and sorts.
However, they have a number of issues and disadvantages.
The first issue is that once the NLS_COMP parameter is set to ‘LINGUISTIC’ and the NLS_SORT parameter is set to something other than ‘BINARY’, standard binary indexes can no longer be used and are ignored by the CBO. This means one needs to have a very careful and consistent indexing strategy to ensure no SQL statements are compromised while Linguistic related NLS parameters are set. Simple demo highlighting issues with mixing Linguistic and Binary Indexes here. Note these demos follow those in Introduction To Linguistic Indexes Part I.
The next issue is that Linguistic Indexes are ignored for some types of predicate conditions. MIN, MAX and LIKE can not be used with Linguistic Indexes (although LIKE can now be used with 11g). Simple demo highlighting problems with these predicate conditions here.
Finally, Linguistic Indexes typically use more storage than Binary indexes and so have more associated overheads and costs. The differences in storage is dependent on the charactersets associated with the various indexes. Some examples of differences shown here. Warning: This demo has lots of block dumps !!
Linguistic Indexes are worthy of consideration, but so are the associated costs and disadvantages.
A question on the OTN forum has prompted me to quickly knock up a demo on the possible dangers of the default behaviour in 10g with regard to the METHOD_OPT option in DBMS_STATS.
When collecting statistics with DBMS_STATS in 9i, the default value of METHOD_OPT was ‘FOR ALL COLUMNS SIZE 1′. This basically says to Oracle please only collect basic column statistics (min, max, distinct values etc.), do not collect histograms on these columns. For columns that are evenly distributed and for columns that are not referenced in SQL statements, this is perfectly adequate. If a column was unevenly distributed and detrimentally impacted the CBO’s costings of an execution plan, then one could generate histograms for those particular columns separately.
However, this default behaviour changed in 10g and IMHO this change is possibly the most significant and problematic difference when migrating to 10g.
The new default value of METHOD_OPT with 10g is ‘FOR ALL COLUMNS SIZE AUTO’. This basically means that Oracle will automatically decide for us which columns need histograms and which columns don’t based on what it considers to be the distribution of values within a column and based on the “workload” associated with the table (basically are there any SQL statements running in the database referencing columns which might need histograms for those statements to be costed correctly).
This sounds like an ideal scenario, just let Oracle work it out for us.
However, the problem is that Oracle in many cases doesn’t do a particularly good job at determining when it should generate a histogram and when it shouldn’t. In fact, the likelihood is that Oracle will actually generate many many many unnecessary histograms while still missing out on some columns that should have them.
In environments with few tables and with few users executing few distinct SQL statements, the impact of some unnecessary histograms may be minimal. However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.
Note also that by having a histogram, Oracle changes the manner in which the DENSITY statistic for a column is calculated (as stored in DBA_TAB_COLUMNS). This is often used by Oracle to determine the selectivity of predicates so the impact of suddenly having additional unnecessary histograms can be wider and more significant than one might initially imagine.
Of course, the impact on the shared_pool and the row_cache and it’s associated latches in particular can be extremely ugly indeed if suddenly Oracle had to deal with thousands of new histograms when parsing statements.
This silly little demo, “Dangers of default METHOD_OPT behaviour in 10g“, creates a simple little table with three columns. The first column has an outlier value and as previously discussed here, a histogram might be required to correctly cost range scans. The second column is perfectly distributed, it has 10 distinct values with 100,000 occurrences of each. The third column is also perfectly distributed but it’s a special example in that it has only 1 distinct value.
As you can see by the results of the demo, Oracle has got it wrong one way or the other in varying degrees in all three examples. It hasn’t created a histogram when it was needed and created histograms when they weren’t needed, impacting the Density column statistics as a result.
My advice. Just be very careful when using the default method_opt ‘FOR ALL COLUMNS SIZE AUTO’ behaviour in 10g.
Introduction To Linguistic Indexes – Part I January 3, 2008Posted by Richard Foote in Index Access Path, Indexing Tricks, Linguistic Indexes, Oracle Indexes.
Characters are sorted by default based on numeric values defined by the default character encoding scheme (known as Binary Sorting). For us Australians, this is fine as we (generally) speak English and the English alphabet is nicely sorted in ascending order by ASCII and EBCDIC standards. However, many other languages are not so fortunate as the binary sort does not sort the data in many language’s alphabetic sort order. Oracle has many Globalization Support features to help users in other languages get over these issues (all very interesting and topics for many a Blog entry in the future).
However, even us Australians have issues when it comes to “case-insensitive” searches, where data may be stored in many different cases (eg. Ziggy, ZIGGY, ZiGgY, etc.) and we want to return all data that matches a character value, regardless of its case.
The issue of course is that by default, all text searches are case-sensitive. For example a search WHERE name=’ZIGGY’ will only return ‘ZIGGY’ but not ‘Ziggy’ or ‘ZiGgY’ etc.
The standard fix is for the application to convert the data to a consistent case when performing the search. For example a search WHERE UPPER(Name) = ‘ZIGGY’ will return all values of “ZIGGY” regardless of their case but this will negate the use of any standard index on the Name column.
Therefore, a Function-Based index is required, say based on UPPER(Name), to ensure an efficient index access is possible for case insensitive searches.
However, this often requires an additional index to be created and for the application to be explicitly written to make use of the function-based index defined function.
Now the best cure for this problem is simply to ensure all data is stored in a consistent case (ZIGGY, ZIGGY, ZIGGY) but this may not always be practical or even desirable in some cases.
Another possible solution is the use of a Linguistic Index. This is an index that is created based on a specific case insensitive linguistic language or multilingual option that ensures the index entries are sorted in the linguistic language order, not on the default binary order of the database encoding scheme.
Basic steps are:
1) Create a Linguistic Index, eg.
CREATE INDEX case_search_ling_name_i ON case_search(NLSSORT(name,’NLS_SORT=GENERIC_M_CI’));
2) Set NLS_SORT in the session (or set parameter) to use the required Linguistic sort option , eg.
ALTER SESSION SET NLS_SORT=’GENERIC_M_CI';
Simply append _CI in the Linguistic sort option to make it Case-Insensitive or _AI to make it Accent-Insensitive.
(Note: if binary ordering is generally adequate, NLS_SORT can simply be set to ‘BINARY_CI’ for Binary Case-Insensitive searches)
3) Set NLS_COMP in the session (or set parameter) to use Linguistic Sorts/Case Insensitive Searches, eg.
ALTER SESSION SET NLS_COMP=’LINGUISTIC';
A search now based on WHERE name=’ZIGGY’ will automatically perform a case-insensitive search without the need to modify the application to use specific functions.
For a full demo, see Use Linguistic Indexes Demo.
However, before you rush out and start using Linguistic Indexes to possibly simplify the use of case insensitive searches, note there are various disadvantages to Linguistic Indexes, which can somewhat dampen their appeal. These will be covered in Part II of this series.