Rebuilding Indexes and the Clustering Factor Solution (Move On) September 25, 2011Posted by Richard Foote in Clustering Factor, Index Rebuild, Indexing Myth, Oracle Indexes, Quiz, Reverse Key Indexes.
1 comment so far
Excellent !! This quiz created quite a bit of debate and it was nice to sit back and read some interesting discussions.
The Clustering Factor is basically the measurement of how well aligned the data in the underlining table is in relation to the index and is the number of table related I/Os required to read the entire table via a full index scan. A value of the CF approaching the number of blocks in the table suggests the data is reasonably well sorted/clustered in relation to the index (although the CF could in theory be somewhat less than the blocks in the table of course). A value of the CF approaching the number of index entries suggests the data is not particularly well sorted/clustered in relation to the index and means we may need to re-visit the same table block numerous times to get the required data, thus decreasing the efficiency of using the index, increasing the costs associated with using the index and therefore decreasing the likelihood of the CBO using the index.
So for an index rebuild to actual have an impact on the CF on an index, means either the rows in the table needs to change or the order of the index entries needs to change.
However, when we typically rebuild an index, it has no impact at all on the table and so can’t possibly change the order of the rows there. Additionally, no matter how fragmented or inefficient the index structure might be, an index rebuild doesn’t change the order of the index entries either as they’re always sorted within the index in the order of the indexed columns.
Therefore an index rebuild typically has no impact at all on the CF of an index, no matter the current value of the CF.
However, there is an exception to this rule.
If we rebuild the index and change it from a NOREVERSE index to a REVERSE index, we now do change the order of the index. Significantly so, as the index entries are now in the order of the index column values when reversed. Therefore this can in turn significantly change the CF of an index.
Conversely, if we rebuild an index and change it from REVERSE to NOREVERSE, we likewise significantly change the order of the index entries and hence the value of the CF.
Of course, it’s always nice to see new ideas and something I hadn’t considered was Gary Myer’s comment regarding changing the logic behind a Function-Based Index prior to a rebuild …
So the moral of this story is that no matter how poorly fragmented the index, how high or low the current CF of an index might be, rebuilding an index in order to improve the CF is a futile exercise and will change less than diddly-squat, except in the above mentioned special circumstances.
Now, back to another hearing of Pink Floyd’s masterpiece “The Dark Side of the Moon” in all its surround sound glory
Introduction To Reverse Key Indexes: Part IV (Cluster One) January 21, 2008Posted by Richard Foote in Clustering Factor, Index statistics, Oracle Indexes, Oracle Myths, Reverse Key Indexes.
There’s a myth that suggests if the Clustering Factor (CF) of an index is greater than a certain ratio when compared to the number of rows in the table, the CF is poor and an index rebuild would be beneficial.
The slight problem with this advice is that the CF actually measures how well aligned the order of the column values are in the table as compared to the order of the index entries in the index. Generally, a table in which the column values are ordered in a similar manner to the index will have a CF closer to the number of blocks in the table. A table in which the column values are ordered in a random manner when compared to the index will have a CF closer to the number of rows in the table.
An index rebuild doesn’t change the ordering of the index row entries and an index rebuild has no impact on the table so therefore the comparative ordering of both remains unchanged. Therefore the CF of an index will be identical after the rebuild as it was before.
Well actually, there is one slight exception to this rule. Reverse Key Indexes.
Generally, rows with monotonically increasing column values are physically inserted in the order of the monotonically increasing columns. This may not be the case however with tables in ASSM tablespaces or tables with multiple freelists or freelist groups as concurrent inserts will be directed to differing blocks. In these cases we may actually have data that is quite well clustered but may have quite poor CF values due to the manner in which the CF is calculated.
Assuming a Non-ASSM, single freelist/freelist group table, the CF of monotonically increasing indexed values would ordinarily be quite good. As a non-reverse index must also have its values in the monotonically column order, the CF of the index is likely to be nice and low.
However, if you were to rebuild the index as a Reverse Key Index, the index values get reversed and “randomly” redistributed within the index structure, totally changing the order of the index entries within the index. As a result, the index values are no longer aligned with those of the table and the CF is likely to now be quite appalling.
Rebuilding an index generally has no impact on the CF as the index row values retain the same logically order. Rebuilding an index to be reverse (or visa-versa) is the exception to the rule as it will physically (and logically) change the index row order.
A Reverse Key Index is likely therefore to have a much worse CF than it’s non-reverse equivalent.
A Reverse Key Index will be ignored for range predicates (as already discussed in Part I) so a poor CF may not have an impact. However, as also discussed, index range scans are still viable in some scenarios so an increased CF may impact execution plans detrimentally.
See this demo on how a Reverse Index Rebuild turns a “perfect” CF into a shocker.
Introduction To Reverse Key Indexes: Part III (A Space Oddity) January 18, 2008Posted by Richard Foote in Index Block Splits, Index Internals, Oracle Indexes, Performance Tuning, Reverse Key Indexes.
A possibly significant difference between a Reverse and a Non-Reverse index is the manner in which space is used in each index and the type of block splitting that takes place.
Most Reverse Key Indexes are created to resolve contention issues as a result of monotonically increasing values. As monotonically increasing values get inserted, each value is greater than all previous values (providing there are no outlier values present) and so fill the “right-most” leaf block. If the “right-most” block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure. Assuming no deletes or updates, the index should have virtually 100% used space.
However, it’s equivalent Reverse Key index will have the values reversed and dispersed evenly throughout the index structure. As index blocks fill, there will be a very remote chance of it being due to the maximum indexed value and 50-50 block splits will result. The PCT_USED is likely therefore to be significantly less, averaging approximately 70-75% over time.
Therefore, for indexes with no deletions, a Reverse Key index is likely to be less efficient from a space usage point of view.
However, if there are deletions, the story may differ.
Deleted space can be reused if an insert is subsequently made into an index block with deleted entries or if a leaf block is totally emptied. However, if a leaf block contains any non-deleted entries and if subsequent inserts don’t hit the leaf block, then the deleted space can not reused. As monotonically increasing values in a non-reverse index only ever insert into the “right-most” leaf block, it won’t be able to reuse deleted space if leaf blocks are not totally emptied. Overtime, the number of such “almost but not quite empty” index leaf blocks may in some scenarios increase to significant levels and the index may continue to grow at a greater proportional rate than the table (where the reuse of space is set and controlled by the PCTUSED physical property).
However, Reverse Key indexes will be able to reuse any deleted space as they evenly distribute inserts throughout the index structure. Overtime, the index is likely to grow at a similar proportional rate as the table.
For indexes that have deletions resulting in many sparsely (but not totally emptied) leaf blocks, a Reverse Key index could be more efficient from a space usage point of view.
See this demo Differences in Space Usage Between a Reverse and a Non-Reverse Index for further details.
In Part I, we saw how with Reverse Key Indexes, Oracle will basically take the indexed value, reverse it and then index the reversed value. As a result, data that would ordinarily be logically sorted within an index structure will now be randomly distributed. This therefore negates the use of Reverse Key Indexes with range predicates, with the CBO not even considering them in its costings.
This is all the information we need to dispel a rather bizarre suggestion that has been doing the rounds regarding using Reverse Key Indexes to deal with LIKE predicates that have a leading wildcard. For example, such a suggestion can be found here and within an OTN discussion here.
Basically the suggestion is to:
1) Create a Reverse Key Index on the column to be searched with a LIKE predicate having a leading wildcard (such %, _).
2) Instead of writing the query as usual, e.g.
SELECT * FROM bowie_table WHERE name LIKE ‘%BOWIE’
rewrite the query programmatically such as:
SELECT * FROM bowie_table WHERE name LIKE ‘EIWOB%’;
by reversing the required text and now having the wildcard at the end.
The Reverse Key Index stores the data in a reversed format identical to say ‘EIWOB’, so Oracle should be able to use the Reverse Key Index to efficiently find all rows that start with ‘EIWOB’ as they’re all grouped together within the index structure, right ?
Ignoring the fact the example in the above link is somewhat meaningless as it uses a leading and a trailing wildcard in both queries and so assuming the first query only has a leading wildcard and the second query only has a trailing wildcard, this suggested use of a Reverse Key Index can not possibly work on any current version of Oracle.
There are a few fundamental problems with this suggestion but in summary not only will it not work but worse, it will actually return the wrong results.
The suggestion is correct as far as indeed, using a normal index to return data with a LIKE statement containing a leading wildcard will negate the use of an index range scan, the CBO doesn’t even consider it. An index hint may push Oracle to use a Full Index Scan, but not an Index Range Scan.
However using a Reverse Index Key to solve this is unfortunately doomed to failure for two very simple reasons.
One, as we have already seen, Oracle also ignores Index Range Scans for Reverse Key Indexes with range predicates and unfortunately, a query such as WHERE name LIKE ‘EIWOB%’ is a range scan. The CBO simply doesn’t consider the Reverse Key Index in it’s deliberations.
Two, is of course that Oracle has no possible way of knowing that when you say LIKE ‘EIWOB%’, what you really mean is search for all records ending with BOWIE, LIKE ‘%BOWIE’. How can Oracle possibly know this ? If it could use the index (which it can’t) Oracle would only reverse the search string around anyways and use the index to look for indexed entries beginning with ‘BOWIE’ within the index structure, remembering everything is of course stored in reverse within the index.
So Oracle is actually searching for all records starting with ‘EIWOB’, not ending with ‘BOWIE’ which are two entirely different things.
The net result of using this suggested strategy is not good.
1) Oracle ignores the Reverse Key Index anyways as a LIKE ‘EIWOB%’ is a range predicate
2) Oracle therefore performs a Full Table Scan anyways
3) As the query is effectively searching for all records that start with ‘EIWOB’, not as expected all records that end with ‘BOWIE’, the two queries in the example will actually return completely different results
The Reverse Key Indexes Part II Demo shows how this suggested use of a Reverse Key Index is a very very bad idea …
However, if you want to solve the issue of efficiently finding the results of a LIKE ‘%BOWIE’, there are some possible approaches one could take that will use an index and return correct results.
One possible solution (as mentioned in the OTN link listed at the beginning) is to create a Function-Based Index using the REVERSE Function, (Warning: this function is undocumented and unsupported):
CREATE INDEX bowie_reverse_func_i ON bowie(REVERSE(name));
A query such as WHERE REVERSE(name) LIKE ‘EIWOB%’ or better still WHERE REVERSE(name) LIKE REVERSE(‘%BOWIE’) can now both potentially use the index.
The reverse function will reverse the name column (from say ‘DAVID BOWIE’ to ‘EIWOB DIVAD’) and the LIKE range predicate can work with the index as it’s a Function-Based index rather than a Reverse Key Index and it’s not using a LIKE with a leading wildcard. A column containing ‘DAVID BOWIE’, but stored as ‘EIWOB DIVAD’ within the index, can be found efficiently via an index range scan using this Function-Based Index.
I’ve included an example on effectively using a Function-Based Index with the Reverse Function at the end of the above demo. There’s also a discussion and other alternatives at Gints Plivna’s Blog.
Another alternative is to use an Oracle Text Index, which also has the capability of dealing logically with queries such as %BOWIE% but as they say, that’s a topic for another day.
More on Reverse Key Indexes to come as well.
Introduction To Reverse Key Indexes: Part I January 14, 2008Posted by Richard Foote in Index Access Path, Oracle Cost Based Optimizer, Oracle Indexes, Reverse Key Indexes.
Following on from the “8 things You May Not Know About Indexes”, #7 regarding Reverse Key Indexes requires a number of posts to do the subject justice. However, Part I will focus of the specific issue related to point # 7, namely:
“A REVERSE index can quite happily be used by the CBO to perform index range scans within an execution plan”.
Reverse Key Indexes are designed to resolve a specific issue, that being index block contention. Many indexes in busy database environments with lots of concurrent inserts (and in some scenarios updates and deletes as well) can suffer from index block contention (as highlighted by high levels of “buffer busy waits” and “read by other session” wait events for the index segments). Monotonically increasing indexes, such as Primary Keys generated by a sequence, are especially prone to contention as all inserts need to access the maximum “right-most” leaf block. This is of particular concern in RAC environments, where this “hot” index block needs to be accessed by all the instances and is being bounced around the various SGAs causing expensive block transfers between instances.
A solution is make the index a Reverse Key Index.
CREATE INDEX bowie_reverse_idx ON bowie(id) REVERSE;
A Reverse Key Index simply takes the index column values and reverses them before inserting into the index. “Conceptually”, say the next generated ID is 123456, Oracle will reverse it to 654321 before inserting into the index. It will then take the next generated ID 123457 and reverse it to 754321 and insert it into the index and so on. By doing this, inserts are spread across the whole index structure, ensuring the right most block is no longer the only index leaf block being hammered. Index contention is dramatically reduced or eliminated entirely.
Reverse Key Indexes address a specific problem but may in turn introduce a number of problems themselves.
One problem is the simple fact index entries are no longer sorted in their natural order. Value 123456 is no longer adjacent to value 123457 in the index structure, they’re likely to be found in completely different leaf blocks. Therefore a range predicate (such as BETWEEN 123450 and 123460) can no longer be found by a single index probe, Oracle would be forced to search for each specific index value separately as each value in the range is likely to be in differing leaf blocks.
This makes it all just too difficult and troublesome for the Cost Based Optimizer (CBO). As a result, the CBO totally ignores Reverse Key Indexes when processing Range Predicates (eg. BETWEEN, <, >, <=, >=, LIKE etc.). Even innocent looking range predicates such as “BETWEEN 123456 and 123457”, with just the 2 values of interest are ignored by the CBO. A 10053 trace shows how the CBO totally ignores Reverse Key Indexes and doesn’t even bother to cost such accesses when processing Range Predicate conditions.
In the above example and in scenarios where it’s possible and practical to convert range predicates, use an IN clause instead, e.g. “IN (123456, 123457)” as Oracle will convert this into an OR clause with each equality condition usable with the Reverse Key Index.
Oracle is also clever enough to identify equality conditions that may be written as a range scan (e.g. BETWEEN 123456 and 123456) and use a Reverse Key Index accordingly.
Hints won’t work either. You may be able to force Oracle into performing a Full Index Scan but it will not perform an Index Range Scan with a Range Predicate.
But doesn’t all this mean I’m wrong when I suggested a Reverse Key Index can be used by the CBO to use Index Range Scans.
I’ve only described how Oracle ignores the use of a Reverse Key Index for Range Predicates, however Index Range Scans are quite possible.
Remember, a Reverse Key Index will reverse all values and if two values happen to have the same value or two index entries happen to have the same leading column, then all such values are indeed stored together and are logically adjacent to one another.
For example, if the Reverse Key Index is Non-Unique, Oracle must perform an Index Range Scan, even for equality predicates. I discussed this in some detail when discussing the differences between a Unique and a Non-Unique Index. Even if the column or columns have a PK or a Unique Key Constraint, Oracle will still check the next index entry “just in case” there are indeed duplicate values. Also, although usually used for monotonically index columns, there’s nothing preventing you from creating a Reverse Key Index on a Non-Unique column and all duplicate values must reside together in the index structure. Therefore an equality search that uses any Non-Unique Reverse Key Index will generate an Index Range Scan access
But even Unique indexes can be used to perform an Index Range Scan.
If you have a multi-column Unique Index but not all columns are being searched (although the leading column must be known), then again, all index values with the same leading column (or columns) must be stored together in the Reverse Key Index and an Index range Scan can be performed for such equality conditions.
For some examples of what I’ve discussed see this Reverse Key Part I Demo.
So yes, a Reverse Key Index can indeed be used by the CBO to perform Index Range Scans.
There are also a number of other issues (and indeed myths) associated with Reverse Key Indexes that will be discussed in the fullness of time.