Introduction To Reverse Key Indexes: Part I January 14, 2008Posted by Richard Foote in Index Access Path, Oracle Cost Based Optimizer, Oracle Indexes, Reverse Key Indexes.
Following on from the “8 things You May Not Know About Indexes”, #7 regarding Reverse Key Indexes requires a number of posts to do the subject justice. However, Part I will focus of the specific issue related to point # 7, namely:
“A REVERSE index can quite happily be used by the CBO to perform index range scans within an execution plan”.
Reverse Key Indexes are designed to resolve a specific issue, that being index block contention. Many indexes in busy database environments with lots of concurrent inserts (and in some scenarios updates and deletes as well) can suffer from index block contention (as highlighted by high levels of “buffer busy waits” and “read by other session” wait events for the index segments). Monotonically increasing indexes, such as Primary Keys generated by a sequence, are especially prone to contention as all inserts need to access the maximum “right-most” leaf block. This is of particular concern in RAC environments, where this “hot” index block needs to be accessed by all the instances and is being bounced around the various SGAs causing expensive block transfers between instances.
A solution is make the index a Reverse Key Index.
CREATE INDEX bowie_reverse_idx ON bowie(id) REVERSE;
A Reverse Key Index simply takes the index column values and reverses them before inserting into the index. “Conceptually”, say the next generated ID is 123456, Oracle will reverse it to 654321 before inserting into the index. It will then take the next generated ID 123457 and reverse it to 754321 and insert it into the index and so on. By doing this, inserts are spread across the whole index structure, ensuring the right most block is no longer the only index leaf block being hammered. Index contention is dramatically reduced or eliminated entirely.
Reverse Key Indexes address a specific problem but may in turn introduce a number of problems themselves.
One problem is the simple fact index entries are no longer sorted in their natural order. Value 123456 is no longer adjacent to value 123457 in the index structure, they’re likely to be found in completely different leaf blocks. Therefore a range predicate (such as BETWEEN 123450 and 123460) can no longer be found by a single index probe, Oracle would be forced to search for each specific index value separately as each value in the range is likely to be in differing leaf blocks.
This makes it all just too difficult and troublesome for the Cost Based Optimizer (CBO). As a result, the CBO totally ignores Reverse Key Indexes when processing Range Predicates (eg. BETWEEN, <, >, <=, >=, LIKE etc.). Even innocent looking range predicates such as “BETWEEN 123456 and 123457″, with just the 2 values of interest are ignored by the CBO. A 10053 trace shows how the CBO totally ignores Reverse Key Indexes and doesn’t even bother to cost such accesses when processing Range Predicate conditions.
In the above example and in scenarios where it’s possible and practical to convert range predicates, use an IN clause instead, e.g. “IN (123456, 123457)” as Oracle will convert this into an OR clause with each equality condition usable with the Reverse Key Index.
Oracle is also clever enough to identify equality conditions that may be written as a range scan (e.g. BETWEEN 123456 and 123456) and use a Reverse Key Index accordingly.
Hints won’t work either. You may be able to force Oracle into performing a Full Index Scan but it will not perform an Index Range Scan with a Range Predicate.
But doesn’t all this mean I’m wrong when I suggested a Reverse Key Index can be used by the CBO to use Index Range Scans.
I’ve only described how Oracle ignores the use of a Reverse Key Index for Range Predicates, however Index Range Scans are quite possible.
Remember, a Reverse Key Index will reverse all values and if two values happen to have the same value or two index entries happen to have the same leading column, then all such values are indeed stored together and are logically adjacent to one another.
For example, if the Reverse Key Index is Non-Unique, Oracle must perform an Index Range Scan, even for equality predicates. I discussed this in some detail when discussing the differences between a Unique and a Non-Unique Index. Even if the column or columns have a PK or a Unique Key Constraint, Oracle will still check the next index entry “just in case” there are indeed duplicate values. Also, although usually used for monotonically index columns, there’s nothing preventing you from creating a Reverse Key Index on a Non-Unique column and all duplicate values must reside together in the index structure. Therefore an equality search that uses any Non-Unique Reverse Key Index will generate an Index Range Scan access
But even Unique indexes can be used to perform an Index Range Scan.
If you have a multi-column Unique Index but not all columns are being searched (although the leading column must be known), then again, all index values with the same leading column (or columns) must be stored together in the Reverse Key Index and an Index range Scan can be performed for such equality conditions.
For some examples of what I’ve discussed see this Reverse Key Part I Demo.
So yes, a Reverse Key Index can indeed be used by the CBO to perform Index Range Scans.
There are also a number of other issues (and indeed myths) associated with Reverse Key Indexes that will be discussed in the fullness of time.
Introduction to Fake / Virtual / NOSEGMENT Indexes January 11, 2008Posted by Richard Foote in Fake Indexes, Index Access Path, NOSEGMENT Option, Oracle Cost Based Optimizer, Oracle Indexes, Virtual Indexes.
OK, as promised, answer to index fact #5 you may not have known:
“It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist”.
Before I start, please note this feature is not officially documented other than the odd Metalink note and requires the setting of an undocumented parameter to work, so please exercise caution.
Fake Indexes (also known as Virtual or Nosegment Indexes) have been around for a long time, since 8i days. They’re used primarily by Oracle Enterprise Manager and its Tuning Pack which has various wizards that can do “what if” type analysis. One of these is the Index Wizard which can kinda “pretend” to create an index and see what the Cost Based Optimizer might do if such an index really existed.
It’s possible to create these Fake indexes manually by using the NOSEGMENT clause when creating an index:
CREATE INDEX Bowie_idx ON Bowie_Table(Ziggy) NOSEGMENT;
This will populate some (but not many) DD related tables but will not actually create an index segment or consume any actual storage. It’s not maintained in any way by DML operations on the parent table and it can’t be altered or rebuilt as can a conventional, “real” index (it will generate an ORA-08114 error if you try to do so). You can analyze or run dbms_stats over the index but the index is not treated as analyzed as such (as can be seen via a 10053 trace).
It’s also only visible to the CBO, if and only if a session has the following parameter set:
ALTER SESSION SET “_use_nosegment_indexes” = true;
The CBO will now consider the index and potentially include it within an execution plan. However, at execution time Oracle can of course not use the index and will revert to the next best thing.
A Fake index is basically an index you have when you don’t really have an index in order to see if it could be useful if it really exisited.
This Fake Indexes Demo shows how they work and can be used.
1 down, 7 to go …
Introduction To Linguistic Indexes – Part II January 9, 2008Posted by Richard Foote in Indexing Tricks, Linguistic Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
As previously discussed, Linguistic Indexes can potentially be useful with case-insensitive searches and sorts.
However, they have a number of issues and disadvantages.
The first issue is that once the NLS_COMP parameter is set to ‘LINGUISTIC’ and the NLS_SORT parameter is set to something other than ‘BINARY’, standard binary indexes can no longer be used and are ignored by the CBO. This means one needs to have a very careful and consistent indexing strategy to ensure no SQL statements are compromised while Linguistic related NLS parameters are set. Simple demo highlighting issues with mixing Linguistic and Binary Indexes here. Note these demos follow those in Introduction To Linguistic Indexes Part I.
The next issue is that Linguistic Indexes are ignored for some types of predicate conditions. MIN, MAX and LIKE can not be used with Linguistic Indexes (although LIKE can now be used with 11g). Simple demo highlighting problems with these predicate conditions here.
Finally, Linguistic Indexes typically use more storage than Binary indexes and so have more associated overheads and costs. The differences in storage is dependent on the charactersets associated with the various indexes. Some examples of differences shown here. Warning: This demo has lots of block dumps !!
Linguistic Indexes are worthy of consideration, but so are the associated costs and disadvantages.
A question on the OTN forum has prompted me to quickly knock up a demo on the possible dangers of the default behaviour in 10g with regard to the METHOD_OPT option in DBMS_STATS.
When collecting statistics with DBMS_STATS in 9i, the default value of METHOD_OPT was ‘FOR ALL COLUMNS SIZE 1′. This basically says to Oracle please only collect basic column statistics (min, max, distinct values etc.), do not collect histograms on these columns. For columns that are evenly distributed and for columns that are not referenced in SQL statements, this is perfectly adequate. If a column was unevenly distributed and detrimentally impacted the CBO’s costings of an execution plan, then one could generate histograms for those particular columns separately.
However, this default behaviour changed in 10g and IMHO this change is possibly the most significant and problematic difference when migrating to 10g.
The new default value of METHOD_OPT with 10g is ‘FOR ALL COLUMNS SIZE AUTO’. This basically means that Oracle will automatically decide for us which columns need histograms and which columns don’t based on what it considers to be the distribution of values within a column and based on the “workload” associated with the table (basically are there any SQL statements running in the database referencing columns which might need histograms for those statements to be costed correctly).
This sounds like an ideal scenario, just let Oracle work it out for us.
However, the problem is that Oracle in many cases doesn’t do a particularly good job at determining when it should generate a histogram and when it shouldn’t. In fact, the likelihood is that Oracle will actually generate many many many unnecessary histograms while still missing out on some columns that should have them.
In environments with few tables and with few users executing few distinct SQL statements, the impact of some unnecessary histograms may be minimal. However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.
Note also that by having a histogram, Oracle changes the manner in which the DENSITY statistic for a column is calculated (as stored in DBA_TAB_COLUMNS). This is often used by Oracle to determine the selectivity of predicates so the impact of suddenly having additional unnecessary histograms can be wider and more significant than one might initially imagine.
Of course, the impact on the shared_pool and the row_cache and it’s associated latches in particular can be extremely ugly indeed if suddenly Oracle had to deal with thousands of new histograms when parsing statements.
This silly little demo, “Dangers of default METHOD_OPT behaviour in 10g“, creates a simple little table with three columns. The first column has an outlier value and as previously discussed here, a histogram might be required to correctly cost range scans. The second column is perfectly distributed, it has 10 distinct values with 100,000 occurrences of each. The third column is also perfectly distributed but it’s a special example in that it has only 1 distinct value.
As you can see by the results of the demo, Oracle has got it wrong one way or the other in varying degrees in all three examples. It hasn’t created a histogram when it was needed and created histograms when they weren’t needed, impacting the Density column statistics as a result.
My advice. Just be very careful when using the default method_opt ‘FOR ALL COLUMNS SIZE AUTO’ behaviour in 10g.
Differences between Unique and Non-Unique Indexes (Part II) December 21, 2007Posted by Richard Foote in Index Access Path, Index Internals, Indexing Tricks, Oracle Cost Based Optimizer, Oracle Indexes, Primary Key, Unique Indexes.
The most significant difference between a Unique and a Non-Unique index is of course the simple fact that in one index, all index entries MUST be unique and in the other index there can be duplicates of an index entry.
Although an obvious distinction between the two, it’s also a crucial difference as well.
When Oracle uses a Unique Index to scan for a specific value (via an equality predicate on all indexed columns or when policing a constraint ), there can only be one of two possible results. The value can exist returning at the very most one value or the value doesn’t exist returning 0 values. That’s it, 1 row or none. The value either exists or it doesn’t.
This fact means Oracle doesn’t have to worry about a whole bunch of things when dealing with Unique indexes during equality or unique checking processes. It doesn’t have to check the next index entry just in case there’s a second or more entries with the same value. It doesn’t have to worry about the potential of having to skip across to the next leaf page if the specific value it reads happens to be the maximum value in the current leaf page. It doesn’t have to worry about pointers to these “adjacent” leaf blocks changing on it due to block splits. It doesn’t have to concern itself with potentially visiting more than the one table data block during the index access operation.
Life is simple, it’s either 1 row or none.
Not so for Non-Unique indexes. With a Non-Unique index, there are no such guarantees. With a Non-Unique index, there are 3 categories of possibilities. An index scan could return 0 rows, it could return 1 row or it could return more than one row. It could potentially need to go and visit more than the current leaf block to return all the matching rows. It could potentially need to go and visit more than one table block.
Life’s not quite so “simple” for a Non-Unique index.
Note also and most importantly that life gets no easier for a Non-Unique index that polices a PK or Unique key constraint.
Even though there’s a PK or Unique constraint on a column, to Oracle, it’s just another Non-Unique index with the same “vague” possibilities. Remember that PK and Unique constraints can be enabled with NOVALIDATE meaning that even with a PK or Unique constraint, there’s still the possibility of duplicate index entries in the Non-Unique index. Remember constraints can be DEFERRABLE, meaning that even with a PK or Unique constraint, there’s still the possibility of duplicate index entries in the Non-Unique index.
This means that Oracle has to concern itself with a number of additional overheads, including having to “check” the next index entry, “just in case” it matches the required index value. It has to concern itself even with the possibility of having to visit the next index leaf block, “just in case”.
You will note when Oracle performs an equality search using a Unique Index, Oracle will perform an “INDEX UNIQUE SCAN” because the index entries MUST be unique.
You will note however when Oracle performs an equality search using a Non-Unique index, even if there’s a PK or Unique constraint of the column(s), Oracle will perform an INDEX RANGE SCAN, because it needs to scan multiple index entries “just in case”.
So are there any actual implications as a result of any of this ?
When Oracle actually reads an index and processes the associated blocks in the buffer cache(s), Oracle uses a number of latches. These latches are used primarily to “protect” memory structures from concurrent activity. Very simplistically, by grabbing a latch, Oracle effectively performs a “lock” on the associated memory structure, perform whatever activity needs to be performed and releases the latch. These latches get grabbed and released (hopefully) extremely quickly (order of 1/10s of ms), but it’s a non zero value.
The issue with latches is that they’re a point of serialisation. If two (or more) processes want a specific latch, one (or more) has to wait. Latches also burn CPU. Only a teensy weeny bit at a time but some CPU nonetheless. They burn CPU while acquiring the latch and if fail due to latch contention, while attempting again and again to acquire the latch. They also burn CPU while performing the specific operation necessary of the latch.
Basically, the more latches, the greater the potential for contention, the greater the potential for latch related wait activity and perhaps most important of all, more CPU is required. In busy systems, there can be massive numbers of latch events and the best way to tune these events is to reduce where possible the number of latches required by the database environment. It’s one of the key reasons we try and reduce LIOs in a database as much as possible, to reduce the latch and CPU load on the system.
Because of the differences highlighted between Unique and Non-Unique indexes, the number and manner of latches required between the two indexes differs. And it differs significantly …
In this little demo, Latch Differences Between Unique and Non-Unique Indexes Demo, we compare the latches required to read an identical table, using a 2 level index. The differences between the latch overheads of a Unique and a Non-Unique index are most interesting.
When using a Unique Index, Oracle required 3 consistent gets (one for the index root block, one for the leaf block and one for the table block). BUT, each consistent get was a consistent gets – examination, a special type of consistent get which only requires 1 latch (rather than the standard 2 latches).
So that’s a sum of 3 latches.
However, when using a Non-Unique index, Oracle required 4 consistent gets (one for the index root block, one for the leaf block, one for the table block and an additional one to recheck the leaf block for any duplicate index entries). BUT, only the 1 consistent read (for the index root block) was actually the “cheaper” consistent gets – examination, the other 3 were the more costly 2 latch variety.
So that’s a sum of 7 latches.
3 latches for the Unique index and 7 latches for the Non-Unique index.
That’s an increase of 133.3% in latches between the two types of indexes.
Now, the height of the index will change the ratio of latch difference between the two indexes. Also, in a busy system, there could potentially be differences in the types of latches used due to the current state or additional activity in a block.
However, the potential difference in latch requirements between a Unique or Non-Unique index can be very significant. But does a few additional latches here and there really make much of a difference ?
Well, of course it depends. On small scale systems with smaller loads, fewer indexes, fewer users and excess resources, the noticeable differences may be negligible.
However, in larger scale (especially OLTP) environments, a particular index may be accessed 100s or maybe 1000s of times a second. There may be 1000s of tables with 1000s of corresponding PK and Unique constraints policed by 1000s of Unique (or Non-Unique) indexes. It’s therefore not really of question of a few latches here or there. It’s a question of potentially a very significant proportion of overall latch related overheads.
Potentially when accessed, Non-Unique indexes could be generating double the latch related overheads for equality unique scan or unique checking index activity. Remember, the best way to tune latches and reduce latch contention is to simply reduce the requirement and load for latches.
The overall reduction in CPU and latch related wait activity could be significant between Unique and Non-Unique indexes because by using Non-Unique indexes you in the order of double the latches required for such activities.
Note also this doesn’t require any special parameters to be set or special tuning or monitoring by the DBA. It simply requires using Unique indexes to police PK or Unique constraints when there are no requirements of Non-Unique indexes. You then potentially gain a benefit each and every time the index is used for unique scan accesses.
Guess what type of access is extremely common in large scale OLTP environments …
The next time you complain about high CPU consumption or high latch contention and you’re tuned the application to death, just ask yourself how many Non-unique indexes are policing your PK or Unique Key constraints …
Back when Oracle8 was released, Oracle introduced a number of new features with regard to constraints.
The first was the option of making a constraint DEFERRABLE, meaning the policing of a constraint can be deferred until the issuing of the COMMIT, rather than during the execution of an individual statement. This gave application developers more freedom in how they designed code, particularly with regard to the order in which parent – child data is inserted and manipulated.
The second new option was the ability to enable a constraint with NOVALIDATE, meaning Oracle would enable the constraint but not bother to check existing data to ensure nothing violated the constraint.
This could be useful in a number of scenarios. For example, you have data that currently violates the constraint but have urgent business requirements to enable the constraint ASAP preventing further violations, with the intention of cleaning up the existing violations at some future time.
Or you know the data is OK, so to reduce the overheads associated with enabling the constraint (eg. reading all the data to check for violations), you enable the constraint “immediately” with NOVALIDATE, bypassing the “redundant” checking.
Both deferrable and novalidate constraints therefore imply there “could” be data at any given point in time that violates the constraint. Therefore Oracle also introduced the ability to have non-unique indexes (rather than unique indexes) policing either PK or Unique constraints. For deferrable or novalidate constraints, the index must in fact be non-unique, as a unique index would prevent any such (temporary) violations of PK or Unique constraints.
Now, there are a number of interesting and subtle differences in the manner in which Oracle manages and processes a Unique vs. a Non-Unique index (eg. the amount of storage they use, the amount of redo they generate, the number of latches they acquire). This will be discussed in another Blog entry some other day.
Today, I just want to focus on a couple of interesting little side-effects with regard to how the CBO deals (or doesn’t deal)with NOT NULL and CHECK constraints that have been created as Deferrable or Novalidate.
In 9i, the CBO was clever enough to know that if someone searched for a NULL value but the column had a NOT NULL constraint, there couldn’t possibly be any data matching the criteria. Providing you had an index on the column, the CBO would generate an execution plan that used the index, found no NULL values and returned an empty row set accordingly. If you had no index, the CBO would be forced to use a Full Table Scan. So the CBO actually used an index in an efficient manner to search for non-existent nulls.
BUT, if the NOT NULL constraint was either deferrable or novalidated, then Oracle couldn’t know there were no nulls, there just might be. Therefore, Oracle was forced into the FTS regardless of the existence of the constraint or index, as null values are not indexed (unless part of a concatenated index).
See this demo for details: NOT NULLs demo with 9i
Since 10g, the CBO has become smarter. The NOT NULL example works in a very similar manner, except that the index is no longer required. If one searches for a NULL value on a column that has a NOT NULL constraint, the CBO automatically determines there can be no matching rows and returns the empty row set immediately with no LIOs. None, as accessing the data is simply not necessary.
BUT again, it can only do so if and only if the NOT NULL constraint is validated and nondeferrable, otherwise the CBO can’t guarantee no nulls.
See this little demo for details: NOT NULLs demo with 10g
Although we actually have applications that intentionally search for nulls on NOT NULL columns to return empty row sets, it’s not common that an application would perform such a search.
What is much more common is searching for a column value that simply doesn’t exist. If a column value doesn’t meet a business rule, it’s a good idea to police such business rules with Check constraints. 10g has extended the NOT NULL scenario to include Check constraints. If a search attempts to search for a column value that violates a check constraint, Oracle will immediately return an empty row set without performing any LIOs.
But once again, it can only do so if the check constraint has been validated and set as nondeferrable.
See this demo for a 10g check constraint example: Check Constraints with 10g
Making constraints deferrable or enabling them with novalidate can be useful. However, if possible, ensure constraints are not deferrable and validated as this provides the CBO with additional information regarding the columns that it might just put to good use.
Additionally, unless there’s a requirement to the contrary, use unique indexes rather than non-unique indexes to police uniqueness. But that’s a discussion for another day …
Outlier Values – An Enemy Of The Index December 13, 2007Posted by Richard Foote in Index Access Path, Indexing Tricks, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Outlier Values.
Outlier values are basically values that sit way way outside the standard range of a column’s normal value range.
Data can be a funny thing and sometimes there are values that are naturally “exceptional”. However, very commonly, outlier values are used by applications to represent bizarre default values, to avoid confusion with legitimate values. For example, I look after an application that uses the American Date Of Independence as it’s “default” date.
Usually, these weird outlier values are used to avoid nulls values, as nulls can be problematic and can not be indexed (well actually you can index a null column but we’ll leave that for another blog entry).
However, outlier values while (maybe) solving one problem, can introduce some very significant problems in return.
Firstly, the CBO “hates” outlier values as it potentially totally screws up the CBO’s selectivity calculations. The selectivity of a range scan is basically calculated by the CBO to be the number of values in the range of interest divided by the full range of possible values (IE. the max value minus the min value). Therefore if this calculation is invalidated by a massive and disprotionate ”hole” in the full range of possible values, the CBO can get things horribly wrong.
See here for a simple demonstration: Outlier Selectivity Problem
Additionally, indexes “hate” outlier values as it prevents Oracle using the 90-10 block split to keep indexes nice and compact and is forced to use 50-50 block splits instead. Basically a 90-10 block split is considered if and only if the index entry to be inserted is equal or greater than the current maximum value. An outlier value that is also the maximum value, usually means monotonically increasing values (such as sequences, dates, etc.) don’t actually insert the maximum value. Therefore, not only do indexes perform 50-50 splits but this 50% of free space is never used, as all new values are all almost, but not quite, maximum values.
Little demo to highlight this problem: Outlier Index Space Utilisation Problem
In summary, avoid outlier values if at all possible. They generally cause more problems than they solve !!
Invisible Indexes December 11, 2007Posted by Richard Foote in 11g, Index Access Path, Invisible Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
New in 11g are “Invisible Indexes”, which are basically indexes that exist and are maintained by Oracle but are “invisible” to the CBO. Specific sessions can be set to see these invisible indexes as necessary.
Potentially useful if one has a problematic (and very large) index causing performance issues that you want to make invisible until the specific issue is addressed without the expensive of having to drop and latter recreate the index. Also useful if you want to introduce a new index but want it to be invisible until it’s been given a workout first in a specific “test” session.
Here’s a bit of a demo: Invisible Indexes