Clustering Factor: A Consideration in Concatenated Index Leading Column Decision (Sweet Thing) February 15, 2008Posted by Richard Foote in Clustering Factor, Concatenated Indexes, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning.
Short but sweet today.
I last discussed how high cardinality columns shouldn’t necessarily be in the leading column of a concatenated index as they don’t provide the performance benefit as sometimes claimed.
If all things are equal and the columns in the concatenated index are all likely to be referenced, a simple consideration that is often forgotten when deciding which column to have as the leading index column is the Clustering Factor of the corresponding columns.
As previously discussed, the Clustering Factor determines how well aligned or ordered the index entries are in relation to the rows in the parent table. So if the rows are ordered within the table on a particular column or columns (such as a sequential ID column, a monotonically increasing date or time-stamp, etc), then an index on these columns is likely to have a very good Clustering Factor. Consequently less IOs will be required to retrieve all the required rows via the index as all the required rows will be housed in relatively few, well clustered data blocks.
It therefore makes sense to at least consider the Clustering Factor of the various columns in a concatenated index. Why ? Because if the leading column has a very good Clustering Factor, the concatenated index by definition must also have a very good Clustering Factor as all indexes are primarily ordered based on the leading indexed column. A concatenated index with a good Clustering Factor is going to be more efficient in retrieving rows from the table and more importantly, will be considered more desirably by the CBO when costing access path options.
Of course, the opposite is also true. By having a leading column with a poor Clustering Factor will mean the concatenated index will have a poor Clustering Factor, making it less efficient and less likely to be considered by the CBO.
As such, the Clustering Factor of each corresponding column in a concatenated index is at least worthy of some consideration when making the decision on how best to order the indexed columns.
This demo on Index Column Order and Clustering Factor shows how the order of columns in a concatenated index has a big impact on the Clustering Factor of the resultant index.
UPDATE: However as Tom Kyte has stated in the comments, in virtually all cases, the Clustering Factor is not really a factor (yes, pun fully intended) as subsequently columns are generally going to impact the CF anyways or the selectivity of the index is such that the improved CF is not relevant anyways.
More relevant considerations regarding the ordering of columns in an index coming I promise
A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.
For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).
The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.
Or so the theory goes …
In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.
The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.
The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.
This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.
So the high cardinality column shouldn’t necessarily be the column given leading column status.
In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.
Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.
All good discussions for another day
Previously, I discussed how an ALTER INDEX … COALESCE is going to be less expensive in terms of using resources than an equivalent ALTER INDEX … SHRINK SPACE COMPACT (or ALTER INDEX … SHRINK SPACE) as the Coalesce doesn’t have to concern itself with ensuring all leaf blocks at the physical end of the index segment have all been moved to allow for the storage to be de-allocated from the index segment. If you just want to de-fragment an index and not necessarily reduce the overall space allocated to the segment, use Coalesce rather than the Shrink options as it’s cheaper.
But what about an ALTER INDEX … REBUILD, when, if ever, should it be used ?
Well the answer is as with most things Oracle, it depends.
We have a table and the application deletes historical data but in a manner in which leaf blocks are not being entirely emptied. Basically, older stuff is removed, but it’s only removed in a random manner, from approximately the “earlier” 10% of the table. The index is sequenced which means only those leaf blocks in the “left-most” 10% of the index structure are impacted but all this deleted space is “deadwood” as new index entries are only being inserted into the “right-most” part of the index.
Note that basically 90% of the index is fine and very well utilised, it’s only 10% of the index that’s problematic. Of the problem 10% of leaf blocks, there’s plenty of free or deleted space, with many leaf blocks almost but not quite empty.
Coalesce (and indeed Shrink) will basically run through these 10% of fragmented leaf blocks and will merge the index row entries into as few leaf blocks as possible. With the 90% of blocks that are fine, Coalesce will basically read and then ignore them from any processing as there’s nothing that can be done for them.
Rebuild on the other hand will take an entirely different approach. It will (generally) read the entire existing index structure and will build a brand new, bright and shining index segment. As part of this process, it will rebuild the entire index, it has no choice (assuming the index isn’t partitioned, but that’s another story) and will rebuild the 90% of the index that was actually perfect to begin with. Rebuilding 90% of something that doesn’t need rebuilding doesn’t sound particularly efficient and indeed it isn’t. As a result, the index rebuild will use substantially more resources and generate substantially more redo than an equivalent Coalesce (or Shrink Space).
We have an application that deletes data and it deletes data throughout the entire index structure. The deletes are significant with a substantial proportion of the overall rows having been deleted. Additionally, the table is not going to be repopulated with anything like the same volume of data or it won’t be repopulated for a substantial period of time. As such, all this deleted index space is “deadwood” as it’s not going to be used any time soon, if at all.
Now typically in this sort of scenario, it’s of course the table as much as the associated indexes that needs to be rebuilt. That’s a key point. However, maybe Full Table Scans are not an issue for this table so the wasted space in the table is not of urgent concern. Maybe the table in not in an ASSM tablespace or in a database that supports a Table Shrink command and maybe moving the table is not an immediate option due to availability concerns. For whatever reason (or lack of reason), the index needs to be de-fragmented.
Note it’s the entire index that’s problematic here and there could be portions of the index that have very few remaining index entries.
Now poor Coalesce (and indeed Shrink) has a bit of an issue here. They both merge index entries from two blocks into the one block where it can. However, if leaf blocks are really empty, these merged index entries may in turn be merged and moved again with index entries from yet another leaf block. And maybe yet again with another leaf block. And again and again … So a specific index entry may actually be moved into several different leaf blocks during the entire process. Each of these moves requires resources and generates redo and takes time.
Now the rebuild has an entirely different approach. As mentioned, it will basically (generally) read the entire exisiting index structure and will build a brand new one, but importantly as it does so will only have to locate a specific index entry once and once only. Also, as it’s the entire index structure that’s problematic, there’s no issue with fixing the entire index, as it’s all “broken”.
As a result of only having to deal with an existing index entry the once vs. the Coalesce which may relocate a specific index entry many times, the index rebuild is going to be substantially more efficient and potentially use significantly less resources and generate less redo.
This demo of the Differences between a Coalesce, Shrink Space and Rebuild shows when one out performs the other.
Basically, Coalesce is particularly efficient and uses less resources when the percentage of the overall index structure that’s problematic and fragmented is relatively small (less than approximately 20-25% of leaf blocks). Rebuild is particularly efficient when the percentage of the overall index structure that’s problematic and fragmented is relatively large and the average degree of fragmentation within an index leaf block is relatively high. Note Pre 10g, an index needed to have at least 50% free space less pctfree in neighbouring leaf blocks for a Coalesce to be effective.
Now Rebuild (and Rebuild Online) potentially have locking implications that need to be considered although as we’ll see later, 11g has addressed some of these issues …
Differences and Similarities Between Index Coalesce and Shrink Space February 6, 2008Posted by Richard Foote in Index Coalesce, Index Shrink, Oracle General, Oracle Indexes, Performance Tuning.
As already discussed, ALTER INDEX COALESCE in 10g onwards works in a very similar manner to ALTER INDEX SHRINK SPACE.
However, there are a number of key differences.
The first thing to point out is that each command has a slightly different purpose.
Coalesce is designed specifically to reduce fragmentation within an index but not to deallocate any freed up blocks which are placed on the freelist and recycled by subsequent block splits.
Shrink is designed specifically to reduce the overall size of an index segment, resetting the High Water Mark (HWM) and releasing any excess storage as necessary.
The key difference being that Shrink must reorganise the index leaf blocks in such a way that all the freed up, now empty blocks are all grouped together at “one end” of the index segment. All these blocks can then be deallocated and removed from the index segment. This means that specific leaf block entries must be removed from these specific blocks, in order to free up the leaf blocks in this manner.
Although Coalesce in 10g performs the operation in a similar manner to that of the Shrink Space, it can be more “lazy” in how it deals with the subsequent empty blocks and places then on the segment freelist as necessary.
COALESCE and SHRINK SPACE COMPACT are logically equivalent commands. Both options will “defragment” an index by “merging” index entries where possible thus reducing the number of blocks within the logical index structure. Both will result in the same number of leaf blocks within the index and both will result in the index height not being changed.
However, there are two key differences.
1) The SHRINK SPACE COMPACT option has the disadvantage of being more expensive to process as it has to concern itself with ensuring all necessary blocks can be emptied from the physical “end” of the index segment to be subsequently deallocated. This will result in more undo and redo being generated during the defragmentation of the index than would have been generated by the same corresponding COALESCE command.
2) The SHRINK SPACE COMPACT option has the advantage of being able to immediately deallocate the empty blocks, thereby reducing the actual size of the index segment by issuing a subsequent SHRINK SPACE option (although of course this can be performed in the one step by issuing SHRINK SPACE in the first place). However, the COALESCE option will not be able to just deallocate the free space. A subsequent Index SHRINK SPACE command on a previously coalesced index will require additional undo and redo than that of a previously “Shrunk” index as the necessary empty blocks are removed from the freelist and redistributed to allow for the de-allocation of blocks and the resetting of the High Water Mark of the index segment.
Note also that the Shrink option can only be used in Automatic Segment Space Management (ASSM) tablespaces.
Use Coalesce when the intent is to just defragment an index, knowing that the freed leaf blocks will be recycled by subsequent block splits, as it uses less resources than an equivalent Index Shrink Space.
Use Shrink Space when the intent is to reduce the actual storage allocated to an index, for example in the scenario where a table has permanently reduced its size and the index is unlikely to reuse the freed storage.
This demo highlights the Differences (and similarities) between an Index Coalesce and an Index Shrink Space.
Note however, that an index REBUILD might actually use substantially less resources than either a Coalesce or a Shrink Space and might reduce the height of an index as well.
But that’s a discussion for another day …
ALTER INDEX COALESCE: 10g Improvements (Jump They Say) February 5, 2008Posted by Richard Foote in Index Coalesce, Oracle General, Oracle Indexes, Performance Tuning.
I thought it might be worth mentioning some interesting changes in the manner in which the ALTER INDEX … COALESCE command works since Oracle 10g.
Basically the purpose of the COALESCE option is to reduce free space within the Leaf Blocks of an index. This is achieved by effectively performing a Full Index Scan of the leaf blocks, comparing the free space available in neighbouring blocks. In 9i, the basic method was to logically start with the left most leaf block and see if it could be coalesced or merged with the 2nd left most block. This required the sum of used space within these 2 blocks to be less than 100% of a block less the PCTFREE value. If so, the contents were merged with the contents of one block placed in the other and with the now empty leaf block removed from the index structure and placed on the index freelist.
It then looked at the 2nd leaf block (which might now be the first block if previously coalesced) and 3rd leaf blocks to see if these could be coalesced. If so they were merged and the empty block placed on the freelist.
And so on and so on until all leaf blocks had been traversed and all possible leaf blocks coalesced.
Note branch blocks are not directly merged during this process, except to be updated with modified pointer information if a leaf block coalesce had taken place. However, if enough leaf blocks are removed such that the branch block contains no more pointers to leaf blocks (or other intermediate branch blocks), it’s also removed from the index structure. However, there must always be at least one branch block from each level remaining hence the height of an index always remains the same during a coalesce operation.
Note if no leaf block had 50% or more free space, nothing would be coalesced as no two consecutive leaf blocks would have sufficient free space in which to be coalesced.
In 10g, the Coalesce operation has been modified somewhat.
An index no longer requires the sum of used space plus PCTFREE in adjacent blocks to be less than 100% of a block be effectively coalesced. For example, the free space in a block can be 25% in one leaf block and just 25% in the adjacent block (hence the combined used space alone being 150% of a block) and 10g can effectively coalesce these leaf blocks together.
This demo show how Coalesce differs between a 9i (184.108.40.206) and a 10g (10.2.0.3) database.
10g introduced the concept of being able to SHRINK an index and the Coalesce option can be viewed as now being very similar to an index Shrink command. Similar but not quite the same.
I’ll cover the similarities and differences between a Coalesce and a Shrink in the next day or two …
Indexing NULLs: (Empty Spaces) January 23, 2008Posted by Richard Foote in Indexing NULLs, Indexing Tricks, Oracle General, Oracle Indexes, Performance Tuning.
There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.
Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.
However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.
The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.
If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.
So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.
But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?
Well not quite.
Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.
Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.
However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.
See this demo on Indexing Null Values for examples on how this all works.
Introduction To Reverse Key Indexes: Part III (A Space Oddity) January 18, 2008Posted by Richard Foote in Index Block Splits, Index Internals, Oracle Indexes, Performance Tuning, Reverse Key Indexes.
A possibly significant difference between a Reverse and a Non-Reverse index is the manner in which space is used in each index and the type of block splitting that takes place.
Most Reverse Key Indexes are created to resolve contention issues as a result of monotonically increasing values. As monotonically increasing values get inserted, each value is greater than all previous values (providing there are no outlier values present) and so fill the “right-most” leaf block. If the “right-most” block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure. Assuming no deletes or updates, the index should have virtually 100% used space.
However, it’s equivalent Reverse Key index will have the values reversed and dispersed evenly throughout the index structure. As index blocks fill, there will be a very remote chance of it being due to the maximum indexed value and 50-50 block splits will result. The PCT_USED is likely therefore to be significantly less, averaging approximately 70-75% over time.
Therefore, for indexes with no deletions, a Reverse Key index is likely to be less efficient from a space usage point of view.
However, if there are deletions, the story may differ.
Deleted space can be reused if an insert is subsequently made into an index block with deleted entries or if a leaf block is totally emptied. However, if a leaf block contains any non-deleted entries and if subsequent inserts don’t hit the leaf block, then the deleted space can not reused. As monotonically increasing values in a non-reverse index only ever insert into the “right-most” leaf block, it won’t be able to reuse deleted space if leaf blocks are not totally emptied. Overtime, the number of such “almost but not quite empty” index leaf blocks may in some scenarios increase to significant levels and the index may continue to grow at a greater proportional rate than the table (where the reuse of space is set and controlled by the PCTUSED physical property).
However, Reverse Key indexes will be able to reuse any deleted space as they evenly distribute inserts throughout the index structure. Overtime, the index is likely to grow at a similar proportional rate as the table.
For indexes that have deletions resulting in many sparsely (but not totally emptied) leaf blocks, a Reverse Key index could be more efficient from a space usage point of view.
See this demo Differences in Space Usage Between a Reverse and a Non-Reverse Index for further details.
In Part I, we saw how with Reverse Key Indexes, Oracle will basically take the indexed value, reverse it and then index the reversed value. As a result, data that would ordinarily be logically sorted within an index structure will now be randomly distributed. This therefore negates the use of Reverse Key Indexes with range predicates, with the CBO not even considering them in its costings.
This is all the information we need to dispel a rather bizarre suggestion that has been doing the rounds regarding using Reverse Key Indexes to deal with LIKE predicates that have a leading wildcard. For example, such a suggestion can be found here and within an OTN discussion here.
Basically the suggestion is to:
1) Create a Reverse Key Index on the column to be searched with a LIKE predicate having a leading wildcard (such %, _).
2) Instead of writing the query as usual, e.g.
SELECT * FROM bowie_table WHERE name LIKE ‘%BOWIE’
rewrite the query programmatically such as:
SELECT * FROM bowie_table WHERE name LIKE ‘EIWOB%’;
by reversing the required text and now having the wildcard at the end.
The Reverse Key Index stores the data in a reversed format identical to say ‘EIWOB’, so Oracle should be able to use the Reverse Key Index to efficiently find all rows that start with ‘EIWOB’ as they’re all grouped together within the index structure, right ?
Ignoring the fact the example in the above link is somewhat meaningless as it uses a leading and a trailing wildcard in both queries and so assuming the first query only has a leading wildcard and the second query only has a trailing wildcard, this suggested use of a Reverse Key Index can not possibly work on any current version of Oracle.
There are a few fundamental problems with this suggestion but in summary not only will it not work but worse, it will actually return the wrong results.
The suggestion is correct as far as indeed, using a normal index to return data with a LIKE statement containing a leading wildcard will negate the use of an index range scan, the CBO doesn’t even consider it. An index hint may push Oracle to use a Full Index Scan, but not an Index Range Scan.
However using a Reverse Index Key to solve this is unfortunately doomed to failure for two very simple reasons.
One, as we have already seen, Oracle also ignores Index Range Scans for Reverse Key Indexes with range predicates and unfortunately, a query such as WHERE name LIKE ‘EIWOB%’ is a range scan. The CBO simply doesn’t consider the Reverse Key Index in it’s deliberations.
Two, is of course that Oracle has no possible way of knowing that when you say LIKE ‘EIWOB%’, what you really mean is search for all records ending with BOWIE, LIKE ‘%BOWIE’. How can Oracle possibly know this ? If it could use the index (which it can’t) Oracle would only reverse the search string around anyways and use the index to look for indexed entries beginning with ‘BOWIE’ within the index structure, remembering everything is of course stored in reverse within the index.
So Oracle is actually searching for all records starting with ‘EIWOB’, not ending with ‘BOWIE’ which are two entirely different things.
The net result of using this suggested strategy is not good.
1) Oracle ignores the Reverse Key Index anyways as a LIKE ‘EIWOB%’ is a range predicate
2) Oracle therefore performs a Full Table Scan anyways
3) As the query is effectively searching for all records that start with ‘EIWOB’, not as expected all records that end with ‘BOWIE’, the two queries in the example will actually return completely different results
The Reverse Key Indexes Part II Demo shows how this suggested use of a Reverse Key Index is a very very bad idea …
However, if you want to solve the issue of efficiently finding the results of a LIKE ‘%BOWIE’, there are some possible approaches one could take that will use an index and return correct results.
One possible solution (as mentioned in the OTN link listed at the beginning) is to create a Function-Based Index using the REVERSE Function, (Warning: this function is undocumented and unsupported):
CREATE INDEX bowie_reverse_func_i ON bowie(REVERSE(name));
A query such as WHERE REVERSE(name) LIKE ‘EIWOB%’ or better still WHERE REVERSE(name) LIKE REVERSE(‘%BOWIE’) can now both potentially use the index.
The reverse function will reverse the name column (from say ‘DAVID BOWIE’ to ‘EIWOB DIVAD’) and the LIKE range predicate can work with the index as it’s a Function-Based index rather than a Reverse Key Index and it’s not using a LIKE with a leading wildcard. A column containing ‘DAVID BOWIE’, but stored as ‘EIWOB DIVAD’ within the index, can be found efficiently via an index range scan using this Function-Based Index.
I’ve included an example on effectively using a Function-Based Index with the Reverse Function at the end of the above demo. There’s also a discussion and other alternatives at Gints Plivna’s Blog.
Another alternative is to use an Oracle Text Index, which also has the capability of dealing logically with queries such as %BOWIE% but as they say, that’s a topic for another day.
More on Reverse Key Indexes to come as well.
Introduction To Linguistic Indexes – Part II January 9, 2008Posted by Richard Foote in Indexing Tricks, Linguistic Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
As previously discussed, Linguistic Indexes can potentially be useful with case-insensitive searches and sorts.
However, they have a number of issues and disadvantages.
The first issue is that once the NLS_COMP parameter is set to ‘LINGUISTIC’ and the NLS_SORT parameter is set to something other than ‘BINARY’, standard binary indexes can no longer be used and are ignored by the CBO. This means one needs to have a very careful and consistent indexing strategy to ensure no SQL statements are compromised while Linguistic related NLS parameters are set. Simple demo highlighting issues with mixing Linguistic and Binary Indexes here. Note these demos follow those in Introduction To Linguistic Indexes Part I.
The next issue is that Linguistic Indexes are ignored for some types of predicate conditions. MIN, MAX and LIKE can not be used with Linguistic Indexes (although LIKE can now be used with 11g). Simple demo highlighting problems with these predicate conditions here.
Finally, Linguistic Indexes typically use more storage than Binary indexes and so have more associated overheads and costs. The differences in storage is dependent on the charactersets associated with the various indexes. Some examples of differences shown here. Warning: This demo has lots of block dumps !!
Linguistic Indexes are worthy of consideration, but so are the associated costs and disadvantages.
A question on the OTN forum has prompted me to quickly knock up a demo on the possible dangers of the default behaviour in 10g with regard to the METHOD_OPT option in DBMS_STATS.
When collecting statistics with DBMS_STATS in 9i, the default value of METHOD_OPT was ‘FOR ALL COLUMNS SIZE 1’. This basically says to Oracle please only collect basic column statistics (min, max, distinct values etc.), do not collect histograms on these columns. For columns that are evenly distributed and for columns that are not referenced in SQL statements, this is perfectly adequate. If a column was unevenly distributed and detrimentally impacted the CBO’s costings of an execution plan, then one could generate histograms for those particular columns separately.
However, this default behaviour changed in 10g and IMHO this change is possibly the most significant and problematic difference when migrating to 10g.
The new default value of METHOD_OPT with 10g is ‘FOR ALL COLUMNS SIZE AUTO’. This basically means that Oracle will automatically decide for us which columns need histograms and which columns don’t based on what it considers to be the distribution of values within a column and based on the “workload” associated with the table (basically are there any SQL statements running in the database referencing columns which might need histograms for those statements to be costed correctly).
This sounds like an ideal scenario, just let Oracle work it out for us.
However, the problem is that Oracle in many cases doesn’t do a particularly good job at determining when it should generate a histogram and when it shouldn’t. In fact, the likelihood is that Oracle will actually generate many many many unnecessary histograms while still missing out on some columns that should have them.
In environments with few tables and with few users executing few distinct SQL statements, the impact of some unnecessary histograms may be minimal. However in environments with many tables and columns (potentially many thousands) with many users executing many different SQL statements, the ramifications of potentially suddenly having thousands of additional histograms can be disastrous.
Note also that by having a histogram, Oracle changes the manner in which the DENSITY statistic for a column is calculated (as stored in DBA_TAB_COLUMNS). This is often used by Oracle to determine the selectivity of predicates so the impact of suddenly having additional unnecessary histograms can be wider and more significant than one might initially imagine.
Of course, the impact on the shared_pool and the row_cache and it’s associated latches in particular can be extremely ugly indeed if suddenly Oracle had to deal with thousands of new histograms when parsing statements.
This silly little demo, “Dangers of default METHOD_OPT behaviour in 10g“, creates a simple little table with three columns. The first column has an outlier value and as previously discussed here, a histogram might be required to correctly cost range scans. The second column is perfectly distributed, it has 10 distinct values with 100,000 occurrences of each. The third column is also perfectly distributed but it’s a special example in that it has only 1 distinct value.
As you can see by the results of the demo, Oracle has got it wrong one way or the other in varying degrees in all three examples. It hasn’t created a histogram when it was needed and created histograms when they weren’t needed, impacting the Density column statistics as a result.
My advice. Just be very careful when using the default method_opt ‘FOR ALL COLUMNS SIZE AUTO’ behaviour in 10g.
Differences between Unique and Non-Unique Indexes (Part III) December 30, 2007Posted by Richard Foote in Constraints, Index Internals, Oracle Indexes, Performance Tuning, Unique Indexes.
A comment by Robert in Part II of this series reminded me of another subtle difference between Unique and Non-Unique Indexes. Now this difference is likely to be of minimal consequence to most applications as most applications don’t generally have problems with Primary Key (PK) or Unique Key (UK) constraint violations (and if they do, this is likely to be the least of their worries). But it’s a interesting difference nonetheless, something to keep in the back of your mind and a little tit-bit to end the year on.
When a row is inserted into a table or when a PK or UK is modified, Oracle of course needs to ensure that either the PK or UK constraint is not violated. If the constraint is policed via a Unique index, as previously discussed, Oracle knows the value must and can only ever be unique and so performs the constraint validation before the Unique index is actually modified. If the PK or UK is violated, the Unique index can not possibly have been changed as all the associated index entries must always be unique and so only the undo (and redo) of the changes associated with the table data blocks are actually generated and need to be subsequently rolled back.
However, if the PK or UK constraint is policed via a Non-Unique index, the mechanism for applying the changes differs somewhat. As the index is Non-Unique, as previously discussed, Oracle is not quite so certain as to the state of play and performs the constraint validation after the associated changes are made to the Non Unique index. If the PK or UK constraint is violated, both undo and redo of the Non-Unique index has been generated and both changes to the table data blocks and the index blocks need to be rolled back.
This means there’s an extra cost associated with violating a constraint if the constraint is policed via a Non-Unique Index vs. a Unique index. When performing media recovery, it also means that there’s an additional cost associated with performing the recovery. Obviously the more frequent the constraint violations, the greater the overall penalties. Also, the larger the PK or UK values, the greater the penalties.
See this little demo to illustrate the differences between a Unique and a Non-Unique index in the redo and undo generated when a constraint is violated: Difference in redo and undo between a Unique and a Non-Unique Index.
As mentioned, this difference in behaviour between Unique and Non-Unique Indexes is unlikely to be an issue. However, in applications or environments where there may be a significant number of such violations, it may be something to keep in the back of your mind.
For a more detailed discussion and where it could be an issue, see Eric Emrick’s presentation.
Local Index Issue With Partitioned PK and Unique Key Constraints December 20, 2007Posted by Richard Foote in Constraints, Index Access Path, Local Indexes, Oracle Indexes, Partitioning, Performance Tuning, Unique Indexes.
Nuno Souto (Noons) also asked a really interesting question on my Differences between Unique and Non-Unique Indexes blog entry (comment 4) that I thought it worthy of a separate blog entry to do the answer justice. The question was:
“Isn’t it still the case that unique indexes cannot be locally partitioned unless the partition key is part of the index key? Not sure if 11g removes this. If still so, that would weigh heavily in favour of non-unique indexing for PK on a table potentially requiring local index partitions.”
Simplistically, the answer to the first part is Yes it is still the case, even in 11g and the answer to the second part is No, it wouldn’t weigh heavily in favour of non-unique indexing for PK on a table requiring local index partitions. It wouldn’t actually be a consideration at all.
Let me explain why.
Firstly, there is a really really good reason why Oracle doesn’t allow us to create a Unique Index in which the Partition key is not part of a Local Index. It’s called protecting us from ourselves !!
Let’s start by mentioning constraints again.
Remember, the main reason we have indexes policing PK and Unique constraints is so that Oracle can very quickly and efficiently determine whether or not a new value already exists. Do a quick index look-up, is the value there, yes or no, allow the insert (or update), yes or no.
Just imagine for one moment what would happen if Oracle actually allowed us to create a Unique Local index in which the index didn’t include the partitioned column(s).
Lets say a table is Range Partitioned on column ‘A’ and we try and create a Unique Local index on just column ‘B’. Let’s assume we have (say) 500 table partitions meaning we must therefore have 500 local index partitions as well. When we insert a new value for our unique index for value B, it will attempt to do so in the corresponding local index partition as governed by the value A for the new row. However Oracle can’t just check this one index partition for uniqueness to ensure value of column B doesn’t already exist, Oracle would need to check all 500 index partitions because it would be possible for our new value of column B to potentially have previously been inserted into any of the other 499 partitions !!
Each and every insert into our partitioned table (partitioned by column A) therefore would require Oracle to check all (say)500 index partitions each and every time to check for duplicates of column B. Again, it’s important to understand that any given value of column B could potentially be in any of the 500 partitions, IF Oracle allowed us to create a Local Partitioned Index just on column B.
Checking all 500 index partitions looking for a specific value of column B would obviously be impractical, inefficient and totally un-scalable. Therefore Oracle doesn’t allow us to do this. It doesn’t allow us to create a Local index in which the indexed columns does’t include the partitioning columns as well.
This is actually a good thing.
If you want to create a Unique index in a partitioned table, you MUST either add all the partitioned columns and make it part of the LOCAL unique index (so that way each and every insert would only have to check the one local partition as this value is known now it’s part of the index) or you must create it as a GLOBAL index (in which again, Oracle only has to check the one index structure).
It actually makes a lot of sense to do this.
Moving onto the second part of the question. Let’s just use a Local Non-Unique index to police our PK constraints then.
Fortunately this isn’t allowed either for exactly the same reasons. You can’t create a Local Non-unique index to police a PK (or Unique) constraint if the Constraint does not also include the partitioned columns. Otherwise again, Oracle would need to check each and every index partition to determine whether the constraint has been violated or not.
If you attempt to use an existing Local Non-Unique index to police a PK or Unique constraint that does not contain the partitioned columns, you will get an error saying it can’t create the (by default Global index) because the useless Local Non-Unique index (from a policing the constraint point of view) already exists.
Again if you want to create a Non-Unique index to police a PK or Unique constraint you must either ensure the constraint includes all the partitioned columns in which case it can be Local or you must use a Global Non-Unique index.
In other words, the rules apply equally to both Unique and Non-Unique indexes.
So it’s not really a case of Oracle not allowing one to create a Local Unique index without including the partitioned columns (although that’s of course true) but really a case of Oracle not allowing a PK or Unique *constraint* to be policed via *any* Local index (whether Unique or Non-Unique), unless the partitioned columns are also included.
Little demo to illustrate: Local Index Issue With Partitioned PK and Unique Key Constraints
Back when Oracle8 was released, Oracle introduced a number of new features with regard to constraints.
The first was the option of making a constraint DEFERRABLE, meaning the policing of a constraint can be deferred until the issuing of the COMMIT, rather than during the execution of an individual statement. This gave application developers more freedom in how they designed code, particularly with regard to the order in which parent – child data is inserted and manipulated.
The second new option was the ability to enable a constraint with NOVALIDATE, meaning Oracle would enable the constraint but not bother to check existing data to ensure nothing violated the constraint.
This could be useful in a number of scenarios. For example, you have data that currently violates the constraint but have urgent business requirements to enable the constraint ASAP preventing further violations, with the intention of cleaning up the existing violations at some future time.
Or you know the data is OK, so to reduce the overheads associated with enabling the constraint (eg. reading all the data to check for violations), you enable the constraint “immediately” with NOVALIDATE, bypassing the “redundant” checking.
Both deferrable and novalidate constraints therefore imply there “could” be data at any given point in time that violates the constraint. Therefore Oracle also introduced the ability to have non-unique indexes (rather than unique indexes) policing either PK or Unique constraints. For deferrable or novalidate constraints, the index must in fact be non-unique, as a unique index would prevent any such (temporary) violations of PK or Unique constraints.
Now, there are a number of interesting and subtle differences in the manner in which Oracle manages and processes a Unique vs. a Non-Unique index (eg. the amount of storage they use, the amount of redo they generate, the number of latches they acquire). This will be discussed in another Blog entry some other day.
Today, I just want to focus on a couple of interesting little side-effects with regard to how the CBO deals (or doesn’t deal)with NOT NULL and CHECK constraints that have been created as Deferrable or Novalidate.
In 9i, the CBO was clever enough to know that if someone searched for a NULL value but the column had a NOT NULL constraint, there couldn’t possibly be any data matching the criteria. Providing you had an index on the column, the CBO would generate an execution plan that used the index, found no NULL values and returned an empty row set accordingly. If you had no index, the CBO would be forced to use a Full Table Scan. So the CBO actually used an index in an efficient manner to search for non-existent nulls.
BUT, if the NOT NULL constraint was either deferrable or novalidated, then Oracle couldn’t know there were no nulls, there just might be. Therefore, Oracle was forced into the FTS regardless of the existence of the constraint or index, as null values are not indexed (unless part of a concatenated index).
See this demo for details: NOT NULLs demo with 9i
Since 10g, the CBO has become smarter. The NOT NULL example works in a very similar manner, except that the index is no longer required. If one searches for a NULL value on a column that has a NOT NULL constraint, the CBO automatically determines there can be no matching rows and returns the empty row set immediately with no LIOs. None, as accessing the data is simply not necessary.
BUT again, it can only do so if and only if the NOT NULL constraint is validated and nondeferrable, otherwise the CBO can’t guarantee no nulls.
See this little demo for details: NOT NULLs demo with 10g
Although we actually have applications that intentionally search for nulls on NOT NULL columns to return empty row sets, it’s not common that an application would perform such a search.
What is much more common is searching for a column value that simply doesn’t exist. If a column value doesn’t meet a business rule, it’s a good idea to police such business rules with Check constraints. 10g has extended the NOT NULL scenario to include Check constraints. If a search attempts to search for a column value that violates a check constraint, Oracle will immediately return an empty row set without performing any LIOs.
But once again, it can only do so if the check constraint has been validated and set as nondeferrable.
See this demo for a 10g check constraint example: Check Constraints with 10g
Making constraints deferrable or enabling them with novalidate can be useful. However, if possible, ensure constraints are not deferrable and validated as this provides the CBO with additional information regarding the columns that it might just put to good use.
Additionally, unless there’s a requirement to the contrary, use unique indexes rather than non-unique indexes to police uniqueness. But that’s a discussion for another day …
Invisible Indexes December 11, 2007Posted by Richard Foote in 11g, Index Access Path, Invisible Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
New in 11g are “Invisible Indexes”, which are basically indexes that exist and are maintained by Oracle but are “invisible” to the CBO. Specific sessions can be set to see these invisible indexes as necessary.
Potentially useful if one has a problematic (and very large) index causing performance issues that you want to make invisible until the specific issue is addressed without the expensive of having to drop and latter recreate the index. Also useful if you want to introduce a new index but want it to be invisible until it’s been given a workout first in a specific “test” session.
Here’s a bit of a demo: Invisible Indexes