Differences between Unique and Non-Unique Indexes (Part I) December 18, 2007Posted by Richard Foote in Constraints, Deferrable Constraints, Index Internals, Indexing Tricks, Novalidate Constraints, Oracle Indexes, Primary Key, Unique Indexes.
I’ve had a number of comments regarding my earlier blog entry where I recommended avoiding Deferrable and Novalidate constraints unless you need them and consider using Unique Indexes rather than Non-Unique Indexes where possible.
Why such a recommendation, aren’t Unique and Non-Unique indexes practically the same thing when it comes to policing constraints ?
Sure one index explicitly prevents the insertion of duplicates while the other doesn’t. Yes, dropping/disabling a constraint policed by an automatically created Unique index causes the index to be dropped if you forget the KEEP INDEX clause.
But that’s about it, right ?
Well, if you need a constraint to be deferrable, then you must create (either implicitly or explicitly) a Non-Unique index. If you want to enable a constraint with novalidate, then again you can only do so with a Non-Unique index in place policing the constraint.
It does all rather sound like Non-Unique indexes have all the advantages and allows for all the flexibility one could want. Non-Unique indexes allows for both deferrable and novalidate constraints, they don’t get dropped when the associated constraint is dropped / disabled and they can actually police both PK and Unique constraints.
What possible benefits are there in Unique Indexes ?
Well, providing you don’t need your constraints to be deferrable, you validate your constraints when they get created/enabled and you don’t go around dropping PK and/or Unique constraints on too regular a basis (or remember the KEEP INDEX clause if you don’t want your index dropped when you do), then there are a number of reasons why you may just want to consider using Unique indexes over Non-Unique indexes.
There are actually a number of key differences between Unique and Non-Unique indexes, both in the manner in which they’re stored by Oracle and in the manner in which they get processed.
In Part I, I’m just going to focus on the differences in how Oracle physically stores index entries.
In actual fact, there’s really no such thing as a Non-Unique index in Oracle. In order for Oracle to be able to determine the location of any specific index row entry and for Oracle to be able to determine an appropriate “order” for each index row entry, internally, Oracle coverts all Non-Unique indexes into a Unique index. It does this by using the associated ROWID of the index row entry as an additional “column”. As each ROWID is unique, this effectively makes all index entries in a Non-Unique index unique as well. Oracle uses the unique combination of the Non-Unique index value and the associated ROWID to then determine the appropriate order and hence appropriate location within the index structure in which to store the index row entry.
By Oracle making the ROWID an additional column, it also has to allocate an additional byte per index row entry in order to store the length of this column. That’s one teeny weeny little byte extra for each and every index row entry.
So what ?
Well, for indexes that don’t have a particularly large index key length, that one byte can be a significant proportion of the overall key length. Now Oracle needs to allocate 2 byes per row entry for various flags and locking information, it requires 6 bytes for the rowid and 1 byte for each column entry. That’s 9 bytes minimum plus the length of the indexed value itself.
Well how large is a typical unique index entry? Well that of course all depends and some PK / (and especially) Unique values can be quite large. But many many PK values are simply sequenced based numerical values, created nice and small so as to reduce overheads when stored in dependent child tables.
But can it really make any noticeable difference ?
Well, this little demo shows two tables with 1 million numeric PK values: Compare internal index storage between Unique and Non-Unique Indexes
Table test1 is created with a Non-Unique Index, table test2 is created with a Unique Index. The demo shows a partial block dump of a leaf block from each index, highlighting how the Non-Unique index requires an additional byte per index row entry.
The Unique index manages to hold 533 leaf entries in the block while the Non-Unique index could only hold 500. Comparing the total sizes of the two indexes, the Unique index required 1875 leaf blocks while the Non-Unique index required 1999 leaf blocks.
That’s an increase of approximately 6.6% in leaf blocks required for the Non-Unique index to store exactly the same number of index entries as the Unique Index (in this particular example).
That’s 6.6% less storage, that’s a reduction of 6.6% in block splitting and block allocations, that’s a reduction of 6.6% in the cost of full index scans, that’s 6.6% less memory required to cache the index, etc. etc.
The point here is that these savings don’t require any expensive, periodic rebuilding of indexes. They doesn’t require any additional fancy scripts or additional monitoring and processing. The DBA doesn’t have to calculate irrelevant statistics or demand scheduled outages to claim these savings.
This a getting more “dollars for your buck” freebie from Oracle purely and simply by using a Unique index instead of an Non-Unique index.
Note also that not one or two but ALL of your numeric based PKs have the potential to get these types of savings. Obviously the larger the actual PK or Unique key values, the lesser a byte is in proportion to the overall key length and the less percentage savings.
But it’s not a bad payback for many many of your indexes, purely and simply by using Unique indexes instead of Non-unique indexes where possible …
This is but one of the benefits of using Unique Indexes. More (potentially significant) advantages to follow …
Back when Oracle8 was released, Oracle introduced a number of new features with regard to constraints.
The first was the option of making a constraint DEFERRABLE, meaning the policing of a constraint can be deferred until the issuing of the COMMIT, rather than during the execution of an individual statement. This gave application developers more freedom in how they designed code, particularly with regard to the order in which parent – child data is inserted and manipulated.
The second new option was the ability to enable a constraint with NOVALIDATE, meaning Oracle would enable the constraint but not bother to check existing data to ensure nothing violated the constraint.
This could be useful in a number of scenarios. For example, you have data that currently violates the constraint but have urgent business requirements to enable the constraint ASAP preventing further violations, with the intention of cleaning up the existing violations at some future time.
Or you know the data is OK, so to reduce the overheads associated with enabling the constraint (eg. reading all the data to check for violations), you enable the constraint “immediately” with NOVALIDATE, bypassing the “redundant” checking.
Both deferrable and novalidate constraints therefore imply there “could” be data at any given point in time that violates the constraint. Therefore Oracle also introduced the ability to have non-unique indexes (rather than unique indexes) policing either PK or Unique constraints. For deferrable or novalidate constraints, the index must in fact be non-unique, as a unique index would prevent any such (temporary) violations of PK or Unique constraints.
Now, there are a number of interesting and subtle differences in the manner in which Oracle manages and processes a Unique vs. a Non-Unique index (eg. the amount of storage they use, the amount of redo they generate, the number of latches they acquire). This will be discussed in another Blog entry some other day.
Today, I just want to focus on a couple of interesting little side-effects with regard to how the CBO deals (or doesn’t deal)with NOT NULL and CHECK constraints that have been created as Deferrable or Novalidate.
In 9i, the CBO was clever enough to know that if someone searched for a NULL value but the column had a NOT NULL constraint, there couldn’t possibly be any data matching the criteria. Providing you had an index on the column, the CBO would generate an execution plan that used the index, found no NULL values and returned an empty row set accordingly. If you had no index, the CBO would be forced to use a Full Table Scan. So the CBO actually used an index in an efficient manner to search for non-existent nulls.
BUT, if the NOT NULL constraint was either deferrable or novalidated, then Oracle couldn’t know there were no nulls, there just might be. Therefore, Oracle was forced into the FTS regardless of the existence of the constraint or index, as null values are not indexed (unless part of a concatenated index).
See this demo for details: NOT NULLs demo with 9i
Since 10g, the CBO has become smarter. The NOT NULL example works in a very similar manner, except that the index is no longer required. If one searches for a NULL value on a column that has a NOT NULL constraint, the CBO automatically determines there can be no matching rows and returns the empty row set immediately with no LIOs. None, as accessing the data is simply not necessary.
BUT again, it can only do so if and only if the NOT NULL constraint is validated and nondeferrable, otherwise the CBO can’t guarantee no nulls.
See this little demo for details: NOT NULLs demo with 10g
Although we actually have applications that intentionally search for nulls on NOT NULL columns to return empty row sets, it’s not common that an application would perform such a search.
What is much more common is searching for a column value that simply doesn’t exist. If a column value doesn’t meet a business rule, it’s a good idea to police such business rules with Check constraints. 10g has extended the NOT NULL scenario to include Check constraints. If a search attempts to search for a column value that violates a check constraint, Oracle will immediately return an empty row set without performing any LIOs.
But once again, it can only do so if the check constraint has been validated and set as nondeferrable.
See this demo for a 10g check constraint example: Check Constraints with 10g
Making constraints deferrable or enabling them with novalidate can be useful. However, if possible, ensure constraints are not deferrable and validated as this provides the CBO with additional information regarding the columns that it might just put to good use.
Additionally, unless there’s a requirement to the contrary, use unique indexes rather than non-unique indexes to police uniqueness. But that’s a discussion for another day …