Indexing NULLs: (Empty Spaces) January 23, 2008Posted by Richard Foote in Indexing NULLs, Indexing Tricks, Oracle General, Oracle Indexes, Performance Tuning.
There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.
Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.
However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.
The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.
If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.
So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.
But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?
Well not quite.
Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.
Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.
However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.
See this demo on Indexing Null Values for examples on how this all works.