jump to navigation

How Are NULLs Actually Indexed ? (Fascination) January 30, 2008

Posted by Richard Foote in Index Internals, Indexing NULLs, Oracle General, Oracle Indexes.
12 comments

A nice question by Jeff regarding how Oracle goes about indexing NULLs has prompted me to show how one could go about actually determining the answer. The basic question is are NULLs treated as just another column value and grouped accordingly or does Oracle have to somehow search through all the leaf blocks looking for all occurrences of these mysterious NULLs.

The answer is that NULLs are basically considered to be potentially the largest value possible by Oracle and so are all grouped and sorted together at the “end” of the index structure (assuming the column is the leading column in the concatenated index, else they’ll be listed last for each distinct column that precedes it in the index).

The fact that index range scans are just as efficient when searching for NULLs values as for any other value strongly supports this assumption, but how does one actually prove it ?

The first obvious thing to check would be to create a little table and associated index with a few rows and a few NULL column values thrown in and see the results of a SELECT … ORDER BY. One would expect the order of an ascending index to match the order of the resulting output. Indeed, NULL values are by default listed last in ORDER BY ascending listings suggesting they would likewise be grouped and sorted last within an index.

The next thing to check would possibly be to use the DUMP function to again see what Oracle is likely to do with NULL values. The DUMP function displays the raw decimal representation of the specific character (depending of course on the character-set) . For NULL values however, there’s actually nothing to display other than a NULL text to represent there’s nothing actually there.

The best place to check of course is within the actual index itself. By determining the actual block that stores our example index, we can perform an index block dump and look at the resultant trace file that describes a representation of the index block to see precisely how Oracle deals with NULLs within indexes.

A quick check of the HEADER_FILE and HEADER_BLOCK in DBA_SEGMENTS will give us the index segment header location.To find the associated index root/leaf block simply add 1 to the HEADER_BLOCK.

Dump the block via the

ALTER SYSTEM DUMP DATAFILE a BLOCK b

command and look at the trace file in USER_DUMP_DEST (where ‘a‘ represents the datafile id and ‘b represents the block id determined from dba_segments).

The resultant output clearly shows that yes:

  1. Leading column NULLs values are all grouped together
  2. They are all listed at the “end” of the index structure
  3. Any NULLs in the non-leading indexed columns are listed “last” for each distinct value in the leading columns in which they appear
  4. Any index entry consisting of nothing but NULLs are not actually stored within the index

This NULLs Index Dump demo goes through this entire process with a little working example and describes the relevant section of the index block dump.

I spend some time discussing block dumps in my seminar as it’s an extremely useful tool when determining and learning how Oracle actually works.

Indexing NULLs: (Empty Spaces) January 23, 2008

Posted by Richard Foote in Indexing NULLs, Indexing Tricks, Oracle General, Oracle Indexes, Performance Tuning.
44 comments

There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.

Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.

However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.

The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.

If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.

So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.

But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?

Well not quite.

Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.

Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the  index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.

However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.

 See this demo on Indexing Null Values for examples on how this all works.

Follow

Get every new post delivered to your Inbox.

Join 1,861 other followers