jump to navigation

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down) November 15, 2018

Posted by Richard Foote in Block Dumps, Index Internals, Indexing NULLs, Leaf Blocks, Oracle Indexes.
7 comments

Never Let Me Down

By default, Oracle doesn’t index an entry if all columns within the index are NULL. However, (as I’ve blogged previously), it’s possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all it’s NULL values indexed and can potentially use the index accordingly.

However, the point of the article is to simple highlight that some constant values are better to use in this scenario than others…

A simple example to illustrate. First, create a table with both the CODE and GRADE columns nullable:

SQL> create table bowie (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), mod(rownum,1000), 'DAVID
BOWIE' from dual connect by level 1000000;

999999 rows created.

SQL> insert into bowie values (1000000, null, null, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

If we create an index on both CODE and GRADE columns:

SQL> create index bowie_code_grade_i on bowie(code, grade);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE');

PL/SQL procedure successfully completed.

And then run a query looking for any CODE with a NULL value:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       |    1 |    24 |    1115 (3) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |    1 |    24 |    1115 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE" IS NULL)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
4030 consistent gets
   0 physical reads
   0 redo size
 784 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

We notice the CBO performs a Full Table Scan even though the CBO knows there’s likely only one row that meets the criteria. Not matter what we do, hint the query, beg, whatever, it’s impossible for the CBO to use the index because the null row is simply not indexed.

Now we come to the rub of the post.

A common recommendation is to simply add a constant to the column list. A constant is always present and associated NULL values are indexed if another index column has a corresponding Non-NULL value. Adding a constant value to the index column list guarantees all NULL values for all index columns must always be present within the index. The CBO recognises this and can therefore potentially use the index to fetch the required NULL values.

However, a common recommendation is also to use a number as the constant. There was a recent tweet I saw a few days ago that had the following example of using the number 1 as the constant value:

SQL> create index bowie_code_grade_i_2 on bowie (code, grade, 1);

Index created.

When we run the query again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The index is indeed used to return the row with the NULL value of interest and only 4 consistent gets are performed.

So what’s the problem?

Nothing, except that perhaps a better constant might have been used, such as say a single space:

SQL> create index bowie_code_grade_i_3 on bowie(code, grade, ' ');

Index created.

If we run the query yet again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We get the exact same performance. So, what’s the point?

Well, if we look at the size of the corresponding indexes:

SQL> select index_name, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME                LEAF_BLOCKS
------------------------- -----------
BOWIE_CODE_GRADE_I               2490
BOWIE_CODE_GRADE_I_2             2908
BOWIE_CODE_GRADE_I_3             2769

We notice the original index has the smallest size as expected, as it doesn’t have to index the constant value. But then we notice that the index with the constant value as the number is somewhat larger than the index with the constant value as a space.

Why?

An index block dump of both indexes will highlight why:

First a partial leaf block dump of index with the “1” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29444101=0x1c14805
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 0d 00 af

Next, a partial leaf block dump of index with the space ” ” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 422
kdxcofbo 880=0x370
kdxcofeo 1706=0x6aa
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 29447173=0x1c15405
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8006] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7991] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 0d 00 af

We notice that the number requires 2 bytes, while the space only requires 1 byte.

So storing the constant as a single byte character, rather a 2 byte number is a free saving, which adds up with each and every index entry, by simply being a little more fastidious when selecting which constant value to use in this scenario.

Advertisements

How Are NULLs Actually Indexed ? (Fascination) January 30, 2008

Posted by Richard Foote in Index Internals, Indexing NULLs, Oracle General, Oracle Indexes.
12 comments

A nice question by Jeff regarding how Oracle goes about indexing NULLs has prompted me to show how one could go about actually determining the answer. The basic question is are NULLs treated as just another column value and grouped accordingly or does Oracle have to somehow search through all the leaf blocks looking for all occurrences of these mysterious NULLs.

The answer is that NULLs are basically considered to be potentially the largest value possible by Oracle and so are all grouped and sorted together at the “end” of the index structure (assuming the column is the leading column in the concatenated index, else they’ll be listed last for each distinct column that precedes it in the index).

The fact that index range scans are just as efficient when searching for NULLs values as for any other value strongly supports this assumption, but how does one actually prove it ?

The first obvious thing to check would be to create a little table and associated index with a few rows and a few NULL column values thrown in and see the results of a SELECT … ORDER BY. One would expect the order of an ascending index to match the order of the resulting output. Indeed, NULL values are by default listed last in ORDER BY ascending listings suggesting they would likewise be grouped and sorted last within an index.

The next thing to check would possibly be to use the DUMP function to again see what Oracle is likely to do with NULL values. The DUMP function displays the raw decimal representation of the specific character (depending of course on the character-set) . For NULL values however, there’s actually nothing to display other than a NULL text to represent there’s nothing actually there.

The best place to check of course is within the actual index itself. By determining the actual block that stores our example index, we can perform an index block dump and look at the resultant trace file that describes a representation of the index block to see precisely how Oracle deals with NULLs within indexes.

A quick check of the HEADER_FILE and HEADER_BLOCK in DBA_SEGMENTS will give us the index segment header location.To find the associated index root/leaf block simply add 1 to the HEADER_BLOCK.

Dump the block via the

ALTER SYSTEM DUMP DATAFILE a BLOCK b

command and look at the trace file in USER_DUMP_DEST (where ‘a‘ represents the datafile id and ‘b represents the block id determined from dba_segments).

The resultant output clearly shows that yes:

  1. Leading column NULLs values are all grouped together
  2. They are all listed at the “end” of the index structure
  3. Any NULLs in the non-leading indexed columns are listed “last” for each distinct value in the leading columns in which they appear
  4. Any index entry consisting of nothing but NULLs are not actually stored within the index

This NULLs Index Dump demo goes through this entire process with a little working example and describes the relevant section of the index block dump.

I spend some time discussing block dumps in my seminar as it’s an extremely useful tool when determining and learning how Oracle actually works.

Indexing NULLs: (Empty Spaces) January 23, 2008

Posted by Richard Foote in Indexing NULLs, Indexing Tricks, Oracle General, Oracle Indexes, Performance Tuning.
45 comments

There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.

Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.

However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.

The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.

If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.

So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.

But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?

Well not quite.

Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.

Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the  index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.

However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.

 See this demo on Indexing Null Values for examples on how this all works.