jump to navigation

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down) November 15, 2018

Posted by Richard Foote in Block Dumps, Index Internals, Indexing NULLs, Leaf Blocks, Oracle Indexes.
7 comments

Never Let Me Down

By default, Oracle doesn’t index an entry if all columns within the index are NULL. However, (as I’ve blogged previously), it’s possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all it’s NULL values indexed and can potentially use the index accordingly.

However, the point of the article is to simple highlight that some constant values are better to use in this scenario than others…

A simple example to illustrate. First, create a table with both the CODE and GRADE columns nullable:

SQL> create table bowie (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), mod(rownum,1000), 'DAVID
BOWIE' from dual connect by level 1000000;

999999 rows created.

SQL> insert into bowie values (1000000, null, null, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

If we create an index on both CODE and GRADE columns:

SQL> create index bowie_code_grade_i on bowie(code, grade);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE');

PL/SQL procedure successfully completed.

And then run a query looking for any CODE with a NULL value:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       |    1 |    24 |    1115 (3) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |    1 |    24 |    1115 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE" IS NULL)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
4030 consistent gets
   0 physical reads
   0 redo size
 784 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

We notice the CBO performs a Full Table Scan even though the CBO knows there’s likely only one row that meets the criteria. Not matter what we do, hint the query, beg, whatever, it’s impossible for the CBO to use the index because the null row is simply not indexed.

Now we come to the rub of the post.

A common recommendation is to simply add a constant to the column list. A constant is always present and associated NULL values are indexed if another index column has a corresponding Non-NULL value. Adding a constant value to the index column list guarantees all NULL values for all index columns must always be present within the index. The CBO recognises this and can therefore potentially use the index to fetch the required NULL values.

However, a common recommendation is also to use a number as the constant. There was a recent tweet I saw a few days ago that had the following example of using the number 1 as the constant value:

SQL> create index bowie_code_grade_i_2 on bowie (code, grade, 1);

Index created.

When we run the query again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The index is indeed used to return the row with the NULL value of interest and only 4 consistent gets are performed.

So what’s the problem?

Nothing, except that perhaps a better constant might have been used, such as say a single space:

SQL> create index bowie_code_grade_i_3 on bowie(code, grade, ' ');

Index created.

If we run the query yet again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We get the exact same performance. So, what’s the point?

Well, if we look at the size of the corresponding indexes:

SQL> select index_name, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME                LEAF_BLOCKS
------------------------- -----------
BOWIE_CODE_GRADE_I               2490
BOWIE_CODE_GRADE_I_2             2908
BOWIE_CODE_GRADE_I_3             2769

We notice the original index has the smallest size as expected, as it doesn’t have to index the constant value. But then we notice that the index with the constant value as the number is somewhat larger than the index with the constant value as a space.

Why?

An index block dump of both indexes will highlight why:

First a partial leaf block dump of index with the “1” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29444101=0x1c14805
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 0d 00 af

Next, a partial leaf block dump of index with the space ” ” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 422
kdxcofbo 880=0x370
kdxcofeo 1706=0x6aa
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 29447173=0x1c15405
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8006] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7991] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 0d 00 af

We notice that the number requires 2 bytes, while the space only requires 1 byte.

So storing the constant as a single byte character, rather a 2 byte number is a free saving, which adds up with each and every index entry, by simply being a little more fastidious when selecting which constant value to use in this scenario.

Advertisements