jump to navigation

Indexing A Column With Just One Distinct Value (All The Madmen) August 10, 2011

Posted by Richard Foote in CBO, Oracle Indexes.

When one thinks about a column that might benefit from being indexed, one generally considers columns with lots of different values so that the selectivity of the column is such that relatively few rows get selected, making the index appealing to the cost based optimizer.
There are of course many exceptions to this generalisation and there are times when an index is the perfect and most efficient method when selecting 100% of all data, even if it involves a normal index range scan reading every row out of the table.
There are also times when a column that has very few distinct values should be indexed, even with a standard B-tree index.
In this example, I’ll index a column that only has just the 1 single value and the CBO will choose to use the index gladly.
Just going to create my standard little table and create an index on the CODE column:

SQL> create table bowie (id number, code number, name varchar2(50));
Table created.
SQL> create index bowie_code_i on bowie(code);
Index created.



I’ll now load the table with 1 million rows but all the CODE column will remain unpopulated and consist only of NULLS:

SQL> insert into bowie select rownum, null, 'Ziggy Stardust and the Spiders From Mars' from dual connect by level <= 1000000;
1000000 rows created.
SQL> commit;
Commit complete.



I’ll only populate one in every 10,000 rows that a CODE value of 42 (of course):

SQL> update bowie set code = 42 where mod(id,10000) = 0;
100 rows updated.
SQL> commit;
Commit complete.



Let’s collect accurate statistics, however note I’m not collecting histograms even though there are relatively few rows that have a CODE value of 42:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=> true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.



Indeed, the column only has the one value (42) which occurs relatively infrequently in my data, most rows however remain as NULL:

SQL> select code, count(*) from bowie group by code;
      CODE   COUNT(*)
---------- ----------
        42        100



Note that the index statistics clearly shows it only has the 1 distinct value. Remember, NULLS are not indexed by default in B-Tree indexes and as a result the index is tiny with just the one leaf block:

SQL> select blevel, leaf_blocks, distinct_keys from dba_indexes where index_name='BOWIE_CODE_I';
---------- ----------- -------------
         0           1             1



Note also that the column statistics clearly highlight there’s just the one distinct value, however it also records that there are many rows (999900) that are NULL:


SQL> select column_name, num_distinct, num_nulls from dba_tab_columns where table_name = 'BOWIE' and column_name = 'CODE';
------------ ------------ ----------
CODE                    1     999900



Therefore, Oracle with accurate statistics and without requiring any histograms has all the information it needs to know when selecting rows that contain the one and only distinct value of our CODE column, that it will only actually be selecting 100 rows out of the 1 million rows in the table.


SQL> select * from bowie where code = 42;
100 rows selected.

Execution Plan
Plan hash value: 1602289932
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |              |   100 |  4700 |   101   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   100 |  4700 |   101   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_CODE_I |   100 |       |     1   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   2 - access("CODE"=42)

          1  recursive calls
          0  db block gets
        102  consistent gets
          0  physical reads
          0  redo size
       1423  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed



The CBO has the rows estimate/cardinality spot on (100) and has decided that the index is indeed the most efficient access path to select the 100 rows of interest.

With flag and bollean type columns, it might be worth some consideration simply storing the one value when one value is relatively rare, such that the resultant index can take advantage of not having to store any of the corresponding NULL values.
Even columns with just the one distinct value can potentially benefit from being indexed …