jump to navigation

Indexes and NOT Equal (Not Now John) August 13, 2008

Posted by Richard Foote in Index Access Path, NOT Equal, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
trackback

The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

So there’s a trade-off between ensuring the CBO makes reasonable decisions while ensuring it makes its decisions in a timely and resource efficient manner. Database performance could be directly impacted if these trade-offs are not managed effectively.

Therefore, there are all sorts of short cuts and assumptions that are coded into the CBO to make its life a little easier. However, these short cuts can sometimes be problematic if they’re not recognised and handled appropriately.

One of these little short cuts worth noting is how the CBO deals with NOT EQUAL (and NOT IN) conditions …

Typically when we have a condition where we just say NOT EQUAL, we’re basically suggesting we’re interested in the vast majority of possible values with the exception of the value specified in the NOT EQUAL condition. We want most values but not if it’s this particular value.

For example, a condition where we state something such as:

WHERE TEXT <> ‘BOWIE’

means we want all the other possible values of TEXT, just not those with the specific value of ‘BOWIE’. In other words, we’re typically interested in the vast majority of possible values when we specify a NOT EQUAL condition.

However, we all know that typically, Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a Full Table Scan if most rows are going to be returned anyways.

Therefore the CBO simply ignores indexes when costing a NOT EQUAL condition. Why bother going to all the overhead of calculating the cost of using an index to retrieve the vast majority of rows when a Full Table Scan is going to be the cheaper alternative in the vast majority of such cases. So the CBO doesn’t even bother trying and ignores all indexes that could potentially be used to retrieve the rows based on the NOT EQUAL condition.

But what if the data isn’t evenly distributed and the NOT EQUAL condition actually retrieves only a relatively small proportion of the rows. What if most rows actually have the value specified in the NOT EQUAL condition and the remaining rows constitute a relatively small proportion of the remaining rows ?

When the CBO ignores indexes, it ignores indexes in all cases. Even if 99.99% of rows match the value in the NOT EQUAL condition and there’s only a handful of remaining rows to actually be retrieved, the code path in the CBO is still followed and indexes are ignored regardless. The reason possibly being such queries could be re-written to avoid the use of the NOT EQUAL condition and so its use is still suggesting a large selectivity.

The refusal of the CBO to consider an index with a NOT EQUAL condition can easily be illustrated.

First, let’s create a table and populate a TEXT column with the same value, ‘BOWIE’:

SQL> create table bowie as select rownum id, ‘BOWIE’ text from dual connect by level <= 1000000;

Table created.

Let’s make the TEXT column NOT NULL so the CBO knows all rows have a value for this column:

SQL> alter table bowie modify text not null;

Table altered.

Let’s now add a new row, one that has a different value for the TEXT column:

SQL> insert into bowie values (1000001, ‘ZIGGY’);

1 row created.

Commit complete.

So all rows have a TEXT value of ‘BOWIE’, except for just the one row which has a value of ‘ZIGGY’.

OK, let’s now create an index on this column:

SQL> create index bowie_i on bowie(text);

Index created.

Let’s now collect some statistics on this table, including a histogram on the TEXT column so that the CBO knows the data is not even distributed and that the vast number of values of TEXT are ‘BOWIE’:

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> ‘BOWIE’, cascade=> true, estimate_percent=> null, method_opt=> ‘FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> ‘BOWIE’, cascade=> true, estimate_percent=> null, method_opt=> ‘FOR COLUMNS TEXT SIZE 5′);

PL/SQL procedure successfully completed.

So only one row has a value that is NOT a ‘BOWIE’ which means an index to retrieve this one and only row would be an efficient and appropriate execution path, right ?

Well, let’s see what the CBO decides to do. First, let’s set a 10053 trace so we can see how the CBO has costed it’s possible options.

SQL> alter session set events ’10053 trace name context forever’;

Session altered.

Let’s now execute this simple and innocent looking statement:

SQL> select * from bowie where text <> ‘BOWIE’;

        ID TEXT
---------- -----
   1000001 ZIGGY

---------------------------------
| Id| Operation         | Name  | 
---------------------------------
|  0| SELECT STATEMENT  |       |
|* 1|  TABLE ACCESS FULL| BOWIE | 
---------------------------------

We note that Oracle has decided to not use the index but use a FTS instead.  If we look at the relevant parts of the 10053 trace, we note that the CBO did not even cost or consider using the index. The index was basically ignored and not considered at all:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: BOWIE  Alias: BOWIE
    #Rows: 1000001  #Blks:  2214  AvgRowLen:  10.00
Index Stats::
  Index: BOWIE_I  Col#: 2
    LVLS: 2  #LB: 2370  #DK: 2  LB/K: 1185.00  DB/K: 1105.00  CLUF: 2210.00
Access path analysis for BOWIE
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for BOWIE[BOWIE]
  Column (#2):
    NewDensity:0.000000, OldDensity:0.000000 BktCnt:1000001, PopBktCnt:1000000, PopValCnt:1, NDV:2
  Table: BOWIE  Alias: BOWIE
    Card: Original: 1000001.000000  Rounded: 1  Computed: 1.00  Non Adjusted: 1.00
  Access Path: TableScan
    Cost:  620.67  Resp: 620.67  Degree: 0
      Cost_io: 601.00  Cost_cpu: 435767288
      Resp_io: 601.00  Resp_cpu: 435767288
  Best:: AccessPath: TableScan
         Cost: 620.67  Degree: 1  Resp: 620.67  Card: 1.00  Bytes: 0

You can try to hint the query but the CBO will still ignore any RANGE SCAN operation because the CBO can’t know what all other possible potential values that are not ‘BOWIE’ might be (remembering the statistics may not necessarily be accurate). It can perform a FULL INDEX SCAN but this means reading all the leaf nodes that contain all the unwanted ‘BOWIE’ values and so it still an inefficient option:

SQL> select /*+ index (bowie bowie_i) */ * from bowie where text <> ‘BOWIE’;

-----------------------------------
| Id| Operation                   |
-----------------------------------
|  0| SELECT STATEMENT            |
|  1|  TABLE ACCESS BY INDEX ROWID|
|* 2|   INDEX FULL SCAN           |
-----------------------------------

The INDEX RANGE SCAN is simply not an option …

What is an option of course is to simply rewrite the query. One can just write the query in the “positive” sense and the index is now considered and used:

SQL> select * from bowie where text = ‘ZIGGY’;

-----------------------------------
| Id| Operation                   |
-----------------------------------
|  0| SELECT STATEMENT            |
|  1|  TABLE ACCESS BY INDEX ROWID|
|* 2|   INDEX RANGE SCAN          |
-----------------------------------

Or, if there a many different distinct values that are not ‘BOWIE’ but which in total still constitute a relatively small percentage of the total rows, then it could be re-written as follows which can make use of the index in an effective manner by concatenating two separate index range scans:

SQL> select * from bowie where text < ‘BOWIE’ or text > ‘BOWIE’;

        ID TEXT
---------- -----
   1000001 ZIGGY
------------------------------------
| Id| Operation                    |
------------------------------------
|  0| SELECT STATEMENT             |
|  1|  CONCATENATION               |
|  2|   TABLE ACCESS BY INDEX ROWID|
|* 3|    INDEX RANGE SCAN          |
|  4|   TABLE ACCESS BY INDEX ROWID|
|* 5|    INDEX RANGE SCAN          |
------------------------------------

Note this same issue applies to NOT IN conditions.

Be very careful when using NOT EQUAL conditions and be mindful of the impact they may have with your indexes.

About these ads

Comments»

1. Gus Spier - August 13, 2008

How about a bitmap index? In this test scenario, we have two values spread across 100001 rows. Now I have to wait until the morning and get to test the sameset this test against that kind of index!

2. Richard Foote - August 13, 2008

Hi Gus

Exactly the same thing applies for bitmap indexes, no difference, they’re simply ignored as well.

If Oracle thinks it’s going to retrieve most rows, a bitmap index is just as bad and inefficient as a b-tree index.

3. David Aldridge - August 13, 2008

Did you paste the wrong plan for the first select, Richard?

How about “select text from bowie where text ‘BOWIE’” … you’d imagine that the CBO would head straight for the index for that one if the table average row length were big enough to make an FTS less efficient than a full or fast full index scan.

4. Richard Foote - August 13, 2008

Hi David

Yes I did, thank-you !!

My cutting ‘n’ pasting skills have been letting me down lately as I try to format output in a readable manner.

Thanks again !!

5. Christian Antognini - August 14, 2008

Richard

About the bitmap indexes… There is at last one difference between b-tree and bitmap indexes. When several bitmap indexes can be combined the inequality can be optimized through a index scan.

Here an example (I hope that the formatting is not too bad…):

SQL> SELECT * FROM t WHERE n4 != 4 AND n5 = 5;

—————————————-
| Operation | Name |
—————————————-
| SELECT STATEMENT | |
| TABLE ACCESS BY INDEX ROWID | T |
| BITMAP CONVERSION TO ROWIDS | |
| BITMAP MINUS | |
| BITMAP MINUS | |
| BITMAP INDEX SINGLE VALUE| I_N5 |
| BITMAP INDEX SINGLE VALUE| I_N4 |
| BITMAP INDEX SINGLE VALUE | I_N4 |
—————————————-

5 – access(“N5″=5)
6 – access(“N4″=4)
7 – access(“N4″ IS NULL)

Cheers,
Chris

6. Richard Foote - August 14, 2008

Hi Chris

Yes, very good point, thank you. This is actually something I cover in my index seminar when I discuss bitmap indexes.

If you have a set of rowids and you want to elliminate some of them based on a NOT condition on another index, bitmap indexes can do this very well. It can scan the bitmap looking for all the rowids you don’t want and effectively subtracts them from a set of rowids you do want.

However, a single NOT condition will not be serviced via a bitmap index.

7. Log Buffer #110: A Carnival of the Vanities for DBAs - August 15, 2008

[...] Another guy wearing the deerstalker hat was Richard Foote. He has an excellent look at how the CBO deals with indexes and NOT EQUAL and NOT IN conditions. [...]

8. Rafael Stoever - August 18, 2008

But In my case this is in my application and it runs for the value text ‘ZIGGY’ for example, the cost ends up being equal to the first example
select * from bowie where text ‘BOWIE’

What the best solution?

9. Brian Tkatch - August 18, 2008

That is odd. Richard thanx fopr pointing that out.

It seems that BETWEEN works as well:

SQL> CREATE TABLE Bowie(Id NOT NULL, Text NOT NULL)
2 AS
3 SELECT RowNum, ‘Bowie’ FROM Dual CONNECT BY Level
SQL> CREATE INDEX Bowie_I ON Bowie(Text);

Index created.

SQL> EXEC DBMS_Stats.Gather_Table_Stats(OwnName=> NULL, TabName=> ‘Bowie’);

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR SELECT * FROM Bowie WHERE Text NOT BETWEEN ‘Bowie’ AND ‘Bowie’;

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

PLAN_TABLE_OUTPUT
—————————————————————————————–
Plan hash value: 1396266692

—————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————-
| 0 | SELECT STATEMENT | | 2 | 20 | 8 (0)| 00:00:01 |
| 1 | CONCATENATION | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1 | 10 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | BOWIE_I | 1 | | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| BOWIE | 1 | 10 | 4 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | BOWIE_I | 1 | | 3 (0)| 00:00:01 |
—————————————————————————————-

10. Richard Foote - August 25, 2008

Hi Rafael

The best solution is the most efficient solution …

If you’re only after a handful of rows in a “large” table, you’re typically going to want to access them via an appropriate index in an appropriate manner.

If by rewriting the query, you can make the CBO at least consider an index access path in such situations, then that’s generally the way to go.

11. Richard Foote - August 25, 2008

Hi Brian

A NOT BETWEEN on a single value is equivalent to a “” on a single value and so is treated as in my second example.

Good point.

12. jeff - August 26, 2008

Richard -
another good article.
You gave a good example of turning a negative into a positive (“” to ).
How about an in list predicate?
for example…
select deal_nbr
from big_table
where deal_nbr not in (select deal_nbr from deal_lookup_t)
there is an index on the big_table deal_nbr not being used in this form.

Do recommend a common approach to such a query? I’ve played with various forms of inline views and such but can’t get it to where i want/need.

thanks,
jeff

13. Pete Scott - August 28, 2008

Another way I have come across to allow efficient access for NOT type predicates is through a CASE construct. I have seen this work well for a grocer that needed to report “NOT tobacco” sales at department level… something like:
… (case when department = 1 then ‘tobacco’ else ‘non-tobacco’ end) = ‘non-tobacco’ …

But there again I do odd things…

14. Richard Foote - August 28, 2008

Hi Jeff

Unfortunately, not easily for exactly the same reasons as discussed.

If such as statement was critical and regularly run, it might mean changing how data is processed and populated with the use of additional information (such as an additional flag column or some such).

15. Richard Foote - August 28, 2008

Hi Pete

Nothing wrong with odd, especially if it works ;)

16. Bitmap Indexes and Not Equal (Holy Holy) « Richard Foote’s Oracle Blog - July 5, 2011

[...] back, I previously discussed how the CBO will simply ignore any possible indexes when determining the best execution plan [...]

17. Nathan Marston - August 26, 2011

Richard,

Another gotcha to watch out for are “NOT IN” clauses and composite indexes.

Suppose we have a table T with columns A, B, and C. We have a query with a NOT IN predicate on A, and equality predicates on B and C.

(As is often the case with an “IN” or “NOT IN” predicate, column A has a small number of distinct values).

It so happens that the most selective condition in the query is the one on column “A”, but adding B and C improves its selectivity so we decide to create a composite index on (A, B, C).

Here’s where things “go wrong”.

Because of the behaviour you describe above, that index cannot be used to range scan for the values in the “NOT IN condition” (which is what I hoped for by leading with column A in the index).

However, Oracle CAN use it to satisfy the predicates on columns B and C – and once it does that, it can filter the results to satisfy the NOT IN predicate on column A.

So yes, Oracle uses the index we wanted it to – just not in the way we expected.

Richard Foote - August 29, 2011

Thanks for that Nathan, something to be aware of.

18. Book Review: Oracle Database 11gR2 Performance Tuning Cookbook (Part 1) « Charles Hooper's Oracle Notes - March 4, 2012

[...] to a less than predicate and a greater than predicate with a logical OR between the two predicates (reference page [...]

19. mnitu61 - October 16, 2012

Hi Richard,

From at least a theoretical point of view it is in fact possible to build an example so that performing an FULL INDEX SCAN will be an efficient option. It suffice just to modify a little bit your example:

create table bowie as select rownum id, cast(to_char(null) as varchar2(10)) text from dual connect by level <= 1000000
/
insert into bowie values (1000001, ‘ZIGGY’)
/
insert into bowie values (1000002, ‘BOWIE’)
/
create index bowie_i on bowie(text)
/
exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE', cascade=> true, estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

So it will be nice if this query use the FULL INDEX SCAN

select * from bowie where text <> ‘BOWIE’
/
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     6 |   446   (3)| 00:00:06 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |     6 |   446   (3)| 00:00:06 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TEXT"'BOWIE')


Statistiques
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1547  consistent gets
          0  physical reads
          0  redo size
        491  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Oh no, it seems the cbo doesn’t consider null being different from BOWIE.
So let help him a little

select * from bowie where text <> ‘BOWIE’ and text is not null
/
-----------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     1 |     6 |     2   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE   |     1 |     6 |     2   (0)|
|*  2 |   INDEX FULL SCAN           | BOWIE_I |     1 |       |     1   (0)|
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TEXT" IS NOT NULL AND "TEXT"'BOWIE')


Statistiques
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        495  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Thanks for your very interesting blog.

Richard Foote - October 16, 2012

Hi Marius

Nice, thanks for your example. I’ve just fixed up the formatting for you so others can follow it a little easier.

Thanks again :)

20. Tonydba - March 12, 2013

what if when using NOT IN with many values like NOT IN(‘A’,’B’,’C’)?How to rewrite it?

Richard Foote - April 30, 2013

Hi Tony

As discussed, if the number of rows of interest are low and you are excluding lots of distinct values, rewrite the query if possible to select the actual values of interest or introduce a new flag column that can identify such rows.

21. orastar - March 31, 2013

It seems BOWIE also has bad performance if the index is Big.
Is there any other way to tell CBO “Relax!!! I did not mean you have to do Full Table Scan ?? “

Richard Foote - April 30, 2013

Hi orastar

Depending on how the index is accessed, the size of the index is of a lesser concern if it can outperform other alternatives. The best way to make the CBO “relax” and for it not used a FTS inappropriately is to give it accurate statistics on which to base its decisions (or to give it the generally uncommon “take it easy, sit back and have a cup of tea” hint).


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,703 other followers

%d bloggers like this: