jump to navigation

Bitmap Indexes and Not Equal (Holy Holy) July 5, 2011

Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.
trackback

Way back, I previously discussed how the CBO will simply ignore any possible indexes when determining the best execution plan involving a NOT EQUAL(<>) condition, even if an index might in theory provide the most efficient access path. Oracle just assumes that the vast majority of rows are likely to be returned and so doesn’t even bother to cost and consider any potential indexes. The previous discussion was aimed specifically at B-Tree indexes, but as a comment at the time by Christian Antognini highlighted, things are a little different for Bitmap indexes. Thought it might be worth exploring this point a little further.

To start and to recap, I’ll begin by creating a simple little table, populated with 1,000,000 rows. It has 2 columns of interest for now, one called CODE which has just 5 distinct values and another called FLAG that only has the 1 distinct value (a value of ’42′ naturally):

SQL> create table radiohead (code number not null, type number not null, flag number not null, name varchar2(30));

 Table created.

 SQL> insert into radiohead select mod(rownum,5)+1, mod(rownum,20)+1, 42, 'ZIGGY' from dual connect by level <= 1000000;

 1000000 rows created.

 SQL> commit;

 Commit complete. 

 I’ll begin by creating standard B-Tree indexes on these columns:

SQL> create index radiohead_code_i on radiohead(code);

 Index created.

 SQL> create index radiohead_type_i on radiohead(type);

 Index created.

 SQL> create index radiohead_flag_i on radiohead(flag);

 Index created.

 SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

 PL/SQL procedure successfully completed. 

If we run a query that returns all rows that don’t have a FLAG value of 42 (of which there are none):

SQL> select * from radiohead where flag <> 42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Note that although the CBO has estimated it will likely only return just the 1 row, it has opted to go for a Full Table Scan. A 10053 trace would show that the index on the FLAG column wasn’t even considered by the CBO. The use of the Not Equal (<>) condition has totally negated the use of the available index.

If we look at a query now on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

As there are only 5 evenly distributed values, the CBO has got the cardinality estimate spot on and has decided that visiting the table 200,000 times via the index is just too expensive and that the Full Table Scan is the more efficient method. Fair enough.

If we now run a query that looks for all values of a specific CODE but only if the FLAG is not 42 (which again is going to return 0 rows):


SQL> alter session set events '10053 trace name context forever';

Session altered.

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"<>42 AND "CODE"=1)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 
Again, the Full Table Scan is the way to go says the CBO. The index on the FLAG column is not considered and the index on the CODE column is just too expensive. A 10053 trace confirms this:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: RADIOHEAD  Alias: RADIOHEAD
    #Rows: 1000000  #Blks:  2764  AvgRowLen:  15.00
Index Stats::
  Index: RADIOHEAD_CODE_I  Col#: 1
    LVLS: 2  #LB: 1950  #DK: 5  LB/K: 390.00  DB/K: 2755.00  CLUF: 13775.00
  Index: RADIOHEAD_FLAG_I  Col#: 3
    LVLS: 2  #LB: 1950  #DK: 1  LB/K: 1950.00  DB/K: 2755.00  CLUF: 2755.00
  Index: RADIOHEAD_TYPE_I  Col#: 2
    LVLS: 2  #LB: 1950  #DK: 20  LB/K: 97.00  DB/K: 2755.00  CLUF: 55100.00
Access path analysis for RADIOHEAD
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for RADIOHEAD[RADIOHEAD]

  Table: RADIOHEAD  Alias: RADIOHEAD
    Card: Original: 1000000.000000  Rounded: 1  Computed: 0.20  Non Adjusted: 0.20
  Access Path: TableScan
    Cost:  762.05  Resp: 762.05  Degree: 0
      Cost_io: 750.00  Cost_cpu: 259683730
      Resp_io: 750.00  Resp_cpu: 259683730
  Access Path: index (AllEqRange)
    Index: RADIOHEAD_CODE_I
    resc_io: 3147.00  resc_cpu: 114411172
    ix_sel: 0.200000  ix_sel_with_filters: 0.200000 
    Cost: 3152.31  Resp: 3152.31  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 762.05  Degree: 1  Resp: 762.05  Card: 0.20  Bytes: 0

***************************************

Note that the index on the FLAG column is not even mentioned within the possible execution plans and the index on the CODE column has a cost of 3152.31 which is way more than the Full Table Scan cost of 762. So the Full Table Scan is selected, even though no rows are returned and the CBO estimates that just 1 row is likely to be returned. OK, let’s now drop the B-Tree indexes and replace them with Bitmap indexes:


SQL> drop index radiohead_code_i;

Index dropped.

SQL> drop index radiohead_type_i;

Index dropped.

SQL> drop index radiohead_flag_i;

Index dropped.

SQL> create bitmap index radiohead_code_i on radiohead(code);

Index created.

SQL> create bitmap index radiohead_type_i on radiohead(type);

Index created.

SQL> create bitmap index radiohead_flag_i on radiohead(flag);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 
If we now run the same query again on the FLAG column:

SQL> select * from radiohead where flag <> 42;

 no rows selected

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

 ------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    15 |   762   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |     1 |    15 |   762   (2)| 00:00:10 |
-------------------------------------------------------------------------------

 Predicate Information (identified by operation id):
---------------------------------------------------

    1 - filter("FLAG"<>42)

 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2770  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 

We notice that again the index is not chosen, even though the bitmap index stores references to those rowids where this condition is not true (a bitmap value of 0) and even though the CBO estimates only the 1 row is likely to be returned. To see why this is the case, let’s look at a partial bitmap index entry via a block dump of the bitmap index:

 
Block header dump:  0x01c01d1c
 Object id on Block? Y
 seg/obj: 0x13e38  csc: 0×00.1234e2a  itc: 2  flg: E  typ: 2 – INDEX
     brn: 0  bdba: 0x1c01d18 ver: 0×01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0×01   0×0000.000.00000000  0×00000000.0000.00  —-    0  fsc 0×0000.00000000
0×02   0xffff.000.00000000  0×00000000.0000.00  C—    0  scn 0×0000.01234e2a
Leaf block dump
===============
header address 214311524=0xcc62264
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0×28
kdxcofeo 960=0x3c0
kdxcoavs 920
kdxlespl 0
kdxlende 0
kdxlenxt 29367581=0x1c01d1d
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——, lock: 0, len=3537
col 0; len 2; (2):  c1 2b
col 1; len 6; (6):  01 40 2c 82 00 00
col 2; len 6; (6):  01 40 2c c4 00 7f
col 3; len 3516; (3516):
 cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff
 ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff ff ff ff
 ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff
 ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff
 ff ff ff ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf
 …

We notice the bitmap index entry consists of the indexed value (c1 2b), a start rowid (01 40 2c 82 00 00), an end rowid (01 40 2c c4 00 7f) and a bitmap string for which a bit corresponds to every row within the rowid range, set to either 1 (for true) or 0 (for false). The 0s are compressed and represented by a value based on the actual number of compressed bits.

However, if the bitmap entry only has a start and end rowid range, how does it actually know the location of all the corresponding rows, as there could be differing number of rows for any of the given data blocks. How does it know just how many rows actually exist within the rowid range ?

The answer is that it can’t possibly know. Therefore, Oracle makes a very important assumption and based on the definition of the table, determines the maximum number of rows that could potentially fit within a block and assigns a bit for every possible rowid that could in theory exist within the specified rowid range (I’ll expand on this point in my next post).

If the rowid actually corresponds to an existing row, then the bit is set accordingly depending on the value of the indexed column for that row. If the rowid doesn’t exist (or doesn’t exist yet), then the corresponding bit is simply set to a 0. If there are a whole bunch of consecutive 0s for rows that don’t exist, they get compressed and the overheads are minimised.

However, the value of a bit set to 0 can therefore potentially mean one of two things. It could mean that the row exists but doesn’t have the value represented by the index entry or it could mean that the row simply doesn’t exist at all. There is no way for Oracle to tell the difference between these two scenarios.

If one is after rows for which the column has a specific value, then no problem, all the bits with a value of 1 must correspond to rows that really do exist and have the column value of interest. However, if one is after all rows for which the column value is not the one represented by a bitmap index entry (as in a <> condition), then referencing all the bits that have a 0 won’t be sufficient as they could potentially point at rows that don’t actually exist and accessing a table looking up rows that don’t exist will open up a can of worms.

Therefore, just like a B-Tree index, the CBO will not consider a Bitmap index for a query that exclusively contains a not equal or not in condition.

If we now look at the second query based on the CODE column:

SQL> select * from radiohead where code = 1;

200000 rows selected.

 
Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   200K|  2929K|   761   (2)| 00:00:10 |
|*  1 |  TABLE ACCESS FULL| RADIOHEAD |   200K|  2929K|   761   (2)| 00:00:10 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=1)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2809  consistent gets
          0  physical reads
          0  redo size
    1602034  bytes sent via SQL*Net to client
        824  bytes received via SQL*Net from client
         41  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     200000  rows processed

 

We notice that the CBO again chooses the Full Table Scan as again, the query is returning 20% of all rows and deems it too expensive to visit the table 200,000 times to retrieve the data via the index, even if the Bitmap index structure itself is relatively small and efficient. So again, no difference to the B-Tree index example.

However, if we run the third query based on both the CODE column and the <> condition on the FLAG column:

SQL> select * from radiohead where code = 1 and flag <> 42;

no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 1712231689

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes |  Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |     1 |    15 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | RADIOHEAD        |     1 |    15 |     3   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS |                  |       |       |            |          |
|   3 |    BITMAP MINUS               |                  |       |       |            |          |
|   4 |     BITMAP MINUS              |                  |       |       |            |          |
|*  5 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I |       |       |            |          |
|*  6 |      BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I |       |       |            |          |
|*  7 |     BITMAP INDEX SINGLE VALUE | RADIOHEAD_FLAG_I |       |       |            |          |
--------------------------------------------------------------------------------------------------

 
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("CODE"=1)
   6 - access("FLAG" IS NULL)
   7 - access("FLAG"=42)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        384  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

Now we see a distinct difference from the B-Tree index example as both Bitmap Indexes have been used rather than a Full Table Scan.

In conjunction with another index that returns rowids of interest that obviously exist, a bitmap index can be used successfully to determine a Not Equal condition. By logically subtracting all the matching rowids of one bitmap index (that contains rowids than aren’t of interest) from the other bitmap index (which contains rowids that are of interest), a list of actual rowids of interest can be determined to access the table. Note this can also potentially be performed by looking up the 0 bits, as corresponding rows do not have the indexed value and any matching rowids can be proven to exist by their appearance within the other Bitmap index.

As most of this processing only involves simple bit comparisons via accesses to relatively small, efficient Bitmap index structures, the relative overheads can be significantly reduced from that of the Full Table Scan (eg. in this example, consistent gets reduced from 2770 to just 47).

So a Not Equal/Not In can be serviced via a Bitmap Index, providing another index is also accessed that returns rowids of interest.

About these ads

Comments»

1. Hemant K Chitale - July 6, 2011

Beautiful walkthrough.

It starts with answering the question “Why doesn’t the optimizer know that this query will not return any rows ?” which we are asked many times. The answer “The statistics don’t confirm absence of the value” still doesn’t get through to many people.

The final example with Bitmap Minus operations opens up possibilities.

Thanks (I just hope that I can remember this when I come across a need for it !)

Richard Foote - July 7, 2011

Hi Hemant

Thanks for the nice comments, much appreciated :)

2. mdinh - July 7, 2011

I am assuming this only works because you know the data and there is only 1 distinct value for FLAG?

Richard Foote - July 7, 2011

No that’s not correct. Great question though but it needed a new post to answer it properly.

3. SJ12345 - July 7, 2011

Great explanation. Curious to know if you tried using flag > 42 and flag < 42 separately.

Richard Foote - July 7, 2011

Thank-you.

See my followup post where you can see for yourself :)

4. Bitmap Indexes and Not Equal Part II (Sheep) « Richard Foote’s Oracle Blog - July 7, 2011

[...] excellent comment/question by mdinh made me realise my demos in Part I might be a little extreme in returning 0 rows and perhaps give [...]

5. Roelof - July 13, 2011

when i saw the explain plan for the your last statement I was wondering why oracle would do an extra index scan (6 – access(“FLAG” IS NULL)) on a column that you defined earlier as not null. Did you by any chance remove the not null constraint? because that is the only way for me to get the same explain plan. you got. testes both on 10.2.0.4 and 11.2.0.2.

Thanks for the good explanation.

Richard Foote - July 19, 2011

Hi Roelof

Yes, I probably did remove it !! Thanks for the heads-up :)

6. Bitmap Indexes & Minimize Records_Per_Block (Little Wonder) « Richard Foote’s Oracle Blog - July 19, 2011

[...] mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids [...]

7. Rick - February 10, 2012

RE Bitmap Indexes and Not Equal (Holy Holy) – and the paragraph

However, the value of a bit set to 0 can therefore potentially mean one of two things. It could mean that the row exists but doesn’t have the value represented by the index entry or it could mean that the row simply doesn’t exist at all. There is no way for Oracle to tell the difference between these two scenarios.

I concluded (incorrectly apparently) that this meant that Oracle could not use a sole bitmap index (on radiohead like the first example) to perform a count(*) since some bits = 0 may be for a non-existent row.

So I made that comment (https://forums.oracle.com/forums/thread.jspa?threadID=2342710&messageID=10137862#10137862) And Billy Verreynne promptly showed an explain plan that appears to only use the index.

How does this correlate to your paragraph?

Richard Foote - February 10, 2012

Hi Rick

Oracle can use a bitmap index to perform a count(*) (and very efficiently as well as most bitmap indexes are relatively small) as it only needs to count the ’1′s in the bitmap string (not 0′s) to do so. And as null values are also included in the bitmap index, you don’t even need the column to have a not null constraint.

The point of this article is that you can’t use the bitmap index to do a count of a predicate (without another condition), as indeed the ’0s’ do not necessarily indicate the presence of a row, whereas all 1 values do.

8. Rick - February 10, 2012

I think the OP and I just figured it out. The BITMAP CONVERSION COUNT reference in the execution plan must use ALL of the index entries. Since every existing row is in the index somewhere (even if it is NULL) Oracle can OR them all together to determine which rows exist and count them.

If nothing else you made us think. And that has to be a good thing!

Richard Foote - February 10, 2012

Hi Rick

Indeed, the Bitmap Conversion Count reference tells you the CBO has used the bitmap index to perform a count operation. Remember, it’s the 1 values in the bitmap index that are critical.

Thinking is never a bad thing :)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,808 other followers

%d bloggers like this: