InSync11 Conference Fast Approaching (Khe Sanh) July 27, 2011
Posted by Richard Foote in InSync11.3 comments
Just a short note to remind everyone that the excellent InSync11 Conference to be held this year at the Sydney Convention Centre on 16-17 August 2011 is but a few weeks away. With a great lineup of experts such as Tom Kyte, Tim Hall, Graham Wood, Chris Muir, Connor McDonald, Tony Jambu, Marcelle Kratochvil to name but a very few (of my mates), it should be a fantastic event for anyone interested in Oracle Technology.
I’ll be presenting a little something on “10 Things You possibly Don’t Know About Oracle Indexes” so I hope to catch up with some of you there 🙂
PS: Don’t let the picture of the folks on the InSync11 website frontpage put you off from attending (can you spot me, the odd one out !!)
Bitmap Indexes & Minimize Records_Per_Block (Little Wonder) July 19, 2011
Posted by Richard Foote in Bitmap Indexes, MINIMIZE RECORDS_PER_BLOCK, Oracle Indexes.Tags: Minimize Records_Per_Block
8 comments
As mentioned in my first post regarding the use of Bitmap Indexes to service Not Equal predicates, an index entry in a Bitmap Index consists of the indexed value and a pair of rowids that determine the range of potential rows spanned by the corresponding bitmap sequence. However, Oracle has no way of determining how many rows might actually be in each specific data block and so must make an assumption that all blocks might hold the maximum number of rows that could potentially fit within a block and assign a bitmap bit accordingly. If a row doesn’t actually exist, then it’s simply a “phantom” row and is assigned a 0 to signify that it doesn’t contain the value of the index entry.
This maximum number of possible rows that could potentially fit in a block is called the “Hakan Factor” and is determined at the creation of the table based on the definition of the table (such as number of columns, type of columns, whether they’re nullable, etc.) and of course the block size. The smaller the possible size of the row, the more rows that could fit in a block and the more bits that need to be assigned by the Bitmap Index to cover all possible rowids within the rowid range within a Bitmap Index entry. As an example within an 8K block, taking into consideration block overheads, the maximum number of rows within a block that Oracle can potentially estimate can be as many as 736 rows.
These additional 0s that get assigned to cater for rows that might not actually exist, although compressed to some degree, still takes up some space within the index. This additional space can be very significant if:
– The difference between the minimum possible size of a row and the actual average size of a row is large (or to put it another way, if the difference between the estimated number of rows per blocks and the actual number of rows per block is large)
– The effective clustering of the indexed data is poor within the table as this will limit the effective compression of the additional 0 bits
To highlight how all this can make a significant difference to the size of a Bitmap Index, a simple demo as usual to illustrate.
First, I’m going to create a table that has a number of nullable VARCHAR2(100) fields, so they might contain up to 100 characters or perhaps no value at all. The potential size of a row might be tiny or it might be quite large.
SQL> create table muse (id number, code number, name1 varchar2(100), name2 varchar2(100), name3 varchar2(100), name4 varchar2(100), name5 varchar2(100), name6 varchar2(100), name7 varchar2(100), name8 varchar2(100), name9 varchar2(100), name10 varchar2(100)); Table created.
OK, time to populate the table. A couple of key points with the data I’m going to use.
Firstly, the CODE column is going to have 100 distinct values but these values will be evenly distributed throughout the entire table. So the clustering associated with this column will be terrible, as bad as it gets.
Secondly, although all the VARCHAR2(100) columns might not contain much data (or indeed any at all), in actual fact they’re going to be almost fully populated with data. So although the potential average size of a row could have been quite tiny, in actual fact all the rows are quite large. Although we could potentially have fitted many rows within our (8K) block, in actual fact we’re only going to be able to fit just 7 rows per block. There isn’t actually a single block within our table that contains more than 7 rows.
SQL> insert into muse select rownum, mod(rownum,100), 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia','Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia', 'Black Holes and Revelations is a really good album featuring my favourite track Knights Of Cydonia' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'MUSE', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, avg_row_len from dba_tables where table_name='MUSE'; TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN ---------- ---------- ---------- ----------- MUSE 1000000 145549 998
Let’s now create a Bitmap Index on the CODE column. I’ll set the PCTFREE to 0 to build the smallest possible index structure:
SQL> create bitmap index muse_code_i on muse(code) pctfree 0; Index created. SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I'; INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY NUM_ROWS ----------- ----------- ----------------------- ---------- MUSE_CODE_I 400 4 800
So the Bitmap Index currently consists of 400 leaf blocks.
As we now know this table has rows that on average are considerably larger than the minimum possible row size, the Bitmap Index has had to cater for the possible existence of many rows that don’t actually exist. Additionally, as the clustering of the indexed data is very poor, the Bitmap Index will not be able to effectively compress these additional 0 bits as much as it might, as there are bits set to 1 littered all over the place that will hamper the effective compression capabilities of the index (I’ve discuss the impact of the Clustering Factor on the effectiveness of Bitmap Index compression previously).
Therefore, it might well be beneficial to more accurately determine the number of rows that really exist within a block. We can change the Hakan Factor by altering the table with the MINIMIZE RECORDS_PER_BLOCK clause. Effectively this results in Oracle performing a full table scan, checking for the number of rows per block (a quick check of the nrow count in the block header suffices) and keeping track of the block that currently contains the most number of rows. The highest value of the nrow count within the table becomes the new Hakan Factor.
Let’s give it a go:
SQL> alter table muse minimize records_per_block; alter table muse minimize records_per_block * ERROR at line 1: ORA-28602: statement not permitted on tables containing bitmap indexes
Unfortunately, this statement is not permitted if there are already any Bitmap indexes assigned to the table as they have already been based on the current Hakan Factor. All current Bitmap Indexes assigned to the table must first be dropped.
SQL> drop index muse_code_i; Index dropped. SQL> alter table muse minimize records_per_block; Table altered.
OK, so now Oracle has a much more accurate picture of the actual number of rows that exist within a block in this table. The new Hakan Factor is based on the maximum number of rows that actually currently exist within a block in the table (just 7 in this specific example), which is significantly less than was defined previously. Oracle ensures the integrity of the new Hakan Factor from here on in by now limiting the number of rows that can be inserted into blocks within the table to this new value, even if in the future additional rows could potentially have fitted within a block. Once the Hakan Factor is reached, the block is taken off the freelist or marked as full in an ASSM segment.
Now any Bitmap Index on this table only has to cater for a relatively small number of rows per block, vastly reducing the number of bits that need to be considered and stored.
This can significantly reduce the overall size of associated bitmap indexes:
SQL> create bitmap index muse_code_i on muse(code) pctfree 0; Index created. SQL> select index_name, leaf_blocks, avg_leaf_blocks_per_key, num_rows from dba_indexes where index_name = 'MUSE_CODE_I'; INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY NUM_ROWS ----------- ----------- ----------------------- ---------- MUSE_CODE_I 150 1 300
The new Bitmap Index is now only 150 leaf blocks in size, substantially smaller than the previous 400 leaf blocks.
Bitmap Indexes and Not Equal Part II (Sheep) July 7, 2011
Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.11 comments
An excellent comment/question by mdinh made me realise my demos in Part I might be a little extreme in returning 0 rows and perhaps give the false impression that Not Equal conditions are only considered or applicable if no rows are returned. This is not the case and with the bitmap index now considered with Not Equal conditions, the choice of whether or not to actually use the index as usual comes down to the comparative costs associated with the available plans.
So, I’ll expand on my demo a tab by introducing a new value for the FLAG column:
SQL> update radiohead 2 set flag = 1 3 where rownum < 101; 100 rows updated. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD',cascade=>true, estimate_percent=>null, method_opt=> 'FOR COLUMNS FLAG SIZE 5'); PL/SQL procedure successfully completed.
OK, so now we have some 100 rows which have a value of FLAG which are not equal to 42, which are evenly distributed among all 5 CODE values. I’ve created a histogram however on the FLAG column as the 2 values (1 and 42) are not evenly distributed.
Let’s run the query now:
SQL> select * from radiohead where code = 1 and flag <> 42; 20 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2786215024 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 300 | 46 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 20 | 300 | 46 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | | 3 | BITMAP MINUS | | | | | | |* 4 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | |* 5 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I | | | | | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("CODE"=1) 5 - access("FLAG"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 47 consistent gets 0 physical reads 0 redo size 775 bytes sent via SQL*Net to client 395 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20 rows processed
We notice a few key points. Firstly, as we have a histogram on the FLAG column and the data is perfectly evenly distributed among the CODE values, the CBO has got the estimated cardinality of 20 rows spot on. So all things being equal, we can have some confidence the CBO has done the right thing and selected the most efficient execution plan.
We also notice that the cost has now gone up considerably to 46 (up from 3) but it’s still significantly less than the cost of 761 associated with a Full Table Scan. Therefore, the CBO has still chosen the same execution plan with the two bitmap indexes returning the 20 rows, as it did when it returned none in the previous example.
In answer to another comment/question by SJ12345, regarding the use of unbounded predicates, if we now try the following:
SQL> select * from radiohead where code = 1 and flag > 42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2939001425 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 150 | 6 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 10 | 150 | 6 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | | 3 | BITMAP AND | | | | | | | 4 | BITMAP MERGE | | | | | | |* 5 | BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I | | | | | |* 6 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("FLAG">42) filter("FLAG">42) 6 - access("CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2 consistent gets 0 physical reads 0 redo size 435 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Yep, cheap with Bitmap Indexes says the CBO. Note the difference here though is that the CBO uses a BITMAP MERGE to first get all possible rowid values of FLAG that are > 42 and then uses a BITMAP AND operation in combination with the CODE Bitmap index to get all rowids that match from both Bitmap indexes. However, as it evaluates the Bitmap Index on the FLAG index first and there are no index entries with a value > 42, it doesn’t have to actually worry about the CODE condition as no rows can possibly be returned. Therefore a very tiny 2 consistent gets are all that are necessary.
The following will looking for anything < than 42, remembering we now have 20 rows that meet this condition:
SQL> select * from radiohead where code = 1 and flag < 42; 20 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2939001425 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 300 | 8 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 20 | 300 | 8 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS| | | | | | | 3 | BITMAP AND | | | | | | | 4 | BITMAP MERGE | | | | | | |* 5 | BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I | | | | | |* 6 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("FLAG"<42) filter("FLAG"<42) 6 - access("CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 775 bytes sent via SQL*Net to client 395 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20 rows processed
Yep, also cheap with Bitmap Indexes, using the same plan as the previous > than example but using more consistent gets as there are a number of rows that need to be accessed this time (although all in the same data block).
To now complete the picture:
SQL> select * from radiohead where code = 1 and (flag < 42 or flag > 42); 20 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3720408756 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 30 | 450 | 14 (0)| 00:00:01 | | 1 | CONCATENATION | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 10 | 150 | 6 (0)| 00:00:01 | | 3 | BITMAP CONVERSION TO ROWIDS| | | | | | | 4 | BITMAP AND | | | | | | | 5 | BITMAP MERGE | | | | | | |* 6 | BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I | | | | | |* 7 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | | 8 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 20 | 300 | 8 (0)| 00:00:01 | | 9 | BITMAP CONVERSION TO ROWIDS| | | | | | | 10 | BITMAP AND | | | | | | | 11 | BITMAP MERGE | | | | | | |* 12 | BITMAP INDEX RANGE SCAN | RADIOHEAD_FLAG_I | | | | | |* 13 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - access("FLAG">42) filter("FLAG">42) 7 - access("CODE"=1) 12 - access("FLAG"<42) filter(LNNVL("FLAG">42) AND "FLAG"<42) 13 - access("CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 775 bytes sent via SQL*Net to client 395 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 20 rows processed
We now get a combination of both previous plans, concatenated together. Note in this case, it’s actually a cheaper and more efficient alternative to the first Not Equal example. Having got the setup for this demo, you can of course create the same demo yourselves and have a play and experiment. Me, I’m now off to watch Cadel Evans win the Tour De France 🙂
Bitmap Indexes and Not Equal (Holy Holy) July 5, 2011
Posted by Richard Foote in Bitmap Indexes, NOT Equal, Oracle Indexes.14 comments
Way back, I previously discussed how the CBO will simply ignore any possible indexes when determining the best execution plan involving a NOT EQUAL(<>) condition, even if an index might in theory provide the most efficient access path. Oracle just assumes that the vast majority of rows are likely to be returned and so doesn’t even bother to cost and consider any potential indexes. The previous discussion was aimed specifically at B-Tree indexes, but as a comment at the time by Christian Antognini highlighted, things are a little different for Bitmap indexes. Thought it might be worth exploring this point a little further.
To start and to recap, I’ll begin by creating a simple little table, populated with 1,000,000 rows. It has 2 columns of interest for now, one called CODE which has just 5 distinct values and another called FLAG that only has the 1 distinct value (a value of ’42’ naturally):
SQL> create table radiohead (code number not null, type number not null, flag number not null, name varchar2(30)); Table created. SQL> insert into radiohead select mod(rownum,5)+1, mod(rownum,20)+1, 42, 'ZIGGY' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit; Commit complete.
I’ll begin by creating standard B-Tree indexes on these columns:
SQL> create index radiohead_code_i on radiohead(code); Index created. SQL> create index radiohead_type_i on radiohead(type); Index created. SQL> create index radiohead_flag_i on radiohead(flag); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we run a query that returns all rows that don’t have a FLAG value of 42 (of which there are none):
SQL> select * from radiohead where flag <> 42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2516349655 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 762 (2)| 00:00:10 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 1 | 15 | 762 (2)| 00:00:10 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("FLAG"<>42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2770 consistent gets 0 physical reads 0 redo size 435 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Note that although the CBO has estimated it will likely only return just the 1 row, it has opted to go for a Full Table Scan. A 10053 trace would show that the index on the FLAG column wasn’t even considered by the CBO. The use of the Not Equal (<>) condition has totally negated the use of the available index.
If we look at a query now on the CODE column:
SQL> select * from radiohead where code = 1; 200000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2516349655 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200K| 2929K| 761 (2)| 00:00:10 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 200K| 2929K| 761 (2)| 00:00:10 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=1) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2809 consistent gets 0 physical reads 0 redo size 1602034 bytes sent via SQL*Net to client 824 bytes received via SQL*Net from client 41 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 200000 rows processed
As there are only 5 evenly distributed values, the CBO has got the cardinality estimate spot on and has decided that visiting the table 200,000 times via the index is just too expensive and that the Full Table Scan is the more efficient method. Fair enough.
If we now run a query that looks for all values of a specific CODE but only if the FLAG is not 42 (which again is going to return 0 rows):
SQL> alter session set events '10053 trace name context forever'; Session altered. SQL> select * from radiohead where code = 1 and flag <> 42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2516349655 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 762 (2)| 00:00:10 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 1 | 15 | 762 (2)| 00:00:10 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("FLAG"<>42 AND "CODE"=1) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2770 consistent gets 0 physical reads 0 redo size 435 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Again, the Full Table Scan is the way to go says the CBO. The index on the FLAG column is not considered and the index on the CODE column is just too expensive. A 10053 trace confirms this:
***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
Table: RADIOHEAD Alias: RADIOHEAD
#Rows: 1000000 #Blks: 2764 AvgRowLen: 15.00
Index Stats::
Index: RADIOHEAD_CODE_I Col#: 1
LVLS: 2 #LB: 1950 #DK: 5 LB/K: 390.00 DB/K: 2755.00 CLUF: 13775.00
Index: RADIOHEAD_FLAG_I Col#: 3
LVLS: 2 #LB: 1950 #DK: 1 LB/K: 1950.00 DB/K: 2755.00 CLUF: 2755.00
Index: RADIOHEAD_TYPE_I Col#: 2
LVLS: 2 #LB: 1950 #DK: 20 LB/K: 97.00 DB/K: 2755.00 CLUF: 55100.00
Access path analysis for RADIOHEAD
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for RADIOHEAD[RADIOHEAD]
Table: RADIOHEAD Alias: RADIOHEAD
Card: Original: 1000000.000000 Rounded: 1 Computed: 0.20 Non Adjusted: 0.20
Access Path: TableScan
Cost: 762.05 Resp: 762.05 Degree: 0
Cost_io: 750.00 Cost_cpu: 259683730
Resp_io: 750.00 Resp_cpu: 259683730
Access Path: index (AllEqRange)
Index: RADIOHEAD_CODE_I
resc_io: 3147.00 resc_cpu: 114411172
ix_sel: 0.200000 ix_sel_with_filters: 0.200000
Cost: 3152.31 Resp: 3152.31 Degree: 1
Best:: AccessPath: TableScan
Cost: 762.05 Degree: 1 Resp: 762.05 Card: 0.20 Bytes: 0
***************************************
Note that the index on the FLAG column is not even mentioned within the possible execution plans and the index on the CODE column has a cost of 3152.31 which is way more than the Full Table Scan cost of 762. So the Full Table Scan is selected, even though no rows are returned and the CBO estimates that just 1 row is likely to be returned. OK, let’s now drop the B-Tree indexes and replace them with Bitmap indexes:
SQL> drop index radiohead_code_i; Index dropped. SQL> drop index radiohead_type_i; Index dropped. SQL> drop index radiohead_flag_i; Index dropped. SQL> create bitmap index radiohead_code_i on radiohead(code); Index created. SQL> create bitmap index radiohead_type_i on radiohead(type); Index created. SQL> create bitmap index radiohead_flag_i on radiohead(flag); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'RADIOHEAD', cascade=> true, estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we now run the same query again on the FLAG column:
SQL> select * from radiohead where flag <> 42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2516349655 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 762 (2)| 00:00:10 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 1 | 15 | 762 (2)| 00:00:10 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("FLAG"<>42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2770 consistent gets 0 physical reads 0 redo size 435 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
We notice that again the index is not chosen, even though the bitmap index stores references to those rowids where this condition is not true (a bitmap value of 0) and even though the CBO estimates only the 1 row is likely to be returned. To see why this is the case, let’s look at a partial bitmap index entry via a block dump of the bitmap index:
Block header dump: 0x01c01d1c
Object id on Block? Y
seg/obj: 0x13e38 csc: 0x00.1234e2a itc: 2 flg: E typ: 2 – INDEX
brn: 0 bdba: 0x1c01d18 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 —- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C— 0 scn 0x0000.01234e2a
Leaf block dump
===============
header address 214311524=0xcc62264
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0x28
kdxcofeo 960=0x3c0
kdxcoavs 920
kdxlespl 0
kdxlende 0
kdxlenxt 29367581=0x1c01d1d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[4499] flag: ——, lock: 0, len=3537
col 0; len 2; (2): c1 2b
col 1; len 6; (6): 01 40 2c 82 00 00
col 2; len 6; (6): 01 40 2c c4 00 7f
col 3; len 3516; (3516):
cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff
ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff ff ff ff
ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff
ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cd ff
ff ff ff ff 07 ff 29 ff ff ff ff ff ff ff ff cf ff ff ff ff ff ff ff ff cf
…
We notice the bitmap index entry consists of the indexed value (c1 2b), a start rowid (01 40 2c 82 00 00), an end rowid (01 40 2c c4 00 7f) and a bitmap string for which a bit corresponds to every row within the rowid range, set to either 1 (for true) or 0 (for false). The 0s are compressed and represented by a value based on the actual number of compressed bits.
However, if the bitmap entry only has a start and end rowid range, how does it actually know the location of all the corresponding rows, as there could be differing number of rows for any of the given data blocks. How does it know just how many rows actually exist within the rowid range ?
The answer is that it can’t possibly know. Therefore, Oracle makes a very important assumption and based on the definition of the table, determines the maximum number of rows that could potentially fit within a block and assigns a bit for every possible rowid that could in theory exist within the specified rowid range (I’ll expand on this point in my next post).
If the rowid actually corresponds to an existing row, then the bit is set accordingly depending on the value of the indexed column for that row. If the rowid doesn’t exist (or doesn’t exist yet), then the corresponding bit is simply set to a 0. If there are a whole bunch of consecutive 0s for rows that don’t exist, they get compressed and the overheads are minimised.
However, the value of a bit set to 0 can therefore potentially mean one of two things. It could mean that the row exists but doesn’t have the value represented by the index entry or it could mean that the row simply doesn’t exist at all. There is no way for Oracle to tell the difference between these two scenarios.
If one is after rows for which the column has a specific value, then no problem, all the bits with a value of 1 must correspond to rows that really do exist and have the column value of interest. However, if one is after all rows for which the column value is not the one represented by a bitmap index entry (as in a <> condition), then referencing all the bits that have a 0 won’t be sufficient as they could potentially point at rows that don’t actually exist and accessing a table looking up rows that don’t exist will open up a can of worms.
Therefore, just like a B-Tree index, the CBO will not consider a Bitmap index for a query that exclusively contains a not equal or not in condition.
If we now look at the second query based on the CODE column:
SQL> select * from radiohead where code = 1; 200000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2516349655 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200K| 2929K| 761 (2)| 00:00:10 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 200K| 2929K| 761 (2)| 00:00:10 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=1) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 2809 consistent gets 0 physical reads 0 redo size 1602034 bytes sent via SQL*Net to client 824 bytes received via SQL*Net from client 41 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 200000 rows processed
We notice that the CBO again chooses the Full Table Scan as again, the query is returning 20% of all rows and deems it too expensive to visit the table 200,000 times to retrieve the data via the index, even if the Bitmap index structure itself is relatively small and efficient. So again, no difference to the B-Tree index example.
However, if we run the third query based on both the CODE column and the <> condition on the FLAG column:
SQL> select * from radiohead where code = 1 and flag <> 42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1712231689 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost(%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 15 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | RADIOHEAD | 1 | 15 | 3 (0)| 00:00:01 | | 2 | BITMAP CONVERSION TO ROWIDS | | | | | | | 3 | BITMAP MINUS | | | | | | | 4 | BITMAP MINUS | | | | | | |* 5 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_CODE_I | | | | | |* 6 | BITMAP INDEX SINGLE VALUE| RADIOHEAD_FLAG_I | | | | | |* 7 | BITMAP INDEX SINGLE VALUE | RADIOHEAD_FLAG_I | | | | | -------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("CODE"=1) 6 - access("FLAG" IS NULL) 7 - access("FLAG"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 47 consistent gets 0 physical reads 0 redo size 435 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Now we see a distinct difference from the B-Tree index example as both Bitmap Indexes have been used rather than a Full Table Scan.
In conjunction with another index that returns rowids of interest that obviously exist, a bitmap index can be used successfully to determine a Not Equal condition. By logically subtracting all the matching rowids of one bitmap index (that contains rowids than aren’t of interest) from the other bitmap index (which contains rowids that are of interest), a list of actual rowids of interest can be determined to access the table. Note this can also potentially be performed by looking up the 0 bits, as corresponding rows do not have the indexed value and any matching rowids can be proven to exist by their appearance within the other Bitmap index.
As most of this processing only involves simple bit comparisons via accesses to relatively small, efficient Bitmap index structures, the relative overheads can be significantly reduced from that of the Full Table Scan (eg. in this example, consistent gets reduced from 2770 to just 47).
So a Not Equal/Not In can be serviced via a Bitmap Index, providing another index is also accessed that returns rowids of interest.