jump to navigation

So What Is A Good Cardinality Estimate For A Bitmap Index Column ? (Song 2) April 13, 2010

Posted by Richard Foote in Bitmap Indexes, Non-Unique Indexes, Oracle Indexes, Oracle Myths.
16 comments

As I’ve discussed previously, using a Bitmap index on a unique column makes little sense as the underling index must be larger than a corresponding B-tree index due to the implicit additional overheads associated with Bitmap indexes. As such, Oracle doesn’t permit the use of a Bitmap Index on a declared unique column or to police a unique constraint.  Therefore, some amount of index entry duplication is necessary for a Bitmap index to be considered.

However, an interesting question is how much duplication is actually necessary ? At what point does a Bitmap index have the potential to be equivalent or better than a corresponding B-Tree index ?  The answer will perhaps surprise many, especially those that only consider Bitmap Indexes to be viable for so-called “low cardinality”columns on large tables where there could be many millions of occurrences of each distinct value in a column.
 
If one actually looks at what comprises an index entry in each type of index and understands somewhat how the bitmap column is comprised and effectively compressed, the rough ballpark answer becomes quite easy to determine.
 
Remember, for a non-compressed, non-unique B-Tree index, an index entry comprises:
 
The indexed column or columns (however long the index column values might be)
6 bytes for the rowid (assuming a local index or index on non-partitioned tables)
2 bytes for flag and lock bytes
1 byte for each index column (a minimum of 2 bytes)
 
So that’s 10 bytes plus the size of the actual indexed column for a single column B-Tree index. The key point here however is that there’s an index entry for each and every not null index value.
 
For a Bitmap index, an index entry comprises:
 
The index column(s)
2 x 6 byte rowid
2 bytes for flag and lock bytes
1 byte for each index column (a minimum of 4 bytes)
? bytes for the actual bitmap sequence
 
So the additional overheads comes down to the additional 6 byte rowid and the length of the bitmap column. The key point here though is that there may only need be one index entry (or Bitmap index “piece”) for each distinct indexed value. However, if the number of occurrences of each index value is very low (eg: say single figures), then it’s almost certain only one bitmap index entry (piece) would be necessary for each indexed value.
 
The number of bytes required for the actual bitmap column depends on many factors, including the number of occurrences of each indexed value and the clustering of the data in the table. However again, if the number of occurrences of each index value is very low (eg: say single figures), it means the vast majority of bits are 0 (false) within each bitmap sequence and so can be compressed extremely efficiently. If there are only a handful of 1 (true) bits within a bitmap index entry, the bitmap column is going to be tiny and effectively compressed to only a few bytes.
 
Therefore, the actual additional overheads for a bitmap index with few repeated values is only the 7 byte overhead for the additional rowid and its length and a few bytes for the actual bitmap column. But remember, this single bitmap index entry can cater for all occurrences of the indexed column, whereas the B-Tree index requires an index entry for each and every occurrence of the index column.
 
It doesn’t take much for these additional overheads for each Bitmap index entry to start to cancel out …
 
If we look at an indexed column (say length 4 bytes) that has on average just the one duplicate value:
 
Total for a B-Tree Index would be:
 
4 bytes index column
6 bytes rowid
2 bytes for each index column length byte(remembering the rowid is an index column in a non-unique index)
2 bytes for flag and lock bytes
 
= 14 bytes x 2 (for each index entry as there’s a duplicate value) = 28 bytes in total
 
For a corresponding Bitmap index:
 
4 bytes index column
12 bytes for the two rowids
4 bytes for each index column length byte (remembering the rowids and bitmap sequence are effectively additional indexed columns)
2 bytes for flag and lock bytes
2 bytes is all it takes for the bitmap sequence column (if there’s only 2 actual true bits per index entry)
 
= 24 bytes in total.
 
So we’re already in a position for a Bitmap index to potentially be the smaller and more efficient index type, even when there’s only just one duplicate on average per index column value …
 
If we have another duplicate value (so there are on average 3 occurrences of each index value), then the overheads for such a B-Tree becomes:
 
3 x 14 bytes = 42 bytes
 
but the overheads for the bitmap index only increases by a byte or so for the necessary increase in the bitmap column. So the difference in space between the two index types starts to widen significantly.
 
Obviously, the size of the index column becomes a factor in the potential savings with a Bitmap index as it only has to potentially store the index column once whereas the (non-compressed) B-Tree index needs to store all occurrences of the index value. To illustrate the comparative differences between a B-Tree and a Bitmap Index, I’m going to create various tables with columns that have different levels of cardinality and different clustering attributes for their indexed columns and compare the size differences between B-Tree and Bitmap indexes. The indexed column will simply be a small NUMBER type column to make it just that bit harder for the Bitmap index to be the more efficient.

In the first example, 1/3 of all values are unique while the remaining 2/3 of values have just 2 occurrences of each value. The column is certainly not unique but is arguably “approaching” uniqueness.
Initially, the indexed column is very well clustered within the table (although with a bitmap index, the clustering factor in the index statistics is useless as it simply denotes the number of index entries within the index).
 

SQL> create table bowie as select rownum id, ceil(rownum/1.5) key, 'David Bowie' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> select count(distinct key) from bowie;
 
COUNT(DISTINCTKEY)
------------------
            666667
 
SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I           2129            666667
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create index bowie_btree_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1999              3738

 

Although some may claim such a Bitmap index, one with 666,667 distinct values in a 1 million row table would be thousands of times larger and less efficient than an equivalent B-Tree index, it’s actually quite a close call. The bitmap index is only 124 leaf blocks different or approximately 6.5% larger in size than the B-Tree index.
 
If we create an equivalent table but this time with the clustering of the data all over the place:
 
 


SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BITMAP_I          2246            666667
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BTREE_I           1999            999722

 

We notice that the B-Tree index remains the same size but the Bitmap index is now a little larger by 117 additional leaf blocks. So even with an awful clustering, the Bitmap index is only 247 leaf blocks or approximately 12.3% larger than the B-Tree index. So the B-Tree just wins out in this case, the column is still just that bit too unique for the Bitmap index where we have 666,667 distinct values in a 1 million row table.
 
OK, let’s see how the indexes compare when the index column has on average 1 duplicate for each and every indexed value. There are just 2 values for each and every indexed value, 500,000 distinct values in a 1 million row table: 
 

 
SQL> drop table bowie;

Table dropped.

SQL> create table bowie as select rownum id, ceil(rownum/2) key, 'David Bowie' name from dual connect by level <= 1000000;

Table created.

SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';

INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I           1628            500000

SQL> drop index bowie_bitmap_i;

Index dropped.

SQL> create index bowie_btree_i on bowie(key) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';

INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1998              3728

 

OK, now the Bitmap index is well ahead. On a well clustered column that has 500,000 distinct values in a 1 million row table, the B-Tree index is now larger by 370 leaf blocks or by 22.7%. What if the data is poorly clustered:


SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BITMAP_I          1806            500000
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE2_BTREE_I           1998            999803

 

OK, the Bitmap index is now larger by 178 leaf blocks than it was before, but the equivalent B-Tree index is still larger by 10.6%.
 
Again, this is on a relatively small, extremely poorly clustered indexed column that has 500,000 distinct values in just a 1 million row table. The Bitmap index is smaller and more efficient than the equivalent B-Tree index.
 
If we now use a column that has on average 4 occurrences for each distinct column value (with 250,000 distinct values in a 1 million row table), the differences between a Bitmap and a B-Tree index begin to widen significantly.
 


SQL> drop table bowie;
 
Table dropped.
 
SQL> drop table bowie2;
 
Table dropped.
 
SQL> create table bowie as select rownum id, ceil(rownum/4) key, 'David Bowie' name from dual connect by level <= 1000000;
 
Table created.
 
SQL> create bitmap index bowie_bitmap_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BITMAP_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BITMAP_I            829            250000
 
SQL> drop index bowie_bitmap_i;
 
Index dropped.
 
SQL> create index bowie_btree_i on bowie(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_BTREE_I';
 
INDEX_NAME        LEAF_BLOCKS CLUSTERING_FACTOR
----------------- ----------- -----------------
BOWIE_BTREE_I            1995              3753

 

On a well clustered column with 250,000 distinct values in a 1 million row table, the Bitmap index is less that 1/2 the size of that of an equivalent B-Tree index. If the data were less so clustered:


SQL> create table bowie2 as select * from bowie order by mod(id, 100);
 
Table created.
 
SQL> create bitmap index bowie2_bitmap_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BITMAP_I';
 
INDEX_NAME      LEAF_BLOCKS CLUSTERING_FACTOR
--------------- ----------- -----------------
BOWIE2_BITMAP_I        1145            250000
 
SQL> drop index bowie2_bitmap_i;
 
Index dropped.
 
SQL> create index bowie2_btree_i on bowie2(key) pctfree 0;
 
Index created.
 
SQL> select index_name, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE2_BTREE_I';
 
INDEX_NAME      LEAF_BLOCKS CLUSTERING_FACTOR
--------------- ----------- -----------------
BOWIE2_BTREE_I         1995            999881

 

Even with a really poor clustered index column, the B-Tree index is still some 74% larger than the Bitmap index with some 250,000 distinct values.
 
Despite many claims to the contrary, including the rewrite of the awful Burleson article  that started this series on Bitmap Indexes, where it still claims that “there are rare cases where a bitmap on a column with 10,000 key values might be appropriate” and “in most cases, there will not be a lot of adjacent index values, so it quite rare to see extensive compression“, in reality it’s not that rare at all for a column with “large” numbers of distinct values to be indexed effectively via a Bitmap Index. But, you actually need to understand Bitmap Indexes to appreciated this fact and have at least some understanding on how Oracle stores and compresses the bitmap column. The Burleson description of Bitmap index compression in the above article is totally wrong and so hence are its overall conclusions. 

Here’s an actual, “real life” example. On a 2.2 million row table with a column on people last names, (name columns are often considered way too distinct for Bitmap indexes), there were approximately 6.5 occurrences of each last name on average over the whole table. The most compact B-Tree index on the Last Name column required 5286 leaf blocks but the equivalent Bitmap index only required 2151 leaf blocks, way less than 1/2 the size, even though the clustering factor was terrible at nearly 2 million.
 
As a rough rule, any column that has an on average of just 1 duplicate per distinct column value (just 2 occurrences per distinct value) is a potential candidate for a bitmap index.
 
Bitmap indexes should only be considered in Data Warehouse, low concurrency DML type environments due to their locking implications and certainly pre 10g, Bitmap indexes had growth issues after significant DML changes. However it’s a complete nonsense to suggest that Bitmap indexes should only be considered with columns with “few” distinct values, else things will run 100s of times slower.
 
500,000 distinct values in a 1 million row table is not really that “few” at all is it …

Indexes On Small Tables Part III (Another Brick In The Wall Part III) April 29, 2009

Posted by Richard Foote in Non-Unique Indexes, Oracle Indexes, Small Indexes.
11 comments

So far, in Part I and Part II of this little series, we’ve looked at how Oracle performs a Full Table Scan (FTS) when accessing a small table. With very small tables, Oracle needs to still access the table segment header and perform a number of fetch operations, even if the table is as small as you can get with all the rows in the table residing in one table block and even if we’re only interested in accessing the one row. In the little example we went through, Oracle still needs to access 2 distinct table blocks and perform 4 consistent get operations to read this one row of interest.

OK, let’s now finally introduce an index into the discussion.  To begin with, we’ll just create a standard, Non-Unique index on the ID column.

SQL> create index small_id_i on small(id);

Index created.

Now surely, there would be no real benefit in creating such an index on such a tiny table. This would be a separate index segment from the existing table segment so that if Oracle were to ever use the index to retrieve rows, it would need to visit both the index and table segments.

How can that possibly be more efficient than reading directly from the only table block in the table segment that contains rows ?

Well, let’s see what happens:

SQL> select * from small where id = 42;

        ID NAME
---------- -----
        42 BOWIE

--------------------------------------------
|Id|Operation                   |Name      |
--------------------------------------------
| 0|SELECT STATEMENT            |          |
| 1| TABLE ACCESS BY INDEX ROWID|SMALL     |
|*2|  INDEX RANGE SCAN          |SMALL_ID_I|
--------------------------------------------

Statistics
-------------------------------------------
  0  recursive calls
  0  db block gets
  3  consistent gets
  0  physical reads
  0  redo size
465  bytes sent via SQL*Net to client
396  bytes received via SQL*Net from client
  2  SQL*Net roundtrips to/from client
  0  sorts (memory)
  0  sorts (disk)
  1  rows processed

Well, we notice 2 very interesting points.

Firstly, the CBO does indeed decide to use the index, even though the table is so tiny.

Secondly, we notice that the actual number of consistent gets has actually reduced from 4 with the FTS down to just 3.  The index scan does indeed require less consistent gets than the equivalent FTS.

If we look at the consistent get stats via another session before and after the SELECT staement as we did in Part I, we can again confirm only 3 consistent gets were necessary:

SQL> SELECT n.name, s.value FROM v$sesstat s, v$statname n
WHERE s.statistic# =n.statistic# AND s.sid = 134 AND n.name LIKE ‘consistent%’;

NAME                            VALUE
----------------------------- -------
consistent gets                   110
consistent gets - examination      25

And again after the SELECT on the small table:

SQL> SELECT n.name, s.value FROM v$sesstat s, v$statname n
WHERE s.statistic# =n.statistic# AND s.sid = 134 AND n.name LIKE ‘consistent%’;

 
NAME                            VALUE
----------------------------- -------
consistent gets                   113 (+3)
consistent gets - examination      25

The consistent gets indeed increased by 3 rather than 4 but again as in the FTS example, none of these consistent gets were the cheaper “consistent gets – examination” (to be discussed later).

But why did the index scan only require 3 consistent gets ? Again, by flushing the buffer cache and tracing the session, we get the necessary information to answer the question:

SQL> alter system flush buffer_cache;

System altered.

SQL> exec dbms_monitor.session_trace_enable(waits=> true);

PL/SQL procedure successfully completed.

SQL> select * from small where id = 42;

        ID NAME
---------- -----
        42 BOWIE

SQL> exec dbms_monitor.session_trace_disable();

PL/SQL procedure successfully completed.

If we now look at an example trace file:
=====================
PARSING IN CURSOR #2 len=33 dep=0 uid=88 oct=3 lid=88 tim=24452815463 hv=2225904586 ad=’23bdd320′ sqlid=’7pjs08q2at6ya’
select * from small where id = 42
END OF STMT
PARSE #2:c=0,e=7613,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=24452815457
EXEC #2:c=0,e=42,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=24452815593
WAIT #2: nam=’SQL*Net message to client’ ela= 5 driver id=1111838976 #bytes=1 p3=0 obj#=12446 tim=24452815631
WAIT #2: nam=’db file sequential read’ ela= 17035 file#=7 block#=118026 blocks=1 obj#=99587 tim=24452832933
WAIT #2: nam=’db file sequential read’ ela= 5592 file#=7 block#=117898 blocks=1 obj#=99586 tim=24452838643

FETCH #2:c=0,e=23057,p=2,cr=2,cu=0,mis=0,r=1,dep=0,og=1,tim=24452838728
WAIT #2: nam=’SQL*Net message from client’ ela= 394 driver id=1111838976 #bytes=1 p3=0 obj#=99586 tim=24452839197
FETCH #2:c=0,e=22,p=0,cr=1,cu=0,mis=0,r=0,dep=0,og=1,tim=24452839275
STAT #2 id=1 cnt=1 pid=0 pos=1 obj=99586 op=’TABLE ACCESS BY INDEX ROWID SMALL (cr=3 pr=2 pw=2 time=0 us cost=2 size=9 card=1)’
STAT #2 id=2 cnt=1 pid=1 pos=1 obj=99587 op=’INDEX RANGE SCAN SMALL_ID_I (cr=2 pr=1 pw=1 time=0 us cost=1 size=0 card=1)’
WAIT #2: nam=’SQL*Net message to client’ ela= 3 driver id=1111838976 #bytes=1 p3=0 obj#=99586 tim=24452839399

The critical part here is the first “db file sequential read” wait event. By looking at the obj#=99587, we can determine this refers to the SMALL_ID_I index segment. By looking at the associated file#=7 block#=118026, a block dump reveals this to be the index leaf block containing the index row entries. Notice how we do not actually access the index segment header at all prior to accessing the leaf block of the index.

The following “db file sequential read” wait event on obj#=99586 corresponds to the table segment and file#=7 blockblock#=117898 corresponds to the table block containing the rows. So the first FETCH uses 2 consistent reads (cr=2) to return the first (and in this case only) row of interest (r=1) by first reading the index leaf block and then reading the table data block containing the row of interest.

Oracle then performs another FETCH to read the leaf block again to determine whether there are any other rows of interest (cr=1) of which there are none (r=0).

There is no PK or Unique constraint on the ID column in table and the index is Non-Unique so there is no possible way for Oracle to know there is indeed only one row of interest hence why Oracle needs to again visit the leaf block just in case there are multiple rows that have an ID = 42.

So that’s our 3 consistent gets, 1 to read the leaf block, 1 to read the table block and 1 to again read the index block again due to a SQL*PLUS quirk to perform a secondary fetch operation for non-unique scans as the first fetch only checks and fetches for one row.

Now in our specific example, that’s a saving of 1 consistent get and effectively 2 latch gets by using the Non-Unique index vs. performing the FTS on this effectively 1 block table. The savings would of course be more substantial if the small table had more than just the one data block containing rows.

Now you might well say, is it really worth the effort, what’s 1 consistent get in the grand scheme of things. Well, that’s 1 consistent get each and every time such a query needs to access this specific table. And this is just one table, you may have many such lookup tables in your databases. In heavily used applications, such lookup tables can potentially be accessed many millions of times per hour. Now all this can add up to potentially many millions of reduced consistent gets, many millions of reduced latch requests, which all in turn can add up to considerable CPU savings as well as the significantly reduced contention issues, increasing the general scalability of your applications.

And as we’ll see, we can actually do a lot better than saving just 1 consistent get by using an index to access such small tables …

However, the key question at this point is why doesn’t Oracle have to visit the index segment header as it does with the table segment during a FTS and how does Oracle know where to find the index block it needs ?

That will be answered in the next instalment coming soon …

Differences Between Unique and Non-Unique Indexes Part 4.5 (Fix You) March 30, 2009

Posted by Richard Foote in Fragmented Indexes, Index Internals, Non-Unique Indexes, Oracle Indexes, Unique Indexes.
5 comments

In my last post, Part IV in this series, we looked at how a Unique Index can reuse space deleted within the same logical transaction, whereas a Non-Unique Index can not.  Deleted space within a Non-Unique index can only be reused by subsequent transactions.

It’s sometimes important to appreciate this distinction because as discussed in the various OTN and Ask Tom threads mentioned in Part IV, there are times when this can make a significant difference to the manageability and efficiency of the resultant index.

Now, it’s not everyday someone might for example delete all rows in a table and repopulate it again within a single transaction (the TRUNCATE command was of course developed for a reason). However, perhaps an application was developed without your involvement, perhaps a large proportion but not all of the data is being deleted or as someone mentioned on OTN, perhaps the table in question is a Materialized View being fully refreshed within a refresh group. There could therefore be occasions when a single transaction might indeed perform a large delete followed by a similarly sized insert.

In which case, whether an index is defined as Unique or Non-Unique might make a difference …

To begin with, let’s populate a table with 1M rows and create an associated Unique index:

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index bowie_idx on bowie(id);

Index created.

 

Let’s look at the size of  this newly created Unique index:

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2176       2087           0

 

OK, let’s now delete the entire table and repopulate it again, within the same logical transaction

SQL> delete bowie;

1000000 rows deleted.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

 

Let’s look at the size difference for the Unique Index and see how many deleted index entries we have as a result:

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2176       2087           0

 

OK good, the index is actually identical in size and we have no deleted entries, not a one. All the deleted entries as a result of the delete command have been reused by the subsequent insert statement. This means of course that the index is just as efficient now after all this DML activity, as it was when the index was first created.

 

Let’s perform exactly the same demo, but this time with a Non-Unique index and see any differences …

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(id);

Index created.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

 

The first difference we notice is that the Non-Unique index after it has just been created is somewhat larger than the equvalent Unique index (2226 leaf blocks vs. 2087 leaf blocks). This is a direct result of the Non-Unique index having to store an extra byte for the length byte associated with the rowid being an additional index column for each and every one of the 1M index entries.

SQL> delete bowie;

1000000 rows deleted.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      4608       4518     1000000

OK not quite so good, big difference here. Previously, the Unique Index remained unchanged and had no deleted index entries. However, the Non-Unique index is now effectively double the size it was previously and has 1M deleted index entries still within the index structure. Not a one was recycled and reused within the logical transaction.

This index is now potentially problematic, especially if there are going to be no or few subsequent inserts until it next gets refreshed, where the deleted entries can be reused but the current entries may again remain in the index after they’ve been deleted.

Again, it’s important to understand what is going on here so one can take the appropriate adminstration steps. Perhaps it might be better to drop the index and recreate it after the transaction (if permitted). Perhaps the truncate command isn’t such a bad idea after all (if permitted). Perhaps it might be better to police the Unique constraint with a Unique rather than a Non-Unique index after all.

Perhaps, it might be better to not perform the above within a single transaction and issue an intermediate commit after all (if permitted) …

 

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(id);

Index created.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

SQL> delete bowie;

1000000 rows deleted.

Because if we just issue the commit at this point in the process …

SQL> commit;

Commit complete.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

 

We would not have this problem as the subsequent transaction that performs the insert can reused all the deleted space associated with the first delete transaction. 

If one understands how indexes work and understands how deleted space can be reused, one can prevent many potential issues and unnecessary maintenance tasks.

Prevention is always the best cure …

Differences Between Unique and Non-Unique Indexes Part IV (Take It Back) March 25, 2009

Posted by Richard Foote in Index Internals, Non-Unique Indexes, Oracle Indexes, Unique Indexes.
11 comments

I’ve previously discussed various differences between Unique and Non-Unique indexes (Part I, Part II and Part III) and why I have a preference to implement Unique indexes whenever possible and practical.

Various recent discussions on the OTN forums and on Ask Tom reminded me that I hadn’t yet discussed on this blog another subtle, but potentially significant difference between Unique and Non-Unique indexes.

As I’ve previously discussed, there’s actually no such thing as a Non-Unique index entry as such as Oracle ensures all index entries are effectively unique by adding the rowid to the index key for all Non-Unique indexes. Fundamentally, this is essential because Oracle needs some way of efficiently finding the precise index entry associated with an update or delete operation. Without having the rowid as part of the index key, Oracle would be forced to navigate to the first occurrence of an index value and search through all occurrences of the index value until it finds the specific entry containing the rowid of interest. This could potentially result in visiting many leaf blocks if the index value spans multiple leaf blocks. By including the rowid as the last index key column, non-unique index values are further ordered based on the corresponding rowid within the same indexed values. Oracle can therefore always navigate directly to the leaf block containing the exact index entry of interest as the rowid can be included in the branch blocks to determine both the index entry and rowid ranges found in specific leaf blocks.

If we look at a simple example by creating a one row table with a Non-unique index:

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie values (1, ‘BOWIE’);

1 row created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(id);

Index created.

SQL> select header_file, header_block from dba_segments where segment_name = ‘BOWIE_IDX’;

HEADER_FILE HEADER_BLOCK
----------- ------------
          7       124937

Let’s dump the index block …

SQL> alter system dump datafile 7 block 124938;

System altered.
Leaf block dump
===============
header address 425713756=0x195fe05c
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0×26
kdxcofeo 8024=0x1f58
kdxcoavs 7986
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——, lock: 0, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c1 e7 8a 00 00

Notice how the rowid is an additional index column within the index entry for the Non-Unique index.

Now if we were to delete and subsequently re-insert a row in the table with same index value within a single transaction, note the rowid of the new row by definition will differ from the deleted row. Therefore, we would need a different index entry for the new index row because if the rowids differ, then the associated index entries must differ as well. Note also (and this is critical) that because we would have a different rowid, if we had multiple index entries with the same key, this new index entry might not be in the same logical order as that of the deleted index entry. In fact, it’s quite possible that the new index entry might actually need to be stored in a totally different leaf block if this specific index value spanned multiple index leaf blocks because the index entries, including the rowids must always be logically ordered.

Therefore, if we were to delete and re-insert the same index value within a single transaction, Oracle is forced to create a new index entry and will not reuse the existing, deleted index entry.

So continuing with the demo, let’s delete the row:

SQL> delete bowie;

1 row deleted.

and now re-insert a row with the same indexed value within the same transaction:

SQL> insert into bowie values (1, ‘THIN WHITE DUKE’);

1 row created.

SQL> commit;

Commit complete.

If we look at a block dump now …

Leaf block dump
===============
header address 425713756=0x195fe05c
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 2
kdxcofbo 40=0×28
kdxcofeo 8012=0x1f4c
kdxcoavs 7972
kdxlespl 0
kdxlende 1
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: —D–, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c1 e7 8a 00 00
row#1[8012] flag: ——, lock: 2, len=12
col 0; len 2; (2):  c1 02
col 1; len 6; (6):  01 c1 e7 8a 00 01
—– end of leaf block dump —–

We notice the previous index entry has been logically deleted and Oracle has created a new index entry with the new associated rowid.

 

Let’s now run exactly the same demo again, but this time with a Unique index instead of the Non-Unique index …

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie values (1, ‘BOWIE’);

1 row created.

SQL> commit;

Commit complete.

SQL> create unique index bowie_idx on bowie(id);

Index created.

SQL> select header_file, header_block from dba_segments where segment_name = ‘BOWIE_IDX’;

HEADER_FILE HEADER_BLOCK
----------- ------------
          7       125193

SQL> alter system dump datafile 7 block 125194;

System altered.

Leaf block dump
===============
header address 371859548=0x162a205c
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0×26
kdxcofeo 8025=0x1f59
kdxcoavs 7987
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 6
kdxlebksz 8036
row#0[8025] flag: ——, lock: 0, len=11, data:(6):  01 c1 e8 8a 00 00
col 0; len 2; (2):  c1 02
—– end of leaf block dump —–

 

Notice the big difference here. Because the index has been defined as Unique, all the associated index entries must be unique. It’s simply not possible to have duplicate index entries within a Unique index structure. Therefore, it’s not necessary to have the rowid as a separate column of the index entry as the index values themselves are sufficient to uniquely identify each and every index entry. The rowid is basically just another piece of overhead associated with the index entry rather than a separate index column. The length of this unique index entry is just 11 bytes, where it was previously 12 bytes, because we no longer need to store the length byte associated with the second index column necessary in the Non-unique index for the rowid.

And now comes the subtle difference …

If we were to now delete and re-insert the same index value within a single transaction, Oracle can now reuse the same, deleted index entry, because the index entry is effectively identical to the deleted one. The only possible difference is the rowid but the rowid is no longer a part of the index column list and so can just be updated as necessary. Note also (and this is the critical bit for Unique indexes), because the actual index value remains the same, the order of the index entry within the index must also remain the same. There is no need to move the re-inserted index entry to another part of the index structure because deleting and re-inserting the same index entry does not logically alter the order of where the index entry must reside.

Therefore, if we were to delete and re-insert the same index value within a single transaction, Oracle does not need to create a new index entry and can simply reuse the existing, deleted index entry.

So continuing with the demo, let’s delete the row:

SQL> delete bowie;

1 row deleted.

and now re-insert a row with the same indexed value within the same transaction:

SQL> insert into bowie values (1, ‘THIN WHITE DUKE’);

1 row created.

SQL> commit;

Commit complete.

If we look at a block dump now …

Leaf block dump
===============
header address 371859548=0x162a205c
kdxcolev 0
KDXCOLEV Flags = – - -
kdxcolok 0
kdxcoopc 0×80: opcode=0: iot flags=— is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 1
kdxcofbo 38=0×26
kdxcofeo 8025=0x1f59
kdxcoavs 7987
kdxlespl 0
kdxlende 0
kdxlenxt 0=0×0
kdxleprv 0=0×0
kdxledsz 6
kdxlebksz 8036
row#0[8025] flag: ——, lock: 2, len=11, data:(6):  01 c1 e8 8a 00 01
col 0; len 2; (2):  c1 02
—– end of leaf block dump —–

We note that Oracle has indeed reused the previously deleted index entry and has simply updated the rowid with the new rowid value. There is no deleted index entry, Oracle has simply changed the associated rowid to that of the new row in the table for the existing Unique index entry.

Where an Update of an index entry is actually effectively a delete and an insert of an index entry, notice that by contrast for Unique indexes, a delete and a re-insert operation is effectively an update of an index entry !!

In my next post I’ll highlight how this difference can be critical to the behaviour and efficiency of an index and why it’s important to understand how indexes work to avoid and prevent potential issues.

Non-Unique Indexes and Direct-Path Inserts (What In The World) August 6, 2008

Posted by Richard Foote in Direct-Path Inserts, Non-Unique Indexes, Oracle Indexes, Performance Tuning.
6 comments

The OTN Database Forum has had some really good threads lately and something that came up was the question of indexes and Direct-Inserts which I thought might be worth a mention here.

I’ve previously discussed the differences between Unique and Non-Unique Indexes and my general preference for using Unique Indexes where possible and appropriate:

http://richardfoote.wordpress.com/2007/12/18/differences-between-unique-and-non-unique-indexes-part-i/

http://richardfoote.wordpress.com/2007/12/21/differences-between-unique-and-non-unique-indexes-part-ii/

http://richardfoote.wordpress.com/2007/12/30/differences-between-unique-and-non-unique-indexes-part-iii/

A Direct-Path Insert is a special mechanism used by Oracle to more quickly and efficiently insert data into a table. Rather than utilising the conventional method of using the table freelists or ASSM bitmaps to find an available free block that’s subsequently loaded into the buffer cache and processed, Oracle instead builds the necessary blocks in session memory and writes them directly to disk, “appending” them above the table High Water Mark (HWM) and hence by-passing the buffer cache entirely.

When inserting a large amount of data, a Direct-Path Insert can hence be substantially faster and has the added flexibility of being a NOLOGGING operation if required.

However, if you have a Primary Key (or Unique Key) policed via a Non-Unique index, then Oracle will automatically disable Direct-Path inserts behind the scene. This restriction has only been lifted since 11g.

A simple demo.

First, using a 10.2.0.3 database, create a table and populate it with a few rows:

SQL> CREATE TABLE ZIGGY (id NUMBER, text VARCHAR2(20));

Table created.

SQL> INSERT INTO ziggy SELECT rownum , ‘Ziggy’ FROM dual CONNECT BY LEVEL <= 10000;

10000 rows created.

SQL> COMMIT;

Commit complete.

Next, let’s add a PK that’s policed by a Non-Unique Index:

SQL> ALTER TABLE ziggy ADD CONSTRAINT ziggy_pk PRIMARY KEY(id) USING INDEX(CREATE INDEX ziggy_pk ON ziggy(id));

Table altered.

Now, let’s see how many blocks below the HWM we have allocated to the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> SELECT blocks, empty_blocks FROM dba_tables WHERE table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    25            0

Let’s see how many physical direct write operations we have currently performed:

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    21

Let’s now attempt to perform a Direct-Path Insert operation:

SQL> INSERT /*+ APPEND */ INTO ziggy SELECT 10001, ‘NEW’ FROM dual;

1 row created.

SQL> COMMIT;

Commit complete.

But have we actually performed any physical writes direct operations ?

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    21

NO !! The number of such operations has not changed. Has the HWM been incremented ?

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> SELECT blocks, empty_blocks FROM dba_tables WHERE table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    25            0 

NO !! The number of blocks has also remained the same. Clearly then, the Direct-Path Insert has not actually worked and Oracle has simply performed a conventional insert operation instead.

Let’s see if the situation changes if we replace the index policing the PK with a Unique Index instead …

SQL> ALTER TABLE ziggy DROP PRIMARY KEY;

Table altered.

SQL> DROP INDEX ZIGGY_PK;

Index dropped.

SQL> ALTER TABLE ziggy ADD CONSTRAINT ziggy_pk PRIMARY KEY(id);

Table altered.

OK, let’s see the current number of physical writes direct operations:

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    41 

Let’s try another Direct-Path Insert …

SQL> INSERT /*+ APPEND */ INTO ziggy SELECT 10002, ‘NEW’ FROM dual;

1 row created.

SQL> COMMIT;

Commit complete.

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    42 

And the number of physical writes direct has now increased. What about the HWM ?

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> SELECT blocks, empty_blocks FROM dba_tables WHERE table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    26            0 

Yep, that’s gone up by one as well. Let’s just repeat the process to be sure …

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    42 

SQL> INSERT /*+ APPEND */ INTO ziggy SELECT 10003, ‘NEW’ FROM dual;

1 row created.

SQL> COMMIT;

Commit complete.

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    43 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> SELECT blocks, empty_blocks FROM dba_tables WHERE table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    27            0 

Yes indeed, Direct-Path Inserts are definitely now occurring now we have a Unique Index policing our PK constraint.

Same thing now with a Non-Unique Index policing our PK again but this time on an 11g database …

SQL> CREATE TABLE ZIGGY (id NUMBER, text VARCHAR2(20));

Table created.

SQL> INSERT INTO ziggy SELECT rownum , ‘Ziggy’ FROM dual CONNECT BY LEVEL <= 10000;

10000 rows created.

SQL> COMMIT;

Commit complete.

SQL> ALTER TABLE ziggy ADD CONSTRAINT ziggy_pk PRIMARY KEY(id) USING INDEX(CREATE INDEX ziggy_pk ON ziggy(id));

Table altered.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> select blocks, empty_blocks from dba_tables where table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    25            0 

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    23 

SQL> INSERT /*+ APPEND */ INTO ziggy SELECT 10001, ‘NEW’ FROM dual;

1 row created.

SQL> COMMIT;

Commit complete.

SQL> SELECT n.name, s.value FROM v$mystat s, v$statname n WHERE s.statistic# = n.statistic# AND n.name = ‘physical writes direct’;

NAME                   VALUE
---------------------- -----
physical writes direct    24 

Now, even though we have a Non-Unique index, the number of physical writes direct operations has indeed gone up by one.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’ZIGGY’, estimate_percent=>null, cascade=> true, method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> select blocks, empty_blocks from dba_tables where table_name = ‘ZIGGY’;

BLOCKS EMPTY_BLOCKS
------ ------------
    26            0 

And indeed, the HWM has increased as well showing that indeed a Direct-Path Insert works as expected in 11g even if our PK constraint is policed by a Non-Unique Index. 

Yet another example of a difference between a Unique and a Non-Unique index which might be worth some consideration.

Follow

Get every new post delivered to your Inbox.

Join 1,703 other followers