jump to navigation

Differences Between Unique and Non-Unique Indexes Part 4.5 (Fix You) March 30, 2009

Posted by Richard Foote in Fragmented Indexes, Index Internals, Non-Unique Indexes, Oracle Indexes, Unique Indexes.
5 comments

In my last post, Part IV in this series, we looked at how a Unique Index can reuse space deleted within the same logical transaction, whereas a Non-Unique Index can not.  Deleted space within a Non-Unique index can only be reused by subsequent transactions.

It’s sometimes important to appreciate this distinction because as discussed in the various OTN and Ask Tom threads mentioned in Part IV, there are times when this can make a significant difference to the manageability and efficiency of the resultant index.

Now, it’s not everyday someone might for example delete all rows in a table and repopulate it again within a single transaction (the TRUNCATE command was of course developed for a reason). However, perhaps an application was developed without your involvement, perhaps a large proportion but not all of the data is being deleted or as someone mentioned on OTN, perhaps the table in question is a Materialized View being fully refreshed within a refresh group. There could therefore be occasions when a single transaction might indeed perform a large delete followed by a similarly sized insert.

In which case, whether an index is defined as Unique or Non-Unique might make a difference …

To begin with, let’s populate a table with 1M rows and create an associated Unique index:

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index bowie_idx on bowie(id);

Index created.

 

Let’s look at the size of  this newly created Unique index:

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2176       2087           0

 

OK, let’s now delete the entire table and repopulate it again, within the same logical transaction

SQL> delete bowie;

1000000 rows deleted.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

 

Let’s look at the size difference for the Unique Index and see how many deleted index entries we have as a result:

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2176       2087           0

 

OK good, the index is actually identical in size and we have no deleted entries, not a one. All the deleted entries as a result of the delete command have been reused by the subsequent insert statement. This means of course that the index is just as efficient now after all this DML activity, as it was when the index was first created.

 

Let’s perform exactly the same demo, but this time with a Non-Unique index and see any differences …

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(id);

Index created.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

 

The first difference we notice is that the Non-Unique index after it has just been created is somewhat larger than the equvalent Unique index (2226 leaf blocks vs. 2087 leaf blocks). This is a direct result of the Non-Unique index having to store an extra byte for the length byte associated with the rowid being an additional index column for each and every one of the 1M index entries.

SQL> delete bowie;

1000000 rows deleted.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      4608       4518     1000000

OK not quite so good, big difference here. Previously, the Unique Index remained unchanged and had no deleted index entries. However, the Non-Unique index is now effectively double the size it was previously and has 1M deleted index entries still within the index structure. Not a one was recycled and reused within the logical transaction.

This index is now potentially problematic, especially if there are going to be no or few subsequent inserts until it next gets refreshed, where the deleted entries can be reused but the current entries may again remain in the index after they’ve been deleted.

Again, it’s important to understand what is going on here so one can take the appropriate adminstration steps. Perhaps it might be better to drop the index and recreate it after the transaction (if permitted). Perhaps the truncate command isn’t such a bad idea after all (if permitted). Perhaps it might be better to police the Unique constraint with a Unique rather than a Non-Unique index after all.

Perhaps, it might be better to not perform the above within a single transaction and issue an intermediate commit after all (if permitted) …

 

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(20));

Table created.

SQL> insert into bowie select rownum, ‘BOWIE’ from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(id);

Index created.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

SQL> delete bowie;

1000000 rows deleted.

Because if we just issue the commit at this point in the process …

SQL> commit;

Commit complete.

SQL> insert into bowie select rownum, ‘PINK FLOYD’ from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select blocks, lf_rows, lf_blks, del_lf_rows from index_stats;

    BLOCKS    LF_BLKS DEL_LF_ROWS
---------- ---------- -----------
      2304       2226           0

 

We would not have this problem as the subsequent transaction that performs the insert can reused all the deleted space associated with the first delete transaction. 

If one understands how indexes work and understands how deleted space can be reused, one can prevent many potential issues and unnecessary maintenance tasks.

Prevention is always the best cure …

Fragmented Indexes Due To Large Number Of Duplicate Entries (More) October 28, 2008

Posted by Richard Foote in Fragmented Indexes, Oracle Indexes.
4 comments

Many think an index can only be “fragmented” if you have a high number of delete or update operations on the index. An index that only has inserts can’t really be fragmented as no space is “wasted” due to delete (or update) related operations, right ?

I use the term “fragmented” here in the context of having free and available space within the index structure which can not actually be used by Oracle for subsequent insert operations. The space is effectively wasted and can only usually be cleaned up after an index maintenance operation such as a rebuild or coalesce.

However, there are a number of scenarios where an index can become fragmented, despite the lack of associated delete or update operations.

One example is where a non-unique index stores large number of duplicate values, indeed so many duplicates of a specific value that it actually requires many index leaf blocks to store all occurrences of each specific value.

The first point to make here is that with Oracle B-Tree indexes, there is actually no such thing as a non-unique index. All indexes, even those defined as non-unique, are actually unique indexes “behind the covers”.

Why ?

For the simple reason that if we were to (say) delete a specific row in a table, Oracle would need to delete the associated index entry. If there were (say) 1000 occurrences of the indexed value, Oracle can’t simply delete any of the 1000 index entries, it must delete the specific index entry that has the corresponding rowid of the row being deleted.

Therefore, in order for Oracle to quickly find this associated index entry and be able to immediately find the related index leaf block of interest, all the index entries within the same index value are sorted by the rowid. If this wasn’t the case, Oracle would be forced to access the first leaf block that contains an index value of interest and be forced to search through all the duplicate index entries until it finds the specific one Oracle is after, perhaps reading many index leaf blocks in the process.

Therefore, for a specific indexed value, all index entries are sorted by the corresponding rowid. Oracle physically implements this by making the rowid an additional index column value within the index.

A unique index entry basically looks like this, courtesey of a block dump:

row#0[8016] flag: ——, lock: 2, len=20, data:(6):  01 42 1c 7a 00 00

col 0; len 11; (11):  44 41 56 49 44 20 42 4f 57 49 45

 

Note: the 6 byte rowid is not stored as an index column entry (the index entry only has the one column, starting with 0).

 

However, a corresponding non-unique index entry look like this:

 

row#0[8015] flag: ——, lock: 0, len=21

col 0; len 11; (11):  44 41 56 49 44 20 42 4f 57 49 45

col 1; len 6; (6):  01 42 1c 7a 00 00

 

Note: the rowid is treated and stored as additional index column within the index entry, and hence requires an additional byte per index entry to store the column length value. The rowid is now a second column (number 1, remembering columns start with 0) in the index.

 

This effectively makes each and every index entry unique, even for non-unique indexes, as the combination of index value and rowid must be unique for a given table.

 

Now when a new index entry is inserted into an index, Oracle is very specific where the index entry is housed, it must be housed in the index leaf block such that the order of the rowids within duplicate index entries is maintained. The rowid is basically just another column within the index and like all index columns, it must be stored and sorted in the index entry order.

 

Now here’s a funny thing with rowids. When new rows are added to a table, the rowids generally increase in value. Although this is not always the case and there are exceptions, this is generally the trend with most rowids within most tables.

 

A (restricted) rowid consists of a relative file id as the leading portion of the rowid. As a tablespace grows and new files are added to the tablespace, generally the subsequent relative files ids increase and so the subsequent rowid increases as well.

 

The next portion of the rowid is the block id. As a datafile within a tablespace fills up and allocates new extents to a table, the block ids used within the new extent increase and so the rowids increase in value as well. Also as new blocks get used within an extent and the high water mark of a table increases, the rowids increase as well.

 

Yes there are always exceptions. For example, a newly allocated extent could use space previously allocated to a dropped segment and so use block ids in a range lower than those previously allocated. But they’re the exception rather than the rule. Even with ASSM tablespaces and segments with multiple freelists/freelist groups where multiple blocks can be inserted into concurrently, the trend is still for the associated rowids to generally increase over time.

 

Now getting back to our index with lots of duplicate index values. What happens when a block is filled and a 50-50 block split eventuates ?   1/2 the entries go into one block and 1/2 the entries go into the newly allocated block and the new index entry goes where ? Well, if the block is full of nothing but the same index entry because we have lots of duplicate values and the new index entry is associated with the maximum rowid for the specific index value, it goes into the newly allocated leaf block because remember, all index entries must maintain their logical sorted order.

 

In fact, if most/all of the subsequent inserts of the duplicate value are also associated with the maximum rowid currently assigned to the index entry, all the subsequent inserts now go into the newly allocated block. The 50% of free space remaining in the previous leaf block doesn’t get used.

 

Once the newly allocated block also gets filled, again with the same duplicate index values, a 50-50 block split occurs and again, most if not all the subsequent inserts go into the newly allocated leaf block. We now have two, 1/2 empty leaf blocks that has space that can not effectively not be reused because they are filled the same index value but with rowid values less than those being currently allocated.

 

And so the cycle continues with these duplicate index entries and any other duplicate index values that span multiple leaf blocks …

 

This can easily be illustrated with the following example. First, create a simple table and index, and initially insert a row with a large index entry to prevent subsequent 90-10 block splits from occurring:

 

SQL> CREATE TABLE common_values (id NUMBER, common VARCHAR2(10));

 

Table created.

 

 

SQL> CREATE INDEX common_values_i ON common_values(common);

 

Index created.

 

 

SQL> INSERT INTO common_values VALUES (1, ‘ZZZ’);

 

1 row created.

 

 

SQL> COMMIT;

 

Commit complete.

 

Next, populate the table with lots of duplicate values:

 

 

SQL> BEGIN

  2  FOR i IN 1..90000 LOOP

  3    CASE

  4      WHEN mod(i,3) = 0 THEN INSERT INTO common_values VALUES (i, ‘AAA’);

  5      WHEN mod(i,3) = 1 THEN INSERT INTO common_values VALUES (i, ‘BBB’);

  6      WHEN mod(i,3) = 2 THEN INSERT INTO common_values VALUES (i, ‘CCC’);

  7    END CASE;

  8  END LOOP;

  9  END;

 10  /

 

 

SQL> ANALYZE INDEX common_values_i VALIDATE STRUCTURE;

 

Index analyzed.

 

Now lets see how much of the allocated index space is actually used:

 

 

SQL> SELECT BTREE_SPACE, USED_SPACE, PCT_USED FROM INDEX_STATS;

 

BTREE_SPACE USED_SPACE PCT_USED

———– ———- ——–

    2648032    1355551       52

 

As we can see, only 52% of the index is actually being used, a value much less than would be expected of most randomly inserted indexes.

 

Now of course if we had an index with few distinct values, it’s unlikely to be considered by the CBO. However, it’s not uncommon to index column values that have uneven distribution of values, with the index being used to retrieve data for those columns with relatively few occurances of specific values. In these scenarios, it’s possible to have a large portion of the index with poor index space utilisation due to this issue. It’s unlikely to impact performance because the poorly fragmented portion of the index is not usually used, but it’s something to consider if you wish to reclaim wasted index storage or you do indeed have Fast Full Index Scan operations that need addressing.

Follow

Get every new post delivered to your Inbox.

Join 1,862 other followers