jump to navigation

Curious Case Of The Ever Increasing Index Solution (A Big Hurt) January 5, 2012

Posted by Richard Foote in ASSM, Indexing Myth, Oracle Indexes, Quiz.
6 comments

Based on the excellent comments in the Quiz post, we have some clever cookies out there :)

I guess the first thing to point out is that based in the basic scenario provided, the index shouldn’t ordinarily be continually growing in this fashion. Although the index values are monotonically increasing, the deletions are leaving behind fully emptied leaf blocks which can generally be recycled in future block splits.

OK, so why is this index behaving in this fashion, continually to increase in size while the number of rows in the table remains relatively constant ?

Well, there are a number of contributing factors.

As stated, the index values are indeed monotonically increasing so all inserts into the index are hitting the right-most index leaf block within the index structure and the deletions which all occur on the “left-side” of the index are leaving behind leaf blocks that contain nothing but deleted index entries. As Marcus and David mentioned in the comments, the use of a Reverse Key index would therefore alleviate this problem as subsequent inserts will be performed within the same leaf blocks in the index structure, automatically cleaning out and reusing space used by previously deleted index entries. This solution though may create as many problems as it solves (if say range predicate statements relied on a Non-Reverse index).

Additionally, the processing is being performed with a PL/SQL block. Oracle has a whole lot of smarts to make PL/SQL as efficient as possible and so the manner in which code runs within PL/SQL compared to other native languages can vary significantly. Unfortunately at times, these smarts might not be quite so smart after all.

The tablespace used to store the index is a Locally Managed Tablespace (LMT) with Automatic Segment Storage Management (ASSM). Instead of freelists (or freelist groups), Oracle uses a set of bitmap blocks within the index segment to determine the amount of free space available within its blocks and whether a block is available for inserts. As Vyacheslav and Alberto highlighted in the comments, there are a number of “issues” in the manner in which these bitmap blocks are maintained within PL/SQL processing. This is effectively locking out the vast number of these now effectively empty leaf blocks from being recycled and reconsidered for subsequent inserts. By rebuilding the index once in a Manual Segment Space Management (MSSM) tablespace would also alleviate this issue.

The actual processing involved within the PL/SQL block can also have an impact. The procedure contained the following important statement:

select min(id),max(id) into n,m from bowie;

By removing this statement from the PL/SQL block and either manually defining the values to be processed or passing them through to the procedure, can impact how PL/SQL manages the freespace bitmaps within the segment. For example, if one used something similar to the following:

SQL> declare
  2      n number:= 1;
  3      m number:= 200000;
  4  begin
  5      for i in 1..200000 loop
  6          delete from bowie where id=n+i-1;
  7          insert into bowie values(m+i,'David Bowie');
  8          if (i mod 1000=0) then
  9            commit;
 10          end if;
 11      end loop;
 12      commit;
 13  end;
 14  /

The number of leaf blocks allocated will be nowhere as significant as before and stabilised after a few runs to approximate:

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              3     289040       89040        744          4

Finally, the PL/SQL procedure only performed a Commit after 1000 iterations. This means that there were 1000 deletions performed during a logical transaction. As Greg mentioned in the comments, Unique Key index values can be reused within a single transaction BUT only if the same actual values are reused. The procedure introduces new values for those that have been deleted and so the deleted entries can’t be reused during the same transaction. This means there will be at least 1000 deleted index entries that can’t be reused during the running of the procedure and sufficient additional leaf blocks to accommodate these 1000 index entries will need to be allocated, even if we use some of the solutions mentioned, such as Reverse Key indexes or MSSM tablespaces. By performing either all the necessary deletions within one transaction followed by the inserts or having a Commit for each delete/insert pairing, these additional blocks won’t be required. For example:

SQL> declare
  2        n number:= 1;
  3        m number:= 200000;
  4    begin
  5      for i in 1..200000 loop
  6           delete from bowie where id=n+i-1;
  7           commit;
  8           insert into bowie values(m+i,'David Bowie');
  9           commit;
 10      end loop;
 11  end;
 12  /

Although of course, the inefficiencies in the processing or the potential breaking of business rules may not make the index savings worthwhile.

So in summary, there a number of things we could do to fix this scenario, rather than simply periodically rebuilding the index all the time. Depending on applicability, we could convert the index to a Reverse Key index (or say Hash Partition), we could move the index to a MSSM tablespace, we could modify our procedure logic to remove the reference to fetching MIN/MAX values, not use PL/SQL, or to make the index as efficient as possible, change the transactional logic to not perform large numbers of inserts and deletes within the same transaction.

So there’s quite a lot happening within what on the surface looks like a simple piece of PL/SQL :)

Curious Case Of The Ever Increasing Index Quiz (She’ll Drive The Big Car) January 4, 2012

Posted by Richard Foote in Index Internals, Indexing Myth, Oracle Indexes, Quiz.
22 comments

I received an email recently that had a nice example of what can potentially go wrong with an index.

Let’s first create a simple table with a unique index and populate it with 200,000 rows (following demo run on 11.2.0.1):

SQL> create table bowie (id number constraint bowie_pk primary key, name varchar2(100));

Table created.

SQL> insert into bowie select rownum, 'BOWIE' from dual connect by level <= 200000;

200000 rows created.

SQL> commit;

Commit complete.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              2     200000           0        374          1

So far, everything is as expected. With have an index with 200,000 rows that currently has 374 leaf blocks.

OK, what we want to do is basically gradually delete the current set of rows and replace them with 200,000 new rows, with ever-increasing Primary Key values. To this end, we create the following procedure:

SQL> create or replace procedure delete_insert_rows
  2  as
  3       n number;
  4       m number;
  5  begin
  6       select min(id),max(id) into n,m from bowie;
  7       for i in 1..200000 loop
  8           delete from bowie where id=n+i-1;
  9           insert into bowie values(m+i,'David Bowie');
 10           if (i mod 1000=0) then
 11                commit;
 12           end if;
 13       end loop;
 14       commit;
 15  end;
 16  /

Procedure created.

So the procedure basically determines the current MIN and MAX values of our PK column and gradually deletes the current rows and then inserts new ones. Every 1000 iterations, we commit the changes. Nothing too complex here.

When we run this procedure for the first time:

SQL> exec delete_insert_rows

PL/SQL procedure successfully completed.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              2     293820       93820        619          1

We notice we now have a whole bunch of deleted leaf entries and that the index has grown from 374 to 619 leaf blocks.

If we run the procedure again:

SQL> exec delete_insert_rows

PL/SQL procedure successfully completed.

SQL> analyze index bowie_pk validate structure;

Index analyzed.

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              3     347841      147841        994          3

Things have gone even worse. We still only have 200,000 rows in the table but the index now has an additional 147,841 deleted entries and the number of leaf blocks has again increased substantially to 994 leaf blocks.

If we have a look at a partial treedump of the index:

SQL> select object_id from dba_objects where object_name = 'BOWIE_PK';

 OBJECT_ID
----------
     74060

SQL> alter session set events 'immediate trace name treedump level 74060';

Session altered.

—– begin tree dump
branch: 0x100378b 16791435 (0: nrow: 2, level: 2)
   branch: 0x1003ce0 16792800 (-1: nrow: 733, level: 1)
      leaf: 0x100378e 16791438 (-1: nrow: 149 rrow: 0)
      leaf: 0x100378f 16791439 (0: nrow: 571 rrow: 0)
      leaf: 0x100378c 16791436 (1: nrow: 291 rrow: 0)
      leaf: 0x1003795 16791445 (2: nrow: 571 rrow: 0)
      leaf: 0x1003796 16791446 (3: nrow: 433 rrow: 0)
      leaf: 0x1003797 16791447 (4: nrow: 4 rrow: 0)
      leaf: 0x1003790 16791440 (5: nrow: 571 rrow: 0)
      leaf: 0x1003791 16791441 (6: nrow: 146 rrow: 0)
      leaf: 0x1003792 16791442 (7: nrow: 571 rrow: 0)
      leaf: 0x1003793 16791443 (8: nrow: 288 rrow: 0)
      leaf: 0x1003794 16791444 (9: nrow: 571 rrow: 0)
      leaf: 0x10037a9 16791465 (10: nrow: 430 rrow: 0)

… (most of the treedump has been cut out, following is the last portion of the dump)

  
     leaf: 0x1003e70 16793200 (248: nrow: 533 rrow: 533)
      leaf: 0x1003e74 16793204 (249: nrow: 533 rrow: 533)
      leaf: 0x1003e78 16793208 (250: nrow: 533 rrow: 533)
      leaf: 0x1003e7c 16793212 (251: nrow: 533 rrow: 533)
      leaf: 0x1003e41 16793153 (252: nrow: 533 rrow: 533)
      leaf: 0x1003e45 16793157 (253: nrow: 533 rrow: 533)
      leaf: 0x1003e49 16793161 (254: nrow: 533 rrow: 533)
      leaf: 0x1003e4d 16793165 (255: nrow: 533 rrow: 533)
      leaf: 0x1003e51 16793169 (256: nrow: 533 rrow: 533)
      leaf: 0x1003e3e 16793150 (257: nrow: 533 rrow: 533)
      leaf: 0x1003e03 16793091 (258: nrow: 533 rrow: 533)
      leaf: 0x1003e07 16793095 (259: nrow: 236 rrow: 236)
—– end tree dump

We notice that the first portion of the index contains leaf blocks with nothing but deleted index entries. The number of rrows is 0 for a vast number of leaf blocks. We also notice that the root block has a rba of 0x100378b 16791435, which is only a few values below some of the rba values of the left most indexes in the index structure (say) 0x100378e 16791438. Therefore, this highlights that even though these left most blocks in the index structure contain nothing but deleted index entries, Oracle is not recycling them as it should do. Oracle is simply adding new blocks to the index structure rather than recycling empty leaf blocks, resulting in the index growing bigger and bigger.

The leaf blocks however at the right most end of the index structure (the second portion of the partial treedump), shows us a nice compact set of leaf blocks with lots of index entries per block (most with 533 per leaf block) and with no deleted index entries (rrows matches the nrows value). 

If we run the procedure 10 times in total, we get an index that looks like the following:

SQL> select name, height, lf_rows, del_lf_rows, lf_blks, br_blks from index_stats;

NAME             HEIGHT    LF_ROWS DEL_LF_ROWS    LF_BLKS    BR_BLKS
------------ ---------- ---------- ----------- ---------- ----------
BOWIE_PK              3    1325132     1125132       4136          7

We now have 1,125,132 deleted index entries and the index is now over 10 times the original size, up from 374 to a massive 4,136 leaf blocks, even though the table only contains 200,000 rows.

There are a number of contributing factors here :)

The question is why, why is the index behaving in this fashion and what can we do to ensure the index doesn’t grow in this manner and can remain basically the same size as we delete and insert new rows into the table ?

Index Organized Tables – A Start (Star) January 3, 2012

Posted by Richard Foote in Index Organized Tables, Oracle Indexes.
8 comments

Happy New Year !!

As has been mentioned to me on quite a number of occasions, I’ve haven’t really covered the area of Index Organized Tables (IOTs) on this blog. This despite covering the subject of IOTs in quite some detail in my Index Internals Seminar. Which I readily admit is a bit of a shame as IOTs are often misunderstood little things and I would say are generally way under-utilised in many environments, I would suggest in part because they are so often misunderstood. The only reason I can really give for not discussing them more is that I simply haven’t gotten around to it :)

Well the good news as hinted at late last year, it’s about time I addressed this and started writing a few blog pieces on IOTs.

However, the even better news is that the subject of IOTs have already been covered by Martin Widlake in a truly excellent series of articles on his blog. I would strongly recommend giving them a read as they cover many aspects of IOTs in an interesting and informative manner.

I’ll recap some of the basic principles of IOTs here and then discuss various different aspects (such as the joys of Secondary Indexes on IOTs) in my own unique way :)

So, all that’s to come in the coming weeks.

However, I recently received an index question via email during the Christmas holidays which I thought was worthy of a blog piece or two. Although I often respond to emails, I much prefer to discuss index related “stuff” on the blog so everyone can share in the knowledge exchange. This is an interesting question because it relates to why an index keeps growing, even though the number of rows in the table remains constant. It’s an even more interesting question because there are a number of contributing factors for this impact on the index which I thought were worth sharing.

No more clues, I’ll post the question and the specific scenario in the next few days …

Merry Christmas (plus long lost “The Jean Genie”) December 24, 2011

Posted by Richard Foote in Christmas.
9 comments

I just want to wish you all a very Merry Christmas and a very happy, safe and successful New Year.

As always, thank you all so much for supporting the blog, with all your comments and nice feedback. Hopefully, you’ve found content here that has been of some use. Now that I’m back at Oracle Corporation, 2012 should hopefully be an exciting year with lots to talk and blog about.

Here’s my Christmas present …

Described as an early Christmas present for all David Bowie fans world-wide, footage of the great man performing his classic hit “The Jean Genie” on England’s “Top of the Pops” has recently re-surfaced. Thought long-lost (as the BBC had a tendency to simply delete their tapes), this was originally recorded way back at the time Bowie was working on the “Aladdin Sane” album, on 3rd January 1973 and broadcast the next day on 4th January 1973. It basically hasn’t been seen since until a cameraman on the show recently told the world he had a copy of the tapes in his shed !!

Now, nearly 40 years on, it’s still a great song, a great band and a great performance.

Enjoy :)

Back At Oracle Corporation (The Return Of The Thin White Duke) November 30, 2011

Posted by Richard Foote in Richard's Musings.
30 comments

I’ve been very quiet on the blogging front lately, so apologies for that.

One of the main reasons is that after a period of nearly 10 years, I’ve decided to take up a position back at Oracle Corporation as a “Principal Solutions Consultant” in sunny Canberra. So things have been rather hectic, finishing up in my previous job and starting up afresh at Oracle. Things have certainly changed a lot over this time, with new offices and a much larger workforce than worked in Canberra previously. Some things haven’t changed such as I’ve still got my old email address and the same employee number (2020) as before. I’ve got that feeling of being really excited again at work which I’ve missed recently so I’m really looking forward to my many new adventures to come at Oracle.

The good news is that I’ve been given the green light to continue on with my blogging activities so expect action on that front here again soon. I’ll still be keeping it technical (unless there’s a new David Bowie album or Pink Floyd Box-set to discuss of course) and I’ll still be talking mainly about indexes as there’s heaps that I’ll still like to cover such as Secondary Indexes on IOTs as promised previously, as well as looking at more “exotic” indexes such as those for Spatial, XML, Oracle Text etc. There are also a few “rumours” regarding not needing indexes with Exadata which might potentially become full-blown myths if I don’t do something about it :)

I’ll also get back to answering the various questions that have been building up over the past few weeks.

So stay tuned !!

Why Are My Indexes Still Valid Solution (A Second Face) October 20, 2011

Posted by Richard Foote in IOT, Oracle Indexes, Quiz, Secondary Indexes.
add a comment

I’ve been so busy lately, I just haven’t had any spare time to post.

For now, the quick answer to the last quiz is that the second table was indeed an Index Organized Table (IOT).

One of the nice benefits of an IOT is that when re-organised, unlike a Heap Table, all indexes remain valid, even the Secondary Indexes. I’ll explain why in my next post in the next few days. I’ll also explain why secondary indexes are one of the main disadvantages with IOTs as well.

Stay tuned !!

Why Are My Indexes Still Valid Quiz ? (Move On) October 11, 2011

Posted by Richard Foote in Index Organized Tables, Oracle Indexes, Quiz.
6 comments

OK, this quiz is a nice easy one, the lads at work got this without too much trouble. 
 

Normally, when you MOVE (re-org) a table, all the associated indexes become Unusable. As below:
 

SQL> create table pink_floyd (id number constraint pf_pk primary key, code number, name varchar2(30));
 
Table created.
 
SQL> create index pf_code_i on pink_floyd(code);
 
Index created.
 
SQL> insert into pink_floyd select rownum, mod(rownum,100), 'The Dark Side Of The Moon' from dual connect by level <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
PF_PK                               10000 VALID
PF_CODE_I                           10000 VALID
 
SQL> alter table pink_floyd move;
 
Table altered.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
PF_PK                               10000 UNUSABLE
PF_CODE_I                           10000 UNUSABLE

 
 
 
So the indexes are now all unusable ..
 
 
However, I previously created another table called BOWIE that has exactly the same columns, indexes and data but when I MOVE this table:
 

SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
BOWIE_PK                            10000 VALID
BOWIE_CODE_I                        10000 VALID
 
SQL> alter table bowie move;
 
Table altered.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
BOWIE_PK                            10000 VALID
BOWIE_CODE_I                        10000 VALID

 
All the indexes remain VALID !!
 
What’s so different about this table and their indexes ??????

I plan to discuss this whole topic (finally) in more detail in the coming weeks …

Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) October 6, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
3 comments

As many have identified, the first thing to point out is that the two queries are not exactly equivalent.

The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <‘ predicate. The additional equal conditions at each end is significant.

The selectivity of the original query is basically costed as  (max condition – min condition)/(max column value – min column value). As this equates to 1 day/2000 days, the selectivity of the first query is therefore 0.0005.

1M rows multiplied by 0.0005 = 500 rows, the CBO cardinality estimate.

Note that 0.0005 is also the selectivity of 1 day.

The  calculation for the BETWEEN clause is different. It’s the above formula PLUS the selectivity of 1 day for the each of the two equal conditions (as the CBO assumes you not only want the values within the range but the specific values on each side of the range).

So that’s a selectivity of 0.0005 for the date range as above plus 2 x 0.0005 for each of the equal conditions = 0.0005 + 0.001 = 0.0015.

1M muliplied by 0.0015 = 1500, the CBO cardinality estimate of the second query.

As many have mentioned in the comments, the Clustering Factor of this index is awful, as each of the distinct 500 occurences of each day is spread evenly throughout the whole table. It’s therefore very expensive to use this index to retrieve a “larger” number of rows.

The cost of the first query that used the index is 505, about as bad as it gets when retrieving 500 rows. If we were to retrieve not 500 but 1500 rows, then the cost would effectively triple and be approximately 1500.

However, the cost of the FTS as highlighted in the second query is 933. This is less than 1500 and so the FTS is prefered by the CBO in the second query.

It all comes down to the relative costs and these all come down to the estimated selectivity of the query and the associated segment (and system) statistics, of which the Clustering Factor of the index is one of the most significant factors of all (no pun intended of course). If we effectively triple the estimated costs of a query as we do with the second query, then this can obviously have an impact on the CBO calculations and the resultant execution plan.

If we were to rewrite the first query to be equivalent to using the BETWEEN:

SQL> select * from bowie where hist_date >= '01-JAN-2011' and hist_date <= '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

then we now get the same FTS costings and execution plan.

The moral of this story is that using a BETWEEN for a small range can significantly increase the cost of the query. Something to be aware of if perhaps close enough is good enough.

Why Is My Index Not Being Used No. 2 Quiz (Quicksand) October 3, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
34 comments

I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column.
 

SQL> create table bowie (id number, hist_date date, text varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, sysdate-mod(rownum, 2000), 'BOWIE' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_date_i on bowie(hist_date);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
 
 
OK, I now select 1 day’s worth of data:

 
 

SQL> select * from bowie where hist_date > '01-JAN-2011' and hist_date < '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 690852991
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   500 |  9500 |   505   (0)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   500 |  9500 |   505   (0)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        506  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
 
Everything is perfect. The index is used and the cardinality estimate is spot on with the CBO correctly predicting that 500 rows will be returned.
 
 
OK, I now re-write the query with a BETWEEN clause:
 
 

SQL> select * from bowie where hist_date between '01-JAN-2011' and '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
And now everything has gone wrong. I’m still getting the same 500 rows but the CBO is choosing an inefficient FTS. The estimates are now way way off, with the CBO expecting 1500, not 500 rows to be returned.
 
 
QUESTION: Why has everything now gone so terribly wrong ?

Why Is My Index Not Being Used Solution (Eclipse) October 1, 2011

Posted by Richard Foote in ASSM, CBO, Clustering Factor, Oracle Indexes, Quiz.
1 comment so far

Well done to everyone that got the correct answer :)

Indeed, the subtle but significant difference between the two demos was that demo one created the table in a tablespace called USER_DATA with manual segment space management (with freelists/freelist groups set to 1), while demo two created the table in a tablespace called USER_DATA1 with automatic segment space management.

In the first demo, the 3 separate sessions all followed the same freelist and inserted their rows concurrently into the same table blocks, resulting in the table being effectively sorted in ID order.

If we look at the resultant Clustering Factor:

SQL> select num_rows, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_ID_I';
NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    300000        1452              2171

We notice the Clustering Factor of 2171 is relatively low for an index with 300000 rows, as indeed the order of the rows in the table almost exactly matches the order of the index entries.

In the second demo, ASSM ensures the 3 separate transactions don’t cause contention and insert their rows in a different set of blocks from each other. This is good in that contention is reduced but has the nasty side-effect on now having the resultant rows scattered randomly between different sets of 3 varying blocks. The actual Clustering Factor isn’t particularly bad in that Oracle has to now visit 3 different blocks for a range of values that previously might have been co-located within the 1 block, but because of the manner of which the Clustering Factor is calculated and that it will increase even if forced to visit a block it had just visited a couple of I/O calls beforehand, the calculated Clustering Factor can be appalling.

If we look at the Clustering Factor of the index from the second demo:

SQL> select num_rows, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_ID_I';
 
NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    300000        1573            271936

We notice the Clustering Factor is now terrible at 271936. It’s a classic example of a table with the data that is relatively well clustered but has an appalling Clustering Factor. If Oracle didn’t increment the Clustering Factor for a block it had only visited a couple of index entries previously, then it would likely have a similar Clustering Factor to the first demo.

But statistics collection doesn’t take this into consideration, it will increment the Clustering Factor even if the block had only just recently been visited (only if it’s the same table block as the previous index entry will the Clustering Factor not increment during stats collection), so hence the terrible Clustering Factor and hence the dramatic difference in how the index is now considered, costed and used by the CBO.

The moral of this story is that if you use ASSM or you use mutliple Freelists/Freelist Groups to avoid contention, seriously consider the impact of the Clustering Factor on indexed columns that would ordinarily have a good Clustering Factor and the impact this in turn may have on your resultant execution plans …

Why Is My Index Not Being Used Quiz (Brain Damage) September 28, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
9 comments

This one is a little different as it comes in the form of a demo (and about 1 minute to read) so you have to work a little :)
 
I create table, index and sequence:
 

SQL> create table bowie (id number, name varchar2(30)) tablespace user_data;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create sequence bowie_id order;
 
Sequence created.

 
I then create a little procedure that simply adds 100,000 rows to the table:
 

SQL> create or replace procedure add_bowie_rows as
  2  begin
  3  for i in 1..100000 loop
  4  insert into bowie values (bowie_id.nextval, 'DAVID BOWIE');
  5  commit;
  6  end loop;
  7  end;
  8  /
 
Procedure created.

 
I then have 3 different sessions that run the procedure simultaneously (eg. exec add_bowie_rows).
 
I collect 100% accurate stats:
 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
I run a query that selects about 10,000 rows (out of the 300,000 the table now has):
 

SQL> select * from bowie where id between 42000 and 52000;
 
10001 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 131801496
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            | 10002 |   166K|   125   (1)|00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      | 10002 |   166K|   125   (1)|00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I | 10002 |       |    51   (0)|00:00:01 |
------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=42000 AND "ID"<=52000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        129  consistent gets
         44  physical reads
          0  redo size
     100270  bytes sent via SQL*Net to client
        264  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10001  rows processed

 
 
Oracle uses the index and all is well.
 
OK, in exactly the same database, I repeat the demo again with the same 3 sessions populating the data in exactly the same way using exactly the same procedure with 100% accurate statistics, but there’s just one tiny little difference in the setup script:
 

SQL> drop table bowie;
 
Table dropped.
 
SQL> drop sequence bowie_id;
 
Sequence dropped.
 
SQL> create table bowie (id number, name varchar2(30)) tablespace user_data1;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create sequence bowie_id order;
 
Sequence created.

 
 
I next populate the table in 3 different sessions concurrently and collect stats exactly as before…

However, now when I run my query:
 
 

SQL> select * from bowie where id between 42000 and 52000;
 
10001 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 4157583811
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 10002 |   166K|   285   (4)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| BOWIE | 10002 |   166K|   285   (4)| 00:00:04 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("ID"<=52000 AND "ID">=42000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        998  consistent gets
        693  physical reads
          0  redo size
     100270  bytes sent via SQL*Net to client
        264  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10001  rows processed

 
It performs a FTS ??? Note, the cardinality estimate of 1002 is practically spot on and identical to previously when the index was used by the CBO.

Instead of selecting 10,000 rows, if I now select say just 500 rows:
 

SQL> select * from bowie where id between 42000 and 42499;
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 4157583811
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   501 |  8517 |   284   (4)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   501 |  8517 |   284   (4)| 00:00:04 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("ID"<=42499 AND "ID">=42000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        997  consistent gets
          0  physical reads
          0  redo size
       5263  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
It still performs a FTS !!
 
Only when I get down to a really low number of rows, for example 100 rows:
 

SQL> select * from bowie where id between 42000 and 42099;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 131801496
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   101 |  1717 |    95   (0)|00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   101 |  1717 |    95   (0)|00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |   101 |       |     3   (0)|00:00:01 |
------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=42000 AND "ID"<=42099)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
       1266  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
Will Oracle use the index.
  
QUESTION: Why, what is the tiny little difference that has made such a huge difference in behaviour ???
 
Now there are a couple of possible answers (at least) that come to mind …

Enjoy !!

Rebuilding Indexes and the Clustering Factor Solution (Move On) September 25, 2011

Posted by Richard Foote in Clustering Factor, Index Rebuild, Indexing Myth, Oracle Indexes, Quiz, Reverse Key Indexes.
1 comment so far

Excellent !! This quiz created quite a bit of debate and it was nice to sit back and read some interesting discussions.

The Clustering Factor is basically the measurement of how well aligned the data in the underlining table is in relation to the index and is the number of table related I/Os required to read the entire table via a full index scan. A value of the CF approaching the number of blocks in the table suggests the data is reasonably well sorted/clustered in relation to the index (although the CF could in theory be somewhat less than the blocks in the table of course). A value of the CF approaching the number of index entries suggests the data is not particularly well sorted/clustered in relation to the index and means we may need to re-visit the same table block numerous times to get the required data, thus decreasing the efficiency of using the index, increasing the costs associated with using the index and therefore decreasing the likelihood of the CBO using the index.

So for an index rebuild to actual have an impact on the CF on an index, means either the rows in the table needs to change or the order of the index entries needs to change.

However, when we typically rebuild an index, it has no impact at all on the table and so can’t possibly change the order of the rows there. Additionally, no matter how fragmented or inefficient the index structure might be, an index rebuild doesn’t change the order of the index entries either as they’re always sorted within the index in the order of the indexed columns.

Therefore an index rebuild typically has no impact at all on the CF of an index, no matter the current value of the CF.

However, there is an exception to this rule.

If we rebuild the index and change it from a NOREVERSE index to a REVERSE index, we now do change the order of the index. Significantly so, as the index entries are now in the order of the index column values when reversed. Therefore this can in turn significantly change the CF of an index.

Conversely, if we rebuild an index and change it from REVERSE to NOREVERSE, we likewise significantly change the order of the index entries and hence the value of the CF.

For a nice little demo, see David Aldridge’s comment or my previous discussion on Reverse Key Indexes.

Of course, it’s always nice to see new ideas and something I hadn’t considered was Gary Myer’s comment regarding changing the logic behind a Function-Based Index prior to a rebuild …

So the moral of this story is that no matter how poorly fragmented the index, how high or low the current CF of an index might be, rebuilding an index in order to improve the CF is a futile exercise and will change less than diddly-squat, except in the above mentioned special circumstances.

Now, back to another hearing of Pink Floyd’s masterpiece “The Dark Side of the Moon” in all its surround sound glory :)

Rebuilding Indexes and the Clustering Factor Quiz (One Of The Few) September 20, 2011

Posted by Richard Foote in Clustering Factor, Index Rebuild, Oracle Indexes, Quiz.
35 comments

Today’s question has been prompted by various recent comments regarding the Clustering Factor (CF) of an index and how to change the CF requires a reorg of the underlining table.

It used to be quite a common myth that if the CF of an index was greater that “X” or based on some nonsensical formula the CF was greater than “Y”, then rebuilding the index somehow made everything better again. I believe it’s now much more commonly accepted that rebuilding an index does not change the CF of an index in any way. By rebuilding an index, the actual order of the index entries remains unchanged as does the order of the rows within the table and so the resultant CF can’t possibly change.

Pick any index, no matter how bad the CF or badly fragmented the index or table, take fresh statistics and after rebuilding the index (and fresh statistics if compute statistics on the index isn’t performed), the CF will remain the same.

However, there are nearly always exceptions …

Give an example of when rebuilding an index will significantly change the actual CF of an index, assuming 100% accurate statistics are gathered before/after the index rebuild on a table with no DML.

There are actually two such examples that spring to mind :)

Big Tables, Sorts and Indexes Solution (Right On Mother) September 19, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
42 comments

My, what a clever lot we have reading this blog :)

Indeed, most of the work has already been done for me as we already have a couple of excellent demos in the list of comments.

The answer is Yes, the CBO will consider using the index and Yes, there may be a number of scenarios when the index will actually be selected by the CBO as it’s the genuinely cheaper option.

At the end of the day, it comes down to whether performing a Full Index Scan, one little block at a time but with the data coming out pre-sorted is cheaper than performing a multiblock Full Table Scan AND a subsequent sort of the data. It of course depends on various factors such as:

  • The most crucial of all, the Clustering Factor of the index. If the data is very well clustered according to the index, then the cost of reading the entire table via the index can be vastly reduced and the index access path becomes viable. A poor (or average) CF, and using the index is just too expensive. Radoslav Golian has an excellent example in the comments on when an index with an excellent CF is chosen by the CBO.
  • The cost of the sort. Sorts are really expensive, especially if Oracle is forced to go to disk and even more so if it has to perform a multi pass sort, so the more costly the sort, the more likely the index is the cheaper option.

An important point for the index to be considered is that it must have a NOT NULL constraint on the column(s), else the index is ignored as the CBO can’t guarantee all rows can referenced within the index.

The moral of this story is this. There is no selectivity by which an index is not considered by the CBO. An index can potentially select 100% of all rows, if doing so is the cheapest option available to the CBO.

I’ve discussed using an index to select 100% of all data before if anyone is interested.

New question coming soon !!

Big Tables, Sorts and Indexes Quiz (Candidate) September 14, 2011

Posted by Richard Foote in Oracle Indexes, Quiz.
27 comments

Following on from the previous quiz on Descending indexes. Simple scenario.

You have a huge table, 10 Million plus rows.

You have an index on a column with a NOT NULL constraint but there are various other columns in the table not included in the index.

You want to select all columns and all rows from the table so you don’t even have a  WHERE condition, but you want the data returned in the order of your indexed column, e.g.

SELECT * FROM really_big_table ORDER BY id;

So you have an index on the ID column.

Question: Is it possible for the CBO to use the index or for the CBO to even consider the index in the first place ? If so, how so ? If not, why not ?

Enjoy :)

 

UPDATE:

OK, all statistics, including system statistics need to be “accurate” and give the CBO correct details on the scenario (so no cheating with manipulating statistics).

No hints allowed and it’s the  ALL_ROWS optimizer_mode, as we want the best solution to retrieve every row from the 10M+ table. Again note, there is no WHERE condition in the query.

If it helps, the index can be unique and there may be as little as just 1 additional column in the table.

Follow

Get every new post delivered to your Inbox.

Join 2,042 other followers