jump to navigation

Merry Christmas (plus long lost “The Jean Genie”) December 24, 2011

Posted by Richard Foote in Christmas.
9 comments

I just want to wish you all a very Merry Christmas and a very happy, safe and successful New Year.

As always, thank you all so much for supporting the blog, with all your comments and nice feedback. Hopefully, you’ve found content here that has been of some use. Now that I’m back at Oracle Corporation, 2012 should hopefully be an exciting year with lots to talk and blog about.

Here’s my Christmas present …

Described as an early Christmas present for all David Bowie fans world-wide, footage of the great man performing his classic hit “The Jean Genie” on England’s “Top of the Pops” has recently re-surfaced. Thought long-lost (as the BBC had a tendency to simply delete their tapes), this was originally recorded way back at the time Bowie was working on the “Aladdin Sane” album, on 3rd January 1973 and broadcast the next day on 4th January 1973. It basically hasn’t been seen since until a cameraman on the show recently told the world he had a copy of the tapes in his shed !!

Now, nearly 40 years on, it’s still a great song, a great band and a great performance.

Enjoy :)

Back At Oracle Corporation (The Return Of The Thin White Duke) November 30, 2011

Posted by Richard Foote in Richard's Musings.
30 comments

I’ve been very quiet on the blogging front lately, so apologies for that.

One of the main reasons is that after a period of nearly 10 years, I’ve decided to take up a position back at Oracle Corporation as a “Principal Solutions Consultant” in sunny Canberra. So things have been rather hectic, finishing up in my previous job and starting up afresh at Oracle. Things have certainly changed a lot over this time, with new offices and a much larger workforce than worked in Canberra previously. Some things haven’t changed such as I’ve still got my old email address and the same employee number (2020) as before. I’ve got that feeling of being really excited again at work which I’ve missed recently so I’m really looking forward to my many new adventures to come at Oracle.

The good news is that I’ve been given the green light to continue on with my blogging activities so expect action on that front here again soon. I’ll still be keeping it technical (unless there’s a new David Bowie album or Pink Floyd Box-set to discuss of course) and I’ll still be talking mainly about indexes as there’s heaps that I’ll still like to cover such as Secondary Indexes on IOTs as promised previously, as well as looking at more “exotic” indexes such as those for Spatial, XML, Oracle Text etc. There are also a few “rumours” regarding not needing indexes with Exadata which might potentially become full-blown myths if I don’t do something about it :)

I’ll also get back to answering the various questions that have been building up over the past few weeks.

So stay tuned !!

Why Are My Indexes Still Valid Solution (A Second Face) October 20, 2011

Posted by Richard Foote in IOT, Oracle Indexes, Quiz, Secondary Indexes.
add a comment

I’ve been so busy lately, I just haven’t had any spare time to post.

For now, the quick answer to the last quiz is that the second table was indeed an Index Organized Table (IOT).

One of the nice benefits of an IOT is that when re-organised, unlike a Heap Table, all indexes remain valid, even the Secondary Indexes. I’ll explain why in my next post in the next few days. I’ll also explain why secondary indexes are one of the main disadvantages with IOTs as well.

Stay tuned !!

Why Are My Indexes Still Valid Quiz ? (Move On) October 11, 2011

Posted by Richard Foote in Index Organized Tables, Oracle Indexes, Quiz.
6 comments

OK, this quiz is a nice easy one, the lads at work got this without too much trouble. 
 

Normally, when you MOVE (re-org) a table, all the associated indexes become Unusable. As below:
 

SQL> create table pink_floyd (id number constraint pf_pk primary key, code number, name varchar2(30));
 
Table created.
 
SQL> create index pf_code_i on pink_floyd(code);
 
Index created.
 
SQL> insert into pink_floyd select rownum, mod(rownum,100), 'The Dark Side Of The Moon' from dual connect by level <= 10000;
 
10000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
PF_PK                               10000 VALID
PF_CODE_I                           10000 VALID
 
SQL> alter table pink_floyd move;
 
Table altered.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'PINK_FLOYD';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
PF_PK                               10000 UNUSABLE
PF_CODE_I                           10000 UNUSABLE

 
 
 
So the indexes are now all unusable ..
 
 
However, I previously created another table called BOWIE that has exactly the same columns, indexes and data but when I MOVE this table:
 

SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
BOWIE_PK                            10000 VALID
BOWIE_CODE_I                        10000 VALID
 
SQL> alter table bowie move;
 
Table altered.
 
SQL> select index_name, num_rows, status from dba_indexes where table_name = 'BOWIE';
 
INDEX_NAME                       NUM_ROWS STATUS
------------------------------ ---------- --------
BOWIE_PK                            10000 VALID
BOWIE_CODE_I                        10000 VALID

 
All the indexes remain VALID !!
 
What’s so different about this table and their indexes ??????

I plan to discuss this whole topic (finally) in more detail in the coming weeks …

Why Is My Index Not Being Used No. 2 Solution (The Narrow Way) October 6, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
3 comments

As many have identified, the first thing to point out is that the two queries are not exactly equivalent.

The BETWEEN clause is equivalent to a ‘>= and <=’ predicate, whereas the original query only had a ‘> and <’ predicate. The additional equal conditions at each end is significant.

The selectivity of the original query is basically costed as  (max condition – min condition)/(max column value – min column value). As this equates to 1 day/2000 days, the selectivity of the first query is therefore 0.0005.

1M rows multiplied by 0.0005 = 500 rows, the CBO cardinality estimate.

Note that 0.0005 is also the selectivity of 1 day.

The  calculation for the BETWEEN clause is different. It’s the above formula PLUS the selectivity of 1 day for the each of the two equal conditions (as the CBO assumes you not only want the values within the range but the specific values on each side of the range).

So that’s a selectivity of 0.0005 for the date range as above plus 2 x 0.0005 for each of the equal conditions = 0.0005 + 0.001 = 0.0015.

1M muliplied by 0.0015 = 1500, the CBO cardinality estimate of the second query.

As many have mentioned in the comments, the Clustering Factor of this index is awful, as each of the distinct 500 occurences of each day is spread evenly throughout the whole table. It’s therefore very expensive to use this index to retrieve a “larger” number of rows.

The cost of the first query that used the index is 505, about as bad as it gets when retrieving 500 rows. If we were to retrieve not 500 but 1500 rows, then the cost would effectively triple and be approximately 1500.

However, the cost of the FTS as highlighted in the second query is 933. This is less than 1500 and so the FTS is prefered by the CBO in the second query.

It all comes down to the relative costs and these all come down to the estimated selectivity of the query and the associated segment (and system) statistics, of which the Clustering Factor of the index is one of the most significant factors of all (no pun intended of course). If we effectively triple the estimated costs of a query as we do with the second query, then this can obviously have an impact on the CBO calculations and the resultant execution plan.

If we were to rewrite the first query to be equivalent to using the BETWEEN:

SQL> select * from bowie where hist_date >= '01-JAN-2011' and hist_date <= '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

then we now get the same FTS costings and execution plan.

The moral of this story is that using a BETWEEN for a small range can significantly increase the cost of the query. Something to be aware of if perhaps close enough is good enough.

Why Is My Index Not Being Used No. 2 Quiz (Quicksand) October 3, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
34 comments

I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column.
 

SQL> create table bowie (id number, hist_date date, text varchar2(30));
 
Table created.
 
SQL> insert into bowie select rownum, sysdate-mod(rownum, 2000), 'BOWIE' from dual connect by level <= 1000000;
 
1000000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index bowie_date_i on bowie(hist_date);
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
 
 
OK, I now select 1 day’s worth of data:

 
 

SQL> select * from bowie where hist_date > '01-JAN-2011' and hist_date < '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 690852991
 
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |   500 |  9500 |   505   (0)| 00:00:07 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE        |   500 |  9500 |   505   (0)| 00:00:07 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_DATE_I |   500 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("HIST_DATE">TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "HIST_DATE"<TO_DATE(' 2011-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        506  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
 
Everything is perfect. The index is used and the cardinality estimate is spot on with the CBO correctly predicting that 500 rows will be returned.
 
 
OK, I now re-write the query with a BETWEEN clause:
 
 

SQL> select * from bowie where hist_date between '01-JAN-2011' and '02-JAN-2011';
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  1500 | 28500 |   933   (2)| 00:00:12 |
|*  1 |  TABLE ACCESS FULL| BOWIE |  1500 | 28500 |   933   (2)| 00:00:12 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("HIST_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "HIST_DATE"<=TO_DATE(' 2011-01-02 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3344  consistent gets
          0  physical reads
          0  redo size
       5563  bytes sent via SQL*Net to client
        395  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
 
And now everything has gone wrong. I’m still getting the same 500 rows but the CBO is choosing an inefficient FTS. The estimates are now way way off, with the CBO expecting 1500, not 500 rows to be returned.
 
 
QUESTION: Why has everything now gone so terribly wrong ?

Why Is My Index Not Being Used Solution (Eclipse) October 1, 2011

Posted by Richard Foote in ASSM, CBO, Clustering Factor, Oracle Indexes, Quiz.
1 comment so far

Well done to everyone that got the correct answer :)

Indeed, the subtle but significant difference between the two demos was that demo one created the table in a tablespace called USER_DATA with manual segment space management (with freelists/freelist groups set to 1), while demo two created the table in a tablespace called USER_DATA1 with automatic segment space management.

In the first demo, the 3 separate sessions all followed the same freelist and inserted their rows concurrently into the same table blocks, resulting in the table being effectively sorted in ID order.

If we look at the resultant Clustering Factor:

SQL> select num_rows, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_ID_I';
NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    300000        1452              2171

We notice the Clustering Factor of 2171 is relatively low for an index with 300000 rows, as indeed the order of the rows in the table almost exactly matches the order of the index entries.

In the second demo, ASSM ensures the 3 separate transactions don’t cause contention and insert their rows in a different set of blocks from each other. This is good in that contention is reduced but has the nasty side-effect on now having the resultant rows scattered randomly between different sets of 3 varying blocks. The actual Clustering Factor isn’t particularly bad in that Oracle has to now visit 3 different blocks for a range of values that previously might have been co-located within the 1 block, but because of the manner of which the Clustering Factor is calculated and that it will increase even if forced to visit a block it had just visited a couple of I/O calls beforehand, the calculated Clustering Factor can be appalling.

If we look at the Clustering Factor of the index from the second demo:

SQL> select num_rows, leaf_blocks, clustering_factor from dba_indexes where index_name = 'BOWIE_ID_I';
 
NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
    300000        1573            271936

We notice the Clustering Factor is now terrible at 271936. It’s a classic example of a table with the data that is relatively well clustered but has an appalling Clustering Factor. If Oracle didn’t increment the Clustering Factor for a block it had only visited a couple of index entries previously, then it would likely have a similar Clustering Factor to the first demo.

But statistics collection doesn’t take this into consideration, it will increment the Clustering Factor even if the block had only just recently been visited (only if it’s the same table block as the previous index entry will the Clustering Factor not increment during stats collection), so hence the terrible Clustering Factor and hence the dramatic difference in how the index is now considered, costed and used by the CBO.

The moral of this story is that if you use ASSM or you use mutliple Freelists/Freelist Groups to avoid contention, seriously consider the impact of the Clustering Factor on indexed columns that would ordinarily have a good Clustering Factor and the impact this in turn may have on your resultant execution plans …

Why Is My Index Not Being Used Quiz (Brain Damage) September 28, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
9 comments

This one is a little different as it comes in the form of a demo (and about 1 minute to read) so you have to work a little :)
 
I create table, index and sequence:
 

SQL> create table bowie (id number, name varchar2(30)) tablespace user_data;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create sequence bowie_id order;
 
Sequence created.

 
I then create a little procedure that simply adds 100,000 rows to the table:
 

SQL> create or replace procedure add_bowie_rows as
  2  begin
  3  for i in 1..100000 loop
  4  insert into bowie values (bowie_id.nextval, 'DAVID BOWIE');
  5  commit;
  6  end loop;
  7  end;
  8  /
 
Procedure created.

 
I then have 3 different sessions that run the procedure simultaneously (eg. exec add_bowie_rows).
 
I collect 100% accurate stats:
 

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');
 
PL/SQL procedure successfully completed.

 
I run a query that selects about 10,000 rows (out of the 300,000 the table now has):
 

SQL> select * from bowie where id between 42000 and 52000;
 
10001 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 131801496
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            | 10002 |   166K|   125   (1)|00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      | 10002 |   166K|   125   (1)|00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I | 10002 |       |    51   (0)|00:00:01 |
------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=42000 AND "ID"<=52000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        129  consistent gets
         44  physical reads
          0  redo size
     100270  bytes sent via SQL*Net to client
        264  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10001  rows processed

 
 
Oracle uses the index and all is well.
 
OK, in exactly the same database, I repeat the demo again with the same 3 sessions populating the data in exactly the same way using exactly the same procedure with 100% accurate statistics, but there’s just one tiny little difference in the setup script:
 

SQL> drop table bowie;
 
Table dropped.
 
SQL> drop sequence bowie_id;
 
Sequence dropped.
 
SQL> create table bowie (id number, name varchar2(30)) tablespace user_data1;
 
Table created.
 
SQL> create index bowie_id_i on bowie(id);
 
Index created.
 
SQL> create sequence bowie_id order;
 
Sequence created.

 
 
I next populate the table in 3 different sessions concurrently and collect stats exactly as before…

However, now when I run my query:
 
 

SQL> select * from bowie where id between 42000 and 52000;
 
10001 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 4157583811
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 10002 |   166K|   285   (4)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| BOWIE | 10002 |   166K|   285   (4)| 00:00:04 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("ID"<=52000 AND "ID">=42000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        998  consistent gets
        693  physical reads
          0  redo size
     100270  bytes sent via SQL*Net to client
        264  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      10001  rows processed

 
It performs a FTS ??? Note, the cardinality estimate of 1002 is practically spot on and identical to previously when the index was used by the CBO.

Instead of selecting 10,000 rows, if I now select say just 500 rows:
 

SQL> select * from bowie where id between 42000 and 42499;
 
500 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 4157583811
 
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |   501 |  8517 |   284   (4)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| BOWIE |   501 |  8517 |   284   (4)| 00:00:04 |
---------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("ID"<=42499 AND "ID">=42000)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        997  consistent gets
          0  physical reads
          0  redo size
       5263  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        500  rows processed

 
It still performs a FTS !!
 
Only when I get down to a really low number of rows, for example 100 rows:
 

SQL> select * from bowie where id between 42000 and 42099;
 
100 rows selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 131801496
 
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   101 |  1717 |    95   (0)|00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   101 |  1717 |    95   (0)|00:00:02 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |   101 |       |     3   (0)|00:00:01 |
------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=42000 AND "ID"<=42099)
 

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
       1266  bytes sent via SQL*Net to client
        248  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

 
Will Oracle use the index.
  
QUESTION: Why, what is the tiny little difference that has made such a huge difference in behaviour ???
 
Now there are a couple of possible answers (at least) that come to mind …

Enjoy !!

Rebuilding Indexes and the Clustering Factor Solution (Move On) September 25, 2011

Posted by Richard Foote in Clustering Factor, Index Rebuild, Indexing Myth, Oracle Indexes, Quiz, Reverse Key Indexes.
1 comment so far

Excellent !! This quiz created quite a bit of debate and it was nice to sit back and read some interesting discussions.

The Clustering Factor is basically the measurement of how well aligned the data in the underlining table is in relation to the index and is the number of table related I/Os required to read the entire table via a full index scan. A value of the CF approaching the number of blocks in the table suggests the data is reasonably well sorted/clustered in relation to the index (although the CF could in theory be somewhat less than the blocks in the table of course). A value of the CF approaching the number of index entries suggests the data is not particularly well sorted/clustered in relation to the index and means we may need to re-visit the same table block numerous times to get the required data, thus decreasing the efficiency of using the index, increasing the costs associated with using the index and therefore decreasing the likelihood of the CBO using the index.

So for an index rebuild to actual have an impact on the CF on an index, means either the rows in the table needs to change or the order of the index entries needs to change.

However, when we typically rebuild an index, it has no impact at all on the table and so can’t possibly change the order of the rows there. Additionally, no matter how fragmented or inefficient the index structure might be, an index rebuild doesn’t change the order of the index entries either as they’re always sorted within the index in the order of the indexed columns.

Therefore an index rebuild typically has no impact at all on the CF of an index, no matter the current value of the CF.

However, there is an exception to this rule.

If we rebuild the index and change it from a NOREVERSE index to a REVERSE index, we now do change the order of the index. Significantly so, as the index entries are now in the order of the index column values when reversed. Therefore this can in turn significantly change the CF of an index.

Conversely, if we rebuild an index and change it from REVERSE to NOREVERSE, we likewise significantly change the order of the index entries and hence the value of the CF.

For a nice little demo, see David Aldridge’s comment or my previous discussion on Reverse Key Indexes.

Of course, it’s always nice to see new ideas and something I hadn’t considered was Gary Myer’s comment regarding changing the logic behind a Function-Based Index prior to a rebuild …

So the moral of this story is that no matter how poorly fragmented the index, how high or low the current CF of an index might be, rebuilding an index in order to improve the CF is a futile exercise and will change less than diddly-squat, except in the above mentioned special circumstances.

Now, back to another hearing of Pink Floyd’s masterpiece “The Dark Side of the Moon” in all its surround sound glory :)

Rebuilding Indexes and the Clustering Factor Quiz (One Of The Few) September 20, 2011

Posted by Richard Foote in Clustering Factor, Index Rebuild, Oracle Indexes, Quiz.
35 comments

Today’s question has been prompted by various recent comments regarding the Clustering Factor (CF) of an index and how to change the CF requires a reorg of the underlining table.

It used to be quite a common myth that if the CF of an index was greater that “X” or based on some nonsensical formula the CF was greater than “Y”, then rebuilding the index somehow made everything better again. I believe it’s now much more commonly accepted that rebuilding an index does not change the CF of an index in any way. By rebuilding an index, the actual order of the index entries remains unchanged as does the order of the rows within the table and so the resultant CF can’t possibly change.

Pick any index, no matter how bad the CF or badly fragmented the index or table, take fresh statistics and after rebuilding the index (and fresh statistics if compute statistics on the index isn’t performed), the CF will remain the same.

However, there are nearly always exceptions …

Give an example of when rebuilding an index will significantly change the actual CF of an index, assuming 100% accurate statistics are gathered before/after the index rebuild on a table with no DML.

There are actually two such examples that spring to mind :)

Big Tables, Sorts and Indexes Solution (Right On Mother) September 19, 2011

Posted by Richard Foote in CBO, Oracle Indexes, Quiz.
42 comments

My, what a clever lot we have reading this blog :)

Indeed, most of the work has already been done for me as we already have a couple of excellent demos in the list of comments.

The answer is Yes, the CBO will consider using the index and Yes, there may be a number of scenarios when the index will actually be selected by the CBO as it’s the genuinely cheaper option.

At the end of the day, it comes down to whether performing a Full Index Scan, one little block at a time but with the data coming out pre-sorted is cheaper than performing a multiblock Full Table Scan AND a subsequent sort of the data. It of course depends on various factors such as:

  • The most crucial of all, the Clustering Factor of the index. If the data is very well clustered according to the index, then the cost of reading the entire table via the index can be vastly reduced and the index access path becomes viable. A poor (or average) CF, and using the index is just too expensive. Radoslav Golian has an excellent example in the comments on when an index with an excellent CF is chosen by the CBO.
  • The cost of the sort. Sorts are really expensive, especially if Oracle is forced to go to disk and even more so if it has to perform a multi pass sort, so the more costly the sort, the more likely the index is the cheaper option.

An important point for the index to be considered is that it must have a NOT NULL constraint on the column(s), else the index is ignored as the CBO can’t guarantee all rows can referenced within the index.

The moral of this story is this. There is no selectivity by which an index is not considered by the CBO. An index can potentially select 100% of all rows, if doing so is the cheapest option available to the CBO.

I’ve discussed using an index to select 100% of all data before if anyone is interested.

New question coming soon !!

Big Tables, Sorts and Indexes Quiz (Candidate) September 14, 2011

Posted by Richard Foote in Oracle Indexes, Quiz.
27 comments

Following on from the previous quiz on Descending indexes. Simple scenario.

You have a huge table, 10 Million plus rows.

You have an index on a column with a NOT NULL constraint but there are various other columns in the table not included in the index.

You want to select all columns and all rows from the table so you don’t even have a  WHERE condition, but you want the data returned in the order of your indexed column, e.g.

SELECT * FROM really_big_table ORDER BY id;

So you have an index on the ID column.

Question: Is it possible for the CBO to use the index or for the CBO to even consider the index in the first place ? If so, how so ? If not, why not ?

Enjoy :)

 

UPDATE:

OK, all statistics, including system statistics need to be “accurate” and give the CBO correct details on the scenario (so no cheating with manipulating statistics).

No hints allowed and it’s the  ALL_ROWS optimizer_mode, as we want the best solution to retrieve every row from the 10M+ table. Again note, there is no WHERE condition in the query.

If it helps, the index can be unique and there may be as little as just 1 additional column in the table.

Descending Indexes Solution (Yellow Submarine) September 9, 2011

Posted by Richard Foote in Descending Indexes, Oracle Indexes, Quiz.
28 comments

Answers to the quiz on Descending Indexes and hopefully some useful dangers and tips on using them.

The answer to the first question is Yes, a “normal” Ascending Index can be used by the CBO to retrieve data in descending order and hence possibly avoid a sort. The reason being that leaf blocks in the index structure have effectively two pointers, one that points to the next leaf block in the index structure (except for the very last leaf block) and one that points to the previous block (except for the first leaf block). So the data in an index can be retrieved in either order.

The answer to the second question is Yes as well, a Descending Index can also be used to also retrieve data in either logical order as again all the leaf blocks have the two set of pointers.

That being the case, if an index has just the one column value, does it therefore make any difference which index one creates, ascending or descending ?

Hence my last question. The answer is maybe, as there are a number of fundamental differences in how each type of index is implemented.

Naturally, a little demo to illustrate :)

Let’s begin by creating a simple little table and a normal B-Tree index on an ID column, which has monotonically increasing values:

SQL> create table bowie (id number, name varchar2(30));

Table created.

SQL> create index bowie_id_i on bowie(id);

Index created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

Note the index is indeed a “Normal” B-Tree index and because the indexed values monotonically increase, all index leaf block splits are 90-10 splits resulting a perfectly compact, 100% utilised index structure:

SQL> select index_type from dba_indexes where index_name = 'BOWIE_ID_I';

INDEX_TYPE
---------------------------
NORMAL

SQL> analyze index bowie_id_i validate structure;

Index analyzed.

SQL> select lf_rows, lf_blks, pct_used from index_stats;

   LF_ROWS    LF_BLKS   PCT_USED
---------- ---------- ----------
    100000        199        100

Let’s now run a query to ensure the index is indeed used and that the sort can indeed be avoided. Note I’ve not actually collected any CBO statistics at this stage but I’m definitely using the CBO:

SQL> alter system set optimizer_mode='ALL_ROWS' scope=both;

System altered.

SQL> select * from bowie where id between 42 and 84 order by id desc;

43 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2771731789

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |    43 |  1290 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | BOWIE      |    43 |  1290 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN DESCENDING| BOWIE_ID_I |    43 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=42 AND "ID"<=84)
       filter("ID">=42 AND "ID"<=84)

Note
-----
   - dynamic sampling used for this statement
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          9  consistent gets
          0  physical reads
          0  redo size
       1092  bytes sent via SQL*Net to client
        418  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         43  rows processed

So the execution plan clearly shows the use of the index via an index range scan descending and that there are indeed no sort operations performed. There were no statistics gathered, so the CBO performed some dynamic sampling to determine a taste for the data.

Let’s now change the optimizer_mode to CHOOSE, a common default setting (especially pre 11g, this example is run on a 10.2.0.4 database) and re-run the query:

SQL> select * from bowie where id between 42 and 84 order by id desc;

43 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3062669298

---------------------------------------------------
| Id  | Operation                    | Name       |
---------------------------------------------------
|   0 | SELECT STATEMENT             |            |
|   1 |  SORT ORDER BY               |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| BOWIE      |
|*  3 |    INDEX RANGE SCAN          | BOWIE_ID_I |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">=42 AND "ID"<=84)

Note
-----
   - rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1092  bytes sent via SQL*Net to client
        418  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         43  rows processed

No statistics on the table now means the Rule Based Optimizer kicks in and although a sort operation is performed (as there’s no descending scan), Oracle at least used the index.

OK, let’s now run the exact same sequence of events, but this time using a Descending Index.

SQL> drop table bowie;

Table dropped.

SQL> create table bowie (id number, name varchar2(30));

Table created.

SQL> create index bowie_id_i on bowie(id desc);

Index created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

So it’s the exact same table and set of data. Let’s now look at the type of index created:

SQL> select index_type from dba_indexes where index_name = 'BOWIE_ID_I';

INDEX_TYPE
---------------------------
FUNCTION-BASED NORMAL

OK, Difference Number 1. A Descending Index is no ordinary “Normal” index, but is implemented as a  “Function-Based Normal” index instead. This means there’ll be a new hidden virtual column created behind the scenes and that the Rule Based Optimizer is going to have an issue here as it can’t cope with Function-based Indexes.

Let’s look at some Index_Stats:

SQL> analyze index bowie_id_i validate structure;

Index analyzed.
SQL> select lf_rows, lf_blks, pct_used from index_stats;

   LF_ROWS    LF_BLKS   PCT_USED
---------- ---------- ----------
    100000        426         50

Difference Number 2: This index is approximately double the size of the previous index and only half as efficient with its storage. Why ? Because as the data is now inserted in reverse logical order, the last index leaf block no longer receives the largest current index value and so 90-10 splits are not performed. As only 50-50 splits are performed, the index structure is left with 50% empty blocks which can not be reused. Unfortunately, a possible candidate for periodic index rebuilds …

Let’s now re-run the query using the CBO:

SQL> select * from bowie where id between 42 and 84 order by id desc;

43 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3472402785

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)|Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   208 |  6240 |     1   (0)|00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BOWIE      |   208 |  6240 |     1   (0)|00:00:01 |
|*  2 |   INDEX RANGE SCAN          | BOWIE_ID_I |     1 |       |     1   (0)|00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(SYS_OP_DESCEND("ID")>=HEXTORAW('3EAAFF')  AND
              SYS_OP_DESCEND("ID")<=HEXTORAW('3ED4FF') )
       filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("ID"))>=42 AND
              SYS_OP_UNDESCEND(SYS_OP_DESCEND("ID"))<=84)

Note
-----
   - dynamic sampling used for this statement
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          9  consistent gets
          0  physical reads
          0  redo size
       1092  bytes sent via SQL*Net to client
        418  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         43  rows processed

Difference Number 3. Although the same execution plan with the same number of consistent gets is performed, the cardinality estimates are not as accurate and the SYS_OP_DESCEND and SYS_OP_UNDESCEND functions are used as access/filter conditions as they’re the functions implemented in the function-based index.

If we run the same query using the Rule Based Optimizer (remember, we “forgot” to collect statistics on the table):

SQL> alter system set optimizer_mode='CHOOSE' scope=both;

System altered.

SQL> select * from bowie where id between 42 and 84 order by id desc;

43 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2027917145

------------------------------------
| Id  | Operation          | Name  |
------------------------------------
|   0 | SELECT STATEMENT   |       |
|   1 |  SORT ORDER BY     |       |
|*  2 |   TABLE ACCESS FULL| BOWIE |
------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID"<=84 AND "ID">=42)

Note
-----
   - rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        309  consistent gets
          0  physical reads
          0  redo size
       1092  bytes sent via SQL*Net to client
        418  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         43  rows processed

Difference Number 4. The Rule based Optimizer does not support Function-Based Indexes and so the index is now completely ignored. Oracle has no choice here but to perform the much more expensive Full Table Scan, when previously the ascending index was used.

A Descending Index can potentially be useful in a concatenated, multi-column index, in which the columns could be ordered in a combination of ascending/descending order that could in turn return the data in a required specific order, thereby negating the need for a potentially expensive sort operation.

However, with a single column index, one would need to question the need for making such an index descending …

Having fun :) Enjoy your weekend !!

Descending Indexes Quiz (Up On The Ladder) September 8, 2011

Posted by Richard Foote in Descending Indexes, Oracle Indexes, Quiz.
13 comments

OK, you won’t find the answer to these questions on my blog, so using my search facility won’t be of any help :)

Actually, it’s quite an easy one this, honest ;)

If you have a query such as:

SELECT * FROM bowie WHERE id BETWEEN 42 and 84 ORDER BY id DESC;

1) Can a default B-Tree index on the single ID column that stores data in Ascending order be used by the CBO to retrieve the data automatically in the required Descending order, without the need for a sort operation ?

2) Can a Descending B-Tree index on the ID column be used by the CBO to retrieve the data automatically in Ascending order, without the need for a sort operation ?

3) Depending on your answers above, what are the differences (if any) between the implementation of an Ascending and Descending index ?

Enjoy !!

Best Method To Select One Row From Small Table – Solution (Revolution 1) September 7, 2011

Posted by Richard Foote in Oracle Indexes, Small Indexes.
4 comments

OK, time for some answers, although of course regular readers of this blog will already know the answer :)

When selecting one row from the small table as in the quiz in my previous post, the correct order is as follows:

1) PK access of an Index Organized Table.

This option only requires just the 1 consistent get (as the IOT only consists of just 1 block) and this consistent get is a “cheaper” consistent get examination, which in turn only requires the 1 latch. So 1 consistent get and 1 latch. For more info on this, see: http://richardfoote.wordpress.com/2009/05/27/indexes-and-small-tables-part-vii-cluster-one/.

2) Use of Unique Index With a Heap Table.

This option only requires 2 consistent gets (one to read the index leaf block and one to read the table block). There can only be a maximum of one row when a single equality predicate is specified as the index is Unique, which means that 2 consistent gets is the maximum necessary (there may only be the 1 consistent get if there are either no rows to be returned or if all required columns can be found in the index). Additionally, because the index is Unique, both consistent gets are the cheaper, 1 latch consistent get examinations. So that’s 2 consistent gets and 2 latches. For more information, see: http://richardfoote.wordpress.com/2009/05/13/indexes-and-small-tables-part-v-its-no-game/.

3) Use of Non-Unique Index With a Heap Table

This option requires at most 3 consistent gets (one to read the leaf block, one to access the table and possibly one more to perform an additional fetch operation and checking the leaf block again in case of more rows, possibly necessary as the index is Non-Unique and there could be more than one row that matches an equality predicate). Unfortunately, as the index is Non-Unique, the consistent gets are the full-blown consistent gets which requires the buffer block to be pinned/unpinned via 2 latch calls. If the last value is specified, then the additional fetch may be unnecessary and if the index contains all the necessary columns, then just 1 consistent get would be necessary. But in the example provided, we’re looking at typically 3 consistent gets and 6 latches. For more information, see: http://richardfoote.wordpress.com/2009/05/05/indexes-and-small-tables-part-iv-treefingers/.

4) Use of a Full Table Scan

This option requires as a minimum 4 consistent gets (as Oracle needs to not only access the one block containing the 42 rows, but additionally the table segment header multiple times in order to determine the extent map, segment HWM etc.). These additional consistent gets are not generally an issue as these overheads are negligible for a typical FTS of a typical table. But in this example, with a tiny table, these consistent gets makes the difference. Note also that all 4 consistent gets require the block to be pinned/unpinned and so require 2 latch gets each. So that’s 4 consistent gets and 8 latches, minimum, even if all the rows can fit in 1 table block. On its own, no big deal, but if this small lookup table is accessed 10,000 times a minute, that’s potentially a lot of extra CPU and contention when performing a FTS over the above options. For more info, see: http://richardfoote.wordpress.com/2009/04/16/indexes-on-small-tables-part-i-one-of-the-few/ and the other articles on Indexes on Small Tables.

To those that got it right, well done. To those that didn’t, hopefully you’ve learnt something useful.

The moral of the story, no table is too small to potentially benefit from an index, even if the table is only one block in size :)

New question and discussion tomorrow.

Follow

Get every new post delivered to your Inbox.

Join 1,712 other followers