Index Monitoring and Index Statistics (The Great Gig In The Sky) September 16, 2008
Posted by Richard Foote in 11g, Concatenated Indexes, Extended Statistics, Index Monitoring, Index statistics, Oracle Cost Based Optimizer, Oracle Indexes. 8 comments
I write this post whilst listening to Pink Floyd’s masterpiece “The Dark Side Of The Moon” while sadly lamenting the passing away of Richard Wright. RIP and thank you for all the great musical gifts you’ve given me over the years.
In my last post I highlighted an example where Index Monitoring doesn’t show that Oracle does indeed use an index when checking for the existence of Foreign Key values. I thought I might discuss yet another example where Oracle uses an index but it’s again not picked up by index monitoring, this time with a slight 11g flavour.
This specific example involves using the statistics associated with the indexes to provide the CBO with useful additional information, although the index itself is not used directly within the execution plan. Dropping the index means losing this information, which could possibly result in different, non-optimal execution plans.
Prior to 11g, Oracle can have a hard time accurately determining the correct selectivity where there is a correlation between two (or more) columns in a table. By default, Oracle assumes the selectivity of a combination of two columns to be the selectivities (densities) of the individual columns multiplied together. So for example, if one column (say “A”) had 10 distinct values and the other column (say “B”) also had 10 distinct values, Oracle assumes there are 10 x 10 = 100 distinct combinations, giving a combined selectivity of 1/10 x 1/10 = 1/100. A predicate such as:
WHERE A = 5 and B = 2
would therefore be assumed to retrieve 1% of the data if columns A and B each had 10 distinct possible values.
However, what if there’s a special relationship between the columns and the actual number of distinct combinations was somewhat different ? What if B always equals 2 when A equals 5, what if instead of the theoretical 100 different combinations there were only 10 combinations (or some such) because most of the other possible combinations don’t actually exist …
This 9i and 10g demo shows how there are indeed only 10 distinct values for each of two different columns; however, there is a direct relationship between these columns such that there are actually only 10 distinct combinations of both columns (and not the 100 combinations which are possible and which Oracle assumes in its selectivity calculations).
Instead of the actual 10,000 rows (or 10% of all data) being selected, Oracle is incorrectly assuming only 1000 rows (or 1%) will be selected. This is a significant error, out by an order of magnitude, which in many cases can result in a less efficient execution plan.
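A quick way to see the size of this kind of error on your own data is to compare the number of combinations the CBO would assume with the number that actually exist. The sketch below assumes a hypothetical table CORR_DEMO with two correlated columns A and B; these names are illustrative only and are not taken from the demo.

```sql
-- Hypothetical table CORR_DEMO(a, b); the names are assumptions, not from the demo.
-- assumed_combinations: what multiplying the individual column selectivities implies.
-- actual_combinations:  the combinations that really exist in the data.
select s.ndv_a * s.ndv_b as assumed_combinations,
       p.actual_combinations
from   (select count(distinct a) as ndv_a,
               count(distinct b) as ndv_b
        from   corr_demo) s
cross join
       (select count(*) as actual_combinations
        from   (select distinct a, b from corr_demo)) p;
```

For data shaped like the case described above (10 values in each column but only 10 real combinations), the two figures would be 100 and 10, which is the factor of 10 by which the cardinality estimate is out.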
With 11g, Oracle can use the statistics associated with an index to give the CBO some vital extra information. If there’s an index based on the two columns, then the number of distinct key values recorded for the index can provide Oracle with a much more accurate estimate of the true selectivity of the two columns combined. If a concatenated index based on the two columns has only say 10 distinct values, then Oracle can assume that a specific combination of the two columns is likely to retrieve 1/10 of all the rows and not the 1/100 that the theoretical number of combinations would suggest.
This 11g demo, identical to the one above, shows that with an index on the two correlated columns, Oracle uses the DISTINCT_KEYS statistic of the index to determine the correct selectivity and associated cardinality for the query.
However, the demo clearly shows index monitoring is still not showing the index as being “USED”. If you were hence to drop the index, the CBO loses potentially vital information and the cardinality estimates revert to being the product of the two column densities, as with pre 11g.
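The statistics involved can be inspected directly in the data dictionary. The following is a minimal sketch, assuming a hypothetical table CORR_DEMO with a concatenated index CORR_DEMO_AB_I on columns (A, B); the object names are illustrative and not those used in the demo.

```sql
-- Hypothetical names: table CORR_DEMO, concatenated index CORR_DEMO_AB_I on (a, b).

-- Number of distinct key values recorded for the concatenated index.
select index_name, distinct_keys, num_rows
from   user_indexes
where  index_name = 'CORR_DEMO_AB_I';

-- Individual column statistics, for comparison.
select column_name, num_distinct, density
from   user_tab_col_statistics
where  table_name  = 'CORR_DEMO'
and    column_name in ('A', 'B');

-- Monitoring status of the same index
-- (a row appears only once ALTER INDEX ... MONITORING USAGE has been issued).
select index_name, monitoring, used
from   v$object_usage
where  index_name = 'CORR_DEMO_AB_I';
```

If DISTINCT_KEYS is much smaller than the product of the two NUM_DISTINCT values, the columns are correlated and the index statistics are carrying information the column statistics alone do not.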
By dropping the index which appears to not be used, we can potentially impact other execution plans, even though they don’t directly use the index within the execution plan. The correct cardinality estimates of a table can for example potentially drive the order in which the table is subsequently joined or the manner in which it’s joined.
This demo on the possible impact of dropping an “unused” index shows how an execution plan can change to be sub-optimal, even though neither execution plan actually directly uses the associated index. It’s not a particularly “clever” example, but it does illustrate the potential impact of dropping these so called unused indexes.
Of course, with 11g, we now have the capability of collecting extended statistics. We can potentially obtain this same level of statistics by generating statistics on both columns combined. Oracle can determine the actual distinct combinations of columns that are somehow correlated and produce more accurate and detailed statistics, which hence allow the CBO to determine more accurate and reliable cardinality estimates.
This final demo on extended statistics shows how we can recreate the more efficient execution plan and provide the CBO with more detailed extended statistics so that it can accurately determine the correct cardinality estimates without the need to recreate the “unused” index.
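As a rough sketch of what collecting extended statistics on a column group looks like in 11g, assuming a hypothetical table CORR_DEMO with correlated columns A and B (the names are illustrative and not necessarily those in the demo):

```sql
-- Hypothetical table CORR_DEMO with correlated columns A and B.

-- Create the column group; the function returns the system-generated extension name.
select dbms_stats.create_extended_stats(user, 'CORR_DEMO', '(A,B)') as extension_name
from   dual;

-- Regather statistics so the new column group gets its own statistics (SQL*Plus syntax).
exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'CORR_DEMO', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

-- Confirm that the extension exists.
select extension_name, extension
from   user_stat_extensions
where  table_name = 'CORR_DEMO';
```

The column group is stored as a hidden virtual column on the table, so its statistics remain available to the CBO whether or not an index on the two columns exists.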
Extended statistics can be extremely useful in determining correct cardinality values for column combinations that exist; however, they still fall somewhat short when attempting to estimate the expected cardinality for combinations that don’t actually exist, even with histograms.
But that’s a tale for another day.
Now it’s time for “Wish You Were Here” …
Index Monitoring and Foreign Keys – Caution (Kid A) September 12, 2008
Posted by Richard Foote in 11g, Constraints, Foreign Keys, Index Monitoring. 15 comments
One of the questions commonly asked of indexes is whether they are actually being used or whether they are just sitting there looking pretty, wasting valuable resources and storage.
One possible method of determining whether an index is being used is to set the MONITORING USAGE attribute of the index. This will place the index in the v$object_usage view and will flag the index as being used when indeed the Cost Based Optimizer decides to “use” the index in an execution plan.
For example:
SQL> create table daddy (id number constraint daddy_pk primary key, name varchar2(20));
Table created.
SQL> insert into daddy values (1, 'BOWIE');
1 row created.
SQL> insert into daddy values (2, 'ZIGGY');
1 row created.
SQL> insert into daddy values (3, 'THIN WHITE DUKE');
1 row created.
SQL> commit;
Commit complete.
SQL> alter index daddy_pk monitoring usage;
Index altered.
If we look at the v$object_usage view, we’ll now see an entry for this index, with the USED flag currently set to NO:
SQL> select index_name, monitoring, used from v$object_usage where index_name = 'DADDY_PK';
INDEX_NAME   MON USE
------------ --- ---
DADDY_PK     YES NO
If we however make the CBO “use” this index:
SQL> select * from daddy where id = 1;
        ID NAME
---------- --------------------
         1 BOWIE

--------------------------------------------
|Id| Operation                  | Name    |
--------------------------------------------
| 0| SELECT STATEMENT           |         |
| 1| TABLE ACCESS BY INDEX ROWID| DADDY   |
|*2| INDEX UNIQUE SCAN          | DADDY_PK|
--------------------------------------------
We notice that the index is now marked as having been used:
SQL> select index_name, monitoring, used from v$object_usage where index_name = 'DADDY_PK';
INDEX_NAME   MON USE
------------ --- ---
DADDY_PK     YES YES
Although it doesn’t give any indication on just how often the index is actually used, if an index is still marked as unused for a period of time, it does provide some indication that the index has not been used during that time and is a candidate for being dropped.
Or does it ?
Unfortunately, there are some cases when an index is being used but monitoring the index doesn’t give any such indication. However, dropping such indexes can prove disastrous …
For example, let’s create another table that has a Foreign Key which references this DADDY table:
SQL> create table kiddie (id number, name varchar2(20), fk number, constraint kiddie_fk foreign key(fk) references daddy(id));
Table created.
Let’s now populate it with a number of rows:
SQL> insert into kiddie select rownum, 'MAJOR TOM', 1 from dual connect by level <= 1000000;
1000000 rows created.
SQL> commit;
Commit complete.
Let’s now create an index on the Foreign Key column and collect some stats:
SQL> create index kiddie_fk_i on kiddie(fk);
Index created.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DADDY', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'KIDDIE', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
Let’s now monitor this index on the Foreign Key and see where it may or may not get used:
SQL> alter index kiddie_fk_i monitoring usage;
Index altered.
SQL> select index_name, monitoring, used from v$object_usage where index_name = 'KIDDIE_FK_I';
INDEX_NAME   MON USE
------------ --- ---
KIDDIE_FK_I  YES NO
OK, let’s now delete a row from the parent table and look at the statistics to see how many consistent reads were performed:
SQL> delete daddy where id = 2;
1 row deleted.
Statistics
---------------------------
          8  db block gets
         26  consistent gets
          1  physical reads
          1  rows processed
Now, behind the scenes, Oracle indeed used the index on the Foreign Key to determine whether or not the parent row could be safely deleted or whether there were child records that were still referencing the parent row. Note we used just 26 consistent reads.
Let’s see if the monitoring of the index has picked this up …
SQL> select index_name, monitoring, used from v$object_usage where index_name = 'KIDDIE_FK_I';
INDEX_NAME   MON USE
------------ --- ---
KIDDIE_FK_I  YES NO
Ooops, this is not good. Is this index really not being used ? Is it really perhaps safe to drop ?
Let’s find out:
SQL> drop index kiddie_fk_i;
Index dropped.
Let’s now delete another parent row and see if we notice a difference in performance:
SQL> delete daddy where id = 3;
1 row deleted.
Statistics
----------------------------
          7  db block gets
       3194  consistent gets
       1599  physical reads
          1  rows processed
Ooops, the consistent gets have jumped up dramatically from 26 to a massive 3194 !! Without the index on the Foreign Key, the only way now for Oracle to check whether it’s OK to delete a parent row is to perform a Full Table Scan on the child table. Not only is this potentially very costly but there could be some nasty locking implications as well.
Index Monitoring only tells you if an index has been used or if it hasn’t been used by the CBO.
It doesn’t however tell you whether an index has actually been used or whether it’s safe to drop the index.
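Since monitoring will not flag this kind of use, one precaution before dropping an apparently unused index is to check which foreign keys would be left without a supporting index. The query below is a minimal sketch under a simplifying assumption: it looks only at the first column of each foreign key and the leading column of each index in the current schema, so it is not a complete test for multi-column keys.

```sql
-- Simplified check (assumption: only the first FK column and the leading index column are compared).
-- Lists foreign key constraints in the current schema with no index starting on the FK column.
select c.table_name,
       c.constraint_name,
       cc.column_name
from   user_constraints  c
join   user_cons_columns cc
       on cc.constraint_name = c.constraint_name
where  c.constraint_type = 'R'      -- referential (foreign key) constraints
and    cc.position       = 1
and    not exists (select null
                   from   user_ind_columns ic
                   where  ic.table_name      = cc.table_name
                   and    ic.column_name     = cc.column_name
                   and    ic.column_position = 1)
order  by c.table_name, c.constraint_name;
```

Run after the DROP INDEX in the example above, this should list the KIDDIE_FK constraint on KIDDIE.FK, which is exactly the case where deleting a parent row falls back to a Full Table Scan of the child table.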
More examples of when it may not be safe to drop a so called unused index to come …
Partition Pruning – Some Examples (Knives Out) September 3, 2008
Posted by Richard Foote in Oracle Cost Based Optimizer, Partitioning. 4 comments
Following on from my last post where I looked at the use of partition pruning vs. the use of an index, I thought it might be worthwhile to just go through a few additional basic examples of where partition pruning can come into play.
I’ll also include a little example of how execution plans may not favour indexes as much with partitioned tables as the resultant cost of a Full Table Scan can be reduced significantly due to partition pruning.
All the examples use exactly the same tables as created in the previous post. For a full listing of the examples and resultant execution plans, look at this Partition Pruning Demo. You may want to open up the demo in a different browser window to make it easier to follow.
In Example 1:
SQL> select * from big_album_sales where release_date between '01-JUN-2003' and '13-APR-2004';
the CBO can determine that all the data can only belong in Partitions 3 and 4 (as the only years that need to be referenced are 2003 and 2004) and so does not have to visit any of the other partitions. The PARTITION RANGE ITERATOR step clearly shows that it only has to visit the partitions in the range 3 – 4.
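To check the pruning for a statement like this yourself, the plan can be generated and displayed with EXPLAIN PLAN and DBMS_XPLAN. This is a sketch only; it assumes the BIG_ALBUM_SALES table from the previous post exists in the current schema and that the session date format matches the literals used.

```sql
-- Assumes BIG_ALBUM_SALES (range partitioned on RELEASE_DATE) exists in the current schema.
-- The string literals rely on implicit date conversion, so the session
-- NLS_DATE_FORMAT must match them (for example DD-MON-YYYY).
explain plan for
select *
from   big_album_sales
where  release_date between '01-JUN-2003' and '13-APR-2004';

-- The Pstart and Pstop columns of the plan show the first and last partition visited.
select * from table(dbms_xplan.display);
```

The Pstart and Pstop values on the partition step are what indicate how many partitions the CBO expects to touch.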
In Example 2:
SQL> select * from big_album_sales where release_date > '01-JUN-2006';
All the possible values must exist from partition 6 which stores all the 2006 data and onwards through to the last partition in the table which is partition 8. The PARTITION RANGE ITERATOR step clearly shows that it only has to visit the partitions in the range 6 – 8.
In Example 3:
SQL> select * from big_album_sales where release_date between '01-JUN-2006' and '30-NOV-2006' or release_date > '14-JUN-2008';
The only possible partitions the data can reside in are partition 6 or partition 8 (for the years 2006 or 2008). In this case, the CBO performs the PARTITION RANGE OR step which visits only these specific partitions.
In Example 4:
SQL> select * from big_album_sales where release_date is null;
All NULL values must reside in the last partition, as the table was created with the last partition having a MAXVALUE boundary. That being the case, the CBO need only visit the last partition (8) within the table to find all possible NULL values of RELEASE_DATE.
Because partition pruning can make a so-called Full Table Scan much more efficient, as only those partitions that can possibly contain the data need be accessed, in some rare cases the CBO can appear to ignore reasonable indexes that would be used in an equivalent non-partitioned table.
Note that the data includes a series of RELEASE_DATE values but many of the dates actually have the same timestamp component as all the row values were determined and loaded in a very quick manner. As a result, the CBO considers there are actually 718 rows per distinct occurrence of a RELEASE_DATE value. This fact is significant when costing an equality predicate.
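A rows-per-value figure of this kind can be approximated from the dictionary statistics. The sketch below assumes statistics have been gathered on BIG_ALBUM_SALES and that there is no histogram on RELEASE_DATE; it also ignores any adjustment for NULL values, so treat the result as an approximation of the CBO estimate rather than the exact calculation.

```sql
-- Approximate rows per distinct RELEASE_DATE value = table rows / distinct values.
-- Assumes gathered statistics, no histogram on the column, and no adjustment for NULLs.
select t.num_rows,
       c.num_distinct,
       round(t.num_rows / nullif(c.num_distinct, 0)) as est_rows_per_value
from   user_tables t
join   user_tab_col_statistics c
       on c.table_name = t.table_name
where  t.table_name  = 'BIG_ALBUM_SALES'
and    c.column_name = 'RELEASE_DATE';
```

The larger this figure, the more rows an equality predicate is expected to return and the less attractive an index access path looks next to a pruned Full Table Scan.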
In Example 5:
SQL> select * from big_album_sales where release_date = '01-JUN-2006';
we’re only interested in the one specific date value but Oracle still favours the Full Table Scan (FTS). Note this value can only possibly exist in partition 6 associated with data from 2006 and so the FTS need only actually access partition 6. Oracle assumes it will retrieve approximately 718 rows which is too costly via the index as partition pruning can make the FTS relatively efficient. Note that the actual cost of the FTS is 303.
In Example 6:
SQL> select /*+ index(b) */ * from big_album_sales b where release_date = '01-JUN-2006';
a hint is used to force the use of the associated index on the RELEASE_DATE column. Note in this example, it’s actually a much cheaper option (there are far fewer consistent gets) as no rows are actually being retrieved but the cost of 722 is significantly greater than that of the FTS (303) as Oracle is expecting to retrieve and access 718 rows.
In Example 7:
SQL> select * from big_album_sales2 where release_date = '01-JUN-2006';
we run exactly the same statement but this time on the non-partitioned table and notice that the execution plan is totally different. Oracle is indeed using the index because in this example the FTS is so much more expensive as it needs to read the entire table and not just the one partition (or approximately 1/8 of the table) as it does with the partitioned table example.
Lots more on partitioning to come in the future …