I’ve recently dug up an old presentation I did for the local Oracle User Group a number of years ago:
It’s slightly dated but has some useful general information on the subject of diagnosing performance issues and Extended SQL Tracing in Oracle that some may find useful, so I thought it might be worth posting it here. I stripped out most of the formatting so that the resultant file will be smaller to download.
Extended SQL Tracing has been absolutely invaluable over the years in diagnosing and trouble-shooting performance related issues. I had the very good fortune of attending a Hotsos Diagnosing Oracle Performance course in Sydney around 2003 (at the same time when Guy Sebastian won the first Australian Idol, ooops, probably shouldn’t have admitted remembering that) with Cary Millsap and Gary Goodman. It was a really fantastic training course which focused on the whole subject of “Method R” and using Extended SQL Tracing to diagnose performance issues. I also remember Steve Adams attending as well and thinking to myself, this guy really knows his Oracle …
If you haven’t already, I would strongly recommend checking out the Optimizing Oracle Performance book by Cary Millsap with Jeff Holt, an excellent read.
And Guess what. Using Extended SQL Tracing actually works !!
In my previous post regarding Reading 100% of Data via an Index, one of the examples I described was the scenario where an index was used to avoid a sort.
Index entries are always logically stored in the same order as the indexed columns (except of course when using a Reverse Key Index). Therefore if an index range scan is used to retrieve data, the data is returned in indexed column order. Sorts are relatively expensive operations, so if data needs to be retrieved in a specific order, the CBO can use the fact indexes return sorted data to its advantage and can potentially avoid performing the sort operation.
However, there’s a common misconception that if an appropriate index exists, Oracle will always use the index to avoid the sort. This is simply not true. Oracle will only use the index to avoid a sort if the costs of doing so is less than other alternatives. It could well be that the cost of performing say a Full Table Scan plus an associated sort might well be less than performing a really expensive and inefficient index range scan without the need for a sort. It depends on the relative costs of each possible option.
This Indexes and Sorts Part I Demo sets up a scenario and shows how an index can be used to select 100% of all rows in a table. However, note that the index in question has an excellent Clustering Factor with the rows stored in the table in basically the same order as that of the index. An index range scan of the whole table in this case is relatively efficient in that Oracle only needs to generally access each specific block the once. Therefore reading the entire table via an index and thus preventing the sort operation has a lesser cost than a potentially more efficient FTS but with the additional sorting overhead.
One of the more common reasons why an index is ignored when it possibly appears it should be used is due to the possibility of NULL values in the result set. As previously discussed, index entries that consist of nothing but NULLs are not indexed. Therefore an attempt to read all values from a table that has the potential to include NULL values may not use an associated index as the index may not reference all the required rows. The query either needs to be modified to exclude NULL values or the column(s) need to have a NOT NULL constraint for an index to be considered in this scenario.
This Indexes and Sorts Part II Demo shows how an index can not be used to eliminate a sort operation as the query result set could potentially return NULL values as the associated column does not have a NOT NULL constraint. By rewriting the query to exclude NULL values, the index is subsequently used by the CBO to retrieve all data, thus eliminating the sort operation.
Another perhaps more common reason why an index is not used to retrieve data in the index order, thus eliminating the need for a sort is that the cost of using the index to retrieve the necessary data is too costly and the savings in not performing the sort don’t outweigh the additional overheads of using the inefficient index. If an index has a very poor Clustering Factor, it can be extremely expensive for the CBO to use the index to retrieve data as most visits to the table requires a different table block to be accessed. In an extreme scenario, it may be necessary to read and re-read the same table block as many times as there are rows in the block and it may require as many distinct table block visits as there are rows in the table to retrieve all necessary rows via an index.
The cost of reading the table and the number of table block visits calculated by the CBO when using an index range scan is basically the selectivity of the query multiplied by the Clustering Factor of the index. Therefore the Clustering Factor is a crucial variable in deciding whether it’s actually worth using the index to prevent the sort or whether it’s actually less costly to use an alternative access path, such as perhaps an efficient FTS and subsequent sort.
The Indexes and Sorts Part III Demo is similar to the previous Part I demo except it uses a table that is ordered differently thus making the associated index result in a dreadful Clustering Factor. Consequently, the same query that previously used the index and no sort suddenly uses a FTS and a subsequent sort to read the entire table. However, even a query that only retrieves 10% of the data still uses a FTS and a subsequent sort. In fact even a query that retrieves just 1% of the data performs a FTS and subsequent sort. In this specific example, it wasn’t until approximately 0.11% of data was retrieved that the CBO decided it was cheaper to use the index and eliminate the sort than to use a FTS and subsequent sort.
The key message here is that yes, an index can be used to retrieve data in the specific order of the index and thus eliminate a sort operation. However, it will only do so if the cost of reading the necessary data via the index is less than alternative access paths, including the feared FTS, plus associated sort. If an index has a poor Clustering Factor, it is less likely to be considered as a method of eliminating a sort operation.
It’s totally hilarious and so very very clever.
The challenge is indeed to watch it all without laughing !!
Whoever put it together deserves much credit, although I do slightly prefer the original version :)
I’ve already previously discussed how a Full Table Scan can sometimes be the most effective execution path for reading a relatively small proportion of all rows (< 1%). Thought I might quickly discuss how an index can be the most effective execution path for reading a relatively high proportion of rows.
Such as 100% of all rows.
There is no magic number or proportion from which Oracle will somehow magically switch to using a FTS. If an index access path has a lower cost, it will be selected over the more expensive FTS and the associated percentage of rows retrieved can potentially be anywhere between 0% and 100% of all rows.
Typically, reading a large percentage of rows is considered the exclusive domain of the Full Table Scan, however there are various scenarios in which the costs of reading all rows via an index is actually the cheaper alternative. Here are just a few examples.
Perhaps the most obvious example is when we’re only interested in columns that can all be found within an index. In this case, if the index segment is smaller than the parent table segment (which in most cases it is), then Oracle can treat the index as being some kinda “skinny” version of the table and perform a multi-block Index Fast Full Scan without having to subsequently visit the table segment at all.
This simple Index Fast Full Scan Demo shows how a query that only references columns in an index can use the index to retrieve 100% of all rows in the table.
Another example is where the table segment is very poorly fragmented with lots of deleted space. A FTS will need to read all table blocks below the High Water Mark (HWM), including potentially many mostly or totally empty table blocks. If these costs are excessive and the remaining rows can be more efficiently accessed via an index, in extreme cases it might be more efficient to read all rows via the index than via a FTS. Yes, the table segment should probably be reorganised via say a Move or Shrink command if the deleted space is not going to be reused any time soon, however until this has been performed, it could very well be more efficient to access 100% of data via an index.
This extreme Poorly Fragmented Table Demo highlights how an index can be most efficient in retrieving 100% of all rows in a table, if the table is badly fragmented with lots of deleted space.
Yet another example is when the index could be used to avoid a possible sort. Index entries are always stored in the order of the indexed columns (except for Reverse Key Indexes). Therefore by reading the data via an index, all data will be retrieved in the order of the index. If this order matches the specific required order due to a ORDER BY clause, then Oracle does not need to perform the sort operation. In some cases, especially when the index has an excellent Clustering Factor, it might be more efficient to retrieve 100% of all data via an index and avoid the sort than use a FTS followed by a sort.
This tiny extract from my index internals seminar shows a simple Index and Sort Demo whereby the CBO decides to use the index to retrieve 100% of all data as it prevents Oracle from having to perform an expensive sort.
You begin to get the idea …
In summary, an index can of course can be most effective when retrieving just the 1 row (0 rows even) but it can also be most effective when retrieving up to 100% of all rows in a table as these simple examples illustrate.
The next time someone asks at what point or percentage will Oracle no longer consider using an index, we now all know there is no magic number and that it all entirely depends on many many factors which ultimately determine the relative costs of all possible access paths.
Read-Only Table Before 11g (A Day In The Life) May 15, 2008Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Opinion, Read Only.
Basically the question is how can one make a table read-only before 11g ?
The thread mentions a number of possibilities, some better than others. I thought I might just mention this possible solution involving Materialized Views. There are various alternatives based on this basic idea, this is just a simple example.
First create and populate a table we want to convert to Read-Only.
SQL> create table bowie_ro (id number, name varchar2(20));
SQL> alter table bowie_ro add primary key (id);
SQL> insert into bowie_ro values (1, ‘Bowie’);
1 row created.
SQL> insert into bowie_ro values (2, ‘Ziggy’);
1 row created.
SQL> insert into bowie_ro values (3, ‘Major Tom’);
1 row created.
Next, rename the table to a another name.
SQL> rename bowie_ro to bowie_temp;
Next, create a materialized view called the original name based on the renamed table.
SQL> create materialized view bowie_ro
2 refresh on demand complete
3 as select * from bowie_temp;
Materialized view created.
Next, drop the orignal table.
SQL> drop table bowie_temp;
We can now see and select the table as we could previously.
SQL> select * from bowie_ro;
3 Major Tom
However, you now can’t perform DML on the table, making it effectively read-only …
SQL> insert into bowie_ro values (4, ‘Thin White Duke’);
insert into bowie_ro values (4, ‘Thin White Duke’)
ERROR at line 1:
ORA-01732: data manipulation operation not legal on this view
You may of course need to add a few grants, constraints or indexes here or there but the table is now effectively read-only without the need of a read-only tablespace or other trick as mentioned in the OTN thread.
Back to indexes accessing all rows in a table soon :)
What seems like ages ago, I listed 8 things you may not have known about indexes. Although I’ve since written about many of the 8 items, I’ve yet to address the last item listed:
8. An index can potentially be the most efficient and effective may to retrieve anything between 0% and 100% of the data from a table.
A few recent posts on OTN reminded me that perhaps it’s about time I wrote something on this topic.
Generally, the question that’s commonly asked is at what point or at what percentage of data does Oracle no longer consider the use of an index and judges the Full Table Scan (FTS) as the most efficient method to retrieve the data from a table.
Basically, what’s the “magic number”, is it 1% of data, 2%, 5%, 7.5%, 15%, 42%, 50% ???
The answer unfortunately is that there is no such magic number or percentage, it all entirely depends. The way I often answer this question is by simply stating I can very easily come up with a scenario where a FTS is the most cost effective method to retrieve 1% of the data. Equally, I can very easily come up with a scenario where an index is the most cost effective method to retrieve 99% of the data.
Like I said, there is no magic number, it entirely depends on a whole list of different factors and variables.
To start, I thought I might go through the example of how a 1% cardinality result is best achieved via a FTS, highlighting why and how the Cost Based Optimizer comes to such a decision.
I’ll use a simple little scenario with nice simple numbers to make the mathematics nice and easy to follow :)
OK, let’s assume we have a table that has 10,000,000 rows. The table uses 100,000 table blocks to store these rows and so we have on average 100 rows per block. With an 8K block size, we’re basically looking at a table with an average row size of about 80 bytes.
Let’s say this table has an associated index with approximately 20,000 leaf blocks required to store the index entries for a particular column and the index has a blevel of 2 (or a height of 3). This basically means we can store approximately 500 index entries per block and the average index entry is about 16 bytes or so in length.
The indexed column has 100 distinct values which are evenly distributed such that each distinct value has approximately 100,000 occurrences each. The column has no NULL values.
Let’s say we write a query based on the indexed column and we’re interested in just one of the possible 100 values or approximately 1% of the data in total. For example:
SELECT * FROM bowie_table WHERE code = ‘ABCDE’;
Does the CBO choose the index or does it chose the FTS ?
Well, let’s first cost the index access path.
We begin by reading the root block and the intermediate branch block for a cost of 2.
We also need to read approximately 1% of all the index leaf blocks in order to access all the index entries of interest. So that’s 20,000 (leaf blocks) x 0.01 = 200 leaf blocks in total.
So the total cost of reading just the index is 202.
Next comes the interesting bit. How many of the 100,000 table blocks do we need to access in order to read just 1% of the data (i.e. 100,000 rows) ?
Well, the answer depends entirely on the Clustering Factor of the index or to put it another way, in how well ordered the rows in the table are in relation to the index. If the index column values of interest are all very well clustered together in the table, then we can access the required rows by visiting fewer blocks than if the index column values are evenly and randomly distributed throughout the table.
In fact, in the worst possible cases scenario, if the Clustering Factor is appalling and has a value close to the number of rows in the table (10,000,000), we may actually need to visit each and every block in the table as each block has an average of 100 rows per block and we want on average 1% or one of these rows from each and every table block.
In the best possible case scenario, with the column values perfectly clustered together and with a Clustering Factor approaching the number of blocks in the table (100,000), we may get away with only having to visit 1% of all the table blocks or just 1,000 of them.
So the Clustering Factor is a crucial variable in how costly it would be to read the table via the index. The actual table access costs therefore are simply calculated as being the selectivity of the query (0.01 in our case) multiplied by the Clustering Factor of the associated index.
In this example, the Clustering Factor is indeed appalling with a value of 10,000,000 and the table access costs are therefore calculated as 0.01 x 10,000,000 = 100,000.
So the total costs of using the index is 202 (for the index related costs) + 100,000 (to access the rows from the table) = 100,202 in total.
So what are the costs associated with the FTS ?
Well, the FTS has a number of advantages over the index scan. Firstly, as Oracle needs to process all the blocks, it can retrieve all the necessary rows by reading a specific table block just the once. However, with the index scan, Oracle may possibly need to access a specific table block multiple times in some scenarios.
Secondly, as Oracle knows it has to read each and every block, Oracle can do so with a larger “bite of the pie” each time via multiblock reads, knowing it’s not wasting resources as all blocks need to be processed anyways. Index access reads perform single block I/Os whereas a FTS can perform muiltblock I/Os at a time. In this specific example, let’s assume the effective multiple read value is 10, remember, we want to keep the arthmetic nice and simple …
Finally, a FTS can be performed in parallel, even if the table itself isn’t partitioned, which means the overall response times can be further improved and the CBO can reduce its “costs” accordingly. In this example, we won’t worry about parallel query.
So the costs of a FTS in our example is basically 1 (for the segment header) + 100,000 (table blocks) / 10 (the effective multblock read value) = 1+10,000 = 10,001.
So that’s roughly an overall cost of 100,202 for the index vs. 10,001 for the FTS.
The results are not even close with the FTS winning hands down and that’s for just 1% of the data …
A couple of final little points for now.
Firstly, the cost of just reading 1 block (for the single block index reads) vs. 10 blocks (for the multiblock FTS reads) may actually differ somewhat as multiblock reads are doing more “work” with it’s associated I/O. By default, with no parameters set and with no system statistics, the CBO will cost each I/O as being the same. More about how to possibly adjust this another time.
Also, by default the CBO will assume all associated I/Os are physical I/Os and will cost them accordingly, even if the BCHR is nice and high and the index access path in question might be accessed within (say) a nested loop join where the likelihood of many of the index related I/Os in particular being cached is very high. More on this at another time as well.
But for now, just note how in this relatively trivial example, the following factors came into play when determining the potential costs of this query:
- Selectivity of the query
- Data distribution with regard to the actual occurrences of the required data
- Number of table blocks (below the high water mark)
- Number of leaf blocks
- Index Height
- Average number of rows per table block
- Average number of leaf entries per leaf block
- Clustering Factor
- Caching characteristics of index and table
- Effective multiblock read count
- Relative cost of single vs. multiblock I/Os
All of which contribute to make any single “magic number” by which Oracle will no longer consider using an index but another fairy tale in the Oracle book of myths and folklore …
Thought I might share some thoughts regarding recoverability issues with regard to having indexes separate and stored in their own tablespace.
I’ve already discussed here how the loss of an index only tablespace would be a catastrophic event, with the database in dire straights until the indexes are recovered. Therefore the faster we can recover from the situation, the faster we can make DML statements work again, the faster we can prevent Full Table Scans from crippling database performance, the faster we can return the database to a functional state again, the better for our users and for our sanity.
One of the advantages of having indexes separate from tables and stored in their own tablespace is that we have a number of different recovery options available to us. Rather than having to perform a full tablespace or data file recovery, we can potentially simply just rebuild all the impacted indexes. Providing the base tables are available and have not been impacted by whatever catastrophic event has befallen the index tablespace, we can rebuild the indexes (in another tablespace if necessary). This will hopefully be a more simplistic, efficiently and most importantly faster method of recovering all our impacted indexes than performing an actual database recovery.
But will it really be more simplistic, efficient and faster ? The recovery advantages with having indexes in their own tablespace are often exaggerated. Let’s first take a look at an example scenario.
Let’s assume we have an index only tablespace that stores all the indexes for our application. Let’s say we have 100G worth of indexes. In a physically separate table only tablespace, let’s say we have a total of 200G worth of table data which is approximately double that of the index tablespace. Generally speaking, it’s common for indexes to not use the same amount of storage as the tables as typically not all columns are indexed. Of course it’s possible for a specific column to be indexed several times and for the index storage to exceed table storage in some cases, but not typically. There may of course be some free space in these tablespaces but let’s assume free space is minimal.
So we have 100G of indexes and 200G of tables.
Let’s also assume there’s on average 2 indexes per table, if only to keep the following arithmetic nice and simple :) Of course some tables may have many more indexes, some may just have the one index and in some rare examples there may be no indexes at all.
Now, it’s important to note that building a new index is actually a very expensive exercise. Oracle has to read all the data blocks in the base table, it has to sort the data in the order of the index entries, it has to create the index segment and write the index data, while generating undo and redo in the process.
Now that’s a lot of work …
However, in this scenario, we need to do this work for each and every index that’s in our stuffed index tablespace. Not only that, but we also need a script that can identify each of our impacted indexes, that generates the necessary index rebuild scripts (to another tablespace if necessary) and that handles any necessary constraint related issues.
In this specific scenario, we have to make Oracle and the database processes basically perform the following amount of work:
- Read approximately 400G of table related data. As we have an average of 2 indexes per table, we have to basically read each and every table an average of 2 times to build their related indexes. That’s 2 x 200G = 400G.
- Sort approximately 100G worth of index related data. Sorting is a really expensive, relatively slow process and we have 100G worth of index data that needs to be sorted.
- Write and create approximately 100G of index related segments
Note we also have to generate Data Dictionary related changes, we have to generate a bunch of undo related changes and we also (although optionally) generate lots and lots of redo.
In short, the database is being absolutely hammered during this whole process and it will take a loooong time to complete.
And this is meant to be the easy, efficient and above all fast method of recovering our indexes ?
So what is the alternate recovery strategy that this method of “simply” rebuilding all indexes is meant to protect us from.
Well, with a damaged tablespace, we basically need to perform a tablespace level recovery, restoring “just” the 100G worth of data files and applying any associated redo logs since our last backup. Depending on our backup and recovery strategy, we may actually reduce the redo logs being applied by applying incremental or cumulative backups as well.
Instead of the database slowly and laboriously having to read, process and write 6 or 7 times the amount of data (in our scenario), we can use the OS to much more efficiently copy across the index related data files.
Instead of having to script the rebuilding of all impacted indexes, literally a couple of RMAN commands will basically automatically completely restore and recover the impacted index tablespace for us.
In the scenario when only a specific data file or mount point within the tablespace has been problematic, the implications of attempting to recover the situation by simply rebuilding the indexes gets worse, much worse.
Firstly, if we are so inclined, we need to identify which indexes have at least one extent within the damaged portion of the index tablespace. We then need to entirely rebuild all these indexes, regardless of how much of the index may actually remain undamaged with other extents in undamaged portions of the index tablespace. This all takes resources, resources, resources and time, time, time.
We can’t just rebuild a part of an index (unless it’s partitioned of course) but we can recover a part of a tablespace. We can simply recover the damaged part of the tablespace, restoring and recovering just the specific data file or files, again potentially with just a few simple RMAN commands.
With small databases with small amounts of data, the time it takes to rebuild all indexes in an application may be acceptable for the business. However, in larger database environments, the extra time and resources required to rebuild large amounts of index data compared to other recovery strategies would be totally and completely unacceptable.
An exercise for those who store indexes in a separate tablespace, in large part because of the recoverability advantages. On a QA system or equivalent copy of your production database environment, go through a real exercise of attempting to recover your indexes by rebuilding them and actually time how long such a recovery process takes. Then repeat the exercise by recovering the database using a conventional database recovery technique and time the differences.
You may just come to the conclusion that rebuilding indexes may not be such a fast and efficient recovery process in many scenarios after all …