jump to navigation

Buffer Cache Hit Ratios: Useful or Not ? December 16, 2007

Posted by Richard Foote in Buffer Cache Hit Ratio, Oracle General, Oracle Myths, Oracle Opinion, Richard's Musings.
6 comments

The question of whether the Buffer Cache Hit Ratio (BCHR) is a useful metric or not is one of those discussions in Oracle that seems to crop up on a regular basis. I thought it might be worth briefly mentioning the topic here.

The BCHR represents the percentage of LIOs in which the required block is already loaded in the buffer cache(s). The higher the BCHR value, the greater the percentage of blocks accessed directly from memory and the lower the subsequent ratio of physical reads. A “higher” BCHR is generally considered a good thing as it’s a good thing to find required blocks in memory (right ?), a “lower” BCHR is generally considered not such a good thing as it’s bad to perform a higher ratio of physical reads (right ?).

The first problem with the BCHR of course is that it’s often miscalculated with many not appreciating for example the impact of direct reads on the actual physical reads used in BCHR calculations.

Assuming the BCHR is correctly calculated, the second problem with the BCHR is that it represents a database wide average. Averages are always dangerous things as they generally completely hide specific details and anomalies. Unless the average represents and can guarantee some consistent database metric or behaviour, then the average quickly becomes just a number, without any actual or inherent meaning.

A specific BCHR value at any specific point in time doesn’t actually tell us anything meaningful about the performance of specific tasks within the database. A database generally performs 100s or 1000s or 10000s of tasks at any given point of time. Unless all tasks or a significant percentage of tasks exhibit the same performance issue, then a single database-wide metric will be unable to determine issues with these specific tasks. The “average” figure hides details of the specific issue.

Therefore, at what point or at what value does an average figure provide meaning ?

The third problem with the BCHR is that these specific tasks within the database sometimes perform activities that are “good” and “efficient” but can result in the BCHR either going up or down or remain unchanged. Sometimes these activities can be “bad” and “inefficient” but can also result in the BCHR either going up or down or remain unchanged. Therefore without understanding what these specific activities might be, it’s simply impossible to know whether a specific change in the BCHR is good or bad.

Let’s assume we have a BCHR of 90%. Is this a good thing or is this a bad thing ? Is database performance an issue with a BCHR at 90% or is database performance good ? Is database performance good or bad generally or are there specific processes within the database that are problematic ? The answer of course is that it entirely “depends” and a value of 90% or 95% or 50% can’t in of itself answer any of these questions.

We can have a BCHR at 90% and performance can be terrible. It can be terrible at a database wide level due to any number of problems or issues or it can be terrible for specific users using specific applications or processes.

We can have a BCHR at 90% and performance can be perfect, with all users achieving optimal response times.

A BHCR of 90% is useless on it’s own. We need to go and perform all manners of additional checks to ensure the database is “healthy”.

However, even those who claim the BCHR is a meaningful and useful tuning metric generally agree and admit the BCHR on its own has no inherent usefulness and that it needs to be used in combination with other database “checks”. They generally claim that it’s the BCHR when monitored and used from a historical point of view with fluctuations of its value over time that makes the BCHR useful.

Really ?

Let’s again assume the BCHR has been at 90% for the past few weeks (or whatever time-frame) and it has now dropped to 85%. The Quest Spotlight monitor is flashing red and obviously something has “changed”. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database (perhaps simply the specific workload has changed) ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have an important process or (processes) that have suddenly started performing expensive, inefficient Full Table Scans. That’s not good, as the extra logical and physical IOs have impacted response times detrimentally. Things have indeed gone worse. Or perhaps we have a process that was performing an extremely inefficient nested loop operation, reading the same cached table numerous of times that is now performing the same function much more efficiently, reducing LIOs significantly. Response times may have improved and things are indeed better. Or perhaps there’s a large batch program or report that needs to be urgently run during normal business hours that’s resulting in lots of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed at all.

Therefore, having a BCHR that has dropped to 85% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has gone from 90% to 95% (or whatever higher value). The Quest Spotlight monitor is flashing nice and green but something appears to have changed. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans that are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better. Or perhaps we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs significantly causing the overall BCHR to increase as well. Response times may have plummeted and things are much worse. Or perhaps there’s a series of large batch programs or reports that usually run during normal business hours but the reporting section are on a Xmas lunch and haven’t bothered to run them today resulting in a reduction of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed.

Therefore, having a BCHR that has increased to 95% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has not changed at all and is still sitting at 90% (or whatever value). The Quest Spotlight monitor is flashing nice and green but nothings appears to have changed. However, has nothing really changed, or could things now be seriously wrong with the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans and are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better but because the BCHR is a database-wide metric, this improvement made have gone unnoticed. Or perhaps at the same time we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs causing response times to plummet and making key business processes much worse. But again because the BCHR is a database-wide metric, the overall BCHR may not have been impacted.

Or of course, one of hundreds of things have caused serious database performance issues while the BCHR remains totally unchanged …

Therefore, having a BCHR that has remains unchanged doesn’t actually tell us much either. The database made be running better than previously, the database may be having serious problems or the “health” of the database may remain unaltered.

So, the BCHR at any specific value doesn’t tell us much without having to check other database metrics as well.

The BCHR increasing doesn’t tell us much without having to check other database metrics as well.

The BCHR decreasing doesn’t tell us much without having to check other database metrics as well.

The BCHR remaining unchanged doesn’t tell us much without having to check other database metrics as well.

Note the database metrics we would need to check (for example, those queries using the most LIOs, those queries using the most PIOs, those queries using the most CPU, those queries being executed the most, those queries with excessive response times, causes of the most significant waits in the database, specific application/process response times, etc. etc. etc…) are exactly the same in all the above scenarios.

The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the “health” of the database has improved, got worse or remains unchanged.

If we need to perform the same actions regardless of the BCHR, then I suggest the answer to whether the BCHR is useful or not is a big and resounding no.

Follow

Get every new post delivered to your Inbox.

Join 1,913 other followers