jump to navigation

Buffer Cache Hit Ratios: Useful or Not ? December 16, 2007

Posted by Richard Foote in Buffer Cache Hit Ratio, Oracle General, Oracle Myths, Oracle Opinion, Richard's Musings.
6 comments

The question of whether the Buffer Cache Hit Ratio (BCHR) is a useful metric or not is one of those discussions in Oracle that seems to crop up on a regular basis. I thought it might be worth briefly mentioning the topic here.

The BCHR represents the percentage of LIOs in which the required block is already loaded in the buffer cache(s). The higher the BCHR value, the greater the percentage of blocks accessed directly from memory and the lower the subsequent ratio of physical reads. A “higher” BCHR is generally considered a good thing as it’s a good thing to find required blocks in memory (right ?), a “lower” BCHR is generally considered not such a good thing as it’s bad to perform a higher ratio of physical reads (right ?).

The first problem with the BCHR of course is that it’s often miscalculated with many not appreciating for example the impact of direct reads on the actual physical reads used in BCHR calculations.

Assuming the BCHR is correctly calculated, the second problem with the BCHR is that it represents a database wide average. Averages are always dangerous things as they generally completely hide specific details and anomalies. Unless the average represents and can guarantee some consistent database metric or behaviour, then the average quickly becomes just a number, without any actual or inherent meaning.

A specific BCHR value at any specific point in time doesn’t actually tell us anything meaningful about the performance of specific tasks within the database. A database generally performs 100s or 1000s or 10000s of tasks at any given point of time. Unless all tasks or a significant percentage of tasks exhibit the same performance issue, then a single database-wide metric will be unable to determine issues with these specific tasks. The “average” figure hides details of the specific issue.

Therefore, at what point or at what value does an average figure provide meaning ?

The third problem with the BCHR is that these specific tasks within the database sometimes perform activities that are “good” and “efficient” but can result in the BCHR either going up or down or remain unchanged. Sometimes these activities can be “bad” and “inefficient” but can also result in the BCHR either going up or down or remain unchanged. Therefore without understanding what these specific activities might be, it’s simply impossible to know whether a specific change in the BCHR is good or bad.

Let’s assume we have a BCHR of 90%. Is this a good thing or is this a bad thing ? Is database performance an issue with a BCHR at 90% or is database performance good ? Is database performance good or bad generally or are there specific processes within the database that are problematic ? The answer of course is that it entirely “depends” and a value of 90% or 95% or 50% can’t in of itself answer any of these questions.

We can have a BCHR at 90% and performance can be terrible. It can be terrible at a database wide level due to any number of problems or issues or it can be terrible for specific users using specific applications or processes.

We can have a BCHR at 90% and performance can be perfect, with all users achieving optimal response times.

A BHCR of 90% is useless on it’s own. We need to go and perform all manners of additional checks to ensure the database is “healthy”.

However, even those who claim the BCHR is a meaningful and useful tuning metric generally agree and admit the BCHR on its own has no inherent usefulness and that it needs to be used in combination with other database “checks”. They generally claim that it’s the BCHR when monitored and used from a historical point of view with fluctuations of its value over time that makes the BCHR useful.

Really ?

Let’s again assume the BCHR has been at 90% for the past few weeks (or whatever time-frame) and it has now dropped to 85%. The Quest Spotlight monitor is flashing red and obviously something has “changed”. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database (perhaps simply the specific workload has changed) ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have an important process or (processes) that have suddenly started performing expensive, inefficient Full Table Scans. That’s not good, as the extra logical and physical IOs have impacted response times detrimentally. Things have indeed gone worse. Or perhaps we have a process that was performing an extremely inefficient nested loop operation, reading the same cached table numerous of times that is now performing the same function much more efficiently, reducing LIOs significantly. Response times may have improved and things are indeed better. Or perhaps there’s a large batch program or report that needs to be urgently run during normal business hours that’s resulting in lots of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed at all.

Therefore, having a BCHR that has dropped to 85% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has gone from 90% to 95% (or whatever higher value). The Quest Spotlight monitor is flashing nice and green but something appears to have changed. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans that are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better. Or perhaps we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs significantly causing the overall BCHR to increase as well. Response times may have plummeted and things are much worse. Or perhaps there’s a series of large batch programs or reports that usually run during normal business hours but the reporting section are on a Xmas lunch and haven’t bothered to run them today resulting in a reduction of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed.

Therefore, having a BCHR that has increased to 95% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.

Let’s assume the BCHR has not changed at all and is still sitting at 90% (or whatever value). The Quest Spotlight monitor is flashing nice and green but nothings appears to have changed. However, has nothing really changed, or could things now be seriously wrong with the database ?

Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans and are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better but because the BCHR is a database-wide metric, this improvement made have gone unnoticed. Or perhaps at the same time we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs causing response times to plummet and making key business processes much worse. But again because the BCHR is a database-wide metric, the overall BCHR may not have been impacted.

Or of course, one of hundreds of things have caused serious database performance issues while the BCHR remains totally unchanged …

Therefore, having a BCHR that has remains unchanged doesn’t actually tell us much either. The database made be running better than previously, the database may be having serious problems or the “health” of the database may remain unaltered.

So, the BCHR at any specific value doesn’t tell us much without having to check other database metrics as well.

The BCHR increasing doesn’t tell us much without having to check other database metrics as well.

The BCHR decreasing doesn’t tell us much without having to check other database metrics as well.

The BCHR remaining unchanged doesn’t tell us much without having to check other database metrics as well.

Note the database metrics we would need to check (for example, those queries using the most LIOs, those queries using the most PIOs, those queries using the most CPU, those queries being executed the most, those queries with excessive response times, causes of the most significant waits in the database, specific application/process response times, etc. etc. etc…) are exactly the same in all the above scenarios.

The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the “health” of the database has improved, got worse or remains unchanged.

If we need to perform the same actions regardless of the BCHR, then I suggest the answer to whether the BCHR is useful or not is a big and resounding no.

Why are there so many Oracle related myths ? The Inconvenient Truth December 12, 2007

Posted by Richard Foote in Buffer Cache Hit Ratio, Oracle General, Oracle Myths, Richard's Musings.
30 comments

NOTE: If you’ve linked here from Burleson’s truely awful “Oracle Index Rebuilding” article, (no, I’m not Ken Adkins !!), I suggest you read this post very carefully, as it’s most applicable considering where you’ve linked from ;)

Hundreds of years ago, most people thought the world was flat and if you travelled far enough, you’ll simply fall off the edge of the world. It seemed like a reasonable thing to believe, I mean the world kinda looks flat doesn’t it? It just didn’t make sense to think otherwise. Also, as most people didn’t really travel very far in their lives, it wasn’t a big deal to most people. The chances of falling off the world were remote. Very few actually pushed the boundaries to find out the truth; whether the world was flat or not  just wasn’t relevant or important to most people.

Well about 10 – 15 years or so ago, most people thought Oracle was “flat”. OK, not flat exactly, but certainly that a segment with one extent was better for performance than say 20 extents, that having a PCTINCREASE of 1 reduced fragmentation, that a high BCHR meant all was likely well with the database, that indexes should be rebuilt on a regular basis etc. etc. I mean it all sounds like it makes sense doesn’t it? Many new to Oracle came from other databases and technologies where these sorts of things were perfectly acceptable practices in their old environments, so it made sense they were necessary in Oracle too. They were also all based or founded on some level of fact or half truths but half truths have a habit of becoming full truths and over time get set in stone as cold, hard facts.

In those days, it really wasn’t much of an issue anyways. Databases were relatively small, anything over a few GB was considered enormous so rebuilding and reorganising things, although inconvenient, was all very plausible. Very few databases were truly 24 x 7 with most having those long, dark, lonely nights fully available to play around with the database, without disrupting business processes.

There was no Internet in those days or it was just emerging and so communication in all things Oracle was mainly through books and conferences where relatively few “experts” dominated. There were very very few that pushed the boundaries, most people were content to simply read and hear they were spending those dark nights doing something constructive. Everyone suggested rebuilding indexes on a regular basis was a good thing and improved performance, so it must be right, right ?

However, things changed. Over time, databases kept getting bigger and bigger with hardware not keeping up with shrinking maintenance windows. Organisations demanded more and more availability and those lonely nights became as mission critical as the traditional 9am-5pm.

The Internet became a hugely important communication vehicle with people able to ask questions and get instant answers from the other side of the world. Suddenly someone could make an erroneous comment and instantly someone could correct them.

This enabled a whole new wave of Oracle professionals to make an impact. The previously few recognised authors and noted experts were swamped by a whole new breed of Oracle expert, who were able to easily communicate and share their knowledge through websites and a mass of Oracle related forums. They weren’t satisfied with simply rehashing Oracle manuals and books, but treated truly understanding Oracle as a scientist would try and truly understand their specialised field. Jonathan Lewis, Steve Adams, Tom Kyte,  Anjo Kolk, Mogens Norgaard etc. become well known in the Oracle sphere and began changing how the majority saw and understood Oracle. People questioned authority and began testing and proving things for themselves.

Rebuilding indexes on a regular basis not only became questionable (as did all the other myths), it become unfeasible and impracticable. A recent survey by the IOUG suggested 35% of Oracle shops surveyed had at least one database over 1TB. DBAs started looking after more and more databases, looking after bigger and bigger databases, looking after more and more complex database environments.

Do we really need to rebuild these damn indexes ? As these myths were investigated, tested, analysed and pulled part, the answer become more and more obvious.

 No.

It simply required a clear and accurate understanding of how indexes actually behaved for the truth to be known, as researched and documented by the new breed. It also required enough Oracle people needing a solution to their maintenance issues, needing to actually justify business unavailability, for actual facts to start replacing myths.

Most indexes never need to be rebuilt. Never, ever.

Most.

So is the answer above really no ? Is the world truly not flat and safe to walk on for as long as we could want ?

If you were to blindly walk (for example) to the Grand Canyon in the USA, someone could indeed plunge off the “end of the world” to their doom. Some indexes if not rebuilt could cause performance issues if not rebuilt.

Here lies one of the main reasons why these Oracle myths never seem to die, they’re often based on half truths and it’s determining which half is which that can be problematic (although the ratio is more accurately 999-1 than 50-50). Determining that 0.1% is not necessarily a straight forward process, it can’t always be determined by a simple formula or rule of thumb.

Also human nature can be a funny thing. People are often either too lazy or too busy or both to want a complex answer to a complex question. People love easy solutions, even if it’s not actually the right solution. The path of least resistence is often the most traveled path, even if it leads to a sudden drop at a cliff’s edge …

But why in this information age, is finding the answer of what is truth and what is myth so problematic ? With so much good, clear, accurate information out there, with so many respected and knowledgeable experts out there spreading a true understanding of Oracle, why can the truth still be so hard to determine ?

The answer is “Information Pollution“.

Historically, there has been so much inaccurate information written and published about Oracle, there have been so many “wrongs” written as “rights”, that technical  accuracy can often be “clouded” by this pollution. Also, some of the older breed simply refuse to accept today’s realities. They desperately cling to old habits else fear losing creditability.

Oracle Corportation itself must accept some of the blame. They’ve historitically documented things incorrectly, they still have metalink notes that are inaccurate, host forums that spread misinformation. Although Oracle produces incrediably detailed documentation and resource rich web sources, Oracle itself is one of the big “Information Polluters”.

So, there are still many people who spread myths. There are still many people who think all indexes should be rebuilt regularly if they experience lots of DML, that indexes should be rebuilt if they’re have a height greater than some level.

There a literally thousands of Oracle newbies coming into the scene each year. How can they tell what is right from wrong (or half right from half wrong) when they have no prior knowledge or experience with Oracle ?

Not easily.

And if they pick the wrong source and learn inaccurate information and adopt inefficient processes, guess what ? They in turn become information polluters and so the clouds potentially thicken …

And so these Oracle myths continue to persist and never seem to quite disappear. So is Oracle in danger of “Global Warming” ?

No, because thankfully, the amount of good information far outweighs the bad. Oracle as a community is producing far more “Oxygen” than “Carbon-Dioxide”. The number of people who think one extent is better or that indexes need constant rebuilding, as percentage of the whole, is dwindling and dwindling over time. More and more people appreciate the world really is a lovely round planet because they have to know. Just as they have to travel to the other side of the world, they have to maintain larger and larger databases with less and less disruption to their businesses.

Importantly, there’s also a very easy and generally very accurate way to determine the right from wrong, the wheat from the chafe.

Simply prove it.

If something within Oracle (or anything really) is right and accurate and correct, then it can generally be proven. The Jonathan Lewis and Steve Adams and Tom Kytes of this world not only accurately document how an Oracle process works and functions, they generally prove it with clear, consistent, repeatable test cases. They don’t simply say a=b they actually demonstrate and prove it.

Whereas those that deal in myths and inaccuracies by definition are unable to prove it. They makes claims and suggestions and generalisations but lack that one essential ingredient that differentiates them from those that deal with facts and technical accuracies.

Proof.

And without proof, one can clearly pick out the “Information Polluters”. The next time you read something from somewhere, just ask yourself this one simple question. Are they simply stating something as fact or are they actually attempting to prove something as fact. Can that proof be copied across into your particular environment ? Is it still accurate ? Can you extend and expand on the claims and is it still accurate ?

An answer in the positive is a hell of a lot better than an answer that consists of nothing more than a “because it just is” …

Why are there so many Oracle related myths ?

Global Warming !!

Is it too late ?

Not if we meet the Oracle Kyoto Treaty and demand clean oxygen-based proofs ;)

Follow

Get every new post delivered to your Inbox.

Join 1,913 other followers