A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.
For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).
The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.
Or so the theory goes …
In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.
The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.
The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.
This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.
So the high cardinality column shouldn’t necessarily be the column given leading column status.
In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.
Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.
All good discussions for another day
Bitmap Indexes With Many Distinct Column Values (Wots…uh the deal) February 1, 2008Posted by Richard Foote in Bitmap Indexes, Oracle General, Oracle Indexes, Oracle Myths.
In the seemingly never ending list of 8 things one may not have known about indexes, Number 3 stated:
“Bitmap Indexes can be extremely useful and beneficial even if the column contains thousands of distinct values”.
On the surface, this may seem like a somewhat strange thing to suggest to many folk. It seems to be stated in numerous places that Bitmap indexes are only really beneficial with columns that have low numbers of distinct values. For example, a column called GENDER (I was going to use another word but I have to deal with enough spam messages as it is ) has only a few distinct values, so it would be perfect for a Bitmap Index.
Columns that have say 5 or 10 or maybe 20 distinct values should all be OK. But what if a column has 100 distinct values, that might just be pushing it. A column with 1000 distinct values would obviously be totally inappropriate. I would have to be some kind of deranged fool for even contemplating and suggesting a column with 10,000 distinct values, right !!
A Bitmap Index is actually a B-Tree index in it’s basic structure and shape. It has index branch blocks that point to index leaf blocks that have all the necessary index information stored in an ordered and sorted manner. However, in the leaf blocks, a conventional B-Tree index basically stores the indexed column values followed by its corresponding rowid. A bitmap index differs considerably and basically stores in the leaf blocks for each distinct column, the column value followed by a starting and ending rowid that specifies a range of possible rowids within the table followed by a series of bits that denotes for each possible rowid within the range whether the row contains the column value (1) or not (0). If the index entry is larger than roughly half the index block size, another bitmap “piece” is created for the index entry, specifying a different range of rowids with corresponding bitmaps.
The “biggest” component of the index entry is thus this series of bits. But most of the values will be zeros (as a specific row can only have at most the one value of the column) and all these zeros can be compressed quite efficiently by Oracle within the index entry.
So having lots of distinct column values means having lots of index entries with lots of rowid ranges with lots of bitmaps. So a column with anything approaching 10,000 values would be totally inappropriate, right ?
Well take a look at this demo comparing a B-Tree Index vs. a Bitmap Index for a column that has 10,000 distinct values and you might just be surprised.
The table contains 1 Million rows and one of the columns is a NUMBER field that has 10,000 distinct values (hence a Density value of 1%).
The B-Tree Index required 2,090 leaf blocks to store the index and an equality query returning the expected 100 rows requires 19 consistent reads. Not too bad.
However, the equivalent Bitmap Index required just 56 leaf blocks and an equality query returning the same 100 rows does so with just 11 consistent reads.
Ummm, perhaps bitmaps indexes don’t require such low numbers of distinct column values to be useful after all …
A few points to ponder on from this specific example.
The B-Tree index had to store 1,000,000 index values, once for each and every not null row in the parent table. The Bitmap Index only had to store the index values 10,000 times, once for each unique occurrence of the column value (although there may be times when this ratio may be higher)
The B-Tree index had to stored 1,000,000 rowids, once for each and every index row entry. The Bitmap Index only had to store a pair of rowid values for each unique occurrence of the column value (although there may be times when Bitmap Indexes need to store more than one pair of rowids per index value).
If the rows in the table are clustered together based on the index column value, it means the zeros in the bitmap index can be nice and continuous within the bitmap string and so will compress nicely. Therefore, the manner in which the rows are ordered in the table will have a direct impact in how efficient the Bitmap Index can be compressed.
There’s lots and lots of interesting things to discuss about Bitmap Indexes. For now, just note the next time you read Bitmap Indexes should only be used for columns with few distinct values, you may want to get some clarification on what is meant exactly by “few” …
Introduction To Reverse Key Indexes: Part IV (Cluster One) January 21, 2008Posted by Richard Foote in Clustering Factor, Index statistics, Oracle Indexes, Oracle Myths, Reverse Key Indexes.
There’s a myth that suggests if the Clustering Factor (CF) of an index is greater than a certain ratio when compared to the number of rows in the table, the CF is poor and an index rebuild would be beneficial.
The slight problem with this advice is that the CF actually measures how well aligned the order of the column values are in the table as compared to the order of the index entries in the index. Generally, a table in which the column values are ordered in a similar manner to the index will have a CF closer to the number of blocks in the table. A table in which the column values are ordered in a random manner when compared to the index will have a CF closer to the number of rows in the table.
An index rebuild doesn’t change the ordering of the index row entries and an index rebuild has no impact on the table so therefore the comparative ordering of both remains unchanged. Therefore the CF of an index will be identical after the rebuild as it was before.
Well actually, there is one slight exception to this rule. Reverse Key Indexes.
Generally, rows with monotonically increasing column values are physically inserted in the order of the monotonically increasing columns. This may not be the case however with tables in ASSM tablespaces or tables with multiple freelists or freelist groups as concurrent inserts will be directed to differing blocks. In these cases we may actually have data that is quite well clustered but may have quite poor CF values due to the manner in which the CF is calculated.
Assuming a Non-ASSM, single freelist/freelist group table, the CF of monotonically increasing indexed values would ordinarily be quite good. As a non-reverse index must also have its values in the monotonically column order, the CF of the index is likely to be nice and low.
However, if you were to rebuild the index as a Reverse Key Index, the index values get reversed and “randomly” redistributed within the index structure, totally changing the order of the index entries within the index. As a result, the index values are no longer aligned with those of the table and the CF is likely to now be quite appalling.
Rebuilding an index generally has no impact on the CF as the index row values retain the same logically order. Rebuilding an index to be reverse (or visa-versa) is the exception to the rule as it will physically (and logically) change the index row order.
A Reverse Key Index is likely therefore to have a much worse CF than it’s non-reverse equivalent.
A Reverse Key Index will be ignored for range predicates (as already discussed in Part I) so a poor CF may not have an impact. However, as also discussed, index range scans are still viable in some scenarios so an increased CF may impact execution plans detrimentally.
See this demo on how a Reverse Index Rebuild turns a “perfect” CF into a shocker.
In Part I, we saw how with Reverse Key Indexes, Oracle will basically take the indexed value, reverse it and then index the reversed value. As a result, data that would ordinarily be logically sorted within an index structure will now be randomly distributed. This therefore negates the use of Reverse Key Indexes with range predicates, with the CBO not even considering them in its costings.
This is all the information we need to dispel a rather bizarre suggestion that has been doing the rounds regarding using Reverse Key Indexes to deal with LIKE predicates that have a leading wildcard. For example, such a suggestion can be found here and within an OTN discussion here.
Basically the suggestion is to:
1) Create a Reverse Key Index on the column to be searched with a LIKE predicate having a leading wildcard (such %, _).
2) Instead of writing the query as usual, e.g.
SELECT * FROM bowie_table WHERE name LIKE ‘%BOWIE’
rewrite the query programmatically such as:
SELECT * FROM bowie_table WHERE name LIKE ‘EIWOB%’;
by reversing the required text and now having the wildcard at the end.
The Reverse Key Index stores the data in a reversed format identical to say ‘EIWOB’, so Oracle should be able to use the Reverse Key Index to efficiently find all rows that start with ‘EIWOB’ as they’re all grouped together within the index structure, right ?
Ignoring the fact the example in the above link is somewhat meaningless as it uses a leading and a trailing wildcard in both queries and so assuming the first query only has a leading wildcard and the second query only has a trailing wildcard, this suggested use of a Reverse Key Index can not possibly work on any current version of Oracle.
There are a few fundamental problems with this suggestion but in summary not only will it not work but worse, it will actually return the wrong results.
The suggestion is correct as far as indeed, using a normal index to return data with a LIKE statement containing a leading wildcard will negate the use of an index range scan, the CBO doesn’t even consider it. An index hint may push Oracle to use a Full Index Scan, but not an Index Range Scan.
However using a Reverse Index Key to solve this is unfortunately doomed to failure for two very simple reasons.
One, as we have already seen, Oracle also ignores Index Range Scans for Reverse Key Indexes with range predicates and unfortunately, a query such as WHERE name LIKE ‘EIWOB%’ is a range scan. The CBO simply doesn’t consider the Reverse Key Index in it’s deliberations.
Two, is of course that Oracle has no possible way of knowing that when you say LIKE ‘EIWOB%’, what you really mean is search for all records ending with BOWIE, LIKE ‘%BOWIE’. How can Oracle possibly know this ? If it could use the index (which it can’t) Oracle would only reverse the search string around anyways and use the index to look for indexed entries beginning with ‘BOWIE’ within the index structure, remembering everything is of course stored in reverse within the index.
So Oracle is actually searching for all records starting with ‘EIWOB’, not ending with ‘BOWIE’ which are two entirely different things.
The net result of using this suggested strategy is not good.
1) Oracle ignores the Reverse Key Index anyways as a LIKE ‘EIWOB%’ is a range predicate
2) Oracle therefore performs a Full Table Scan anyways
3) As the query is effectively searching for all records that start with ‘EIWOB’, not as expected all records that end with ‘BOWIE’, the two queries in the example will actually return completely different results
The Reverse Key Indexes Part II Demo shows how this suggested use of a Reverse Key Index is a very very bad idea …
However, if you want to solve the issue of efficiently finding the results of a LIKE ‘%BOWIE’, there are some possible approaches one could take that will use an index and return correct results.
One possible solution (as mentioned in the OTN link listed at the beginning) is to create a Function-Based Index using the REVERSE Function, (Warning: this function is undocumented and unsupported):
CREATE INDEX bowie_reverse_func_i ON bowie(REVERSE(name));
A query such as WHERE REVERSE(name) LIKE ‘EIWOB%’ or better still WHERE REVERSE(name) LIKE REVERSE(‘%BOWIE’) can now both potentially use the index.
The reverse function will reverse the name column (from say ‘DAVID BOWIE’ to ‘EIWOB DIVAD’) and the LIKE range predicate can work with the index as it’s a Function-Based index rather than a Reverse Key Index and it’s not using a LIKE with a leading wildcard. A column containing ‘DAVID BOWIE’, but stored as ‘EIWOB DIVAD’ within the index, can be found efficiently via an index range scan using this Function-Based Index.
I’ve included an example on effectively using a Function-Based Index with the Reverse Function at the end of the above demo. There’s also a discussion and other alternatives at Gints Plivna’s Blog.
Another alternative is to use an Oracle Text Index, which also has the capability of dealing logically with queries such as %BOWIE% but as they say, that’s a topic for another day.
More on Reverse Key Indexes to come as well.
8 Things You May Not Know About Indexes January 10, 2008Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Myths, Richard's Musings.
There’s a somewhat bizarre wave washing the Blogshere at the moment whereby everyone is being asked to reveal 8 things about themselves that not many people know. It all sounds like a bit of fun, although as a craze, it does seem to have gone a bit berserk.
Now I’m a pretty private sort of person so this isn’t quite my thing. However, in the spirit of it all, I’ve decided to bend the rules a tad and give it an “Oracle Index” theme.
So here are 8 things (from the top of my head) that you may not know about Oracle Indexes:
Index compression can actually make indexes substantially larger, not smaller
It’s not true you can’t index NULL values. A single column or a set of columns containing nothing but NULLs can easily be indexed
Bitmap Indexes can be extremely useful and beneficial even if the column contains thousands of distinct values.
B-Tree Indexes can be extremely useful and beneficial even if the column contains very few distinct values (as low as 1)
It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist
It’s possible and potentially very useful to just index some column values and not all column values within a table
A REVERSE index can quite happily be used by the CBO to perform index range scans within an execution plan
An index can potentially be the most efficient and effective way to retrieve anything between 0% and 100% of data from a table
Now if there’s anything in the list you indeed didn’t know, don’t worry, that’s what this Blog is for
Buffer Cache Hit Ratios: Useful or Not ? December 16, 2007Posted by Richard Foote in Buffer Cache Hit Ratio, Oracle General, Oracle Myths, Oracle Opinion, Richard's Musings.
The question of whether the Buffer Cache Hit Ratio (BCHR) is a useful metric or not is one of those discussions in Oracle that seems to crop up on a regular basis. I thought it might be worth briefly mentioning the topic here.
The BCHR represents the percentage of LIOs in which the required block is already loaded in the buffer cache(s). The higher the BCHR value, the greater the percentage of blocks accessed directly from memory and the lower the subsequent ratio of physical reads. A “higher” BCHR is generally considered a good thing as it’s a good thing to find required blocks in memory (right ?), a “lower” BCHR is generally considered not such a good thing as it’s bad to perform a higher ratio of physical reads (right ?).
The first problem with the BCHR of course is that it’s often miscalculated with many not appreciating for example the impact of direct reads on the actual physical reads used in BCHR calculations.
Assuming the BCHR is correctly calculated, the second problem with the BCHR is that it represents a database wide average. Averages are always dangerous things as they generally completely hide specific details and anomalies. Unless the average represents and can guarantee some consistent database metric or behaviour, then the average quickly becomes just a number, without any actual or inherent meaning.
A specific BCHR value at any specific point in time doesn’t actually tell us anything meaningful about the performance of specific tasks within the database. A database generally performs 100s or 1000s or 10000s of tasks at any given point of time. Unless all tasks or a significant percentage of tasks exhibit the same performance issue, then a single database-wide metric will be unable to determine issues with these specific tasks. The “average” figure hides details of the specific issue.
Therefore, at what point or at what value does an average figure provide meaning ?
The third problem with the BCHR is that these specific tasks within the database sometimes perform activities that are “good” and “efficient” but can result in the BCHR either going up or down or remain unchanged. Sometimes these activities can be “bad” and “inefficient” but can also result in the BCHR either going up or down or remain unchanged. Therefore without understanding what these specific activities might be, it’s simply impossible to know whether a specific change in the BCHR is good or bad.
Let’s assume we have a BCHR of 90%. Is this a good thing or is this a bad thing ? Is database performance an issue with a BCHR at 90% or is database performance good ? Is database performance good or bad generally or are there specific processes within the database that are problematic ? The answer of course is that it entirely “depends” and a value of 90% or 95% or 50% can’t in of itself answer any of these questions.
We can have a BCHR at 90% and performance can be terrible. It can be terrible at a database wide level due to any number of problems or issues or it can be terrible for specific users using specific applications or processes.
We can have a BCHR at 90% and performance can be perfect, with all users achieving optimal response times.
A BHCR of 90% is useless on it’s own. We need to go and perform all manners of additional checks to ensure the database is “healthy”.
However, even those who claim the BCHR is a meaningful and useful tuning metric generally agree and admit the BCHR on its own has no inherent usefulness and that it needs to be used in combination with other database “checks”. They generally claim that it’s the BCHR when monitored and used from a historical point of view with fluctuations of its value over time that makes the BCHR useful.
Let’s again assume the BCHR has been at 90% for the past few weeks (or whatever time-frame) and it has now dropped to 85%. The Quest Spotlight monitor is flashing red and obviously something has “changed”. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database (perhaps simply the specific workload has changed) ?
Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have an important process or (processes) that have suddenly started performing expensive, inefficient Full Table Scans. That’s not good, as the extra logical and physical IOs have impacted response times detrimentally. Things have indeed gone worse. Or perhaps we have a process that was performing an extremely inefficient nested loop operation, reading the same cached table numerous of times that is now performing the same function much more efficiently, reducing LIOs significantly. Response times may have improved and things are indeed better. Or perhaps there’s a large batch program or report that needs to be urgently run during normal business hours that’s resulting in lots of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed at all.
Therefore, having a BCHR that has dropped to 85% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.
Let’s assume the BCHR has gone from 90% to 95% (or whatever higher value). The Quest Spotlight monitor is flashing nice and green but something appears to have changed. However, has it changed for the better, for the worse or has it had no noticeable impact on the “health” of the database ?
Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans that are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better. Or perhaps we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs significantly causing the overall BCHR to increase as well. Response times may have plummeted and things are much worse. Or perhaps there’s a series of large batch programs or reports that usually run during normal business hours but the reporting section are on a Xmas lunch and haven’t bothered to run them today resulting in a reduction of physical IOs to the database, but is not actually impacting the more important online transactional systems. Actually, the health of the database may not have changed.
Therefore, having a BCHR that has increased to 95% (or whatever value ) doesn’t actually tell us much other than something may have changed. But it may have changed for the worse or the better or not significantly at all. There’s no way of knowing without performing further database checks.
Let’s assume the BCHR has not changed at all and is still sitting at 90% (or whatever value). The Quest Spotlight monitor is flashing nice and green but nothings appears to have changed. However, has nothing really changed, or could things now be seriously wrong with the database ?
Well again, without looking at other specific metrics, one can’t possibly answer these questions. Perhaps we have a key process or (processes) that was previously performing expensive, inefficient Full Table Scans and are now performing efficient index scans. That’s good, the reduction in logical and physical IOs have impacted response times positively. Things have indeed gone better but because the BCHR is a database-wide metric, this improvement made have gone unnoticed. Or perhaps at the same time we have a process that was previously performing efficiently that has suddenly started to perform extremely inefficient nested loop operations, reading the same cached table numerous of times, increasing LIOs causing response times to plummet and making key business processes much worse. But again because the BCHR is a database-wide metric, the overall BCHR may not have been impacted.
Or of course, one of hundreds of things have caused serious database performance issues while the BCHR remains totally unchanged …
Therefore, having a BCHR that has remains unchanged doesn’t actually tell us much either. The database made be running better than previously, the database may be having serious problems or the “health” of the database may remain unaltered.
So, the BCHR at any specific value doesn’t tell us much without having to check other database metrics as well.
The BCHR increasing doesn’t tell us much without having to check other database metrics as well.
The BCHR decreasing doesn’t tell us much without having to check other database metrics as well.
The BCHR remaining unchanged doesn’t tell us much without having to check other database metrics as well.
Note the database metrics we would need to check (for example, those queries using the most LIOs, those queries using the most PIOs, those queries using the most CPU, those queries being executed the most, those queries with excessive response times, causes of the most significant waits in the database, specific application/process response times, etc. etc. etc…) are exactly the same in all the above scenarios.
The biggest problem of all with the BCHR is that regardless of it’s values, or whether it goes up or down or remains unchanged, we need to perform precisely the same database checks regardless as it doesn’t tell us whether the “health” of the database has improved, got worse or remains unchanged.
If we need to perform the same actions regardless of the BCHR, then I suggest the answer to whether the BCHR is useful or not is a big and resounding no.
Back when Oracle8 was released, Oracle introduced a number of new features with regard to constraints.
The first was the option of making a constraint DEFERRABLE, meaning the policing of a constraint can be deferred until the issuing of the COMMIT, rather than during the execution of an individual statement. This gave application developers more freedom in how they designed code, particularly with regard to the order in which parent – child data is inserted and manipulated.
The second new option was the ability to enable a constraint with NOVALIDATE, meaning Oracle would enable the constraint but not bother to check existing data to ensure nothing violated the constraint.
This could be useful in a number of scenarios. For example, you have data that currently violates the constraint but have urgent business requirements to enable the constraint ASAP preventing further violations, with the intention of cleaning up the existing violations at some future time.
Or you know the data is OK, so to reduce the overheads associated with enabling the constraint (eg. reading all the data to check for violations), you enable the constraint “immediately” with NOVALIDATE, bypassing the “redundant” checking.
Both deferrable and novalidate constraints therefore imply there “could” be data at any given point in time that violates the constraint. Therefore Oracle also introduced the ability to have non-unique indexes (rather than unique indexes) policing either PK or Unique constraints. For deferrable or novalidate constraints, the index must in fact be non-unique, as a unique index would prevent any such (temporary) violations of PK or Unique constraints.
Now, there are a number of interesting and subtle differences in the manner in which Oracle manages and processes a Unique vs. a Non-Unique index (eg. the amount of storage they use, the amount of redo they generate, the number of latches they acquire). This will be discussed in another Blog entry some other day.
Today, I just want to focus on a couple of interesting little side-effects with regard to how the CBO deals (or doesn’t deal)with NOT NULL and CHECK constraints that have been created as Deferrable or Novalidate.
In 9i, the CBO was clever enough to know that if someone searched for a NULL value but the column had a NOT NULL constraint, there couldn’t possibly be any data matching the criteria. Providing you had an index on the column, the CBO would generate an execution plan that used the index, found no NULL values and returned an empty row set accordingly. If you had no index, the CBO would be forced to use a Full Table Scan. So the CBO actually used an index in an efficient manner to search for non-existent nulls.
BUT, if the NOT NULL constraint was either deferrable or novalidated, then Oracle couldn’t know there were no nulls, there just might be. Therefore, Oracle was forced into the FTS regardless of the existence of the constraint or index, as null values are not indexed (unless part of a concatenated index).
See this demo for details: NOT NULLs demo with 9i
Since 10g, the CBO has become smarter. The NOT NULL example works in a very similar manner, except that the index is no longer required. If one searches for a NULL value on a column that has a NOT NULL constraint, the CBO automatically determines there can be no matching rows and returns the empty row set immediately with no LIOs. None, as accessing the data is simply not necessary.
BUT again, it can only do so if and only if the NOT NULL constraint is validated and nondeferrable, otherwise the CBO can’t guarantee no nulls.
See this little demo for details: NOT NULLs demo with 10g
Although we actually have applications that intentionally search for nulls on NOT NULL columns to return empty row sets, it’s not common that an application would perform such a search.
What is much more common is searching for a column value that simply doesn’t exist. If a column value doesn’t meet a business rule, it’s a good idea to police such business rules with Check constraints. 10g has extended the NOT NULL scenario to include Check constraints. If a search attempts to search for a column value that violates a check constraint, Oracle will immediately return an empty row set without performing any LIOs.
But once again, it can only do so if the check constraint has been validated and set as nondeferrable.
See this demo for a 10g check constraint example: Check Constraints with 10g
Making constraints deferrable or enabling them with novalidate can be useful. However, if possible, ensure constraints are not deferrable and validated as this provides the CBO with additional information regarding the columns that it might just put to good use.
Additionally, unless there’s a requirement to the contrary, use unique indexes rather than non-unique indexes to police uniqueness. But that’s a discussion for another day …
Why are there so many Oracle related myths ? The Inconvenient Truth December 12, 2007Posted by Richard Foote in Buffer Cache Hit Ratio, Oracle General, Oracle Myths, Richard's Musings.
NOTE: If you’ve linked here from Burleson’s truely awful “Oracle Index Rebuilding” article, (no, I’m not Ken Adkins !!), I suggest you read this post very carefully, as it’s most applicable considering where you’ve linked from
Hundreds of years ago, most people thought the world was flat and if you travelled far enough, you’ll simply fall off the edge of the world. It seemed like a reasonable thing to believe, I mean the world kinda looks flat doesn’t it? It just didn’t make sense to think otherwise. Also, as most people didn’t really travel very far in their lives, it wasn’t a big deal to most people. The chances of falling off the world were remote. Very few actually pushed the boundaries to find out the truth; whether the world was flat or not just wasn’t relevant or important to most people.
Well about 10 – 15 years or so ago, most people thought Oracle was “flat”. OK, not flat exactly, but certainly that a segment with one extent was better for performance than say 20 extents, that having a PCTINCREASE of 1 reduced fragmentation, that a high BCHR meant all was likely well with the database, that indexes should be rebuilt on a regular basis etc. etc. I mean it all sounds like it makes sense doesn’t it? Many new to Oracle came from other databases and technologies where these sorts of things were perfectly acceptable practices in their old environments, so it made sense they were necessary in Oracle too. They were also all based or founded on some level of fact or half truths but half truths have a habit of becoming full truths and over time get set in stone as cold, hard facts.
In those days, it really wasn’t much of an issue anyways. Databases were relatively small, anything over a few GB was considered enormous so rebuilding and reorganising things, although inconvenient, was all very plausible. Very few databases were truly 24 x 7 with most having those long, dark, lonely nights fully available to play around with the database, without disrupting business processes.
There was no Internet in those days or it was just emerging and so communication in all things Oracle was mainly through books and conferences where relatively few “experts” dominated. There were very very few that pushed the boundaries, most people were content to simply read and hear they were spending those dark nights doing something constructive. Everyone suggested rebuilding indexes on a regular basis was a good thing and improved performance, so it must be right, right ?
However, things changed. Over time, databases kept getting bigger and bigger with hardware not keeping up with shrinking maintenance windows. Organisations demanded more and more availability and those lonely nights became as mission critical as the traditional 9am-5pm.
The Internet became a hugely important communication vehicle with people able to ask questions and get instant answers from the other side of the world. Suddenly someone could make an erroneous comment and instantly someone could correct them.
This enabled a whole new wave of Oracle professionals to make an impact. The previously few recognised authors and noted experts were swamped by a whole new breed of Oracle expert, who were able to easily communicate and share their knowledge through websites and a mass of Oracle related forums. They weren’t satisfied with simply rehashing Oracle manuals and books, but treated truly understanding Oracle as a scientist would try and truly understand their specialised field. Jonathan Lewis, Steve Adams, Tom Kyte, Anjo Kolk, Mogens Norgaard etc. become well known in the Oracle sphere and began changing how the majority saw and understood Oracle. People questioned authority and began testing and proving things for themselves.
Rebuilding indexes on a regular basis not only became questionable (as did all the other myths), it become unfeasible and impracticable. A recent survey by the IOUG suggested 35% of Oracle shops surveyed had at least one database over 1TB. DBAs started looking after more and more databases, looking after bigger and bigger databases, looking after more and more complex database environments.
Do we really need to rebuild these damn indexes ? As these myths were investigated, tested, analysed and pulled part, the answer become more and more obvious.
It simply required a clear and accurate understanding of how indexes actually behaved for the truth to be known, as researched and documented by the new breed. It also required enough Oracle people needing a solution to their maintenance issues, needing to actually justify business unavailability, for actual facts to start replacing myths.
Most indexes never need to be rebuilt. Never, ever.
So is the answer above really no ? Is the world truly not flat and safe to walk on for as long as we could want ?
If you were to blindly walk (for example) to the Grand Canyon in the USA, someone could indeed plunge off the “end of the world” to their doom. Some indexes if not rebuilt could cause performance issues if not rebuilt.
Here lies one of the main reasons why these Oracle myths never seem to die, they’re often based on half truths and it’s determining which half is which that can be problematic (although the ratio is more accurately 999-1 than 50-50). Determining that 0.1% is not necessarily a straight forward process, it can’t always be determined by a simple formula or rule of thumb.
Also human nature can be a funny thing. People are often either too lazy or too busy or both to want a complex answer to a complex question. People love easy solutions, even if it’s not actually the right solution. The path of least resistence is often the most traveled path, even if it leads to a sudden drop at a cliff’s edge …
But why in this information age, is finding the answer of what is truth and what is myth so problematic ? With so much good, clear, accurate information out there, with so many respected and knowledgeable experts out there spreading a true understanding of Oracle, why can the truth still be so hard to determine ?
The answer is “Information Pollution“.
Historically, there has been so much inaccurate information written and published about Oracle, there have been so many “wrongs” written as ”rights”, that technical accuracy can often be “clouded” by this pollution. Also, some of the older breed simply refuse to accept today’s realities. They desperately cling to old habits else fear losing creditability.
Oracle Corportation itself must accept some of the blame. They’ve historitically documented things incorrectly, they still have metalink notes that are inaccurate, host forums that spread misinformation. Although Oracle produces incrediably detailed documentation and resource rich web sources, Oracle itself is one of the big “Information Polluters”.
So, there are still many people who spread myths. There are still many people who think all indexes should be rebuilt regularly if they experience lots of DML, that indexes should be rebuilt if they’re have a height greater than some level.
There a literally thousands of Oracle newbies coming into the scene each year. How can they tell what is right from wrong (or half right from half wrong) when they have no prior knowledge or experience with Oracle ?
And if they pick the wrong source and learn inaccurate information and adopt inefficient processes, guess what ? They in turn become information polluters and so the clouds potentially thicken …
And so these Oracle myths continue to persist and never seem to quite disappear. So is Oracle in danger of “Global Warming” ?
No, because thankfully, the amount of good information far outweighs the bad. Oracle as a community is producing far more “Oxygen” than “Carbon-Dioxide”. The number of people who think one extent is better or that indexes need constant rebuilding, as percentage of the whole, is dwindling and dwindling over time. More and more people appreciate the world really is a lovely round planet because they have to know. Just as they have to travel to the other side of the world, they have to maintain larger and larger databases with less and less disruption to their businesses.
Importantly, there’s also a very easy and generally very accurate way to determine the right from wrong, the wheat from the chafe.
Simply prove it.
If something within Oracle (or anything really) is right and accurate and correct, then it can generally be proven. The Jonathan Lewis and Steve Adams and Tom Kytes of this world not only accurately document how an Oracle process works and functions, they generally prove it with clear, consistent, repeatable test cases. They don’t simply say a=b they actually demonstrate and prove it.
Whereas those that deal in myths and inaccuracies by definition are unable to prove it. They makes claims and suggestions and generalisations but lack that one essential ingredient that differentiates them from those that deal with facts and technical accuracies.
And without proof, one can clearly pick out the “Information Polluters”. The next time you read something from somewhere, just ask yourself this one simple question. Are they simply stating something as fact or are they actually attempting to prove something as fact. Can that proof be copied across into your particular environment ? Is it still accurate ? Can you extend and expand on the claims and is it still accurate ?
An answer in the positive is a hell of a lot better than an answer that consists of nothing more than a “because it just is” …
Why are there so many Oracle related myths ?
Global Warming !!
Is it too late ?
Not if we meet the Oracle Kyoto Treaty and demand clean oxygen-based proofs ;)
Index Internals – Rebuilding The Truth December 11, 2007Posted by Richard Foote in Index Coalesce, Index Height, Index Internals, Index Rebuild, Index Shrink, Index statistics, Oracle Indexes, Oracle Myths, Oracle Opinion, Richard's Musings.
The issue of when to rebuild indexes crops up again and again and again. Unfortunately, so do the same incorrect, myth-filled uninspired responses which considering how important and “key” (no pun intended) indexes are to database design and performance generally, is very puzzling indeed.
In the coming days I’ll post why I believe these index related myths simply refuse to go away …
This presentation was originally written several years ago but is still as relevant today as it’s always been.
Recently updated version: Index Internals – Rebuilding The Truth