jump to navigation

Fragmented Indexes Due To Large Number Of Duplicate Entries (More) October 28, 2008

Posted by Richard Foote in Fragmented Indexes, Oracle Indexes.
4 comments

Many think an index can only be “fragmented” if you have a high number of delete or update operations on the index. An index that only has inserts can’t really be fragmented as no space is “wasted” due to delete (or update) related operations, right ?

I use the term “fragmented” here in the context of having free and available space within the index structure which can not actually be used by Oracle for subsequent insert operations. The space is effectively wasted and can only usually be cleaned up after an index maintenance operation such as a rebuild or coalesce.

However, there are a number of scenarios where an index can become fragmented, despite the lack of associated delete or update operations.

One example is where a non-unique index stores large number of duplicate values, indeed so many duplicates of a specific value that it actually requires many index leaf blocks to store all occurrences of each specific value.

The first point to make here is that with Oracle B-Tree indexes, there is actually no such thing as a non-unique index. All indexes, even those defined as non-unique, are actually unique indexes “behind the covers”.

Why ?

For the simple reason that if we were to (say) delete a specific row in a table, Oracle would need to delete the associated index entry. If there were (say) 1000 occurrences of the indexed value, Oracle can’t simply delete any of the 1000 index entries, it must delete the specific index entry that has the corresponding rowid of the row being deleted.

Therefore, in order for Oracle to quickly find this associated index entry and be able to immediately find the related index leaf block of interest, all the index entries within the same index value are sorted by the rowid. If this wasn’t the case, Oracle would be forced to access the first leaf block that contains an index value of interest and be forced to search through all the duplicate index entries until it finds the specific one Oracle is after, perhaps reading many index leaf blocks in the process.

Therefore, for a specific indexed value, all index entries are sorted by the corresponding rowid. Oracle physically implements this by making the rowid an additional index column value within the index.

A unique index entry basically looks like this, courtesey of a block dump:

row#0[8016] flag: ——, lock: 2, len=20, data:(6):  01 42 1c 7a 00 00

col 0; len 11; (11):  44 41 56 49 44 20 42 4f 57 49 45

 

Note: the 6 byte rowid is not stored as an index column entry (the index entry only has the one column, starting with 0).

 

However, a corresponding non-unique index entry look like this:

 

row#0[8015] flag: ——, lock: 0, len=21

col 0; len 11; (11):  44 41 56 49 44 20 42 4f 57 49 45

col 1; len 6; (6):  01 42 1c 7a 00 00

 

Note: the rowid is treated and stored as additional index column within the index entry, and hence requires an additional byte per index entry to store the column length value. The rowid is now a second column (number 1, remembering columns start with 0) in the index.

 

This effectively makes each and every index entry unique, even for non-unique indexes, as the combination of index value and rowid must be unique for a given table.

 

Now when a new index entry is inserted into an index, Oracle is very specific where the index entry is housed, it must be housed in the index leaf block such that the order of the rowids within duplicate index entries is maintained. The rowid is basically just another column within the index and like all index columns, it must be stored and sorted in the index entry order.

 

Now here’s a funny thing with rowids. When new rows are added to a table, the rowids generally increase in value. Although this is not always the case and there are exceptions, this is generally the trend with most rowids within most tables.

 

A (restricted) rowid consists of a relative file id as the leading portion of the rowid. As a tablespace grows and new files are added to the tablespace, generally the subsequent relative files ids increase and so the subsequent rowid increases as well.

 

The next portion of the rowid is the block id. As a datafile within a tablespace fills up and allocates new extents to a table, the block ids used within the new extent increase and so the rowids increase in value as well. Also as new blocks get used within an extent and the high water mark of a table increases, the rowids increase as well.

 

Yes there are always exceptions. For example, a newly allocated extent could use space previously allocated to a dropped segment and so use block ids in a range lower than those previously allocated. But they’re the exception rather than the rule. Even with ASSM tablespaces and segments with multiple freelists/freelist groups where multiple blocks can be inserted into concurrently, the trend is still for the associated rowids to generally increase over time.

 

Now getting back to our index with lots of duplicate index values. What happens when a block is filled and a 50-50 block split eventuates ?   1/2 the entries go into one block and 1/2 the entries go into the newly allocated block and the new index entry goes where ? Well, if the block is full of nothing but the same index entry because we have lots of duplicate values and the new index entry is associated with the maximum rowid for the specific index value, it goes into the newly allocated leaf block because remember, all index entries must maintain their logical sorted order.

 

In fact, if most/all of the subsequent inserts of the duplicate value are also associated with the maximum rowid currently assigned to the index entry, all the subsequent inserts now go into the newly allocated block. The 50% of free space remaining in the previous leaf block doesn’t get used.

 

Once the newly allocated block also gets filled, again with the same duplicate index values, a 50-50 block split occurs and again, most if not all the subsequent inserts go into the newly allocated leaf block. We now have two, 1/2 empty leaf blocks that has space that can not effectively not be reused because they are filled the same index value but with rowid values less than those being currently allocated.

 

And so the cycle continues with these duplicate index entries and any other duplicate index values that span multiple leaf blocks …

 

This can easily be illustrated with the following example. First, create a simple table and index, and initially insert a row with a large index entry to prevent subsequent 90-10 block splits from occurring:

 

SQL> CREATE TABLE common_values (id NUMBER, common VARCHAR2(10));

 

Table created.

 

 

SQL> CREATE INDEX common_values_i ON common_values(common);

 

Index created.

 

 

SQL> INSERT INTO common_values VALUES (1, ‘ZZZ’);

 

1 row created.

 

 

SQL> COMMIT;

 

Commit complete.

 

Next, populate the table with lots of duplicate values:

 

 

SQL> BEGIN

  2  FOR i IN 1..90000 LOOP

  3    CASE

  4      WHEN mod(i,3) = 0 THEN INSERT INTO common_values VALUES (i, ‘AAA’);

  5      WHEN mod(i,3) = 1 THEN INSERT INTO common_values VALUES (i, ‘BBB’);

  6      WHEN mod(i,3) = 2 THEN INSERT INTO common_values VALUES (i, ‘CCC’);

  7    END CASE;

  8  END LOOP;

  9  END;

 10  /

 

 

SQL> ANALYZE INDEX common_values_i VALIDATE STRUCTURE;

 

Index analyzed.

 

Now lets see how much of the allocated index space is actually used:

 

 

SQL> SELECT BTREE_SPACE, USED_SPACE, PCT_USED FROM INDEX_STATS;

 

BTREE_SPACE USED_SPACE PCT_USED

———– ———- ——–

    2648032    1355551       52

 

As we can see, only 52% of the index is actually being used, a value much less than would be expected of most randomly inserted indexes.

 

Now of course if we had an index with few distinct values, it’s unlikely to be considered by the CBO. However, it’s not uncommon to index column values that have uneven distribution of values, with the index being used to retrieve data for those columns with relatively few occurances of specific values. In these scenarios, it’s possible to have a large portion of the index with poor index space utilisation due to this issue. It’s unlikely to impact performance because the poorly fragmented portion of the index is not usually used, but it’s something to consider if you wish to reclaim wasted index storage or you do indeed have Fast Full Index Scan operations that need addressing.

Busy Busy Busy !!! October 23, 2008

Posted by Richard Foote in Oracle Index Seminar, Richard's Musings.
4 comments

I thought I better publish something just in case anyone feared I had fallen down a cliff or something.

These’s last few weeks have been so hectic, it’s been crazy. I seriously got home tonight and noticed that some of my roses in the garden were blooming and I actually made a conscious decision to smell them and just enjoy the moment.

Then I remembered I was running late and had to help with the kids and that precious little moment just disappeared 🙂

Played in a football tournament over the weekend where I played 6 games of soccer over two days in 29C heat, so my body is suffering a bit at the moment. We won the “Saucer” and the title of “The Best of the Worst” teams so it was a somewhat successful campaign !! Scored a few goals as well 🙂

Much of my spare time is spent getting everything organised and ready for my trip to Europe next week. All the dates for my Indexing “Extravaganza” Seminar tour as part of the Oracle University Celebrity series have now been confirmed so there’s a lot of travel and a lot of presenting ahead of me over the coming weeks.

All the dates and venues can be found on my Public Appearance page and list on the right.

I’ve also been invited to present at the Finland Oracle User Group on October 30 at Helsinki so I’m now leaving a few days earlier to make it a nice round 3 week trip in total. I’m actually feeling just a little bit tired and exhausted just thinking about it all but it should be a whole lot of fun and I look forward to meeting some of you on my travels. I haven’t been to many of these places before so any good suggestions of where to eat, drink and what to see always appreciated 🙂

The blog has had to take a back seat but I’ll try and write something before I go and a few entries during my travels. Looking forward to the Christmas holidays already !!

Introducing The Latest Oracle ACE Director (Aces High) October 10, 2008

Posted by Richard Foote in Oracle ACE Director.
15 comments

I’ve just found out over the past couple days that a nomination for me to be made an Oracle ACE Director has been accepted by Oracle Corporation. It’s always pleasing to have your efforts acknowledged and appreciated by the Oracle community and so it’s really nice to be recognised in this manner by Oracle.

The Oracle ACE Program was setup a couple of years ago to publicly recognise and I guess reward and encourage people to share their technical knowledge and skills with the wider Oracle community. Anyone can be nominated to be an Oracle ACE and a committee review and accept nominations based on the candidate’s credentials. An Oracle ACE Director is someone who is not only recognised for past efforts and achievements but is also willing to commit themselves to ongoing participation and involvement in assisting the Oracle community.

For me the most exciting aspect of being an Oracle ACE Director is that it will hopefully open up opportunities to attend and participate in more events and conferences both at home and overseas. Australia is a difficult place from and to which to travel and this will hopefully enable me to “spread my wisdom” to more unsuspecting souls and to both meet and learn from more people. For example, Oracle Open World next year is now a distinct probability.

I would like to thank Daniel Morgan who nominated me and also my fellow Australian Oracle ACE Directors Marcel Kratochvil and Chris Muir who have both been really supportive in my appointment. Thanks guys.

Constraints With Oracle Total Recall (Remember A Day) October 7, 2008

Posted by Richard Foote in 11g, Constraints, Flashback, Total Recall.
15 comments

11g has many new and potentially useful features which I’m slowly investigating and evaluating.

One feature that has the potential to be particularly useful where I work is “Oracle Total Recall“. It provides the facility to automatically capture all changes to any tables you choose, in a “secure, tamper proof” manner. This enables all changes to be easily tracked and viewed for historical, auditing, security control, etc. purposes.

Thought I might “introduce” Total Recall to those of you that may not have seen it in action and to offer an initial word of warning of a little trap one can easily fall into …

One nice aspect with it all is that it’s very easy to setup and configure.

To begin with, you need to create a storage area called a “flashback data archive” where all the historical changes can be stored, e.g.:

SQL> CREATE FLASHBACK ARCHIVE bowie_archive
        TABLESPACE total_recall_ts QUOTA 500m RETENTION 1 YEAR;

You simply specify the tablespace you wish to use, how much space within the tablespace you wish to allocate for archiving purposes and a retention period to specify how long to keep the archived data.

To create a table with the “Total Recall” change tracking capability, simply specify which flashback archive area to use:

SQL> CREATE TABLE bowie_recall (id number, text varchar2(20), hist_date date) FLASHBACK ARCHIVE bowie_archive;

Table created.

All changes to the table are now automatically tracked and archived and can easily be accessed as necessary. To illustrate, I’m just going to simply insert a row into the table.

SQL> insert into bowie_recall values (1, ‘Bowie’, sysdate);

1 row created.

SQL> commit;

Commit complete.

Sometime later, I perform an update on the row:

SQL> update bowie_recall
  2  set text = ‘Ziggy’, hist_date = sysdate
  3  where id = 1;

1 row updated.

SQL> commit;

Commit complete.

OK, so the current row basically looks like thus:

SQL> select id, text, to_char(hist_date, ‘dd-mm-yyyy hh24:mi:ss’) hist_date
         from bowie_recall;

  ID TEXT     HIST_DATE
---- -------- -------------------
   1 Ziggy    06-10-2008 10:09:55

 

However, if I want to see what the row looked like at a previous point in time, I can use the AS OF TIMESTAMP clause such as this:

SQL> select id, text, to_char(hist_date, ‘dd-mm-yyyy hh24:mi:ss’) hist_date
         from bowie_recall
         as of timestamp to_timestamp(‘2008-10-06 10:07:00’,
                     ‘yyyy-mm-dd hh24:mi:ss’)
         where id = 1;

  ID TEXT     HIST_DATE
---- -------- -------------------
   1 Bowie    06-10-2008 10:06:15

 

This is what the row looked like when it was initially inserted.

If I want to look at all the different versions of the row between any specific times of interest, I can write something such as this using the VERSIONS BETWEEN clause:

SQL> select id, text, to_char(hist_date, ‘dd-mm-yyyy hh24:mi:ss’) hist_date
         from bowie_recall
         versions between timestamp
         to_timestamp(‘2008-10-06 10:07:00’, ‘yyyy-mm-dd hh24:mi:ss’)
         and sysdate
         where id = 1;

  ID TEXT     HIST_DATE
---- -------- -------------------
   1 Ziggy    06-10-2008 10:09:55
   1 Bowie    06-10-2008 10:06:15

 

It lists all versions of the rows between the two points of time …

Like I said, very easy to setup and very easy to subsequently query.

One of the disadvantages however of enabling the tracking of changes to a table is that it prevents a number of subsequent operations on the table. For example, you can’t simply drop a column:

SQL> alter table bowie_recall drop column status;
alter table bowie_recall drop column status
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table

You also for example can’t modify a column to say add a NOT NULL constraint:

SQL> alter table bowie_recall modify text not null;
alter table bowie_recall modify text not null
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table

Now the trap …

Oracle does however allow you to add any other type of constraint, such as say a Primary Key, without complaint:

SQL> alter table bowie_recall add primary key (id);

Table altered.

All well and good, except for when you want to now retrieve some of the historical changes from the table:

SQL> select id, text, to_char(hist_date, ‘dd-mm-yyyy hh24:mi:ss’) hist_date from bowie_recall versions between timestamp to_timestamp(‘2008-10-06 10:07:00’, ‘yyyy-mm-dd hh24:mi:ss’) and sysdate where id = 1;
select id, text, to_char(hist_date, ‘dd-mm-yyyy hh24:mi:ss’) hist_date from  bowie_recall versions between timestamp to_timestamp(‘2008-10-06 10:07:00’, ‘yyyy-mm-dd hh24:mi:ss’) and sysdate where id = 1
                                                                             *
ERROR at line 1:
ORA-01466: unable to read data – table definition has changed

Remember, one of the reasons for implementing Total Recall on a table is to implement a “tamper proof” data tracking solution and yet the addition of any constraint on a table (Primary Key, Unique Key, Foreign Key or Check Constraint) will cause you to be unable to automatically read the historical data, without directly accessing the underlining SYS_FBA_ archive tables.

Be careful with constraint management and the historical tracking of tables …

Total Recall is of course all relatively new and these sorts of early glitches are to be expected. More on Total Recall later, including possible performance implications when both modifying and querying historical data.

An Active Writing Lesson (Battle For Britain – The Letter) October 1, 2008

Posted by Richard Foote in Richard's Musings.
6 comments

Cary Millsap has written a great blog entry, A Lesson In Writing From 1944. It’s a fascinating read regarding letters written by General Eisenhower prior to the D-Day Normandy landings in dark 1944 and how they were subtly modified by the author from “Passive” to “Active” voice. Cary then very nicely discusses how the use of active voice can help make technical writings more clear and concise and how passive voice should be avoided.

It’s a great read because Cary is so very right. So often, technically “suspect” articles and writings are written in the passive voice using what Cary terms “weasel wording”, often in an attempt to hide the insecurities and doubts associated with the author’s lack of knowledge or actual expertise. So many articles from a number of serial offenders spring instantly to mind 😉

A point I would make though is that it’s actually extremely difficult to write a good active voiced technical article. As Cary mentions “When you know something, you say it. When you don’t, active voice pretty much forces you to say that”. The problem from a technical perspective, especially with regard to Oracle database technologies is that it can often be very difficult to say “it” concisely in an active voice manner and yet remain technically accurate.

Many things in Oracle are not “black and white”. Words such as “always”, “never”, “must”, “definitely” , “certainly”, etc. are often dangerous terms to use. The old “it depends”, the great excuse of the passive voiced writer is of course so very true in so many Oracle related scenarios. It’s why “weasel writing” can often be used so effectively in Oracle writings to hide an author’s inadequacies because the casual reader can appreciate and understand how difficult it is to be active voiced, to be confident enough to be clear and precise about a particular subject with the associated ownership.

However, difficult doesn’t mean impossible, not even close. Difficult simply means that a writer to use active voice effectively, needs to not only be both concise and accurate and but also have an in-depth understanding of a specific subject matter to be able to clearly state and include what conditions and caveats a statement might reasonably require. Appreciate that a bold, “simple” statement is only appropriate when the statement is “framed” appropriately based on factors such as the intended audience, the surrounding subtext, the intended overall message, etc. Understand that “vagueness” and “ambiguities”, the “safe havens” of the “passive voiced weasel”, need to be identified and adequately addressed.

And that’s the tricky bit. How to be clear, concise and above all accurate (enough at least for the intended audience) without being too simplistic and/or generalised, thereby allowing ambiguities and uncertainties to cloud or distort the intended message. If “you” have a point to make, just make it clearly without attempting to hide the point with some generalistic insurance. If “you” have an uncertainty or important caveat, make sure “you” attribute that clarification or qualification in a clear and implicit manner. For some, clouding a message or being inaccurate or simply being wrong is the unwanted and undesirable consequence of poorly written active voiced articles. Being both concise and precise without the unwanted ambiguities is the difficult balancing act of the active voiced writer.

Like I said, it ain’t easy.

For some of course, clouding the message is the intended and desirable consequence of writing a passive voiced piece. Without the camouflage, without the ambiguities and confusion, without attempting to pass on possible faults as the possible thoughts of others, the author’s insecurities and deficiencies would be exposed. And that would never do, not when the author’s intent in writing the piece might be to simply make the reader “think” the author knows what they’re talking about …

The power of using the active voice is that, although difficult, it actually makes writing quality work so much easier because it truly forces the author to consider and embed all the necessary “factors” into the writings.  It’s very difficult to be active voiced and accurate without also being precise and yet as comprehensive and as in-depth as necessary. It’s so much harder to “hide” deficiencies and uncertainties because they become so much easier to expose and counteract. They’re clearly your ideas and thought processes, not some vague generalisations that could be attributed to others or to some general consensus.

If one is clear and concise in what one says, it’s so very much easier for others to say you are wrong or inaccurate. If it comes from you, you can’t blame others for mistakes or inaccuracies. That’s why I totally agree with Cary when he says it sometimes requires courage to be active voiced because as a consequence one is so very much more exposed.

Being clear and concise and accurate is of course possible, but only if you actually know what you’re writing about, only if you’re truly confident in what you’re communicating, only if you have the actual technical knowledge to discuss a subject matter openly with the intent to share that knowledge and understanding. Writing in the active voice should be pursed because it actually makes the task of writing a quality piece of work so much easier. Perhaps most important of all, it makes the job of the reader, the intended audience, so much easier as well because a clear and concise piece of writing coming from you is so very much easier to read and comprehend and understand.

And when writing a technical piece, shouldn’t that ultimately be the name of the game ?

The next time you read something full of passive tones and ambiguities, just ask yourself why isn’t the author being clear and concise and claiming full ownership in what they’re writing. Question perhaps if they really know what it is they’re writing about ? Maybe even wonder what it might be they’re trying to hide. 😉