Why A Segment Really Has To Be At Least 2 Blocks In Size July 20, 2009Posted by Richard Foote in Extent Management, Humour, Tablespace Management.
I just couldn’t resist this one …
In the latest Don Burleson news piece dated July 17, 2009 called “Tablespace size initial extent size” (although the referenced article itself is dated June 18, 2009), it attempts to answer the question why is there a mismatch between the uniform extent size specified in the tablespace and the actual size of extents in the tablespace.
The question uses the following basic example:
EXTENT MANAGEMENT LOCAL
UNIFORM SIZE 3K
SEGMENT SPACE MANAGEMENT MANUAL;
The uniform extent size is only a tiny 3K, this in an 8K block database. It’s suspected an initial extent would therefore be at least 8K in size but are puzzled why the initial extent is actually 16K.
The reason given by Don Burleson is nothing short of bizarre. The suggestion is that because “the default MINEXTENTS is 2″and because the uniform size of the tablespace is less than the database block size, therefore “the MINEXTENTS takes precedence and you see 16K, 8K * 2, for your initial tablespace size“.
It also mentions that “The initial size of a tablespace is also governed by the MINEXTENTS. In this case, you did not specify in in your create tablespace syntax, so the default in dba_tablespaces takes effect“.
There are of course a number of things wrong with this explanation.
Firstly, the default value of MINEXTENTS is of course not 2 but 1. This has been the default value of MINEXTENTS since I can remember (with the exception of a rollback segment which is a special case).
In fact, with regard to LMTs, Oracle doesn’t even allow a non-default MINEXTENT value when defining the tablespace as it simply doesn’t make sense to do so (or any other storage clause value for that matter). It’s therefore not specified in the create tablespace syntax because it’s simply not allowed with a LMT !!
Secondly, if one simply just attempted to create an object in such a tablespace, one would clearly see that there is indeed only one extent created by default. And as the question itself suggests, this one and only extent indeed has a size of 16K (assuming it’s in an 8K block tablespace).
First create the tablespace:
SQL> CREATE SMALLFILE
2 TABLESPACE “TEST2″
4 DATAFILE ‘c:/temp/TEST2.dbf’
5 SIZE 5M
6 EXTENT MANAGEMENT LOCAL
7 UNIFORM SIZE 3K
8 SEGMENT SPACE MANAGEMENT MANUAL
Next, create a table in the tablespace …
SQL> CREATE TABLE BABY (id number) TABLESPACE TEST2;
Let’s look at what we’re created …
SQL> select segment_name, tablespace_name, extent_id, bytes, blocks from dba_extents where segment_name = ‘BABY';
SEGMENT_N EXTENT_ID BYTES BLOCKS --------- --------- ----- ------ BABY 0 16384 2
SQL> select segment_name, tablespace_name, min_extents, bytes, blocks from dba_segments where segment_name = ‘BABY';
SEGMENT_N MIN_EXTENTS BYTES BLOCKS --------- ----------- ----- ------ BABY 1 16384 2
We of course notice that the table indeed only has 1 extent. We of course notice that the default value of MINEXTENTS is indeed just 1, not 2
And we of course notice that the extent is made up of 2 blocks for a combined total of 16K as suggested by the question.
In keeping with an index theme, we notice exactly the same characteristics if we create an index in this tablespace:
SQL> create index baby_i on baby(id) tablespace test2;
SQL> select segment_name, tablespace_name, extent_id, bytes, blocks from dba_extents where segment_name = ‘BABY_I';
SEGMENT_N EXTENT_ID BYTES BLOCKS --------- --------- ----- ------ BABY_I 0 16384 2
SQL> select segment_name, tablespace_name, min_extents, bytes, blocks from dba_segments where segment_name = ‘BABY_I';
SEGMENT_N MIN_EXTENTS BYTES BLOCKS --------- ----------- ----- ------ BABY_I 1 16384 2
Same thing. Only one extent with an initial size of 2 blocks or 16K in total (when created in an 8K tablespace).
A simple test will confirm all this. And this is the beauty of simple tests, we can use them to validate Oracle behaviour. We don’t need a large, commercial, 1000+ concurrent, 100TB database to confirm all this. A very basic test of a PC at home is all you need …
Therefore the whole suggestion that the initial extent size of 16K has anything to do with MINEXTENTS is just plain wrong, wrong, wrong.
When I used to teach all this for Oracle University, if a student on an introduction DBA Admin course didn’t know that a segment consists of one extent by default, by lunchtime on the first day, I would be disappointed.
So what’s the real explanation ?
Well, any regular reader of my blog will of course know the answer …
Every segment must have a segment header where segment information such as the extent map, freelists (for non-ASSM segments), the HWM, etc. is stored. However, it makes no sense for a segment to consist of just a segment header, it needs at least one other block in which to store actual data relevant to the segment. Therefore a segment must consist of at least 2 blocks (note for ASSM segments, we need additional blocks for the segment bitmap information to be stored as well).
It makes no sense therefore for a segment to consist of just 1 block, it must be at least 2 blocks in size as an absolute minimum.
Therefore, it makes no sense to create a 1 extent segment where the extent is only 1 block in size. If a segment must consist of at least 2 blocks in a non-ASSM tablespace, then the associated extents must also consist of at least 2 blocks.
So in summary:
1) The default MINEXTENTS is 1, not 2 (with the exception of a rollback segment)
2) The reason why an extent must consist of at least 2 blocks is because a segment must also consist of at least 2 blocks, the segment header plus an additional block
3) Don’t believe everything you read or hear, even if it’s on the “news”
Back to more interesting topics soon I promise :)
UPDATE: 23 July 2009.The article has been updated with the erroneous discussion of MINEXTENTS defaulting to 2 being the issue replaced with the details I’ve listed here and the fact an extent must be a minimum of 2 blocks for a non-ASSM LMT. There are still a number of errors left in the article, however I’ll leave those for Don to sort out. Here’s a tip though. Perhaps save so-called “Oracle news” articles for those questions that are actually answered correctly ;)
Thought I might share some thoughts regarding recoverability issues with regard to having indexes separate and stored in their own tablespace.
I’ve already discussed here how the loss of an index only tablespace would be a catastrophic event, with the database in dire straights until the indexes are recovered. Therefore the faster we can recover from the situation, the faster we can make DML statements work again, the faster we can prevent Full Table Scans from crippling database performance, the faster we can return the database to a functional state again, the better for our users and for our sanity.
One of the advantages of having indexes separate from tables and stored in their own tablespace is that we have a number of different recovery options available to us. Rather than having to perform a full tablespace or data file recovery, we can potentially simply just rebuild all the impacted indexes. Providing the base tables are available and have not been impacted by whatever catastrophic event has befallen the index tablespace, we can rebuild the indexes (in another tablespace if necessary). This will hopefully be a more simplistic, efficiently and most importantly faster method of recovering all our impacted indexes than performing an actual database recovery.
But will it really be more simplistic, efficient and faster ? The recovery advantages with having indexes in their own tablespace are often exaggerated. Let’s first take a look at an example scenario.
Let’s assume we have an index only tablespace that stores all the indexes for our application. Let’s say we have 100G worth of indexes. In a physically separate table only tablespace, let’s say we have a total of 200G worth of table data which is approximately double that of the index tablespace. Generally speaking, it’s common for indexes to not use the same amount of storage as the tables as typically not all columns are indexed. Of course it’s possible for a specific column to be indexed several times and for the index storage to exceed table storage in some cases, but not typically. There may of course be some free space in these tablespaces but let’s assume free space is minimal.
So we have 100G of indexes and 200G of tables.
Let’s also assume there’s on average 2 indexes per table, if only to keep the following arithmetic nice and simple :) Of course some tables may have many more indexes, some may just have the one index and in some rare examples there may be no indexes at all.
Now, it’s important to note that building a new index is actually a very expensive exercise. Oracle has to read all the data blocks in the base table, it has to sort the data in the order of the index entries, it has to create the index segment and write the index data, while generating undo and redo in the process.
Now that’s a lot of work …
However, in this scenario, we need to do this work for each and every index that’s in our stuffed index tablespace. Not only that, but we also need a script that can identify each of our impacted indexes, that generates the necessary index rebuild scripts (to another tablespace if necessary) and that handles any necessary constraint related issues.
In this specific scenario, we have to make Oracle and the database processes basically perform the following amount of work:
- Read approximately 400G of table related data. As we have an average of 2 indexes per table, we have to basically read each and every table an average of 2 times to build their related indexes. That’s 2 x 200G = 400G.
- Sort approximately 100G worth of index related data. Sorting is a really expensive, relatively slow process and we have 100G worth of index data that needs to be sorted.
- Write and create approximately 100G of index related segments
Note we also have to generate Data Dictionary related changes, we have to generate a bunch of undo related changes and we also (although optionally) generate lots and lots of redo.
In short, the database is being absolutely hammered during this whole process and it will take a loooong time to complete.
And this is meant to be the easy, efficient and above all fast method of recovering our indexes ?
So what is the alternate recovery strategy that this method of “simply” rebuilding all indexes is meant to protect us from.
Well, with a damaged tablespace, we basically need to perform a tablespace level recovery, restoring “just” the 100G worth of data files and applying any associated redo logs since our last backup. Depending on our backup and recovery strategy, we may actually reduce the redo logs being applied by applying incremental or cumulative backups as well.
Instead of the database slowly and laboriously having to read, process and write 6 or 7 times the amount of data (in our scenario), we can use the OS to much more efficiently copy across the index related data files.
Instead of having to script the rebuilding of all impacted indexes, literally a couple of RMAN commands will basically automatically completely restore and recover the impacted index tablespace for us.
In the scenario when only a specific data file or mount point within the tablespace has been problematic, the implications of attempting to recover the situation by simply rebuilding the indexes gets worse, much worse.
Firstly, if we are so inclined, we need to identify which indexes have at least one extent within the damaged portion of the index tablespace. We then need to entirely rebuild all these indexes, regardless of how much of the index may actually remain undamaged with other extents in undamaged portions of the index tablespace. This all takes resources, resources, resources and time, time, time.
We can’t just rebuild a part of an index (unless it’s partitioned of course) but we can recover a part of a tablespace. We can simply recover the damaged part of the tablespace, restoring and recovering just the specific data file or files, again potentially with just a few simple RMAN commands.
With small databases with small amounts of data, the time it takes to rebuild all indexes in an application may be acceptable for the business. However, in larger database environments, the extra time and resources required to rebuild large amounts of index data compared to other recovery strategies would be totally and completely unacceptable.
An exercise for those who store indexes in a separate tablespace, in large part because of the recoverability advantages. On a QA system or equivalent copy of your production database environment, go through a real exercise of attempting to recover your indexes by rebuilding them and actually time how long such a recovery process takes. Then repeat the exercise by recovering the database using a conventional database recovery technique and time the differences.
You may just come to the conclusion that rebuilding indexes may not be such a fast and efficient recovery process in many scenarios after all …
Indexes In Their Own Tablespace: Availabilty Advantages (Is There Anybody Out There?) April 28, 2008Posted by Richard Foote in Backup and Recovery, Oracle General, Oracle Indexes, Oracle Myths, Oracle Opinion, Tablespace Management.
I’ve already discussed here some reasons why performance is not particularly improved by simply separating indexes from tables and storing them in a different tablespace. It sounds like it might be helpful from a performance perspective but when one digs down a little, it turns out any so-called performance benefits are questionable to say the least.
However, performance is only one reason why it’s claimed storing indexes in their own tablespace is beneficial. There are wondrous advantages to database availability and database recovery options if only indexes are stored in their own tablespaces. The loss of all indexes due to a catastrophic disaster in the index tablespace means that the database tables themselves are all still potentially available.
This sounds as if there might be a number of advantages with this strategy, right ?
Well it means for a start that none of the “real” data has been lost. If we store indexes away from the parent tables and we only lose the index tablespace, the table tablespace could possible be totally unaffected by this loss. This potentially suggests a number of things:
- The Database will still be functional. Yes it might run a little slower without indexes but at least with the tables still available, we can still perform our business critical operations until the indexes have been fixed as the actual tables are unaffected
- We don’t actually have to perform a database recovery to get us out of this mess. So long as all the tables are still available, we can simply recover the situation by rebuilding all the indexes from the problematic tablespace. This will hopefully be more simplistic, more efficient and most importantly faster than having to perform an actual database recovery
This all sounds perfectly reasonable …
Today, I’m just going to briefly mentioned some thoughts on the first point, the second point I’ll discuss another day.
I guess the key question here (pun fully intended) is just how important and vital are indexes to the “normal” operation of a database? Is a database effectively operational if we were to lose all our indexes, is an application still effective and operational if we were to lose all indexes belonging to the application? If by storing indexes in their own tablespace, do we get availability benefits if we were to lose only the index related tablespace?
All good questions to ask and ponder about every now and then.
Let’s be clear I’m not discussing the loss or corruption of a single (or handful) of indexes. If a specific index gets corrupted for whatever reason, yes we could recover the index by (say) making the index unusable and rebuilding the index. However, we can do this whether the specific problematic index in question was stored with or separate from the parent table so the scenario doesn’t really differ much.
No, one of the (so-called) benefits of storing indexes in their own tablespace is that if we have a catastrophic issue with the index tablespace, we only lose a whole bunch of indexes. No tables are impacted, just all the indexes stored in the tablespace. However, just how well will business critical operations function without indexes in our database …
The suggestion is that things will just be a lot slower. We’ll have lots of Full Table Scans where previously we had nice efficient index related scans, but at least data can be viewed and manipulated as the actual tables themselves will still be available. Yes things will be slower and less than ideal but better than if we had stored tables and indexes together because in this scenario we would have lost both indexes and tables making the database effectively useless until recovered.
Well let’s setup a really simple scenario and see how things fair without indexes …
First, we create a simple little “parent” test table and populate it with a few rows:
SQL> create table bowie_1 (id number, name varchar2(20));
SQL> insert into bowie_1 values (1, ‘Bowie’);
1 row created.
SQL> insert into bowie_1 values (2, ‘Ziggy’);
1 row created.
SQL> insert into bowie_1 values (3, ‘Floyd’);
1 row created.
Next, we create a simple little “child” table and populate it with a few rows:
SQL> create table bowie_2 (id number, fk_value number);
SQL> insert into bowie_2 values (1,1);
1 row created.
SQL> insert into bowie_2 values (2,1);
1 row created.
SQL> insert into bowie_2 values (3,2);
1 row created.
SQL> insert into bowie_2 values (4,3);
1 row created.
SQL> insert into bowie_2 values (5,3);
1 row created.
We now add a Primary Key to the parent table which will create for us an index. Note this is the only index in this demonstration which is stored in a separate tablespace to the table:
SQL> alter table bowie_1 add constraint bowie_1_pk primary key(id) using index tablespace users;
Next we create a Foreign Key in our child table. Note this table doesn’t actually have a Primary Key (rare, not recommended but possible) and the Foreign Key has no associated index:
SQL> alter table bowie_2 add constraint bowie_2_fk foreign key(fk_value) referencing bowie_1(id);
Finally, we take the index tablespace offline to simulate a problematic index related tablespace:
SQL> alter tablespace users offline;
OK, the setup is now complete. Let’s see what life is like without our poor little index. First, let’s perform a simple query on our parent table. I’ve hinted the query to make the CBO use the index which the CBO is of course likely to do with most of our queries on most of our tables (and if the CBO doesn’t want to use the index for a specific query, the loss of an index is not going to be an issue then anyways):
SQL> select /*+ index */ * from bowie_1 where id = 1;
select /*+ index */ * from bowie_1 where id = 1
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: ‘C:\ORACLE\ORADATA\FLOYD\USERS01.DBF’
Now the error one may get if the index was simply corrupted or if there’s a problem or corruption at the hardware level may differ but the overall ramification will be the same. Queries that the CBO deems should use a “problematic” index will simply fall over. This is not exactly a good thing from an availability perspective …
How about inserting a new row in the parent table:
SQL> insert into bowie_1 values (4, ‘IGGY’);
insert into bowie_1 values (4, ‘IGGY’)
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: ‘C:\ORACLE\ORADATA\FLOYD\USERS01.DBF’
Oh yeah, that’s right. We have an index that also needs to be inserted as well. Not just any index mind you, but an index that is used to police the uniqueness of the associated PK constraint. Yes, if the problem was at the hardware level, the error message will differ but the ramifications will be the same. We will not be able to insert into the table unless the index is dropped and we can’t drop the index unless the PK constraint is dropped as well.
How about an insert into the other table that doesn’t even have an index:
SQL> insert into bowie_2 values (6, 1);
insert into bowie_2 values (6, 1)
ERROR at line 1:
ORA-00376: file 4 cannot be read at this time
ORA-01110: data file 4: ‘C:\ORACLE\ORADATA\FLOYD\USERS01.DBF’
Oh for goodness sake, what now !! Well the table has a FK that points to the table with the problematic index and we need to check to ensure the FK value actually exists in the parent table. How do we perform such a check, why by using the index on the PK column of course and yep, the index can’t currently be used. So unless we drop the FK constraint, we’re stuffed here as well …
Perhaps life isn’t so sweet without these indexes after all …
What if we make the index unusable first rather than it be simply “unavailable” or “damaged” for whatever reason:
SQL> alter index bowie_1_pk unusable;
Well, providing we’re setup to skip unusable indexes:
SQL> show parameter skip
NAME TYPE VALUE
———————————— ———– ——————————
skip_unusable_indexes boolean TRUE
We can at least now make our queries run without the use of any problematic indexes:
SQL> select /*+ index */ * from bowie_1 where id = 1;
If this table contained 100M rows, it might of course take a long long long time and if we had too many users performing too many Full Table Scans, the entire database might of course scream to a thudding halt, but yes at least we’ve now got our queries working to the point of ignoring unusable indexes.
But is a database (or application or part thereof) that performs nothing but Full Table Scans really a scenario we want to be in? Does this really help to justify the separating of indexes from our tables ? Hummm, not sure about that one …
What about our DML operations now the index is unusable, do these at least function to some degree ?
SQL> insert into bowie_1 values (4, ‘IGGY’);
insert into bowie_1 values (4, ‘IGGY’)
ERROR at line 1:
ORA-01502: index ‘BOWIE.BOWIE_1_PK’ or partition of such index is in unusable state
That’s a no for our first parent table example …
SQL> insert into bowie_2 values (6, 1);
insert into bowie_2 values (6, 1)
ERROR at line 1:
ORA-01502: index ‘BOWIE.BOWIE_1_PK’ or partition of such index is in unusable state
And that’s also a no for our child, FK table example. Oracle still needs to use the problematic PK related index to police the value in our FK column.
So what kind of database environment are we left with when the indexes from our index only tablespace becomes problematic, even with all our tables totally intact.
Well, until we make the indexes unusable, all index related queries will be falling over all over the place with database related errors. Once we go through a process of identifying all problematic indexes and making them all unusable, we’re left with a database environment that’s performing Full Table Scans all over the place. Just imagine how long it’ll now take to find the customer details of that 10G table. Just imagine the user experience on the database when that 6 table join query can only be performed with Full Table Scans. Just imagine your user concurrent activity with no associated indexes available …
The good news of course is that the tables will at least get no bigger as all inserts will fail, all deletes will fail and many of the updates will fail, except on all those tables that have no Primary Key and no Unique Key and no Foreign Key. Ummm, just how many tables do you have that have no PK or UK or FK constraint ? Not many right …
Losing an index only tablespace would be a catastrophic event, one that would ruin the day of not only the poor DBA having to recover from such a scenario but also any poor user needing to access an impacted application.
One might even argue things could be better if a tablespace containing both tables and indexes was lost if it resulted in another tablespace containing other tables and indexes still being available as at least some table/indexes would be accessible and usable in a viable manner.
Regardless, in either scenario, the database/tablespace/datafile would need to be recovered ASAP to stop user complaints flooding the help desk.
Of course having indexes in their own tablespace will help us recover from such a catastrophic scenario in a more simplistic, efficient and ultimately faster manner, right ?
Well, unfortunately, maybe not. I’ll get around to discussing this issue sometime soon …
Based on the discussions on whether there’s any performance benefit in storing indexes separately from tables, I’ve had a number of questions on how can one actually determine exactly how an index range scan is performed. How for example can one determine whether an index I/O is following by a table I/O in a serial manner.
I thought therefore it might be worth going through an example of the process one can use to find these things out for oneself. This is the same basic process I use to determine much of Oracle’s internal behaviour. Simply take an extended trace of the specific operation and study the resultant trace file …
First step is to create a sample table. The table and index doesn’t need to be particularly sophisticated, simple is usually best:
SQL> create table bowie_test (id number, value number, name varchar(20)) tablespace bowie_ts parallel 1;
SQL> create index bowie_test_id_i on bowie_test(id) tablespace users parallel 1;
SQL> create index bowie_test_value_i on bowie_test(value) tablespace users parallel 1;
The first index will be an index on a monotonically increasing value, the second index is on a randomly generated value. Notice the indexes have been created in a separate tablespace to the table so we can easily determine the associated data files.
We next need to just check which data files belong to which tablespace:
SQL> select tablespace_name, file_id from dba_data_files where tablespace_name in (‘BOWIE_TS’, ‘USERS’);
TABLESPACE_NAME FILE_ID ------------------------------ ---------- USERS 4 BOWIE_TS 7 BOWIE_TS 8
Notice how the USERS tablespace which stores the indexes has a data file ID of 4 …
Next we populate the table, the ID column with a monotonically increasing value and the VALUE column with a randomly generated number:
SQL> insert into bowie_test select rownum, dbms_random.value(0,1000000), ‘David Bowie’ from dual connect by level <= 1000000;
1000000 rows created.
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>’BOWIE_TEST’, cascade=>true, estimate_percent=>null, method_opt=>’FOR ALL COLUMNS SIZE 1′);
PL/SQL procedure successfully completed.
Next, we flush the buffer cache to ensure all subsequent I/Os at least start off by being physical I/Os:
SQL> alter system flush buffer_cache;
We now start an extended trace of the subsequent operations in the session:
SQL> set autotrace traceonly
SQL> alter session set events ‘10046 trace name context forever, level 12′;
Now, let’s perform a large index range scan, using the random column index. Note we use a hint to ensure the execution plan uses the index:
SQL> SELECT /*+ index(b bowie_test_value_i) */ * FROM bowie_test b WHERE value BETWEEN 100000 and 200000;
99611 rows selected.
Finally, we end the extended trace of the session:
SQL> alter session set events ‘10046 trace name context off';
Following are sample lines from the generated trace file. Let’s begin by looking at the trace file, searching for the query we’ve just executed:
PARSING IN CURSOR #1 len=99 dep=0 uid=88 oct=3 lid=88 tim=27706931411 hv=2642872055 ad=’23a7d548′ sqlid=’7aynpuafsf1rr’
SELECT /*+ index(b bowie_test_value_i) */ * FROM bowie_test b WHERE value BETWEEN 100000 and 200000
END OF STMT
WAIT #1: nam=’SQL*Net message to client’ ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=12446 tim=27706931615
WAIT #1: nam=’db file sequential read’ ela= 14972 file#=4 block#=23148 blocks=1obj#=80423 tim=27706946845
WAIT #1: nam=’db file sequential read’ ela= 10979 file#=4 block#=30204 blocks=1obj#=80423 tim=27706957960
WAIT #1: nam=’db file sequential read’ ela= 33756 file#=4 block#=24802 blocks=1obj#=80423 tim=27706991842
WAIT #1: nam=’db file sequential read’ ela= 16408 file#=7 block#=121331 blocks=1obj#=80422 tim=27707008593
WAIT #1: nam=’SQL*Net message from client’ ela= 2099 driver id=1413697536 #bytes=1 p3=0 obj#=80422 tim=27707010906
WAIT #1: nam=’db file sequential read’ ela= 1661 file#=7 block#=123430 blocks=1obj#=80422 tim=27707012707
WAIT #1: nam=’SQL*Net message to client’ ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=80422 tim=27707012765
WAIT #1: nam=’db file sequential read’ ela= 10203 file#=7 block#=120965 blocks=1obj#=80422 tim=27707023006
WAIT #1: nam=’db file sequential read’ ela= 13488 file#=7 block#=122222 blocks=1obj#=80422 tim=27707036715
WAIT #1: nam=’db file sequential read’ ela= 16737 file#=7 block#=124918 blocks=1obj#=80422 tim=27707053604
WAIT #1: nam=’db file sequential read’ ela= 15888 file#=7 block#=120896 blocks=1obj#=80422 tim=27707069597
WAIT #1: nam=’db file sequential read’ ela= 11684 file#=7 block#=125491 blocks=1obj#=80422 tim=27707081569
Notice how the first 3 physical I/Os are related to the index, the root block, the intermediate branch block and the first leaf block of interest. Note the file number tells us this, as does the obj#. Also notice how each I/O only reads the one block and how each block is random and not consecutive to the previously read blocks (as illustrated by the block# value).
Notice how we next have a whole series of I/Os that correspond to the table. Again we can tell by the different file number and by the new obj#. Notice how they’re all also 1 block I/Os and how they’re all random, non consecutive physical I/Os. There are actually approximately 150 such table related I/Os until we finally reach the next index related I/O:
WAIT #1: nam=’db file sequential read’ ela= 8374 file#=4 block#=30216 blocks=1obj#=80423 tim=27708496278
WAIT #1: nam=’db file sequential read’ ela= 13180 file#=7 block#=122271 blocks=1 obj#=80422 tim=27708509565
WAIT #1: nam=’db file sequential read’ ela= 13942 file#=7 block#=124230 blocks=1 obj#=80422 tim=27708523576
WAIT #1: nam=’db file sequential read’ ela= 10766 file#=7 block#=122566 blocks=1 obj#=80422 tim=27708534450
WAIT #1: nam=’db file sequential read’ ela= 9414 file#=7 block#=124035 blocks=1 obj#=80422 tim=27708543976
WAIT #1: nam=’db file sequential read’ ela= 10800 file#=7 block#=125604 blocks=1 obj#=80422 tim=27708554874
WAIT #1: nam=’db file sequential read’ ela= 13215 file#=7 block#=121903 blocks=1 obj#=80422 tim=27708568145
WAIT #1: nam=’db file sequential read’ ela= 12977 file#=7 block#=124409 blocks=1 obj#=80422 tim=27708581221
WAIT #1: nam=’db file sequential read’ ela= 13834 file#=7 block#=121260 blocks=1 obj#=80422 tim=27708595150
Notice how this next index block which relates to the next leaf block is only a single block I/O that again is not at all consecutive to the previously read index block. In short, it’s just another random I/O.
This is then again followed by a whole series of randomly accessed table related blocks.
This whole sequence is then repeated many times for a whole series of different index leaf block, none of which are consecutive to the previously read index blocks. Here are a few of the following index related blocks reads, in between each we had a whole series of table related I/Os:
WAIT #1: nam=’db file sequential read’ ela= 16580 file#=4 block#=26300 blocks=1 obj#=80423 tim=27709706388
WAIT #1: nam=’db file sequential read’ ela= 13769 file#=4 block#=30113 blocks=1 obj#=80423 tim=27711127979
WAIT #1: nam=’db file sequential read’ ela= 12267 file#=4 block#=16476 blocks=1 obj#=80423 tim=27712662102
WAIT #1: nam=’db file sequential read’ ela= 7692 file#=4 block#=30603 blocks=1 obj#=80423 tim=27713801853
Note that the next “logical” block in an index structure is not necessarily the next “physical” block in the index segment. None of these are consecutive, they’re all effectively random Index related I/Os …
The fact all these I/Os are performed in this specific, serial sequence and the fact all these I/Os are effectively random is one of the main reasons why separating indexes from tables makes no real difference to performance.
That and the fact of course that while this select was being performed, there were concurrently a whole series of other I/Os being requested at the same time by all the other concurrent sessions in this database environment that potentially conflicted with the I/O requests in my session.
I’ll leave it to the reader to perform the same tests again with the monotonically increasing index to see how things can be somewhat different for well clustered indexes. However, contention between sessions and different processes still makes separating indexes in these more promising scenarios ultimately futile as well.
In Part I, I discussed how separating indexes from tables won’t likely improve performance as:
- Oracle moves from reading index blocks to table blocks in a sequential manner
- Most of the associated I/Os are likely to be random anyways
- Multi-User environments would result in disk contention regardless
That being said, why is it then some sites claim performance improvements after separating indexes from tables ? Previously, performance was sluggish however after moving indexes into a separate tablespace, performance appears to have picked up. Clearly then, moving indexes into a separate tablespace does improve performance, even if common sense might suggest otherwise.
Well, not quite. Here’s a scenario that’s not entirely uncommon …
Currently, an application has both tables and indexes in the same tablespace. The tablespace consists of various datafiles distributed across (say) 4 physical devices. Most database waits are I/O related with both db file sequential reads and db file scattered reads featuring heavily in performance metric reports. However, I/O performance is somewhat average with slow I/O related wait times and performance is generally suffering as a result.
Maybe, just maybe, the problem is due to having tables and indexes in the same tablespace. Perhaps if we separate the indexes away from the tables, contention will reduce, I/O wait times will decrease and database performance might improve as a result.
So we create a shining new, index only tablespace spread across (say) 4 additional physical disks and rebuild all our indexes in this new tablespace. To our relief, thankfully, performance has indeed improved. Average I/O wait times have been reduced and overall database performance has improved as a result. Despite what folks like that Richard Foote dude claims, here is clear proof and evidence of performance indeed improving, purely and simply by just separating indexes from their tables.
All we did was pull the wings off the fly and now it won’t take off after clapping our hands. Clear proof that flies go deaf when you pull off their wings …
There are of course two additional, potentially significant events that have also occurred other than just the indexes being separated from the tables.
The first one is that not only have all the indexes been moved to another tablespace, but all indexes have also been rebuilt as a consequence. Now, I’m the last person to get all excited about indexes being rebuilt, however as I’ve gone to great lengths to document, there are rare scenarios when indexes can get fragmented and may benefit from a rebuild. By moving indexes into a new tablespace, we’ve effectively rebuilt all the indexes, the (say) 99% where it wouldn’t have mattered but also the (say) 1% where it may have improved things. We have also rebuilt those indexes where there may be some temporary improvement until the index starts to flesh itself out again.
As a result, there could be all manner of related changes to execution plans and performance generally, especially related to larger index range scans and index fast full scans.
It’s not the indexes being separate from the tables that’s making some difference here, it’s the fact all the indexes have been rebuilt (especially those that were badly fragmented and accessed by large index scans).
The fly isn’t really deaf …
However, the far more significant difference we’ve also made is that we have of course just introduced 4 new physical devices into our database infrastructure. As a result, we may have significantly enhanced our I/O bandwidth and possibly reduced I/O related contention issues. All the general I/O activity related to indexes that was occurring on our initial 4 disk table/index tablespace have all been removed and are now occurring on our new, separate 4 disk index only tablespace.
But that’s a good thing right, that’s what we wanted to achieve ?
In the index range scan scenario I discussed in Part I, just note how few of the overall I/Os were related to the index. In larger index range scans where in theory separating indexes might improve performance, very few of the related physical I/O activity is actually attributed to indexes. The index would have to have an extremely low (and rare) clustering factor for index costs to be significant. In most “randomly” distributed index scans, there’s significantly more table related physical I/O activity than index activity.
By moving just the indexes into these new physical devices, we’ve just moved a whole bunch of segments that as a group incur relatively low physical I/O related activity while leaving together all those that result in the majority of physical I/Os.
Yes. we’ve reduced contention and I/O demands on the initial tablespace but as whole, we haven’t done it very well at all. Yes, we’ve reduced contention and perhaps improved performance, but we could have done it so much better. Yes, it appears separating indexes from tables has improved performance but has it really …
It’s not the separating of indexes from tables that’s improved performance, it’s the fact we’ve introduced 4 new disks and we’ve shifted some of the I/O activity away from the initial tablespace.
The fly isn’t deaf after all …
As an example, previously we had 100% of related I/O activity in the initial table/index, 4 disk tablespace. However, only (say) 20% of the activity was actually related to the indexes, 80% was attributed to all the tables. By moving all the indexes into the new, 4 disk index only tablespace, we therefore reduce the load on the initial tablespace by 20%. We now have 80% of the I/O load on 4 disks and just 20% on the other 4 disks. Yes, performance might improve as a result but we could do so much better. Currently, 4 of the disks have 20% of all segment related load on them and the other 4 disks have just 5% of all associated load.
Instead, if only we either added the 4 disks to the other 4 disk set and striped both tables and indexes across all 8 disks or moved and distributed both indexes and tables into the new 4 disk set, we might have been able to distribute load much more evenly across all 8 disks with approximately 12.5% load across each one.
By doing so, we may have improved performance by an even better and more significant amount. Conversely, by separating indexes into their own tablespace, we may actually be hurting general database performance because database performance is not optimal due to the uneven distribution of I/O related activities.
Of course, there’s a very easy way to confirm this. Look at the statistics in V$FILESTAT or look at a statspack report and carefully study the physical I/O activity in the table only and index only tablespaces and compare the results. Just how evenly distirubuted are the I/O related workloads …
Yes, there are scenarios where distributing individual segments here or there may be beneficial but the overall objective is generally to try and even out disk/spindle workloads as much as possible. Separating all indexes blindly is typically a very poor method of trying to achieve this.
If an individual query is not likely to improve by having an index in a separate tablespace and if separating indexes results in a non-uniform distribution of physical I/O activity, then you may want to start questioning whether it’s all really worth it.
Of course, database recoveries will be simplified by having indexes in their own tablespace, right ?. Ummm, I’ll tackle that myth next …
Although by no means as common as it once was, there’s still some who believe separating indexes in a different tablespace from their parent tables somehow improves performance.
The theory goes that by having indexes in their own tablespace, we can reduce overall contention issues and thereby improve the overall performance of the database.
Here are some thoughts for consideration for those who might be so inclined …
First, let’s just have a little look at the behaviour of a “typical” large scale index range scan, using an index with a height of say 3.
We first begin by accessing the root block of the index. This is a single block read which for many high accessed indexes would typically be cached and not result in a physical I/O. Next we read an intermediate branch block. This is also a single block read and is also likely to be cached if the index is heavily accessed. Regardless, it’s another index related I/O. Next we finally reach and read the first index leaf block containing the start of the index entries of interest. Again, it’s a single block I/O and again it’s index related.
So far we’ve performed 3 “random”, single block I/Os of index related blocks. If the index were in a separate tablespace, all the action would only be on the index tablespace thus far.
We next read our first table block containing the first row referenced by the first index entry of interest. This yet again is a single block I/O that could potentially be any block within the table. If the table were in a separate tablespace from the index, we would still need to perform a physical I/O (assuming the block isn’t already cached) on a “random” block within the table tablespace. If the table were in the same tablespace as the index, we again need to perform a physical I/O on a random table block. Still no difference thus far.
We next (very likely) reference the same index leaf block to determine the second row of interest. Note this block will almost certainly still be cached as it’s just been accessed. Therefore, if the index were in the same or different tablespace to the table, still no difference as there’s no associated physical I/O.
We then read the second table block of interest via a single block I/O. Unless this index has a very good clustering factor, we’re likely to read a totally different table block that could be any other block within the table. It’s extremely unlikely therefore to be the block that is physically contiguous to the block previously read. Only if the index were very well clustered, could it possibly be the same block as previously read or possibly the next logical block in the table.
However, in all these scenarios, having the table in a separate tablespace still makes no difference at this stage. We either need to perform another physical I/O on the table or we perform just a logical I/O. Even in the extremely unlikely case the next block read is physically contiguous to the previous block read, it would still be contiguous whether the index was separate or not and not be impacted by the index read activity thus far. Again, thus far it makes no real difference having the index in a separate tablespace.
We go back to the same index leaf block to determine the next row of interest and then access the next table block, which for a large randomly distributed table is again likely to be another different block. The point being we’re accessing the index and the table in a sequential fashion, reading the index, then reading the table. Reading the index and then reading the table again.
For small index scans, the index leaf block in question is likely to be the same single leaf block as a leaf block can potentially store hundreds of index entries (depending of course on block size, index row size and where within the index leaf block we logically begin to read the index entries of interest). So for small scans, it’s not going to have any real impact having indexes in a separate tablespace as we’re basically reading a few index related blocks followed by the table related blocks.
The table blocks are likely to be different blocks in a randomly distributed, poorly clustered index or possibly (although more rarely) a small sample of blocks in a well clustered index. However, in either scenario, if if we need to access just the one leaf block, it makes no difference whether the index is in a separate tablespace or not, the I/Os and so-called contention are the same regardless.
In some scenarios, Oracle can perform a prefetch step whereby it orders the index entries based on the rowids to first determine which table blocks need to be accessed, thus preventing the same table block having to be re-read several times. However, again, it makes no difference thus far if the index is in a separate tablespace or not as the I/O requirements are the same regardless.
In larger index range scans however, we might need to visit the next logical index leaf block or indeed subsequently many such index leaf blocks. Note each leaf block contains a pointer (referred to as kdxlenxt in a block dump) so Oracle can directly access the next index leaf block. If our index were in a separate tablespace and making the HUGE assumption that there’s thus far been no other activity in the index tablespace, the disk head may not have moved from where it left off after reading the last leaf block. With the indexes and tables coexisting in the same tablespace, we have very likely moved on from this location with any subsequent table related I/O activity.
Maybe now at last, finally we have a benefit in having indexes in their own tablespace …
However, reading the next index leaf block is again a single block read and most importantly is not necessarily “physically” contiguous to the previous leaf block. Remember, index leaf blocks split as part of their natural growth and the new block allocated is simply the next block available in the index freelist. Therefore the next logical index leaf block in an index structure could physically be virtually anywhere within the extents allocated to the index. When we read the next “logical” index leaf block, it does not necessarily mean it’s the next “physical” block within the index segment. It’s likely just another random, single block I/O.
That being the case, again we have no benefit in the index being in a separate tablespace. In both scenarios, we have to go scanning the disk looking for the physical location of the next index leaf block (again assuming the index leaf block isn’t already cached). This activity needs to be performed whether the index is in it’s own tablespace or not.
When we move back to read the next table block based on the first index entry from the newly accessed index leaf block, again, it’s extremely unlikely the next table block accessed will be the next contiguous block from the previously read table block. So again, we very likely need to go a hunting for the next table block on disk, regardless of it being in a separate tablespace from the index. Again, separating indexes from tables makes no real difference.
So not only do we move between index and table in a sequential manner but the actual blocks read within both the index and the table are likely to be totally random, non contiguous, single block reads.
That being the case, what are the performance benefits of storing indexes and tables separately ? How does storing indexes and tables separately actually reduce contention when most physical I/Os in both index and table segments are effectively random, single block reads ?
Now this example has just been a single index scan, performed by one user on just one index and table. The benefits therefore of separating indexes and tables even in a single user environment are somewhat “dubious”.
However, how many environments only have the one user. Not many. Most environments have lots of users, some with many hundreds, some with many thousands of concurrent users . All these users are potentially performing concurrent I/O operations, not only potentially on these very same tables and indexes but on lots of different tables and lots of different indexes within our table and index tablespaces. Even if index leaf blocks were to be physically contiguous in some cases (such as monotonically increasing indexes where this is more likely), by the time we’ve read the index leaf block, processed and read all the associated table blocks referenced by the index leaf block, the chances of there being no subsequent physical activity in the index tablespace due to another user session is virtually nil. We would still need to re-scan the disk to physically access the next index leaf block (or table block) anyways.
Add to the mix the fact many sites now use SANS, NAS, ASM etc. and what might appear to be one contiguous file could actually be physically split and spread all over the place. The whole notion of what is actually physically contiguous and what isn’t is blurred anyways.
The next time someone suggests separating indexes from table improves performance, you may just want to ask a couple of little questions; why and how ?
However, I’ll next discuss how indeed performance can improve by storing indexes in a separate tablespace. But just like our mad scientist thinking flies with no wings go deaf, I’ll explain how the performance improvement is not actually directly related to the indexes being separate from the tables.
I’ll also discuss how database recoveries are not typically helped by having indexes in a separate tablespace as often suggested.
Larger Block Index Tablespace and Small Index Scans – Performance Improvement ? (Let Down) March 31, 2008Posted by Richard Foote in Index Block Size, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning, Tablespace Management.
Thought it might be worth looking at the impact on the performance of Unique and small index range scans as typical in OLTP environments, when an index is rebuilt in a larger block tablespace.
Warning, this discussion will again primarily be an exercise in simple mathematics. However, a few little details to start the ball rolling.
I’ve already discussed how in many scenarios, by increasing the index block size, the height of the index can remain unchanged. One can’t simply assume a larger block index results in an index with a lesser height. However, as we shall see, even when indexes do reduce in height, the so-called performance benefits can be somewhat “exaggerated”.
Note also in OLTP environments, the vast majority of index related access paths are either unique scans or small index scans. We’re only interested in a specific customer, in a specific order, in the bank balances of specific accounts, with specific customers buying specific products with specific credit cards, etc. Remember also that say a block size of 8K can generally store many hundreds of index entries per index leaf block and that even a humble little 2K leaf block can often store a three figure number of index entries …
So let’s say in our first example we currently have an 8K index with a height of 3 which we rebuild in a 16K tablespace and the index height remains the same. What are the comparative costs of performing a simple index look-up.
In the 8K index, we need to read 3 x 8K (root, branch and leaf block) to read the index entry of interest for a total of 24K. However, in the new 16K index, we now need to read 3 x 16K (again, a root, branch and index leaf block) to read the index entry of interest, for a total of 48K.
That’s double the potential physical I/O if the blocks are not currently cached, that’s double the memory that needs to be used in the buffer cache to store the index blocks and that’s potentially more CPU we need to use in order to pin and process the larger index blocks.
Hummm, these larger blocks aren’t too impressive so far …
But what if we actually reduce the height after moving an index to a larger block tablespace ?
In the 8K index, we still have our 3 x 8K = 24K. If we rebuild in an 16K block tablespace and reduce it’s height by 1, we now only need to read 2 x 16K = 32K. That’s 3 consistent reads at 24K vs. 2 consistent reads at 32K. Yes, we reduce the number of consistent reads which is a good thing but we still have a larger index footprint which again means possibly more data off disk, more memory and more CPU resources to process the index blocks.
Let’s hope our 2K block index with a height of 2 reduces its height somewhat if we decide to rebuild it in 32K blocks because that’s 2 x 2K = 4K vs 2 x 32K = 64K otherwise. But even if we did, although we reduce the consistent reads, which is a good thing, that’s still 2 x 2K = 4K vs 1 x 32K = 32K.
But we can potentially reduce the height of a 2K index if rebuild in a much larger block size (in specific cases) by more than 1 level, right ?
Yes, a 4 level 2K index, 4 x 2K = 8K, might very well now only require 2 levels in a 32K index, 2 x 32K = 64K. But that’s still 64K of index data we need to access vs. 8K in the smaller block. Consistent reads reduce but our footprint for small index range scans is still larger, sometimes by a considerably amount depending on the change of block size.
You begin to see the issue …
It’s a bit like someone saying they’re going to improve the skyline somewhat and only build appartment buildings that have fewer numbers of floors than previously. But if the floors they build are 2 times or 4 times or even 16 times higher than the previous floors and they only reduce the number of floors by a moderate amount, is the building really lower ?
Have we really reduced the height of the building ?
Yes, the lift only has to stop at fewer floors which can be more “efficient” but it takes longer each time as it goes from one floor to the next. So is it really that much quicker to get to the top, or more specifically in our discussion, from the top to the bottom of the building ?
Surely though, things must run faster, performance must actually improve, things must be more efficient by having larger sized index blocks else why bother ? Why indeed …
This demo on the Performance Impact Of Small Index Scans In A Larger Index Block Size shows how performance may actually worsen, not improve if we move indexes into a larger block size. It creates an index on a well clustered column (so the index is at it’s most efficient and potentially impacts performance the most), first with an 8K block size and then with a 16K block size. The size of the index was carefully chosen so that the larger index block size did indeed reduce the height of the index (although the range of values when this was possible was actually quite limited with these block sizes). A simple PL/SQL procedure then performs a massive number of single row look-ups and the CPU and elapsed times are monitored. The results show when the associated blocks are either cached or not cached, the larger block index incurs larger CPU related costs and results in overall slower response times, despite the fact it actually has a lower height than the smaller block index.
It’s not precisely the same as a large scale environment as PL/SQL has subtle little differences and efficiencies when compared to multiple, separate transactions. Also, it’s not the specific results that are important here but the overall general approach. You can pick any index you want to investigate, you can pick whatever block sizes you may be interested in, you can monitor and benchmark by checking out specific session details or by tracing the specific sessions, you can determine what the comparative costs and response times may be and you can determine what may cause any performance differences by digging deeper into the session statistics or trace files.
No matter what index or block size you select, you will likely come to the same conclusion. The overall performance benefits of your OLTP transactions when you move indexes to a larger block tablespace will likely be “disappointing”.
If a real estate developer comes knocking on your door with a promise that the empty block around the corner will have an apartment building with only a few floors and it shouldn’t ruin the view too much, you may just want to ask them quietly exactly how many floors they’re planning to build and exactly how high each floor will be.
Else the resulting view may turn out more disappointing than you’re led to believe ;)
Store Indexes In A Larger Block Tablespace: Height Reduction 1/2 Myth (Five Foot One) March 26, 2008Posted by Richard Foote in Index Block Size, Index Height, Index Internals, Oracle General, Oracle Indexes, Oracle Myths, Tablespace Management.
A common misconception with using a larger block tablespace specifically for indexes is that this will result in a reduction in the height of indexes and hence “flatten” index structures.
However, this is only partly true.
A few little generalisations to begin with.
First, most databases out there have a default block size of 8K. I won’t go into a big discussion on what the database block size should be set to (maybe another time), however I will say most databases these days have a default block size of 8K and that it’s certainly questionable to have the database block size set to 2K.
Note also in many common platforms (e.g. Windows, Linux) the maximum block size limit is 16K. So in many environments, when we talk about moving indexes into a bigger sized block, it specifically involves moving from an 8K to a 16K block size.
Yes, in theory you could move an index from (say) a 2K block size up as high as a 32K block size but you would need to question why the default is so low to begin with and whether the upper value is actually supported in your environment.
I make this point because the difference between block sizes makes a huge difference in the probability of the index height actually being reduced.
So let’s start with an example of moving indexes between an 8K default block size to a 16K block tablespace, not least because the arithmetic is easier and extrapolate out as we go along.
So in our example, the new block size is double or 2 times that of the default one. By doubling the block size, we effectively 1/2 the number of necessary leaf blocks in the index structure. By having fewer leaf blocks we also therefore reduce the overall associated block level overheads so the actual reduction in leaf blocks could be a tad more, but we’ll say a 1/2 reduction to keep the numbers nice and simple.
Note the reduction in leaf blocks in therefore simply 1 / the ratio of block increase (1/2). Moving from a 2K block to a 32K block is 16 times larger so we’ll have approximately 1/16 the number of leaf blocks.
So how does (say) halving the number of leaf blocks impact the overall height of the index ?
We obviously can’t reduce the height of an index with a height of just 1. The index consists of just the one block so a larger block would simply mean the block having more free space.
To reduce the height of an index with a height of 2 (back to 1), we therefore must be able to store all index entries within a single block. Therefore, in the 8K to 16K example, the index can only have 2 full leaf blocks for this to be possible. If an index has 3 or more “filled” leaf blocks, the index must remain at a height of 2 as we can’t fit all the index entries into the single larger index block.
Importantly therefore, all indexes with a height of 2 with more than 2 full leaf blocks would not reduce in height by simply doubling the block size. This could very well be the vast majority of indexes at this level.
For an index with a height of 2, the index must have less full leaf blocks than the ratio of block increase for a height reduction to be possible. In our best case scenario, 2K block to 32K block, any index with more than 16 full leaf blocks would not reduce in height.
To reduce the height of an index with a height of 3 (back to 2), we must therefore be able to store all intermediate branch blocks into the one branch (root) block. When we double the block size, we therefore 1/2 the leaf blocks and 1/2 again the necessary branch blocks. Therefore the necessary branch blocks is 1/(2×2) = 1/4 that of the default block size. Therefore any index with a height of 3 that has more than 4 full intermediate branch blocks will again not reduce in height as again all the necessary branch information would not fit in one root block.
Importantly therefore, all indexes with a height of 3 with more than 4 full intermediate branch blocks would not reduce in height by simply doubling the block size. Again, this could very well be a significant proportion of all indexes at this level. Note also in many databases, the vast majority of indexes have a height of 3 or less so by simply doubling the index block size, most indexes would not reduce in height …
For an index with a height of 3, the index must have less full intermediate branch blocks than the ratio of block increase to the power of 2 for a height reduction to be possible. In our best case scenario, the 2K block to 32K block, only those indexes with more than 16×16=256 full intermediate branch blocks will reduce in height. This is therefore likely to be a far higher proportion of all such indexes.
You see the pattern …
To reduce the height of an index with a height of 4 (back to 3), we must therefore store all first level intermediate branch blocks into the one branch (root) block. When we double the block size, we therefore 1/2 the leaf blocks, 1/2 again the second level intermediate branch blocks and 1/2 again the first level intermediate branch blocks. Therefore the necessary first level intermediate branch blocks is 1/(2x2x2) = 1/8 that of the default block size. Therefore any index with a height of 4 that has more than 8 full intermediate first level branch blocks would again not reduce in height as again all the necessary first level branch information would not fit in the one root block.
Importantly therefore, all indexes with a height of 4 with more than 8 full intermediate branch blocks will not reduce in height by simply doubling the block size. However, as the index height increases, the ratio of indexes where this is likely to be the case decreases.
For our best case scenario, 2K to 32K, we now start hitting very large numbers 16x16x16=4096 so the likelihood of a index height reduction is very very high.
And so on …
The important point being that by simply doubling the index block size, in most databases, the vast majority of indexes are actually quite unlikely to reduce in height as the index needs to be within very limited size boundaries for the index height to reduce. The greater the index height however, the greater the index size boundaries whereby an index height reduction is possible.
Also, the greater the index block increase, proportionally the fewer the index blocks and so greater the likelihood of an index height reduction.
This demo on the Impact Of Block Size On Index Height illustrates that by simply doubling the index block size, the height of an index (in various sizes) rarely decreases.
One final point. With our height 4 index example, note the index can only have a maximum of 8 first level branch blocks for the height to reduce. Therefore, in effect, we’re replacing a maximum of 9 x 8K branch blocks with 1 x 16K block. If this index is frequently accessed, these 9 branch blocks are likely cached and we only need to read two of these blocks anyways for an index range scan (for a total of 16K). After the rebuild, we still need to read this block (16K again) anyways so from a purely performance perspective with regard to just simply reducing the index height, the so-called performance benefits are often very much exaggerated.
As we’ll see in the next epic episode of this series, performance can actually decrease :(
Next time someone claims moving indexes into a larger block size will decrease the height and flatten an index, remember it really does depend. In many databases, especially when the index block size is just doubled, it’s actually quite surprising just how unlikely it is for an index to actually decrease in height.
Store Indexes In A Larger Block Tablespace: The Multiblock Read Myth Part II (The Fly) March 20, 2008Posted by Richard Foote in Index Block Size, Index statistics, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Myths, Performance Tuning, Tablespace Management.
Thought I might begin by mentioning a lovely little story by Billy Verreynne in this OTN Forum Thread.
Basically a scientist is doing research on the behaviour of flies. He notes when he opens the jar lid of a trapped fly and claps his hands, the fly takes off and flies away. One day, he decides to pull the wings off the fly. When he claps his hands, the fly just sits there. No matter how loud he claps, other than a slight rocking motion, the fly just doesn’t budge.
Excitedly, he writes in his journal his latest discovery. When you pull the wings off a fly, it goes stone deaf !!
This is soooo funny because this sort of thing happens all the time in the Oracle world (and elsewhere). I see it time after time after time, we’ve all done it.
Hey, I just compressed a segment into one extent and performance improved. Conclusion, storing a segment in one extent improves performance.
Hey, I just separated my indexes from my tables and performance improved. Conclusion, separating indexes from tables improves performance.
Hey, I just moved my indexes into a larger block sized tablespace and my Index Fast Full Scan performance improved. Conclusion, a larger index block size improves Index Fast Full Scan performance.
Lots of flies with no wings. Lots of people thinking flies go deaf when they don’t have wings. An action appears to cause an effect, but does it really …
As Billy himself suggests, the fundamental reason why so many people think “flies with no wings go deaf” is that many don’t understand what’s actually going on. Many simply don’t understand the basic workings, the fundamental processes and mechanisms involved in how Oracle functions or how Oracle performs a specific operation. Tuning by observation, tuning by making change “A” without understanding all the implications of such a change and subsequently making suppositions on the results of such a change, ultimately means we have lots of people thinking flies without wings are deaf.
As I discussed in Part 1, Oracle performs exactly the same sized I/O during a multiblock read, regardless of the block size of the segment. Exactly the same. Without understanding this simple fact, one could very easily come to a wrong conclusion regarding the ramification of block sizes on multiblock reads. Without understanding flies don’t have ears or sound sensors in their wings, one could very well come to a wrong conclusion regarding the ramifications of removing the wings from a fly.
If we perform an Index Fast Full Scan and performance improves, it can’t be because associated multiblock I/Os are more efficient. A fly doesn’t have sound sensors in its wings. There must be another explanation. Conversely, in my example in Part I of this discussion, performance went worse with a larger index block size (as it did in Greg Rahn’s example on this OTN Forum Thread), but again not as a result of multiblock read performance.
So how could the performance of an Index Fast Full Scan change (for better or worse), if one simply rebuilds the index within a larger block size tablespace ? Well there are of course many possible reasons, with the two more obvious explanations being the following:
1) Most randomly inserted indexes have a PCT_USED value ranging between 70-75% as these indexes perform random 50-50 block splits that are subsequently in differing stages of being filled. By rebuilding an index (say back to a default PCTFREE of 10%) one might increase index compactness by say 15% and hence decrease the overall index size (note reduced block overheads may also reduce the index a little as well, depending on index size and differences in block sizes). The Fast Full Index Scan is the access path that potentially benefits most by defragmenting an index as the associated costs are proportional to the overall size of the index. Reducing the size of an index could therefore impact subsequent performance. However, rebuilding the index in the current block size would likely achieve a similar result (plus block overheads), compacting the index and resulting in potentially better performance (although once the index blocks begin to split again, the index would eventually return back to its previous state). Therefore, it’s not the bigger block size but the resultant defragmentation of the index that’s improved matters. The fly isn’t deaf, it just needs its wings to fly …
2) By storing an index in a larger block tablespace, the index must physically be stored on a different database file. This file could be on a faster disk, improving performance, this file could be on a faster part of the disk, improving performance, this file could be on a disk with far less disk contention, improving performance, etc. etc. It’s not the larger block size that’s improved (or worsened) performance, it’s the new physical characteristics of where the index is now stored. If one were to rebuild the index with the current block size and use the same physical characteristics of the larger block index, subsequent performance would likewise increase (or decrease). The fly isn’t deaf, it just needs its wings to fly …
There are many other possible reasons, the system was less busy when using the larger block index, more of the index was physically cached when using the larger block index, etc. etc.
Of course, an Index Fast Full Scan is rarely a scalability issue anyways. Do we really want our applications to perform hundreds of large, concurrent Index Fast Full Scans ? Tuning the application to avoid these overheads should be the focus rather than moving indexes into a larger block tablespace in the vain hope it will improve things dramatically. But that’s the topic of another discussion …
Can the performance of an Index Fast Full scan change after moving the index to a larger block size tablespace. Absolutely. However, it doesn’t necessarily mean such a change in performance is a direct result of the index having a larger block size and that multiblock read performance has improved.
It doesn’t mean moving indexes to larger block size tablespaces suddenly makes Oracle go deaf …
UPDATE: I’ve added this simple little demo that illustrates how performance improves when an index is rebuilt in a larger block tablespace. This will of course suggest to some folk that the larger block tablespace improved the performance but what actually improved things was rebuilding the fragmented index to be a more efficient structure. The larger block tablespace was not the fix, rebuilding the index was the important factor. In fact, by rebuilding the index in the original smaller block tablespace, not only do we also improve performance, but things are further improved as we reduce CPU overheads incurred by the larger block tablespace and as a result elapsed times are further improved.
One of the great myths surrounding the use of differing block sizes is that storing indexes in larger block sizes somehow dramatically improves the performance of index related multiblock reads.
Oracle performs index multiblock reads when performing an Index Fast Full Scan, when it basically treats the index structure as a skinny version of the table. It reads the entire index structure, multiple blocks at a time, “throwing away” any non leaf blocks as it stumbles across them.
The theory goes that by storing indexes in larger size blocks, we would obviously have fewer index related blocks. If we need to read the entire index, as we do with an Index Fast Full Scan, surely it must be more efficient if we have fewer, larger index blocks.
The evidence looks convincing. Here’s a link to an extract from a book by Robin Schumacher where he clearly shows a dramatic “improvement” by using an index tablespace with double the block size. In one query using an 8K block index, the consistent reads during an Index Fast Full Scan was 421. However, when the same index was recreated as a 16K block index, the same query only used 211 consistent gets, 1/2 of what it was previously.
Conclusive “proof” that the 16K block index improved performance wouldn’t you say ?
Well actually, it’s only conclusive proof that the number of consistent gets has dropped, whether it actually improves “performance” is another thing entirely.
There are a couple of little “details” that many don’t quite appreciate. The devil is always in the details …
The first point to note is that when Oracle performs a multiblock read, it uses the value in the db_file_multiblock_read_count parameter to determine how many blocks to read per multiblock read (with system statistics, Oracle itself can determine how best to set this value).
So if the db_file_multiblock_read_count value were set to say 16, Oracle will attempt to read as many as 16 blocks at a time during a multiblock read operation.
Note this value is based on the default block size of the database. So if the default block size is 8K and the db_file_multiblock_read_count is 16, Oracle will try and read 16 x 8K blocks at a time during a multiblock read operation.
However, if there’s a non-default block sized segment (say 16K), Oracle will adjust the number of blocks that are actually read during a multiblock read operation so that the maximum size of the overall multiblock read is identical to that of the default block size.
So if the db_file_multiblock_read_count is 16 and the default block size is 8K, a multiblock read of an object in a 16K tablespace will only read 8 blocks at a time (and not 16). A multiblock read of an object in a 2K tablespace will read 64 blocks at a time.
The actual size of a multiblock read therefore is identical regardless of the block size of an object within a database.
An easy way to highlight this is to simply trace a session and see the specific size of corresponding multiblock read operations.
A sample from a trace on an 8K block index (with the db_file_multiblock_read_count set to 16), performing an Index Fast Full Scan looks like this:
WAIT #1: nam=’db file scattered read’ ela= 1487 file#=8 block#=1050 blocks=16 obj#=78294 tim=615409554677
WAIT #1: nam=’db file scattered read’ ela= 1377 file#=8 block#=1066 blocks=16 obj#=78294 tim=615409557777
WAIT #1: nam=’db file scattered read’ ela= 1143 file#=8 block#=1082 blocks=16 obj#=78294 tim=615409561563
Note that Oracle is reading 16 x 8K blocks (128K) per multiblock read operation.
However, when the index is recreated in a 16K block size tablespace, the Fast Full Index Scan looks like this:
WAIT #1: nam=’db file scattered read’ ela= 1413 file#=6 block#=14 blocks=8 obj#=78296 tim=626802128684
WAIT #1: nam=’db file scattered read’ ela= 1447 file#=6 block#=22 blocks=8 obj#=78296 tim=626802131649
WAIT #1: nam=’db file scattered read’ ela= 2014 file#=6 block#=30 blocks=8 obj#=78296 tim=626802135222
Note that Oracle is now only reading 8 x 16K blocks (128K) per multiblock operation.
Both indexes are effectively doing exactly the same work, both are effectively reading up to 128K of data per multiblock read …
It’s like paying someone $50 per 1/2 hour of work and then deciding to make things more “efficient” by paying them $100 per hour of work instead. In the end, you’re still just paying them $800 for an 8 hour day’s work regardless …
Note a larger block size will have less associated block overheads with there being less actual blocks so the overall size of an index may reduce a little, depending on index size and differences in block sizes. Therefore any possible improvements will only be restricted to the potential savings in the overall index size. With many databases having default block sizes of 8k and a maximum block size restricted to 16k, these savings may be minimum or non-existent.
This demo on the impact of different block sizes on multiblock read operations shows how Oracle actually performs the same sized reads when performing multiblock reads from differing block sized tablespaces, with the performance of the index in the larger block size tablespace being somewhat worse in this specific example.
With Oracle effectively performing identical work behind the scenes, the performance between different block size tablespaces is likely to be similar. You’re still paying $800 a day regardless …
Although it’s often claimed that multiblock reads is one of the key areas where larger index block sizes are beneficial, a claim based generally on the simplistic fact the number of consistent reads is reduced, the reality of the situation is somewhat different …
A suggestion that seems to pop up on a routine basis on various forums and discussion boards is that we should be storing our Indexes in a larger block-size tablespace. For example, if our database block size is set to 8K, we should be creating separate (say) 16K block tablespaces specifically for our indexes. Doing so will improve performance as the index will have a flatter, more efficient structure. Multiblock reads will also be more efficient (or so the theory goes) as we would be reading fewer index blocks during such scans.
Oracle introduced the concept of having different tablespaces in a database with different block sizes back in 9i Release 1 in order to make transportable tablespaces between databases with differing block sizes possible. However, there’s nothing preventing one creating a new tablespace with a non-default block size and assigning objects to these tablespaces.
In principle, storing indexes (in particular) in a larger block size sounds like a really good idea doesn’t it ?
I’ll be discussing the pros and cons of this approach in future postings but just some initial thoughts to get everyone thinking about it:
All tablespaces with a non-default block size requires a separate, non-default block size buffer cache to be manually configured
Non-Default buffer caches are not automatically sized as part of Oracle’s automatic memory management and must be manually tuned and sized, potentially increasing administrative overheads
Non-Default buffer caches do not have an associated KEEP or RECYCLED pool and so all objects with the same non-default block size must reside in the same buffer cache
The possibility of unnecessarily caching blocks from an infrequently accessed object and wasting memory is therefore likely to increase
The possibility of unnecessarily aging out blocks from a more frequently accessed object is also likely to increase, thus increasing I/O related overheads
Although the height of an index may reduce if stored in a larger block size, in many cases it may not actually change at all
In those cases when the height of an index is actually reduced, the actual performance benefit of such a height reduction is often overstated
The reduction of index leaf blocks (a much more telling possible advantage) is only beneficial to very specific types of queries
Larger blocks often have the disadvantage of greater contention, which can lead to performance related issues
Indexes with larger block sizes have a significantly greater I/O and memory related footprint in relation to most OLTP related index scans
Index related multiblock reads on larger block sized segments actually have no real benefit when compared to multiblock reads on smaller block sized segments
Most databases out there in the “real world” only use default block size tablespaces so the risks associated with finding bugs, CBO anomalies, etc. increase once non-default block sizes are introduced
Although in specific scenarios with specific applications, there may be some potential benefits of using non-default blocks sizes, in general, the disadvantages of using non-default block sizes usually out weigh these potential benefits.
As we shall see …