Index Scan or Full Table Scan: The “Magic” Number (Magic Dance)

Index Scan or Full Table Scan: The “Magic” Number (Magic Dance) May 12, 2008

Posted by Richard Foote in Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Myths, Oracle Opinion.
trackback

What seems like ages ago, I listed 8 things you may not have known about indexes. Although I’ve since written about many of the 8 items, I’ve yet to address the last item listed:

8. An index can potentially be the most efficient and effective may to retrieve anything between 0% and 100% of the data from a table.

A few recent posts on OTN reminded me that perhaps it’s about time I wrote something on this topic.

Generally, the question that’s commonly asked is at what point or at what percentage of data does Oracle no longer consider the use of an index and judges the Full Table Scan (FTS) as the most efficient method to retrieve the data from a table.

Basically, what’s the “magic number”, is it 1% of data, 2%, 5%, 7.5%, 15%, 42%, 50% ???

The answer unfortunately is that there is no such magic number or percentage, it all entirely depends. The way I often answer this question is by simply stating I can very easily come up with a scenario where a FTS is the most cost effective method to retrieve 1% of the data. Equally, I can very easily come up with a scenario where an index is the most cost effective method to retrieve 99% of the data.

Like I said, there is no magic number, it entirely depends on a whole list of different factors and variables.

To start, I thought I might go through the example of how a 1% cardinality result is best achieved via a FTS, highlighting why and how the Cost Based Optimizer comes to such a decision.

I’ll use a simple little scenario with nice simple numbers to make the mathematics nice and easy to follow 🙂

OK, let’s assume we have a table that has 10,000,000 rows. The table uses 100,000 table blocks to store these rows and so we have on average 100 rows per block. With an 8K block size, we’re basically looking at a table with an average row size of about 80 bytes.

Let’s say this table has an associated index with approximately 20,000 leaf blocks required to store the index entries for a particular column and the index has a blevel of 2 (or a height of 3). This basically means we can store approximately 500 index entries per block and the average index entry is about 16 bytes or so in length.

The indexed column has 100 distinct values which are evenly distributed such that each distinct value has approximately 100,000 occurrences each. The column has no NULL values.

Let’s say we write a query based on the indexed column and we’re interested in just one of the possible 100 values or approximately 1% of the data in total. For example:

SELECT * FROM bowie_table WHERE code = ‘ABCDE’;

Does the CBO choose the index or does it chose the FTS ?

Well, let’s first cost the index access path.

We begin by reading the root block and the intermediate branch block for a cost of 2.

We also need to read approximately 1% of all the index leaf blocks in order to access all the index entries of interest. So that’s 20,000 (leaf blocks) x 0.01 = 200 leaf blocks in total.

So the total cost of reading just the index is 202.

Next comes the interesting bit. How many of the 100,000 table blocks do we need to access in order to read just 1% of the data (i.e. 100,000 rows) ?

Well, the answer depends entirely on the Clustering Factor of the index or to put it another way, in how well ordered the rows in the table are in relation to the index. If the index column values of interest are all very well clustered together in the table, then we can access the required rows by visiting fewer blocks than if the index column values are evenly and randomly distributed throughout the table.

In fact, in the worst possible cases scenario, if the Clustering Factor is appalling and has a value close to the number of rows in the table (10,000,000), we may actually need to visit each and every block in the table as each block has an average of 100 rows per block and we want on average 1% or one of these rows from each and every table block.

In the best possible case scenario, with the column values perfectly clustered together and with a Clustering Factor approaching the number of blocks in the table (100,000), we may get away with only having to visit 1% of all the table blocks or just 1,000 of them.

So the Clustering Factor is a crucial variable in how costly it would be to read the table via the index. The actual table access costs therefore are simply calculated as being the selectivity of the query (0.01 in our case) multiplied by the Clustering Factor of the associated index.

In this example, the Clustering Factor is indeed appalling with a value of 10,000,000 and the table access costs are therefore calculated as 0.01 x 10,000,000 = 100,000.

So the total costs of using the index is 202 (for the index related costs) + 100,000 (to access the rows from the table) = 100,202 in total.

So what are the costs associated with the FTS ?

Well, the FTS has a number of advantages over the index scan. Firstly, as Oracle needs to process all the blocks, it can retrieve all the necessary rows by reading a specific table block just the once. However, with the index scan, Oracle may possibly need to access a specific table block multiple times in some scenarios.

Secondly, as Oracle knows it has to read each and every block, Oracle can do so with a larger “bite of the pie” each time via multiblock reads, knowing it’s not wasting resources as all blocks need to be processed anyways. Index access reads perform single block I/Os whereas a FTS can perform muiltblock I/Os at a time. In this specific example, let’s assume the effective multiple read value is 10, remember, we want to keep the arthmetic nice and simple …

Finally, a FTS can be performed in parallel, even if the table itself isn’t partitioned, which means the overall response times can be further improved and the CBO can reduce its “costs” accordingly. In this example, we won’t worry about parallel query.

So the costs of a FTS in our example is basically 1 (for the segment header) + 100,000 (table blocks) / 10 (the effective multblock read value) = 1+10,000 = 10,001.

So that’s roughly an overall cost of 100,202 for the index vs. 10,001 for the FTS.

The results are not even close with the FTS winning hands down and that’s for just 1% of the data …

A couple of final little points for now.

Firstly, the cost of just reading 1 block (for the single block index reads) vs. 10 blocks (for the multiblock FTS reads) may actually differ somewhat as multiblock reads are doing more “work” with it’s associated I/O. By default, with no parameters set and with no system statistics, the CBO will cost each I/O as being the same. More about how to possibly adjust this another time.

Also, by default the CBO will assume all associated I/Os are physical I/Os and will cost them accordingly, even if the BCHR is nice and high and the index access path in question might be accessed within (say) a nested loop join where the likelihood of many of the index related I/Os in particular being cached is very high. More on this at another time as well.

But for now, just note how in this relatively trivial example, the following factors came into play when determining the potential costs of this query:

Selectivity of the query
Data distribution with regard to the actual occurrences of the required data
Number of table blocks (below the high water mark)
Number of leaf blocks
Index Height
Average number of rows per table block
Average number of leaf entries per leaf block
Clustering Factor
Caching characteristics of index and table
Effective multiblock read count
Relative cost of single vs. multiblock I/Os
Parallelism

All of which contribute to make any single “magic number” by which Oracle will no longer consider using an index but another fairy tale in the Oracle book of myths and folklore …

Comments»

1. Amit - May 13, 2008: Hi Richard,

Could you clarify the difference between

– Data distribution with regard to the actual occurrences of the required data

And

– Clustering Factor

LikeLike

Reply
2. Richard Foote - May 13, 2008: Hi Amit

Good question, I wasn’t very clear was I 🙂

The first point was simply to do with how evenly distributed are the occurrences or frequencies of the required column values. Basically, how accurate is Oracle in determining the associated selectively and cardinality of the data, does the code value ‘ABCDE’ really occur 100,000 times or is the data skewed and the value only occurs 10,000 times. Do we need histograms to give the CBO more information or do we not.

I was just trying to make the point that even the assumption of 1% being selected is a variable that the CBO needs to get right else all its subsequent calculations on whether to use/not use an index are going to be inaccurate.

LikeLike

Reply
3. Brian Tkatch - May 13, 2008: Richard, first thing first. Wow. This helps things make a lot of sense.

I think you just condensed half a chapter on one page. It’ll take time to digest.

—

Let me get this straight, obviously all being averages:

“OK, let’s assume we have a table that has 10,000,000 rows. The table uses 100,000 table blocks to store these rows and so we have on average 100 rows per block.”

10,000,000 records/100,000 blocks= 100 records per block.

“With an 8K block size, we’re basically looking at a table with an average row size of about 80 bytes.”

8k block size = ~8,000 byte block size.
8,000 byte block size/100 records = 80 bytes per record.

“Let’s say this table has an associated index with approximately 20,000 leaf blocks required to store the index entries for a particular column and the index has a blevel of 2 (or a height of 3). This basically means we can store approximately 500 index entries per block and the average index entry”

10,000,000 records / 20,000 leaf blocks = 500 records per leaf block

“is about 16 bytes or so in length.”

8,000 byte block size / 500 records per leaf block = 16 bytes per record in a leaf block

“The indexed column has 100 distinct values which are evenly distributed such that each distinct value has approximately 100,000 occurrences each.”

10,000,000 records / 100 distinct values which are evenly distributed = 100,000 records per distinct value

“The actual table access costs therefore are simply calculated as being the selectivity of the query (0.01 in our case)”

100,000 records per distinct value / 10,000,000 records = 0.01 selectivity = 1% of the data.

multiplied by the Clustering Factor of the associated index.

LikeLike

Reply
4. Brian Tkatch - May 13, 2008: Amit,

The documentation (ALL_INDEXES is where i looked) states about clustering factor:

Indicates the amount of order of the rows in the table based on the values of the index.

* If the value is near the number of blocks, then the table is very well ordered. In this case, the index entries in a single leaf block tend to point to rows in the same data blocks.
* If the value is near the number of rows, then the table is very randomly ordered. In this case, it is unlikely that index entries in the same leaf block point to rows in the same data blocks.

(The rest i am writing for me, so I will be long. It helps me understand things.)

INDEXes are ordered. For example, an INDEX on a single COLUMN will lists all similar entries together. If they can all fit in one block (based on space left from the previous entry) they will. Similar data is not distributed throughout the entire INDEX (that would be against the entire point of the INDEX!).

TABLEs, however, store data randomly (thus the usual help of an INDEX to quickly find specific similar data). Which means, look at an ordered INDEX blocks, to know where to go to the random TABLE blocks.

Now, the records do not need to be distributed. Indeed, due to sheer randomness they may actually be next to each other. Which makes two extreme cases (and reality in the middle).

1) The data is all in the same bock.
2) The data is distributed throughout all the TABLE’s blocks.

Or, put another way:

1) The distribution of the data in the TABLE blocks is exactly like the distribution of data in the INDEX.

2) The distribution of the data in the TABLE blocks is nothing like the distribution of data in the INDEX.

Clustering Factor (CF) is an INDEX property, not a TABLE property, so each INDEX has its own CF when compared to the TABLE it indexes.

Case 1 has a low CF, because the data is “clustered” together when compared to the INDEX entries.

Case 2 has a high CF, because the data is not “clustered” together when compared to the INDEX entries.

Low CF = less TABLE blocks visited = lower I/O.
High CF = more TABLE blocks visited = higher I/O.

CF is determined when INDEXes are ANALYZEd.

Assuming i got that right, i think i finally understand it. Thanx Amit. 🙂

LikeLike

Reply
5. Richard Foote - May 13, 2008: Hi Brian

I figure if it takes longer to digest, I can take longer between posts 😉

BTW, you got things straight and the clustering factor analysis spot on !!

Proud of you 🙂

LikeLike

Reply
6. Brian Tkatch - May 13, 2008: Heh. Thanx Richard.

>I figure if it takes longer to digest, I can take longer between posts

Grr… You’ve slowed down *much* too much already.

>BTW, you got things straight and the clustering factor analysis spot on !!

Thanx. Something i’d like to know. How is it calculated?

LikeLike

Reply
7. Amit - May 14, 2008: Hi Richard,

Thanks for Clarification.

Hi Brian,

Thanks for explanation. Though I must admit that I had not put my question clearly. I felt that may be both the points were similar.

LikeLike

Reply
8. Yas - May 14, 2008: How is it calculated?
Brian, if you have Jonathan Lewis’s book Cost-Based Oracle Fundamentals, he shows the sql used to calculate the clustering factor. The idea is something like; start with CF=0, start from the first leaf block, get a rowid, look which table block that row is in, take the second rowid from the leaf block, look which table block that row is in, if the second row is in a different block increment CF to 1. If not do not increment it. Goes like this.

LikeLike

Reply
9. Karthick - May 14, 2008: This one is really interesting. So I started to dig around myself. Exploring index is always exiting.

Can you help me in understanding this behavior?

I create a table as the name suggests it’s a big table. And I started to insert data into it.

SQL> BEGIN
2 FOR i IN 1..100
3 LOOP
4 INSERT /*+ APPEND */ INTO HX_BIG_TABLE
5 SELECT *
6 FROM (SELECT 0 no,
7 o.*
8 FROM all_objects o
9 ORDER BY object_id)
10 WHERE rownum create sequence hx_my_sequence;

Sequence created.

SQL> update hx_big_table set no = hx_my_sequence.nextval;

10000000 rows updated.

SQL> commit;

Commit complete.

So here I enforce that no field is unique.

SQL> create unique index hx_big_table_idx_no on hx_big_table(no);

Index created.

SQL> create index hx_big_table_idx_object_id on hx_big_table(object_id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname => ‘SYSADM’,tabname => ‘HX_BIG_TABLE’,method_opt => ‘for all indexed columns’);

PL/SQL procedure successfully completed.

SQL> set serveroutput on
SQL> exec print_table(‘select * from user_tables where table_name = ”HX_BIG_TABLE”’);

TABLE_NAME : HX_BIG_TABLE
TABLESPACE_NAME : PSDEFAULT
CLUSTER_NAME :
IOT_NAME :
STATUS : VALID
PCT_FREE : 10
PCT_USED :
INI_TRANS : 1
MAX_TRANS : 255
INITIAL_EXTENT : 131072
NEXT_EXTENT : 131072
MIN_EXTENTS : 1
MAX_EXTENTS : 2147483645
PCT_INCREASE : 0
FREELISTS :
FREELIST_GROUPS :
LOGGING : YES
BACKED_UP : N
NUM_ROWS : 9985336
BLOCKS : 137658
EMPTY_BLOCKS : 0
AVG_SPACE : 0
CHAIN_CNT : 0
AVG_ROW_LEN : 97
AVG_SPACE_FREELIST_BLOCKS : 0
NUM_FREELIST_BLOCKS : 0
DEGREE : 1
INSTANCES : 1
CACHE : N
TABLE_LOCK : ENABLED
SAMPLE_SIZE : 488317
LAST_ANALYZED : 14-may-2008 12:04:21
PARTITIONED : NO
IOT_TYPE :
TEMPORARY : N
SECONDARY : N
NESTED : NO
BUFFER_POOL : DEFAULT
ROW_MOVEMENT : DISABLED
GLOBAL_STATS : YES
USER_STATS : NO
DURATION :
SKIP_CORRUPT : DISABLED
MONITORING : YES
CLUSTER_OWNER :
DEPENDENCIES : DISABLED
COMPRESSION : DISABLED
DROPPED : NO
—————–

PL/SQL procedure successfully completed.

SQL> validate index hx_big_table_idx_no;

Index analyzed.

SQL> exec print_table(‘select * from index_stats’);

HEIGHT : 3
BLOCKS : 22368
NAME : HX_BIG_TABLE_IDX_NO
PARTITION_NAME :
LF_ROWS : 10000000
LF_BLKS : 22131
LF_ROWS_LEN : 158888893
LF_BLK_LEN : 8000
BR_ROWS : 22130
BR_BLKS : 34
BR_ROWS_LEN : 263229
BR_BLK_LEN : 8032
DEL_LF_ROWS : 0
DEL_LF_ROWS_LEN : 0
DISTINCT_KEYS : 10000000
MOST_REPEATED_KEY : 1
BTREE_SPACE : 177321088
USED_SPACE : 159152122
PCT_USED : 90
ROWS_PER_KEY : 1
BLKS_GETS_PER_ACCESS : 4
PRE_ROWS : 0
PRE_ROWS_LEN : 0
OPT_CMPR_COUNT : 0
OPT_CMPR_PCTSAVE : 0
—————–

PL/SQL procedure successfully completed.

SQL> validate index hx_big_table_idx_object_id;

Index analyzed.

SQL> exec print_table(‘select * from index_stats’);

HEIGHT : 3
BLOCKS : 22384
NAME : HX_BIG_TABLE_IDX_OBJECT_ID
PARTITION_NAME :
LF_ROWS : 10000000
LF_BLKS : 22138
LF_ROWS_LEN : 159004200
LF_BLK_LEN : 8000
BR_ROWS : 22137
BR_BLKS : 42
BR_ROWS_LEN : 328917
BR_BLK_LEN : 8032
DEL_LF_ROWS : 0
DEL_LF_ROWS_LEN : 0
DISTINCT_KEYS : 100000
MOST_REPEATED_KEY : 100
BTREE_SPACE : 177441344
USED_SPACE : 159333117
PCT_USED : 90
ROWS_PER_KEY : 100
BLKS_GETS_PER_ACCESS : 53.5
PRE_ROWS : 0
PRE_ROWS_LEN : 0
OPT_CMPR_COUNT : 1
OPT_CMPR_PCTSAVE : 30
—————–

PL/SQL procedure successfully completed.

SQL> show parameter db_block_size

NAME TYPE VALUE
———————————— ——————————– ——————————
db_block_size integer 8192

12:49:29 SQL> show parameter multiblock

NAME TYPE VALUE
———————————— ——————————– ——————————
db_file_multiblock_read_count integer 16

SQL> explain plan for select no from hx_big_table;

Explained.

SQL> select * from table(dbms_xplan.display);

So now I did the math for FTS.

Total Block: 137658
Multi block Read: 16

Cost = Segment Header + (Total Block/Multi block Read) = 1 + (137658/16) = 8604.625

Let’s see the cost for Index Scan.

Root Block + Intermediate Branch Block: 2
Leaf Block: 22131

Cost = Root Block + Intermediate Branch Block + Leaf Block = 2 + 22131 = 22133

So based on the value oracle should go for a full table scan

PLAN_TABLE_OUTPUT
———————————————————————————-
Plan hash value: 1095470352
———————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————-
PLAN_TABLE_OUTPUT
———————————————————————————-
| 0 | SELECT STATEMENT | | 9985K| 57M| 30540 (2)| 00:06:07 |
| 1 | TABLE ACCESS FULL| HX_BIG_TABLE | 9985K| 57M| 30540 (2)| 00:06:07 |
———————————————————————————-

And it does. But the cost in the explain plan and the cost computed by me has a big difference. can you explain why is it so.

8 rows selected.

SQL> delete from plan_table;

2 rows deleted.

SQL> explain plan for select object_id from hx_big_table;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
—————————————————————————————————
Plan hash value: 2076745462
—————————————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————————
PLAN_TABLE_OUTPUT
—————————————————————————————————
| 0 | SELECT STATEMENT | | 9985K| 47M| 5107 (5)| 00:01:02 |
| 1 | INDEX FAST FULL SCAN| HX_BIG_TABLE_IDX_OBJECT_ID | 9985K| 47M| 5107 (5)| 00:01:02 |
—————————————————————————————————

8 rows selected.

Here it goes for index fast full scan. I was expecting a TABLE FULL SCAN. Why is it so? How a unique index and a no unique index are handled.

Regards,

Karthick.
http://www.karthickarp.blogspot.com/

LikeLike

Reply
10. Brian Tkatch - May 14, 2008: Amit,

Richard did not directly mention CF, so i thought i would. Once i started typing, i was talking more to me than to you. 🙂

Yas,

Thanx for the explanation. I knew it was something similar, but i got the comparison part wrong.

LikeLike

Reply
11. Richard Foote - May 14, 2008: Hi Karthick

OK, I found 2 questions in there 🙂

First, why the difference in cost with the FTS ? Well the big unknown here is what version of Oracle are you running and do you have system statistics.

If you do have system stats (which I suspect), then check out the MBRC value in sys.aux_stats$ as Oracle will use this value in it’s calculations. I would guess it’s value is around 4.5 or some such …

Note that without system stats, Oracle doesn’t actually use the 16 value but an “adjusted” value to compensate for the fact that Oracle may split up many of the 16 block multiblock reads anyways if it finds a cached block as part of the blocks it’s about to read. For 16, it gets adjusted down to something around the 10 mark.

Question 2, why the index and not the FTS ?

You’re only selecting the OBJECT_ID column which it can find directly from the index. Therefore it’s cheaper to treat the index (which only has 22384 blocks) as a “skinny table” rather than the table itself which has 137658 blocks.

Not sure what you mean by how is a unique and non-unique index handled ?

LikeLike

Reply
12. Richard Foote - May 14, 2008: Hi Brian

I believe in quality not quantity although I admit both is often nice 😉

CF, just like Yas says. Imagine a full scan of the index. Read the rowid from the first index entry and increase CF to 1. Read the rowid from the second index entry and check to see if it points to the same physical data block as the previous rowid. If it does, count remains the same, if it differs, count moves up to 2. Check the next rowid, if the same as previous rowid, count remains the same, if it differs (even if it only points back to the block from the first rowid), then the count goes up.

Repeat until you read all the rowids in the index (or the estimate thereof) and the final score is the CF.

LikeLike

Reply
13. Brian Tkatch - May 14, 2008: Hmm…

So if the the INDEX’s and TABLE’s ‘s TABLESPACE use different block sizes, that would drive up the CF. Not that it matters

Does that mean that the minimum CF = maximum(leaf blocks in INDEX, blocks in TABLE)?

LikeLike

Reply
14. Karthick - May 15, 2008: Sorry here is my version.

SQL> select * from v$version;

BANNER
—————————————————————-
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Prod
PL/SQL Release 10.2.0.1.0 – Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 – Production
NLSRTL Version 10.2.0.1.0 – Production

Your suspection is correct 🙂 i dont have system stats. I will work on it.

and about this “Not sure what you mean by how is a unique and non-unique index handled ?”

I have indexed two column NO (Unique), OBJECT_ID (Non Unique).

I took explain plan for this two statements

1) SELECT no FROM hx_big_table

2) SELECT object_id FROM hx_big_table

The first one went for a full table scan and the second one for a Index fast full scan. Both have been indexed then why is this difference. Thats why i wantd to ask is Unique and Non Unique index cost are computed in different way.

Regards,

Karthick
http://www.karthickarp.blogspot.com/

LikeLike

Reply
15. Richard Foote - May 15, 2008: Hi Brian

The block size of the index makes no difference to the CF, absolutely none. That’s because the index has the same number of index entries in the same logical order regardless of the block size.

The table blocksize may make a difference if the table is well clustered on an index as a larger block size xould mean fewer blocks and possibly drive the CF down.

However, if the index column values are randomly distributed, then it may make no difference at all.

The minimum CF is basically the minimum different number of blocks referenced in the index or to put it another way, it’s the number of non-empty blocks within the table.

Note it’s possible to have substantially fewer used blocks in a table than are allocated so the CF can very possibly be far fewer than the number of blocks in a table.

LikeLike

Reply
16. Richard Foote - May 15, 2008: Hi Karthick

Sorry, for some reason your comments are treated as spam and I have to de-spam them when I spot them.

OK, now I understand !!

The problem with your first index is that the column does not have a NOT NULL constraint so Oracle can not guarantee that it can actually use the index to get all NO values.

Whereas the OBJECT_ID column does have the NOT NULL constraint.

Put a NOT NULL constraint on the NO column and Oracle will likely use the index.

LikeLike

Reply
17. Brian Tkatch - May 15, 2008: Thanx Richard!

LikeLike

Reply
18. Karthick - May 16, 2008: Thanks Richard that helped.

I tried another option by creating the index slightly in a different manner.

CREATE UNIQUE INDEX hx_big_table_idx_no ON hx_big_table(no,’ ‘);

This one also works.

Once again thanks for all your great posts.

Regards,

Karthick.
http://www.karthickarp.blogspot.com/

LikeLike

Reply
19. Richard Foote - May 16, 2008: Hi Karthick

Yes, that would do the trick as the index now ensures all null values are indexed as well. It’s useful when setting the column to NOT NULL is not appropriate.

LikeLike

Reply
20. Abhishek - July 8, 2008: My question is – does select clause have an impact on the explain for a query from table scan to index scan or vice versa using the same where clause ?

LikeLike

Reply
21. Richard Foote - July 8, 2008: Hi Abhishek

The most obvious example of where this might be true is when the select clause only selects columns that can be entirely retrieved via an index. If Oracle can avoid visiting the table, this will obviously impact the costings as the table selectively component is no longer costed.

So yes, adding a column (or removing a column) from the select list can make all the difference.

LikeLike

Reply
22. Narendra - August 20, 2008: Richard,

One of my favourite posts (and so difficult to digest that I have to read it agin and again….of course, all credit to myself for finding it “difficult”). However, it would have been great if you could have included a Demo or link to a demo, as you generally do in other posts. Or is the demo already exists somewhere and I have missed it ?
I guess I am kind of a person who understands SQL better than english…

LikeLike

Reply
23. Richard Foote - August 25, 2008: Hi Narendra

I’ve read “Lord Of The Rings” over a dozen times 😉

I’ll see what I can do about a demo, my seminar has several examples of how the CF can make all the difference.

LikeLike

Reply
24. Jane - September 10, 2008: Hi Richard,

I have a query in which I am referring a BIG table once only, however when I see the v$session_longops, this table has been scanned (FTS) multiple times due to which the query is running very slow. Can you explain what might be causing this? Can the temp tablespace size and sort area available might be putting hinderance to the optimum performance of this query?

Thanks in advance.
Jane

LikeLike

Reply
25. Richard Foote - September 11, 2008: Hi Jane

Not enough to go on I’m sorry. How is the big table being accessed, how is it being joined, what are the predicates, what are the indexes, what are there clustering factors and other stats, what’s the SQL, what’s the execution plan, what’s the selectivity of the query …

This is being caused by the CBO thinking it’s accessing the required data in the most efficient manner and it being perhaps wrong in it’s thinking 🙂

LikeLike

Reply
26. pramod - November 11, 2011: Hi Richard ,

That’s nice little research and is very useful for all the readers.
Now i have a similar situation here

I have a query which is doing very high gets , but based on table a,index stats i could not figure out why it is doing that much high IO

My query

UPDATE Test set X = :B3
, Y = :B2 WHERE A = :B1

It does 7.67E+09 gets, 2,355,142 disk read per execution.

Execution plan

Using index on Col A

Execution Plan
————————————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————————————–
| 0 | UPDATE STATEMENT | | | | 416 (100)| |
| 1 | UPDATE | TEST | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 2488 | 126K| 416 (0)| 00:00:05 |
| 3 | INDEX RANGE SCAN | IND_A | 2488 | | 22 (0)| 00:00:01 |
————————————————————————————————————–

Table stats – TEST

NUM_ROWS : 316456000
Blocks : 5690660
Size : 44GB

INdex Stats – IND_A

BLEVEL : 3
LEAF_BLOCKS : 2227000
clustering_factor : 44834000
NUM_ROWS : 300588000
Size : 20GB

Clustering factor is in between Num_blocks of table and Num_rows of tables which i feel quiet OK

I also feel looking at the stats, full table scan is better.

In any kind of access, this should not do that many high buffer gets.
What do you think the reason for this very high buffer gets.Do you think full table scan is better here ?
If so why optimizer is not picking up full table scan on it’s own ?

Thanks

LikeLike

Reply
Richard Foote - December 19, 2011: Hi Pramod

OK, the key piece of information that’s missing here is exactly how many rows are actually being updated.

The 3 possible issues that instantly spring to mind here are:

1) Although the CBO estimates that 2488 rows (approx. 0.000786% of the data) is to be updated, in actual fact a substantial higher number of rows are being updated

2) The table has triggers that performs a substantial amount of additional work

3) The table undergoes significant amount of concurrent changes, requiring much undo to the accessed during the update

Whether a FTS might be better, the easiest way is to find out and test via hints. The reason why the CBO is not choosing the FTS is simply because it comes at a cost > 416. With 5690660 table blocks, this not a surprise. However, whether 416 is an accurate cost for the index comes down primarily to the selectivity of the update and the accuracy of the various stats (including system stats).

LikeLike

Reply
27. Ajeet - March 12, 2012: Hi Richard,

I have observed that 10053 trace output for one of my queries shows the cost of an unique index scan as 0. I am not sure why, to me the cost of index access would be at least equal to the blevel of the index . can you please help me understand this.

Below is the snippet of 10053 trace file including the query :

*********************************
Number of join permutations tried: 2
*********************************
(newjo-save) [0 1 ]
Final – All Rows Plan: Best join order: 2
Cost: 342.7712 Degree: 1 Card: 596969.0000 Bytes: 5969690
Resc: 342.7712 Resc_io: 290.0000 Resc_cpu: 1223880329
Resp: 342.7712 Resp_io: 290.0000 Resc_cpu: 1223880329
kkoipt: Query block SEL$1 (#0)
******* UNPARSED QUERY IS *******
SELECT COUNT(*) “COUNT(*)” FROM “EMUDSS”.”ORDERS” “ORDERS”,”EMUDSS”.”LINEITEM” “LINEITEM” WHERE “ORDERS”.”O_ORDERKEY”=”LINEITEM”.”L_ORDERKEY”
kkoqbc-end
: call(in-use=39416, alloc=65448), compile(in-use=38488, alloc=40584)
apadrv-end: call(in-use=39416, alloc=65448), compile(in-use=39344, alloc=40584)

sql_id=52a0skswvaqvk.
Current SQL statement for this session:
explain plan for select
count(*)
from
orders,
lineitem
where
o_orderkey = l_orderkey

============
Plan Table
============
———————————————+———————————–+
| Id | Operation | Name | Rows | Bytes | Cost | Time |
———————————————+———————————–+
| 0 | SELECT STATEMENT | | | | 343 | |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
| 2 | NESTED LOOPS | | 583K | 5830K | 343 | 00:00:05 |
| 3 | INDEX FAST FULL SCAN | PK_LINEITEM| 586K | 2932K | 294 | 00:00:05 |
| 4 | INDEX UNIQUE SCAN | PK_ORDERS | 1 | 5 | 0 | |
———————————————+———————————–+
Predicate Information:
———————-
4 – access(“O_ORDERKEY”=”L_ORDERKEY”)

Content of other_xml column
===========================
db_version : 10.2.0.1
parse_schema : EMUDSS
plan_hash : 2594599139
Outline Data:
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE(‘10.2.0.1’)
ALL_ROWS
OUTLINE_LEAF(@”SEL$1″)
INDEX_FFS(@”SEL$1″ “LINEITEM”@”SEL$1” (“LINEITEM”.”L_ORDERKEY” “LINEITEM”.”L_LINENUMBER”))
INDEX(@”SEL$1″ “ORDERS”@”SEL$1” (“ORDERS”.”O_ORDERKEY”))
LEADING(@”SEL$1″ “LINEITEM”@”SEL$1” “ORDERS”@”SEL$1″)
USE_NL(@”SEL$1” “ORDERS”@”SEL$1”)
END_OUTLINE_DATA
*/

LikeLike

Reply
28. Richard Foote - April 5, 2012: Hi Ajeet

I guess the obvious thing to point out is that the blevel of an index can be 0 🙂

The other thing to note is that for small indexes, the root block is likely cached anyways and so the CBO take this into its considerations in its costings as I’ve discussed previously:

BLEVEL 1 => BLEVEL 2 (Teenage Wildlife)

LikeLike

Reply
29. optimizer basics (3) « Daniel Westermann's Blog - June 14, 2012: […] there are lots of smart people out there who spent a lot of work in describing this, for example: Richard Foote Randolf Geist John […]

LikeLike

Reply
30. chandra prakash - June 3, 2014: Hi Richard . Awesome article. Let me try to understand few things and then I would be back with my questions 🙂

LikeLike

Reply
31. Eric - February 21, 2016: Indeed, Awesome. Simple, clear and to the point.
In my SAP/Oracle environment, we are having a performance issue where Oracle is using an index Range scan and rowids to retrieve data from a 30G table. The index used is only 1 of many fields available in the table. The SQL statement is reading all fields of the table but the where-clause contains no fields from the index. I am trying to understand why Oracle is not choosing “full” table scan in case like this. While searching the answer, I came across this great article even the case stated in it is not relevant to my case..

Richard, do you have any comment on my issue?

Thanks

LikeLike

Reply
Richard Foote - March 10, 2016: Hi Eric

Not clear what might be happening. Can you post the query and execution plan ?

LikeLike

Reply
32. krishna kant shukla - May 29, 2018: Hi Richard,

First of all thanks much for putting together this concept.

I have few doubts regarding mentioned example:

1) when you say:

” We also need to read approximately 1% of all the index leaf blocks in order to access all the index entries of interest. So that’s 20,000 (leaf blocks) x 0.01 = 200 leaf blocks in total. ”

I understand there are 20,000 x 500 = 10000000 indexes present.
But when you say we need to access approx. 1% of all index leaf blocks then mathematically you saying i.e. 20000 x .01 = 200 leaf blocks which will have 100000 (200 x 500).

Now my question here is, as per above calculations does that means indexes regarding value ‘ABCDE’ are accomodated right next to each other to support the fact that in 200 leaf blocks we will meet corresponding indexes of all ‘ABCDE’ rows i.e. 1,00,000 occurences of ‘ABCDE’ ??
Isn’t there a possibility that there might be a need of 250 leaf blocks for reading all concerned indexes ??

2) When you say:

” So the Clustering Factor is a crucial variable in how costly it would be to read the table via the index. The actual table access costs therefore are simply calculated as being the selectivity of the query (0.01 in our case) multiplied by the Clustering Factor of the associated index.

In this example, the Clustering Factor is indeed appalling with a value of 10,000,000 and the table access costs are therefore calculated as 0.01 x 10,000,000 = 100,000. ”

My question here is :
How the selectivity for 10 million rows with 100 as cardinality value is coming to .01 ?
I understand it should be .001 (cardinality / No of Rows x 100) =
(100 / 10000000 x 100) = .001

Please let me know your thoughts !!
Appreciating your contribution to spreading the knowledge.

Thanks,
Krishna

LikeLike

Reply
Richard Foote - May 30, 2018: Hi krishna

First, just need to be clear with some definitions. Don’t confuse index “entries” with “indexes”. There are a total of 10M index entries in this one index.

Regarding question 1), all index entries must be ordered within an index. It’s not possible and it would make no sense for an index to have index entries that are not logically ordered (just think how an index at the back of a book is always alphabetically ordered and think how how it would be use use the index if it wasn’t).

As such, then yes all index entries with the same indexed value would all be stored together within the leaf blocks. It’s possible to 1% of the index entries to use more than 1% of the index structure (if for example, the index entries are larger than average, if there’s more free space than usual in that part of the index, etc.), but the CBO can only assume even distribution of index values across leaf blocks (the issue with averages) and so the costings reflect this.

Regarding question 2), selectivity is basically the percentage of expected resultset. If you have 100 distinct values and assuming an even distribution of rows across those values (for each specific value, you get the same % of rows returned), then the selectivity is 1/100 or 1% of data. This is represented as 0.01 as this is use to multiply the total rows in order to get the expect cardinality. So total expected rows would by 10M x 0.01 = 100,000 rows.

Hope this makes sense.

LikeLike

Reply
33. Pete - December 15, 2018: Richard,

I have a query where Oracle is choosing an index range scan (does not even visit the table) over a full table scan. The full table scan version (when I force it with a hint) is 3 times quicker.

You’ve mentioned a number of factors to consider that could point to why Oracle chooses one over another. I will see if I can dig a bit further into those.

The query I have does have some complex predicates, like wildcarded bind values, date ranges and other filters with ANDs/ORs, I assume these could cause the optimizer to get in a twist too?

LikeLike

Reply
Richard Foote - February 5, 2019: Hi Pete

Oracle picks the plan that is effectively the cheapest based on the costings it makes which are in large part due to the object/system statistics. These lowest costings may not actually result in the fastest/most efficient plan for numerous reasons, especially if these base statistics/assumptions are inaccurate.

Indeed, the query predicates can also be a factor in index selection (e.g. a leading wildcard with LIKE predicate).

Simplistically, Oracle is using your index because the CBO thinks it’s the cheapest option.

LikeLike

Reply

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

Richard Foote's Oracle Blog