Reading 100% of Data Via An Index (We Are Hungry Men)

Reading 100% of Data Via An Index (We Are Hungry Men) May 20, 2008

Posted by Richard Foote in Oracle Cost Based Optimizer, Oracle Indexes, Oracle Myths.
trackback

I’ve already previously discussed how a Full Table Scan can sometimes be the most effective execution path for reading a relatively small proportion of all rows (< 1%). Thought I might quickly discuss how an index can be the most effective execution path for reading a relatively high proportion of rows.

Such as 100% of all rows.

There is no magic number or proportion from which Oracle will somehow magically switch to using a FTS. If an index access path has a lower cost, it will be selected over the more expensive FTS and the associated percentage of rows retrieved can potentially be anywhere between 0% and 100% of all rows.

Typically, reading a large percentage of rows is considered the exclusive domain of the Full Table Scan, however there are various scenarios in which the costs of reading all rows via an index is actually the cheaper alternative. Here are just a few examples.

Perhaps the most obvious example is when we’re only interested in columns that can all be found within an index. In this case, if the index segment is smaller than the parent table segment (which in most cases it is), then Oracle can treat the index as being some kinda “skinny” version of the table and perform a multi-block Index Fast Full Scan without having to subsequently visit the table segment at all.

This simple Index Fast Full Scan Demo shows how a query that only references columns in an index can use the index to retrieve 100% of all rows in the table.

Another example is where the table segment is very poorly fragmented with lots of deleted space. A FTS will need to read all table blocks below the High Water Mark (HWM), including potentially many mostly or totally empty table blocks. If these costs are excessive and the remaining rows can be more efficiently accessed via an index, in extreme cases it might be more efficient to read all rows via the index than via a FTS. Yes, the table segment should probably be reorganised via say a Move or Shrink command if the deleted space is not going to be reused any time soon, however until this has been performed, it could very well be more efficient to access 100% of data via an index.

This extreme Poorly Fragmented Table Demo highlights how an index can be most efficient in retrieving 100% of all rows in a table, if the table is badly fragmented with lots of deleted space.

Yet another example is when the index could be used to avoid a possible sort. Index entries are always stored in the order of the indexed columns (except for Reverse Key Indexes). Therefore by reading the data via an index, all data will be retrieved in the order of the index. If this order matches the specific required order due to a ORDER BY clause, then Oracle does not need to perform the sort operation. In some cases, especially when the index has an excellent Clustering Factor, it might be more efficient to retrieve 100% of all data via an index and avoid the sort than use a FTS followed by a sort.

This tiny extract from my index internals seminar shows a simple Index and Sort Demo whereby the CBO decides to use the index to retrieve 100% of all data as it prevents Oracle from having to perform an expensive sort.

You begin to get the idea …

In summary, an index can of course can be most effective when retrieving just the 1 row (0 rows even) but it can also be most effective when retrieving up to 100% of all rows in a table as these simple examples illustrate.

The next time someone asks at what point or percentage will Oracle no longer consider using an index, we now all know there is no magic number and that it all entirely depends on many many factors which ultimately determine the relative costs of all possible access paths.

Comments»

1. Brian Tkatch - May 20, 2008: “SORT BY clause”

Do you mean ORDER BY?

LikeLike

Reply
2. Richard Foote - May 20, 2008: Hi Brian

Fixed. Thanks !!

LikeLike

Reply
3. rossgoodman - May 20, 2008: Thanks, this is the perfect explanation for a conversation I was having last week !

Ross
http://www.RossGoodman.com

LikeLike

Reply
4. Robert Klemme - May 20, 2008: Once again the default answer is the best answer to a general question: “It depends.”

We once needed to squeeze out the last millisecond of some high frequency index lookups and thus created an index that covered all rows we needed. This worked very nicely indeed.

As a somewhat (un)related note: Microsoft SQL Server 2005 has “CREATE INDEX … INCLUDE (column list)” [1] which IMHO is a very nice solution because it clearly separates index key columns and non key columns.

Cheers

robert

[1] http://msdn.microsoft.com/en-us/library/ms188783.aspx

LikeLike

Reply
5. Peter Scott - May 20, 2008: @Robert – Microsoft’s version of the Index Organized Table… 🙂 but with the disadvantage of storing both the index and the table, but at least you know it is an index (which is a whinge about finding the partitions in an Oracle partitioned index organized table…)

LikeLike

Reply
6. karthickarp - May 21, 2008: Richard,

As usual very interesting stuff….

Thanks,

Karthick
http://www.karthickarp.blogspot.com/

LikeLike

Reply
7. Robert Klemme - May 26, 2008: @Peter, there is a much better approximation to an IOT in SQL Server: if you define a clustered index on a table (there can be at most one) the table is converted into a BTree. In this case there is only the index and leaf entries contain the data – there is no additional storage. Downside is that additional indexes will have to store the index keys instead of the row id.

If you’re interested in the details you can find them here:
http://msdn.microsoft.com/library/ms189051.aspx
http://msdn.microsoft.com/library/ms177484.aspx

The INCLUDE clause is really a unique and IMHO useful feature.

LikeLike

Reply
8. Gary - May 26, 2008: “The INCLUDE clause is really a unique and IMHO useful feature”
I can see the benefit where you can create an index that is unique on, say, one column but includes a second column to save on table lookups. But in Oracle this can be done with a unique constraint enforced by a nonunique index.
For a nonunique index, I don’t see much benefit. If Oracle did implement it, I guess it would mean the index entries being stored in order of column1, rowid instead of column1, column2, rowid. If column2 were regularly updated, it may mean that the index entry doesn’t move as much, so may offer a benefit in some situations.

LikeLike

Reply
9. Robert Klemme - May 30, 2008: Gary of course you can achieve similar results in Oracle. But, please do not forget documentation: with the INCLUDE clause it is crystal clear which columns are key columns and which columns are there for optimization purposes only. It is just more expressive.

LikeLike

Reply
10. Chris - July 10, 2008: using 10.2.0.3 on linux when I run the fast full scan demo as is I get the same result as Richard, changing the primary key to a unique index results in a full tablescan, any ideas why? the output I get is.

SQL> create table t as select rownum id, sysdate-mod(rownum,500) start_date, ‘BOWIE’ text from dual connect by level create unique index t_idx on t(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname =>’ODS’,tabname=>’t’,estimate_percent => null,cascade => true,method_opt=>’FOR ALL COLUMNS SIZE 1′);

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly
SQL> select ID from t;

1000000 rows selected.

Execution Plan
———————————————————-
Plan hash value: 1601196873

————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————–
| 0 | SELECT STATEMENT | | 1000K| 4882K| 744 (1)| 00:00:09 |
| 1 | TABLE ACCESS FULL| T | 1000K| 4882K| 744 (1)| 00:00:09 |
————————————————————————–

Statistics
———————————————————-
1 recursive calls
0 db block gets
69773 consistent gets
0 physical reads
0 redo size
14646874 bytes sent via SQL*Net to client
733726 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed

LikeLike

Reply
11. Richard Foote - July 11, 2008: Hi Chris

A big difference between a PK and a UK index is that NULLS are permitted with a UK index.

Therefore, in your example, there could be null values in the ID column.

And these nulls which could be present would not be in the index as Oracle does not index entirely null index entries (in a B-Tree index).

Therefore, the index can not guarantee that it will return every value of ID as it might miss out on all the possible null values.

Therefore, it can’t use the index in your query, it must use the FTS.

Adding a NOT NULL constraint (or where condition) would make it possible for Oracle to use the index again.

LikeLike

Reply
12. Chris - July 11, 2008: Knew it had to be something simple, must of been having a slow brain day, I’ve even just spent time in the last day or two explaining how to index nulls using a function based index to a developer. Forgot that just because a column doesn’t contain any null values doesn’t mean it can’t unless the constraint is there, thanks for that.

LikeLike

Reply
13. Big Tables, Sorts and Indexes Solution (Right On Mother) « Richard Foote’s Oracle Blog - September 19, 2011: […] discussed using an index to select 100% of all data before if anyone is […]

LikeLike

Reply

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

Richard Foote's Oracle Blog