Big Tables, Sorts and Indexes Quiz (Candidate) | Richard Foote's Oracle Blog

jump to navigation

Big Tables, Sorts and Indexes Quiz (Candidate) September 14, 2011

Posted by Richard Foote in Oracle Indexes, Quiz.
trackback

Following on from the previous quiz on Descending indexes. Simple scenario.

You have a huge table, 10 Million plus rows.

You have an index on a column with a NOT NULL constraint but there are various other columns in the table not included in the index.

You want to select all columns and all rows from the table so you don’t even have a WHERE condition, but you want the data returned in the order of your indexed column, e.g.

SELECT * FROM really_big_table ORDER BY id;

So you have an index on the ID column.

Question: Is it possible for the CBO to use the index or for the CBO to even consider the index in the first place ? If so, how so ? If not, why not ?

Enjoy 🙂

UPDATE:

OK, all statistics, including system statistics need to be “accurate” and give the CBO correct details on the scenario (so no cheating with manipulating statistics).

No hints allowed and it’s the ALL_ROWS optimizer_mode, as we want the best solution to retrieve every row from the 10M+ table. Again note, there is no WHERE condition in the query.

If it helps, the index can be unique and there may be as little as just 1 additional column in the table.

Comments»

1. Bernard Polarski - September 14, 2011

The CBO Could use the ‘INDEX_ASC’ hint in order to avoid the sort (possible since you said the sort column is never null). The cost would be very high as you will have a hit on the index then on the table.

However if the table has 1000 columns, maybe it is cheaper to follow the index than load every blocks before sorting in multipass.

LikeLike

Richard Foote - September 14, 2011

It’s hard to cover all that’s necessary in a simply question 🙂

At this point, I’ll just comment that you can’t add hints. Additionally, stats must be current and correct, as must any system stats. So you can’t “hide” the true facts from the CBO.

Regarding number of columns in the table, let’s say there are only 2 …

LikeLike

2. Stew Ashton - September 14, 2011

Also the FIRST_ROWS hint, since it tells the CBO to return the first rows as quickly as possible.

LikeLike

Richard Foote - September 14, 2011

Hi Stew

No hints and let’s assume All_ROWS as we truely want the cheaper solution to access all 10M+ rows …

LikeLike

3. Marcus Mönnig - September 14, 2011

Feeling adventurous and a bit needled, so I’ll give it a shot… 🙂

Why shouldn’t the CBO consider the index?

It’s all about cost: On the con side regarding using the index there are the additonal costs of doing the full index scan. On the pro side, we don’t have to access empty table blocks as we would with a FTS and of course we don’t need a sort step when using the index.

Another con would be a large clustering factor of the index, since we can’t expect a revisited block to be still in cache, so that would mean multiple “physical” reads for the same block when using the index.

All of this and more is considered by the CBO and whatever plan sums up to the lowest predicated cost is seleced for execution.

LikeLike

Richard Foote - September 14, 2011

Hi Marcus

It’s fun being adventurous isn’t it 😉

I’ll not comment any further on what you’ve said for now …

LikeLike

4. tonysleight - September 14, 2011

My small input into this topic notes that the original posting did not set the criteria that the index must be unique.
So, if the table consisted of millions of rows, but just a handful of unique IDs and a bit mapped index was used. Maybe the CBO would be inclined to favour a bit mapped index to access the table in preference to a full table scan?

LikeLike

Richard Foote - September 14, 2011

Hi Tony

Assume OLTP and B-Tree index. What if the column were unique …

LikeLike

tonysleight - September 14, 2011

So, based on your comments. What we are looking for is a scenario where the CBO would choose a B-Tree index to access every row in a large table in preference to a FTS.

The FTS would read table data based on the high water mark. So, if the big table had in the past been even bigger so big that the I/O cost in performing a FTS was greater than the index scan, then in theory there should be no reason that the index could not be selected.

LikeLike

5. Stew Ashton - September 14, 2011

My first post was a little short: I wanted to get the first lines out on the Web as quickly as possible 😉

Interesting that the comments use the word “consider” in a different sense than Richard. I think what Richard means is “does the CBO think the index might be used to provide the correct answer?”. The index can be used if it contains an entry for every row in the table, and if the index order helps answer the question.

If those conditions are met, the CBO may consider an INDEX FULL SCAN or an INDEX FULL SCAN DESCENDING followed by a table access BY INDEX ROWID.

Of course, the CBO may consider this and then reject it, but that’s another story. With the FIRST_ROWS hint, I get plans that use both UNIQUE and NONUNIQUE indexes, but not BITMAP indexes (no doubt because bitmap indexes can’t be FULL SCANned).

LikeLike

6. Martin Preiss - September 15, 2011

Hi Richard,

if there are a lot of deleted rows below the HWM the FTS will be quite expensive and in this case an INDEX FULL SCAN will get a lower cost:

-- 11.1.0.7 with noworkload system stats
drop table big_t;

create table big_t
as
with
base_data
as
(
select rownum id1
  from dual
connect by level <= 1000000
)
,
mult
as
(
select rownum id2
  from dual
connect by level <= 30
)
select rownum id
     , 'aaa' col2
  from mult
     , base_data;
     
delete from big_t
 where id > 10000000; 

alter table BIG_T modify id not null; 
 
exec dbms_stats.gather_table_stats(user, 'BIG_T') 

create unique index big_t_idx on big_t(id);

explain plan for
select *
  from big_t t
 order by id

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    10M|    85M| 21299   (1)| 00:04:59 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_T     |    10M|    85M| 21299   (1)| 00:04:59 |
|   2 |   INDEX FULL SCAN           | BIG_T_IDX |    10M|       | 11019   (1)| 00:02:35 |
-----------------------------------------------------------------------------------------

explain plan for
select /*+ full(t) */ *
  from big_t t
 order by id;

------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    10M|    85M|       | 33353   (2)| 00:07:47 |
|   1 |  SORT ORDER BY     |       |    10M|    85M|   191M| 33353   (2)| 00:07:47 |
|   2 |   TABLE ACCESS FULL| BIG_T |    10M|    85M|       |  9268   (1)| 00:02:10 |
------------------------------------------------------------------------------------

My first idea was using I/O costing and cheating a little bit with MBRC – but deleting 20M from my 30M table was sufficient (after deleting 10M from a 20M table was not sufficient).

Regards

Martin

LikeLike

Richard Foote - September 15, 2011

Hi Martin

Nice 🙂

Both Tony and yourself have raised the issue of a table with lots of wasted space below the HWM.

What if the table was perfectly compacted, with not a single delete in sight ?

LikeLike

7. Radoslav Golian - September 15, 2011

If the table is perfectly compacted and has a perfect clustering factor (clustering factor equals cca to number of blocks of the table)
then avoiding sorting (I think it would be a disk sort for such big table) by using index may be more effective..

SQL>  create table tab as select level a, lpad('x',300,'x') b from dual connect by level < 10000000;

Table created.

SQL> execute dbms_stats.gather_table_stats(user, 'TAB');

PL/SQL procedure successfully completed.

SQL> alter table tab add constraint pk_tab primary key (a);

Table altered.


SQL>  select clustering_factor from user_indexes where index_name='PK_TAB';

CLUSTERING_FACTOR
-----------------
           434783


SQL> select blocks from user_tables where table_name='TAB';

    BLOCKS
----------
    435703

We have perfect clustering factor..

SQL> set autot trace exp;
SQL> select * from tab order by a;

Execution Plan
----------------------------------------------------------
Plan hash value: 3124583913

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    10M|  2918M|   457K  (1)| 01:31:30 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB    |    10M|  2918M|   457K  (1)| 01:31:30 |
|   2 |   INDEX FULL SCAN           | PK_TAB |    10M|       | 22307   (1)| 00:04:28 |
--------------------------------------------------------------------------------------

SQL>


SQL> select /*+full(tab)*/ * from tab order by a;

Execution Plan
----------------------------------------------------------
Plan hash value: 2037665419

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10M|  2918M|       |   752K  (1)| 02:30:29 |
|   1 |  SORT ORDER BY     |      |    10M|  2918M|  6250M|   752K  (1)| 02:30:29 |
|   2 |   TABLE ACCESS FULL| TAB  |    10M|  2918M|       | 95697   (1)| 00:19:09 |
-----------------------------------------------------------------------------------

We have a big disk sort here..

LikeLike

Richard Foote - September 19, 2011

Hi Radoslav

Excellent, well done 🙂

LikeLike

8. Narendra - September 15, 2011

Besides other factors (and assumptions) already mentioned by others (and you), is it possible that CBO’s decision to (not) use index would be influenced by PGA_AGGREGATE_TARGET setting? Or to be more specific PGA_MAX_SIZE setting?
My reasoning goes as follows:
The table is really large and all the table data is being selected. This generally tends to say don’t use index but…
The index is on a NOT NULL column which means index can be used to execute the query and produce correct results.
As it is assumed that correct statistics are in place and no hints allowed, the only (??) factors that would affect optimizer’s decision whether to use index or not would be
i) The cost of sorting the data in order to take advantage of multi-block reads as opposed to cost of reading table and index with single block reads in order to avoid the final sort
ii) If it is not a unique index, I think the clustering factor may influence the optimizer decision as others mentioned above
iii) Not sure but would the fact that 11g prefers direct path reads over cache reads affect optimizer’s decision
iv) I guess the higher degree of parallelism (on table) might influence the optimizer to use full table scan instead of index scan.

LikeLike

Saurabh Manroy - September 15, 2011

Narendra,

Precisely, values of _smm_min_size and _smm_max_size parameters are considered while evaluating cost of order by clause in (FTS + Sort) v/s index full scan.

LikeLike

Richard Foote - September 19, 2011

Narendra

Indeed, the cost of the sort is a crucial component in all this, especially if forced to go to disk.

And yes, parallelism comes into the equation as well.

Note that whether the index is unique or non-unique matters little as the CF comes into consideration in both cases. Remember we want all the rows, not just the one.

LikeLike

9. Flado - September 15, 2011

Yes, it is possible. It is a cost thing – if the cost of one full table scan + sort is bigger than the cost of one index full scan + table accesses, the CBO will use the index. Otherwise, it will not, unless hinted or otherwise persuaded to do so.
Tested on 10.2 and 11.2.

LikeLike

Richard Foote - September 19, 2011

Hi Flado

Indeed 🙂

LikeLike

10. John Brady - September 16, 2011

As a few others have kind of said already, it depends on how the Optimizer costs out the various alternatives and which is cheapest.

FTS = FTS Cost (all I/O) + Sort Cost
Sort Cost is either just CPU if the data fits in available memory, or is CPU + I/O to do a multi-pass sort.
Available memory depends on memory settings (PGA I assume).

Index = Index Full Scan Cost (all I/O) + Data Row (all I/O)
Data Row cost depends on Clustering Factor. In the worst case with poor clustering (high Clustering Factor) it would be as many I/Os as rows in the table (10 million). In the best case with the data in order and clustered on the indexed column, the number of I/Os would be the number of blocks in the table i.e. each block would only be read once, because the rows were well clustered.

So it depends on how well clustered the data is on this column, and how much memory is available. If enough memory is available then the Full Table Scan should be cheaper than the Index, as there are less disk I/Os in total (it will use multi-block reads).

This assumes that the CPU cost calculated is lower than the extra I/Os in the alternative execution plans. With the rate of CPU increase over the years, I think this would be true.

An alternative execution strategy would be available if the table was an Index Organised Table, on this column. Then the data is stored in order in the leaf blocks of the index. So an Index Full Scan could be done, and there would be no Sort because the data would be retrieved in the correct order.

Anyway, that’s my best guess.

John

LikeLike

Richard Foote - September 19, 2011

Hi John

Spot on 🙂

LikeLike

11. vishal Desai - September 17, 2011

I tried following:

Case 1:
=======

– create large table
– create unique index on id column (unique index)
– create bitmap index on flag column (flag index)

10053:

SINGLE TABLE ACCESS PATH

Oracle tried to calculate cost for following paths:

– Access path table full scan
– Access path index full scan (flag index)

OPTIMIZER STATISTICS AND COMPUTATIONS
…
****** Recost for ORDER BY (using index) ************
Access path analysis for TEST1
***************************************
SINGLE TABLE ACCESS PATH
****** trying bitmap/domain indexes ******
****** finished trying bitmap/domain indexes ******
Best:: AccessPath: TableScan

Cost: 69196.57 Degree: 1 Resp: 69196.57 Card: 14954700.00 Bytes: 200

Case 2:
=======

– create large table
– create unique index on id column (unique index)
– create bitmap index on flag column (flag index)
– Add primary key using unique index (primary index)

10053:

Oracle tried to calculate cost for following paths:

SINGLE TABLE ACCESS PATH

– Access path table full scan
– Access path index full scan (flag index)
– Access path index full scan (primary index)

OPTIMIZER STATISTICS AND COMPUTATIONS
…
****** Recost for ORDER BY (using index) ************
Access path analysis for TEST1
***************************************
****** trying bitmap/domain indexes ******
Access Path: index (FullScan)
Index: TEST1_RCN_ID
resc_io: 17007.00 resc_cpu: 3076948760
ix_sel: 1.000000 ix_sel_with_filters: 1.000000
Cost: 17159.45 Resp: 17159.45 Degree: 0
****** finished trying bitmap/domain indexes ******
Best:: AccessPath: IndexRange
Index: TEST1_RCN_ID
Cost: 1898325.96 Degree: 1 Resp: 1898325.96 Card: 14954700.00 Bytes: 200

Looks like oracle optimizer did not consider unique or bitmap (obvious) for order by but when I added primary key constraint it did consider primary key index but overall cost of index scan was high compared to table scan so it decided to do full table scan.

LikeLike

Richard Foote - September 19, 2011

Hi Vishal

Note, it was not the PK as such that was important, it was the implicit NOT NULL constraint that makes the difference.

Oracle will not consider the index to return all rows (with no predicate) without the NOT NULL constraint as the CBO can not guarantee all rows are referenced within the index.

LikeLike

12. vishal Desai - September 17, 2011

I changed primary key index leaf blocks and clustering factor to very low value using dbms_stats.set_index_stats and result is:

————————————-
| Id | Operation |
————————————-
| 0 | SELECT STATEMENT |
| 1 | TABLE ACCESS BY INDEX ROWID|
| 2 | INDEX FULL SCAN |
————————————-

LikeLike

Richard Foote - September 19, 2011

Hi Vishal

Yes, but that was cheating 😉

But importantly, it does suggest that it’s the related costs that’s important here, providing you have that NOT NULL constraint in place.

LikeLike

13. Big Tables, Sorts and Indexes Solution (Right On Mother) « Richard Foote’s Oracle Blog - September 19, 2011

[…] access path becomes viable. A poor (or average) CF, and using the index is just too expensive. Radoslav Golian has an excellent example in the comments on when an index with an excellent CF is chosen by the […]

LikeLike

14. Refresh constraints « SbhOracle - September 27, 2011

[…] Big Tables, Sorts and Indexes Quiz (Candidate) (richardfoote.wordpress.com) […]

LikeLike

Leave a comment Cancel reply

Contact Details

If you wish to contact me directly, please do so at richard.foote@bigpond.com
Richard’s Art

This slideshow requires JavaScript.

Blog Visits
- 2,850,990 hits
Recent Posts
Archives
- July 2023
- May 2023
- April 2023
- March 2023
- February 2023
- December 2022
- November 2022
- July 2022
- May 2022
- April 2022
- December 2021
- November 2021
- October 2021
- August 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- June 2020
- May 2020
- April 2020
- March 2020
- January 2020
- December 2019
- September 2019
- August 2019
- July 2019
- April 2019
- March 2019
- February 2019
- January 2019
- December 2018
- November 2018
- October 2018
- August 2018
- July 2018
- June 2018
- May 2018
- April 2018
- March 2018
- February 2018
- January 2018
- December 2017
- November 2017
- October 2017
- September 2017
- August 2017
- July 2017
- May 2017
- April 2017
- March 2017
- February 2017
- January 2017
- December 2016
- November 2016
- September 2016
- August 2016
- June 2016
- May 2016
- April 2016
- February 2016
- January 2016
- October 2015
- September 2015
- August 2015
- July 2015
- June 2015
- May 2015
- April 2015
- March 2015
- November 2014
- October 2014
- September 2014
- August 2014
- July 2014
- May 2014
- April 2014
- March 2014
- February 2014
- January 2014
- December 2013
- November 2013
- September 2013
- August 2013
- July 2013
- June 2013
- May 2013
- April 2013
- February 2013
- December 2012
- November 2012
- October 2012
- September 2012
- July 2012
- June 2012
- May 2012
- April 2012
- March 2012
- February 2012
- January 2012
- December 2011
- November 2011
- October 2011
- September 2011
- August 2011
- July 2011
- June 2011
- May 2011
- April 2011
- March 2011
- February 2011
- January 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010
- April 2010
- March 2010
- February 2010
- January 2010
- December 2009
- November 2009
- October 2009
- September 2009
- August 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
Richard Foote’s Pages
Top Posts
Richard’s Categories

Richard’s Categories
Richard's Cloud

11g 12c 19c 19c New Features Automatic Indexing Autonomous Database Autonomous Data Warehouse Autonomous Transaction Processing Block Dumps CBO Clustering Factor Exadata Full Table Scans Index Access Path Index Internals Index Rebuild Index statistics Oracle Oracle19c Oracle Blog Oracle Cloud Oracle Cost Based Optimizer Oracle General Oracle Indexes Oracle Myths Oracle Statistics Partitioning Performance Tuning Richard's Musings Unique Indexes
September 2011

M T W T F S S

1 2 3 4

5 6 7 8 9 10 11

12 13 14 15 16 17 18

19 20 21 22 23 24 25

26 27 28 29 30

« Aug Oct »

Feeds
Full
Comments