jump to navigation

Big Tables, Sorts and Indexes Quiz (Candidate) September 14, 2011

Posted by Richard Foote in Oracle Indexes, Quiz.
trackback

Following on from the previous quiz on Descending indexes. Simple scenario.

You have a huge table, 10 Million plus rows.

You have an index on a column with a NOT NULL constraint but there are various other columns in the table not included in the index.

You want to select all columns and all rows from the table so you don’t even have a  WHERE condition, but you want the data returned in the order of your indexed column, e.g.

SELECT * FROM really_big_table ORDER BY id;

So you have an index on the ID column.

Question: Is it possible for the CBO to use the index or for the CBO to even consider the index in the first place ? If so, how so ? If not, why not ?

Enjoy 🙂

 

UPDATE:

OK, all statistics, including system statistics need to be “accurate” and give the CBO correct details on the scenario (so no cheating with manipulating statistics).

No hints allowed and it’s the  ALL_ROWS optimizer_mode, as we want the best solution to retrieve every row from the 10M+ table. Again note, there is no WHERE condition in the query.

If it helps, the index can be unique and there may be as little as just 1 additional column in the table.

Comments»

1. Bernard Polarski - September 14, 2011

The CBO Could use the ‘INDEX_ASC’ hint in order to avoid the sort (possible since you said the sort column is never null). The cost would be very high as you will have a hit on the index then on the table.

However if the table has 1000 columns, maybe it is cheaper to follow the index than load every blocks before sorting in multipass.

Like

Richard Foote - September 14, 2011

It’s hard to cover all that’s necessary in a simply question 🙂

At this point, I’ll just comment that you can’t add hints. Additionally, stats must be current and correct, as must any system stats. So you can’t “hide” the true facts from the CBO.

Regarding number of columns in the table, let’s say there are only 2 …

Like

2. Stew Ashton - September 14, 2011

Also the FIRST_ROWS hint, since it tells the CBO to return the first rows as quickly as possible.

Like

Richard Foote - September 14, 2011

Hi Stew

No hints and let’s assume All_ROWS as we truely want the cheaper solution to access all 10M+ rows …

Like

3. Marcus Mönnig - September 14, 2011

Feeling adventurous and a bit needled, so I’ll give it a shot… 🙂

Why shouldn’t the CBO consider the index?

It’s all about cost: On the con side regarding using the index there are the additonal costs of doing the full index scan. On the pro side, we don’t have to access empty table blocks as we would with a FTS and of course we don’t need a sort step when using the index.

Another con would be a large clustering factor of the index, since we can’t expect a revisited block to be still in cache, so that would mean multiple “physical” reads for the same block when using the index.

All of this and more is considered by the CBO and whatever plan sums up to the lowest predicated cost is seleced for execution.

Like

Richard Foote - September 14, 2011

Hi Marcus

It’s fun being adventurous isn’t it 😉

I’ll not comment any further on what you’ve said for now …

Like

4. tonysleight - September 14, 2011

My small input into this topic notes that the original posting did not set the criteria that the index must be unique.
So, if the table consisted of millions of rows, but just a handful of unique IDs and a bit mapped index was used. Maybe the CBO would be inclined to favour a bit mapped index to access the table in preference to a full table scan?

Like

Richard Foote - September 14, 2011

Hi Tony

Assume OLTP and B-Tree index. What if the column were unique …

Like

tonysleight - September 14, 2011

So, based on your comments. What we are looking for is a scenario where the CBO would choose a B-Tree index to access every row in a large table in preference to a FTS.

The FTS would read table data based on the high water mark. So, if the big table had in the past been even bigger so big that the I/O cost in performing a FTS was greater than the index scan, then in theory there should be no reason that the index could not be selected.

Like

5. Stew Ashton - September 14, 2011

My first post was a little short: I wanted to get the first lines out on the Web as quickly as possible 😉

Interesting that the comments use the word “consider” in a different sense than Richard. I think what Richard means is “does the CBO think the index might be used to provide the correct answer?”. The index can be used if it contains an entry for every row in the table, and if the index order helps answer the question.

If those conditions are met, the CBO may consider an INDEX FULL SCAN or an INDEX FULL SCAN DESCENDING followed by a table access BY INDEX ROWID.

Of course, the CBO may consider this and then reject it, but that’s another story. With the FIRST_ROWS hint, I get plans that use both UNIQUE and NONUNIQUE indexes, but not BITMAP indexes (no doubt because bitmap indexes can’t be FULL SCANned).

Like

6. Martin Preiss - September 15, 2011

Hi Richard,

if there are a lot of deleted rows below the HWM the FTS will be quite expensive and in this case an INDEX FULL SCAN will get a lower cost:

-- 11.1.0.7 with noworkload system stats
drop table big_t;

create table big_t
as
with
base_data
as
(
select rownum id1
  from dual
connect by level <= 1000000
)
,
mult
as
(
select rownum id2
  from dual
connect by level <= 30
)
select rownum id
     , 'aaa' col2
  from mult
     , base_data;
     
delete from big_t
 where id > 10000000; 

alter table BIG_T modify id not null; 
 
exec dbms_stats.gather_table_stats(user, 'BIG_T') 

create unique index big_t_idx on big_t(id);

explain plan for
select *
  from big_t t
 order by id

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    10M|    85M| 21299   (1)| 00:04:59 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BIG_T     |    10M|    85M| 21299   (1)| 00:04:59 |
|   2 |   INDEX FULL SCAN           | BIG_T_IDX |    10M|       | 11019   (1)| 00:02:35 |
-----------------------------------------------------------------------------------------

explain plan for
select /*+ full(t) */ *
  from big_t t
 order by id;

------------------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    10M|    85M|       | 33353   (2)| 00:07:47 |
|   1 |  SORT ORDER BY     |       |    10M|    85M|   191M| 33353   (2)| 00:07:47 |
|   2 |   TABLE ACCESS FULL| BIG_T |    10M|    85M|       |  9268   (1)| 00:02:10 |
------------------------------------------------------------------------------------

My first idea was using I/O costing and cheating a little bit with MBRC – but deleting 20M from my 30M table was sufficient (after deleting 10M from a 20M table was not sufficient).

Regards

Martin

Like

Richard Foote - September 15, 2011

Hi Martin

Nice 🙂

Both Tony and yourself have raised the issue of a table with lots of wasted space below the HWM.

What if the table was perfectly compacted, with not a single delete in sight ?

Like

7. Radoslav Golian - September 15, 2011

If the table is perfectly compacted and has a perfect clustering factor (clustering factor equals cca to number of blocks of the table)
then avoiding sorting (I think it would be a disk sort for such big table) by using index may be more effective..

SQL>  create table tab as select level a, lpad('x',300,'x') b from dual connect by level < 10000000;

Table created.

SQL> execute dbms_stats.gather_table_stats(user, 'TAB');

PL/SQL procedure successfully completed.

SQL> alter table tab add constraint pk_tab primary key (a);

Table altered.


SQL>  select clustering_factor from user_indexes where index_name='PK_TAB';

CLUSTERING_FACTOR
-----------------
           434783


SQL> select blocks from user_tables where table_name='TAB';

    BLOCKS
----------
    435703

We have perfect clustering factor..

SQL> set autot trace exp;
SQL> select * from tab order by a;

Execution Plan
----------------------------------------------------------
Plan hash value: 3124583913

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    10M|  2918M|   457K  (1)| 01:31:30 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB    |    10M|  2918M|   457K  (1)| 01:31:30 |
|   2 |   INDEX FULL SCAN           | PK_TAB |    10M|       | 22307   (1)| 00:04:28 |
--------------------------------------------------------------------------------------

SQL>


SQL> select /*+full(tab)*/ * from tab order by a;

Execution Plan
----------------------------------------------------------
Plan hash value: 2037665419

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10M|  2918M|       |   752K  (1)| 02:30:29 |
|   1 |  SORT ORDER BY     |      |    10M|  2918M|  6250M|   752K  (1)| 02:30:29 |
|   2 |   TABLE ACCESS FULL| TAB  |    10M|  2918M|       | 95697   (1)| 00:19:09 |
-----------------------------------------------------------------------------------

We have a big disk sort here..

Like

Richard Foote - September 19, 2011

Hi Radoslav

Excellent, well done 🙂

Like

8. Narendra - September 15, 2011

Besides other factors (and assumptions) already mentioned by others (and you), is it possible that CBO’s decision to (not) use index would be influenced by PGA_AGGREGATE_TARGET setting? Or to be more specific PGA_MAX_SIZE setting?
My reasoning goes as follows:
The table is really large and all the table data is being selected. This generally tends to say don’t use index but…
The index is on a NOT NULL column which means index can be used to execute the query and produce correct results.
As it is assumed that correct statistics are in place and no hints allowed, the only (??) factors that would affect optimizer’s decision whether to use index or not would be
i) The cost of sorting the data in order to take advantage of multi-block reads as opposed to cost of reading table and index with single block reads in order to avoid the final sort
ii) If it is not a unique index, I think the clustering factor may influence the optimizer decision as others mentioned above
iii) Not sure but would the fact that 11g prefers direct path reads over cache reads affect optimizer’s decision
iv) I guess the higher degree of parallelism (on table) might influence the optimizer to use full table scan instead of index scan.

Like

Saurabh Manroy - September 15, 2011

Narendra,

Precisely, values of _smm_min_size and _smm_max_size parameters are considered while evaluating cost of order by clause in (FTS + Sort) v/s index full scan.

Like

Richard Foote - September 19, 2011

Narendra

Indeed, the cost of the sort is a crucial component in all this, especially if forced to go to disk.

And yes, parallelism comes into the equation as well.

Note that whether the index is unique or non-unique matters little as the CF comes into consideration in both cases. Remember we want all the rows, not just the one.

Like

9. Flado - September 15, 2011

Yes, it is possible. It is a cost thing – if the cost of one full table scan + sort is bigger than the cost of one index full scan + table accesses, the CBO will use the index. Otherwise, it will not, unless hinted or otherwise persuaded to do so.
Tested on 10.2 and 11.2.

Like

Richard Foote - September 19, 2011

Hi Flado

Indeed 🙂

Like

10. John Brady - September 16, 2011

As a few others have kind of said already, it depends on how the Optimizer costs out the various alternatives and which is cheapest.

FTS = FTS Cost (all I/O) + Sort Cost
Sort Cost is either just CPU if the data fits in available memory, or is CPU + I/O to do a multi-pass sort.
Available memory depends on memory settings (PGA I assume).

Index = Index Full Scan Cost (all I/O) + Data Row (all I/O)
Data Row cost depends on Clustering Factor. In the worst case with poor clustering (high Clustering Factor) it would be as many I/Os as rows in the table (10 million). In the best case with the data in order and clustered on the indexed column, the number of I/Os would be the number of blocks in the table i.e. each block would only be read once, because the rows were well clustered.

So it depends on how well clustered the data is on this column, and how much memory is available. If enough memory is available then the Full Table Scan should be cheaper than the Index, as there are less disk I/Os in total (it will use multi-block reads).

This assumes that the CPU cost calculated is lower than the extra I/Os in the alternative execution plans. With the rate of CPU increase over the years, I think this would be true.

An alternative execution strategy would be available if the table was an Index Organised Table, on this column. Then the data is stored in order in the leaf blocks of the index. So an Index Full Scan could be done, and there would be no Sort because the data would be retrieved in the correct order.

Anyway, that’s my best guess.

John

Like

Richard Foote - September 19, 2011

Hi John

Spot on 🙂

Like

11. vishal Desai - September 17, 2011

I tried following:

Case 1:
=======

– create large table
– create unique index on id column (unique index)
– create bitmap index on flag column (flag index)

10053:

SINGLE TABLE ACCESS PATH

Oracle tried to calculate cost for following paths:

– Access path table full scan
– Access path index full scan (flag index)

OPTIMIZER STATISTICS AND COMPUTATIONS

****** Recost for ORDER BY (using index) ************
Access path analysis for TEST1
***************************************
SINGLE TABLE ACCESS PATH
****** trying bitmap/domain indexes ******
****** finished trying bitmap/domain indexes ******
Best:: AccessPath: TableScan

Cost: 69196.57 Degree: 1 Resp: 69196.57 Card: 14954700.00 Bytes: 200

Case 2:
=======

– create large table
– create unique index on id column (unique index)
– create bitmap index on flag column (flag index)
– Add primary key using unique index (primary index)

10053:

Oracle tried to calculate cost for following paths:

SINGLE TABLE ACCESS PATH

– Access path table full scan
– Access path index full scan (flag index)
– Access path index full scan (primary index)

OPTIMIZER STATISTICS AND COMPUTATIONS

****** Recost for ORDER BY (using index) ************
Access path analysis for TEST1
***************************************
****** trying bitmap/domain indexes ******
Access Path: index (FullScan)
Index: TEST1_RCN_ID
resc_io: 17007.00 resc_cpu: 3076948760
ix_sel: 1.000000 ix_sel_with_filters: 1.000000
Cost: 17159.45 Resp: 17159.45 Degree: 0
****** finished trying bitmap/domain indexes ******
Best:: AccessPath: IndexRange
Index: TEST1_RCN_ID
Cost: 1898325.96 Degree: 1 Resp: 1898325.96 Card: 14954700.00 Bytes: 200

Looks like oracle optimizer did not consider unique or bitmap (obvious) for order by but when I added primary key constraint it did consider primary key index but overall cost of index scan was high compared to table scan so it decided to do full table scan.

Like

Richard Foote - September 19, 2011

Hi Vishal

Note, it was not the PK as such that was important, it was the implicit NOT NULL constraint that makes the difference.

Oracle will not consider the index to return all rows (with no predicate) without the NOT NULL constraint as the CBO can not guarantee all rows are referenced within the index.

Like

12. vishal Desai - September 17, 2011

I changed primary key index leaf blocks and clustering factor to very low value using dbms_stats.set_index_stats and result is:

————————————-
| Id | Operation |
————————————-
| 0 | SELECT STATEMENT |
| 1 | TABLE ACCESS BY INDEX ROWID|
| 2 | INDEX FULL SCAN |
————————————-

Like

Richard Foote - September 19, 2011

Hi Vishal

Yes, but that was cheating 😉

But importantly, it does suggest that it’s the related costs that’s important here, providing you have that NOT NULL constraint in place.

Like

13. Big Tables, Sorts and Indexes Solution (Right On Mother) « Richard Foote’s Oracle Blog - September 19, 2011

[…] access path becomes viable. A poor (or average) CF, and using the index is just too expensive. Radoslav Golian has an excellent example in the comments on when an index with an excellent CF is chosen by the […]

Like

14. Refresh constraints « SbhOracle - September 27, 2011

[…] Big Tables, Sorts and Indexes Quiz (Candidate) (richardfoote.wordpress.com) […]

Like


Leave a comment