jump to navigation

FIRST_ROWS_10 CBO Is Hopeless, It’s Using The Wrong Index !! (Weeping Wall) November 5, 2018

Posted by Richard Foote in ALL_ROWS, CBO, Exadata, FIRST_ROWS_10, Oracle Indexes, Siebel.
trackback

low

There’s an organisation I had been dealing with on and off over the years who were having all sorts of issues with their Siebel System and who were totally convinced their performance issues were due directly to being forced to use the FIRST_ROWS_10 optimizer. I’ve attempted on a number of occasions to explain that their issues are not actually due to some unexplained deficiency with the FIRST_ROWS_10 CBO, but due to a number of other root issues, sadly to no avail. I recently found out they’re still struggling with performance issues, so I thought it might be worth looking at a classic example of where it looks simplistically like a FIRST_ROWS_10 CBO issue, but the “real” underlying problem(s) are actually quite different. Just in case other sites are likewise struggling to identify such SQL performance issues when using FIRST_ROWS_10…

This is a somewhat simplified version of their most common issue. Firstly, I create a table with 3M rows that has two columns of interest. The CODE column is initially populated with two evenly distributed distinct values and the GRADE column which only has the one distinct value.

SQL> create table bowie (id number not null, code number not null, grade number not null, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,2), 42, 'David Bowie'
from dual connect by level > = 3000000;

3000000 rows created.

SQL> commit;

Commit complete.

I then update a few rows (just 5) so that the CODE column now has a few occurrences of a third distinct value and update 5 other rows so the GRADE column has a few occurrences of a second distinct value:

SQL> update bowie set code=2
where id in (42, 4343, 400042, 1420001, 2000042);

5 rows updated.

SQL> commit;

Commit complete.

SQL> update bowie set grade=2
where id in (4212, 434323, 440423, 1440002, 2400642);

5 rows updated.

SQL> commit;

Commit complete.

We now introduce “a root problem”, not collecting histograms on these two columns, such that the CBO doesn’t recognise that the values in these columns are not evenly distributed. The CBO will incorrectly assume the rare CODE values actually occur 1M times as it will assume even distribution across the three distinct values. Now this is NOT the specific root issue at this organisation as they do gather histograms, but they do have numerous issues with the CBO not picking the correct cardinality/selectivity of their SQL.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We next create indexes on these two CODE and GRADE columns:

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> create index bowie_grade_i on bowie(grade);

Index created.

Let’s now run the following query using the session default FIRST_ROWS_10 optimizer. The query basically returns just the 5 rows that have a CODE = 2, but sorts the result set by the GRADE column:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 3133133456

---------------------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |               |    10 |   240 |       4 (0) | 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | BOWIE         | 1000K |   22M |       4 (0) | 00:00:01 |
|  2 | INDEX FULL SCAN             | BOWIE_GRADE_I |    31 |       |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    1 recursive calls
    0 db block gets
17518 consistent gets
 5865 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    5 rows processed

The FIRST_ROWS_10 optimizer has come up with a terrible execution plan. Instead of using the index on the CODE column to quickly access the 5 rows of interest and then sort them, it uses an INDEX FULL SCAN via the GRADE column index.

This results in a massively inefficient execution plan (note 17,518 consistent gets), as the CBO has to basically read the entire table via this GRADE index to eventually find the 5 rows of interest that have a CODE=2.

The FIRST_ROWS_10 certainly appears to be dreadful…

But before you go off and demand that Oracle not use this CBO, the key question to ask here is WHY? Why is the FIRST_ROWS_10 CBO deciding to use what is clearly the wrong index?

If we can understand why this is happening, perhaps we can then address what is clearly a problem with an appropriate solution that might not just fix this query but many many like this. And perhaps we can address this problem with an optimal solution and not with a band-aid fix or with a sub-optimal solution that is beneficial for just this one query.

Now there are actually two clues within this execution plan regarding what is really going on.

The first is that the execution plan is estimating that 1000K rows are to be processed by the table access after the filter on CODE=2 has been applied. But this is not correct, there are only 5 such rows.

The second clue that not all is right is that the CBO is estimating 10 rows are to be retrieved via this FIRST_ROWS_10 access plan (as Oracle is trying here to come up with the best plan to retrieve the first 10 rows as efficiently as possible), however there are only 5 rows that meet this SQL criteria. The CBO is not picking up that less than the 10 mandatory rows will actually be fetched and only need to be considered

I always recommend a couple of things to look at if one ever comes across the scenario where the FIRST_ROWS(N) optimizer doesn’t appear to be behaving itself. The first is to look at a 10053 trace and see what the CBO costings are for the various alternative plans. The second is to simply run the query with the ALL_ROWS CBO to see what it’s initial deliberations might be, noting that the CBO has to perform an initial pass with ALL_ROWS to see the data density of the various steps to accurately come up with the optimal FIRST_ROWS(N) costings. Without knowing the potential full result set, The FIRST_ROWS_10 optimizer wouldn’t be able to determine for example how much of a Full Index Scan actually needs to be processed before it likely finds the necessary rows of interest.

So let’s see what costings and plan we get with the ALL_ROWS CBO:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2027917145

------------------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes |TempSpc | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       | 1000K |   22M |        |   11173 (8) | 00:00:01 |
|  1 | SORT ORDER BY     |       | 1000K |   22M |    34M |   11173 (8) | 00:00:01 |
|* 2 | TABLE ACCESS FULL | BOWIE | 1000K |   22M |        |   3387 (11) | 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
11897 consistent gets
    0 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    5 rows processed

The root issue now becomes somewhat obvious…

ALL_ROWS is not correctly estimating 5 rows are to be returned, but 1000K rows !! Oracle is not estimating that using the index on the CODE column will only fetch 5 rows, but using such an index would retrieve 1000K rows. Using such a CODE index to access 1M rows would therefore be viewed as being much too expensive.

Importantly, the sort step would therefore not sort 5 rows, but would be required to sort 1000K rows, which would be extremely expensive.

Oracle thinks all this when deciding the best way to access the first 10 rows of interest as efficiently as possible with the FIRST_ROWS_10 CBO.

Rather than using the CODE index to first retrieve all 1000K rows, to then sort all 1000K rows before finally being able to return the first 10 rows of interest, Oracle instead does the following.

It uses the index of the GRADE column to retrieve the first 10 rows of interest. As 1 in 3 of all rows are estimated to be of interest (1M out of the 3M rows, because we’re interested in 1 of the 3 distinct CODE values), it estimates it doesn’t actually have to perform much of the FULL INDEX SCAN to find these initial 10 rows of interest.

As the GRADE index was accessed, it also means these first 10 rows would have been fetched in GRADE order. Therefore, there is no need to perform the SORT BY step as the index guarantees the data to be fetched in GRADE order. Not having to perform this sort makes this plan fantastically cheap compared to any other option that first requires all 1000K  of data to be fetched and sorted.

The execution plan when using ALL_ROWS is therefore deciding to perform a Full Table Scan (FTS) to access efficiently what the CBO thinks will be the 1000K rows of interest. This would be much more efficient than accessing all 1000K of interest via either the CODE index (followed by the sort) or via the GRADE index (in which the sort is not required) but requires all the table to be accessed by the index.

Now for this organisation, this FTS is not an entirely bad thing. Why? Because they run Siebel on an Exadata platform !!

Exadata takes this FTS and performs a Smart Scan. And the associated Storage Index can automatically determine this data is extremely rare and potentially only access the relatively few storage regions within the table where these few values of interest reside.

The query goes from taking 60 seconds to run using the “awful” FIRST_ROWS_10 CBO to just 2 seconds with the “brilliant” ALL_ROWS CBO.

However, the “root issue” here is not the FIRST_ROWS_10 CBO but the fact it is being fed insufficient statistics to make an accurate estimate of the true cost. As with all CBOs, rubbish stats in, rubbish plan out…

If we fix the actual root issue and provide the CBO with the necessary statistics to make the correct cardinality/selectivity estimates (in this example by collecting histograms on the skewed data columns):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 75');

PL/SQL procedure successfully completed.

And now re-run the query again with ALL_ROWS:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We notice that the ALL_ROWS CBO is now correctly determining the correct query cardinality (5 rows) and is now using the CODE index to retrieve the correctly estimated 5 rows. It’s happy to now perform the sort as the sort of 5 rows has a trivial cost (the cost just goes up by 1).

If we now run the query using the default session FIRST_ROWS_10 CBO:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We note it’s also using the same execution plan as ALL_ROWS, as the FIRST_ROWS_10 CBO likewise is correctly determining that using the CODE index is now a very efficient manner in which to access just the 5 rows of interest.

Here’s the thing. If you are returning 10 or less rows, the optimal execution plan for both FIRST_ROWS_10 and ALL_ROWS should ultimately be the same, as they both should cost the associated plans the same way.

By correctly identifying and addressing the root issue here (poor cardinality/selectivity estimates), we get the following considerable benefits:

  • We now have an execution plan that doesn’t take 2 seconds to run, but 0.02 of a second (we are now down to just 8 consistent gets). This is much more efficient than the Exadata FTS and allows for the optimal plan to be selected, not just a better plan.
  • We automatically fix ALL execution plans for all queries that are based on this combination of table and filtering columns
  • We correctly understand and identify issues with any other table that likewise has the same costing issue
  • We don’t unnecessarily have to add ALL_ROWS hints or use ALL_ROWS based baselines to address all such related issues
  • We don’t implement a fix (such as baselines) that becomes ineffective if we were to even change the underlying SQL with any subsequent release
  • We don’t attempt to fix the relatively few problem queries with a global change (such as changing to ALL_ROWS CBO) that can potentially impact negatively as many queries as get addressed
  • We don’t spend years demanding futilely that Oracle Support allow Siebel with ALL_ROWS based session settings

So if you’re running Siebel and having performance issues, don’t just assume it’s some deficiency with the FIRST_ROWS_10 CBO, spend the time to get to the bottom of any root issues (e.g. CBO bugs with getting histograms costs incorrect for CHAR columns, missing statistics on small tables, poor default settings when returning empty result sets, Siebel bugs with Cartesian Joins, missing extended statistics, missing indexes, etc. etc.)…

In a future post, I’ll explain why playing around with the unsupported _sort_elimination_cost_ratio parameter (again, always a bad idea when trying to address specific SQL tuning issues) is ultimately futile when trying to get FIRST_ROWS_10 to not use the clearly inefficient index that eliminates the sort…

Comments»

1. Giedrius - November 6, 2018

Hi,

If final plans (after calculating histograms) for all_rows and first_rows_10 are the same why actual execution statistics differ? 209 consistent gets when all_rows vs 8 when first_rows

Thank you

Like

Richard Foote - November 6, 2018

Hi Giedrius

For exactly the reason Ric stated, it was executed just after I had flushed the shared_pool which resulted in a hard parse and associated overheads. I’ve decided to change it to a subsequent re-execution to avoid any such confusion.

Thanks for pointing it out 🙂

Liked by 1 person

2. Ric Van Dyke - November 6, 2018

HI Giedrius – Just looking at the stats it looks like the ALL_ROWS plan stats are including a hard parse and the FIRST_ROWS stats are not, notice the recursive calls count of 90 vs 1. Not sure but that seems like the cause to me. – Ric

Liked by 1 person

Richard Foote - November 6, 2018

Hi Ric

Spot on 🙂

Liked by 1 person

3. Jonathan Lewis - November 6, 2018

It doesn’t help that Siebel is the type of application where you find 147 indexes on a single table, and the index you would like Oracle to use is just one of 7 which all start with the same 5 columns before varying on the 6th to 10th columns.

Liked by 1 person

Richard Foote - November 6, 2018

Hi Jonathan

Yes, Siebel certainly has it’s “challenges”, which makes understanding how the CBO actually works all the more important 🙂

Like


Leave a comment