jump to navigation

Unique Indexes Force Hints To Be “Ignored” Part II (One Of The Few) February 19, 2019

Posted by Richard Foote in CBO, Hash Join, Hints, Oracle Indexes, Transitive Closure, Unique Indexes.
2 comments

Final Cut

In Part I, I showed a demo of how the introduction of a Unique Index appears to force a hint to be “ignored”. This is a classic case of what difference a Unique Index can make in the CBO deliberations.

So what’s going on here?

When I run the first, un-hinted query:

SQL> select * from bowie1, bowie2
where bowie1.id=bowie2.code and bowie1.id=1;
we notice something a little odd in the Predicate Information section of the execution plan:
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BOWIE1"."ID"=1)
6 - access("BOWIE2"."CODE"=1)
Where the hell is the join condition, it’s not listed? Additionally, where does the BOWIE2.CODE=1 condition come from, it’s not a predicate within the above SQL statement?
The answer is “Transitive Closure“, whereby the CBO can automatically infer that BOWIE2.CODE must equal 1, if BOWIE2.CODE=BOWIE1.ID and BOWIE1.ID=1. This is something that the CBO master Jonathan Lewis has blogged about a number of times, including this post on Cartesian Merge Join.
Because the CBO is picking up the fact (based on the column statistics) that BOWIE1.ID is basically a unique column, it can effectively drop the join condition and simply use a  Merge Join Cartesian to get all rows from BOWIE1 that match the BOWIE1.ID=1 predicate (just 1 row), with all those rows from BOWIE2 that match the BOWIE2.CODE=1 predicate (estimated 50 rows).
-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name          | Rows | Bytes | Cost (%CPU) |   Time     |
-------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |               |   50 |  1150 |       5 (0) |   00:00:01 |
|  1 | MERGE JOIN CARTESIAN                |               |   50 |  1150 |       5 (0) |   00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1        |    1 |    16 |       2 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE1_ID_I   |    1 |       |       1 (0) |   00:00:01 |
|  4 | BUFFER SORT                         |               |   50 |   350 |       3 (0) |   00:00:01 |
|  5 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2        |   50 |   350 |       3 (0) |   00:00:01 |
|* 6 | INDEX RANGE SCAN                    | BOWIE2_CODE_I |   50 |       |       1 (0) |   00:00:01 |
-------------------------------------------------------------------------------------------------------
The CBO is picking up the fact that a Merge Join Cartesian is not as bad as it might sound if only 1 row is likely to be returned from one side of this process that can be combined with just the rows of interest from the other table, with no unnecessary row throwaways.
However, the CBO might not get this right, the efficiency of this depends somewhat on there really only being the one row returned from the BOWIE1.ID=1 condition. If there were many more than one row possible, then this can become relatively inefficient.
The second hinted query is therefore directing Oracle to perform a Hash Join, as I potentially know my data better than Oracle and think it a better option in this case:
SQL> select /*+ use_hash(bowie2) */ * from bowie1, bowie2
where bowie1.id=bowie2.code and bowie1.id=1;
The hint is directing Oracle to access the BOWIE2 table to be the probe table in a Hash Join operation.
We notice that the join predicate is now listed in the Predicate Information of the execution plan:
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("BOWIE1"."ID"="BOWIE2"."CODE")
3 - access("BOWIE1"."ID"=1)
5 - access("BOWIE2"."CODE"=1)
The key point being that there could be a many to many join operation that needs to be performed and Oracle can’t perform a Hash Join unless there is a join predicate listed.
As such, the CBO uses a Hash Join as requested by the hint:
------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name          | Rows | Bytes | Cost (%CPU) |  Time     |
------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |               |   50 |  1150 |       5 (0) |  00:00:01 |
|* 1 | HASH JOIN                           |               |   50 |  1150 |       5 (0) |  00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE1        |    1 |    16 |       2 (0) |  00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE1_ID_I   |    1 |       |       1 (0) |  00:00:01 |
|  4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2        |   50 |   350 |       3 (0) |  00:00:01 |
|* 5 | INDEX RANGE SCAN                    | BOWIE2_CODE_I |   50 |       |       1 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------------
Note currently, there is no Unique constraint on the BOWIE1.ID column and the index on BOWIE1.ID is non-unique. Although the column statistics suggests the ID is basically unique (the number of distinct values matches the number of rows in the table), there is no certainly that this is correct. The statistics might not be accurate and there could be a bunch of duplicate IDs that would result in a many to many join operation whereby a Hash Join might be preferable.
But by replacing the non-unique index on BOWIE1.ID with a unique index, the CBO now knows there is indeed just the one viable row from the BOWIE1 side of the join, with the BOWIE1.ID=1 predicate. As such, the CBO goes back to using Transitive Closure to again effectively eliminate the join predicate.

 

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BOWIE1"."ID"=1)
5 - access("BOWIE2"."CODE"=1)
The CBO can now safely get just the required row from BOWIE1 table via the BOWIE1.ID=1 predicate and the required data from BOWIE2 directly via the BOWIE2.CODE=1 predicate. The CBO makes this decision before considering the most appropriate join strategy, which can now not possibly be the Hash Join, as the Hash Join is only possible with a join predicate in place.
------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name          | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |               |   50 |  1150 |       5 (0) | 00:00:01 |
|  1 | NESTED LOOPS                        |               |   50 |  1150 |       5 (0) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID         | BOWIE1        |    1 |    16 |       2 (0) | 00:00:01 |
|* 3 | INDEX UNIQUE SCAN                   | BOWIE1_ID_I   |    1 |       |       1 (0) | 00:00:01 |
|  4 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2        |   50 |   350 |       3 (0) | 00:00:01 |
|* 5 | INDEX RANGE SCAN                    | BOWIE2_CODE_I |   50 |       |       1 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------
As such, the USE_HASH hint is now “ignored”, because it’s simply now not a viable option for the CBO. A Nested Loop is now performed instead, in which the row for the Outer Table (BOWIE1) can be retrieved and all “corresponding” rows for the Inner Table (BOWIE2) can be likewise accessed via just the BOWIE2.CODE=1 predicate.
A Nested Loop is the join type you can have when you’re not necessarily performing a join…
Advertisements