Index Skip Scan: Potential Use Case or Maybe Not ? (Shine On You Crazy Diamond) January 30, 2018
Posted by Richard Foote in Oracle Indexes.trackback
While answering a recent question on a LinkedIn forum, it got me thinking whether there’s a potential use case for using an INDEX SKIP SCAN I hadn’t previously considered.
I’ve discussed Index Skip Scans previously (as I did here), a feature introduced around Oracle9i that allows an index to be considered by the CBO even if the leading column of the index is not included in a query predicate. However, the index is only going to be used by the CBO if there are relatively few distinct values in the missing leading column, as Oracle has to effectively scan the index multiple times for each potential leading column value. If there are too many distinct values, each scan might not result in sufficient index leaf blocks being “skipped” thereby making the INDEX SKIP SCAN access path too inefficient.
But it occurred to me that the strategy of using an index skip scan could also potentially be applied when performing a sort-based aggregate function. Or maybe not.
So a little test case to find out.
I begin by creating a table with 1M rows. The key here though is that the CODE column only has 3 distinct values, so any aggregation based on CODE will only return 3 or fewer rows.
SQL> create table bowie (id number not null, code number not null, sales number not null, name varchar2(42)); Table created. SQL> insert into bowie select rownum, mod(rownum,3), ceil(dbms_random.value(0,10000)), 'David Bowie' from dual connect by level commit; Commit complete.
I now create an index based on the CODE and SALES columns, with CODE the leading column:
SQL> create index bowie_i on bowie(code, sales); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
As CODE only has very few distinct values, the index is a candidate for an INDEX SKIP SCAN if CODE is not specified in a predicate. For example:
SQL> select code, sales from bowie where sales=1; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3861840421 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 700 | 5 (0) | 00:00:01 | |* 1 | INDEX SKIP SCAN | BOWIE_I | 100 | 700 | 5 (0) | 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("SALES"=1) filter("SALES"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 31 consistent gets 0 physical reads 0 redo size 2305 bytes sent via SQL*Net to client 674 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
So an INDEX SKIP SCAN has indeed been used by the CBO and at just 31 consistent gets, not a bad result when fetching the 100 rows of interest. With so few distinct values of CODE, Oracle only has to perform a relatively few number of scans of the index to retrieve all possible SALES of interest across each og the (3) CODE values. At a cost of just 5, the CBO has estimated that relatively few index leaf blocks indeed need to be accessed here.
It’s also worth mentioning at this point that Oracle can also use the index very effectively to return just the MIN (or potentially MAX) SALES column for each CODE value of interest, as it only has to read the first CODE index entry to subsequently determine the associated minimum SALES value:
SQL> select min(sales) from bowie where code=1; MIN(SALES) ---------- 1 Execution Plan ---------------------------------------------------------- Plan hash value: 2634611566 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 7 | 3 (0) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 7 | | | | 2 | FIRST ROW | | 1 | 7 | 3 (0) | 00:00:01 | |* 3 | INDEX RANGE SCAN (MIN/MAX) | BOWIE_I | 1 | 7 | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CODE"=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 544 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
At just 3 consistent gets, Oracle has only had to read the first CODE=1 index entry to immediately determine the minimum associated SALES value.
In theory, Oracle could use the same strategies when processing a GROUP BY aggregate query to very quickly and efficiently determine the minimum SALES value for each distinct CODE in my data. Oracle knows there are only 3 (few) distinct key values and with a combination of using the index to quickly access the minimum (first) SALES of each CODE value and an INDEX SKIP SCAN to quickly re-scan the index to get to the next CODE index entry, an index could be used to very quickly and efficiently find and retrieve the necessary data set in the following query:
SQL> select code, min(sales) from bowie group by code; CODE MIN(SALES) ---------- ---------- 1 1 2 1 0 1 Execution Plan ---------------------------------------------------------- Plan hash value: 2387048003 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 21 | 1079 (43) | 00:00:01 | | 1 | HASH GROUP BY | | 3 | 21 | 1079 (43) | 00:00:01 | | 2 | INDEX FAST FULL SCAN | BOWIE_I | 1000K | 6835K | 677 (10) | 00:00:01 | --------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2477 consistent gets 0 physical reads 0 redo size 677 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
However, only an INDEX FAST FULL SCAN is used at a relatively expensive 2477 consistent gets. Note that Oracle is using a HASH GROUP BY for its aggregation by default (and not some form of sort/group type aggregation). As a result, the index is not considered here and note also that the final result set is NOT in CODE order (the data is returned in 1,2,0 CODE order).
We could try to force the use of an INDEX SKIP SCAN via a hint:
SQL> select /*+ index_ss (bowie, bowie_i) */ code, min(sales) from bowie group by code; CODE MIN(SALES) ---------- ---------- 0 1 1 1 2 1 Execution Plan ---------------------------------------------------------- Plan hash value: 944722262 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 21 | 2558 (4) | 00:00:01 | | 1 | SORT GROUP BY NOSORT | | 3 | 21 | 2558 (4) | 00:00:01 | | 2 | INDEX SKIP SCAN | BOWIE_I | 1000K | 6835K | 2558 (4) | 00:00:01 | -------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 2469 consistent gets 0 physical reads 0 redo size 677 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed
It’s now using an INDEX SKIP SCAN in the plan. But it’s not being “clever” in its use of the INDEX SKIP SCAN with 2469 consistent gets suggesting Oracle is not effectively making use of the (MIN/MAX) scan capability per access of distinct CODE and is unnecessarily reading most of the index leaf blocks.
Oracle however is now using a sort/group based process when performing it’s aggregation (as evidenced in the second step of the plan), resulting in the data now being returned in CODE order.
Even if we take the aggregation out of the equation with a simple MIN based query accessing more than one CODE value:
SQL> select min(sales) from bowie where code in (1,2); Execution Plan ---------------------------------------------------------- Plan hash value: 1467404054 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 7 | 3 (0) | 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 7 | | | | 2 | INLIST ITERATOR | | | | | | |* 3 | INDEX RANGE SCAN | BOWIE_I | 1 | 7 | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CODE"=1 OR "CODE"=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 1676 consistent gets 0 physical reads 0 redo size 544 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The CBO still doesn’t consider INDEX SKIP SCAN type of processing in this scenario and is using a relatively inefficient INDEX RANGE SCAN instead.
So unfortunately, although I have an existing index in place that if only used effectively could potentially return the GROUP BY result set very efficiently, the CBO is not using the index. The CBO doesn’t appear to be able use an INDEX SKIP SCAN in combination with multiple MIN/MAX scans in scenarios when it perhaps could.
Richard – there’s an example of this type of thing on my blog with the title “index bouncy scan”: , but there are a couple of links in the comments https://jonathanlewis.wordpress.com/2017/02/09/index-bouncy-scan/#comments to a couple of nice solutions that preceded my blog:
One from Sayan Malakshinov: http://orasql.org/2012/09/21/distinct-values-by-index-topn/
One from Markus Winand: https://wiki.postgresql.org/wiki/Loose_indexscan
LikeLiked by 1 person
Thanks for the links Jonathan.Yes, I’ve was looking at other techniques to get the same result more efficiently, similar to how one gets both min/max values efficiently within the one SQL. These links show some nice methods and skills that will likely still be needed in the future autonomous database world 🙂
LikeLike
Hi Richard.
Many times in such scenarios there is also a foreign key from BOWIE.CODE to a list-of-values table (say CODES):
create table codes (code number constraint codes_pk primary key, description varchar2(100));
insert into codes select rownum-1,'code '||(rownum-1) from dual connect by level<=4;
alter table bowie add foreign key (code) references codes (code);
If this is the case, mimicking an INDEX SKIP SCAN while taking advantage of MIN/MAX SCAN becomes simple:
select * from (
select c.code,
(select min(sales) from bowie b where b.code = c.code) sales
from codes c)
where sales is not null;
CODE SALES
---------- ----------
0 1
1 1
2 1
3 rows selected.
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 104 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | FIRST ROW | | 1 | 7 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| BOWIE_I | 1 | 7 | 3 (0)| 00:00:01 |
|* 4 | VIEW | | 4 | 104 | 2 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN | CODES_PK | 4 | 52 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("B"."CODE"=:B1)
4 - filter("SALES" IS NOT NULL)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
15 consistent gets
0 physical reads
0 redo size
656 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
Thanks,
Oren.
LikeLiked by 1 person
In 12c we may need to add a hint to prevent scalar subquery unnesting:
LikeLiked by 1 person
Hi Oren
Thanks for sharing your examples, great stuff. Much appreciated.
LikeLike