jump to navigation

Index Skip Scan: Potential Use Case or Maybe Not ? (Shine On You Crazy Diamond) January 30, 2018

Posted by Richard Foote in Oracle Indexes.
trackback

shine-on-you-crazy-diamond

While answering a recent question on a LinkedIn forum, it got me thinking whether there’s a potential use case for using an INDEX SKIP SCAN I hadn’t previously considered.

I’ve discussed Index Skip Scans previously (as I did here), a feature introduced around Oracle9i that allows an index to be considered by the CBO even if the leading column of the index is not included in a query predicate. However, the index is only going to be used by the CBO if there are relatively few distinct values in the missing leading column, as Oracle has to effectively scan the index multiple times for each potential leading column value. If there are too many distinct values, each scan might not result in sufficient index leaf blocks being “skipped” thereby making the INDEX SKIP SCAN access path too inefficient.

But it occurred to me that the strategy of using an index skip scan could also potentially be applied when performing a sort-based aggregate function. Or maybe not.

So a little test case to find out.

I begin by creating a table with 1M rows. The key here though is that the CODE column only has 3 distinct values, so any aggregation based on CODE will only return 3 or fewer rows.

SQL> create table bowie (id number not null, code number not null, sales number not null, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,3), ceil(dbms_random.value(0,10000)), 'David Bowie' from dual connect by level  commit;

Commit complete.

I now create an index based on the CODE and SALES columns, with CODE the leading column:

SQL> create index bowie_i on bowie(code, sales);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

As CODE only has very few distinct values, the index is a candidate for an INDEX SKIP SCAN if CODE is not specified in a predicate. For example:

SQL> select code, sales from bowie where sales=1;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3861840421

----------------------------------------------------------------------------
| Id | Operation        | Name    | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------
| 0  | SELECT STATEMENT |         |  100 |   700 |       5 (0) | 00:00:01 |
|* 1 | INDEX SKIP SCAN  | BOWIE_I |  100 |   700 |       5 (0) | 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("SALES"=1)
    filter("SALES"=1)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  31 consistent gets
   0 physical reads
   0 redo size
2305 bytes sent via SQL*Net to client
 674 bytes received via SQL*Net from client
   8 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 100 rows processed

So an INDEX SKIP SCAN has indeed been used by the CBO and at just 31 consistent gets, not a bad result when fetching the 100 rows of interest. With so few distinct values of CODE, Oracle only has to perform a relatively few number of scans of the index to retrieve all possible SALES of interest across each og the (3) CODE values. At a cost of just 5, the CBO has estimated that relatively few index leaf blocks indeed need to be accessed here.

It’s also worth mentioning at this point that Oracle can also use the index very effectively to return just the MIN (or potentially MAX) SALES column for each CODE value of interest, as it only has to read the first CODE index entry to subsequently determine the associated minimum SALES value:

SQL> select min(sales) from bowie where code=1;

MIN(SALES)
----------
1

Execution Plan
----------------------------------------------------------
Plan hash value: 2634611566

----------------------------------------------------------------------------------------
| Id | Operation                  | Name    | Rows | Bytes | Cost (%CPU) | Time      |
----------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT           |         |    1 |     7 |       3 (0) |  00:00:01 |
| 1  | SORT AGGREGATE             |         |    1 |     7 |             |           |
| 2  | FIRST ROW                  |         |    1 |     7 |       3 (0) |  00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX) | BOWIE_I |    1 |     7 |       3 (0) |  00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=1)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  3 consistent gets
  0 physical reads
  0 redo size
544 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

At just 3 consistent gets, Oracle has only had to read the first CODE=1 index entry to immediately determine the minimum associated SALES value.

In theory, Oracle could use the same strategies when processing a GROUP BY aggregate query to very quickly and efficiently determine the minimum SALES value for each distinct CODE in my data. Oracle knows there are only 3 (few) distinct key values and with a combination of using the index to quickly access the minimum (first) SALES of each CODE value and an INDEX SKIP SCAN to quickly re-scan the index to get to the next CODE index entry, an index could be used to very quickly and efficiently find and retrieve the necessary data set in the following query:

SQL> select code, min(sales) from bowie group by code;

CODE MIN(SALES)
---------- ----------
1 1
2 1
0 1

Execution Plan
----------------------------------------------------------
Plan hash value: 2387048003

---------------------------------------------------------------------------------
| Id | Operation            | Name    | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------
| 0  | SELECT STATEMENT     |         |     3 |    21 |   1079 (43) | 00:00:01 |
| 1  | HASH GROUP BY        |         |     3 |    21 |   1079 (43) | 00:00:01 |
| 2  | INDEX FAST FULL SCAN | BOWIE_I | 1000K | 6835K |    677 (10) | 00:00:01 |
---------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
2477 consistent gets
   0 physical reads
   0 redo size
 677 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   3 rows processed

However, only an INDEX FAST FULL SCAN is used at a relatively expensive 2477 consistent gets. Note that Oracle is using a HASH GROUP BY for its aggregation by default (and not some form of sort/group type aggregation). As a result, the index is not considered here and note also that the final result set is NOT in CODE order (the data is returned in 1,2,0 CODE order).

We could try to force the use of an INDEX SKIP SCAN via a hint:

SQL> select /*+ index_ss (bowie, bowie_i) */ code, min(sales) from bowie group by code;

CODE MIN(SALES)
---------- ----------
0 1
1 1
2 1

Execution Plan
----------------------------------------------------------
Plan hash value: 944722262

--------------------------------------------------------------------------------
| Id | Operation            | Name    | Rows  | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT     |         |     3 |    21 |    2558 (4) | 00:00:01 |
| 1  | SORT GROUP BY NOSORT |         |     3 |    21 |    2558 (4) | 00:00:01 |
| 2  | INDEX SKIP SCAN      | BOWIE_I | 1000K | 6835K |    2558 (4) | 00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
2469 consistent gets
   0 physical reads
   0 redo size
 677 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   3 rows processed

It’s now using an INDEX SKIP SCAN in the plan. But it’s not being “clever” in its use of the INDEX SKIP SCAN with 2469 consistent gets suggesting Oracle is not effectively making use of the (MIN/MAX) scan capability per access of distinct CODE and is unnecessarily reading most of the index leaf blocks.

Oracle however is now using a sort/group based process when performing it’s aggregation (as evidenced in the second step of the plan), resulting in the data now being returned in CODE order.

Even if we take the aggregation out of the equation with a simple MIN based query accessing more than one CODE value:

SQL> select min(sales) from bowie where code in (1,2);

Execution Plan
----------------------------------------------------------
Plan hash value: 1467404054

------------------------------------------------------------------------------
| Id | Operation        | Name    | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------
| 0  | SELECT STATEMENT |         |    1 |     7 |       3 (0) |  00:00:01 |
| 1  | SORT AGGREGATE   |         |    1 |     7 |             |           |
| 2  | INLIST ITERATOR  |         |      |       |             |           |
|* 3 | INDEX RANGE SCAN | BOWIE_I |    1 |     7 |       3 (0) |  00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=1 OR "CODE"=2)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
1676 consistent gets
   0 physical reads
   0 redo size
 544 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The CBO still doesn’t consider INDEX SKIP SCAN type of processing in this scenario and is using a relatively inefficient INDEX RANGE SCAN instead.

So unfortunately, although I have an existing index in place that if only used effectively could potentially return the GROUP BY result set very efficiently, the CBO is not using the index. The CBO doesn’t appear to be able use an INDEX SKIP SCAN in combination with multiple MIN/MAX scans in scenarios when it perhaps could.

Comments»

1. Jonathan Lewis - January 30, 2018

Richard – there’s an example of this type of thing on my blog with the title “index bouncy scan”: , but there are a couple of links in the comments https://jonathanlewis.wordpress.com/2017/02/09/index-bouncy-scan/#comments to a couple of nice solutions that preceded my blog:

One from Sayan Malakshinov: http://orasql.org/2012/09/21/distinct-values-by-index-topn/

One from Markus Winand: https://wiki.postgresql.org/wiki/Loose_indexscan

Liked by 1 person

Richard Foote - January 31, 2018

Thanks for the links Jonathan.Yes, I’ve was looking at other techniques to get the same result more efficiently, similar to how one gets both min/max values efficiently within the one SQL. These links show some nice methods and skills that will likely still be needed in the future autonomous database world 🙂

Like

2. Oren Nakdimon (@DBoriented) - January 31, 2018

Hi Richard.
Many times in such scenarios there is also a foreign key from BOWIE.CODE to a list-of-values table (say CODES):


create table codes (code number constraint codes_pk primary key, description varchar2(100));
insert into codes select rownum-1,'code '||(rownum-1) from dual connect by level<=4;
alter table bowie add foreign key (code) references codes (code);

If this is the case, mimicking an INDEX SKIP SCAN while taking advantage of MIN/MAX SCAN becomes simple:


select * from (
select c.code,
(select min(sales) from bowie b where b.code = c.code) sales
from codes c)
where sales is not null;

CODE SALES
---------- ----------
0 1
1 1
2 1

3 rows selected.

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 104 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | FIRST ROW | | 1 | 7 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| BOWIE_I | 1 | 7 | 3 (0)| 00:00:01 |
|* 4 | VIEW | | 4 | 104 | 2 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN | CODES_PK | 4 | 52 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("B"."CODE"=:B1)
4 - filter("SALES" IS NOT NULL)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
15 consistent gets
0 physical reads
0 redo size
656 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed

Thanks,
Oren.

Liked by 1 person

Oren Nakdimon (@DBoriented) - January 31, 2018

In 12c we may need to add a hint to prevent scalar subquery unnesting:

select * from (
  select c.code,
         (select /*+ no_unnest */ min(sales) from bowie b where b.code = c.code) sales
  from codes c)
where sales is not null;

Liked by 1 person

Richard Foote - February 1, 2018

Hi Oren

Thanks for sharing your examples, great stuff. Much appreciated.

Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: