jump to navigation

12c Partial Indexes For Partitioned Tables Part I (Ignoreland) July 8, 2013

Posted by Richard Foote in 12c, Oracle Indexes, Partial Indexes, Partitioning.
trackback

In my opinion, one of the unsung “Heroes” of the new 12c Oracle database are improvements associated with Partitioning. There are lots of really nice capabilities introduced in the 12c database that significantly improves the manageability of partitioned environments.

One of those new capabilities is the ability to now create both local and (importantly) global  indexes on only a subset of partitions within a partitioned table. This provides us with the flexibility to say only create partitions with data that would make sense to index, to not index current partitions where perhaps data insert performance is paramount, etc. Additionally and just as importantly, the CBO is aware of the indexing characteristics of individual partitions and can access partitions in differing manners accordingly.

To illustrate, a simple little demo as usual 🙂 Firstly, I’ll create a partitioned table with the new INDEXING clause:

SQL> create table pink_floyd (id number, status varchar2(6), name varchar2(30))
INDEXING OFF
partition by range (id)
(partition pf1 values less than (1000001),
partition pf2 values less than (2000001) INDEXING OFF,
partition pf3 values less than (maxvalue) INDEXING ON);
Table created.

The INDEXING clause determines whether or not the partition is to be indexed. It can be set at the table level and so set the default behaviour for the table or at the individual partition/subpartition level.

In the above example, I’ve set INDEXING OFF at the table level and so indexing by default is not be enabled for the table partitions. Therefore the PF1 partition is not indexed by default. The PF2 partition is explicitly set to also not be indexed but the PF3 index is explicitly set (INDEXING ON) to enable indexing and so override the table level default.

Let’s now populate the table with some basic data:

SQL> insert into pink_floyd select rownum, 'CLOSED', 'DAVID BOWIE' from dual connect by level <= 3000000;

3000000 rows created.

SQL> commit;

Commit complete.

SQL> update pink_floyd set status = 'OPEN' where id > 2000000 and mod(id,10000)=0;

100 rows updated.

SQL> commit;

Commit complete.

Most of the data has a STATUS column value of ‘CLOSED’ but I’ve updated a few rows within just the last partition with a STATUS set to ‘OPEN’.

Let’s now create an index on this STATUS column and collect table statistics:

SQL> create index pink_floyd_status_i on pink_floyd(status);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'PINK_FLOYD', estimate_percent=>null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS SIZE 5');

PL/SQL procedure successfully completed.

SQL> select index_name, num_rows, leaf_blocks, indexing from dba_indexes where index_name = 'PINK_FLOYD_STATUS_I';

INDEX_NAME             NUM_ROWS LEAF_BLOCKS INDEXIN
-------------------- ---------- ----------- -------
PINK_FLOYD_STATUS_I     3000000        9203 FULL

By default, an index will include all partitions in a table, regardless of the INDEXING table clause setting. So this index covers all 3M rows in the table and currently has 9203 leaf blocks. The new INDEXING column in DBA_INDEXES shows us that this index is a FULL (non-Partial) index.

We can of course get the data of interest (STATUS = ‘OPEN’) via this index now:

SQL> select * from pink_floyd where status = 'OPEN';

100 rows selected.

Execution Plan
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                     |   964 | 24100 |     4   (0)| 00:00:01 |       |       |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PINK_FLOYD          |   964 | 24100 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  2 |   INDEX RANGE SCAN                         | PINK_FLOYD_STATUS_I |   100 |       |     3   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STATUS"='OPEN')

Statistics
----------------------------------------------------------

0  recursive calls

0  db block gets

110  consistent gets

0  physical reads

0  redo size

4479  bytes sent via SQL*Net to client

609  bytes received via SQL*Net from client

8  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

100  rows processed

However, we can potentially also run a query based on just the last partition as all the ‘OPEN’ statuses of interest only reside in this last partition:

SQL> select * from pink_floyd where status = 'OPEN' and id > 2000001;

100 rows selected.

Execution Plan
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                     |    99 |  2475 |     4   (0)| 00:00:01 |       |       |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PINK_FLOYD          |    99 |  2475 |     4   (0)| 00:00:01 |     3 |     3 |
|*  2 |   INDEX RANGE SCAN                         | PINK_FLOYD_STATUS_I |   100 |       |     3   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID">2000001)
2 - access("STATUS"='OPEN')

Statistics
----------------------------------------------------------

0  recursive calls

0  db block gets

110  consistent gets

0  physical reads

0  redo size

2787  bytes sent via SQL*Net to client

609  bytes received via SQL*Net from client

8  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

100  rows processed

Currently however, the index includes data from all table partitions, even though we’re only really interested in using the index to retrieve the less common ‘OPEN’ status that resides in only the last table partition. With 12c, there is now the capability to only index those partitions that are of interest to us, which with proper design can also be implemented such that only those column values of interest are included within an index.

I’m going to drop and create the index as a “Partial” Index:

SQL> drop index pink_floyd_status_i;

Index dropped.

SQL> create index pink_floyd_status_i on pink_floyd(status) indexing partial;

Index created.

SQL> select index_name, num_rows, leaf_blocks, indexing from dba_indexes where index_name = 'PINK_FLOYD_STATUS_I';

INDEX_NAME             NUM_ROWS LEAF_BLOCKS INDEXIN
-------------------- ---------- ----------- -------
PINK_FLOYD_STATUS_I     1000000        3068 PARTIAL

The new INDEXING PARTIAL clause means only those table partitions with INDEXING ON are to be included within the index.

Notice how the index, which is a Global, Non-Partitioned Index, now only has 1M entries (not all 3M as previously) and with 3068 leaf blocks is only 1/3 of what it was previously. The INDEXING column now denotes this as a “Partial” index.

If we run the query again that only explicitly references the last “active” table partition:

SQL> select * from pink_floyd where status = 'OPEN' and id >= 2000001;

100 rows selected.

Execution Plan
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                     |   100 |  2500 |     4   (0)| 00:00:01 |       |       |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PINK_FLOYD          |   100 |  2500 |     4   (0)| 00:00:01 |     3 |     3 |
|*  2 |   INDEX RANGE SCAN                         | PINK_FLOYD_STATUS_I |    33 |       |     3   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID">=2000001)
2 - access("STATUS"='OPEN')

Statistics
----------------------------------------------------------

0  recursive calls

0  db block gets

110  consistent gets

0  physical reads

0  redo size

2787  bytes sent via SQL*Net to client

609  bytes received via SQL*Net from client

8  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

100  rows processed

We see that the index is used as it was previously. By stating with the ID > 2000001 predicate we’re only interested in data that can only reside in the last table partition, the partition with INDEXING ON, the CBO knows the index can be used to retrieve all the rows of interest. If we know the application will only extract data in this manner, all is well with our smaller, Partial index.

However, if it’s possible within the application to perhaps search for STATUS values from other partitions, that have INDEXING OFF:

SQL> select * from pink_floyd where status = 'OPEN';

100 rows selected.

Execution Plan

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                     |   100 |  2500 |  2474   (1)| 00:00:01 |       |
|   1 |  VIEW                                        | VW_TE_2             |    99 |  3465 |  2474   (1)| 00:00:01 |       |
|   2 |   UNION-ALL                                  |                     |       |       |            |          |       |
|*  3 |    TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PINK_FLOYD          |    33 |   825 |     4   (0)| 00:00:01 | ROWID | ROWID
|*  4 |     INDEX RANGE SCAN                         | PINK_FLOYD_STATUS_I |   100 |       |     3   (0)| 00:00:01 |       |
|   5 |    PARTITION RANGE ITERATOR                  |                     |    66 |  1650 |  2470   (1)| 00:00:01 |     1 |     2
|*  6 |     TABLE ACCESS FULL                        | PINK_FLOYD          |    66 |  1650 |  2470   (1)| 00:00:01 |     1 |     2
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("PINK_FLOYD"."ID">=2000001 OR "PINK_FLOYD"."ID" IS NULL)
4 - access("STATUS"='OPEN')
6 - filter("STATUS"='OPEN')

Statistics
----------------------------------------------------------

0  recursive calls

0  db block gets

16341  consistent gets

8204  physical reads

0  redo size

2787  bytes sent via SQL*Net to client

609  bytes received via SQL*Net from client

8  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

100  rows processed

We notice the index is still used to efficiently find those rows of interest from the last partition, but a Full Table (Partition) Scan is performed to search for data from the other two partitions, for which with INDEXING OFF means the index does not contain entries that reference these partitions. As a result, this query is now much more expensive than it was previously as the index can not be used to exclusively find the rows of interest. The CBO within the one execution plan uses the index where it can and a full scan of the other partitions where it can’t use the index.

If however we were a little cleverer in how we designed our table and also created table subpartitions based on the STATUS column so that rows of interest resided in their own subpartitions and only set INDEXING ON for these subpartitions:

SQL> create table pink_floyd (id number, status varchar2(6), name varchar2(30))
indexing off
partition by range (id) subpartition by list(status)
subpartition template
(subpartition closed values ('CLOSED') indexing off, subpartition open values ('OPEN') indexing on)
(partition pf1 values less than (1000001),
partition pf2 values less than (2000001),
partition pf3 values less than (maxvalue))
enable row movement;

Table created.

Notice how only the subpartitions with a STATUS of ‘OPEN’ are now to be indexed. If we populate the table with the exact same data as before, we find the table and partial index have the follow characteristics:

SQL> select subpartition_position, subpartition_name, num_rows, indexing from dba_tab_subpartitions where table_name = 'PINK_FLOYD';

SUBPARTITION_POSITION SUBPARTITION_NAME      NUM_ROWS IND
--------------------- -------------------- ---------- ---
                    1 PF1_CLOSED              1000000 OFF
                    2 PF1_OPEN                      0 ON
                    1 PF2_CLOSED              1000000 OFF
                    2 PF2_OPEN                      0 ON
                    1 PF3_CLOSED               999900 OFF
                    2 PF3_OPEN                    100 ON

SQL> select index_name, num_rows, leaf_blocks, indexing from dba_indexes where index_name = 'PINK_FLOYD_STATUS_I';

INDEX_NAME             NUM_ROWS LEAF_BLOCKS INDEXIN
-------------------- ---------- ----------- -------
PINK_FLOYD_STATUS_I         100           1 PARTIAL

We can see that only the subpartitions with data of interest are now indexed. The resultant Partial global non-partitioned index is now tiny, with just the 100 index entries of interest residing in a single leaf block.

Just as importantly, a query searching for this data across the whole table is now extremely efficient and can be fully serviced by this tiny Partial index:

SQL> select * from pink_floyd where status = 'OPEN';

100 rows selected.

Execution Plan
----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                     |   100 |  2500 |     2   (0)| 00:00:01 |       |       |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PINK_FLOYD          |   100 |  2500 |     2   (0)| 00:00:01 | ROWID | ROWID |
|*  2 |   INDEX RANGE SCAN                         | PINK_FLOYD_STATUS_I |   100 |       |     1   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("STATUS"='OPEN')

Statistics
----------------------------------------------------------

0  recursive calls

0  db block gets

16  consistent gets

0  physical reads

0  redo size

4479  bytes sent via SQL*Net to client

609  bytes received via SQL*Net from client

8  SQL*Net roundtrips to/from client

0  sorts (memory)

0  sorts (disk)

100  rows processed

The query can now be fully serviced by the Partial index as all subpartitions that could contain data of interest are indexed and because the data of interest is all neatly clustered within the table subpartitions, can be retrieved with far fewer consistent gets than previously.

If we ever wanted to access those STATUS values of ‘CLOSED’, the CBO can only do so via a Full Table Scan as such values are not indexed. However, as these values represent the vast majority of rows in the table, the Full Table Scan would be the most appropriate and efficient manner to access these rows any-ways.

The new Partial Index capabilities introduced in Oracle 12c enables us to easily use global (and local) indexes to just index data of interest without having to change the application. Such a capability has many potential uses.

More on Partial Indexes to come soon in Part II.

Comments»

1. Matthias Rogel - July 8, 2013

If however we were a little cleverer in how we designed our table and also created table subpartitions based on the STATUS column so that rows of interest resided in their own subpartitions, we need no index at all, partition pruning will do the whole job for us ?

sokrates@12.1 > select * from pink_floyd where status = 'OPEN';
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 878218542

-----------------------------------------------------------------------------------------------------
| Id  | Operation	       | Name	    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |	    |	100 |  3500 |	 14   (0)| 00:00:01 |	    |	    |
|   1 |  PARTITION RANGE ALL   |	    |	100 |  3500 |	 14   (0)| 00:00:01 |	  1 |	  3 |
|   2 |   PARTITION LIST SINGLE|	    |	100 |  3500 |	 14   (0)| 00:00:01 |	  2 |	  2 |
|   3 |    TABLE ACCESS FULL   | PINK_FLOYD |	100 |  3500 |	 14   (0)| 00:00:01 |	KEY |	KEY |
-----------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Like

Richard Foote - July 8, 2013

Hi Matthias

Indeed, depending of the costs between a full partition scan and a partial index scan, the partition pruning could well be sufficient and cheaper. I just tried to keep my example as simplistic as possible to illustrate the point but in a more typrical scenario, where you might have a number of relatively uncommon values (‘OPENED’, ‘ACTIONED’, ‘INTRANSIT’, ‘AWAITING PUBLICATION’, ‘LOST’, ‘FINAL PAYMENT’, etc. etc.) vs. the most common ‘COMPLETE’, then a partial index scan on a specific status would make more sense, especially if I have to full scan multiple (sub)partitions.

Note that my partial index scan cost of 2 beats your FTS of 14 because of my environment setup 🙂

Like

2. Hans-Peter Sloot - July 25, 2013

Hello Richard,

Do you now what that :
“TABLE ACCESS BY INDEX ROWID BATCHED TABLE” means operation in the explain plan?

Regards Hans-Peter

Like

Richard Foote - July 26, 2013

Hi Hans-Peter

I recently answered this same question in another post.

The index rowid batched basically means Oracle can potentially access a number of rowids and sort them out into distinct blocks and so only access each distinct table block the once.

Like

3. marcos cubric - October 12, 2013

Hi,

How does this compare to making partitions of the index unusable?

Thanks
Marcos

Like

Richard Foote - November 14, 2013

Hi Marcos

Well, with regard to local indexes, this is pretty well what happens. Partitions and sub-partitions with indexing off have unusable local indexes.

But the real power and flexibility here is in relation to global indexes (both partitioned or not) in which they can contain just the data of interest from table partitions of interest.

Like

4. DB Oriented - January 5, 2014

[…] indexes: it is possible to create an index on a subset of partitions [posts by Richard Foote: part 1, part […]

Like

5. 12c partial indexes and list partitioning | Dmitry Nikiforov's Blog - January 13, 2014
6. Rainer Stenzel - October 22, 2015

Why is Oracle not using the more general concept of partial/filtered indexes, limiting it to the (one) partition key ?
https://en.wikipedia.org/wiki/Partial_index

Click to access ERL-M89-17.pdf

Like

Matthias Rogel - October 22, 2015

Hi Rainer,

it does ?

PostgreSQL
create index partial_status on txn_table (status)
where status in (‘A’, ‘P’, ‘W’);

translates into Oracle as
create index partial_status on txn_table (case when status status in (‘A’, ‘P’, ‘W’) then status end);

?

Like

Rainer Stenzel - October 22, 2015

But
then Oracle should also be smart enough to use such function based indexes even for queries not using the exactly matching predicate, i,e “where status=’P'”.
https://community.oracle.com/ideas/2624

Like

Richard Foote - November 2, 2015

Hi Rainer

I agree with you. It would be a nice enhancement if Oracle could provide partial indexing capabilities with non-partitioned tables or without the need for specific functioned-based indexes that need to be referenced within the application.

Oracle has no real idea on the output of a function which is why the function needs to be included within the application (although the trunc function is an exception and the CBO can do some clever things with a virtual column based on the trunc of a date for example).

Like

7. 12c partial indexes and list partitioning | Oracle, Java and old programmer in one boat (to say nothing of the dog…) - March 20, 2018
8. Index rebuild bug | Oracle Scratchpad - December 1, 2018

[…] before the introduction of partial indexing it was possible to emulate the feature manually for local indexes by setting partitions and […]

Like


Leave a comment