“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part I (The Jean Genie) October 4, 2018
Posted by Richard Foote in Global Indexes, Local Indexes, Non-Partitioned Indexes, Oracle Indexes, Partitioned Indexes, Partitioning.trackback
When it comes to indexing a partitioned table, many automatically opt for Local Indexes, as it’s often assumed they’re simply easier to manage and more efficient than a corresponding Global Index.
Having smaller index structures that are aligned to how the table is partitioned certainly has various advantages. The focus in this little series is on the “more efficient” claim.
A key point that many miss is that a Non-Partitioned Index on a Non-Partitioned table is not exactly the same beast as a Non-Partitioned Index on a Partitioned Table. The purpose of this initial post is to illustrate this difference.
Let’s begin by creating a Non-Partitioned table that has a number of years worth of data:
SQL> CREATE TABLE big_ziggy (id number, album_id number, country_id number, release_date date, total_sales number); Table created. SQL> INSERT INTO big_ziggy SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800), mod(rownum,200000)+1 FROM dual CONNECT BY LEVEL 2000000; 2000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_ZIGGY'); PL/SQL procedure successfully completed.
So we have a 2M row table with about 8 years worth of data (based on the RELEASE_DATE column) and a TOTAL_SALES column that has some 200,000 distinct columns throughout this period.
Let’s next create a standard Non-Partitioned index based on the TOTAL_SALES column:
SQL> create index big_ziggy_total_sales_i on big_ziggy(total_sales); Index created.
If we now run a query to access the 10 rows with a value equal to 42:
SQL> SELECT * FROM big_ziggy WHERE total_sales = 42; ID ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES ---------- ---------- ---------- --------- ----------- 1400041 42 42 24-AUG-18 42 400041 42 42 28-JAN-12 42 1000041 42 42 20-JUL-17 42 1800041 42 42 28-JAN-12 42 600041 42 42 15-JUN-16 42 800041 42 42 03-MAR-13 42 1200041 42 42 07-APR-14 42 1600041 42 42 12-MAY-15 42 200041 42 42 12-MAY-15 42 41 42 42 24-AUG-18 42 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1252095634 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 250 | 13 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BIG_ZIGGY | 10 | 250 | 13 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | BIG_ZIGGY_TOTAL_SALES_I | 10 | | 3 (0) | 00:00:01 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 14 consistent gets 0 physical reads 0 redo size 1184 bytes sent via SQL*Net to client 624 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
We notice we need 14 consistent gets to access these 10 rows, 4 gets for index block accesses and 10 gets to access the relevant rows from the table blocks (as we have a terrible clustering due to the relevant data being distributed throughout the table).
If we run a query where we’re only interested in accessing data only within a specific year:
SQL> SELECT * FROM big_ziggy WHERE total_sales = 42 and release_date between '01-JAN-2017' and '31-JUL-2017'; ID ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES ---------- ---------- ---------- --------- ----------- 1000041 42 42 20-JUL-17 42 Execution Plan ---------------------------------------------------------- Plan hash value: 1252095634 --------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | --------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 25 | 13 (0) | 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BIG_ZIGGY | 1 | 25 | 13 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | BIG_ZIGGY_TOTAL_SALES_I | 10 | | 3 (0) | 00:00:01 | --------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RELEASE_DATE">=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 14 consistent gets 0 physical reads 0 redo size 885 bytes sent via SQL*Net to client 624 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Even though we now only return the one row, notice we still have to perform the same 14 consistent gets. That’s because the RELEASE_DATE column is NOT part of the index, so we still need to fetch all 10 matching rows with TOTAL_SALES=42 and then filter out those that don’t have a RELEASE_DATE of interest. The note above in the predicate information shows we now have this additional filtering taking place.
Let’s run the same queries on a table with identical data, but this time on a table that is partitioned based on the RELEASE_DATE column, with a partition for each years worth of data:
SQL> CREATE TABLE big_bowie(id number, album_id number, country_id number, release_date date, total_sales number) 2 PARTITION BY RANGE (release_date) 3 (PARTITION ALBUMS_2011 VALUES LESS THAN (TO_DATE('01-JAN-2012', 'DD-MON-YYYY')), 4 PARTITION ALBUMS_2012 VALUES LESS THAN (TO_DATE('01-JAN-2013', 'DD-MON-YYYY')), 5 PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')), 6 PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')), 7 PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')), 8 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')), 9 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')), 10 PARTITION ALBUMS_2018 VALUES LESS THAN (MAXVALUE)); Table created. SQL> INSERT INTO big_bowie SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800), mod(rownum,200000)+1 FROM dual CONNECT BY LEVEL 2000000; 2000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE'); PL/SQL procedure successfully completed.
Again, we create a standard, Non-Partitioned Index:
SQL> create index big_bowie_total_sales_i on big_bowie(total_sales); Index created.
If we now run the equivalent of the first query:
SQL> SELECT * FROM big_bowie WHERE total_sales = 42; ID ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES ---------- ---------- ---------- --------- ----------- 400041 42 42 28-JAN-12 42 1800041 42 42 28-JAN-12 42 800041 42 42 03-MAR-13 42 1200041 42 42 07-APR-14 42 1600041 42 42 12-MAY-15 42 200041 42 42 12-MAY-15 42 600041 42 42 15-JUN-16 42 1000041 42 42 20-JUL-17 42 41 42 42 24-AUG-18 42 1400041 42 42 24-AUG-18 42 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1761527485 -------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart | Pstop | -------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 250 | 13 (0) | 00:00:01 | | | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE | 10 | 250 | 13 (0) | 00:00:01 | ROWID | ROWID | |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 10 | | 3 (0) | 00:00:01 | | | -------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 14 consistent gets 0 physical reads 0 redo size 1184 bytes sent via SQL*Net to client 624 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
We get the exact same performance, with the same 14 consistent gets necessary to access the 10 rows of interest.
If we now run the equivalent of the second query:
SQL> SELECT * FROM big_bowie WHERE total_sales = 42 and release_date between '01-JAN-2017' and '31-JUL-2017'; ID ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES ---------- ---------- ---------- --------- ----------- 1000041 42 42 20-JUL-17 42 Execution Plan ---------------------------------------------------------- Plan hash value: 1081241859 -------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | Pstart | Pstop | -------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 25 | 13 (0) | 00:00:01 | | | |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE | 1 | 25 | 13 (0) | 00:00:01 | 7 | 7 | |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 10 | | 3 (0) | 00:00:01 | | | -------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RELEASE_DATE"=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 885 bytes sent via SQL*Net to client 624 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We notice a key difference. Even though it’s equivalent to the same index as in the previous Non-Partitioned table and even though the index only contains just the TOTAL_SALES column, the number of consistent gets has dropped from 14 to just 5 consistent gets.
In this example, Oracle has clearly not had to fetch the rows from the table that do not match the RELEASE_DATE of interest. Even though the predicate information is listing the requirement for filtering to take place, this filtering has clearly been performed within the index, without having to actually fetch any of the rows that aren’t of interest.
The index is able to only access the row(s) of interest from the Partitioned Table…
This is the little “hidden efficiency” of Global Indexes on Partitioned Tables, which is what we effectively have here.
In Part II, I’ll discuss how Oracle does this additional filtering within the index and why understanding this is important in deciding which type of index to deploy, as from a “performance” perspective, Global Indexes are often the preferred option.
Is it because a global index on a non-partitioned table does include the partition column as part of the index so that the query didn’t need to fetch the data from the table?
LikeLike
Hi Yusuf
No, that’s not the explanation. Stay tuned 🙂
Cheers
Richard
LikeLike
[…] Part I of this series, I highlighted how a Non-Partitioned Global Index on a Partitioned Table is able to […]
LikeLike
[…] Part I and Part II of this series, we looked at how Global Indexes can effectively perform […]
LikeLike
[…] Part I, Part II and Part III we looked at some advantages of Global Indexes that may not be obvious to […]
LikeLike
Hi
Does this partition pruning on index level happen also with list partitioning by multiple columns and concerned partitions not being uniquely identified?
lh
LikeLike
Hi
As and answer to my previous question: it looks like that partition pruning does happen also with partitioning with multiple colums.
However:
create table t_demo_fact (
t_c1 varchar2(2 char),
t_c2 varchar2(10 char),
ref_c number(9)
);
create table t_demo_reference_table (
c_key number(9) not null,
t_status varchar2(10 char),
t_c3 varchar2(100)
)
partition by list(t_status) (
partition p_active values (‘ACTIVE’),
partition p_archived values (‘ARCHIVE’)
);
create unique index pk_t_demo_reference_table on t_demo_reference_table(c_key);
alter table t_demo_reference_table add constraint pk_t_demo_reference_table primary key (c_key);
select * from t_demo_fact where ref_c in (select c_key from t_demo_reference_table where t_status=’ACTIVE’);
Now execution plan is:
| 0 | SELECT STATEMENT | | | | 3 (100)| | | |
| 1 | NESTED LOOPS | | 1 | 76 | 3 (0)| 00:00:01 | | |
| 2 | NESTED LOOPS | | 1 | 76 | 3 (0)| 00:00:01 | | |
| 3 | TABLE ACCESS FULL | T_DEMO_FACT | 1 | 41 | 3 (0)| 00:00:01 | | |
|* 4 | INDEX UNIQUE SCAN | PK_T_DEMO_REFERENCE_TABLE | 1 | | 0 (0)| | | |
|* 5 | TABLE ACCESS BY GLOBAL INDEX ROWID| T_DEMO_REFERENCE_TABLE | 1 | 35 | 0 (0)| | 1 | 1 |
There is an unnecessary access to T_DEMO_REFERENCE_TABLE. In index and in partition definitions there is information, which keys are in partition containing values ‘ACTIVE’.
These find of structures, partitioning by status, product etc. colums have became more attractive with the ability to partition table by multiple columns.
lh
LikeLike