12.1.0.2 Introduction to Zone Maps Part III (Little By Little) November 24, 2014
Posted by Richard Foote in 12c, Attribute Clustering, Oracle Indexes, Zone Maps.1 comment so far
I’ve previously discussed the new Zone Map database feature and how they work in a similar manner to Exadata Storage indexes.
Just like Storage Indexes (and conventional indexes for that manner), they work best when the data is well clustered in relation to the Zone Map or index. By having the data in the table ordered in the same manner as the Zone Map, the ranges of the min/max values for each 8M “zone” in the table can be as narrow as possible, making them more likely to eliminate zone accesses.
On the other hand, if the data in the table is not well clustered, then the min/max ranges within the Zone Map can be extremely wide, making their effectiveness limited.
In my previous example on the ALBUM_ID column in my first article on this subject, the data was extremely well clustered and so the associated Zone Map was very effective. But what if the data is poorly clustered ?
To illustrate, I’m going to create a Zone Map based on the poorly clustered ARTIST_ID column, which has its values randomly distributed throughout the whole table:
SQL> create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id); create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id) * ERROR at line 1: ORA-31958: fact table "BOWIE"."BIG_BOWIE" already has a zonemap "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" on it
Another difference between an index and Zone Map is that there can only be the one Zone Map defined per table, but a Zone Map can include multiple columns. As I already have a Zone Map defined on just the ALBUM_ID column, I can’t just create another.
So I’ll drop the current Zone Map and create a new one based on both the ARTIST_ID and ALBUM_ID columns:
SQL> drop materialized zonemap big_bowie_album_id_zm; Materialized zonemap dropped. SQL> create materialized zonemap big_bowie_zm on big_bowie(album_id, artist_id); Materialized zonemap created. SQL> select measure, position_in_select, agg_function, agg_column_name from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ZM'; MEASURE POSITION_IN_SELECT AGG_FUNCTION AGG_COLUMN_NAME -------------------- ------------------ ------------- -------------------- "BOWIE"."BIG_BOWIE". 5 MAX MAX_2_ARTIST_ID "ARTIST_ID" "BOWIE"."BIG_BOWIE". 4 MIN MIN_2_ARTIST_ID "ARTIST_ID" "BOWIE"."BIG_BOWIE". 3 MAX MAX_1_ALBUM_ID "ALBUM_ID" "BOWIE"."BIG_BOWIE". 2 MIN MIN_1_ALBUM_ID "ALBUM_ID"
So this new Zone Map has min/max details on each zone in the table for both the ARTIST_ID and ALBUM_ID columns.
The min/max ranges of a Zone Map provides an excellent visual representation of the clustering of the data. If I select Zone Map details of the ALBUM_ID column (see partial listing below):
SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$ from big_bowie_zm; ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$ ---------- -------------- -------------- ---------- 3.8586E+11 1 2 66234 3.8586E+11 5 6 56715 3.8586E+11 7 7 76562 3.8586E+11 7 8 76632 3.8586E+11 8 9 76633 3.8586E+11 21 22 75615 3.8586E+11 29 29 75582 3.8586E+11 31 32 75545 3.8586E+11 35 36 75617 3.8586E+11 43 44 75615 ... 3.8586E+11 76 77 75615 3.8586E+11 79 80 75615 3.8586E+11 86 87 75616 3.8586E+11 88 89 75618 3.8586E+11 97 97 75771 3.8586E+11 100 100 15871 134 rows selected.
As the data in the table is effectively ordered based on the ALBUM_ID column (and so is extremely well clustered in relation to this column), the min/max ranges for each zone is extremely narrow. Each zone basically only contains one or two different values of ALBUM_ID and so if I’m just after a specific ALBUM_ID value, the Zone Map is very effective in eliminating zones from having to be accessed. Just what we want.
However, if we look at the Zone Map details of the poorly clustered ARTIST_ID column (again just a partial listing):
SQL> select zone_id$, min_2_artist_id, max_2_artist_id, zone_rows$ from big_bowie_zm; ZONE_ID$ MIN_2_ARTIST_ID MAX_2_ARTIST_ID ZONE_ROWS$ ---------- --------------- --------------- ---------- 3.8586E+11 3661 98244 66234 3.8586E+11 1 100000 56715 3.8586E+11 5273 81834 76562 3.8586E+11 1 100000 76632 3.8586E+11 1 100000 76633 3.8586E+11 1 100000 75615 3.8586E+11 2383 77964 75582 3.8586E+11 1 100000 75545 3.8586E+11 1 100000 75617 3.8586E+11 1 100000 75615 ... 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75616 3.8586E+11 1 100000 75618 3.8586E+11 4848 80618 75771 3.8586E+11 84130 100000 15871 134 rows selected.
We notice the ranges for most of the zones is extremely large, with many actually having a min value of 1 (the actual minimum) and a max of 100000 (the actual maximum). This is a worst case scenario as a specific required value could potentially reside in most of the zones, thereby forcing Oracle to visit most zones and making the Zone Map totally ineffective.
If we run a query searching for a specific ARTIST_ID:
SQL> select * from big_bowie where artist_id=42; 100 rows selected. Elapsed: 00:00:00.69 Execution Plan ---------------------------------------------------------- Plan hash value: 1980960934 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99 | 9108 | 3291 (13)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE | 99 | 9108 | 3291 (13)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ARTIST_ID"=42) filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ARTIST_ID" > :1 OR zm."MAX_2_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ZM" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ARTIST_ID"=42) Statistics ---------------------------------------------------------- 141 recursive calls 0 db block gets 101614 consistent gets 0 physical reads 0 redo size 5190 bytes sent via SQL*Net to client 618 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
We notice we are forced to perform a very high number of consistent gets (101,614) when returning just 100 rows, much higher than the 2,364 consistent gets required to return a full 100,000 rows for a specific ALBUM_ID and not far from the 135,085 consistent gets when performing a full table scan.
We need to improve the performance of these queries based on the ARTIST_ID column …
Let’s drop this zone map:
SQL> drop materialized zonemap big_bowie_zm; Materialized zonemap dropped.
and change the physical clustering of the data in the table so that the data is primarily now clustered in ARTIST_ID order:
SQL> alter table big_bowie add clustering by linear order(artist_id, album_id) with materialized zonemap; Table altered.
So we have added a clustering attribute to this table (previously discussed here) and based a new Zone Map on this clustering at the same time.
SQL> select zonemap_name from dba_zonemaps where fact_table='BIG_BOWIE'; ZONEMAP_NAME --------------- ZMAP$_BIG_BOWIE SQL> select zonemap_name, pruning, with_clustering, invalid, stale, unusable from dba_zonemaps where zonemap_name = 'ZMAP$_BIG_BOWIE'; ZONEMAP_NAME PRUNING WITH_CLUSTERING INVALID STALE UNUSABLE --------------- -------- --------------- ------- ------- -------- ZMAP$_BIG_BOWIE ENABLED YES NO NO NO
However, as we haven’t actually reorganized the table, the rows in the table are still clustered the same as before:
SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie; ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$ ---------- -------------- -------------- ---------- 3.8586E+11 43 44 75615 3.8586E+11 1 2 66234 3.8586E+11 81 82 75615 3.8586E+11 29 29 75582 3.8586E+11 50 50 75481 3.8586E+11 90 91 75484 3.8586E+11 5 6 56715 3.8586E+11 7 8 76632 3.8586E+11 8 9 76633 3.8586E+11 16 16 75481 ... 3.8586E+11 44 44 75480 3.8586E+11 82 83 75616 3.8586E+11 100 100 15871 3.8586E+11 34 35 75576 3.8586E+11 14 15 75615 3.8586E+11 33 34 75616 3.8586E+11 3 5 75707 134 rows selected. SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie; ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$ ---------- --------------- --------------- ---------- 3.8586E+11 1 100000 75545 3.8586E+11 1 100000 75616 3.8586E+11 1 100000 75617 3.8586E+11 1 100000 75911 3.8586E+11 1 100000 75616 3.8586E+11 1 100000 75616 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75616 3.8586E+11 132 75743 75612 3.8586E+11 1 100000 75615 ... 3.8586E+11 1 100000 66296 3.8586E+11 1 100000 75615 3.8586E+11 2360 96960 75701 3.8586E+11 1 100000 75615 3.8586E+11 1 100000 75616 3.8586E+11 23432 98911 75480 3.8586E+11 1 100000 75791 3.8586E+11 21104 96583 75480 134 rows selected.
But if we now reorganise the table so that the clustering attribute can take effect:
SQL> alter table big_bowie move; Table altered.
We notice the characteristics of the Zone Map has change dramatically. The previously well clustered ALBUM_ID now has a totally ineffective Zone Map with all the ranges effectively consisting of the full min/max values:
SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie; ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$ ---------- -------------- -------------- ---------- 3.9704E+11 1 142 21185 3.9704E+11 1 100 9452 3.9704E+11 1 100 76516 3.9704E+11 1 100 75501 3.9704E+11 1 100 75497 3.9704E+11 1 100 75501 3.9704E+11 1 100 75499 3.9704E+11 1 100 75504 3.9704E+11 1 100 75500 3.9704E+11 1 100 75501 ... 3.9704E+11 1 100 75503 3.9704E+11 1 100 75498 3.9704E+11 1 100 75501 3.9704E+11 1 100 75501 3.9704E+11 1 100 75501 3.9704E+11 1 100 75501 3.9704E+11 1 100 75794 144 rows selected.
While the previously ineffective Zone Map on the ARTIST_ID column is now much more effective with significantly smaller min/max ranges for each zone:
SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie; ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$ ---------- --------------- --------------- ---------- 3.9704E+11 67 1036 21185 3.9704E+11 2359 2453 9452 3.9704E+11 8341 9106 76516 3.9704E+11 18933 19688 75501 3.9704E+11 22708 23463 75497 3.9704E+11 26483 27238 75501 3.9704E+11 27238 27993 75499 3.9704E+11 33278 34033 75504 3.9704E+11 36674 40449 75500 3.9704E+11 38563 39318 75501 ... 3.9704E+11 49888 50643 75503 3.9704E+11 62723 63478 75498 3.9704E+11 77824 78579 75501 3.9704E+11 82354 83109 75501 3.9704E+11 88394 89149 75501 3.9704E+11 93679 94434 75501 3.9704E+11 98211 98969 75794 144 rows selected.
The same query now runs so much faster as the Zone Map can eliminate almost all zones from being accessed:
SQL> select * from big_bowie where artist_id=42; 100 rows selected. Elapsed: 00:00:00.02 Execution Plan ---------------------------------------------------------- Plan hash value: 1980960934 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 99 | 9108 | 3291 (13)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE | 99 | 9108 | 3291 (13)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ARTIST_ID"=42) filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ARTIST_ID" > :1 OR zm."MAX_1_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ARTIST_ID"=42) Statistics ---------------------------------------------------------- 187 recursive calls 0 db block gets 175 consistent gets 0 physical reads 0 redo size 5190 bytes sent via SQL*Net to client 618 bytes received via SQL*Net from client 8 SQL*Net roundtrips to/from client 9 sorts (memory) 0 sorts (disk) 100 rows processed
Consistent gets has reduced dramatically down to just 175 from the previously massive 101,614.
As is common with changing the clustering of data, what improves one thing makes something else significantly worse. The previously efficient accesses based on the ALBUM_ID column is now nowhere near as efficient as before:
SQL> select * from big_bowie where album_id = 42; 100000 rows selected. Elapsed: 00:00:01.27 Execution Plan ---------------------------------------------------------- Plan hash value: 1980960934 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 8984K| 3269 (12)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE | 100K| 8984K| 3269 (12)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ALBUM_ID"=42) filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ALBUM_ID" > :1 OR zm."MAX_2_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42) Statistics ---------------------------------------------------------- 187 recursive calls 0 db block gets 141568 consistent gets 0 physical reads 0 redo size 4399566 bytes sent via SQL*Net to client 73878 bytes received via SQL*Net from client 6668 SQL*Net roundtrips to/from client 9 sorts (memory) 0 sorts (disk) 100000 rows processed
We now have to perform a whopping 141,568 consistent gets up from the previous 2,364 consistent gets.
So Zone Maps, like database indexes and Exadata Storage Indexes, can be extremely beneficial in reducing I/O but their effectiveness is very much dependant on the clustering of the underlining data.