12.1.0.2 Introduction to Zone Maps Part I (Map Of The Problematique) September 3, 2014
Posted by Richard Foote in Uncategorized.trackback
Zone Maps are new index-like structures that enables the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table. A zone is simply a range of contiguous blocks within a table. Zone Maps are similar in concept to Exadata storage indexes but are physical objects that can be explicitly controlled and maintained. Additionally, in a similar manner to Bitmap-Join indexes, they can also be defined on a table with an outer join to other tables, with the Zone Map attributes based on columns from these joined tables. They can also be created as part of the new Attribute Clustering capabilities I introduced in my last post.
Before getting too excited though, Zone Maps unfortunately requires the Partitioning Option AND either Exadata or SuperCluster. If you’re still interested, read on 🙂
I’ll start by creating and populating my BIG_BOWIE table. For this specific post, the key thing to note here is that the ALBUM_ID column is very well clustered.
SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null, format_id number, release_date date, total_sales number, description varchar2(100)); Table created. SQL> create sequence bowie_seq order; Sequence created. SQL> create or replace procedure pop_big_bowie as begin for v_album_id in 1..100 loop for v_artist_id in 1..100000 loop insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2, trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)), 'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'); end loop; commit; end loop; commit; end; / Procedure created. SQL> exec pop_big_bowie PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');
I’ll next create a standard b-tree index on the ALBUM_ID column:
SQL> create index dwh_bowie_album_id_i on dwh_bowie(album_id); Index created.
If I run the following query looking for just specific value of ALBUM_ID (1% of the data):
SQL> select * from big_bowie where album_id=42; 100000 rows selected. Elapsed: 00:00:00.29 Execution Plan ---------------------------------------------------------- Plan hash value: 1830705794 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 100K| 8984K| 1554 (1)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BIG_BOWIE | 100K| 8984K| 1554 (1)| 00:00:01 | |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 100K| | 201 (2)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 1590 consistent gets 0 physical reads 0 redo size 9689464 bytes sent via SQL*Net to client 760 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
The CBO used the index because of the excellent clustering of the data and performed a relatively low 1590 consistent gets (note arraysize was set to 5000).
If I make the index invisible and re-run the query a number of times:
SQL> alter index big_bowie_album_id_i invisible; Index altered. SQL> select * from big_bowie where album_id=42; 100000 rows selected. Elapsed: 00:00:00.29 Execution Plan ---------------------------------------------------------- Plan hash value: 469213804 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 8984K| 3269 (12)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 8984K| 3269 (12)| 00:00:01 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ALBUM_ID"=42) filter("ALBUM_ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 135085 consistent gets 135068 physical reads 0 redo size 3130019 bytes sent via SQL*Net to client 760 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
We notice execution times are almost identical despite the Full Table Scan (FTS), as Exadata Storage Indexes (SI) have kicked in. With the data being so well clustered, the SI has been able to very effectively “prune” the physical blocks that need to be accessed. This is clearly highlighted by looking at the sessions statistics of a session that performed one of these queries:
SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index'); NAME MB ---------------------------------------------------------------- ---------- cell physical IO bytes saved by storage index 1038.32813 cell physical IO interconnect bytes returned by smart scan 9.56008911
The vast majority of the table has actually been skipped as a result of the storage indexes.
I’m now going to create a very basic version of a Zone Map, one based just on this ALBUM_ID column:
SQL> create materialized zonemap big_bowie_album_id_zm on big_bowie(album_id); Materialized zonemap created.
Zone Maps are implemented in a similar manner to Materialized Views and are separate table structures that can be refreshed in a similar manner to how Materialized Views are refreshed. This is an important point and a key difference between Zone Maps and conventional indexes in that depending on the refresh property of the Zone Map, may not be updated during DML operations. The default refresh property is REFRESH ON LOAD DATA MOVEMENT, which means the Zone Map in part becomes “stale” after DMLs.
Zone Maps basically store the min and max of the corresponding column(s) within each “zone” of the table. If we have a look at the contents of the Zone Map (below listing in part):
SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$ from big_bowie_album_id_zm; ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$ --------------- -------------- -------------- ---------- 385855025152 1 2 66234 385855029250 5 6 56715 385855029251 7 7 76562 385855025155 7 8 76632 385855004675 8 9 76633 385855025161 21 22 75615 385855004684 29 29 75582 385855004685 31 32 75545 385855004687 35 36 75617 385855029267 43 44 75615 385855029270 50 50 75481 385855029275 61 62 75616 385855025179 62 63 75530 385855029284 81 82 75615 385855029285 84 84 75480 385855004710 87 88 75616 385855004711 90 91 75484 385855029293 100 100 75799 385855029254 13 14 75615 385855029255 16 16 75481 385855004681 22 22 75480 385855004682 24 25 75616 ... 385855025184 73 74 75615 385855004705 76 77 75615 385855029283 79 80 75615 385855029286 86 87 75616 385855029287 88 89 75618 385855004714 97 97 75771 385855029295 100 100 15871 134 rows selected.
A few things to note here. Firstly, the amount of data stored for the Zone Map is tiny. Each zone corresponds to approx. 8M of storage or in this case roughly 75,500 rows. This is another key difference between Zone Maps and conventional B-Tree indexes. There’s an index entry per referenced row of the table whereas a Zone Map only has one entry per zone of the table. Note also, because the ALBUM_ID data is so well clustered, each zone has a very narrow min/max range of ALBUM_ID values. This means the Zone Map can be extremely effective in “pruning” storage accesses as Oracle can very easily eliminate the majority zones from containing data of interest.
To highlight just how tiny this Zone Map is:
SQL> select segment_name, segment_type, bytes from dba_segments where segment_name like 'BIG_BOWIE%'; SEGMENT_NAME SEGMENT_TYPE BLOCKS BYTES ------------------------- ------------ ---------- ---------- BIG_BOWIE TABLE 139264 1140850688 BIG_BOWIE_ALBUM_ID_I INDEX 20480 167772160 BIG_BOWIE_ALBUM_ID_ZM TABLE 8 65536
We only need one 64K extent for the Zone Map whereas the Table/Index are much larger. If fact, the 134 rows can actually all fit in just the one 8K data block !!
Two new data dictionary views provides details on our Zone Maps:
SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM'; ZONEMAP_NAME PRUNING REFRESH_MODE INVALID STALE UNUSABLE ------------------------- -------- ----------------- ------- ------- -------- BIG_BOWIE_ALBUM_ID_ZM ENABLED LOAD DATAMOVEMENT NO NO NO
So the current status of the Zone Map can be determined in DBA_ZONEMAPS.
SQL> select measure, position_in_select, agg_function, agg_column_name from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM'; MEASURE POSITION_IN_SELECT AGG_FUNCTION AGG_COLUMN_NAME ------------------------------ ------------------ ------------- --------------- "BOWIE"."BIG_BOWIE"."ALBUM_ID" 3 MAX MAX_1_ALBUM_ID "BOWIE"."BIG_BOWIE"."ALBUM_ID" 2 MIN MIN_1_ALBUM_ID
Additionally, details of the columns within the Zone Map can be determined in DBA_ZONEMAP_MEASURES.
If we now re-run the same query on the BIG_BOWIE table:
SQL> select * from big_bowie where album_id=42; 100000 rows selected. Elapsed: 00:00:00.28 Execution Plan ---------------------------------------------------------- Plan hash value: 1980960934 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 8984K| 3269 (12)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE | 100K| 8984K| 3269 (12)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ALBUM_ID"=42) filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42) Statistics ---------------------------------------------------------- 141 recursive calls 0 db block gets 2364 consistent gets 0 physical reads 0 redo size 3130019 bytes sent via SQL*Net to client 760 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
We notice the CBO has used the Zone Map during the FTS (see the TABLE ACCESS STORAGE FULL WITH ZONEMAP execution step and filter predicate information). It’s performing some additional consistent gets over the database index (2364 vs. 1590) but it’s not too bad, nowhere near the 139,000 odd blocks in the table, with performance comparable to before. The extra consistent gets are necessary as any possible zones need to be read in their entirety, whereas a B-Tree index can point to just the specific table blocks that contain data of interest.
So the Zone Map has provided a “good enough” outcome here. Meanwhile, it has required very minimal storage, we can control how the Zone Map is maintained (to be discussed later) and we can guarantee that the Zone Map object will always exist and be present, even after bouncing either the database or Exadata storage servers.
In Part II, I’ll look at what happens when I perform some DML on the base table.
Hey Richard,
Nice post. Zone Maps are cool and well named I think (vs. storage indexes). I’m thinking they should be even more effective than Exa storage indexes because the i/o requests are cut off at the compute layer before ever being sent to the storage layer, but I haven’t had a chance to test that yet. Also min/max pruning seems to be an idea that has really taken root as it shows up in the new in-memory option as well. Anyway, thanks for sharing and keep the posts coming.
Kerry
LikeLike
Thanks Kerry 🙂 A real advantage with Zone Maps over storage indexes I think is that they can be use in a similar manner to bitmap-join indexes in that they can be based on columns from other joined tables. I’ll cover this once I get access again to an Exadata box 🙂
LikeLike
Hi Richard,
each zone will be exactly 8MB in size with the (possible) exception for the first and the last zones for the segment. The zone id is a direct calculation from a (subset of) rowid so the translation can work both ways. The zone size can be controlled by the scale parameter which tells the database how many blocks should be allocated per each zone. Default is 10 which results in 8MB chunks for 8KB database (2^10*8).
LikeLike
Thanks heaps for that Alex. This indeed confirm my observations.
LikeLike
Your post says “Before getting too excited though, Zone Maps unfortunately requires the Partitioning Option AND either Exadata or SuperCluster”; I haven’t been able find that this is Exadata/SC only. Can you please provide a reference? Thanks!
LikeLike
Hi Jeff
Looks like you found the reference. Here’s a link for any visitors:
http://docs.oracle.com/database/121/DBLIC/editions.htm#CJACGHEB
The decision on how all these things get licensed is way way above my humble pay-scale 🙂
LikeLike
Zone Maps are indeed Exadata (or engineered) only. However, there is no technical reason for this; the zone map is evaluated on the server side, not the cell side. Apparently, this appeared in a previous beta version of 12, but was only released as Exadata.
LikeLike
And here goes another nice feature. With such a licensing requirements, I doubt that zone maps will be used much. Oh well, maybe in the next version.
LikeLike
Hi Mladin
I think they’ll be very popular with Exadata storage environments where partitioning is pretty well standard and having full control of storage index behavior has been a common request from DBAs.
LikeLike
Hi Richard,
Interesting feature. How would the performance compare to old school compressed indexes (say for example in your demo on the same ALBUM_ID column)?
LikeLike
Hi Shakir
In the demo above, I actually started out with a normal B-Tree database index which required 1590 consistent gets compared to the 2364 consistent gets with the zone map for the same SQL. If I compressed the index, it would only have a minimum impact as the vast majority of these gets are related to fetching data from the table (only save perhaps 30-40 gets in this example).
If more efficient than a FTS, corresponding indexes will generally outperform zone maps as they can reference just the (say 8k) blocks of interest, rather than full 8M zones. A specific Zone Map can vary between being “good enough” to totally useless depending on the data distribution and can potentially make a cheaper FTS access path more efficient.
LikeLike