
12.1.0.2 Introduction to Zone Maps Part I (Map Of The Problematique) September 3, 2014

Posted by Richard Foote in Uncategorized.

Zone Maps are new index-like structures that enable the “pruning” of disk blocks during table accesses by storing the min and max values of selected columns for each “zone” of a table, where a zone is simply a range of contiguous blocks within the table. Zone Maps are similar in concept to Exadata storage indexes but are physical objects that can be explicitly controlled and maintained. Additionally, in a similar manner to Bitmap-Join indexes, they can also be defined on a table with an outer join to other tables, with the Zone Map attributes based on columns from these joined tables. They can also be created as part of the new Attribute Clustering capabilities I introduced in my last post.
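
To give a feel for the syntax (just a sketch based on my reading of the documentation, using hypothetical SALES and CUSTOMERS tables rather than anything created in this post), a join Zone Map and an Attribute Clustered table might look something along these lines:

-- Sketch only: a Zone Map with its min/max values sourced from a joined dimension table
create materialized zonemap sales_cust_state_zm as
select sys_op_zone_id(s.rowid), min(c.cust_state_province), max(c.cust_state_province)
from sales s left outer join customers c on (s.cust_id = c.cust_id)
group by sys_op_zone_id(s.rowid);

-- Sketch only: a table with Attribute Clustering that also maintains a Zone Map
create table sales2 (sale_id number, cust_id number, amount number)
clustering by linear order (cust_id)
with materialized zonemap;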

Before getting too excited though, Zone Maps unfortunately require the Partitioning Option AND either Exadata or SuperCluster. If you’re still interested, read on 🙂

I’ll start by creating and populating my BIG_BOWIE table. For this specific post, the key thing to note here is that the ALBUM_ID column is very well clustered.

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null, format_id number,
release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
  begin
    for v_album_id in 1..100 loop
        for v_artist_id in 1..100000 loop
            insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2,
            trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)),
            'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
        end loop;
          commit;
    end loop;
    commit;
end;
/

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

 

 

I’ll next create a standard b-tree index on the ALBUM_ID column:

SQL> create index big_bowie_album_id_i on big_bowie(album_id);

Index created.

If I run the following query looking for just a specific value of ALBUM_ID (1% of the data):

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1830705794

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|  8984K|  1554   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BIG_BOWIE            |   100K|  8984K|  1554   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BIG_BOWIE_ALBUM_ID_I |   100K|       |   201   (2)| 00:00:01 |
------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1590  consistent gets
          0  physical reads
          0  redo size
    9689464  bytes sent via SQL*Net to client
        760  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

The CBO used the index because of the excellent clustering of the data and performed a relatively low 1590 consistent gets (note arraysize was set to 5000).
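
For anyone wanting to reproduce similar listings, SQL*Plus settings along these lines should do the trick (it’s the arraysize of 5000 that matters for the consistent gets figure):

set timing on
set arraysize 5000
-- traceonly displays the plan and statistics without listing all 100,000 rows
set autotrace traceonly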

If I make the index invisible and re-run the query a number of times:

SQL> alter index big_bowie_album_id_i invisible;

Index altered.

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 469213804

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
---------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42)
       filter("ALBUM_ID"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     135085  consistent gets
     135068  physical reads
          0  redo size
    3130019  bytes sent via SQL*Net to client
        760  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

 

We notice execution times are almost identical despite the Full Table Scan (FTS), as Exadata Storage Indexes (SI) have kicked in. With the data being so well clustered, the SI has been able to very effectively “prune” the physical blocks that need to be accessed. This is clearly highlighted by looking at the session statistics of a session that performed one of these queries:

SQL> select name , value/1024/1024 MB
from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and
n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index'); 

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1038.32813
cell physical IO interconnect bytes returned by smart scan       9.56008911

The vast majority of the table has actually been skipped as a result of the storage indexes.

I’m now going to create a very basic version of a Zone Map, one based just on this ALBUM_ID column:

SQL> create materialized zonemap big_bowie_album_id_zm on big_bowie(album_id);

Materialized zonemap created.

Zone Maps are implemented in a similar manner to Materialized Views, being separate table structures that can be refreshed in a similar fashion to how Materialized Views are refreshed. This is an important point and a key difference between Zone Maps and conventional indexes in that, depending on its refresh property, a Zone Map may not be updated during DML operations. The default refresh property is REFRESH ON LOAD DATA MOVEMENT, which means the Zone Map can become partially “stale” after DML operations.
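
As a rough sketch of the syntax involved (illustrative only, since the Zone Map here already exists, and I’ll cover maintenance properly in a later post), the refresh behaviour can be nominated at creation time and a stale Zone Map can be rebuilt manually:

-- Sketch only: nominate the refresh behaviour when creating the Zone Map
create materialized zonemap big_bowie_album_id_zm
refresh on demand
on big_bowie(album_id);

-- Sketch only: manually rebuild a stale Zone Map
alter materialized zonemap big_bowie_album_id_zm rebuild;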

Zone Maps basically store the min and max values of the corresponding column(s) within each “zone” of the table. If we have a look at the contents of the Zone Map (the listing below is shown in part):

SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$
from big_bowie_album_id_zm;

       ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$
--------------- -------------- -------------- ----------
   385855025152              1              2      66234
   385855029250              5              6      56715
   385855029251              7              7      76562
   385855025155              7              8      76632
   385855004675              8              9      76633
   385855025161             21             22      75615
   385855004684             29             29      75582
   385855004685             31             32      75545
   385855004687             35             36      75617
   385855029267             43             44      75615
   385855029270             50             50      75481
   385855029275             61             62      75616
   385855025179             62             63      75530
   385855029284             81             82      75615
   385855029285             84             84      75480
   385855004710             87             88      75616
   385855004711             90             91      75484
   385855029293            100            100      75799
   385855029254             13             14      75615
   385855029255             16             16      75481
   385855004681             22             22      75480
   385855004682             24             25      75616


...

   385855025184             73             74      75615
   385855004705             76             77      75615
   385855029283             79             80      75615
   385855029286             86             87      75616
   385855029287             88             89      75618
   385855004714             97             97      75771
   385855029295            100            100      15871



134 rows selected.

A few things to note here. Firstly, the amount of data stored for the Zone Map is tiny. Each zone corresponds to approx. 8MB of storage, or in this case roughly 75,500 rows. This is another key difference between Zone Maps and conventional B-Tree indexes: there’s an index entry per referenced row of the table, whereas a Zone Map only has one entry per zone of the table. Note also, because the ALBUM_ID data is so well clustered, each zone has a very narrow min/max range of ALBUM_ID values. This means the Zone Map can be extremely effective in “pruning” storage accesses, as Oracle can very easily eliminate the majority of zones as not containing data of interest.
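
In fact, we can use the Zone Map itself to get a feel for how effective this pruning is likely to be for a given value, simply by counting how many zones could possibly contain it (an illustrative query, output not shown):

-- How many of the 134 zones could possibly contain ALBUM_ID = 42 ?
select count(*) candidate_zones
from big_bowie_album_id_zm
where 42 between min_1_album_id and max_1_album_id;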

To highlight just how tiny this Zone Map is:

SQL> select segment_name, segment_type, blocks, bytes from dba_segments where segment_name like 'BIG_BOWIE%';

SEGMENT_NAME              SEGMENT_TYPE     BLOCKS      BYTES
------------------------- ------------ ---------- ----------
BIG_BOWIE                 TABLE            139264 1140850688
BIG_BOWIE_ALBUM_ID_I      INDEX             20480  167772160
BIG_BOWIE_ALBUM_ID_ZM     TABLE                 8      65536

 

We only need one 64K extent for the Zone Map whereas the Table/Index are much larger. In fact, the 134 rows can actually all fit in just the one 8K data block!!

Two new data dictionary views provide details on our Zone Maps:

SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable  
from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME              PRUNING  REFRESH_MODE      INVALID STALE   UNUSABLE
------------------------- -------- ----------------- ------- ------- --------
BIG_BOWIE_ALBUM_ID_ZM     ENABLED  LOAD DATAMOVEMENT NO      NO      NO

So the current status of the Zone Map can be determined in DBA_ZONEMAPS.

 

SQL> select measure, position_in_select, agg_function, agg_column_name
from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

MEASURE                        POSITION_IN_SELECT AGG_FUNCTION  AGG_COLUMN_NAME
------------------------------ ------------------ ------------- ---------------
"BOWIE"."BIG_BOWIE"."ALBUM_ID"                  3 MAX           MAX_1_ALBUM_ID
"BOWIE"."BIG_BOWIE"."ALBUM_ID"                  2 MIN           MIN_1_ALBUM_ID

Additionally, details of the columns within the Zone Map can be determined in DBA_ZONEMAP_MEASURES.

If we now re-run the same query on the BIG_BOWIE table:

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42)
       filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
       BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
       zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
       WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
       "ALBUM_ID"=42)


Statistics
----------------------------------------------------------
        141  recursive calls
          0  db block gets
       2364  consistent gets
          0  physical reads
          0  redo size
    3130019  bytes sent via SQL*Net to client
        760  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

We notice the CBO has used the Zone Map during the FTS (see the TABLE ACCESS STORAGE FULL WITH ZONEMAP execution step and the filter predicate information). It performs some additional consistent gets over the database index (2364 vs. 1590), but that’s not too bad and nowhere near the 139,000-odd blocks in the table, with performance comparable to before. The extra consistent gets are necessary because any candidate zones need to be read in their entirety, whereas a B-Tree index can point to just the specific table blocks that contain data of interest.

So the Zone Map has provided a “good enough” outcome here. Meanwhile, it requires very minimal storage, we can control how the Zone Map is maintained (to be discussed later) and we can guarantee that the Zone Map will always be present, even after bouncing either the database or the Exadata storage servers.
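
As a quick taste of that control (again just a syntax sketch, to be discussed properly later), Zone Map pruning can be switched off and back on at will, something that isn’t possible with storage indexes:

-- Sketch only: temporarily disable and then re-enable Zone Map pruning
alter materialized zonemap big_bowie_album_id_zm disable pruning;
alter materialized zonemap big_bowie_album_id_zm enable pruning;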

In Part II, I’ll look at what happens when I perform some DML on the base table.

Comments»

1. pieboy13 - September 4, 2014

Hey Richard,

Nice post. Zone Maps are cool and well named I think (vs. storage indexes). I’m thinking they should be even more effective than Exa storage indexes because the i/o requests are cut off at the compute layer before ever being sent to the storage layer, but I haven’t had a chance to test that yet. Also min/max pruning seems to be an idea that has really taken root as it shows up in the new in-memory option as well. Anyway, thanks for sharing and keep the posts coming.

Kerry


Richard Foote - October 8, 2014

Thanks Kerry 🙂 A real advantage with Zone Maps over storage indexes I think is that they can be used in a similar manner to bitmap-join indexes, in that they can be based on columns from other joined tables. I’ll cover this once I get access again to an Exadata box 🙂


2. Alex Fatkulin - September 7, 2014

Hi Richard,

each zone will be exactly 8MB in size with the (possible) exception of the first and the last zones of the segment. The zone id is a direct calculation from a (subset of the) rowid so the translation can work both ways. The zone size can be controlled by the scale parameter, which tells the database how many blocks should be allocated per zone. The default is 10, which results in 8MB zones for an 8KB block size database (2^10 * 8KB).


Richard Foote - October 8, 2014

Thanks heaps for that Alex. This indeed confirms my observations.
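
For the default scale of 10 with an 8KB block size, that works out as simply:

-- 2^10 blocks per zone x 8KB per block = 8MB per zone
select power(2,10) * 8192 / 1024 / 1024 as zone_size_mb from dual;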


3. Jeff Jacobs, Oracle Ace - September 11, 2014

Your post says “Before getting too excited though, Zone Maps unfortunately require the Partitioning Option AND either Exadata or SuperCluster”; I haven’t been able to find that this is Exadata/SC only. Can you please provide a reference? Thanks!


Richard Foote - October 8, 2014

Hi Jeff

Looks like you found the reference. Here’s a link for any visitors:

http://docs.oracle.com/database/121/DBLIC/editions.htm#CJACGHEB

The decision on how all these things get licensed is way way above my humble pay-scale 🙂


4. Jeff Jacobs - October 6, 2014

Zone Maps are indeed Exadata (or engineered systems) only. However, there is no technical reason for this; the zone map is evaluated on the server side, not the cell side. Apparently, this appeared in a previous beta version of 12c, but was released as Exadata-only.


5. mgogala - November 15, 2014

And there goes another nice feature. With such licensing requirements, I doubt that zone maps will be used much. Oh well, maybe in the next version.


Richard Foote - November 21, 2014

Hi Mladen

I think they’ll be very popular with Exadata storage environments where partitioning is pretty well standard and having full control of storage index behavior has been a common request from DBAs.


6. Shakir - December 20, 2014

Hi Richard,

Interesting feature. How would the performance compare to old school compressed indexes (say for example in your demo on the same ALBUM_ID column)?


Richard Foote - January 6, 2015

Hi Shakir

In the demo above, I actually started out with a normal B-Tree database index, which required 1590 consistent gets compared to the 2364 consistent gets with the Zone Map for the same SQL. If I compressed the index, it would only have a minimal impact, as the vast majority of these gets relate to fetching data from the table (compression would only save perhaps 30-40 gets in this example).

Where an index access path is more efficient than a FTS, a corresponding index will generally outperform a Zone Map, as it can reference just the specific (say 8K) blocks of interest rather than full 8MB zones. A given Zone Map can vary between being “good enough” and totally useless depending on the data distribution, and can potentially make an already cheaper FTS access path even more efficient.
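
For reference, the compressed index being discussed would be created along these lines (a sketch only, simply replacing the existing index):

-- Recreate the ALBUM_ID index with prefix compression
drop index big_bowie_album_id_i;
create index big_bowie_album_id_i on big_bowie(album_id) compress;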


