jump to navigation

12.1.0.2 Introduction to Zone Maps Part II (Changes) October 30, 2014

Posted by Richard Foote in 12c, Exadata, Oracle Indexes, Zone Maps.
add a comment

In Part I, I discussed how Zone Maps are new index like structures, similar to Exadata Storage Indexes, that enables the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table. A Zone being a range of contiguous (8M) blocks.

I showed how a Zone Map was relatively tiny but very effective in reducing the number of consistent gets for a well clustered column (ALBUM_ID).

In this post, we’re going to continue with the demo and look at what happens when we update data in the table with a Zone Map in place.

So lets update the ALBUM_ID column (which currently has a Zone Map defined) for a few rows. The value of ALBUM_ID was previously 1 for all these rows (the full range of values is currently between 1 and 100) but we’re going to update them to 142:

SQL> update big_bowie set album_id=142 where id between 1 and 100;

100 rows updated.

SQL> commit;

Commit complete.

 

So the maximum value of ALBUM_ID is now 142, not 100. If we look at the maximum value as currently listed in the Zone Map:

 

SQL> select max(max_1_album_id) from  big_bowie_album_id_zm;

MAX(MAX_1_ALBUM_ID)
-------------------
100

 

We notice the maximum is still defined as being 100. So the update on the table has not actually updated the contents of the Zone Map. So this is a big difference between Zone Maps and conventional indexes, indexes are automatically updated during DML operations, Zone Maps are not (unless the REFRESH ON COMMIT option is specified).

If we look at the state of Zone Map entries that have a minimum of 1 (the previous values of ALBUM_ID before the update):

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1              2           0           1      65787
3.8586E+11              1              2           0           0      66223

 

We notice that one of the entries has a status of 1, meaning that a specific zone has been marked as stale. However, all the other zones are still OK.

If we look at the status of the overall Zone Map:

SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable
from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME              PRUNING  REFRESH_MODE      INVALID STALE   UNUSABLE
------------------------- -------- ----------------- ------- ------- --------
BIG_BOWIE_ALBUM_ID_ZM     ENABLED  LOAD DATAMOVEMENT NO      NO      NO

 

We notice that the Zone Map is still “hunky dory” after the update.

If we now re-run the query we ran in Part I:

 

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We see the Zone Map was still used by the CBO. The number of consistent gets has increased (up from 2364 to 3238) as we now have to additional access all the blocks associated with this stale zone, but it’s still more efficient that reading all the blocks from the entire table.

If we want to remove the stale zone entries, we can refresh the Zone Map or rebuild it (for ON DEMAND refresh):

 

SQL> alter materialized zonemap big_bowie_album_id_zm rebuild;

Materialized zonemap altered.

 

If we now look at the Zone Map entry:

 

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1            142           0           0      65787
3.8586E+11              1              2           0           0      66223

 

We see that the entry is no longer stale and now correctly reflects the actual maximum value within the zone (142).

If we now re-run the query:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.30

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We notice nothing has appreciably changed, the Zone Map is still being used but the number of consistent gets remains the same as before. Why haven’t we returned back to our previous 2364 consistent gets ?

Well, as the range of possible values within the updated zone is now between 1 and 142, the required value of 42 could potentially be found within this zone and so still needs to be accessed just in case. We know that the value of 42 doesn’t exist within this zone, but Oracle has no way of knowing this based on the possible 1 to 142 range.

Hence Zone Maps work best when the data is well clustered and the Min/Max ranges of each zone can be used to limit which zones need to be accessed. If the data was not well clustered and the values within each zone mostly had ranges between the min and max values, then Oracle wouldn’t be able to effectively prune many/any zone and the Zone Map would be useless.

As we’ll see in Part III :)

Storage Indexes vs Database Indexes IV: 8 Column Limit (Eight Line Poem) May 1, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

As Exadata Storage Indexes (SI) are purely memory only structures located on the Exadata storage servers, care needs to be taken in how much memory they can potentially consume. As a result, there is a limit of 8 columns (or 8 SIs) that can be defined for a given 1M storage region at any point in time, even though SIs are much smaller structures than equivalent database indexes. As database indexes are physical constructs however, there’s no such limit on the number of indexes that can be defined for a table. This can become another key difference between SIs and database indexes.

The following table has more than 8 columns, each with differing numbers of distinct values or distributions of data:

SQL> create table radiohead (id number, col1 number, col2 number, col3 number, col4 number, col5 number, col6 number, col7 number, col8 number, col9 number, col10 number, col11 number, col12 number, some_text varchar2(50));
Table created.

SQL> insert into radiohead select rownum, mod(rownum,10), mod(rownum,100), mod(rownum,1000), mod(rownum,10000), mod (rownum,100000), mod(rownum,1000000), ceil(dbms_random.value(0,10)), ceil(dbms_random.value(0,100)), ceil
 (dbms_random.value(0,1000)), ceil(dbms_random.value(0,10000)), ceil(dbms_random.value(0,100000)), ceil(dbms_random.value (0,1000000)), 'OK COMPUTER' from dual connect by level <=2000000; 2000000 rows created. SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

4000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

8000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'RADIOHEAD', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

Let’s start by running a highly selective query of the ID column:

SQL> select * from radiohead where id = 42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=121212)
 filter("ID"=121212)

If we look at the session statistics, we’ll notice that a SI has been created and saved us physical IOs. Note: If you follow the demo, you’ll need to keep track of these statistics after each query or simply reconnect as a new session to ensure a SI has or has not been used.

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1333.99219
 cell physical IO interconnect bytes returned by smart scan .164878845

OK, let’s now run a selective query on the COL1 column (there are no values 42 in this case and so no rows are returned):

SQL> select * from radiohead where col1=42;
no rows selected

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 52 | 42440 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1 | 52 | 42440 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL1"=42)
 filter("COL1"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2546.90625
 cell physical IO interconnect bytes returned by smart scan .341438293

Again, a SI has been created and used here. The SI in this case has been extremely beneficial because no data exists for 42 (only the values 1 – 10 exist). However, if an existing value were to be selected, the SI would be next to useless as such a value would exist throughout all 1M storage regions. With just 10 distinct randomly distributed values, this SI has the potential to be a waste of time. But while we search for values that don’t exist, it serves a very useful purpose. An important consideration in what’s to come.

If we now run a query now using column COL4:

SQL> select * from radiohead where col4=42;
1600 rows selected.

Elapsed: 00:00:00.68

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=42)
 filter("COL4"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2612.64063 <= + 66MB
 cell physical IO interconnect bytes returned by smart scan 32.3048401

OK, this is the really important one to note. Firstly, yes again a SI has been created and used. Note though that this has not been the first SI to be created or used on this table; SIs have previously been created and used in the previous two queries on different columns. This will also not be the most recent SI to be created, more will soon follow. However, this is by far the least effective use of a SI, because of both the selectivity of the query and distribution of data. Here, only some 66MB or so of physical IOs have been saved.

We now repeat this process, running a query against a different column, being very selective and ensuring a SI is created and used. We finally reach a point when 8 SIs have been created on the table:

SQL> select * from radiohead where col9=0;
no rows selected

Elapsed: 00:00:00.02

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 15984 | 811K| 42561 (1)| 00:08:31 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 15984 | 811K| 42561 (1)| 00:08:31 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL9"=0)
 filter("COL9"=0)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 8792.94531
 cell physical IO interconnect bytes returned by smart scan 45.3872757

OK, we’ve now reached the point where  8 SIs have definitely been created on this table.

If we now run a query that could potentially use a SI but isn’t particularly effective, basically on a par to the one we saw created previously on COL4:

SQL> select * from radiohead where col10=42;
1536 rows selected.

Elapsed: 00:00:00.73

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1600 | 83200 | 42577 (1)| 00:08:31 |
|* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42577 (1)| 00:08:31 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("COL10"=42)
 filter("COL10"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 8792.94531
cell physical IO interconnect bytes returned by smart scan 45.9288864

We notice that no bytes have been saved here via a SI. We can run this query repeatedly and the results will be the same. No SI is created and no bytes are saved. Although Oracle could potentially create a SI and save some work, the fact we already have 8 SIs created for this table means we have already reached the limit on the number of SIs that can be created for this table. 8 is it.

Let’s run another query now using yet another different column (COL12), but this time it’s again a very selective query, much more selective and efficient than the previous query based on COL4 in which a SI had been created:

SQL> select * from radiohead where col12=42;
8 rows selected.

Elapsed: 00:00:00.39

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 19 | 988 | 42607 (1)| 00:08:32 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 19 | 988 | 42607 (1)| 00:08:32 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL12"=42)
 filter("COL12"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 9526.01563
 cell physical IO interconnect bytes returned by smart scan 53.5218124

This time however the number of bytes saved has again gone up from previously meaning that indeed a new SI has been created and used for column COL12. But this makes 9 SIs in total now for this table where the limit should be a maximum of 8 ?

Does this mean that a previously created SI has been dropped and replaced by this new one. If so, which SI is now gone ?

Well, let’s go back to the first SI we created and the one that hasn’t been used for the longest period of time. If we re-run the first query again:

SQL> select * from radiohead where id=42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=42)
 filter("ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844
 cell physical IO interconnect bytes returned by smart scan 53.5264816

We notice that not only has it used a SI, but it has saved the maximum amount of IO bytes possible here meaning there was no “warming up” processes happening here indicating a newly created SI. So not only did it use a SI, it clearly used a previously created one. So this SI was not obviously impacted by the creation of this “9th” SI.

However, if we run the query again that used column COL4, the query that previously used a SI but was by far the least effective in saving physical IOs:

SQL> select * from radiohead where col4 = 4242;
1600 rows selected.

Elapsed: 00:00:00.73

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=4242)
 filter("COL4"=4242)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844 No Change !!
 cell physical IO interconnect bytes returned by smart scan 54.1522598

This time we notice there’s no change to the number of bytes saved by a SI. No matter how often we run this query now, no SI is used. So the SI that was created previously is now gone as a result of creating the more effective SI on the COL12 column. Indeed there are still just the 8 SIs on this table.

So Oracle will indeed limit the number of SIs to 8 for each table/storage region. However, it’s not simply a case of “first in first served” or some such, with Oracle using (undocumented) performance metrics to determine which 8 SIs/columns to choose. This means it might be possible in some rarer scenarios where more than 8 columns get referenced in SQL statements for SIs to come and go depending on changing workloads.  Another example of where database indexes may yet play a role in Exadata environments, where currently tables have more than 8+ indexed columns.

Storage Indexes vs Database Indexes Part III: Hidden Values (Hide Away) February 7, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
3 comments

OK, my holiday to Hawaii is now slowing fading away into distant memory. Time for a new post :)

In my previous post on differences between Exadata Storage Indexes and Database Indexes Part II, I discussed how the clustering of data within the data is an important factor (pun fully intended !!) in the performance and efficiency of both types of indexes. In this post, I’ll expand a little more on this point and highlight scenarios where a normal database index can significantly improve performance but be totally ineffective as a Storage Index.

Storage Indexes can only determine where required data can’t exist within a (1M default) storage region via the associated Min/Max values of the index. If a required value can’t possibly exist within the Min/Max range boundary, the specific storage region can be ignored and not be accessed during the full scan operation. However, in some scenarios, there can be limitations in the usefulness of only having Min/Max values to work with if both Min and Max values are relatively common and randomly distributed throughout the table. This is not an uncommon scenario when dealing with columns that have data such as code status values.

In the BIG_BOWIE table I’ve been using in this series, I have a FORMAT_ID column that was initially populated with just values 2,4,6,8 and 10, evenly distributed throughout the table. I then updated a relatively small number of these values in the following manner:

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;
1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.

So I have relatively few FORMAT_ID = 3 (1000 in 10M rows, 0.01% of data) littered throughout the table and a few FORMAT_ID = 5 located in a specific portion/location of the table.

I’ll now create a normal database index on this FORMAT_ID column and a histogram to let the CBO know there are indeed relatively few occurances of values 3 and 5 within the table:

SQL> create index big_bowie_format_id_i on big_bowie(format_id);
Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=> 'FOR COLUMNS FORMAT_ID SIZE 10');

PL/SQL procedure successfully completed.

OK, if we now run a query looking for all the FORMAT_ID = 3:

SQL> select album_id, total_sales from big_bowie where format_id = 3;
1000 rows selected.

Elapsed: 00:00:00.01

Execution Plan
 -----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 -----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 72 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 11000 | 72 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_FORMAT_ID_I | 1000 | | 4 (0)| 00:00:01 |
 -----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("FORMAT_ID"=3)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1072 consistent gets
 0 physical reads
 0 redo size
 22984 bytes sent via SQL*Net to client
 1250 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

We notice  the CBO has used the index to efficiently retrieve just the 0.01% of required rows because of the high selectivity of the query, even though the overall Clustering Factor of the index is appalling.

It does an even better job when accessing FORMAT_ID = 5 as these specific values are all well clustered and  found within in a very tight portion of the table:

SQL> select album_id, total_sales from big_bowie where format_id = 5;
1000 rows selected.

Elapsed: 00:00:00.00

Execution Plan
 -----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 -----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 72 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 11000 | 72 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_FORMAT_ID_I | 1000 | | 4 (0)| 00:00:01 |
 -----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("FORMAT_ID"=5)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 154 consistent gets
 0 physical reads
 0 redo size
 22685 bytes sent via SQL*Net to client
 1250 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

At just 154 consistent gets in this example, the results return almost instantaneously. The database index is extremely effective in these examples because it can point just to the locations within the table where these relatively few rows of interest are located.

Let’s see how Storage Indexes cope in this scenario. First, we make the database index invisible:

SQL> alter index big_bowie_format_id_i invisible;
Index altered.

We now run the same query again (this can be repeated as many times as you like):

SQL> select album_id, total_sales from big_bowie where format_id = 3;
1000 rows selected.

Elapsed: 00:00:00.80

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 36682 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 1000 | 11000 | 36682 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("FORMAT_ID"=3)
 filter("FORMAT_ID"=3)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan .397476196

We notice no matter how often we try, that Storage Indexes have managed to save us precisely nothing. The entire table has had to be scanned and read during the Full Table Scan operation.

SQL> select album_id, total_sales from big_bowie where format_id = 5;
1000 rows selected.

Elapsed: 00:00:00.89

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 11000 | 36682 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 1000 | 11000 | 36682 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("FORMAT_ID"=5)
 filter("FORMAT_ID"=5)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan .565643311

Same results when accessing the FORMAT_ID = 5. Any possible Storage Indexes have been totally ineffective and the Full Table Scan has had to read the entire table.

The result being performance is significantly slower than with the database indexes visible and in place.

Why ?

The data consists primarily of randomly distributed values 2,4,6,8 and 10, with the odd value 3 and 5 littered around. This means that if created, the Storage Index Min/Max values will effectively be 2 and 10 for all 1M storage regions throughout the entire table.

Therefore, both values 3 and 5 could possibly exist within all/any of the table storage regions as they sit inside the 2 – 10 Min/Max range boundaries. These 3 and 5 values are effectively “hidden” within the Storage Index Min/Max values making Storage Indexes totally ineffective in this scenario, as they can’t skip a thing.

A final point here. If one of these rarer values of interest were to be say value “12”, then a Storage Index could be useful when searching for this particular value as any Min/Max ranges of 2-10 that don’t therefore contain a value of 12 can be skipped. However, if values 3 and 5 were also of interest, the Storage Index would still only be useful in a subset of required cases. A database index would therefore still be required to cater all necessarily cases.

As Database indexes can directly point to indexed values of interest, this is another example of where we would likely not want to just drop a database index in Exadata.

Storage Indexes vs Database Indexes Part II: Clustering Factor (Fast Track) December 19, 2012

Posted by Richard Foote in Clustering Factor, Exadata, Oracle Indexes, Storage Indexes.
2 comments

Two posts in two days !! Well, with Christmas just around the corner, I thought I better finish off a couple of blog posts before I get fully immersed in the festive season :)

The Clustering Factor (CF) is the most important index related statistic, with the efficiency of an index performing multi-row range scans very much dependent on the CF of the index. If the data in the table is relatively well clustered in relation to the index (i.e. it has a “low” CF), then an index range scan can visit relatively few table blocks to obtain the necessary data. If the data is effectively randomised and not well clustered in relation to the index (i.e. has a “high” CF), then an index range scan has to visit many more table blocks and not be as efficient/effective as a result. The CBO will be less inclined to use such an index as a result, depending on the overall selectivity of the query.

It’s something I’ve discussed here many times before.

It’s a very similar story for Exadata Storage Indexes (SI) as well. The better the table data is clustered in relation to the SIs, the more efficient and effective the SIs are likely to be in relation to being able to eliminate accessing storage regions that can’t possibly contain data of interest. By having the data more clustered (or ordered) in relation to a specific SI, the Min/Max ranges associated with the SI are more likely to be able to determine areas of the table where data can’t exist.

For the data I’ll be using in the following examples, I refer you to the previous post where I setup the necessary data.

The following query is on a 10 million row table, based on the ALBUM_ID column that has an excellent CF:

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:01.07

Execution Plan
 ----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 1550 (1)| 00:00:19 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 100K| 8984K| 1550 (1)| 00:00:19 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 100K| | 199 (1)| 00:00:03 |
 ----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1590 consistent gets
 1550 physical reads
 0 redo size
 9689267 bytes sent via SQL*Net to client
 733 bytes received via SQL*Net from client
 21 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

An index range scan access path is selected by the CBO to retrieve the 100,000 rows. At just 1590 consistent gets, with such an excellent CF, the index can very efficiently access just the necessary 100,000 rows of data. Notice that most of these consistent gets are still physical reads (1550). If we re-run the query several times and are able to cache the corresponding index/table blocks in the database buffer cache:

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27

Execution Plan
 ----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 1550 (1)| 00:00:19 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 100K| 8984K| 1550 (1)| 00:00:19 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 100K| | 199 (1)| 00:00:03 |
 ----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1590 consistent gets
 0 physical reads
 0 redo size
 9689267 bytes sent via SQL*Net to client
 733 bytes received via SQL*Net from client
 21 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

The overall execution times reduce down from 1.07 to just 0.27 seconds. Not bad at all considering we’re returning 100,000 rows.

However, if we run the same query on the same data with all the smarts turned on in Exadata (with the index made Invisible so that it doesn’t get used by the CBO):

SQL> select * from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27
 Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 8984K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 8984K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 134834 consistent gets
 134809 physical reads
 0 redo size
 4345496 bytes sent via SQL*Net to client
 73850 bytes received via SQL*Net from client
 6668 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 100000 rows processed

 SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1042.24219
 cell physical IO interconnect bytes returned by smart scan 9.56161499

We notice that although a Full Table Scan is being performed, the overall performance of the query is practically identical to that of using the index. That’s because the SIs are kicking in here and by saving 1042 MB (approximately 99% of the table), Oracle only has to actually physically access a tiny 1% of the table (basically, the 1% selectivity of the query itself). SIs based on the well clustered ALBUM_ID column are therefore very effective at eliminating the access of unnecessary data.

If we now run a query based on the TOTAL_SALES column in which the data is randomly distributed all over the place and so the associated index has a very poor CF:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:01.45

Execution Plan
 -------------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time |
 -------------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 2048 (1)| 00:00:25 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 2040 | 26520 | 2048 (1)| 00:00:25 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 2040 | | 7 (0)| 00:00:01 |
 -------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 2150 consistent gets
 2005 physical reads
 0 redo size
 43311 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

We notice that although only 2009 rows are retrieved with this query, 2150 consistent gets have been performed (practically 1 for each row returned) . This is somewhat more than the 1590 consistent gets of the previous example when a full 100,000 rows were returned. Using this index therefore is nowhere near as efficient/effective in retrieving data as was the index in the previous example.

If all this data can be cached in the buffer cache however, we can again improve overall execution times:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:00.02

Execution Plan
 ------------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time |
 ------------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 2048 (1)| 00:00:25 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 2040 | 26520 | 2048 (1)| 00:00:25 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_TOTAL_SALES_I | 2040 | | 7 (0)| 00:00:01 |
 ------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)

Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 2150 consistent gets
 0 physical reads
 0 redo size
 43308 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

So with all the index/table data now cached, we can return the 2000 odd rows in just 0.02 seconds.

If we now run the same query with the same data in Exadata:

SQL> select album_id, artist_id from big_bowie where total_sales between 42 and 142;
2009 rows selected.

Elapsed: 00:00:01.25

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 2040 | 26520 | 36700 (1)| 00:07:21 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 2040 | 26520 | 36700 (1)| 00:07:21 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42)
 filter("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42)

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 134834 consistent gets
 134809 physical reads
 0 redo size
 47506 bytes sent via SQL*Net to client
 1987 bytes received via SQL*Net from client
 135 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 2009 rows processed

 SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 72.65625
 cell physical IO interconnect bytes returned by smart scan .383415222

We noticed that the physical IO bytes saved by the SIs has significantly reduced from the previous example (just 72 MBs down from 1042 MBs), even though at just 2000 odd rows we require much less data than before. In this example, only approximately 7% of table storage need not be accessed, meaning we still have to access a significant 93% of the table as the required data could potentially exist throughout the majority of the table. The poor clustering of the data in relation the TOTAL_SALES column has effectively neutralised the effectiveness of the associated SIs on the TOTAL_SALES column.

Note also that the 1.25 seconds is as good as it gets when performing a FTS with the fully generated SIs in place. In this case, using a fully cached database index access path can outperform the FTS/SI combination and provide a more efficient and ultimately more scalable method of accessing this data. As the required selectively on this column is low enough to warrant the use of a database index despite the poor CF, this is again another example of an index we may not necessarily want to automatically drop when moving to Exadata.

Storage Indexes vs Database Indexes Part I MIN/MAX (Maxwell’s Silver Hammer) December 18, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
add a comment

It’s often stated that in Exadata, you don’t need conventional database indexes anymore as everything runs so damn fast that indexes are simply a waste of time and space. Simply drop all database indexes and things will run just as fast.

Well, not quite …

There are many many scenarios where database indexes are still critical for optimal performance and scalability in an Exadata environment.

Prompted by a question in an earlier post (thanks Paul), I thought I might start looking at some scenarios where dropping a database index in Exadata would not be the best of ideas.

The first example is the scenario where one wants either the minimum/maximum of a column value. As database index entries are always ordered, this can very efficiently be found by traversing down to either the first or last index leaf block within an appropriate index. There’s an execution the INDEX FULL SCAN MIN/MAX execution path by which the CBO can decide to access an index in this manner.

For example, if I create an index on the ID column of my large DWH_BOWIE table and select the MIN(ID):

SQL> create index dwh_bowie_id_i on dwh_bowie(id);
Index created.

SQL> select min(id) from dwh_bowie;

MIN(ID)
 ----------
 1

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)| 00:00:01 |
 | 1 | SORT AGGREGATE | | 1 | 6 | | |
 | 2 | INDEX FULL SCAN (MIN/MAX)| DWH_BOWIE_ID_I | 1 | 6 | 4 (0)| 00:00:01 |
 ---------------------------------------------------------------------------------------------

Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 4 consistent gets
 3 physical reads
 0 redo size
 525 bytes sent via SQL*Net to client
 524 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed

With only a handful of consistent gets, we have managed to get the minimum ID value extremely quickly and efficiently.

However, if we drop this index and re-run the same query in Exadata numerous times:

SQL> select min(id) from dwh_bowie;
MIN(ID)
 ----------
 1

Elapsed: 00:00:41.31

Execution Plan
 ----------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 6 | 2345K (1)| 07:49:12 |
 | 1 | SORT AGGREGATE | | 1 | 6 | | |
 | 2 | TABLE ACCESS STORAGE FULL| DWH_BOWIE | 640M| 3662M| 2345K (1)| 07:49:12 |
 ----------------------------------------------------------------------------------------

Statist5ics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 8626963 consistent gets
 8625479 physical reads
 0 redo size
 525 bytes sent via SQL*Net to client
 524 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1 rows processed

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 0
 cell physical IO interconnect bytes returned by smart scan 7644.38194

We notice the query takes considerably longer and that the Storage Indexes have been unable to be of any help with not a single byte saved.

This is because all aggregation type operations are performed at the database level, not within the storage servers. Even though in theory Storage Indexes could be seen as perhaps being of value here, they are totally ignored by Oracle when retrieving either the minimum/maximum of a column value. Remember also that a key difference between a Storage Index and a Database Index is that a Storage Index does not necessarily have to exist fully for all regions of a given table.

Drop database indexes used in this manner at your peril …

Many more examples to come :)

Exadata Storage Indexes Part V: Warming Up (Here Come The Warm Jets) December 3, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
add a comment

As I mentioned in a previous post, there are a number of Similarities between Storage Indexes and Database Indexes.

One of these similarities is the “warming up” process that needs to take place before indexes become “optimal” after either the Storage Server (in the case of Storage Indexes) or the Database Server (in the case of Database Indexes) is restarted.

When a database server is restarted, the contents associated with the database buffer cache is lost and has to be reloaded (as well as other components of the SGA of course). This means for a period of time while the instance “warms up”, there is a spike in physical IO (PIO) activity. Index range scans, which generally greatly benefit from re-using cached data therefore need to perform additional PIOs for both the index blocks and the table blocks they reference. The overheads associated with these additional PIOs in turn slows the overall performance of (especially) these index related execution plans.

As a simple illustration, the following query is executed after a flushed buffer cache:

SQL> select * from big_bowie where album_id = 42;
1000 rows selected.

Elapsed: 00:00:01.99

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2703824528

----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 75000 | 101 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 75000 | 101 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 1000 | | 1 (0)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)
 Statistics
 ----------------------------------------------------------
 1 recursive calls
 0 db block gets
 1072 consistent gets
 1005 physical reads
 0 redo size
 91231 bytes sent via SQL*Net to client
 1245 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

There is a considerable amount of physical reads associated with having to re-load the contents of the database buffer cache. If we now re-run the query:

SQL> select * from big_bowie where album_id = 42;
1000 rows selected.

Elapsed: 00:00:00.01

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 2703824528

----------------------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ----------------------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1000 | 75000 | 101 (0)| 00:00:01 |
 | 1 | TABLE ACCESS BY INDEX ROWID| BIG_BOWIE | 1000 | 75000 | 101 (0)| 00:00:01 |
 |* 2 | INDEX RANGE SCAN | BIG_BOWIE_ALBUM_ID_I | 1000 | | 1 (0)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

2 - access("ALBUM_ID"=42)
 Statistics
 ----------------------------------------------------------
 0 recursive calls
 0 db block gets
 1072 consistent gets
 0 physical reads
 0 redo size
 91231 bytes sent via SQL*Net to client
 1245 bytes received via SQL*Net from client
 68 SQL*Net roundtrips to/from client
 0 sorts (memory)
 0 sorts (disk)
 1000 rows processed

All the data is now cached and the overall query response time improves considerably. Once a database instance has warmed up and much of the database’s hot, data-set of interest has been cached, PIOs are minimised and performance improves accordingly.

Much has been made of the fact Storage Indexes (SIs) need to be created on the fly but in a similar manner to database indexes, they just need a period of time to warm up and be created/stored in memory to become optimal. The difference being that SIs are actually created on the fly when cached in memory unlike database indexes which physically preexist on disk.

So when an Exadata storage server is re-started, the performance associated with the SIs on the server is initially sluggish as well, while Oracle performs the additional PIOs off disk to create the necessary SIs in memory. Additionally, new queries using new column predicates will also be initially sluggish while any necessary SIs are created and cached in memory. Note that SIs are created on a 1M storage region basis so it could well take a period of time for a SI to be created across all the associated storage regions of the table. There is also a limit of 8 SIs per 1M storage region which might limit whether a SI is actually created or preserved (more on this point in a later post).

The following query is executed on a relatively new table and in a new session to capture fresh session statistics:

SQL> select artist_id, total_sales from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.89

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 1269K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 1269K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 114.601563
 cell physical IO interconnect bytes returned by smart scan 1.77637482

The amount of physical IO bytes saved by a SI is initially minimal, about 114 MB.

However, when the query is re-executed and the associated SI on the ALBUM_ID column has been fully created:

SQL> select artist_id, total_sales from big_bowie where album_id = 42;
100000 rows selected.

Elapsed: 00:00:00.27

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 100K| 1269K| 36663 (1)| 00:07:20 |
 |* 1 | TABLE ACCESS STORAGE FULL| BIG_BOWIE | 100K| 1269K| 36663 (1)| 00:07:20 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ALBUM_ID"=42)
 filter("ALBUM_ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1156.84375
 cell physical IO interconnect bytes returned by smart scan 3.41764832

The amount of physical IO bytes saved by SIs increases considerably and overall performance improves as a result.

Restarting a database server will result in sluggish performance for a period until most of the database hot data is cached. This is also the case when an Exadata Storage server is re-started, in part while the necessary SIs are recreated and cached.

I’ll next discuss another similarity, that being the importance of the clustering of data within the table to the efficiency of these indexes.

Exadata Storage Indexes Part IV – Fast Full Table Scans (Speed of Life) November 8, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
2 comments

OK, let’s look at Storage Indexes in action.

But first, following is the setup for the various demos to come. I basically create one table called BIG_BOWIE that’s about 1GB in size and then simply create another table called DWH_BOWIE where the contents of this are re-insert into itself a few times to get to about a 60GB table. The various columns have differing distinct values and distributions of data.

I used an X2-2 1/2 rack as my toy and yes, once people saw the table names instantly knew who created them :)

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null,
format_id number, release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
  2  begin
  3     for v_album_id in 1..100 loop
  4         for v_artist_id in 1..100000 loop
  5             insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2,
  6               trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)),
'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
  7         end loop;
  8         commit;
  9     end loop;
 10     commit;
 11  end;
 12  /

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

I modified some of the data to have a few occurrences of some specific data. This will be used on a later post.

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE‘, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name = 'BIG_BOWIE';

    BLOCKS
----------
    134809

Like I said, another DWH_BOWIE table as basically created with this data re-inserted into itself 6 times to create roughly a 60GB table.

OK, let’s now run an “expensive” query on this bigger table without any of the Exadata smart scan magic enabled:

SQL> alter session set cell_offload_processing=false;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:04:43.38

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    8644512  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

So it takes about 4 mins and 43 secs to return the necessary 266,176 rows from an approx. 60GB table (that’s 8.6 million 8K blocks). Not bad really.

However, if we run the same query with the Exadata smarts enabled:

SQL> alter session set cell_offload_processing=true;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:00:03.97

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS STORAGE FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)
       filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    8626963  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

We see that execution times reduce significantly, down to just 4 secs. So we manage to read via a FTS a 60GB table in under 4 seconds, not bad at all.

A hint that something different might have occurred here is the appearance of the “storage” predicate listing. This basically tells us that a smart scan “might” have occurred. Note though that the “reported” physical reads and consistent gets as reported by the database is basically the same as the previous run. More on this in later posts.

The reason for this improvement is due in large part to the use of Exadata Storage Indexes. As discussed previously, a Storage Index can potentially automatically avoid having to read significant portions of data by determining areas of storage that can’t possibly contain data of interest. This is a classic example of Storage Indexes in operation, putting into practice the notion that the quickest way to do something is to avoid doing as much work as possible.

But how to tell whether a Storage Index really did kick in and how much work did it actually save ?

The V$SQL view now has a number of additional columns that provides useful information:

SQL> select sql_text, io_cell_offload_eligible_bytes, io_interconnect_bytes, io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
     from v$sql where sql_id = 'admdhphbh1a1c';

SQL_TEXT
--------------------------------------------------------------------------------
IO_CELL_OFFLOAD_ELIGIBLE_BYTES IO_INTERCONNECT_BYTES IO_CELL_UNCOMPRESSED_BYTES
------------------------------ --------------------- --------------------------
IO_CELL_OFFLOAD_RETURNED_BYTES
------------------------------
select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200
                    1.4132E+11              55217792                 1.6718E+10
                      55217792

There are also numerous new statistics which begin with ‘cell ‘ that I generally capture before and after a particular operation to see the storage related workloads. Two statistics I find particularly useful are:

SQL> select name , value/1024/1024 MB from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                MB
----------------------------------------------------------- ----------
cell physical IO bytes saved by storage index               59414.9453
cell physical IO interconnect bytes returned by smart scan   26.329895

The cell physical IO bytes saved by storage index statistic denotes how much storage has not had to be read due to Storage Indexes. So this tells us just how useful Storage Indexes have been in reducing physical IO operations. The cell physical IO interconnect bytes returned by smart scan statistic denotes how much data has actually been returned to the database servers as a result of a smart scan.

As the above numbers highlight (note this last query had been the only activity within the session), the Storage Indexes were highly effective and were able to physically skip reading the vast majority of the table (59,414MB) during the Full Table Scan operation and that only a  relatively small amount data (26MB) had to be returned back to the database servers.

By not having to read most of the data, the resultant Full Table Scan on this relatively large table was completed not in minutes as previously, but in a matter of a few seconds.

The potential power of Storage Indexes …

Exadata Storage Indexes Part III – Similarities With Database Indexes (Same Old Scene) October 15, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
2 comments

As discussed previously, there are quite a number of differences between Storage Indexes (SIs) and Database Indexes (DIs). However, there are also a number similarities between both of them as well.

The obvious one is that they’re both designed specifically to reduce the overheads associated with retrieving the required data out of the database. Both index structures provides a method by which Oracle can avoid having to read every row/block in a table when searching for data of interest. It’s just the actual implementation of this mechanism that differs between the two general index types as I’ve previously discussed.

The efficiency of both index types is very largely dependant upon the clustering of the indexed data within the table (i.e. the index Clustoring Factor). The better the clustering of the related indexed data within the table, the more tightly coupled the required data is likely to be and so the more efficient both index types would be in avoiding accessing unnecessary data. If the required data were to be less well clustered and randomly distributed throughout the table, the less efficient would be both index types in retrieving the necessary data. Some actual examples of this to come in future posts.

Both index types have a period of “warming up” before being fully effective. It’s simply the manner in which this warming up process occurs that differs between the two. Database indexes on a freshly bounced database server initially incur substantial physical I/Os (PIOs) until the related index/table data is cached within the database buffer cache (and indeed in the flash cache). These PIOs can significantly reduce the performance of the SQL plans utilising database indexes. Storages indexes on a freshly bounced storage server need to be recreated and can’t immediately eliminate accessing unnecessary storage regions. This can significantly reduce the performance of Full Table Scans until the associated SIs are fully created. Again, some actual demos on all this to come in future posts.

Both index types can use “Index Combine” like logic and use multiple indexes in combination with each other to further reduce the actual number of table blocks that need to be accessed to retrieve the data of interest. Storage and Bitmap database indexes are especially suited to these types of index combine operations, although B-Tree indexes can also be used in this manner.

Both Oracle index types are really quite interesting and often misunderstood and so meets the general theme of this blog, meaning I can quite happily blog about them without shocking too many people in the process :)

Like I said, more to come …

Exadata Storage Indexes Part II – Differences With Database Indexes (Space Dementia) October 9, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
14 comments

Let’s explore some of the key differences between Storage Indexes (SI) and Database Indexes (DI). In no particular order, they include:

SIs are structures that exist only within the storage servers of an Exadata box, while DIs logically exist and can be accessed within the database servers.

SIs are purely memory only structures while DIs are physical segments that take up storage. As such, DIs are relatively expensive to both create and subsequently maintain as they generate considerable undo and redo within the database, can cause concurrency issues and require storage resources. SIs meanwhile require no physical storage and have little impact on DML operations.

SIs are generated automatically and transparently while DIs generally need to be explicitly created (except in some scenarios such as in the creation of Primary/Unique keys when they can be implicitly created).

SIs being memory only structures are transient in that if a storage server were to be restarted, the corresponding SIs are lost and need to be re-created. Additionally, Oracle may decide to drop a SI for a particular column and create one on a different column depending on current load and conditions. DIs are permanent objects that need to be explicitly dropped (except in some scenarios such as the dropping/disabling of Primary/Unique Key constraints when they can be implicitly dropped).

SIs can be stored in memory as they contain very brief summary information, just the min/max value and a null flag for each 1MB storage region. A corresponding DI (especially a B-Tree) would generally be significantly larger as it needs to store all indexed values from the table with associated rowids (unless compressed but still likely much larger even so).

SIs can index only a portion of a table at a specific point in time as they get generated and dropped (see above). DIs index the entire table/partition (prior to 12c), unless using smarts such as decode function-based indexes (which also index the entire table but based only on the results of the function).

SIs are limited to only 8 columns whereas DIs have no such limitations per table.

SIs reference a 1MB storage region whereas a DI references a specific database block (say 8K). Therefore, a DI is more “focused” in terms of the minimum amount of data that needs to be accessed.

SIs basically work by determining which areas of storage can not possibly contain data of interest, accessing just those storage regions that might contain data of interest. Therefore, it’s quite possible for a SI to generate false positives by having to access storage that might in the end not actually contain data of interest after all. A DI meanwhile via the rowid explicitly points to the exact location of an indexed value and does not generate false positives. A bitmap index is a little different in the manner in which rowids are stored and generated (and can have 0 bits set for rows that don’t actually exist) but again do not generate false positives when actually accessing the table blocks.

SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not.

As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.

SIs are used, even if the majority of the data needs to be accessed regardless. As SIs are only used during a FTS, the concept of only using an index when it’s the cheaper alternative doesn’t apply to SIs. If a SI can save (say) just 5% of physical I/Os during a FTS, it’s better than no savings at all. DIs meanwhile are only used when the Cost Based Optimizer (CBO) considers it the cheapest option when accessing data.

As SIs are storage based structures, the CBO has no knowledge of their existence and play no part in the CBO cost calculations. DIs are fully known to the CBO and the DI related statistics are an important factor in the CBO calculations. The CBO only determines whether a FTS is the cheaper alternative, however the decision to perform a Direct-Read operation and so potentially enable the use of SIs is a run-time decision not made by the CBO.

SIs can be effectively used for IS NULL predicates, thanks to the null existence flag component of the SI. B-Tree Indexes can’t if all indexed columns are null (as such entries are not indexed) although Bitmap indexes can.

SIs can not be used to police Primary/Unique constraints. DIs can.

SIs can not be used to avoid performance issues in relation to Foreign Keys (such as locking implications and FTS requirements on child tables when deleting parent rows). DIs can.

SIs can not avoid sort operations. DIs can as data read via an index range scan is guaranteed to be returned in the order of the index entries.

SIs can not provide additional statistical information to the CBO, such as accurate selectivity information in multi-column predicates available in concatenated index distinct keys statistics. DIs can.

SIs can not be used to efficiently access the MIN/MAX of a column. DIs can.

Function-Based SIs are not supported. Function-Based DIs are supported.

SIs can not be treated as smaller tables and used as an alternative by the CBO to access just index related data, eg. select count(*), select indexed_column, etc. as SIs do not contain all the required data and are not visible to the CBO anyways. DIs can be treated as smaller tables and accessed accordingly if appropriate.

OK, that’s enough of a list for now to get one thinking about some of these differences :)

In the following posts, I’ll go through the benefits of SIs and show examples of how they’re implemented and used by Oracle.

Exadata Storage Indexes – Part I (Beginning To See The Light) October 4, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
12 comments

Thought I might discuss Exadata Storage Indexes, explore what they are, how they work and discuss their advantages and disadvantages. Following is but a brief introduction on this very cool Exadata feature.

A Storage Index basically maintains summary information about database table data down on the Exadata storage servers. The information consists of the minimum value, the maximum value and a flag to denote whether any Nulls exist for up to 8 different columns within each 1M storage region of disk.

My little diagram above attempts to describe this (click on image for a larger version).

The Storage Indexes are created automatically and transparently based on the SQL predicate information executed by Oracle and passed down to the storage servers from the database servers. Storage Indexes take up no physical storage of themselves and are built and maintained entirely in memory. As only this very basic summary information is stored for a maximum of 8 columns for each 1M of storage, Storage Indexes are very lightweight and can be created and maintained with minimal general overheads.

So how are they used ?

During an Exadata Smart Scan, Oracle can perform predicate filtering down at the storage layer and so only return just the rows of interest back up to the database. As part of this process, Oracle can use the Storage Indexes to visit just the 1M storage regions that can potentially contain rows of interest. Those storage regions that can’t possibly contain data of interest can be eliminated and not accessed at all during the Smart Scan operation.

So, in a very simple example, if we have an SQL predicate such as WHERE CODE = 5, if a corresponding Storage Index on the CODE column of the first 1M region of the table has MIN=2 and MAX=10, Oracle would need to access this portion of the table as the CODE value of interest could potentially exist here. However, if the next 1M storage region had a CODE Storage Index with a MIN=7 and MAX=12, then the CODE value of 5 can’t possibly exist within this portion of the table and can be ignored and not accessed at all during the Smart Scan.

So depending on the column, predicates and data distribution, a Storage Index can potentially eliminate having to physically access significant portions of a table during a Smart Scan. In an extreme example, on a search of a CODE value = 42 where the maximum CODE value that actually exists is say 35, a Smart Scan can perform a so-called Full Table Scan (FTS) via a Storage Index that doesn’t actually have to perform any physical I/O at all and can ignore the entire table.

The less physical I/O performed, the less work required and the faster the response time. If a FTS of say a 500GB table only had to physically read and access a few MBs here and there AND just return the data of interest back to the database servers, that would significantly improve the overall performance and overheads associated with the FTS.

Storage Indexes can be very cool indeed.

Much more to come …

E4 2012 Enkitec Extreme Exadata Expo (Blue Jean) June 14, 2012

Posted by Richard Foote in Advert, Exadata, Oracle Indexes.
6 comments

I’m very pleased to have been invited to speak at the E4 2012 Enkitec Exadata Expo to be held in Dallas, USA on 13-14 August.

It’s basically the first ever conference that’s dedicated exclusively to Oracle’s exciting Exadata platform. It should be a fabulous event, featuring some of the best Oracle talent going around, including Jonathan Lewis, Tanel Poder, Kerry Osbourne, Cary Millsap, Arup Nanda, Fritz Hoogland, Doug Burns, Karen Morton, Maria Colgan and Peter Bach to name but a few. The full list of speakers is listed here.

My talk will naturally be on “Indexing In Exadata“, the abstract being:

There’s often confusion regarding how indexing requirements may change when moving to Exadata, with some even suggesting that indexes are perhaps no longer required at all. Considering indexes can consume a considerable proportion of total storage within a database and can be crucial to general database performance, care needs to be taken to fully consider indexing requirements when moving to Exadata. This presentation will discuss the indexing structures unique to Exadata, how indexing considerations change (and don’t change), how database usage is critical to indexing requirements and how to implement safely an appropriate indexing strategy when migrating to Exadata that will ensure indexes get used when appropriate without compromising Exadata specific features such as Smart Scans and Storage Indexes.”

I’ve had the opportunity to research and have a good play with Exadata since re-joining the Oracle mothership so I’m looking forward to sharing some of what I’ve learnt.

Hopefully, you can make this historic event. If not in person, you might still be able to attend the event in a virtual capacity.

I’m hoping to confirm my attendance at another conference in the very near future. More details soon.

And yes, I’m planning to blog at some point about the challenges of having an appropriate indexing strategy on Exadata, busting a few myths in the making while I’m at it :)

Follow

Get every new post delivered to your Inbox.

Join 1,894 other followers