jump to navigation

Exadata Storage Indexes Part IV – Fast Full Table Scans (Speed of Life) November 8, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
5 comments

OK, let’s look at Storage Indexes in action.

But first, following is the setup for the various demos to come. I basically create one table called BIG_BOWIE that’s about 1GB in size and then simply create another table called DWH_BOWIE where the contents of this are re-insert into itself a few times to get to about a 60GB table. The various columns have differing distinct values and distributions of data.

I used an X2-2 1/2 rack as my toy and yes, once people saw the table names instantly knew who created them 🙂

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null,
format_id number, release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
  2  begin
  3     for v_album_id in 1..100 loop
  4         for v_artist_id in 1..100000 loop
  5             insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id, ceil(dbms_random.value(0,5)) * 2,
  6               trunc(sysdate-ceil(dbms_random.value(0,10000))), ceil(dbms_random.value(0,500000)),
'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
  7         end loop;
  8         commit;
  9     end loop;
 10     commit;
 11  end;
 12  /

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

I modified some of the data to have a few occurrences of some specific data. This will be used on a later post.

SQL> update big_bowie set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> update big_bowie set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE‘, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select blocks from dba_tables where table_name = 'BIG_BOWIE';

    BLOCKS
----------
    134809

Like I said, another DWH_BOWIE table as basically created with this data re-inserted into itself 6 times to create roughly a 60GB table.

OK, let’s now run an “expensive” query on this bigger table without any of the Exadata smart scan magic enabled:

SQL> alter session set cell_offload_processing=false;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:04:43.38

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
    8644512  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

So it takes about 4 mins and 43 secs to return the necessary 266,176 rows from an approx. 60GB table (that’s 8.6 million 8K blocks). Not bad really.

However, if we run the same query with the Exadata smarts enabled:

SQL> alter session set cell_offload_processing=true;

Session altered.

SQL> select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200;

266176 rows selected.

Elapsed: 00:00:03.97

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   266K|    23M|  2348K  (1)| 07:49:45 |
|*  1 |  TABLE ACCESS STORAGE FULL| DWH_BOWIE |   266K|    23M|  2348K  (1)| 07:49:45 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)
       filter("ALBUM_ID"=42 AND "ARTIST_ID"<=4200 AND "ARTIST_ID">=42)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
    8626963  consistent gets
    8625479  physical reads
          0  redo size
   12279634  bytes sent via SQL*Net to client
     195719  bytes received via SQL*Net from client
      17747  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     266176  rows processed

We see that execution times reduce significantly, down to just 4 secs. So we manage to read via a FTS a 60GB table in under 4 seconds, not bad at all.

A hint that something different might have occurred here is the appearance of the “storage” predicate listing. This basically tells us that a smart scan “might” have occurred. Note though that the “reported” physical reads and consistent gets as reported by the database is basically the same as the previous run. More on this in later posts.

The reason for this improvement is due in large part to the use of Exadata Storage Indexes. As discussed previously, a Storage Index can potentially automatically avoid having to read significant portions of data by determining areas of storage that can’t possibly contain data of interest. This is a classic example of Storage Indexes in operation, putting into practice the notion that the quickest way to do something is to avoid doing as much work as possible.

But how to tell whether a Storage Index really did kick in and how much work did it actually save ?

The V$SQL view now has a number of additional columns that provides useful information:

SQL> select sql_text, io_cell_offload_eligible_bytes, io_interconnect_bytes, io_cell_uncompressed_bytes, io_cell_offload_returned_bytes
     from v$sql where sql_id = 'admdhphbh1a1c';

SQL_TEXT
--------------------------------------------------------------------------------
IO_CELL_OFFLOAD_ELIGIBLE_BYTES IO_INTERCONNECT_BYTES IO_CELL_UNCOMPRESSED_BYTES
------------------------------ --------------------- --------------------------
IO_CELL_OFFLOAD_RETURNED_BYTES
------------------------------
select * from dwh_bowie where album_id = 42 and artist_id between 42 and 4200
                    1.4132E+11              55217792                 1.6718E+10
                      55217792

There are also numerous new statistics which begin with ‘cell ‘ that I generally capture before and after a particular operation to see the storage related workloads. Two statistics I find particularly useful are:

SQL> select name , value/1024/1024 MB from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME                                                                MB
----------------------------------------------------------- ----------
cell physical IO bytes saved by storage index               59414.9453
cell physical IO interconnect bytes returned by smart scan   26.329895

The cell physical IO bytes saved by storage index statistic denotes how much storage has not had to be read due to Storage Indexes. So this tells us just how useful Storage Indexes have been in reducing physical IO operations. The cell physical IO interconnect bytes returned by smart scan statistic denotes how much data has actually been returned to the database servers as a result of a smart scan.

As the above numbers highlight (note this last query had been the only activity within the session), the Storage Indexes were highly effective and were able to physically skip reading the vast majority of the table (59,414MB) during the Full Table Scan operation and that only a  relatively small amount data (26MB) had to be returned back to the database servers.

By not having to read most of the data, the resultant Full Table Scan on this relatively large table was completed not in minutes as previously, but in a matter of a few seconds.

The potential power of Storage Indexes …