jump to navigation

Storage Indexes vs Database Indexes IV: 8 Column Limit (Eight Line Poem) May 1, 2013

Posted by Richard Foote in Exadata, Oracle Indexes, Smart Scans, Storage Indexes.
trackback

As Exadata Storage Indexes (SI) are purely memory only structures located on the Exadata storage servers, care needs to be taken in how much memory they can potentially consume. As a result, there is a limit of 8 columns (or 8 SIs) that can be defined for a given 1M storage region at any point in time, even though SIs are much smaller structures than equivalent database indexes. As database indexes are physical constructs however, there’s no such limit on the number of indexes that can be defined for a table. This can become another key difference between SIs and database indexes.

The following table has more than 8 columns, each with differing numbers of distinct values or distributions of data:

SQL> create table radiohead (id number, col1 number, col2 number, col3 number, col4 number, col5 number, col6 number, col7 number, col8 number, col9 number, col10 number, col11 number, col12 number, some_text varchar2(50));
Table created.

SQL> insert into radiohead select rownum, mod(rownum,10), mod(rownum,100), mod(rownum,1000), mod(rownum,10000), mod (rownum,100000), mod(rownum,1000000), ceil(dbms_random.value(0,10)), ceil(dbms_random.value(0,100)), ceil
 (dbms_random.value(0,1000)), ceil(dbms_random.value(0,10000)), ceil(dbms_random.value(0,100000)), ceil(dbms_random.value (0,1000000)), 'OK COMPUTER' from dual connect by level <=2000000; 2000000 rows created. SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

4000000 rows created.

SQL> commit;

Commit complete.

SQL> insert/*+ append */ into radiohead select * from radiohead;

8000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'RADIOHEAD', estimate_percent=>null, method_opt=> 'FOR ALL COLUMNS SIZE 1');

Let’s start by running a highly selective query of the ID column:

SQL> select * from radiohead where id = 42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=121212)
 filter("ID"=121212)

If we look at the session statistics, we’ll notice that a SI has been created and saved us physical IOs. Note: If you follow the demo, you’ll need to keep track of these statistics after each query or simply reconnect as a new session to ensure a SI has or has not been used.

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 1333.99219
 cell physical IO interconnect bytes returned by smart scan .164878845

OK, let’s now run a selective query on the COL1 column (there are no values 42 in this case and so no rows are returned):

SQL> select * from radiohead where col1=42;
no rows selected

Elapsed: 00:00:00.01

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1 | 52 | 42440 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1 | 52 | 42440 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL1"=42)
 filter("COL1"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2546.90625
 cell physical IO interconnect bytes returned by smart scan .341438293

Again, a SI has been created and used here. The SI in this case has been extremely beneficial because no data exists for 42 (only the values 1 – 10 exist). However, if an existing value were to be selected, the SI would be next to useless as such a value would exist throughout all 1M storage regions. With just 10 distinct randomly distributed values, this SI has the potential to be a waste of time. But while we search for values that don’t exist, it serves a very useful purpose. An important consideration in what’s to come.

If we now run a query now using column COL4:

SQL> select * from radiohead where col4=42;
1600 rows selected.

Elapsed: 00:00:00.68

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=42)
 filter("COL4"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 2612.64063 <= + 66MB
 cell physical IO interconnect bytes returned by smart scan 32.3048401

OK, this is the really important one to note. Firstly, yes again a SI has been created and used. Note though that this has not been the first SI to be created or used on this table; SIs have previously been created and used in the previous two queries on different columns. This will also not be the most recent SI to be created, more will soon follow. However, this is by far the least effective use of a SI, because of both the selectivity of the query and distribution of data. Here, only some 66MB or so of physical IOs have been saved.

We now repeat this process, running a query against a different column, being very selective and ensuring a SI is created and used. We finally reach a point when 8 SIs have been created on the table:

SQL> select * from radiohead where col9=0;
no rows selected

Elapsed: 00:00:00.02

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 15984 | 811K| 42561 (1)| 00:08:31 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 15984 | 811K| 42561 (1)| 00:08:31 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL9"=0)
 filter("COL9"=0)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 8792.94531
 cell physical IO interconnect bytes returned by smart scan 45.3872757

OK, we’ve now reached the point where  8 SIs have definitely been created on this table.

If we now run a query that could potentially use a SI but isn’t particularly effective, basically on a par to the one we saw created previously on COL4:

SQL> select * from radiohead where col10=42;
1536 rows selected.

Elapsed: 00:00:00.73

Execution Plan
----------------------------------------------------------
Plan hash value: 2516349655

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1600 | 83200 | 42577 (1)| 00:08:31 |
|* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42577 (1)| 00:08:31 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("COL10"=42)
 filter("COL10"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 8792.94531
cell physical IO interconnect bytes returned by smart scan 45.9288864

We notice that no bytes have been saved here via a SI. We can run this query repeatedly and the results will be the same. No SI is created and no bytes are saved. Although Oracle could potentially create a SI and save some work, the fact we already have 8 SIs created for this table means we have already reached the limit on the number of SIs that can be created for this table. 8 is it.

Let’s run another query now using yet another different column (COL12), but this time it’s again a very selective query, much more selective and efficient than the previous query based on COL4 in which a SI had been created:

SQL> select * from radiohead where col12=42;
8 rows selected.

Elapsed: 00:00:00.39

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 19 | 988 | 42607 (1)| 00:08:32 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 19 | 988 | 42607 (1)| 00:08:32 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL12"=42)
 filter("COL12"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 9526.01563
 cell physical IO interconnect bytes returned by smart scan 53.5218124

This time however the number of bytes saved has again gone up from previously meaning that indeed a new SI has been created and used for column COL12. But this makes 9 SIs in total now for this table where the limit should be a maximum of 8 ?

Does this mean that a previously created SI has been dropped and replaced by this new one. If so, which SI is now gone ?

Well, let’s go back to the first SI we created and the one that hasn’t been used for the longest period of time. If we re-run the first query again:

SQL> select * from radiohead where id=42;
8 rows selected.

Elapsed: 00:00:00.05

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 8 | 416 | 42425 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 8 | 416 | 42425 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ID"=42)
 filter("ID"=42)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844
 cell physical IO interconnect bytes returned by smart scan 53.5264816

We notice that not only has it used a SI, but it has saved the maximum amount of IO bytes possible here meaning there was no “warming up” processes happening here indicating a newly created SI. So not only did it use a SI, it clearly used a previously created one. So this SI was not obviously impacted by the creation of this “9th” SI.

However, if we run the query again that used column COL4, the query that previously used a SI but was by far the least effective in saving physical IOs:

SQL> select * from radiohead where col4 = 4242;
1600 rows selected.

Elapsed: 00:00:00.73

Execution Plan
 ---------------------------------------------------------------------------------------
 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
 ---------------------------------------------------------------------------------------
 | 0 | SELECT STATEMENT | | 1600 | 83200 | 42486 (1)| 00:08:30 |
 |* 1 | TABLE ACCESS STORAGE FULL| RADIOHEAD | 1600 | 83200 | 42486 (1)| 00:08:30 |
 ---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("COL4"=4242)
 filter("COL4"=4242)

SQL> select name , value/1024/1024 MB from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in
 ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index');

NAME MB
 ---------------------------------------------------------------- ----------
 cell physical IO bytes saved by storage index 10726.9844 No Change !!
 cell physical IO interconnect bytes returned by smart scan 54.1522598

This time we notice there’s no change to the number of bytes saved by a SI. No matter how often we run this query now, no SI is used. So the SI that was created previously is now gone as a result of creating the more effective SI on the COL12 column. Indeed there are still just the 8 SIs on this table.

So Oracle will indeed limit the number of SIs to 8 for each table/storage region. However, it’s not simply a case of “first in first served” or some such, with Oracle using (undocumented) performance metrics to determine which 8 SIs/columns to choose. This means it might be possible in some rarer scenarios where more than 8 columns get referenced in SQL statements for SIs to come and go depending on changing workloads.  Another example of where database indexes may yet play a role in Exadata environments, where currently tables have more than 8+ indexed columns.

Comments»

1. Uwe Hesse - May 28, 2013

Great piece of research & demonstration, Richard! I’m always pointing to your posts on Oracle University Exadata courses 🙂
Kind regards
Uwe

Like

Richard Foote - May 31, 2013

Thanks for the kind words, much appreciated 🙂

I’ll actually be in your part of the world in the next couple of weeks delivering a seminar for Oracle University so I’ll be looking forward to escaping our winter here for a while !!

Like


Leave a comment