jump to navigation

Exadata Storage Indexes Part II – Differences With Database Indexes (Space Dementia) October 9, 2012

Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
trackback

Let’s explore some of the key differences between Storage Indexes (SI) and Database Indexes (DI). In no particular order, they include:

SIs are structures that exist only within the storage servers of an Exadata box, while DIs logically exist and can be accessed within the database servers.

SIs are purely memory only structures while DIs are physical segments that take up storage. As such, DIs are relatively expensive to both create and subsequently maintain as they generate considerable undo and redo within the database, can cause concurrency issues and require storage resources. SIs meanwhile require no physical storage and have little impact on DML operations.

SIs are generated automatically and transparently while DIs generally need to be explicitly created (except in some scenarios such as in the creation of Primary/Unique keys when they can be implicitly created).

SIs being memory only structures are transient in that if a storage server were to be restarted, the corresponding SIs are lost and need to be re-created. Additionally, Oracle may decide to drop a SI for a particular column and create one on a different column depending on current load and conditions. DIs are permanent objects that need to be explicitly dropped (except in some scenarios such as the dropping/disabling of Primary/Unique Key constraints when they can be implicitly dropped).

SIs can be stored in memory as they contain very brief summary information, just the min/max value and a null flag for each 1MB storage region. A corresponding DI (especially a B-Tree) would generally be significantly larger as it needs to store all indexed values from the table with associated rowids (unless compressed but still likely much larger even so).

SIs can index only a portion of a table at a specific point in time as they get generated and dropped (see above). DIs index the entire table/partition (prior to 12c), unless using smarts such as decode function-based indexes (which also index the entire table but based only on the results of the function).

SIs are limited to only 8 columns whereas DIs have no such limitations per table.

SIs reference a 1MB storage region whereas a DI references a specific database block (say 8K). Therefore, a DI is more “focused” in terms of the minimum amount of data that needs to be accessed.

SIs basically work by determining which areas of storage can not possibly contain data of interest, accessing just those storage regions that might contain data of interest. Therefore, it’s quite possible for a SI to generate false positives by having to access storage that might in the end not actually contain data of interest after all. A DI meanwhile via the rowid explicitly points to the exact location of an indexed value and does not generate false positives. A bitmap index is a little different in the manner in which rowids are stored and generated (and can have 0 bits set for rows that don’t actually exist) but again do not generate false positives when actually accessing the table blocks.

SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not.

As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.

SIs are used, even if the majority of the data needs to be accessed regardless. As SIs are only used during a FTS, the concept of only using an index when it’s the cheaper alternative doesn’t apply to SIs. If a SI can save (say) just 5% of physical I/Os during a FTS, it’s better than no savings at all. DIs meanwhile are only used when the Cost Based Optimizer (CBO) considers it the cheapest option when accessing data.

As SIs are storage based structures, the CBO has no knowledge of their existence and play no part in the CBO cost calculations. DIs are fully known to the CBO and the DI related statistics are an important factor in the CBO calculations. The CBO only determines whether a FTS is the cheaper alternative, however the decision to perform a Direct-Read operation and so potentially enable the use of SIs is a run-time decision not made by the CBO.

SIs can be effectively used for IS NULL predicates, thanks to the null existence flag component of the SI. B-Tree Indexes can’t if all indexed columns are null (as such entries are not indexed) although Bitmap indexes can.

SIs can not be used to police Primary/Unique constraints. DIs can.

SIs can not be used to avoid performance issues in relation to Foreign Keys (such as locking implications and FTS requirements on child tables when deleting parent rows). DIs can.

SIs can not avoid sort operations. DIs can as data read via an index range scan is guaranteed to be returned in the order of the index entries.

SIs can not provide additional statistical information to the CBO, such as accurate selectivity information in multi-column predicates available in concatenated index distinct keys statistics. DIs can.

SIs can not be used to efficiently access the MIN/MAX of a column. DIs can.

Function-Based SIs are not supported. Function-Based DIs are supported.

SIs can not be treated as smaller tables and used as an alternative by the CBO to access just index related data, eg. select count(*), select indexed_column, etc. as SIs do not contain all the required data and are not visible to the CBO anyways. DIs can be treated as smaller tables and accessed accordingly if appropriate.

OK, that’s enough of a list for now to get one thinking about some of these differences 🙂

In the following posts, I’ll go through the benefits of SIs and show examples of how they’re implemented and used by Oracle.

Comments»

1. Ora600Tom - October 9, 2012

Excellent! Thank you very much Richard.
Thomas

Like

Richard Foote - October 10, 2012

My pleasure Tom 🙂

Like

2. Rebrickable (@Rebrickable) - October 10, 2012

Why can’t Oracle incorporate the use of storage indexes for normal full scans on normal storage, not just on the Exadata platform?

Like

Richard Foote - October 10, 2012

Hi Brick

Well, I can’t really answer as I don’t make those decisions. But I would suggest these things are a lot easier to implement if you control both the hardware infrastructure and software. And it makes sense to me if there’s a dash of “secret sauce” mixed into the Exadata platform 🙂

Like

3. saruamit4 - October 10, 2012

Wow !! SIs Vs DIs, Thank you Richard. Now waiting for next post.

Like

Richard Foote - October 10, 2012

Hopefully you won’t have to wait too long 🙂

Like

4. Uwe Hesse - October 10, 2012

Essentially there are much more differences between Database Indexes and Storage Indexes than they have in common.

They have in fact almost only the term ‘index’ in common – the post is therefore a little like comparing watch dogs and hot dogs (we probably shouldn’t have named these things indexes!)

That is not meant to say it doesn’t contain interesting information about watch dogs 😉

Like

Richard Foote - October 11, 2012

Hi Uwe

Thanks for the comments although having missed out on breakfast this morning and all this talk of hot dogs has got me really hungry now 🙂

OK, you’ve just given me the idea that I need a post now that describes what both SIs and DIs do have in common.

Like

5. Houri Mohamed - October 10, 2012

Thanks very much for this comparison.
You said
“SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not”

But Smart Scan operations can also be done on regular B-Tree or Bitmap Index (DI) if these two segments are fast fully scanned via a direct path read.
Do you mean that its impossible to see in the same execution plan DI and SI smart scan?

You said
“As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.

But DI Fast Fully scanned via a direct path read will not be cached in the Database Buffer Cache also. right?

Best regards

Like

Richard Foote - October 11, 2012

Hi Houri

Yes, smart scans can be done on indexes (as I said), but as indexes don’t have storage indexes, they can’t be used concurrently. Although of course, within the same execution plan, you can have a mixture of all sorts of operations (index scans, FTS both direct and non-direct, etc.). Note also that SIs are never mentioned within execution plans.

Yes, fast full index scanned via a direct read will also bypass the buffer cache, which is why I mentioned that exception.

Keep the questions coming 🙂

Like

6. Paul - December 14, 2012

You mentioned that “SIs cannot be used to efficiently access the MIN/MAX of a column”. It seems like they should be able to help out somewhat by excluding the 1Mb areas which CANNOT store the min/max. Though this may not be much of a help in some cases. Is that what you meant by “efficiently”?

Like

Richard Foote - December 18, 2012

Hi Paul

I thought it easier to write a quick blog post rather than try and squeeze an answer within a comment 🙂

Like

7. Oracle Datenbanken - May 14, 2013

Thanks a lot for the great and useful comparison!
It helped me a lot to get more information about SIs and DIs.
I´d like to learn more about Oracle so I´m waiting for your new posts.

Have a nice Day!

Like

Richard Foote - May 15, 2013

I’ll try and keep them coming 🙂

Like


Leave a comment