Exadata Storage Indexes Part II – Differences With Database Indexes (Space Dementia) October 9, 2012Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
Let’s explore some of the key differences between Storage Indexes (SI) and Database Indexes (DI). In no particular order, they include:
SIs are structures that exist only within the storage servers of an Exadata box, while DIs logically exist and can be accessed within the database servers.
SIs are purely memory only structures while DIs are physical segments that take up storage. As such, DIs are relatively expensive to both create and subsequently maintain as they generate considerable undo and redo within the database, can cause concurrency issues and require storage resources. SIs meanwhile require no physical storage and have little impact on DML operations.
SIs are generated automatically and transparently while DIs generally need to be explicitly created (except in some scenarios such as in the creation of Primary/Unique keys when they can be implicitly created).
SIs being memory only structures are transient in that if a storage server were to be restarted, the corresponding SIs are lost and need to be re-created. Additionally, Oracle may decide to drop a SI for a particular column and create one on a different column depending on current load and conditions. DIs are permanent objects that need to be explicitly dropped (except in some scenarios such as the dropping/disabling of Primary/Unique Key constraints when they can be implicitly dropped).
SIs can be stored in memory as they contain very brief summary information, just the min/max value and a null flag for each 1MB storage region. A corresponding DI (especially a B-Tree) would generally be significantly larger as it needs to store all indexed values from the table with associated rowids (unless compressed but still likely much larger even so).
SIs can index only a portion of a table at a specific point in time as they get generated and dropped (see above). DIs index the entire table/partition (prior to 12c), unless using smarts such as decode function-based indexes (which also index the entire table but based only on the results of the function).
SIs are limited to only 8 columns whereas DIs have no such limitations per table.
SIs reference a 1MB storage region whereas a DI references a specific database block (say 8K). Therefore, a DI is more “focused” in terms of the minimum amount of data that needs to be accessed.
SIs basically work by determining which areas of storage can not possibly contain data of interest, accessing just those storage regions that might contain data of interest. Therefore, it’s quite possible for a SI to generate false positives by having to access storage that might in the end not actually contain data of interest after all. A DI meanwhile via the rowid explicitly points to the exact location of an indexed value and does not generate false positives. A bitmap index is a little different in the manner in which rowids are stored and generated (and can have 0 bits set for rows that don’t actually exist) but again do not generate false positives when actually accessing the table blocks.
SIs are only used during Smart Scan operations, which in turn are only performed during direct-reads of full scans of larger database segments (tables / indexes / materialized views and partitions thereof). Therefore SIs are only used when DIs are not.
As SIs access data during a Smart Scan, the resultant data by-passes the Database Buffer Cache and can not be re-used by subsequent database users/operations. Therefore, SI accessed data may need to be frequently physically re-accessed. DIs perform single block reads (except for Fast Full Index Scans) which are cached in the Database Buffer Cache and which can therefore be globally reused by the database. Once cached, it may be unnecessary to subsequently physically re-access the DI retrieved data.
SIs are used, even if the majority of the data needs to be accessed regardless. As SIs are only used during a FTS, the concept of only using an index when it’s the cheaper alternative doesn’t apply to SIs. If a SI can save (say) just 5% of physical I/Os during a FTS, it’s better than no savings at all. DIs meanwhile are only used when the Cost Based Optimizer (CBO) considers it the cheapest option when accessing data.
As SIs are storage based structures, the CBO has no knowledge of their existence and play no part in the CBO cost calculations. DIs are fully known to the CBO and the DI related statistics are an important factor in the CBO calculations. The CBO only determines whether a FTS is the cheaper alternative, however the decision to perform a Direct-Read operation and so potentially enable the use of SIs is a run-time decision not made by the CBO.
SIs can be effectively used for IS NULL predicates, thanks to the null existence flag component of the SI. B-Tree Indexes can’t if all indexed columns are null (as such entries are not indexed) although Bitmap indexes can.
SIs can not be used to police Primary/Unique constraints. DIs can.
SIs can not be used to avoid performance issues in relation to Foreign Keys (such as locking implications and FTS requirements on child tables when deleting parent rows). DIs can.
SIs can not avoid sort operations. DIs can as data read via an index range scan is guaranteed to be returned in the order of the index entries.
SIs can not provide additional statistical information to the CBO, such as accurate selectivity information in multi-column predicates available in concatenated index distinct keys statistics. DIs can.
SIs can not be used to efficiently access the MIN/MAX of a column. DIs can.
Function-Based SIs are not supported. Function-Based DIs are supported.
SIs can not be treated as smaller tables and used as an alternative by the CBO to access just index related data, eg. select count(*), select indexed_column, etc. as SIs do not contain all the required data and are not visible to the CBO anyways. DIs can be treated as smaller tables and accessed accordingly if appropriate.
OK, that’s enough of a list for now to get one thinking about some of these differences :)
In the following posts, I’ll go through the benefits of SIs and show examples of how they’re implemented and used by Oracle.