Exadata Storage Indexes – Part I (Beginning To See The Light) October 4, 2012Posted by Richard Foote in Exadata, Oracle Indexes, Storage Indexes.
Thought I might discuss Exadata Storage Indexes, explore what they are, how they work and discuss their advantages and disadvantages. Following is but a brief introduction on this very cool Exadata feature.
A Storage Index basically maintains summary information about database table data down on the Exadata storage servers. The information consists of the minimum value, the maximum value and a flag to denote whether any Nulls exist for up to 8 different columns within each 1M storage region of disk.
My little diagram above attempts to describe this (click on image for a larger version).
The Storage Indexes are created automatically and transparently based on the SQL predicate information executed by Oracle and passed down to the storage servers from the database servers. Storage Indexes take up no physical storage of themselves and are built and maintained entirely in memory. As only this very basic summary information is stored for a maximum of 8 columns for each 1M of storage, Storage Indexes are very lightweight and can be created and maintained with minimal general overheads.
So how are they used ?
During an Exadata Smart Scan, Oracle can perform predicate filtering down at the storage layer and so only return just the rows of interest back up to the database. As part of this process, Oracle can use the Storage Indexes to visit just the 1M storage regions that can potentially contain rows of interest. Those storage regions that can’t possibly contain data of interest can be eliminated and not accessed at all during the Smart Scan operation.
So, in a very simple example, if we have an SQL predicate such as WHERE CODE = 5, if a corresponding Storage Index on the CODE column of the first 1M region of the table has MIN=2 and MAX=10, Oracle would need to access this portion of the table as the CODE value of interest could potentially exist here. However, if the next 1M storage region had a CODE Storage Index with a MIN=7 and MAX=12, then the CODE value of 5 can’t possibly exist within this portion of the table and can be ignored and not accessed at all during the Smart Scan.
So depending on the column, predicates and data distribution, a Storage Index can potentially eliminate having to physically access significant portions of a table during a Smart Scan. In an extreme example, on a search of a CODE value = 42 where the maximum CODE value that actually exists is say 35, a Smart Scan can perform a so-called Full Table Scan (FTS) via a Storage Index that doesn’t actually have to perform any physical I/O at all and can ignore the entire table.
The less physical I/O performed, the less work required and the faster the response time. If a FTS of say a 500GB table only had to physically read and access a few MBs here and there AND just return the data of interest back to the database servers, that would significantly improve the overall performance and overheads associated with the FTS.
Storage Indexes can be very cool indeed.
Much more to come …