Cost of Virtual Indexes (Little Lies) May 30, 2012
Posted by Richard Foote in CBO, Clustering Factor, Fake Indexes, Oracle Indexes, Virtual Indexes.5 comments
I’ve previously discussed Virtual Indexes and how they can be used to do basic “what if” analysis if such an index really existed. However, a recent comment on the OTN forums regarding using them to compare index costs made me think a follow-up post regarding the dangers of Virtual Indexes might be warranted.
The big advantage of a Virtual Index of course is that it doesn’t really exist and so consumes no storage and can be created extremely quickly/cheaply. The disadvantage of a Virtual index is that it doesn’t really exist and so Oracle can not collect segment level statistics. Without statistics however, the CBO has a very tough time of doing its job properly …
To illustrate, a simple little demo. I begin by creating a table in which the data in the table is stored in CODE column order. An index on the CODE column would therefore have an excellent (very low) Clustering Factor. Note that the Clustering Factor is the most important index related statistic regarding the efficiency and potential cost of using the index.
SQL> create table bowie_ordered (id number, code number, name varchar2(30)); Table created. SQL> create sequence bowie_seq; Sequence created. SQL> declare 2 begin 3 for i in 1..100 loop 4 for j in 1..10000 loop 5 insert into bowie_ordered values (bowie_seq.nextval, i, 'DAVID BOWIE'); 6 end loop; 7 end loop; 8 end; 9 / PL/SQL procedure successfully completed.
OK, I’m now going to create a Virtual Index on the CODE column and collect 100% accurate statistics on the table:
SQL> create index bowie_ordered_i on bowie_ordered(code) nosegment; Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_ORDERED', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
Oracle allows statistics to be collected on the table and associated Virtual Index (so that existing statistic gathering jobs won’t now fail), however without an associated segment, no statistics can actually be derived for the index itself.
SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_name='BOWIE_ORDERED_I'; no rows selected
I’ll list the system statistics so anyone who wants to replicate the demo can get similar results (the database blocksize is 8K):
SQL> SELECT pname, pval1 FROM SYS.AUX_STATS$ WHERE pname IN ('SREADTIM', 'MREADTIM', 'MBRC', 'CPUSPEED'); PNAME PVAL1 ------------------------------ ---------- SREADTIM 2 MREADTIM 10 CPUSPEED 1000 MBRC 20
If we run the following query:
SQL> set arraysize 5000 SQL> select * from bowie_ordered where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1678744259 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 1005 (13)| 00:00:03 | |* 1 | TABLE ACCESS FULL| BOWIE_ORDERED | 10000 | 195K| 1005 (13)| 00:00:03 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3471 consistent gets 0 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
We note that the CBO uses a Full Table Scan as the CBO has no real choice here as Virtual Indexes are not considered by default.
However, if we change the following hidden parameter and re-run:
SQL> alter session set "_use_nosegment_indexes" = true; Session altered. SQL> select * from bowie_ordered where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1860500051 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 9(0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED | 10000 | 195K| 9(0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_ORDERED_I | 10000 | | 1(0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3471 consistent gets 0 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
We notice a few interesting details. Firstly, the CBO has decided to use the Virtual Index however the number of consistent gets remains the same as the previous run so we can clearly see that behind the covers, the Full Table Scan is still performed. The index is not “really there” and so at execution time, the SQL statement is reparsed using the next best available plan.
If we look at the execution plan costs, both the estimate row (10000) and byte values are spot on as these statistics are based on the underlining table/column statistics and the 100 distinct CODE values are evenly distributed. However, the index related costs look remarkably low. Just a cost of 1 to read the index and extract 10,000 index entries (that means an index entry is less than 1 byte in length on average !!). Just a cost of 9 to visit the table and read 10,000 rows. Even with the most efficient of physical indexes, these costings are not realistic and are based on highly questionable default metrics.
Basically, the creation of this Virtual Column is telling us that there is no reason why the index couldn’t potentially be used, IF (that’s a big IF in case no-one noticed) the actual index related statistics are such that the CBO determines the index to be the cheaper option. But it depends on the actual characteristics of the index which can’t be accurately determined until it’s been physically created.
As the Virtual Index suggests the index might be used if it existed, let’s now create it for real:
SQL> drop index bowie_ordered_i; Index dropped. SQL> create index bowie_ordered_i on bowie_ordered(code); Index created. SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_ name='BOWIE_ORDERED_I'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- BOWIE_ORDERED_I 1000000 3546
As predicted, a Clustering Factor of 3546 on an index with 1M index entries is indeed nice and low.
If we now re-run the query again:
SQL> select * from bowie_ordered where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1860500051 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 60(4)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE_ORDERED | 10000 | 195K| 60(4)| 00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_ORDERED_I | 10000 | | 23(5)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 61 consistent gets 21 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
Indeed, the index has been used by the CBO. However, note that the costs are substantially higher (and more accurate) than previously suggested with the Virtual Index. Indeed the final cost of 60 is very close to the number of consistent gets (61) required by the execution plan and so suggests the CBO is making reasonable index based calculations here.
OK, another demo, but this time with a table in which the CODE values are distributed throughout the whole table (rather than being perfectly clustered together as in the first example):
SQL> create table bowie_random (id number, code number, name varchar2(30)); Table created. SQL> insert into bowie_random select rownum, mod(rownum,100)+1, 'DAVID BOWIE' from dual connect by level <= 1000000; 1000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>'BOWIE', tabname=>'BOWIE_RANDOM', cascade=>true, estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
When we now create a Virtual Index based on the CODE column and re-run the same query:
SQL> create index bowie_random_i on bowie_random(code) nosegment; Index created. SQL> select * from bowie_random where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 711259049 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 9 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM | 10000 | 195K| 9 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_RANDOM_I | 10000 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3470 consistent gets 0 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
We notice that both the execution plan and all the associated costs are identical to those of the previous example. So although the actual Clustering Factor of the index is likely to be dramatically greater here than it was in the previous example and so likely dramatically impact the costs associated with using this index, the Virtual Index is treated and costed identically. This is the simple consequence of not having the physical index structure by which to calculate the appropriate segment statistics.
If we now physically create this index for real:
SQL> drop index bowie_random_i; Index dropped. SQL> create index bowie_random_i on bowie_random(code); Index created. SQL> select index_name, num_rows,clustering_factor from dba_indexes where index_ name='BOWIE_RANDOM_I'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- BOWIE_RANDOM_I 1000000 344700
We can see that indeed the Clustering Factor is dramatically worse than before, increasing here from 3546 to 344700.
If we now re-run the query:
SQL> select * from bowie_random where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1983602984 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 1005 (13)| 00:00:03 | |* 1 | TABLE ACCESS FULL| BOWIE_RANDOM | 10000 | 195K| 1005 (13)| 00:00:03 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3470 consistent gets 0 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
We see the CBO has decided to perform the now cheaper Full Table Scan. Although the Virtual Index on this column was used, once the actual characteristics of the index are determined via the index statistics, the CBO has decided the actual physical index was just too expensive to use to retrieve the 1% of rows.
If we re-run the query with an index hint:
SQL> select /*+ index (bowie_random) */ * from bowie_random where code = 42; 10000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 711259049 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 195K| 3483 (1)| 00:00:07 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE_RANDOM | 10000 | 195K| 3483 (1)| 00:00:07 | |* 2 | INDEX RANGE SCAN | BOWIE_RANDOM_I | 10000 | | 23 (5)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 3472 consistent gets 21 physical reads 0 redo size 100802 bytes sent via SQL*Net to client 534 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10000 rows processed
The index is used but we can see why at a cost of 3483, the Full Table Scan at a cost of only 1005 was selected by the CBO.
Virtual Indexes can be useful to quickly determine whether an index is a viable option if it were to be actually created. However, caution needs to be exercised if Virtual Indexes are used for cost comparison purposes and although Virtual Indexes might be used by the CBO, it might be another story entirely once the index is physically created and the actual index related statistics determined.
Introduction to Fake / Virtual / NOSEGMENT Indexes January 11, 2008
Posted by Richard Foote in Fake Indexes, Index Access Path, NOSEGMENT Option, Oracle Cost Based Optimizer, Oracle Indexes, Virtual Indexes.10 comments
OK, as promised, answer to index fact #5 you may not have known:
“It’s possible to make the CBO reference and use within an execution plan indexes that don’t in actual fact exist”.
Before I start, please note this feature is not officially documented other than the odd Metalink note and requires the setting of an undocumented parameter to work, so please exercise caution.
Fake Indexes (also known as Virtual or Nosegment Indexes) have been around for a long time, since 8i days. They’re used primarily by Oracle Enterprise Manager and its Tuning Pack which has various wizards that can do “what if” type analysis. One of these is the Index Wizard which can kinda “pretend” to create an index and see what the Cost Based Optimizer might do if such an index really existed.
It’s possible to create these Fake indexes manually by using the NOSEGMENT clause when creating an index:
CREATE INDEX Bowie_idx ON Bowie_Table(Ziggy) NOSEGMENT;
This will populate some (but not many) DD related tables but will not actually create an index segment or consume any actual storage. It’s not maintained in any way by DML operations on the parent table and it can’t be altered or rebuilt as can a conventional, “real” index (it will generate an ORA-08114 error if you try to do so). You can analyze or run dbms_stats over the index but the index is not treated as analyzed as such (as can be seen via a 10053 trace).
It’s also only visible to the CBO, if and only if a session has the following parameter set:
ALTER SESSION SET “_use_nosegment_indexes” = true;
The CBO will now consider the index and potentially include it within an execution plan. However, at execution time Oracle can of course not use the index and will revert to the next best thing.
A Fake index is basically an index you have when you don’t really have an index in order to see if it could be useful if it really existed.
This Fake Indexes Demo shows how they work and can be used.
1 down, 7 to go … 😉
AIOUG Sangam21: “10 Things You Might Not Know, But Really Should, About Oracle Indexes” November 26, 2021
Posted by Richard Foote in Oracle Indexes, Sangam21.add a comment
It seems to be the conference season 🙂
I have the very great pleasure of presenting at this years AIOUG Sangam21 Virtual Conference, on Friday, 3 December 2021 between 12pm-1pm IST (+5.5GMT). I’ll be presenting a version of my “10 Things You Might Not Know, But Really Should, About Oracle Indexes” presentation.
I’ve yet to finalise my list, but it will definitely include information on how the CBO costs the use of an index, will definitely include information on how to use the TABLE_CACHED_BLOCKS statistics gathering preference, and will definitely include as much useful tips and tricks as I can cram into a 1 hour presentation.
To register and join in on the fun, please visit: https://www.aioug.org/sangam21#agenda
Oracle Groundbreakers APAC Virtual Tour 2021: “Automatic Indexing: An Update On Improvements and New Capabilities” November 19, 2021
Posted by Richard Foote in Automatic Indexing.add a comment
I’m very excited to have another opportunity to present at the upcoming APACOUC Oracle Groundbreakers APAC Virtual Tour 2021 a somewhat expanded version of my new presentation “Automatic Indexing: An Update On Improvements and New Capabilities”.
This massive virtual conference runs from 22 November through to 11 December and features some of the very best talent in the Oracle community. For all the session details and to register for FREE, please visit: https://apacgbt.mykommu.com/
My session is on at Monday, 29 November at 12pm (AEDT), with the following description:
“One the most impressive and exciting database innovations introduced in Oracle Database 19c is “Automatic Indexing”, an expert system built into the database engine that not only recommends potential new indexes to create, but will actually introduce new indexes in a safe and automated manner.
This session explores in-depth this ground breaking feature, detailing how Automatic Indexing works, how it can be deployed and highlights a number of improvements and new capabilities that have been introduced since Automatic Indexing was first released.
Attendees will gain an excellent appreciation of the power to Automatic Indexing and how this feature can substantially assist in reducing the risk of having a sub-optimal indexing strategy that causes performance implications.”
Even if you don’t have the opportunity to directly use Automatic Indexing, you will pick up some useful indexing hints and tips.
Hope to see you then 🙂
Oracle 19c Automatic Indexing: Function-Based Indexes? Part II (If You Can See Me) February 5, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.1 comment so far
In my previous post, I discussed how Automatic Indexing does not currently support creating an index based on a function or expression predicate, even if it’s an equality predicate. You must manually create the associated function-based index.
However, if you have access to the application, there’s a better strategy when frequently searching on a function-based predicate. That’s to create a Virtual Column and use this column in your searching criteria (as mentioned by Connor McDonald in this comment).
To illustrate, I’m going to drop the previously manually created function-based index and hence the associated hidden virtual column, as Oracle quite rightly doesn’t allow you to have two virtual columns based on the same expression in the same table.
SQL> drop index david_upper_name_i; Index dropped.
Since Oracle 11g, Oracle has supported the use of Visible Virtual Columns, a column that doesn’t physically exist, but defines a function/expression that can be easily accessed and populated when queried.
I’ll next create a Virtual Column called UPPER_NAME that is defined not based on a Data Type, but on the result on the UPPER function on the previously defined NAME column:
SQL> alter table david add (upper_name as (upper(name))); Table altered.
Regardless of size of table, this column is added virtually instantly (pun fully intended), as no data is physically stored in the table itself. I view it (yep, another pun) as a “mini-view”, that can be used to hide complexity from the developer, with the actual data derived at run-time when the column is accessed in an SQL.
After I generate fresh statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null); PL/SQL procedure successfully completed. SQL> select column_name, hidden_column, virtual_column, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME HID VIR NUM_DISTINCT DENSITY HISTOGRAM -------------------- --- --- ------------ ---------- --------------- NAME NO NO 10000000 0 HYBRID MORE_STUFF9 NO NO 1 .00000005 FREQUENCY MORE_STUFF8 NO NO 1 .00000005 FREQUENCY MORE_STUFF7 NO NO 1 .00000005 FREQUENCY MORE_STUFF6 NO NO 1 .00000005 FREQUENCY MORE_STUFF5 NO NO 1 .00000005 FREQUENCY MORE_STUFF4 NO NO 1 .00000005 FREQUENCY MORE_STUFF3 NO NO 1 .00000005 FREQUENCY MORE_STUFF2 NO NO 1 .00000005 FREQUENCY MORE_STUFF10 NO NO 1 .00000005 FREQUENCY MORE_STUFF1 NO NO 1 .00000005 FREQUENCY ID NO NO 10000000 0 HYBRID CODE NO NO 10000 .0001 HYBRID UPPER_NAME NO YES 10000000 0 HYBRID
Note how the UPPER_NAME virtual column is NOT hidden and now has up to date statistics.
We can now run this simplified query based on the new UPPER_NAME column, which does not need to include the potentially complex function expression:
SQL> select * from david where upper_name='DAVID BOWIE 42'; 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2426813604 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 200 | 3349 (6) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | DAVID | 1 | 200 | 3349 (6) | 00:00:01 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("UPPER_NAME"='DAVID BOWIE 42') filter("UPPER_NAME"='DAVID BOWIE 42') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 263469 consistent gets 263452 physical reads 0 redo size 1328 bytes sent via SQL*Net to client 375 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
If we look at portions of the subsequent Automatic Indexing report:
SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 360.71 MB (360.71 MB / 0 B) Indexes dropped : 0 SQL statements verified : 2 SQL statements improved (improvement factor) : 2 (263476.8x) SQL plan baselines created : 0 Overall improvement factor : 263476.8x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- --------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------- | BOWIE | DAVID | SYS_AI_4k4mkgkw049ht | UPPER_NAME | B-TREE | NONE | --------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 7tfqh3pu526mt SQL Text : select * from david where upper_name='DAVID BOWIE 42' Improvement Factor : 263484.7x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 1471249 1414 CPU Time (s): 300584 986 Buffer Gets: 3161816 4 Optimizer Cost: 3349 4 Disk Reads: 3161432 3 Direct Writes: 0 0 Rows Processed: 12 1 Executions: 12 1 PLANS SECTION -------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 2426813604 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 3349 | | | 1 | TABLE ACCESS STORAGE FULL | DAVID | 1 | 200 | 3349 | 00:00:01 | ----------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - cardinality_feedback = yes - With Auto Indexes ----------------------------- Plan Hash Value : 1447691372 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 200 | 4 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 1 | 200 | 4 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_4k4mkgkw049ht | 1 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("UPPER_NAME"='DAVID BOWIE 42') Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We see from the report that Automatic Indexing has now created the associated, implicitly created function-based index (SYS_AI_4k4mkgkw049ht) based on the virtual UPPER_NAME column:
SQL> select index_name, index_type, auto, constraint_index, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID'; INDEX_NAME INDEX_TYPE AUT CON VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR -------------------- --------------------------- --- --- --------- -------- ---------- ----------- ----------------- SYS_AI_4k4mkgkw049ht FUNCTION-BASED NORMAL YES NO VISIBLE VALID 10000000 43104 2136839 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='DAVID' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION -------------------- -------------------- --------------- SYS_AI_4k4mkgkw049ht UPPER_NAME 1
If we now re-run the SQL query:
SQL> select * from david where upper_name='DAVID BOWIE 4242'; 1 row selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1447691372 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 200 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 1 | 200 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_4k4mkgkw049ht | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("UPPER_NAME"='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1334 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The CBO now uses the new Automatic Index to significantly improve the performance of the query.
So not only is using a user defined Virtual Column a cleaner solution with respect to the frequent use of a function-based expressions, but has the added advantage of being supported with Automatic Indexing.
Oracle 19c Automatic Indexing: Function-Based Indexes? (No Plan) February 4, 2021
Posted by Richard Foote in 19c, 19c New Features, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Function Based Indexes, Oracle, Oracle Cloud, Oracle General, Oracle Indexes, Oracle19c, Virtual Columns.3 comments
I previously discussed how Automatic Indexing only currently supports Equality based predicates.
The question I have today is does Automatic Indexing support function-based indexes? Let’s take a look.
The below DAVID table has the key column NAME which is an effectively unique VARCHAR2 column:
SQL> create table david (id number, code number, name varchar2(42), more_stuff1 varchar2(42), more_stuff2 varchar2(42), more_stuff3 varchar2(42), more_stuff4 varchar2(42), more_stuff5 varchar2(42), more_stuff6 varchar2(42), more_stuff7 varchar2(42), more_stuff8 varchar2(42), more_stuff9 varchar2(42), more_stuff10 varchar2(42)); Table created. SQL> insert into david select rownum, mod(rownum, 10000)+1, 'David Bowie '|| rownum, 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust', 'Ziggy Stardust' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', estimate_percent=>null); PL/SQL procedure successfully completed.
If we look at the current details of the table columns:
SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID
We notice the same oddity of my previous post that all columns have histograms…
Let’s run the following query with an UPPER function-based predicate that returns only the one row:
SQL> select * from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2426813604 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K | 17M | 3350 (6) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | DAVID | 100K | 17M | 3350 (6) | 00:00:01 | ----------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(UPPER("NAME")='DAVID BOWIE 4242') filter(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 263469 consistent gets 263452 physical reads 0 redo size 1256 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
What does Automatic Indexing make of this scenario?
Basically, it does nothing. Currently, Automatic Indexing does NOT support such function-based indexes, even with equality based predicates (as of at least version 19.5.0.0.0). If we look at the next Automatic Indexing report:
SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 2 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
No such function-based index is ever created by Automatic Indexing:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='DAVID'; no rows selected
To improve the performance of this query, one has to manually create the necessary function-based index:
SQL> create index david_upper_name_i on david(upper(name)); Index created.
If we now re-run the query:
SQL> select name from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2675555529 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K | 4199K | 3175 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 100K | 4199K | 3175 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | DAVID_UPPER_NAME_I | 40000 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 369 bytes sent via SQL*Net to client 384 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The query now uses the function-based index to significantly improve the performance of this query, with just 5 consistent gets.
Note however as with all function-based indexes, by default the estimated cardinality estimate and associated CBO costs are way off (100K rows are estimated, not the 1 row that is actually returned). This is due to the CBO having no real idea of the number and distribution of values coming out of the “black box” function-based predicate.
This is why Oracle automatically creates an hidden virtual column by which to store the necessary statistics associated to the function (in this case the SYS_NC00014$ column):
SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID SYS_NC00014$ NONE
But we need to first collect statistics on this hidden virtual column for the statistics to be populated:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DAVID', no_invalidate=> false, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1'); SQL> select column_name, num_distinct, density, histogram from dba_tab_cols where table_name='DAVID'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- NAME 10000000 0 HYBRID MORE_STUFF9 1 .00000005 FREQUENCY MORE_STUFF8 1 .00000005 FREQUENCY MORE_STUFF7 1 .00000005 FREQUENCY MORE_STUFF6 1 .00000005 FREQUENCY MORE_STUFF5 1 .00000005 FREQUENCY MORE_STUFF4 1 .00000005 FREQUENCY MORE_STUFF3 1 .00000005 FREQUENCY MORE_STUFF2 1 .00000005 FREQUENCY MORE_STUFF10 1 .00000005 FREQUENCY MORE_STUFF1 1 .00000005 FREQUENCY ID 10000000 0 HYBRID CODE 10000 .0001 HYBRID SYS_NC00014$ 9947366 0 HYBRID
Now the CBO has the necessary statistics by which to determine a much more accurate cardinality estimate for the function-based predicate and so potentially a more efficient execution plan:
SQL> select * from david where upper(name) = 'DAVID BOWIE 4242'; Execution Plan ---------------------------------------------------------- Plan hash value: 2675555529 ---------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 200 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | DAVID | 1 | 200 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | DAVID_UPPER_NAME_I | 1 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(UPPER("NAME")='DAVID BOWIE 4242') Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1256 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
With the virtual column statistics in place, the CBO now has the cardinality estimate of 1 and associated costs spot on, which is always a good thing.
This requirement to collect the necessary statistics on the associated virtual column created as a result of the function-based index to ensure the index is costed and used effectively is perhaps but one reason why function-based indexes are currently not supported by Automatic Indexing.
As always, this can always change in the future…
Descending Indexes Solution (Yellow Submarine) September 9, 2011
Posted by Richard Foote in Descending Indexes, Oracle Indexes, Quiz.28 comments
Answers to the quiz on Descending Indexes and hopefully some useful dangers and tips on using them.
The answer to the first question is Yes, a “normal” Ascending Index can be used by the CBO to retrieve data in descending order and hence possibly avoid a sort. The reason being that leaf blocks in the index structure have effectively two pointers, one that points to the next leaf block in the index structure (except for the very last leaf block) and one that points to the previous block (except for the first leaf block). So the data in an index can be retrieved in either order.
The answer to the second question is Yes as well, a Descending Index can also be used to also retrieve data in either logical order as again all the leaf blocks have the two set of pointers.
That being the case, if an index has just the one column value, does it therefore make any difference which index one creates, ascending or descending ?
Hence my last question. The answer is maybe, as there are a number of fundamental differences in how each type of index is implemented.
Naturally, a little demo to illustrate 🙂
Let’s begin by creating a simple little table and a normal B-Tree index on an ID column, which has monotonically increasing values:
SQL> create table bowie (id number, name varchar2(30)); Table created. SQL> create index bowie_id_i on bowie(id); Index created. SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 100000; 100000 rows created. SQL> commit; Commit complete.
Note the index is indeed a “Normal” B-Tree index and because the indexed values monotonically increase, all index leaf block splits are 90-10 splits resulting a perfectly compact, 100% utilised index structure:
SQL> select index_type from dba_indexes where index_name = 'BOWIE_ID_I'; INDEX_TYPE --------------------------- NORMAL SQL> analyze index bowie_id_i validate structure; Index analyzed. SQL> select lf_rows, lf_blks, pct_used from index_stats; LF_ROWS LF_BLKS PCT_USED ---------- ---------- ---------- 100000 199 100
Let’s now run a query to ensure the index is indeed used and that the sort can indeed be avoided. Note I’ve not actually collected any CBO statistics at this stage but I’m definitely using the CBO:
SQL> alter system set optimizer_mode='ALL_ROWS' scope=both; System altered. SQL> select * from bowie where id between 42 and 84 order by id desc; 43 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2771731789 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 43 | 1290 | 1 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID | BOWIE | 43 | 1290 | 1 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN DESCENDING| BOWIE_ID_I | 43 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=42 AND "ID"<=84) filter("ID">=42 AND "ID"<=84) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 9 consistent gets 0 physical reads 0 redo size 1092 bytes sent via SQL*Net to client 418 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 43 rows processed
So the execution plan clearly shows the use of the index via an index range scan descending and that there are indeed no sort operations performed. There were no statistics gathered, so the CBO performed some dynamic sampling to determine a taste for the data.
Let’s now change the optimizer_mode to CHOOSE, a common default setting (especially pre 11g, this example is run on a 10.2.0.4 database) and re-run the query:
SQL> select * from bowie where id between 42 and 84 order by id desc; 43 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3062669298 --------------------------------------------------- | Id | Operation | Name | --------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | | 2 | TABLE ACCESS BY INDEX ROWID| BOWIE | |* 3 | INDEX RANGE SCAN | BOWIE_ID_I | --------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("ID">=42 AND "ID"<=84) Note ----- - rule based optimizer used (consider using cbo) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 1092 bytes sent via SQL*Net to client 418 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 43 rows processed
No statistics on the table now means the Rule Based Optimizer kicks in and although a sort operation is performed (as there’s no descending scan), Oracle at least used the index.
OK, let’s now run the exact same sequence of events, but this time using a Descending Index.
SQL> drop table bowie; Table dropped. SQL> create table bowie (id number, name varchar2(30)); Table created. SQL> create index bowie_id_i on bowie(id desc); Index created. SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 100000; 100000 rows created. SQL> commit; Commit complete.
So it’s the exact same table and set of data. Let’s now look at the type of index created:
SQL> select index_type from dba_indexes where index_name = 'BOWIE_ID_I'; INDEX_TYPE --------------------------- FUNCTION-BASED NORMAL
OK, Difference Number 1. A Descending Index is no ordinary “Normal” index, but is implemented as a “Function-Based Normal” index instead. This means there’ll be a new hidden virtual column created behind the scenes and that the Rule Based Optimizer is going to have an issue here as it can’t cope with Function-based Indexes.
Let’s look at some Index_Stats:
SQL> analyze index bowie_id_i validate structure; Index analyzed. SQL> select lf_rows, lf_blks, pct_used from index_stats; LF_ROWS LF_BLKS PCT_USED ---------- ---------- ---------- 100000 426 50
Difference Number 2: This index is approximately double the size of the previous index and only half as efficient with its storage. Why ? Because as the data is now inserted in reverse logical order, the last index leaf block no longer receives the largest current index value and so 90-10 splits are not performed. As only 50-50 splits are performed, the index structure is left with 50% empty blocks which can not be reused. Unfortunately, a possible candidate for periodic index rebuilds …
Let’s now re-run the query using the CBO:
SQL> select * from bowie where id between 42 and 84 order by id desc; 43 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3472402785 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 208 | 6240 | 1 (0)|00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| BOWIE | 208 | 6240 | 1 (0)|00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | | 1 (0)|00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(SYS_OP_DESCEND("ID")>=HEXTORAW('3EAAFF') AND SYS_OP_DESCEND("ID")<=HEXTORAW('3ED4FF') ) filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("ID"))>=42 AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("ID"))<=84) Note ----- - dynamic sampling used for this statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 9 consistent gets 0 physical reads 0 redo size 1092 bytes sent via SQL*Net to client 418 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 43 rows processed
Difference Number 3. Although the same execution plan with the same number of consistent gets is performed, the cardinality estimates are not as accurate and the SYS_OP_DESCEND and SYS_OP_UNDESCEND functions are used as access/filter conditions as they’re the functions implemented in the function-based index.
If we run the same query using the Rule Based Optimizer (remember, we “forgot” to collect statistics on the table):
SQL> alter system set optimizer_mode='CHOOSE' scope=both; System altered. SQL> select * from bowie where id between 42 and 84 order by id desc; 43 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2027917145 ------------------------------------ | Id | Operation | Name | ------------------------------------ | 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | |* 2 | TABLE ACCESS FULL| BOWIE | ------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ID"<=84 AND "ID">=42) Note ----- - rule based optimizer used (consider using cbo) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 309 consistent gets 0 physical reads 0 redo size 1092 bytes sent via SQL*Net to client 418 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 43 rows processed
Difference Number 4. The Rule based Optimizer does not support Function-Based Indexes and so the index is now completely ignored. Oracle has no choice here but to perform the much more expensive Full Table Scan, when previously the ascending index was used.
A Descending Index can potentially be useful in a concatenated, multi-column index, in which the columns could be ordered in a combination of ascending/descending order that could in turn return the data in a required specific order, thereby negating the need for a potentially expensive sort operation.
However, with a single column index, one would need to question the need for making such an index descending …
Having fun 🙂 Enjoy your weekend !!
11g Virtual Columns and Fast Refreshable Materialized Views (What In The World) November 24, 2010
Posted by Richard Foote in 11g, 11g New features, Function Based Indexes, Oracle Bugs, Virtual Columns.15 comments
Previous to Oracle 11g Rel 2, two very common and useful features previously worked well together, they being fast refreshable materialized views and the introduction of virtual columns due to the creation of function-based indexes.
To illustrate, we create and populate a little demo table:
SQL> create table bowie (a number, b number, c number); Table created. SQL> insert into bowie select rownum, mod(rownum,10), mod(rownum,100) from dual connect by level <= 100000; 100000 rows created. SQL> commit; Commit complete. SQL> alter table bowie add primary key (a); Table altered.
We now create a simple little function-based index:
SQL> create index bowie_func_i on bowie(b+c); Index created.
If we look at the columns in the table via DBA_TAB_COLS:
SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE'; COLUMN_NAME DATA_DEFAULT VIR HID ------------ ------------ --- --- SYS_NC00004$ "B"+"C" YES YES C NO NO B NO NO A NO NO
We notice Oracle has introduced a new, hidden virtual column (SYS_NC00004$), required to store statistics for use by the Cost Based Optimizer.
Next we create a materialized view log on this table and a fast refreshable materialized view:
SQL> create materialized view log on bowie WITH PRIMARY KEY,SEQUENCE, ROWID (b,c) INCLUDING NEW VALUES; Materialized view log created. SQL> create materialized view bowie_mv 2 build immediate 3 refresh fast 4 with primary key 5 enable query rewrite 6 as 7 select b, count(*) from bowie group by b; Materialized view created.
Collect a few statistics and we note the Materialized View does indeed get used during a query rewrite scenario:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1') PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_MV', estimate_percent=>null, cascade=> true, method_opt=> 'FOR ALL COLUMNS SIZE 1') PL/SQL procedure successfully completed. SQL> select b, count(*) from bowie having b > 3 group by b; B COUNT(*) ---------- ---------- 6 10000 4 10000 5 10000 8 10000 7 10000 9 10000 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 593592962 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 42 | 2 (0)| 00:00:01 | |* 1 | MAT_VIEW REWRITE ACCESS FULL| BOWIE_MV | 7 | 42 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("BOWIE_MV"."B">3) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 538 bytes sent via SQL*Net to client 395 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
And indeed the materialized view is fast refreshable:
SQL> insert into bowie values (100001, 5, 42); 1 row created. SQL> commit; Commit complete. SQL> exec dbms_mview.refresh('BOWIE_MV', 'F'); PL/SQL procedure successfully completed. SQL> select b, count(*) from bowie having b > 3 group by b; B COUNT(*) ---------- ---------- 6 10000 4 10000 5 10001 8 10000 7 10000 9 10000 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 593592962 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 42 | 2 (0)| 00:00:01 | |* 1 | MAT_VIEW REWRITE ACCESS FULL| BOWIE_MV | 7 | 42 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("BOWIE_MV"."B">3) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 546 bytes sent via SQL*Net to client 395 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
Notice how the materialized view does indeed displayed the correct updated information via the query rewrite operation . So the materialized view behaved and worked as expected even though the underlining master table has a virtual column due to the creation of the function-based index (note that QUERY_REWRITE_INTEGRITY is set to STALE_TOLERATED)
Unfortunately, things go off the rails somewhat since Oracle 11g Rel 2 when a virtual column is introduced due to one of the 11g new features. For example, I now collect some Extended Statistics on this table:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=> 'FOR COLUMNS (A,B,C) SIZE 254'); PL/SQL procedure successfully completed. SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE'; COLUMN_NAME DATA_DEFAULT VIR HID ------------------------------ --------------------------------- --- --- SYS_STUM4KJU$CCICS9C1UJ6UWC4YP SYS_OP_COMBINED_HASH("A","B","C") YES YES SYS_NC00004$ "B"+"C" YES YES C NO NO B NO NO A NO NO
Notice how extended statistics has resulted in another hidden virtual column (SYS_STUM4KJU$CCICS9C1UJ6UWC4YP) being created to store the resultant statistics.
However, if now attempt to perform a fast refresh on the Materialized View:
SQL> insert into bowie values (100002, 5, 42); 1 row created. SQL> commit; Commit complete. SQL> exec dbms_mview.refresh('BOWIE_MV', 'F'); BEGIN dbms_mview.refresh('BOWIE_MV', 'F'); END; * ERROR at line 1: ORA-12008: error in materialized view refresh path ORA-00904: "MAS$"."SYS_STUM4KJU$CCICS9C1UJ6UWC4YP": invalid identifier ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2558 ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2771 ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2740 ORA-06512: at line 1
We get an error, complaining about the existence of this new virtual column.
If we attempted to drop and re-create the materialized view:
SQL> drop materialized view bowie_mv; Materialized view dropped. SQL> create materialized view bowie_mv 2 build immediate 3 refresh fast 4 with primary key 5 enable query rewrite 6 as 7 select b, count(*) from bowie group by b; select b, count(*) from bowie group by b * ERROR at line 7: ORA-12033: cannot use filter columns from materialized view log on "BOWIE"."BOWIE"
It fails, complaining that the materialized view log is somehow missing a filter column (which it isn’t). We get exactly the same set of issues if we add a visible virtual column via this new 11g capability:
SQL> create table bowie2 (a number, b number, c number, d as (a+b+c)); Table created. SQL> select column_name, data_default, virtual_column, hidden_column from dba_tab_cols where table_name = 'BOWIE2'; COLUMN_NAME DATA_DEFAULT VIR HID ------------ ------------ --- --- D "A"+"B"+"C" YES NO C NO NO B NO NO A NO NO SQL> insert into bowie2 (a,b,c) select rownum, mod(rownum,10), mod(rownum,100) from dual connect by level <= 100000; 100000 rows created. SQL> commit; Commit complete. SQL> alter table bowie2 add primary key (a); Table altered. SQL> create materialized view log on bowie2 WITH PRIMARY KEY,SEQUENCE, ROWID (b,c) INCLUDING NEW VALUES; Materialized view log created. SQL> create materialized view bowie2_mv 2 build immediate 3 refresh fast 4 with primary key 5 enable query rewrite 6 as 7 select b, count(*) from bowie2 group by b; select b, count(*) from bowie2 group by b * ERROR at line 7: ORA-12033: cannot use filter columns from materialized view log on "BOWIE"."BOWIE2"
Extended statistics and visible virtual columns are both potentially extremely useful new features introduced in 11g but unfortunately both can not be implemented on any table that needs to be fast refreshable within a complex materialized view.
I raised this issue with Oracle Support who have raised bug 10281402 as a result as it occurs in both 11.2.0.1 and 11.2.0.2 on various platforms I’ve tested.
Function-Based Indexes and Missing Statistics (No Surprises) December 4, 2008
Posted by Richard Foote in Function Based Indexes, Oracle Cost Based Optimizer, Oracle Indexes.9 comments
Thought I might mention a common trap associated with Function-Based Indexes following a recent question on the OTN Database Forum.
First of all, considered this “simple” little scenario. You have a column in a table that’s a standard VARCHAR2 field and it has 100 distinct values. Therefore, if you select just one value, assuming even distribution of data, you’ll select approximately 1% of the data.
However, this table can potentially allow mixed case fields, with values such as ‘Bowie’, ‘bowie’, ‘BowiE’, etc.
Therefore, you decide to write a query that first converts all the fields to UPPER case such that you can now compare all possible values of a specific name to say ‘BOWIE’.
Remembering you previously had a selectivity of 1%, what’s the selectively of the column after you’ve converted everything to upper case ?
Ummmm, actually, there’s no easy answer to that. It all depends on how many actual different names you have in your data. In theory, you may now have just 1 distinct value or you may have 100 distinct values, you can’t really tell.
However, what if the function did something more complex and converted the name to some numerical value based on the sum of all the characters via some weird formula. In theory, you could have no distinct values in that everything could get converted to a NULL or you could possibly have as many distinct values as there are rows in the table.
Again, Oracle can’t really tell how many rows may get selected based on the determination of such a function.
When a predicate is used with a function call, Oracle has great difficulty in determining the correct selectivity. Therefore when you create a Function-Based Index, which Oracle assumes you may wish to access, Oracle behind the scenes creates a hidden virtual column on the parent table in order to capture the data characteristics of the function so that the CBO can make an accurate determination of the selectivity associated with using the function.
If Oracle knows the low value, the high value and the number of distinct values of the data associated with the function on a column, it can then accurately determine the associated selectivity and cardinality when the function is used on the column (assuming even distribution of data) and hence calculate an accurate cost and determine whether the use of the function-based index is appropriate.
However, and here comes the trap, when a function-based index is created, Oracle will now (since 10g) automatically calculate the statistics associated with the index (such as the blevel, number of leaf blocks, clustering factor, etc.) but it will NOT calculate the statistics associated with the hidden virtual column as these statistics are associated with the parent table, not directly with the index itself.
Here I create a new function-based index on a table:
SQL> create index pink_floyd_upp_tab_name_i on pink_floyd(upper(table_name));
Index created.
But there are still no statistics on the hidden virtual column created by Oracle behind the scenes.
SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;
COLUMN_NAME NUM_DISTINCT HID VIR --------------- ------------ --- --- OWNER 74 NO NO TABLE_NAME 5739 NO NO NUM_ROWS 886 NO NO BLOCKS 152 NO NO SYS_NC00005$ YES YES
Therefore, even after you’ve created the function-based index, Oracle still has no idea on the selectivity associated the function because the necessary virtual column statistics are still missing. Oracle simply takes a “guess” and as with most guesses, it’s quite likely to be wrong which means the selectivity is likely to be wrong which means the costings is likely to be wrong which means the execution plan could very well be wrong as well.
What should you do after you create a function-based index ? You should collect the statistics on the hidden virtual column so that the CBO can accurately determine the cardinality associated with using the function and so make correct decisions regarding the most appropriate execution plan.
The ‘FOR ALL HIDDEN COLUMNS’ method_opt option with DBMS_STATS is one way of collecting just the necessary table statistics on these hidden virtual columns:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> ‘PINK_FLOYD’, estimate_percent=>null, cascade=>true, method_opt=> ‘FOR ALL HIDDEN COLUMNS SIZE 1′);
PL/SQL procedure successfully completed.
SQL> select column_name, num_distinct, hidden_column, virtual_column from dba_tab_cols where table_name=’PINK_FLOYD’;
COLUMN_NAME NUM_DISTINCT HID VIR --------------- ------------ --- --- OWNER 74 NO NO TABLE_NAME 5739 NO NO NUM_ROWS 886 NO NO BLOCKS 152 NO NO SYS_NC00005$ 5739 YES YES
This demo on missing virtual column statistics asscociated with function-based indexes goes through this whole issue.
My early Christmas wish to Oracle would be for the CBO to at least consider the DISTINCT_KEYS index statistic when determining the correctly selectivity when hidden virtual column statistics are missing on function-based indexes, rather than take it’s 0.4% / 1% guess …
More on virtual columns, hidden and otherwise, in the future …
Separate Indexes From Tables, Some Thoughts Part I (Everything In Its Right Place) April 16, 2008
Posted by Richard Foote in Oracle General, Oracle Indexes, Oracle Myths, Oracle Opinion, Tablespace Management.19 comments
Although by no means as common as it once was, there’s still some who believe separating indexes in a different tablespace from their parent tables somehow improves performance.
The theory goes that by having indexes in their own tablespace, we can reduce overall contention issues and thereby improve the overall performance of the database.
Here are some thoughts for consideration for those who might be so inclined …
First, let’s just have a little look at the behaviour of a “typical” large scale index range scan, using an index with a height of say 3.
We first begin by accessing the root block of the index. This is a single block read which for many high accessed indexes would typically be cached and not result in a physical I/O. Next we read an intermediate branch block. This is also a single block read and is also likely to be cached if the index is heavily accessed. Regardless, it’s another index related I/O. Next we finally reach and read the first index leaf block containing the start of the index entries of interest. Again, it’s a single block I/O and again it’s index related.
So far we’ve performed 3 “random”, single block I/Os of index related blocks. If the index were in a separate tablespace, all the action would only be on the index tablespace thus far.
We next read our first table block containing the first row referenced by the first index entry of interest. This yet again is a single block I/O that could potentially be any block within the table. If the table were in a separate tablespace from the index, we would still need to perform a physical I/O (assuming the block isn’t already cached) on a “random” block within the table tablespace. If the table were in the same tablespace as the index, we again need to perform a physical I/O on a random table block. Still no difference thus far.
We next (very likely) reference the same index leaf block to determine the second row of interest. Note this block will almost certainly still be cached as it’s just been accessed. Therefore, if the index were in the same or different tablespace to the table, still no difference as there’s no associated physical I/O.
We then read the second table block of interest via a single block I/O. Unless this index has a very good clustering factor, we’re likely to read a totally different table block that could be any other block within the table. It’s extremely unlikely therefore to be the block that is physically contiguous to the block previously read. Only if the index were very well clustered, could it possibly be the same block as previously read or possibly the next logical block in the table.
However, in all these scenarios, having the table in a separate tablespace still makes no difference at this stage. We either need to perform another physical I/O on the table or we perform just a logical I/O. Even in the extremely unlikely case the next block read is physically contiguous to the previous block read, it would still be contiguous whether the index was separate or not and not be impacted by the index read activity thus far. Again, thus far it makes no real difference having the index in a separate tablespace.
We go back to the same index leaf block to determine the next row of interest and then access the next table block, which for a large randomly distributed table is again likely to be another different block. The point being we’re accessing the index and the table in a sequential fashion, reading the index, then reading the table. Reading the index and then reading the table again.
For small index scans, the index leaf block in question is likely to be the same single leaf block as a leaf block can potentially store hundreds of index entries (depending of course on block size, index row size and where within the index leaf block we logically begin to read the index entries of interest). So for small scans, it’s not going to have any real impact having indexes in a separate tablespace as we’re basically reading a few index related blocks followed by the table related blocks.
The table blocks are likely to be different blocks in a randomly distributed, poorly clustered index or possibly (although more rarely) a small sample of blocks in a well clustered index. However, in either scenario, if if we need to access just the one leaf block, it makes no difference whether the index is in a separate tablespace or not, the I/Os and so-called contention are the same regardless.
In some scenarios, Oracle can perform a prefetch step whereby it orders the index entries based on the rowids to first determine which table blocks need to be accessed, thus preventing the same table block having to be re-read several times. However, again, it makes no difference thus far if the index is in a separate tablespace or not as the I/O requirements are the same regardless.
In larger index range scans however, we might need to visit the next logical index leaf block or indeed subsequently many such index leaf blocks. Note each leaf block contains a pointer (referred to as kdxlenxt in a block dump) so Oracle can directly access the next index leaf block. If our index were in a separate tablespace and making the HUGE assumption that there’s thus far been no other activity in the index tablespace, the disk head may not have moved from where it left off after reading the last leaf block. With the indexes and tables coexisting in the same tablespace, we have very likely moved on from this location with any subsequent table related I/O activity.
Maybe now at last, finally we have a benefit in having indexes in their own tablespace …
However, reading the next index leaf block is again a single block read and most importantly is not necessarily “physically” contiguous to the previous leaf block. Remember, index leaf blocks split as part of their natural growth and the new block allocated is simply the next block available in the index freelist. Therefore the next logical index leaf block in an index structure could physically be virtually anywhere within the extents allocated to the index. When we read the next “logical” index leaf block, it does not necessarily mean it’s the next “physical” block within the index segment. It’s likely just another random, single block I/O.
That being the case, again we have no benefit in the index being in a separate tablespace. In both scenarios, we have to go scanning the disk looking for the physical location of the next index leaf block (again assuming the index leaf block isn’t already cached). This activity needs to be performed whether the index is in it’s own tablespace or not.
When we move back to read the next table block based on the first index entry from the newly accessed index leaf block, again, it’s extremely unlikely the next table block accessed will be the next contiguous block from the previously read table block. So again, we very likely need to go a hunting for the next table block on disk, regardless of it being in a separate tablespace from the index. Again, separating indexes from tables makes no real difference.
So not only do we move between index and table in a sequential manner but the actual blocks read within both the index and the table are likely to be totally random, non contiguous, single block reads.
That being the case, what are the performance benefits of storing indexes and tables separately ? How does storing indexes and tables separately actually reduce contention when most physical I/Os in both index and table segments are effectively random, single block reads ?
Now this example has just been a single index scan, performed by one user on just one index and table. The benefits therefore of separating indexes and tables even in a single user environment are somewhat “dubious”.
However, how many environments only have the one user. Not many. Most environments have lots of users, some with many hundreds, some with many thousands of concurrent users . All these users are potentially performing concurrent I/O operations, not only potentially on these very same tables and indexes but on lots of different tables and lots of different indexes within our table and index tablespaces. Even if index leaf blocks were to be physically contiguous in some cases (such as monotonically increasing indexes where this is more likely), by the time we’ve read the index leaf block, processed and read all the associated table blocks referenced by the index leaf block, the chances of there being no subsequent physical activity in the index tablespace due to another user session is virtually nil. We would still need to re-scan the disk to physically access the next index leaf block (or table block) anyways.
Add to the mix the fact many sites now use SANS, NAS, ASM etc. and what might appear to be one contiguous file could actually be physically split and spread all over the place. The whole notion of what is actually physically contiguous and what isn’t is blurred anyways.
The next time someone suggests separating indexes from table improves performance, you may just want to ask a couple of little questions; why and how ?
However, I’ll next discuss how indeed performance can improve by storing indexes in a separate tablespace. But just like our mad scientist thinking flies with no wings go deaf, I’ll explain how the performance improvement is not actually directly related to the indexes being separate from the tables.
I’ll also discuss how database recoveries are not typically helped by having indexes in a separate tablespace as often suggested.
Introduction To Reverse Key Indexes: Part III (A Space Oddity) January 18, 2008
Posted by Richard Foote in Index Block Splits, Index Internals, Oracle Indexes, Performance Tuning, Reverse Key Indexes.18 comments
A possibly significant difference between a Reverse and a Non-Reverse index is the manner in which space is used in each index and the type of block splitting that takes place.
Most Reverse Key Indexes are created to resolve contention issues as a result of monotonically increasing values. As monotonically increasing values get inserted, each value is greater than all previous values (providing there are no outlier values present) and so fill the “right-most” leaf block. If the “right-most” block is filled by the maximum current value in the index, Oracle performs 90-10 block splits meaning that full index blocks are left behind in the index structure. Assuming no deletes or updates, the index should have virtually 100% used space.
However, it’s equivalent Reverse Key index will have the values reversed and dispersed evenly throughout the index structure. As index blocks fill, there will be a very remote chance of it being due to the maximum indexed value and 50-50 block splits will result. The PCT_USED is likely therefore to be significantly less, averaging approximately 70-75% over time.
Therefore, for indexes with no deletions, a Reverse Key index is likely to be less efficient from a space usage point of view.
However, if there are deletions, the story may differ.
Deleted space can be reused if an insert is subsequently made into an index block with deleted entries or if a leaf block is totally emptied. However, if a leaf block contains any non-deleted entries and if subsequent inserts don’t hit the leaf block, then the deleted space can not reused. As monotonically increasing values in a non-reverse index only ever insert into the “right-most” leaf block, it won’t be able to reuse deleted space if leaf blocks are not totally emptied. Overtime, the number of such “almost but not quite empty” index leaf blocks may in some scenarios increase to significant levels and the index may continue to grow at a greater proportional rate than the table (where the reuse of space is set and controlled by the PCTUSED physical property).
However, Reverse Key indexes will be able to reuse any deleted space as they evenly distribute inserts throughout the index structure. Overtime, the index is likely to grow at a similar proportional rate as the table.
For indexes that have deletions resulting in many sparsely (but not totally emptied) leaf blocks, a Reverse Key index could be more efficient from a space usage point of view.
See this demo Differences in Space Usage Between a Reverse and a Non-Reverse Index for further details.
The Fake Index Trap (“Nowhere Now”) May 2, 2023
Posted by Richard Foote in CBO, Drop Index, Fake Indexes, Index Internals, NOSEGMENT Option, Online DDL, Oracle, Oracle 21c, Oracle Bugs, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle19c, Tablespace Management, Virtual Indexes.add a comment
In a recent correspondence, I was alerted to an issue in relation to the use of Virtual/Fake/Nosegment Indexes that I wasn’t aware of previously. Having a play, it appears this issue is still present in at least Oracle Database 21c, so I thought it worth a mention in case anyone else happens to fall into this trap.
I’ve discussed Virtual/Fake/Nosegment Indexes a number of times previously. These are indexes that do not exist as a physical segment (and so can be created almost immediately without consuming any storage), that can be used to determine if an index could potentially be used by the CBO if it were to be actually created.
Although such Fake Indexes don’t physically exist, they can cause issues if forgotten…
To illustrate this issue, I’ll start by creating a new tablespace:
SQL> create tablespace BOWIE_TS datafile 'C:\ORADATA\ZIGGY\ZIGGYPDB1\BOWIE_TS.DBF' size 100M; Tablespace created.
Next, I’ll create and populate a table in this BOWIE_TS tablespace:
SQL> create table bowie_test (id number, name varchar2(42)) tablespace bowie_ts; Table created. SQL> insert into bowie_test select rownum, 'DAVID BOWIE' from dual connect by level <=10000; 10000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_TEST'); PL/SQL procedure successfully completed.
I’ll next create a Virtual/Fake index, using the NOSEGMENT option:
SQL> create index bowie_test_id_i on bowie_test(id) nosegment tablespace bowie_ts; Index created.
We note this Fake Index is NOT listed in either USER_INDEXES or USER_SEGMENTS:
SQL> select index_name, tablespace_name from user_indexes where table_name='BOWIE_TEST'; no rows selected SQL> select segment_name, segment_type, tablespace_name from user_segments where segment_name='BOWIE_TEST_ID_I'; no rows selected
If we run a basic, highly selective query on this table:
SQL> select * from bowie_test where id=42; Execution Plan ---------------------------------------------------------- Plan hash value: 65548668 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 11 (0) | 00:00:01 | |* 1 | TABLE ACCESS FULL | BOWIE_TEST | 1 | 16 | 11 (0) | 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 38 consistent gets 0 physical reads 0 redo size 648 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We notice the CBO uses a FTS. The Fake Index is NOT considered by default.
However, if we set the session as follows and re-run the query:
SQL> alter session set "_use_nosegment_indexes" = true; Session altered. SQL> select * from bowie_test where id=42; Execution Plan ---------------------------------------------------------- Plan hash value: 1280686875 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 2 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_TEST | 1 | 16 | 2 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | BOWIE_TEST_ID_I | 1 | | 1 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 38 consistent gets 0 physical reads 0 redo size 648 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can see the CBO appears to now use the Fake Index, but as it doesn’t actually physically exist, actually uses a FTS behind the scenes (the number of consistent gets is evidence of this). But at least we now know the CBO would at least consider such an index if it physically existed.
We now decide to drop the tablespace and so first try to MOVE the table to another tablespace using the ONLINE option:
SQL> alter table bowie_test move online tablespace users; alter table bowie_test move online tablespace users * ERROR at line 1: ORA-14808: table does not support ONLINE MOVE TABLE because of the presence of nosegment index
The error message clearly states we can’t move the table ONLINE if such a Fake/Nosegment Index exists. This is our official warning of the potential danger to come…
We try to move the table using the default OFFLINE method:
SQL> alter table bowie_test move tablespace users; Table altered.
We have now successfully moved the table to another tablespace.
If we check to see if we have any other segments within the tablespace yto be dropped:
SQL> select segment_name from dba_segments where tablespace_name='BOWIE_TS'; no rows selected
Oracle tells us that no, we do NOT have any current segments in this tablespace.
So it’s now safe to purge and drop this tablespace (or so we think):
SQL> purge tablespace bowie_ts; Tablespace purged. SQL> drop tablespace bowie_ts; Tablespace dropped.
The tablespace has been successfully dropped.
However, if we now re-run the query on this table:
SQL> select * from bowie_test where id=42; select * from bowie_test where id=42 * ERROR at line 1: ORA-00959: tablespace 'BOWIE_TS' does not exist
We get this unexpected error that the tablespace BOWIE_TS does not exist.
BUT, we already know the tablespace doesn’t exist, we’ve just dropped it !!!
So why are we getting this error?
It’s all due to the damn Fake Index we created previously.
Although there is no physical index segment for our Fake Index, there are still some internal Data Dictionary links between the Fake Index and the tablespace it was associated with. The tablespace is gone, but NOT the Fake Index.
The only place where fake indexes can be easily found within Oracle, is within the USER_OBJECTS view:
SQL> select o.object_name, o.object_type, o.status from user_objects o left join user_indexes i on o.object_name=i.index_name where o.object_type='INDEX' and i.index_name is null; OBJECT_NAME OBJECT_TYPE STATUS ------------------------------ ----------------------- ------- BOWIE_TEST_ID_I INDEX VALID
To eliminate this error, we have to first drop the Fake Index associated with the dropped tablespace:
SQL> drop index bowie_test_id_i; Index dropped.
We can now safely run the query without error:
SQL> select * from bowie_test where id=42; ID NAME ---------- ------------------------------------------ 42 DAVID BOWIE
So if you do ever create Fake Indexes, don’t forget to drop them once you’ve finished experimenting with them.
ESPECIALLY if you ever decide to drop the tablespace into which they were associated. This is explained in part in Oracle Support Doc ID 1361049.1.
Automatic Indexing: JSON Expressions Part II (“Without You I’m Nothing)” April 14, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Column Statistics, High Frequency Statistics Collection.2 comments
In my previous post on Automatic Indexing (AI) and JSON expressions, I mentioned two additional issues with the execution plan generated after the automatic indexes were created:
SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42'; Execution Plan ---------------------------------------------------------- Plan hash value: 832017402 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 20000 | 12M | 1524 (1) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 20000 | 12M | 1524 (1) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_ayvj257jd93cv | 8000 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 234168 consistent gets 200279 physical reads 0 redo size 1595 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The first issue is in relation to the Cost and the estimated cardinality (Rows) of the generated plan.
The execution plan has a Cost of 1524 but most importantly, an estimate cardinality of 20,000 rows. As only 1 row is actually returned, this row estimate is way way off and so therefore are the associated costs. This could potentially result in an inefficient plan and with the index not being used by the CBO.
The 20,000 row estimate comes from it being 1% of the number of rows (2 million) in the table. As I’ve discussed previously (as in this rather humourous post), the issue here is that the CBO has no idea what the expected cardinality might be, as the output from the JSON expression is effectively a black box.
Oracle generates virtual columns for this purpose, to capture column statistics that gives the CBO an accurate idea on the selectivity of expression based predicates.
But, if we look at the column statistics after the generation of the automatic indexes:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON'; TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR ---------- -------------------------------------------------- ------------ --- BOWIE_JSON ID 1996800 NO BOWIE_JSON BOWIE_DATE 1 NO BOWIE_JSON BOWIE_ORDER 0 NO BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D 1 YES BOWIE_JSON SYS_NC00005$ YES BOWIE_JSON SYS_NC00006$ YES
We notice that the two virtual columns generated for the JSON based expressions have no statistics. This is because we have yet to collect new statistics (or statistics specifically on hidden columns) since the creation of the automatic indexes. As a result, the CBO still has no idea on how many rows are estimated to come from the JSON based expressions and so has to resort to the (entirely wrong) 1% estimate.
The good news with Exadata environments (which of course includes the Autonomous Database environments), is that Oracle has the High Frequency Statistics Collection capability, which will automatically collect these missing statistics after a small (configurable) period of time. I’ve previously discussed High Frequency Statistics Collection here.
So if I just wait approximately 15 minutes in my “Exadata” environment and check out the columns statistics again:
SQL> select table_name, column_name, num_distinct, virtual_column from user_tab_cols where table_name='BOWIE_JSON'; TABLE_NAME COLUMN_NAME NUM_DISTINCT VIR ---------- -------------------------------------------------- ------------ --- BOWIE_JSON ID 1996800 NO BOWIE_JSON BOWIE_DATE 1 NO BOWIE_JSON BOWIE_ORDER 0 NO BOWIE_JSON SYS_IME_OSON_B33B6E113C404F2EBF6648C01F605C8D 1 YES BOWIE_JSON SYS_NC00005$ 2000000 YES BOWIE_JSON SYS_NC00006$ 1996800 YES
We can see that the missing statistics have now been populated and the CBO can now accurately determine that these virtual columns are effectively unique.
If we now re-run the queries again, e.g.:
SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242; Execution Plan ---------------------------------------------------------- Plan hash value: 1921179906 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 669 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON | 1 | 669 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_gpdkwzugdn055 | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=4242) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 233150 consistent gets 200279 physical reads 0 redo size 1599 bytes sent via SQL*Net to client 526 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can see that the CBO has now correctly estimated that just 1 row is to be returned and the associated CBO cost has reduced to just 4 (down from 1524) as a result.
So if you create a function-based index, make sure the generated virtual column (whether created automatically or if manually generated before the associated index) has the necessary statistics.
In the upcoming days, I’ll discuss the remaining issue associated with this plan…
AUSOUG Connect 2021- “Automatic Indexing: An Update On Improvements and New Capabilities” October 26, 2021
Posted by Richard Foote in AUSOUG, Connect 2021, Oracle Indexes.add a comment
I’ve very pleased that my paper “Automatic Indexing: An Update On Improvements and New Capabilities” has been accepted for the upcoming AUSOUG Connect 2021 Virtual Conference.
The conference runs between 9th – 12th November 2021 and features a host of great topics and speakers including Connor McDonald, Chris Saxon, Jim Czuprynski, Sandesh Rao, Karen Cannell, Kai Yu, Francesco Tisiot, Roy Swonger and Franco Ucci among many many others. The best part is that it’s all free !!
My Automatic Indexing presentation will be on at 3pm AEDT on 12th November and has the following description:
“One the most impressive and exciting database innovations introduced in Oracle Database 19c is “Automatic Indexing”, an expert system built into the database engine that not only recommends potential new indexes to create, but will actually introduce new indexes in a safe and automated manner.
This session explores in-depth this ground breaking feature, detailing how Automatic Indexing works, how it can be deployed and highlights a number of improvements and new capabilities that have been introduced since Automatic Indexing was first released.
Attendees will gain an excellent appreciation of the power to Automatic Indexing and how this feature can substantially assist in reducing the risk of having a sub-optimal indexing strategy that causes performance implications.”
For details of all the amazing content and to register for any of the sessions, please visit: https://ausoug.org.au/connect-2021/
Hopefully, you can tune in and I’ll see you at my session 🙂
Oracle 12c: Indexing JSON in the Database Part II (Find A Little Wood) August 5, 2016
Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.4 comments
In Part I, we looked at how you can now store JSON documents within the Oracle 12c Database. For efficient accesses to JSON documents stored in the Oracle database, we can either create a function-based index based on the JSON_VALUE function or on JSON .dot notation.
These indexes are useful for indexing specific JSON attributes, but what if we want to index multiple JSON attributes within a single index structure.
To start, I’m just going to add an extra row to increase the selectivity of other columns.
SQL> insert into ziggy_json 2 select 3 100001, 4 SYSdate, 5 '{"PONumber" : 1000001, 6 "Reference" : "MTOM-20161", 7 "Requestor" : "Major Tom", 8 "User" : "MTOM", 9 "CostCenter" : "B42", 10 "ShippingInstructions" : {"name" : "Major Tom", 11 "Address": {"street" : "42 Ziggy Street", 12 "city" : "Canberra", 13 "state" : "ACT", 14 "zipCode" : 2601, 15 "country" : "Australia"}, 16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"}, 17 {"type" : "Mobile", "number" : "417-555-1234"}]}, 18 "Special Instructions" : null, 19 "AllowPartialShipment" : true, 20 "LineItems" : [{"ItemNumber" : 1, 21 "Part" : {"Description" : "Hunky Dory", 22 "UnitPrice" : 10.95}, 23 "Quantity" : 5.0}, 24 {"ItemNumber" : 2, 25 "Part" : {"Description" : "Pin-Ups", 26 "UnitPrice" : 10.95}, 27 "Quantity" : 3.0}]}' 28 from dual; 1 row created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON'); PL/SQL procedure successfully completed.
We can still create composite indexes based on the JSON_VALUE function as we can with conventional columns:
SQL> create index ziggy_json_idx3 on ziggy_json(json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20)), json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6))); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON'); PL/SQL procedure successfully completed.
If we now run a query with a couple of JSON_VALUE based predicates:
SQL> select * from ziggy_json where json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))='MTOM' and json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)) = 'B42'; Execution Plan ---------------------------------------------------------- Plan hash value: 3402615542 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1533 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON | 1 | 1533 | 4 (0) | 00:00:01 | |*2 | INDEX RANGE SCAN | ZIGGY_JSON_IDX3 | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.User' RETURNING VARCHAR2(20) NULL ON ERROR)='MTOM' AND JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.CostCenter' RETURNING VARCHAR2(6) NULL ON ERROR)='B42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1248 bytes sent via SQL*Net to client 820 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The composite index is effectively used by the CBO as expected.
It does though make our SQL a little cumbersome to write. To simplify things a tad, we could create a couple of virtual columns based on these functions, create the JSON function-based indexes on these virtual columns and simplify the SQL accordingly.
First, we create the virtual columns (note they’re virtual columns and so consume no storage):
SQL> ALTER TABLE ziggy_json ADD (userid VARCHAR2(20) 2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20)))); Table altered. SQL> ALTER TABLE ziggy_json ADD (costcenter VARCHAR2(6) 2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)))); Table altered.
Next, create the index based on these newly created virtual columns:
SQL> CREATE INDEX ziggy_user_costctr_idx on ziggy_json(userid, costcenter); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON'); PL/SQL procedure successfully completed.
And then write a simplified version of the SQL to reference the virtual columns:
SQL> select * from ziggy_json where userid='MTOM' and costcenter='B42'; ID ZIGGY_DAT ---------- --------- ZIGGY_ORDER -------------------------------------------------------------------------------- USERID COSTCE -------------------- ------ 100001 24-JUN-16 {"PONumber" : 1000001, "Reference" : "MTOM-20161", MTOM B42 Elapsed: 00:00:00.01 Execution Plan ---------------------------------------------------------- Plan hash value: 5717455 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1535 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON | 1 | 1535 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | ZIGGY_USER_COSTCTR_IDX | 1 | | 3 (0) | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("USERID"='MTOM' AND "COSTCENTER"='B42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1396 bytes sent via SQL*Net to client 820 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The index is again used as expected.
Of course, if we still want to reference the JSON functions directly within the SQL, the query can still be written as previously:
SQL> select * from ziggy_json where json_value(ziggy_order, '$.User' returning varchar2(20))='MTOM' and json_value(ziggy_order, '$.CostCenter' returning varchar2(6))='B42'; ID ZIGGY_DAT ---------- --------- ZIGGY_ORDER -------------------------------------------------------------------------------- USERID COSTCE -------------------- ------ 100001 24-JUN-16 {"PONumber" : 1000001, "Reference" : "MTOM-20161", MTOM B42 Elapsed: 00:00:00.00 Execution Plan ---------------------------------------------------------- Plan hash value: 5717455 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1535 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON | 1 | 1535 | 4 (0) | 00:00:01 | |*2 | INDEX RANGE SCAN | ZIGGY_USER_COSTCTR_IDX | 1 | | 3 (0) | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ZIGGY_JSON"."USERID"='MTOM' AND "ZIGGY_JSON"."COSTCENTER"='B42') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 1396 bytes sent via SQL*Net to client 820 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
The index is again used as expected.
I’ll next look at using a JSON Text based index to effectively index the entire JSON document.