CBO Costing Plans With Migrated Rows Part II (“New Killer Star”) March 28, 2023
Posted by Richard Foote in CBO, Index Access Path, Index statistics, Leaf Blocks, Migrated Rows, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard's Blog, ROWID.add a comment
I’ve spent the past few months discussing Migrated Rows, in large part thanks to an excellent 15 minute video by Connor McDonald on how ROWIDs can now be updated on the fly in Oracle Autonomous databases.
Well 14 such posts later, I have finally reached the end of this topic (for now at least). So, an average of about 1 post per minute of video 🙂
In my previous post, I discussed how the CBO costs execution plans with tables that have migrated rows, when the statistics are collected as recommended via the DBMS_STATS package. In summary, migrated rows are basically just ignored, with the CBO blissfully unaware of the existence of any such migrated rows.
As I discussed, if I want to easily see how many migrated rows I have in a table, I can potentially use the ANALYZE command as follows:
SQL> analyze table bowie compute statistics; Table BOWIE analyzed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 86 415 170 56186
As you can now see, the table currently has 56186 migrated rows (yes CHAIN_CNT can potentially count rows that simply can’t fit within a single block, but all these rows are definitely migrated rows as per the demo in my previous post).
Now, it had always been my belief that although you can use the ANALYZE command to count out these migrated rows, the CBO would simply ignore this statistic in its calculations.
But I was wrong.
If we now re-run the query from the previous post:
SQL> select * from bowie where id > 1 and id < 1001; 999 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 999 | 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 999 | 999 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001) Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 1 DB time 9193 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 664 buffer is not pinned count 1662 buffer is pinned count 323 bytes received via SQL*Net from client 171333 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 666 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 666 consistent gets from cache 665 consistent gets pin 665 consistent gets pin (fastpath) 2 execute count 1 index range scans 5455872 logical read bytes from cache 665 no work - consistent read gets 39 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 2 process last non-idle time 1 session cursor cache hits 666 session logical reads 1 sorts (memory) 2024 sorts (rows) 999 table fetch by rowid 327 table fetch continued row 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 302 (100)| 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 999 | 141K| 302 (0)| 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 999 | | 4 (0)| 999 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001)
We can see that the cost of the plan has now changed.
Although the cost of reading the index itself is still the same with a cost of 4, the overall cost of the plan has increased to 302 (previously it was 21).
So the difference in plan costs is 302 – 21 = 281. And it’s pretty easy to see where this extra comes from…
The extra costs is basically query selectivity x no of migrated rows
Extra costs = 0.005 x 56186 = 281.
So the index scan costing formula should really be updated to be:
Index Scan Cost = blevel +
ceil(effective index selectivity x leaf_blocks) +
ceil(effective table selectivity x clustering_factor) +
ceil(effective table selectivity x chain_cnt)
Now, IMHO, this new cost is actually more accurate and better matches the true cost of now using the index, which requires 666 Consistent Gets (previously, before the rows migrated, the index plan required just 18 Consistent Gets).
So in some respects, this new cost might not be a bad thing. But then again, a sudden change in such costings due to a flood of new migrated rows might result in an unexpected and undesired plan changes that have been carefully crafted for statistics generated with the conventional DBMS_STATS collection method.
However, it’s not sufficient to simply collect fresh statistics using DBMS_STATS to get the previous CBO costings where migrated rows are ignored:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 86 415 167 56186
Simply collecting fresh statistics does NOT clear out the CHAIN_CNT statistic and so the CBO costings remain the same as with ANALYZE command:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 302 (100)| 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 999 | 141K| 302 (0)| 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 999 | | 4 (0)| 999 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001)
You need to first delete the table statistics to remove the CHAIN_CNT statistic (which of course comes with obvious dangers now there are no table statistics present) before you collect fresh statistics using DBMS_STATS:
SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 0 0 167 0 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
The CHAIN_CNT statistic has finally been cleared to 0 and the CBO costings now returned to as it was previously when such migrated rows were ignored:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 163K| 21 (0)| 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | | 4 (0)| 999 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001)
The CBO costs are now back to the previous 21.
So I’m a little in two minds about this. I think the statistics generated and used by the CBO are better with the ANALYZE command, but would still suggest collecting the necessary statistics using the recommended DBMS_STATS approach. Perhaps Oracle giving us the option to collect these additional statistics using DBMS_STATS might be a useful enhancement… 🤷♂️
Now, at one point in time, a long long time ago, I’m reasonably sure the CBO previously didn’t use the CHAIN_CNT statistic. However, it came as no real surprise when I researched when the CBO had started using the CHAIN_CNT statistic in its calculations, that Jonathan Lewis had already written on this subject way way back in April 2009 🙂
So Oracle definitely had this behaviour all the way back to at least 9i and continues to behave this way in 21c. Ah well, better late than never I guess to finally realise how all this actually works…
UPDATE (29 March 2023): Jonathan Lewis can kindly confirmed with me that CHAIN_CNT was definitely ignored back in version 8.1.7.4 and that this changed to the current behaviour in either 9.0 or 9.2.
CBO Costing Plans With Migrated Rows Part I (“Ignoreland”) March 21, 2023
Posted by Richard Foote in BLEVEL, CBO, Clustering Factor, Data Clustering, Index Access Path, Index Height, Index statistics, Leaf Blocks, Migrated Rows, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard's Blog, ROWID.3 comments
Whilst recently blogging about Migrated Rows and specifically changes to how ROWIDs are now maintained on the fly in Oracle Autonomous Databases, I made a discovery regarding how the Cost-Based Optimizer (CBO) costs such plans. This is one of the key reasons why I blog, not only to try and share odd titbits about how Oracle works, but also to hopefully learn much myself in the process.
Imagine my surprise in not only learning that Oracle and the CBO works differently to how I had always thought Oracle worked in this respect, but that this behaviour has been the case since at least Oracle 9i.
In Part I, I’ll use the same example of migrated rows as I’ve used in the past few blog posts and initially show how the CBO generally costs such plans (and by which I had incorrectly assumed ALWAYS costed such plans).
Let’s start by creating and populating a tightly packed table (in an environment where ROWIDs are NOT updated on the fly):
SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0; Table BOWIE created. SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000; 200,000 rows inserted. SQL> commit; Commit complete.
I’ll next create an index on the well clustered ID column (as the rows are inserted in ID column order within the table):
SQL> create index bowie_id_i on bowie(id); Index BOWIE_ID_I created.
Next, we’ll use the Oracle recommended method of collecting table/index statistics, by using the DBMS_STATS package:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 3268 0 0 111 0 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
Note the key index statistics here: BLEVEL=1, LEAF_BLOCKS=473 and the near perfect CLUSTERING_FACTOR=3250.
If we run the following query featuring a non-equality range predicate:
SQL> select * from bowie where id > 1 and id < 1001; 999 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 999 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 999 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | 999 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001) Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 1 DB time 7678 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 16 buffer is not pinned count 1983 buffer is pinned count 323 bytes received via SQL*Net from client 171383 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 18 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 18 consistent gets from cache 17 consistent gets pin 17 consistent gets pin (fastpath) 2 execute count 1 index range scans 147456 logical read bytes from cache 17 no work - consistent read gets 40 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 2 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 18 session logical reads 1 sorts (memory) 2024 sorts (rows) 999 table fetch by rowid 3 user calls
We notice that the CBO indeed uses the index.
They key statistic to note here is that Consistent Gets is just 18, which is extremely low considering we’re returning 999 rows. This is due to the fact the index is currently extremely efficient as it can fetch multiple rows by visiting the same table block due to the excellent clustering/ordering of the required ID column values (and also due to my high arraysize session setting).
If we look at the CBO costings for this plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time |Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 999 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 108K| 21 (0)| 999 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | | 4 (0)| 999 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001)
I’ve previously discussed many times how the CBO costs index access paths, but it’s always useful to go over this again, as it’s the most common question I get asked when I visit customer sites.
The KEY statistic the CBO has to determine is the estimated Selectivity of the query (the estimated percentage of rows to be returned), as this is the driver of all the subsequent CBO calculations.
The Selectivity of this range-based predicate query is calculated as follows:
Selectivity = (Highest Bound Value – Lowest Bound Value) / (Highest Value – Lowest Value)
= (1001-1) /(200000-1)
= 1000/199999
= approx. 0.005
Once Oracle has the selectivity, it can calculate the query Cardinality (estimated number of rows) as follows:
Cardinality = Selectivity x No of Rows
Cardinality = 0.005 x 200000 = 1000 rows
This is our visual window into the likelihood that the CBO has made an accurate decision with its execution plan. If the cardinality estimates are reasonably accurate, then the CBO is likely to generate a good plan. If the cardinality estimates are way off, then the CBO is more likely to generate an inappropriate plan.
The CBO cardinality estimate in the above plan is 1000 rows, whereas the number of rows actually returned is 999 rows.
So indeed, the CBO has got the cardinality almost spot on (except for a trivial rounding error) and so we have a high degree of confidence that the CBO is using the correct selectivity estimates when they get plugged into the following CBO formula for costing an index range scan (using this selectivity of 0.005 and the index statistics listed above):
Index Scan Cost = (blevel + ceil(effective index selectivity x leaf_blocks)) + ceil(effective table selectivity x clustering_factor)
= (1 + ceil(0.005 x 467)) + ceil(0.005 x 3250)
= (1 + 3) + 17
= 4 + 17 = 21
So we can clearly see where the CBO gets its costings for both reading the index during the Index Range Scan (4) and for the plan as a whole (21).
The CBO cost of 21 very closely resembles the 18 consistent gets accessed when the plan is executed. This to me suggests that the CBO has indeed costed this plan very accurately and appropriately.
It’s interesting to note in the above execution plan that Oracle is attributing 100% of this cost of 21 to CPU (21 (100)). That will be a discussion for another day…
OK, let’s now perform an update on the table, increasing the size of the rows such that I generate a bunch of migrated rows:
SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS'; 200,000 rows updated. SQL> commit; Commit complete.
If we now collect fresh statistics again using DBMS_STATS:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 0 0 167 0 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
We notice that none of the key statistics have changed, except for the number of Table Blocks (now 4906, previously it was 3268) and the Average Row Length has also increased (now 167, previously it was 111). Both of these can of course be attributed to the increase in the size of the values now stored in the NAME column following the Update.
Importantly, notice that collecting statistics via DBMS_STATS does NOT collect data for the CHAIN_CNT statistic, it remains at 0 even though many migrated rows were actually generated by the Update statement (as we’ll see below).
Increasing the Table Blocks will result in an associated increase in the cost of reading this table via a Full Table Scan (FTS).
We notice that none of the index-related statistics changed following the Update statement (as in this example, Oracle does NOT update the ROWIDs of any of the migrated rows, Oracle simply stores a pointer in the original block to denote the new physical location of the migrated rows as previously discussed).
So if we only INCREASE the cost of a FTS (via having more Table Blocks) but keep intact all the previous index related statistics, then the CBO is certainly going to again select the same Index Range Scan plan, as the plan will have the same (cheaper than FTS) costings as before.
If we re-run the query again:
SQL> select * from bowie where id > 1 and id < 1001; 999 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | 999 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001) Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 7709 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 664 buffer is not pinned count 1662 buffer is pinned count 323 bytes received via SQL*Net from client 171500 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 666 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 666 consistent gets from cache 665 consistent gets pin 665 consistent gets pin (fastpath) 2 execute count 1 index range scans 5455872 logical read bytes from cache 665 no work - consistent read gets 39 non-idle wait count 1 non-idle wait time 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 666 session logical reads 1 sorts (memory) 2024 sorts (rows) 999 table fetch by rowid 327 table fetch continued row 3 user calls
We notice that indeed it’s the same Index Range Scan plan as before.
But we notice that the number of Consistent Gets has increased substantially to 666 (previously it was just 18). The reason for this large jump is due to the now 327 table fetch continued rows that need to be accessed due to the newly migrated rows following the Update. This number is then doubled (so 2 x 327 = 654) to represent the approximate additional Consistent Gets we now need to perform, as Oracle needs to read the additional table block to access the migrated row’s new physical location AND to now re-read the original table block to access the next row to be fetched (previously Oracle could read all the required consecutive rows required from the same table block within the one consistent get).
So it’s now actually substantially more expensive to read the required 1000 rows via this index due to this increase in necessary consistent gets.
But if we look at the actual cost of this plan now:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID b1vwpu2rgn8p5, child number 0 ------------------------------------- select * from bowie where id > 1 and id < 1001 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time |Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 999 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 163K| 21 (0)| 999 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | | 4 (0)| 999 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">1 AND "ID"<1001)
We notice that as expected (as none of the index-related statistics have changed), that despite being much more expensive to now use this index, the costs of this plan (4 for reading the index and 21 overall) remain unchanged.
I would argue that these CBO costs are no longer as accurate as the 21 total CBO cost does not so closely represent the actual 666 consistent gets now required.
Now, the 327 table fetch continued row statistics from the previous run is clear proof we indeed have migrated rows following the Update statement.
But if we want to confirm how many migrated rows we now have in the table, we can use the ANALYZE command to collect these additional statistics:
SQL> analyze table bowie compute statistics; Table BOWIE analyzed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 86 415 170 56186
We notice that we now have a CHAIN_CNT of 56186.
Now this statistic can represent any row that is not housed inside a single table block (for which there could be a number of possible reasons, such as a row simply being too long to fit in a single table block), but as all rows are still relatively tiny, we can be certain that indeed all 56186 chained rows represent migrated rows.
Now that I’ve gone and used ANALYZE, primarily to generate this CHAIN_CNT statistic, my previous understanding of how the CBO costs migrated rows crumbles away, as I’ll discuss in my next post…
Automatic Indexing: Non-Equality Predicates Part IV (“Like A Rocket Man”) November 29, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, LIKE Predicates, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, _EXADATA_FEATURE_ON.add a comment
Forgive me, it’s been a while since I last posted, but life has so many distractions these days 🙂
I recently had a question on whether a LIKE predicate can generate an Automatic Index now that non-equality predicates are supported since Oracle Database 21c.
Now the answer I initially provided was “well, why don’t you just test it for yourself“. However, his subsequent responses highlighted to me that not everyone might necessarily know how to potentially play with many of the Exadata features, even if you don’t directly have access to an Exadata environment.
So the purpose of this post is not only to answer this question, but also to just highlight HOW to potentially test things out for yourself when you’re not lucky enough to work directly with Exadata.
One obvious manner in which to play on an Exadata environment is to simply create and connect to an Oracle Autonomous Database environment using Oracle’s Cloud services (which are all Exadata-based environments), where you can easily, FOR FREE, and WITH NO TIME RESTRICTIONS play with many Exadata database features. The “Always Free Cloud Services” is truly a fabulous resource provided by Oracle, where you can have a couple of Autonomous Database environments always at your disposal (and very very easily and quickly just drop an existing database environment and re-create a new one).
Follow the link for all the information you need on how to get started with Oracle’s Always Free Cloud Services: https://www.oracle.com/au/autonomous-database/free-trial/
If the version of Oracle Database you like to play with isn’t currently available on the Oracle Autonomous Database platforms, another option is to simply download the database version you want to play with and just make it think it’s actually on an Exadata platform, by setting the following hidden parameter:
SQL> alter system set "_exadata_feature_on"=true scope=spfile; System altered.
and restart your database.
You can now at least play and learn about many of the Exadata database features (such as Automatic Indexing), without having an actual Exadata machine on hand.
OK, now that you have an Exadata (or Exadata-like) environment on hand, you can go about answering for yourself these types of questions…
So, does Automatic Indexing now work in the case of a LIKE predicate?
First, make sure Automatic Indexing is enabled:
SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_MODE','IMPLEMENT'); PL/SQL procedure successfully completed.
Begin by creating and populating a basic table structure to test. The following table just has a few basic columns, with the MIXED_STUFF column simply populated with the rownum concatenated with a constant string:
SQL> create table aladdin_sane (id number, code1 number, grade number, mixed_stuff varchar2(42), name varchar2(42)); Table created. SQL> insert into aladdin_sane select rownum, mod(rownum,100000), mod(rownum,100), rownum || ' David Bowie ' || rownum, 'ZIGGY STARDUST' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ALADDIN_SANE'); PL/SQL procedure successfully completed.
I then run the following query several times with a LIKE predicate that returns just the one row from my 10M row table:
SQL> select * from aladdin_sane where mixed_stuff like '4242 %'; Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 57 | 14805 (2)| 00:00:01 | | * 1 | TABLE ACCESS FULL | ALADDIN_SANE | 1 | 57 | 14805 (2)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("MIXED_STUFF" LIKE '4242 %') filter("MIXED_STUFF" LIKE '4242 %') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 169813 consistent gets 84940 physical reads 0 redo size 912 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Without any indexes currently in place, the CBO has no choice but to use a FTS. But, with only 1 row returned from this 10M table, an appropriate index would almost certainly be beneficial. So what does Automatic Index do in this scenario?
Once we wait for the next running of the Automatic Indexing jobs to complete, we can check:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 29-NOV-2022 12:52:30 Activity end : 29-NOV-2022 12:53:50 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 452.98 MB (452.98 MB / 0 B) Indexes dropped : 0 SQL statements verified : 6 SQL statements improved (improvement factor) : 1 (169815.2x) SQL plan baselines created (SQL statements) : 1 (1) Overall improvement factor : 2x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- ----------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ----------------------------------------------------------------------------------- | BOWIE | ALADDIN_SANE | SYS_AI_dzhahcw1cf0mw | MIXED_STUFF | B-TREE | NONE | ----------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : fgvdbfsfwb9jv SQL Text : select * from aladdin_sane where mixed_stuff like '4242%' Improvement Factor : 169815.2x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 10869872 433 CPU Time (s): 9778626 433 Buffer Gets: 2377413 4 Optimizer Cost: 14805 4 Disk Reads: 1189160 2 Direct Writes: 0 0 Rows Processed: 14 1 Executions: 14 1
So, it appears that Automatic Indexing has indeed created a new index. We can now check out the new index details:
select index_name, auto, constraint_index, visibility, compression, status from user_indexes where table_name='ALADDIN_SANE'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS -------------------- --- --- --------- ------------- -------- SYS_AI_bnyacywycxx8b YES NO VISIBLE DISABLED VALID SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ALADDIN_SANE' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ ------------------------------ --------------- SYS_AI_dzhahcw1cf0mw MIXED_STUFF 1
Automatic Indexing has indeed created a VALID/VISIBLE index on the MIXED_STUFF column.
If we now re-run the query:
SQL> select * from aladdin_sane where mixed_stuff like '4242 %'; Execution Plan ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 57 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ALADDIN_SANE | 1 | 57 | 4 (0)| 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_dzhahcw1cf0mw | 1 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("MIXED_STUFF" LIKE '4242 %') filter("MIXED_STUFF" LIKE '4242 %') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 116204 consistent gets 84940 physical reads 0 redo size 912 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can see the newly generated plan now uses the new Automatic Index.
But due to Deferred Invalidations (which I’ve discussed previously), which in Oracle 21c delay the invalidation of SQL cursors due to new indexes, we may need to (for example) flush the shared_pool for the new plan to actually be used (a safe enough option in our play/test environment):
SQL> alter system flush shared_pool; System altered. SQL> select * from aladdin_sane where mixed_stuff like '4242 %'; Execution Plan ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 57 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ALADDIN_SANE | 1 | 57 | 4 (0)| 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_dzhahcw1cf0mw | 1 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("MIXED_STUFF" LIKE '4242 %') filter("MIXED_STUFF" LIKE '4242 %') Statistics ---------------------------------------------------------- 263 recursive calls 0 db block gets 508 consistent gets 0 physical reads 0 redo size 916 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 70 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select * from aladdin_sane where mixed_stuff like '4242 %'; Execution Plan ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 57 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ALADDIN_SANE | 1 | 57 | 4 (0)| 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_dzhahcw1cf0mw | 1 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("MIXED_STUFF" LIKE '4242 %') filter("MIXED_STUFF" LIKE '4242 %') Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 916 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
We can now see that the new plan has indeed been adopted with the substantial decrease in consistent gets, down to just 5 from the previous 169813 with the FTS.
So yes, Automatic Indexing does indeed now work with LIKE predicates, but most importantly, it’s very easy for you to test and see these things for yourself.
In which case, you won’t need blogs such as this in the future to show you the way… 🙂
Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022
Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.add a comment
In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.
In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.
To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:
SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id); Index created. SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ----------------------------- ---------- ----------- ----------------- RADIOHEAD_CODE_ID_I 1 265 98619
If we now re-run the previous query:
SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140; Execution Plan ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 72 | 6 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD | 4 | 72 | 6 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | RADIOHEAD_CODE_ID_I | 4 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 7 consistent gets 0 physical reads 0 redo size 806 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.
This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).
Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).
If we now look at the associated costings:
Effective Index Selectivity = CODE selectivity x ID selectivity
= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))
= 0.0001 x ((4000/9999) + 0.0002)
= 0.0001 x 0.40024)
= 0.000040024
Effective Table Selectivity = same as Index Selectivity
= 0.000040024
The effective index selectivity of 0.000040024 is now much lower than the previous (0.40024), as the CBO can now consider the product of the selectivities of both columns).
If we now plug this improved effective index selectivity into the index path costing calculations:
Index IO Cost = blevel +
ceil(effective index selectivity x leaf_blocks) +
ceil(effective table selectivity x clustering_factor)
Index IO Cost = 1 + ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)
= 1 + 1 + 4
= 2 + 4
= 6
Index Access Cost = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)
= 2 + 4
= 6
We can see how the respective 2 and 6 improved CBO index costings are derived.
So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…
Costing Concatenated Indexes With Range Scan Predicates Part I (Nothing To Be Desired) July 22, 2022
Posted by Richard Foote in BLEVEL, CBO, Clustering Factor, Concatenated Indexes, Index Access Path, Index Column Order, Index Column Reorder, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Richard Foote Consulting, Richard Foote Training, Richard's Blog.1 comment so far
In my previous post, I discussed how Automatic Indexing ordered columns when derived from SQLs containing both equality and non-equality predicates.
I’ve since had offline questions asking why indexes are more effective with leading columns addressing the equality predicates rather than the leading columns addressing non-equality predicates. Based on the theory that for everyone who asks a question, there are likely numerous others wondering the same thing, I thought I’ll try to explain things with these posts.
I’ll start by creating the following simple table that has two columns (ID and CODE) that are both highly selective (they both have 10,000 distinct values in a 100,000 rows table, so 10 rows approximately per value):
SQL> CREATE TABLE radiohead (id NUMBER, code NUMBER, name VARCHAR2(42)); Table created. SQL> INSERT INTO radiohead SELECT mod(rownum,10000)+1, ceil(dbms_random.value(0,10000)), 'RADIOHEAD' FROM dual CONNECT BY LEVEL <= 100000; 100000 rows created. SQL> commit; Commit complete.
I’ll next create an index based on the ID, CODE columns, with importantly the ID column as the leading column:
SQL> CREATE INDEX radiohead_id_code_i ON radiohead(id, code); Index created. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'RADIOHEAD', estimate_percent=> null, method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
When it comes to costing index accesses, some of the crucial statistics including the Blevel, Leaf_Blocks and often most crucial of all, the Clustering_Factor:
SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE index_name = 'RADIOHEAD_ID_CODE_I'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR -------------------- ---------- ----------- ----------------- RADIOHEAD_ID_CODE_I 1 265 99034
We begin by running the following query, with an equality predicate on the ID column and a relatively large, non-selective range predicate on the CODE column:
SQL> SELECT * FROM radiohead WHERE id = 42 AND CODE BETWEEN 1000 AND 5000; Execution Plan ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 72 | 6 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD | 4 | 72 | 6 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | RADIOHEAD_ID_CODE_I | 4 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 8 consistent gets 0 physical reads 0 redo size 824 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed
As (perhaps) expected, the CBO uses the index to retrieve the small number of rows (just 5 rows).
However, if we run the following query which also returns a small number of rows (just 4 rows) BUT with the relatively unselective, non-equality predicate based on the leading indexed ID column:
SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140; Execution Plan ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 72 | 105 (11)| 00:00:01 | |* 1 | TABLE ACCESS FULL| RADIOHEAD | 4 | 72 | 105 (11)| 00:00:01 | ------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 363 consistent gets 0 physical reads 0 redo size 770 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
We notice (perhaps unexpectedly) that the CBO now ignores the index and uses a Full Table Scan, even though only 4 rows are returned from a 100,000 row table.
This is a common area of confusion. Why does Oracle not use the index when both columns in the index are referenced in the SQL predicates and only a tiny number of rows are returned?
The answer comes down to the very unselective non-equality predicate (ID BETWEEN 1000 AND 5000) being serviced by the leading column (ID) of the index.
The “ID BETWEEN 1000 AND 5000” predicate basically covers 40% of all known ID values, which means Oracle must now read 40% of all Leaf Blocks within the index (one leaf block at a time), starting with ID =1000 and ending with ID = 5000. Although there are very few rows that then subsequently match up with the other (CODE = 140) predicate based on the second column (CODE) of the index, these relatively few values could exist anywhere within the 40% ID range.
Therefore, when costing the reading of the actual index, the CBO basically stops its calculations after the non-equality predicate on this leading ID column and indeed estimates that a full 40% of the index itself must be scanned.
If we force the CBO into a range scan via a basic index hint:
SQL> SELECT /*+ index(r) */ * FROM radiohead r WHERE id BETWEEN 1000 AND 5000 AND CODE = 140; Execution Plan ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 72 | 116 (4)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD | 4 | 72 | 116 (4)| 00:00:01 | |* 2 | INDEX RANGE SCAN | RADIOHEAD_ID_CODE_I | 4 | | 112 (4)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 114 consistent gets 0 physical reads 0 redo size 806 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
We notice that the overall cost of this index based plan is 116, greater than the 105 cost of the Full Table Scan (and hence why the Full Table Scan was selected). We also notice that the vast majority of this 116 cost can be attributed to the index scan itself in the plan, which has a cost of 112.
If you have a calculator handy, this is basically how these costs are derived.
Range Selectivity = (Max Range Value–Min Range Value)/(Max Column Value–Min Column Value)
Effective Index Selectivity = Range Selectivity + 2 x ID density (as a BETWEEN clause was used which is inclusive of Min/Max range)
= (5000-1000)/(10000-1) + 2 x (1/10000)
= 0.40004 + 0.0002
= 0.40024
Effective Table Selectivity = ID selectivity (as above) x CODE selectivity
= 0.40024 x (1/10000)
= 0.40024 x 0.0001
= 0.000040024
These selectivities are then inserted into the following index costing formula:
Index IO Cost = blevel +
ceil(effective index selectivity x leaf_blocks) +
ceil(effective table selectivity x clustering_factor)
Index IO Cost = 1 + ceil(0.40024 x 265) + ceil(0.000040024 x 99034)
= 1 + 107 + 4
= 108 + 4 = 112.
Index Access Cost = IO Costs + CPU Costs (in this plan, 4% of total costs)
= (108 + (112 x 0.04)) + (4 + (4 x 0.04))
= (108 + 4) + (4 + 0)
= 112 + 4
= 116
So we can clearly see how the CBO has made its calculations, come up with its costs and has decided that the Full Table Scan is indeed the cheaper alternative with the current index in place.
So Automatic Indexing is doing the right thing, by creating an index with the leading column based on the equality predicate and the second indexed column based on the unselective non-equality predicate.
I’ll expand on this point in an upcoming Part II post.
Automatic Indexing 21c: Non-Equality Predicate Anomaly (“Strangers When We Meet”) July 14, 2022
Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, Index Column Order, Invisible Indexes, Non-Equality Predicates, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning, Richard Foote Training, Richard's Blog.3 comments
I’m currently putting together some Exadata related training for a couple of customers and came across a rather strange anomaly with regard the status of Automatic Indexes, when created in part on unselective, non-equality predicates.
As discussed previously, Oracle Database 21c now allows the creation of Automatic Indexes based on non-equality predicates (previously, Automatic Indexes were only created on equality-based predicates).
But one appears to get rather odd resultant Automatic Indexes in the scenario where the non-equality predicate is not particularly selective but other predicates are highly selective.
To illustrate, I’ll create a basic table that has two columns (ID and CODE) that are both highly selective:
SQL> create table ziggy_new (id number, code number, name varchar2(42)); Table created. SQL> insert into ziggy_new select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_NEW'); PL/SQL procedure successfully completed.
So there are currently no indexes on this table.
I’ll next run the following SQL (and others similar) a number of times:
SQL> select * from ziggy_new where code=42 and id between 1 and 100000; Execution Plan ---------------------------------------------------------- Plan hash value: 3165184525 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 6738 (2) | 00:00:01 | | * 1 | TABLE ACCESS STORAGE FULL | ZIGGY_NEW | 1 | 23 | 6738 (2) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1) filter("CODE"=42 AND "ID"<=100000 AND "ID">=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 38605 consistent gets 38600 physical reads 0 redo size 725 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Without any indexes, the CBO currently has no choice but to use a Full Table Scan.
But only 1 row is returned. The first equality predicate on the CODE column is highly selective and on its own would only return 10 rows out of the 10M row table. The second, non-equality range-based predicate on the ID column is nowhere near as selective and offers limited additional filtering.
The CBO stops calculating index related costs after a non-equality predicate column (as subsequent column values could exist anywhere within the preceding range), and so the more effective index here is one based on (CODE, ID) with the non-equality predicate column second, or potentially just on the CODE column only, as the ID range offers minimal filtering benefits.
So what does Automatic Indexing make of things?
If we look at the subsequent Automatic Indexing report:
SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 3 Indexes created (visible / invisible) : 1 (0 / 1) Space used (visible / invisible) : 209.72 MB (0 B / 209.72 MB) Indexes dropped : 0 SQL statements verified : 44 SQL statements improved (improvement factor) : 12 (64.7x) SQL plan baselines created : 0 Overall improvement factor : 1.6x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- ---------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ---------------------------------------------------------------------------- | BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE | ----------------------------------------------------------------------------
So Automatic Indexing has indeed created an index based on CODE,ID (a common Automatic Indexing trait appears to be to create an index based on all available predicates).
BUT the index is created as an INVISIBLE Index and so can not generally be used by database sessions.
SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='ZIGGY_NEW'; INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- -------- ---------- ----------- ----------------- SYS_AI_75j16xff1ag3j YES INVISIBLE VALID 10000000 25123 10000000 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ ------------ --------------- SYS_AI_75j16xff1ag3j CODE 1 SYS_AI_75j16xff1ag3j ID 2
So re-running the previous SQL statements continues to use a Full Table Scan:
SQL> select * from ziggy_new where code=42 and id between 1 and 100000; Execution Plan ---------------------------------------------------------- Plan hash value: 3165184525 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 6738 (2) | 00:00:01 | | * 1 | TABLE ACCESS STORAGE FULL | ZIGGY_NEW | 1 | 23 | 6738 (2) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42 AND "ID"<=100000 AND "ID">=1) filter("CODE"=42 AND "ID"<=100000 AND "ID">=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 38605 consistent gets 38600 physical reads 0 redo size 725 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Automatic Indexing appears to only create Invisible indexes when there is an inefficient non-equality predicate present. It won’t create the index as a Visible index, even though it would significantly benefit these SQL statements that caused its creation. And Automatic Indexing won’t create an index on just the highly selective CODE equality predicate, which would also be of much benefit to these SQL statements.
If we now run similar queries, but with much more selective non-equality predicates, such as:
SQL> select * from ziggy_new where code=1 and id between 1 and 10; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3165184525 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 6738 (2) | 00:00:01 | | * 1 | TABLE ACCESS STORAGE FULL | ZIGGY_NEW | 1 | 23 | 6738 (2) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=1 AND "ID"<=10 AND "ID">=1) filter("CODE"=1 AND "ID"<=10 AND "ID">=1) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 38604 consistent gets 38600 physical reads 0 redo size 503 bytes sent via SQL*Net to client 41 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Again, with no (Visible) index present, the CBO currently has no choice but to use the Full Table Scan.
But during the next cycle, after Automatic Indexing kicks in again:
SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 5 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 209.72 MB (209.72 MB / 0 B) Indexes dropped : 0 SQL statements verified : 89 SQL statements improved (improvement factor) : 31 (71.9x) SQL plan baselines created : 0 Overall improvement factor : 1.7x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- ---------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ---------------------------------------------------------------------------- | BOWIE | ZIGGY_NEW | SYS_AI_75j16xff1ag3j | CODE,ID | B-TREE | NONE | ---------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : d4znwcu4h52ca SQL Text : select * from ziggy_new where code=42 and id between 1 and 10 Improvement Factor : 38604x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 3398605 68 CPU Time (s): 3166824 68 Buffer Gets: 463250 3 Optimizer Cost: 6738 4 Disk Reads: 463200 0 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 12 1 PLANS SECTION -------------------------------------------------------------------------------- ------------- - Original ----------------------------- Plan Hash Value : 3165184525 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 6738 | | | 1 | TABLE ACCESS STORAGE FULL | ZIGGY_NEW | 1 | 23 | 6738 | 00:00:01 | -------------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 1514586396 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 4 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW | 1 | 23 | 4 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_75j16xff1ag3j | 1 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
But this time, the index on the CODE,ID columns is created as a Visible index.
INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- -------- ---------- ----------- ----------------- SYS_AI_75j16xff1ag3j YES VISIBLE VALID 10000000 25123 10000000 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='ZIGGY_NEW'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ ------------ --------------- SYS_AI_75j16xff1ag3j CODE 1 SYS_AI_75j16xff1ag3j ID 2
So this index can be generally used, both by the newer SQLs that generated the now Visible index:
SQL> select * from ziggy_new where code=42 and id between 1 and 10; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1514586396 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW | 1 | 23 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_75j16xff1ag3j | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=10) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 503 bytes sent via SQL*Net to client 41 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
And also used by the SQLs with the unselective non-equality predicates, that Automatic Indexing would only create as Invisible indexes:
SQL> select * from ziggy_new where code=42 and id between 1 and 100000; Execution Plan ---------------------------------------------------------- Plan hash value: 1514586396 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_NEW | 1 | 23 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_75j16xff1ag3j | 1 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42 AND "ID">=1 AND "ID"<=100000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 729 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
Automatic Indexing appears to currently not quite do the right thing with SQL statements that have unselective non-equality predicates, by creating such indexes as only Invisible Indexes, inclusive of the unselective columns.
Although an edge case, I would recommend looking through the list of created Automatic Indexes to see if any such Invisible/Valid indexes exists, as it can suggest there are current inefficient SQL statements that could benefit from such indexes being Visible.
Automatic Indexing: Non-Equality Predicates Part III (“Who Can I Be Now”) December 21, 2021
Posted by Richard Foote in 21c New Features, Automatic Indexing, Non-Equality Predicates, Oracle Indexes.3 comments
In previous posts, I discussed how Automatic Indexing supported the creation of indexes based on equality predicates only.
One of the most significant improvements with Automatic Indexing introduced with Oracle Database 21c is that non-equality predicates are now supported.
Previously, if I created the following 10 million row table and ran an SQL query based on a non-equality predicate that only returned 9 rows:
SQL> create table ziggy (id number, code number, name varchar2(42)); Table created. SQL> insert into ziggy select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY'); PL/SQL procedure successfully completed. SQL> select * from ziggy where id between 42 and 50; 9 rows selected. Execution Plan ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 6173 (6)| 00:00:01 | | 1 | PX COORDINATOR | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 10 | 230 | 6173 (6)| 00:00:01 | | 3 | PX BLOCK ITERATOR | | 10 | 230 | 6173 (6)| 00:00:01 | |* 4 | TABLE ACCESS STORAGE FULL| ZIGGY | 10 | 230 | 6173 (6)| 00:00:01 | -----------------------------------------------------------------------------------------
It didn’t matter how often I ran the query, Automatic Indexing would never created the necessary Automatic Index on the ID column.
However, run the same query now on a 21c database and Automatic Indexing will generate the following index:
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- -------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | -------------------------------------------------------------------- | BOWIE | ZIGGY | SYS_AI_8102kh14m1mf8 | ID | B-TREE | NONE | -------------------------------------------------------------------- ------------------------------------------------------------------------------- SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='ZIGGY'; INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------- --- --------- -------- ---------- ----------- ----------------- SYS_AI_8102kh14m1mf8 YES VISIBLE VALID 10000000 23780 38451
If I re-run the equivalent query:
SQL> select * from ziggy where id between 42 and 50; Execution Plan ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 10 | 230 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY | 10 | 230 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_8102kh14m1mf8 | 10 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=42 AND "ID"<=50) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 6 consistent gets 0 physical reads 0 redo size 963 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9 rows processed
Automatic Indexing has kicked in and significantly improved the performance of this query for me.
Automatic Indexing may still have a number of restrictions, but by supporting non-equality predicates, it has now removed its most significant restriction…
Oracle 19c Automatic Indexing: Non-Equality Predicates Part II (Let’s Spend The Night Together) January 21, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.1 comment so far
In my previous post in this series, I discussed out Automatic Indexing currently does not consider Non-Equality predicates. Automatic Indexing will index columns based only on Equality predicates.
So how does Oracle handle the scenario when an SQL has a mixture of both Equality and Non-Equality predicates?
I’ll begin by creating two very similar tables, but with the second table having a more selective CODE column:
SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD'); PL/SQL procedure successfully completed. SQL> create table pink_floyd1 (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd1 select rownum, ceil(dbms_random.value(0, 25000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD1'); PL/SQL procedure successfully completed.
So table PINK_FLOYD has 5,000 distinct CODE values, whereas table PINK_FLOYD1 has 25,000 distinct CODE values.
I’ll next run the following identical SQLs, which both use an Equality predicate on the CODE column and a Non-Equality predicate on the CREATE_DATE column. The CODE column provides some filtering (more so with the PINK_FLOYD1 table) but in combination with the CREATE_DATE column, results in the ultimate filtering with no rows returned:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 564520720 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 856 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 (11) | 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) filter("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00','syyyy-mm-dd hh24:mi:ss')) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 64424 consistent gets 64413 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
So how does Automatic Indexing handle this scenario. If we look at the subsequent Automatic Indexing report (highlights only):
INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- ----------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ----------------------------------------------------------------------------- | BOWIE | PINK_FLOYD1 | SYS_AI_96snkmu4sk44g | CODE | B-TREE | NONE | ----------------------------------------------------------------------------- ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 7wag3gbk0b3tm SQL Text : select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021' Improvement Factor : 64442.3x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 568513 2771 CPU Time (s): 275534 1874 Buffer Gets: 1031078 406 Optimizer Cost: 856 405 Disk Reads: 1030609 3 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 16 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 564520720 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 856 | | | 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD1 | 1 | 41 | 856 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 2703636439 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 405 | 00:00:01 | | * 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 405 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 403 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) * 2 - access("CODE"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
If we look at the definitions of all indexes currently on these tables:
SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_dp2t0j12zux49 YES INVISIBLE ADVANCED LOW UNUSABLE 10000000 21702 4161898 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_dp2t0j12zux49 CODE 1 SQL> select index_name, auto, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='PINK_FLOYD1'; INDEX_NAME AUT VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_96snkmu4sk44g YES VISIBLE ADVANCED LOW VALID 10000000 15400 9969473 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='PINK_FLOYD1'; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ --------------- --------------- SYS_AI_96snkmu4sk44g CODE 1
In both cases, Automatic Indexing only created an index on the CODE column, as it was the only column with an Equality predicate.
However, the Automatic Index on the table PINK_FLOYD remained in an INVISIBLE/UNUSABLE. That’s because an index on only the CODE column was not efficient enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relatively poor Clustering Factor.
The index on the table PINK_FLOYD1 was eventually created as a VISIBLE/VALID index, as its better filtering was sufficient to actually improve the performance of the SQL.
So if we re-run the first query:
SQL> select * from pink_floyd where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 1152280033 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 40 | 844 (11) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | PINK_FLOYD | 1 | 40 | 844 (11) | 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-ddhh24:mi:ss') AND "CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 63660 consistent gets 63649 physical reads 0 redo size 426 bytes sent via SQL*Net to client 380 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
It continues to use a Full Table Scan.
If we re-run the second query:
SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2703636439 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 41 | 415 (0) | 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 415 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_96snkmu4sk44g | 412 | | 3 (0) | 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 2 - access("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 406 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
If now uses the newly created Automatic Index, with an improved 406 Consistent Gets (down from the previous 64424 Consistent Gets with the FTS).
BUT if we were to manually create an index on BOTH CODE and CREATE_DATE columns:
SQL> create index pink_floyd1_code_create_date_i on pink_floyd1(code, create_date) compress advanced low; Index created. SQL> select * from pink_floyd1 where code=42 and create_date> '19-JAN-2021'; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3366491378 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 4 (0) | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | PINK_FLOYD1 | 1 | 41 | 4 (0) | 00:00:01 | |* 2 | INDEX RANGE SCAN | PINK_FLOYD1_CODE_CREATE_DATE_I | 1 | | 3 (0) | 00:00:01 | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42 AND "CREATE_DATE">TO_DATE(' 2021-01-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "CREATE_DATE" IS NOT NULL) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 426 bytes sent via SQL*Net to client 381 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
Performance improves significantly further, by reducing Consistent Gets down to just 3.
So if you have SQL statements with a mixture of both Equality and Non-Equality predicates, you may encounter these 2 scenarios:
A potentially efficient index that is not created at all as the filtering on just the Equality based predicates are not sufficient to create a viable index, or
A potentially suboptimal Automatic Index that doesn’t contain useful filtering columns because they’re used in Non-Equality predicates…
Oracle 19c Automatic Indexing: Non-Equality Predicates Part I (Lucy Can’t Dance) January 14, 2021
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, Exadata, Full Table Scans, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cloud, Oracle Indexes, Oracle19c.6 comments
I’ve been waiting a while before posting a series on the various limitations associated with Automatic Indexing, in order to see how the feature matures over time.
The following have all been re-tested post 1 January 2021 on the Autonomous ATP Database Cloud service, using Oracle Database version 19.5.0.0.0.
In the Oracle Documentation (including version 21c), the only limitations with regard Automatic Indexing listed are the following:
- Auto indexes are local B-tree indexes.
- Auto indexes can be created for partitioned as well as non-partitioned tables.
- Auto indexes cannot be created for temporary tables.
Well, as I discussed in the previous series on Automatic Indexing on Partitioned tables, we already saw how Oracle can actually also create Non-Partitioned (Global) indexes. So the limitation on Automatic Indexes being “local” indexes is not actually correct, even with 19c.
But are there other limitations that are not officially documented?
If you look at every example I’ve used previously with regard Automatic Indexing, they all feature Equality predicates. In the following examples, I’m going to run a series on Range Scan predicates that heavily filter and would benefit greatly from an index.
I first create a simple table with 10M rows:
SQL> create table ziggy1 (id number, code number, name varchar2(42)); Table created. SQL> insert into ziggy1 select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY1'); PL/SQL procedure successfully completed.
I then run the following range scan queries several times that each return only a few rows:
SQL> select * from ziggy1 where id between 42 and 50; 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 8 | 184 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 8 | 184 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<=50 AND "ID">=42) filter("ID"<=50 AND "ID">=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 596 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9 rows processed SQL> select * from ziggy1 where id < 0; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 1 | 23 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<0) filter("ID"<0) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 364 bytes sent via SQL*Net to client 344 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed SQL> select * from ziggy1 where id > 100000000000; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 1 | 23 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID">100000000000) filter("ID">100000000000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 364 bytes sent via SQL*Net to client 355 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
If we look at the subsequent Automatic Indexing report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 13-JAN-2021 11:55:37 Activity end : 13-JAN-2021 11:56:20 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 3 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
We notice NO Automatic Indexes were created.
We can run these queries endlessly and Automatic Indexing will never create associated Automatic Indexes:
SQL> select index_name, auto, constraint_index, visibility from user_indexes where table_name='ZIGGY1'; no rows selected
These queries are doomed to perform Full Table Scans unless indexes are manually created:
SQL> select * from ziggy1 where id between 42 and 50; 9 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 4062853157 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 8 | 184 | 538 (14) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL | ZIGGY1 | 8 | 184 | 538 (14) | 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("ID"<=50 AND "ID">=42) filter("ID"<=50 AND "ID">=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39436 consistent gets 39425 physical reads 0 redo size 596 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 9 rows processed
Currently Automatic Indexes do not support Non-Equality predicates. Automatic Indexes are only created based on Equality-based predicates.
Obviously, Automatic Indexing is a fabulous feature and this might all change in the future. But with Non-Equality predicates so prevalent in SQL, it’s vital to note this current limitation when using and relying on Automatic Indexing…