Migrated Rows In Oracle Data Warehouse Autonomous Databases (“Sit Down. Stand Up.”) March 14, 2023
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Changing ROWID, Migrated Rows, Oracle, Oracle Cloud, Oracle Indexes, Oracle19c, ROWID.add a comment
In all my recent discussions on how Oracle can now update ROWIDs on the fly when a row migrates, I’ve mentioned how this only occurs on tables in which the ENABLE ROW MOVEMENT clause has been set.
So you have the option on whether you wish this new behaviour to occur by simply not setting ENABLE ROW MOVEMENT on tables where you want the previous behavior of the ROWIDs not changing when a row migrates (and for Oracle to simply have a pointer in the original table block to denote the new location of the row). You may not what ROWIDs to suddenly change on you for example if you have an application that explicitly stores ROWIDs and relies on them not changing for the application to correctly fetch data.
However, all my previous tests and examples have been run on Oracle Transaction Processing Autonomous Database environments, but as Phil Goldenberg mentioned in this comment, things unfortunately behave somewhat differently in Oracle Data Warehouse Autonomous Database environments.
To illustrate, a simple little demo as usual, but this time using an Oracle Data Warehouse Autonomous Database environment…
Let’s start by creating and populating a tightly packed table, but without setting the ENABLE ROW MOVEMENT clause:
SQL> create table bowie (id number, name varchar2(142)) pctfree 0; Table BOWIE created. SQL> insert into bowie select rownum, 'BOWIE' from dual connect by level <=10000; 10,000 rows inserted. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed.
Let’s now create an index based on the ID column:
SQL> create index bowie_id_i on bowie(id); Index BOWIE_ID_I created. SQL> select index_name, num_rows, blevel, leaf_blocks from user_indexes where table_name='BOWIE'; INDEX_NAME NUM_ROWS BLEVEL LEAF_BLOCKS _____________ ___________ _________ ______________ BOWIE_ID_I 10000 1 23
Let’s have a look at the ROWIDs of a few random rows:
SQL> select id, rowid from bowie where id in (42, 424, 4242) order by id; ID ROWID _______ _____________________ 42 AAASTdAAAAAAJq0AAp 424 AAASTdAAAAAAJq0AIH 4242 AAASTdAAAAAAJq1ADP
If we store these ROWIDs, we can use them to directly access a row of interest very efficiently:
SQL> select id from bowie where rowid='AAASTdAAAAAAJq0AAp'; ID _____ 42
Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:
SQL> update bowie set name='THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS'; 10,000 rows updated. SQL> commit; Commit complete.
As discussed previously, if this were an Oracle Transaction Processing Autonomous Database environment, because I haven’t set ENABLE ROW MOVEMENT on this table, the ROWIDs of any migrated rows would NOT have changed.
But here in this Oracle Data Warehouse Database environment:
SQL> select id, rowid from bowie where id in (42, 424, 4242) order by id; ID ROWID _______ _____________________ 42 AAASTdAAAAAAJq3AAp 424 AAASTdAAAAAAJq3AIH 4242 AAASTdAAAAAAJrcAFd
We can see that all these rows now have a new ROWID.
If I now re-run my previous query that relied on the ROWIDs not changing:
SQL> select id from bowie where rowid='AAASTdAAAAAAJq0AAp'; no rows selected
We can see that the query no longer returns the required row.
And yet, if we ANALYZE the table:
SQL> analyze table bowie compute statistics; Table BOWIE analyzed. SQL> select table_name, num_rows, blocks, chain_cnt, row_movement from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS CHAIN_CNT ROW_MOVEMENT _____________ ___________ _________ ____________ _______________ BOWIE 10000 28 0 DISABLED
We can see that we have no migrated/chained rows listed, even though ENABLE ROW MOVEMENT is disabled.
I think this a little unfortunate, because I quite like the concept of ROWIDs being updated when a row migrates, but I also like the option of being able to revert to the previous behaviour if necessary.
I have no idea how this works in the other Oracle Autonomous Database environments (other than Transaction Processing), but regardless, this behaviour can potentially change on any of the environments at any time.
So, what’s the moral of the story? Well there’s a couple.
Firstly, in Oracle Autonomous Database environments, you’re meant to have a “hands-off” attitude and just let Oracle handle all this day to day stuff. So you can’t necessarily rely on the behaviour of the database to be as consistent as in environments where you control all the levers. Indexes might come and go, tables might suddenly get partitioned, ROWIDs might change when a row migrates, etc. etc. etc. etc.
And secondly, it’s becoming an even worse idea for applications to explicitly store and rely on ROWIDs not changing for such applications function properly. Especially, if you use Oracle Autonomous Database environments…
Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part III (“Dancing With The Big Boys”) March 9, 2023
Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, ROWID.add a comment
In my previous post, I discussed how you can best reorg a table that has a significant number of migrated rows impact the Clustering Factor of important indexes, when such tables have the ENABLED ROW MOVEMENT disabled.
In this post I’ll discuss resolving similar issues, but when ROWIDs are updated on the fly when rows are migrated in Oracle Autonomous Databases.
As I discussed previously, by updating indexes with the new ROWIDs when rows migrate, such indexes can potentially increase in size as they store both old/new index entries concurrently AND due to the increased likelihood of associated index block splits. Additionally, such indexes can also have their Clustering Factor directly impacted when migrated rows disrupt the otherwise tight clustering of specific columns.
As such, we may want to address these issues to improve the performance of impacted queries. But it’s important we address these issues appropriately…
To illustrate all this, I’m going to re-run the same demo as my previous post, but on a table with ENABLE ROW MOVEMENT enabled.
I’ll start by creating and populating a tightly packed table with ENABLE ROW MOVEMENT enabled and with data inserted in ID column order:
SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT; Table BOWIE2 created. SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000; 200,000 rows inserted. SQL> commit; Commit complete.
I’ll now create an index on this well ordered/clustered ID column:
SQL> create index bowie2_id_i on bowie2(id); Index BOWIE2_ID_I created.
Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:
SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS'; 200,000 rows updated. SQL> commit; Commit complete.
If we check the number of migrated rows:
SQL> analyze table bowie2 compute statistics; Table BOWIE2 analyzed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE2'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE2 200000 4654 82 367 169 0
We notice there are indeed 0 migrated rows. This is because in Oracle Autonomous Databases, the associated ROWIDs of migrated rows as updated on the fly in this scenario.
If we check the current Clustering Factor of the index:
SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE2'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE2 200000 4654 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ______________ _________ ______________ ____________________ BOWIE2_ID_I 2 945 109061
We can see that although the data was initially inserted in ID column order, we now have a relatively poor Clustering Factor at 109061 as the migrated rows have disrupted this previously perfect clustering.
We also notice that the BLEVEL has increased from 1 to now be 2 and the number of Leaf Blocks has increased to 945 from 473 after the rows migrated (as I discussed previously).
If we now run a query that returns 4200 rows from a 200,000 row table:
SQL> select * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 4200 |00:00:00.02 | 4572 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1)) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 4 CPU used by this session 4 CPU used when call started 4 DB time 37101 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2 buffer is not pinned count 325 bytes received via SQL*Net from client 461965 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 14 calls to kcmgcs 4572 consistent gets 4572 consistent gets from cache 4572 consistent gets pin 4572 consistent gets pin (fastpath) 2 execute count 37453824 logical read bytes from cache 4560 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 4572 session logical reads 1 sorts (memory) 2024 sorts (rows) 4560 table scan blocks gotten 252948 table scan disk non-IMC rows gotten 252948 table scan rows gotten 1 table scans (short tables) 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ______________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 1264 (100)| 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 684K| 1264 (1)| 4200 |00:00:00.02 | 4572 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1))
We can see that Oracle has decided to perform a Full Table Scan (FTS) and not use the index.
The Clustering Factor of the ID column is now so bad, that returning 4200 rows via such an index is just too expensive. The FTS is now deemed the cheaper option by the CBO.
We notice that the CBO cost of the FTS is 1264.
If we run a query that forces the use of the index:
SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2665 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 4200 |00:00:00.01 | 2665 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 21 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 2 DB time 14531 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2646 buffer is not pinned count 5755 buffer is pinned count 348 bytes received via SQL*Net from client 462143 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2665 consistent gets 2 consistent gets examination 2 consistent gets examination (fastpath) 2665 consistent gets from cache 2663 consistent gets pin 2663 consistent gets pin (fastpath) 2 execute count 1 index range scans 21831680 logical read bytes from cache 2663 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 3 process last non-idle time 2 session cursor cache count 2665 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 2314 (100)| 4200 |00:00:00.01 | 2665 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 684K| 2314 (1)| 4200 |00:00:00.01 | 2665 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | | 22 (0)| 4200 |00:00:00.01 | 21 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
The cost of the Index Range Scan plan has an overall cost of 2314, greater than the 1264 cost of the FTS plan.
Notice that the cost of using just the index within the plan is currently 22.
So the vast majority of the cost of this plan (2314 – 22 = 2292) is in Oracle having to access so many different table blocks due to the poor index Clustering Factor and NOT in the increased size of the index.
As I’ve discussed numerous times, you can potentially make an index smaller by rebuilding the index (if there’s free space within the index), but the impact on the Clustering Factor will be nothing but “disappointing”…
If we just rebuild the index:
SQL> alter index bowie2_id_i rebuild online; Index BOWIE2_ID_I altered.
And now look at the new index related statistics:
SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ______________ _________ ______________ ____________________ BOWIE2_ID_I 1 473 109061
We notice that the index has indeed decreased in size, back to what is was before the row migrated following the Update (Blevel=1 and Leaf Blocks=473).
But the Clustering Factor remains unchanged at 109061.
If we now re-run the query:
SQL> select * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 4200 |00:00:00.02 | 4572 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1)) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 3 CPU used by this session 3 CPU used when call started 3 DB time 31738 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2 buffer is not pinned count 325 bytes received via SQL*Net from client 461972 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 14 calls to kcmgcs 4572 consistent gets 4572 consistent gets from cache 4572 consistent gets pin 4572 consistent gets pin (fastpath) 2 execute count 37453824 logical read bytes from cache 4560 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 3 process last non-idle time 2 session cursor cache count 4572 session logical reads 1 sorts (memory) 2024 sorts (rows) 4560 table scan blocks gotten 252948 table scan disk non-IMC rows gotten 252948 table scan rows gotten 1 table scans (short tables) 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ______________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 1264 (100)| 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 684K| 1264 (1)| 4200 |00:00:00.02 | 4572 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1))
The CBO decides to still use a FTS instead of the index.
If we look at the cost now of using the index for this query:
SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2655 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 4200 |00:00:00.01 | 2655 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 1 DB time 13484 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2646 buffer is not pinned count 5755 buffer is pinned count 347 bytes received via SQL*Net from client 461972 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2655 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 2655 consistent gets from cache 2654 consistent gets pin 2654 consistent gets pin (fastpath) 2 execute count 1 index range scans 21749760 logical read bytes from cache 2654 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 2655 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 2303 (100)| 4200 |00:00:00.01 | 2655 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 684K| 2303 (1)| 4200 |00:00:00.01 | 2655 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We notice the cost of the index has only moderately gone down to 2303 (previously it was 2314).
This reduction of 11 in the CBO cost is due entirely to the fact the index is now approximately 1/2 the size as it was before the index rebuild and has thus reduced the cost of reading the index blocks to 11 within the execution plan (previously it was 22).
But the vast majority of the cost within the Index Range Scan plan comes again with accessing the table blocks, which remains unchanged due to the unchanged Clustering Factor.
To reduce the Clustering Factor, we need to change the clustering of the data with the TABLE.
So, to improve the performance of this potentially important query, we need to re-cluster the data just as we did in the example in my previous post when we had migrated rows listed and ROWIDs were not updated on the fly.
We can now add an appropriate Clustering Attribute before we perform the table reorg:
SQL> alter table bowie2 add clustering by linear order (id); Table BOWIE2 altered. SQL> alter table bowie2 move online; Table BOWIE2 altered.
If we now look at the Clustering Factor of this important index:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE2 200000 4936 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ______________ _________ ______________ ____________________ BOWIE2_ID_I 1 473 4850
The Clustering Factor has been reduced down to the almost perfect 4850, down from the previous 109061.
If we now re-run the query:
SQL> select * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 102 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 4200 |00:00:00.01 | 102 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 90 Cached Commit SCN referenced 11345 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 93 buffer is not pinned count 8308 buffer is pinned count 325 bytes received via SQL*Net from client 462117 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 102 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 102 consistent gets from cache 101 consistent gets pin 101 consistent gets pin (fastpath) 2 execute count 1 index range scans 835584 logical read bytes from cache 101 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 2 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 102 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls
We can see the query now automatically uses the index and only requires just 102 consistent gets, down from 4572 when it performed the FTS.
If we look at the cost of this new plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 113 (100)| 4200 |00:00:00.01 | 102 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 684K| 113 (0)| 4200 |00:00:00.01 | 102 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We can see the plan has a cost of just 113, which is both much more accurate and close to the 102 consistent gets and much less than the previous cost of 1340 for the FTS plan.
So in specific examples where migrated rows significantly impact the Clustering Factor of indexes important to our applications, including when ROWIDs are updated on the fly in Oracle Autonomous Databases, we may need to appropriately reorg such tables to repair the Clustering Factor of impacted indexes.
I’ve mentioned a number of times in this series how tables in Oracle Autonomous Databases with ENABLE ROW MOVEMENT have their ROWIDs updated on the fly when a row migrates. In my next post, I’ll discuss how even tables that don’t have the ENABLE ROW MOVEMENT clause set can still have their ROWIDs updated on the fly when a row migrates…
Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part II (“Dancing Out In Space”) March 7, 2023
Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, David Bowie, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Richard's Musings, ROWID.1 comment so far
In my previous post, I discussed how the clustering of data can be impacted if rows migrate and how this in turn can have a detrimental impact on the efficiency of associated indexes.
In this post, I’ll discuss what you can do (and not do) to remedy things in the relatively unlikely event that you hit this issue with migrated rows.
I’ll just discuss initially the example where the table is defined without ENABLE ROW MOVEMENT enabled in the Transaction Processing Autonomous Database (and so does NOT update ROWIDs on the fly when a row migrates).
I’ll start by again creating and populating a tightly packed table, with the data inserted in ID column order:
SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0; Table BOWIE created. SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000; 200,000 rows inserted. SQL> commit; Commit complete.
I’ll now create an index on this well ordered/clustered ID column:
SQL> create index bowie_id_i on bowie(id); Index BOWIE_ID_I created.
Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:
SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS'; 200,000 rows updated. SQL> commit; Commit complete.
If we check the number of migrated rows:
SQL> analyze table bowie compute statistics; Table BOWIE analyzed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4906 86 414 170 56186
We notice there are indeed 56186 migrated rows.
If we check the current Clustering Factor of the index:
SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE 200000 4906 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
We notice the index still has an excellent Clustering Factor of just 3250. As the ROWIDs are NOT updated in this example when rows migrate, the index retains the same Clustering Factor as before the Update statement.
If we run the following query that returns 4200 rows (as per my previous post):
SQL> select * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2771 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 4200 |00:00:00.01 | 2771 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 3 DB time 24901 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2762 buffer is not pinned count 7005 buffer is pinned count 324 bytes received via SQL*Net from client 461909 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2771 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 2771 consistent gets from cache 2770 consistent gets pin 2770 consistent gets pin (fastpath) 2 execute count 1 index range scans 22700032 logical read bytes from cache 2770 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 2771 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 1366 table fetch continued row 3 user calls
We can see the query currently uses 2771 consistent gets, which is significantly higher than it could be, as Oracle has to visit the original table block and then follow the pointer to the new location for any migrated row that needs to be retrieved.
However, if we look at the cost of the current plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 80 (100)| 4200 |00:00:00.01 | 2771 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 684K| 80 (0)| 4200 |00:00:00.01 | 2771 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | --------------------------------------------------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We can see it only has a cost of 80, as Oracle does not consider the additional accesses required now for these migrated rows. With such a perfect Clustering Factor, this cost is not particularly accurate and does not represent the true cost of the 2771 consistent gets now required.
Now there are various ways we can look at fixing this issue with all these migrated rows requiring additional consistent gets to access.
One method is to capture all the ROWIDs of the migrated rows, copy these rows to a temporary holding table, delete these rows and then re-insert them all back into the table from the temporary table.
We can identify the migrated rows by creating the CHAIN_ROWS table as per the Oracle supplied UTLCHAIN.SQL script and then use the ANALYZE command to store their ROWIDs in this CHAIN_ROWS table:
SQL> create table CHAINED_ROWS ( 2 owner_name varchar2(128), 3 table_name varchar2(128), 4 cluster_name varchar2(128), 5 partition_name varchar2(128), 6 subpartition_name varchar2(128), 7 head_rowid rowid, 8 analyze_timestamp date 9* ); Table CHAINED_ROWS created. SQL> analyze table bowie list chained rows; Table BOWIE analyzed. SQL> select table_name, head_rowid from chained_rows where table_name='BOWIE' and rownum<=10; TABLE_NAME HEAD_ROWID _____________ _____________________ BOWIE AAAqFjAAAAAE6CzAAP BOWIE AAAqFjAAAAAE6CzAAR BOWIE AAAqFjAAAAAE6CzAAU BOWIE AAAqFjAAAAAE6CzAAW BOWIE AAAqFjAAAAAE6CzAAZ BOWIE AAAqFjAAAAAE6CzAAb BOWIE AAAqFjAAAAAE6CzAAe BOWIE AAAqFjAAAAAE6CzAAg BOWIE AAAqFjAAAAAE6CzAAj BOWIE AAAqFjAAAAAE6CzAAl
Another method we can now utilise is to simply MOVE ONLINE the table:
SQL> alter table bowie move online; Table BOWIE altered.
If we now look at the number of migrated rows after the table reorg:
SQL> analyze table bowie compute statistics; Table BOWIE analyzed. SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE AVG_ROW_LEN CHAIN_CNT _____________ ___________ _________ _______________ ____________ ______________ ____________ BOWIE 200000 4936 56 838 169 0
We can see we no longer have any migrated rows.
BUT, if we now look at the Clustering Factor of this index:
SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE 200000 4936 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 114560
We can see it has now significantly increased to 114560 (previously it was just 3250).
The problem of course is that if the ROWIDs now represent the correct new physical location of the migrated rows, the previously perfect clustering/ordering of the ID column has been impacted.
If we now re-run the query returning the 4200 rows:
SQL> select * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1845943507 --------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.02 | 4857 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE | 1 | 4200 | 4200 |00:00:00.02 | 4857 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1)) Statistics ----------------------------------------------------------- 3 CPU used by this session 3 CPU used when call started 4849 Cached Commit SCN referenced 2 DB time 25870 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2 buffer is not pinned count 324 bytes received via SQL*Net from client 461962 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 9 calls to kcmgcs 4857 consistent gets 4857 consistent gets from cache 4857 consistent gets pin 4857 consistent gets pin (fastpath) 2 execute count 39788544 logical read bytes from cache 4850 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 2 process last non-idle time 1 session cursor cache count 4857 session logical reads 1 sorts (memory) 2024 sorts (rows) 4850 table scan blocks gotten 200000 table scan disk non-IMC rows gotten 200000 table scan rows gotten 1 table scans (short tables) 3 user calls
Oracle is now performing a Full Table Scan (FTS). The number of consistent gets now at 4857 is actually worse than when we had the migrated rows (previously at 2771)
The Clustering Factor of the ID column is now so bad, that returning 4200 rows via such an index is just too expensive. The FTS is now deemed the cheaper option by the CBO.
If we look at the CBO cost of using this FTS plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1845943507 ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | | 1340 (100)| 4200 |00:00:00.02 | 4857 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE | 1 | 4200 | 684K| 1340 (1)| 4200 |00:00:00.02 | 4857 | ------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1))
We can see the cost of this plan is 1340.
If we compare this with the cost of using the (now deemed) inefficient index:
SQL> select /*+ index (bowie) */ * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID 9215hkzd3v1up, child number 0 ------------------------------------- select /*+ index (bowie) */ * from bowie where id between 1 and 4200 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2784 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 4200 |00:00:00.01 | 2784 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 2741 Cached Commit SCN referenced 2 DB time 12633 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2775 buffer is not pinned count 5626 buffer is pinned count 345 bytes received via SQL*Net from client 462170 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2784 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 2784 consistent gets from cache 2783 consistent gets pin 2783 consistent gets pin (fastpath) 2 execute count 1 index range scans 22806528 logical read bytes from cache 2783 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 4 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 2784 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'9215hkzd3v1up',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID 9215hkzd3v1up, child number 0 ------------------------------------- select /*+ index (bowie) */ * from bowie where id between 1 and 4200 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 2418 (100)| 4200 |00:00:00.01 | 2784 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 684K| 2418 (1)| 4200 |00:00:00.01 | 2784 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We can see the CBO cost of the index is now 2418, more than the 1340 cost of using the FTS.
So in the scenario where by migrating a significant number of rows, we impact the Clustering Factor and so the efficiency of vital indexes in our applications, we need to eliminate the migrated rows in a more thoughtful manner.
An option we have available is to first add an appropriate Clustering Attribute before we perform the table reorg:
SQL> alter table bowie add clustering by linear order (id); Table BOWIE altered. SQL> alter table bowie move online; Table BOWIE altered.
If we now look at the Clustering Factor of this important index:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE 200000 4936 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 4850
The Clustering Factor has been reduced down to the almost perfect 4850, down from the previous 114560.
If we now re-run the query:
SQL> select * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 102 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 4200 |00:00:00.01 | 102 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 89 Cached Commit SCN referenced 1 DB time 11249 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 93 buffer is not pinned count 8308 buffer is pinned count 324 bytes received via SQL*Net from client 462165 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 102 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 102 consistent gets from cache 101 consistent gets pin 101 consistent gets pin (fastpath) 2 execute count 1 index range scans 835584 logical read bytes from cache 101 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 1 session cursor cache count 1 session cursor cache hits 102 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls
We can see the query now automatically uses the index and only requires just 102 consistent gets (down from 4857 when it performed the FTS and down from 2771 when we had the migrated rows).
If we look at the cost of this new plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 113 (100)| 4200 |00:00:00.01 | 102 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 684K| 113 (0)| 4200 |00:00:00.01 | 102 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We can see the plan has a cost of just 113, which is both much more accurate and close to the 102 consistent gets and much less than the previous cost of 1340 for the FTS plan.
So in specific scenarios where by having migrated rows we significantly impact the Clustering Factor of indexes important to our applications, we have to be a little cleverer in how we address the migrated rows.
This can also the case in the new scenario where Oracle automatically updates the ROWIDs of migrated rows, as I’ll discuss in my next post…
Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part I (“Growin’ Up”) March 1, 2023
Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, BLEVEL, CBO, Changing ROWID, Clustering Factor, Data Clustering, Hints, Index Access Path, Index Block Splits, Index Delete Operations, Index Height, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle Statistics, Oracle19c, Performance Tuning, Richard Foote Training, Richard's Blog, ROWID.2 comments
In my previous post I discussed how an index can potentially be somewhat inflated in size after ROWIDs are updated on the fly after a substantial number of rows are migrated.
However, there’s another key “factor” of an index that in some scenarios can be impacted by this new ROWID behaviour with regard migrated rows.
To highlight this scenario, I’ll again start by creating and populating a table with ENABLE ROW MOVEMENT disabled:
SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0; Table BOWIE created. SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000; 200,000 rows inserted. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed.
I’ll next create an index on the ID column. The important aspect with the ID column is that the data is entered monotonically in ID column order, so the associated index will have an excellent (very low) Clustering Factor:
SQL> create index bowie_id_i on bowie(id); Index BOWIE_ID_I created.
If we look at some key statistics of the table and index:
SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE 200000 3268 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
We can see that the number of table blocks is 3268, the number of index leaf blocks is 473 and we indeed have a near perfect Clustering Factor of 3250.
If we run a couple of queries:
SQL> select * from bowie where id between 1 and 1000; 1,000 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID gz5u92hmjwz1h, child number 0 ------------------------------------- select * from bowie where id between 1 and 1000 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 1000 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | 1000 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 7353 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 16 buffer is not pinned count 1985 buffer is pinned count 324 bytes received via SQL*Net from client 171305 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 18 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 18 consistent gets from cache 17 consistent gets pin 17 consistent gets pin (fastpath) 2 execute count 1 index range scans 147456 logical read bytes from cache 17 no work - consistent read gets 38 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 18 session logical reads 1 sorts (memory) 2024 sorts (rows) 1000 table fetch by rowid 3 user calls
We can see for this first query that returns 1000 rows, it requires just 18 consistent gets, thanks primarily due to the efficient index with the perfect Clustering Factor.
If we look at the cost of this plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID gz5u92hmjwz1h, child number 0 ------------------------------------- select * from bowie where id between 1 and 1000 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 1000 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 108K| 21 (0)| 1000 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | | 4 (0)| 1000 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000)
We can see the plan has an accurate cost of just 21.
If we now run a similar query that returns a few more rows:
SQL> select * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 68 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 4200 |00:00:00.01 | 68 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 1 DB time 11353 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 59 buffer is not pinned count 8342 buffer is pinned count 324 bytes received via SQL*Net from client 461834 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 68 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 68 consistent gets from cache 67 consistent gets pin 67 consistent gets pin (fastpath) 2 execute count 1 index range scans 557056 logical read bytes from cache 67 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 68 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls
We can see that it only required just 68 consistent gets to return 4200 rows, thanks to the excellent data clustering and associated very low Clustering Factor.
If we look at the cost of this plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 80 (100)| 4200 |00:00:00.01 | 68 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 455K| 80 (0)| 4200 |00:00:00.01 | 68 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We can see the cost of the plan is currently a relatively accurate 80.
OK, let’s now perform an update on this table that generates a bunch of migrated rows:
SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS'; 200,000 rows updated. SQL> commit; Commit complete.
If we now look at the table and index statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE 200000 4906
We can see that the table blocks value has increased to 4906 (previously 3268). This as explained previously is to due in large part to the increased NAME column values and also due to the pointers in the original table blocks that point to the new locations of the migrated rows.
This relates to approximately a 50% increase in table blocks.
If we look at the current index statistics:
SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR _____________ _________ ______________ ____________________ BOWIE_ID_I 1 473 3250
We can see that these values are all unchanged, as the ROWIDs in indexes remain unchanged when a row migrates, when ENABLE ROW MOVEMENT is not set.
Therefore, when we re-run these same queries:
SQL> select * from bowie where id between 1 and 1000; 1,000 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID gz5u92hmjwz1h, child number 0 ------------------------------------- select * from bowie where id between 1 and 1000 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 1000 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | 1000 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 DB time 7967 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 664 buffer is not pinned count 1664 buffer is pinned count 324 bytes received via SQL*Net from client 171419 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 666 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 666 consistent gets from cache 665 consistent gets pin 665 consistent gets pin (fastpath) 2 execute count 1 index range scans 5455872 logical read bytes from cache 665 no work - consistent read gets 37 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 666 session logical reads 1 sorts (memory) 2024 sorts (rows) 1000 table fetch by rowid 327 table fetch continued row 3 user calls
The number of consistent gets has increased significantly to 666 (previously it was just 18).
Now we can attributed an increase of approximately 50% of the previous consistent gets (18 x 0.50 = 9) due to the 50% increase in table blocks required now to store the rows due to the increased row size.
We can also attribute an additional 327 consistent gets for the table fetch continued row value listed in the statistics, representing the extra consistent gets required to access the migrated rows from their new physical location.
But 18 + 9 + 327 = 354 still leaves us short of the new 666 consistent gets value.
The problem with having to visit another table block to get a row from its new location is that it means Oracle has to re-access again the original table block to get the next row (rather than reading multiple rows with the same consistent get).
So it’s actually approximately 2 x table fetch continued row, by which the number of consistent gets is going to increase when accessing migrated rows (noting that the last migrated row in a block will only incur a additional consistent get as the next table block accessed will differ regardless).
If we look at the new CBO cost for this plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ____________________________________________________________________________________________________________________________________ SQL_ID gz5u92hmjwz1h, child number 0 ------------------------------------- select * from bowie where id between 1 and 1000 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 1000 |00:00:00.01 | 666 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 1000 | 163K| 21 (0)| 1000 |00:00:00.01 | 666 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 1000 | | 4 (0)| 1000 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000)
We notice the CBO cost for this plan remains unchanged at 21.
This is totally to be expected, as the index statistics by which the cost of an index scan is calculated are unchanged.
Considering the rough “rule of thumb” is that the CBO cost of an index scan should be in the ball-park of the number of possible IOs, the fact the plan now uses 666 consistent gets highlights this cost of just 21 is no longer as accurate…
If we look at the second SQL that returns 4200 rows:
SQL> select * from bowie where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _______________________________________________________________________________________________________________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2771 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 4200 |00:00:00.01 | 2771 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 2 DB time 14103 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2762 buffer is not pinned count 7005 buffer is pinned count 324 bytes received via SQL*Net from client 461947 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2771 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 2771 consistent gets from cache 2770 consistent gets pin 2770 consistent gets pin (fastpath) 2 execute count 1 index range scans 22700032 logical read bytes from cache 2770 no work - consistent read gets 72 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 2771 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 1366 table fetch continued row 3 user calls
We again notice consistent gets has increased significantly to 2771 (previously it was just 68). Again, these additional consistent gets can not be attributed to the extra size of the table and the additional approximate 2 x 1366 table fetch continued row gets.
If we now look at the cost of this plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________________ ____________ SQL_ID c376kdhy5b0x9, child number 0 ------------------------------------- select * from bowie where id between 1 and 4200 Plan hash value: 1405654398 --------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | --------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 80 (100)| 4200 |00:00:00.01 | 2771 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 1 | 4200 | 684K| 80 (0)| 4200 |00:00:00.01 | 2771 | |* 2 | INDEX RANGE SCAN | BOWIE_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | --------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
We again notice the CBO cost for this plan remains unchanged at 80, again totally expected as the underlying index statistics have remain unchanged after the update statement.
But again, not necessary as accurate a cost as it was previously…
If we repeat this demo, but this time on a table with ENABLE ROW MOVEMENT enabled:
SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT; Table BOWIE2 created. SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000; 200,000 rows inserted. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed. SQL> create index bowie2_id_i on bowie2(id); Index BOWIE2_ID_I created. SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE2 200000 3268 SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR __________________ _________ ______________ ____________________ BOWIE2_ID_I 1 473 3250
The table and index statistics are currently identical to the previous demo.
If we run the same two equivalent queries:
SQL> select * from bowie2 where id between 1 and 1000; 1,000 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID gtkw2704bxj7q, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 1000 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 1000 | 1000 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 1000 | 1000 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 7909 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 16 buffer is not pinned count 1985 buffer is pinned count 325 bytes received via SQL*Net from client 171306 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 18 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 18 consistent gets from cache 17 consistent gets pin 17 consistent gets pin (fastpath) 2 execute count 1 index range scans 147456 logical read bytes from cache 17 no work - consistent read gets 37 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 18 session logical reads 1 sorts (memory) 2024 sorts (rows) 1000 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID gtkw2704bxj7q, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 1000 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 21 (100)| 1000 |00:00:00.01 | 18 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 1000 | 108K| 21 (0)| 1000 |00:00:00.01 | 18 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 1000 | | 4 (0)| 1000 |00:00:00.01 | 4 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000) SQL> select * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 68 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 4200 |00:00:00.01 | 68 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 11 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 2 DB time 13157 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 59 buffer is not pinned count 8342 buffer is pinned count 325 bytes received via SQL*Net from client 461838 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 68 consistent gets 1 consistent gets examination 1 consistent gets examination (fastpath) 68 consistent gets from cache 67 consistent gets pin 67 consistent gets pin (fastpath) 2 execute count 1 index range scans 557056 logical read bytes from cache 67 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 68 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 80 (100)| 4200 |00:00:00.01 | 68 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 455K| 80 (0)| 4200 |00:00:00.01 | 68 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | | 11 (0)| 4200 |00:00:00.01 | 11 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
With identical table/index statistics, we notice as expected that both SQLs have the same consistent gets and CBO costs as with the previous demo.
If we now repeat the equivalent Update statement:
SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS'; 200,000 rows updated. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false); PL/SQL procedure successfully completed.
If we look at the table statistics:
SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2'; TABLE_NAME NUM_ROWS BLOCKS _____________ ___________ _________ BOWIE2 200000 4654
We notice the number of table blocks has increased to 4654 due to the increased row lengths, but not as much as with the previous demo (where table blocks increased to 4906) as in this scenario, Oracle does not have to store the row location pointers in the original blocks for the migrated rows.
If we look at the index statistics:
SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ______________ _________ ______________ ____________________ BOWIE2_ID_I 2 945 109061
We notice that these are substantially different from the first demo, where ROWIDs for migrated rows are not updated on the fly.
By now updating the ROWIDs, the indexes can possibly increase in size as they have to store both the previous and new ROWIDs in separate index entries and hence Oracle is more likely to perform additional index block splits (as I discussed in my previous post).
The LEAF_BLOCKS are now 945 (previously 473) and even the BLEVEL has increased from 1 to 2.
Additionally, and perhaps importantly for specific key indexes, the Clustering Factor value of indexes can also be impacted. By migrating rows and physically storing them in different locations, this can potentially detrimentally impact the tight clustering of rows based on specific column values.
The Clustering Factor for the index on the monotonically increased ID column has now increased significantly to 109061, up from the previously perfect 3250.
So columns that have naturally good clustering (e.g.: monotonically increasing values such as IDs and dates) or have been manually well clustered for performance purposes, can have the Clustering Factor of associated indexes detrimentally impacted by migrated rows.
If we re-run the first query:
SQL> select * from bowie2 where id between 1 and 1000; 1,000 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID gtkw2704bxj7q, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 1000 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 639 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 1000 | 1000 |00:00:00.01 | 639 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 1000 | 1000 |00:00:00.01 | 7 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 1 CPU used by this session 1 CPU used when call started 1 DB time 15262 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 634 buffer is not pinned count 1367 buffer is pinned count 325 bytes received via SQL*Net from client 171421 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 639 consistent gets 2 consistent gets examination 2 consistent gets examination (fastpath) 639 consistent gets from cache 637 consistent gets pin 637 consistent gets pin (fastpath) 2 execute count 1 index range scans 5234688 logical read bytes from cache 637 no work - consistent read gets 38 non-idle wait count 1 non-idle wait time 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 1 process last non-idle time 2 session cursor cache count 639 session logical reads 1 sorts (memory) 2024 sorts (rows) 1000 table fetch by rowid 3 user calls
I discussed in a previous post how by updating the ROWIDs of migrated rows we can improve performance, as Oracle can go directly to the correct new physical location of a migrated row.
But for some specific indexes, where data clustering is crucial, and we have a significant number migrated rows, this might not necessarily be the case.
We can see consistent gets here has increased to 639 (previously is was just 21), and so not hugely different from the 666 consistent gets required to fetch the migrated rows when the ROWIDs were not updated in the first demo.
If we look at the CBO costings:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID gtkw2704bxj7q, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 1000 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 553 (100)| 1000 |00:00:00.01 | 639 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 1000 | 163K| 553 (0)| 1000 |00:00:00.01 | 639 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 1000 | | 7 (0)| 1000 |00:00:00.01 | 7 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=1000)
We can see the CBO cost has increased significantly to 553 (previously it was just 21).
With a much increased Clustering Factor, this will obviously impact the CBO costs of associated index scans.
In very extreme cases, these possible changes in the Clustering Factor can even impact the viability of using the index.
If we re-run the second query returning the 4200 rows:
SQL> select * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 4200 |00:00:00.02 | 4572 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1))
We can see that the CBO has now chosen to perform a Full Table Scan (FTS), rather than use the now less efficient index to return this number of rows.
If we look at the CBO costings of this FTS plan:
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT ______________________________________________________________________________________________________________________ SQL_ID 25qktyn35b662, child number 0 ------------------------------------- select * from bowie2 where id between 1 and 4200 Plan hash value: 1495904576 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 1264 (100)| 4200 |00:00:00.02 | 4572 | |* 1 | TABLE ACCESS STORAGE FULL | BOWIE2 | 1 | 4200 | 684K| 1264 (1)| 4200 |00:00:00.02 | 4572 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage(("ID"<=4200 AND "ID">=1)) filter(("ID"<=4200 AND "ID">=1))
The cost of the FTS plan is 1264.
If we compare this is a plan that used the index:
SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200; 4,200 rows selected. PLAN_TABLE_OUTPUT ________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 4200 |00:00:00.01 | 2665 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 4200 |00:00:00.01 | 2665 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | 4200 |00:00:00.01 | 21 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ----------------------------------------------------------- 2 CPU used by this session 2 CPU used when call started 2 DB time 14531 RM usage 3 Requests to/from client 2 SQL*Net roundtrips to/from client 2646 buffer is not pinned count 5755 buffer is pinned count 348 bytes received via SQL*Net from client 462143 bytes sent via SQL*Net to client 2 calls to get snapshot scn: kcmgss 2 calls to kcmgcs 2665 consistent gets 2 consistent gets examination 2 consistent gets examination (fastpath) 2665 consistent gets from cache 2663 consistent gets pin 2663 consistent gets pin (fastpath) 2 execute count 1 index range scans 21831680 logical read bytes from cache 2663 no work - consistent read gets 73 non-idle wait count 2 opened cursors cumulative 1 opened cursors current 2 parse count (total) 3 process last non-idle time 2 session cursor cache count 2665 session logical reads 1 sorts (memory) 2024 sorts (rows) 4200 table fetch by rowid 3 user calls SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes')); PLAN_TABLE_OUTPUT _____________________________________________________________________________________________________________________________________ SQL_ID bzm2vhchqpq7w, child number 0 ------------------------------------- select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200 Plan hash value: 3243780227 ---------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time | Buffers | ---------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 2314 (100)| 4200 |00:00:00.01 | 2665 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2 | 1 | 4200 | 684K| 2314 (1)| 4200 |00:00:00.01 | 2665 | |* 2 | INDEX RANGE SCAN | BOWIE2_ID_I | 1 | 4200 | | 22 (0)| 4200 |00:00:00.01 | 21 | ---------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID">=1 AND "ID"<=4200)
The cost of using the index to retrieve the 4200 rows is 2310, more than the 1264 of the FTS.
For the vast majority of indexes, updating the ROWIDs for migrated rows will result in better performance, as such indexes will be able to directly access the correct new physical location of migrated rows, rather than having to visit the original table block and then follow the stored pointer to the new table block.
But for some very specific indexes, where data clustering is crucial, AND we have a significant number migrated rows, this might not necessarily be the case. The performance benefit might be minimal at best.
That’s more than enough for one post 🙂
In my next post, I’ll discuss how to potentially remedy these performance implications, both for tables with or without ENABLE TABLE MOVEMENT enabled…
When Does A ROWID Change? Part IV (“Mass Production”) December 21, 2022
Posted by Richard Foote in Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Changing ROWID, Clustering Factor, Data Clustering, Flashback, Move Partitions, Oracle, Oracle Blog, Oracle Cloud, Oracle General, Oracle Indexes, Partitioning, Richard's Blog, ROWID.1 comment so far
In Part II in this series, I discussed how the update of the partitioned key column of a row that results in the row being moved to a different partition, will result in the ROWID of such rows changing.
However, there a quite a number of other user initiated actions in which ROWIDs can easily change (as indeed discussed in Connor McDonald’s video on this subject).
Some of these include:
- Moving a table or partition, as this results in the segment being reorganised, with all associated rows being physically relocated and their associated ROWIDs changing
- Altering a non-partitioned table such that it be now be partitioned, which again results in the physical relocation of all rows and their ROWIDs changing (which BTW, can potentially occur on a Autonomous Database without any user intervention)
- Altering the partitioning strategy of a partitioned table, again changes the physical location of all rows
- Hybrid Columnar Compression (HCC), which by packing rows more tightly, can more likely result in the physical relocation of a row during subsequent DML statements
- Altering a table to Shrink Space, which attempts to move rows between table blocks to pack rows more tightly, again potentially resulting in rows physically moving and the changing of their associated ROWIDs
- Flashback of a table, which results in rows being deleted and inserted and hence the change of their associated ROWIDs
I’ll illustrate an example of all this, with one of the key reasons why you may want to re-organise a table (and implicitly change all the ROWIDs of a table).
I’ll start by creating and populating a simple little table, with a CODE column that has very poorly clustered data:
SQL> create table bowie (id number, code number, name varchar2(42)); Table created. SQL> insert into bowie select rownum, mod(rownum, 500), 'DAVID BOWIE' from dual connect by level <=1000000; 1000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed.
Let’s now create an index on this CODE column:
SQL> create index bowie_code_i on bowie(code); Index created.
We take note of the ROWIDs of a few random rows:
SQL> select id, rowid from bowie where id in (42, 4242, 424242) order by id; ID ROWID ---------- ------------------ 42 AAASn1AAMAAAgB2AAp 4242 AAASn1AAMAAAgCHACL 424242 AAASn1AAMAAAgbtAAJ
If we run a simple query with a predicate based on the CODE column:
SQL> select * from bowie where code=42; 2000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1845943507 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2000 | 42000 | 1004 (2)| 00:00:01 | | * 1 | TABLE ACCESS FULL | BOWIE | 2000 | 42000 | 1004 (2)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3596 consistent gets 0 physical reads 0 redo size 20757 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2000 rows processed
We notice the CBO has chosen to ignore the index and use a FTS instead, even though only 2000 rows in a 1M row table (just 0.2%) are returned.
Why?
Because the clustering of the CODE data is terrible, with the required values littered throughout the table. If we look at the Clustering Factor of the index:
SQL> select index_name, leaf_blocks, clustering_factor from user_indexes where index_name='BOWIE_CODE_I'; INDEX_NAME LEAF_BLOCKS CLUSTERING_FACTOR ------------ ----------- ----------------- BOWIE_CODE_I 2063 1000000
We notice the index has the worst possible Clustering Factor value of 1000000.
So to improve the performance of this (say critical) query, we can add a Clustering Attribute to this table based on the CODE column and then reorganise the table:
SQL> alter table bowie add clustering by linear order (code); Table altered. SQL> alter table bowie move online; Table altered.
If we now look at the Clustering Factor of the index:
SQL> select index_name, leaf_blocks, clustering_factor from user_indexes where index_name='BOWIE_CODE_I'; INDEX_NAME LEAF_BLOCKS CLUSTERING_FACTOR ------------ ----------- ----------------- BOWIE_CODE_I 2063 3568
We can see it has substantially improved, down to just 3568 from the previous 1000000 value, as the data is now perfectly clustered based on the CODE column.
If we now re-run the query:
SQL> select * from bowie where code=42; 2000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 853003755 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2000 | 42000 | 15 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE | 2000 | 42000 | 15 (0)| 00:00:01 | | * 2 | INDEX RANGE SCAN | BOWIE_CODE_I | 2000 | | 7 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 17 consistent gets 0 physical reads 0 redo size 50735 bytes sent via SQL*Net to client 52 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2000 rows processed
The CBO now choses to use the index and the query is much more efficient as a result (consistent gets down to just 17 from the previous 3596).
So all is now much better, except for any application that was reliant on using ROWIDs to fetch the data, as all ROWIDs have now changed:
SQL> select id, rowid from bowie where id in (42, 4242, 424242) order by id; ID ROWID ---------- ------------------ 42 AAASn6AAMAAAACvAEf 4242 AAASn6AAMAAAiRaAA4 424242 AAASn6AAMAAAiRWAEQ
So there are many ways in which the ROWID of a row can potentially change.
And now there’s another key manner in which a ROWID can very easily change in Oracle Autonomous Database environments, as I’ll next discuss…
METHOD_OPT Default In Oracle Autonomous Databases (She’ll Drive The Big Car) March 2, 2021
Posted by Richard Foote in 19c, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Histograms, METHOD_OPT, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Statistics.1 comment so far
In a recent post on Invisible Automatic Indexes, I was puzzled by a couple of “oddities” in relation to some behaviour in the Oracle Autonomous Database Cloud environments.
The first one was how Oracle appeared to be creating Histograms on a much more regular basis than it had previously.
As one can see in the demo below, if I create and populate a table:
SQL> create table pink_floyd (id number, code number, create_date date, name varchar2(42)); Table created. SQL> insert into pink_floyd select rownum, ceil(dbms_random.value(0, 5000)), sysdate-mod(rownum, 50000)+1, 'Dark Side of the Moon' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete.
And then collect statistics using the “default” options:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'PINK_FLOYD'); PL/SQL procedure successfully completed.
All the columns in the table now have histograms, regardless of whether they’ve been used in SQL predicates or if they have data skew:
SQL> select column_name, num_distinct, density, histogram from dba_tab_columns where table_name='PINK_FLOYD'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM -------------------- ------------ ---------- --------------- ID 9705425 0 HYBRID CODE 4835 .00005 HYBRID CREATE_DATE 50357 .00002 HYBRID NAME 1 4.9639E-08 FREQUENCY
The explanation for this is embarrassingly simple. A quick check on the default settings for METHOD_OPT shows the following:
SQL> select dbms_stats.get_prefs('METHOD_OPT') from dual; DBMS_STATS.GET_PREFS('METHOD_OPT') -------------------------------------------------------------------------------- FOR ALL COLUMNS SIZE 254
The default is FOR ALL COLUMNS 254, meaning that we will now indeed have histograms collected on all columns. With new capabilities such as High Frequency Statistics Collection, it’s interesting that Oracle has taken this approach but Oracle has obviously taken the attitude that with Exadata as the hosted infrastructure, it can afford to simply collect histograms globally on all columns in the Autonomous Database environments.
If you wanted to change this, you can do so by for example:
SQL> exec DBMS_STATS.SET_GLOBAL_PREFS ('METHOD_OPT', 'FOR ALL COLUMNS SIZE AUTO'); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs('METHOD_OPT') from dual; DBMS_STATS.GET_PREFS('METHOD_OPT') -------------------------------------------------------------------------------- FOR ALL COLUMNS SIZE AUTO
So not an “oddity”, but expected behaviour now on Oracle Autonomous Databases.
The other “oddity” I noticed were Invisible Valid Automatic indexes at times being created. The explanation for this will be the topic of my next blog post…
Oracle 19c Automatic Indexing: Indexing Partitioned Tables Part I (Conversation Piece) October 14, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Index Access Path, Local Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Partitioned Indexes, Partitioning, Performance Tuning.2 comments
In this little series, I’m going to discuss how Automatic Indexing works in relation to Partitioning.
I’ve discussed Indexing and Partitioning many times previously and how Oracle has various options when indexing a partitioned table:
- Non-Partitioned Index
- Globally Partitioned Index
- Locally Partitioned Index
So the question(s) are how does Automatic Indexing handle scenarios with partitioned objects?
A very important point to make at the start is that based on my research, the answer has already changed significantly since Automatic Indexing was first released. So it’s important to understand that Automatic Indexing is an ever evolving capability, that will advance and improve as time goes on.
I’ll focus on how the feature currently works (as of Oracle Database 19.5), but will mention previously identified behaviour as a reference on how things can easily change.
In my first simple little example, I’m just going to create a range-partitioned table, partitioned based on RELEASE_DATE, with a partition for each year’s worth of data:
SQL> CREATE TABLE big_bowie1(id number, album_id number, country_id number, release_date date, total_sales number) PARTITION BY RANGE (release_date) (PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')), PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')), PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')), PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')), PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')), PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')), PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')), PARTITION ALBUMS_2020 VALUES LESS THAN (MAXVALUE)); Table created.
I’ll now add about 8 years worth of data:
SQL> INSERT INTO big_bowie1 SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800), ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000; 10000000 rows created. SQL> COMMIT; Commit complete.
As discussed previously, I’ll importantly collect statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE1'); PL/SQL procedure successfully completed.
I’ll now run the following very selective query based the TOTAL_SALES column that is NOT part of the partitioning key:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 19 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2468051548 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 643 (15)| 00:00:01 | | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 643 (15)| 00:00:01 | 1 | 8 | |* 2 | TABLE ACCESS STORAGE FULL| BIG_BOWIE1 | 20 | 520 | 643 (15)| 00:00:01 | 1 | 8 | --------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - storage("TOTAL_SALES"=42) filter("TOTAL_SALES"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 44014 consistent gets 9516 physical reads 0 redo size 1107 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 19 rows processed
Without an index in place, the CBO has no choice but to use a FTS. But what will Automatic Indexing make of things?
If we look at the next Automatic Indexing report:
SQL> select dbms_auto_index.report_last_activity() from dual; GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 13-OCT-2020 01:47:48 Activity end : 13-OCT-2020 02:59:48 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 184.55 MB (184.55 MB / 0 B) Indexes dropped : 0 SQL statements verified : 2 SQL statements improved (improvement factor) : 1 (44119.6x) SQL plan baselines created : 0 Overall improvement factor : 25135.8x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- --------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------------- | BOWIE | BIG_BOWIE1 | SYS_AI_2zt7rg40mxa4n | TOTAL_SALES | B-TREE | NONE | --------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : chwm2gubm8fx9 SQL Text : SELECT * FROM big_bowie1 WHERE total_sales = 42 Improvement Factor : 44119.6x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 4387193 1173 CPU Time (s): 2599423 1037 Buffer Gets: 749507 22 Optimizer Cost: 643 22 Disk Reads: 470976 2 Direct Writes: 0 0 Rows Processed: 323 19 Executions: 17 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 2468051548 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 643 | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 643 | 00:00:01 | | 2 | TABLE ACCESS STORAGE FULL | BIG_BOWIE1 | 20 | 520 | 643 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 937174207 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 19 | 494 | 22 | 00:00:01 | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE1 | 19 | 494 | 22 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_2zt7rg40mxa4n | 19 | | 3 | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("TOTAL_SALES"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We notice a couple of interesting points.
Firstly, yes Automatic Indexing has created an index based on the TOTAL_SALES column (SYS_AI_2zt7rg40mxa4n) as it improves performance by a reported 44119.6x.
Note also that the Automatic Index is a Non-Partitioned (Global) Index. From a performance perspective, this is the most efficient index to create to improve the performance of this query as the CBO only has the one index structure to navigate (vs. a LOCAL index that would require having to navigate down all 8 index structures for each table partition.
If we look at the index details:
SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PAR AUT VISIBILIT STATUS ------------------------------ --- --- --------- -------- SYS_AI_2zt7rg40mxa4n NO YES VISIBLE VALID
We notice that this is indeed a Non-Partitioned Index, that is both VISIBLE and VALID and so can be potentially used by any database session.
If we now re-run the query:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 19 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 937174207 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 23 (0)| 00:00:01 | | | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| BIG_BOWIE1 | 20 | 520 | 23 (0)| 00:00:01 | ROWID | ROWID | |* 2 | INDEX RANGE SCAN | SYS_AI_2zt7rg40mxa4n | 20 | | 3 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("TOTAL_SALES"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 23 consistent gets 0 physical reads 0 redo size 1166 bytes sent via SQL*Net to client 369 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 19 rows processed
We can see the query now uses the newly created Automatic Index and is indeed more efficient, performing now just 23 consistent gets (previously 44014 consistent gets).
However, this was NOT previous behaviour.
The documentation previously mentioned that only LOCAL indexes are used when indexing partitioned tables.
If we run the same demo on Oracle Database 19.3, we get the following report:
GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 14-OCT-2020 13:12:07 Activity end : 14-OCT-2020 14:24:07 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 192.94 MB (192.94 MB / 0 B) Indexes dropped : 0 SQL statements verified : 1 SQL statements improved (improvement factor) : 1 (1950.5x) SQL plan baselines created : 0 Overall improvement factor : 1950.5x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- --------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------------- | BOWIE | BIG_BOWIE1 | SYS_AI_8armv0hqq73fa | TOTAL_SALES | B-TREE | LOCAL | --------------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 2pp8ypramw30s SQL Text : SELECT * FROM big_bowie1 WHERE total_sales = 42 Improvement Factor : 1950.5x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 6996973 27327 CPU Time (s): 6704215 12819 Buffer Gets: 815306 49 Optimizer Cost: 12793 28 Disk Reads: 2 40 Direct Writes: 0 0 Rows Processed: 475 25 Executions: 19 1 PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 4294056405 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 12793 | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 12793 | 00:00:01 | | 2 | TABLE ACCESS FULL | BIG_BOWIE1 | 20 | 520 | 12793 | 00:00:01 | ----------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 3781269341 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 25 | 650 | 28 | 00:00:01 | | 1 | PARTITION RANGE ALL | | 25 | 650 | 28 | 00:00:01 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE1 | 25 | 650 | 28 | 00:00:01 | | * 3 | INDEX RANGE SCAN | SYS_AI_8armv0hqq73fa | 25 | | 17 | 00:00:01 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 3 - access("TOTAL_SALES"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
As we can see, in this scenario, the newly created Automatic Index has a “Property” of LOCAL.
If we look at its index details:
SQL> SELECT index_name, partitioned, auto, visibility, status FROM user_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PAR AUT VISIBILIT STATUS ------------------------------ --- --- --------- -------- SYS_AI_8armv0hqq73fa YES YES VISIBLE N/A SQL> SELECT index_name, partitioning_type, partition_count, locality FROM user_part_indexes WHERE table_name = 'BIG_BOWIE1'; INDEX_NAME PARTITION PARTITION_COUNT LOCALI ------------------------------ --------- --------------- ------ SYS_AI_8armv0hqq73fa RANGE 8 LOCAL
We can see how a Local Index was previously created.
As such if we re-run an equivalent query:
SQL> SELECT * FROM big_bowie1 WHERE total_sales = 42; 25 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3781269341 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 520 | 26 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ALL | | 20 | 520 | 26 (0)| 00:00:01 | 1 | 8 | | 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| BIG_BOWIE1 | 20 | 520 | 26 (0)| 00:00:01 | 1 | 8 | |* 3 | INDEX RANGE SCAN | SYS_AI_8armv0hqq73fa | 20 | | 17 (0)| 00:00:01 | 1 | 8 | ----------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("TOTAL_SALES"=42) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 50 consistent gets 0 physical reads 0 redo size 1555 bytes sent via SQL*Net to client 409 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 25 rows processed
Although the query is returning 6 more rows (as with the random number generation, has a slightly different data set), it’s more expensive proportionally now having to perform 50 consistent gets as it now has to read 8 index structures rather than just the one.
So (IMHO), Automatic Indexing has improved here, creating a more efficient index structure than previously. So always bear in mind that Automatic Indexing is an evolving beast, improving and adapting as time moves on.
However, note the compromise here is that by having an effectively Global index structure, there may be some additional issues depending on any subsequent structural changes to the table.
More on Automatic Indexing and Partitioning in my next post…
Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part III (Do Anything You Say) October 8, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Stale Statistics.2 comments
In Part I of this series, we saw how Automatic Indexing will not create a viable Automatic Index if there are stale or missing statistics on the underlining segments. In Part II we saw how these SQL statements effectively become blacklisted and when segment statistics are subsequently collected, Automatic Indexing will still not create viable Automatic Indexes when the SQL statements are re-run.
So how do we get Automatic Indexing to now kick in and create necessary indexes on these problematic SQLs?
As I’ve discussed previously in relation to blacklisted SQLs, we need to run a NEW SQL statement that hasn’t been blacklist that will result in a necessary index to be created. An easy way to do this is just to include a new comment within the previous SQL to give the SQL a new signature.
If we now run the following “new” SQL statement (identical to the problematic SQL but with a comment embedded):
SQL> select /* new */ * from bowie_stale where code=42; ID CODE NAME ---------- ---------- ------------------------------------------ 1000041 42 David Bowie 6000041 42 David Bowie 41 42 David Bowie 3000041 42 David Bowie 7000041 42 David Bowie 8000041 42 David Bowie 4000041 42 David Bowie 9000041 42 David Bowie 5000041 42 David Bowie 2000041 42 David Bowie
If we now wait to see what the next Automatic Indexing task makes of things:
SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 07-JUL-2020 06:34:49 Activity end : 07-JUL-2020 06:35:54 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 142.61 MB (142.61 MB / 0 B) Indexes dropped : 0 SQL statements verified : 1 SQL statements improved (improvement factor) : 1 (19787.7x) SQL plan baselines created : 0 Overall improvement factor : 19787.7x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- 1. The following indexes were created: *: invisible ------------------------------------------------------------------------------- --------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | --------------------------------------------------------------------------- | BOWIE | BOWIE_STALE | SYS_AI_300kk2unp8tr0 | CODE | B-TREE | NONE | --------------------------------------------------------------------------- -------------------------------------------------------------------------------
We see that the index on the CODE column (SYS_AI_300kk2unp8tr0) has now been created.
Further down the report:
VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : du6psd0xmzpg5 SQL Text : select /* new */ * from bowie_stale where code=42 Improvement Factor : 19787.7x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 137261 2620 CPU Time (s): 84621 1769 Buffer Gets: 277028 13 Optimizer Cost: 544 13 Disk Reads: 275947 2 Direct Writes: 0 0 Rows Processed: 70 10 Executions: 7 1
A new index was indeed created because of this new SQL statement, with a performance improvement of 19787.7x.
Further down the report to the Plans Section:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 65903426 ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 544 | | | 1 | TABLE ACCESS STORAGE FULL | BOWIE_STALE | 10 | 230 | 544 | 00:00:01 | ----------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 2558864466 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 13 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_STALE | 10 | 230 | 13 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_300kk2unp8tr0 | 10 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
We can see that the new plan using the new Automatic Index with a much lower CBO cost.
If we now look at the status of this index:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_STALE'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --- --------- ------------- -------- ---------- ----------- ----------------- BOWIE_STALE_PK NO YES VISIBLE DISABLED VALID 10000000 20164 59110 SYS_AI_300kk2unp8tr0 YES NO VISIBLE ADVANCED LOW VALID 10000000 16891 10000000
We see that the index is now both VISIBLE and VALID (previously, it was INVISIBLE and UNUSABLE).
As such, the Automatic Index can now potentially be used by any SQL, including the previous problematic query.
So with a viable index now in place, if we re-run the initial problematic query:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2558864466 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 10 | 230 | 14 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_STALE | 10 | 230 | 14 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_300kk2unp8tr0 | 10 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 14 consistent gets 0 physical reads 0 redo size 738 bytes sent via SQL*Net to client 361 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
We see that finally, the SQL uses the new Automatic Index and is indeed much more efficient as a result, with just 14 consistent gets required (when previously it was 39430 consistent gets).
So if ever you come across the scenario where an SQL does not have an Automatic Index created when clearly it should, it could be that it has been blacklisted and needs a different SQL to actually generate the necessary index.
To avoid some of these issues, make sure you do not have stale or missing statistics when reliant on Automatic Indexing. The new High Frequency Statistics Collection capability to designed to specifically avoid such a scenario.
Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.1 comment so far
In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.
If we were to now to collect the missing statistics:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE'); PL/SQL procedure successfully completed. SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE'; TABLE_NAME NUM_ROWS BLOCKS LAST_ANAL -------------------- ---------- ---------- --------- BOWIE_STALE 10000000 39677 06-JUL-20 SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols where table_name='BOWIE_STALE'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM LAST_ANAL -------------------- ------------ ---------- --------------- --------- ID 10000000 0 HYBRID 06-JUL-20 CODE 971092 .000001 HYBRID 06-JUL-20 NAME 1 4.9416E-08 FREQUENCY 06-JUL-20
If we now repeatedly re-run the problematic query many times:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 65903426 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 544 (14)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 10 | 230 | 544 (14)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 610 bytes sent via SQL*Net to client 361 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.
If we wait for the next Automatic Indexing reporting period:
SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 06-JUL-2020 05:12:42 Activity end : 06-JUL-2020 05:13:34 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 0 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 -------------------------------------------------------------------------------
We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.
If we look at the index details on the table:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_STALE'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ---------------------- --- --- --------- ------------- -------- ---------- ----------- ----------------- BOWIE_STALE_PK NO YES VISIBLE DISABLED VALID 10000000 20164 59110 SYS_AI_300kk2unp8tr0 YES NO INVISIBLE ADVANCED LOW UNUSABLE 10000000 23058 4147514
So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and can not be used by the CBO.
NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios after stale statistics when indexes were created in but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.
If we simply re-run the same queries again from the time when the dependant object statistics were stale, any SQL is just ignored by the Automatic Indexing process.
As such, if we now subsequently re-run the problematic query again:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 65903426 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 230 | 544 (14)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 10 | 230 | 544 (14)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 39430 consistent gets 39421 physical reads 0 redo size 610 bytes sent via SQL*Net to client 361 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
Again, the CBO has no choice here with no viable VALID/VISIBLE index present but to perform a FTS, even though its getting the cardinality estimates spot on since statistics gathering.
In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…
Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part I (Dead Against It) October 6, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, High Frequency Statistics Collection, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Stale Statistics, Unusable Indexes.5 comments
A “golden rule” when working with Automatic Indexing is that things don’t work properly if there are stale statistics on the dependant objects. Stale statistics can of course be problematic but they can be particularly troublesome when dealing with Automatic Indexing.
In the Oracle Autonomous Database environments, this issue is addressed somewhat by the new High Frequency Statistics Collection capability, which helps to automatically collect stale statistics on a regular basis. However, in on-prem Exadata environments where this can more easily be turned off or collected less frequently, it’s a potential issue worth consideration.
I’ll start with a simple little table, with a CODE column that has lots of distinct values:
SQL> create table bowie_stale (id number constraint bowie_stale_pk primary key, code number, name varchar2(42)); Table created. SQL> insert into bowie_stale select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete.
Importantly, I don’t collect statistics on this newly populated table…
SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE'; TABLE_NAME NUM_ROWS BLOCKS LAST_ANAL --------------- ---------- ---------- --------- BOWIE_STALE SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols where table_name='BOWIE_STALE'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM LAST_ANAL -------------------- ------------ ---------- --------------- --------- ID NONE CODE NONE NAME NONE
If we now run the following query a number of times while there are no statistics on the table:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 437 | 21413 | 553 (16)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 437 | 21413 | 553 (16)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - dynamic statistics used: dynamic sampling (level=2) - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 6 recursive calls 0 db block gets 39026 consistent gets 0 physical reads 0 redo size 867 bytes sent via SQL*Net to client 588 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
The CBO has no choice but to use a FTS as I don’t yet have an index on the CODE column.
If I now wait for the next Automatic Indexing task to kick in AND if there are still NO statistics on the table:
SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 05-JUL-2020 06:36:31 Activity end : 05-JUL-2020 06:37:07 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 0 SQL statements improved : 0 SQL plan baselines created : 0 Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0
You can see that there was the one index candidate BUT no Automatic Index appears to have been created.
Assuming there are still no statistics:
SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE'; TABLE_NAME NUM_ROWS BLOCKS LAST_ANAL ------------------------------ ---------- ---------- --------- BOWIE_STALE SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols where table_name='BOWIE_STALE2'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM LAST_ANAL -------------------- ------------ ---------- --------------- --------- ID NONE CODE NONE NAME NONE
If we look now at what indexes exist on the table:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_STALE'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --- --------- ------------- -------- ---------- ----------- ----------------- BOWIE_STALE_PK NO YES VISIBLE DISABLED VALID SYS_AI_300kk2unp8tr0 YES NO INVISIBLE DISABLED UNUSABLE 0 0 0 SQL> select index_name, column_name, column_position from user_ind_columns where table_name='BOWIE_STALE2' order by index_name, column_position; INDEX_NAME COLUMN_NAME COLUMN_POSITION ------------------------------ -------------------- --------------- BOWIE_STALE_PK ID 1 SYS_AI_300kk2unp8tr0 CODE 1
We notice there is now an Automatic Index BUT it remains in an UNUSABLE/INVISIBLE state. This means the index can’t be used by the CBO.
So if we now re-run the SQL query again:
SQL> select * from bowie_stale where code=42; 10 rows selected. Execution Plan ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 437 | 21413 | 553 (16)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE | 437 | 21413 | 553 (16)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=42) filter("CODE"=42) Note ----- - dynamic statistics used: dynamic sampling (level=2) - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 6 recursive calls 0 db block gets 39026 consistent gets 0 physical reads 0 redo size 867 bytes sent via SQL*Net to client 588 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
The CBO has no choice still but to use the FTS.
In Part II, we’ll see that once we get into this scenario, it can be a tad problematic to get ourselves out of it and get the Automatic Index created as we would like…
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.1 comment so far
In my previous post, I discussed how the Automatic Indexing task by using Dynamic Sampling Level=11 can correctly determine the correct query cardinality estimates and assume the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.
However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result inefficient execution plans.
Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).
These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.
One potential way to make things more consistent and see how the Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.
This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:
SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1005K| 135M| 11411 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 1005K| 135M| 11411 (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2705 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.
However, if we now remove the histogram on the CODE column:
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1’); PL/SQL procedure successfully completed.
There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.
So what does the Automatic Indexing tasks make of things now. If we look at the next activity report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 18-AUG-2020 16:42:33 Activity end : 18-AUG-2020 16:43:06 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created : 0 Space used : 0 B Indexes dropped : 0 SQL statements verified : 1 SQL statements improved : 0 SQL plan baselines created (SQL statements) : 1 (1) Overall improvement factor : 0x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0
We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.
If we look at the Verification Details part of this report:
VERIFICATION DETAILS ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- The following SQL plan baselines were created: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 3yz8unzhhvnuz SQL Text : select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000) SQL Signature : 3910785437403172730 SQL Handle : SQL_3645e6a2952fcf7a SQL Plan Baselines (1) : SQL_PLAN_3cjg6naakzmvu198c05b9
We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.
Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.
Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.
Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.
This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:
SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz'; EXECUTION_NAME ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS -------------------------- -------------------- ---------------------- --------- SYS_AI_2020-08-18/16:42:33 41169 410291 REGRESSED
If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):
SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 32 | 4512 | 11425 (2)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 32 | 4512 | 11425 (2)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Hint Report (identified by operation id / Query Block Name / Object Alias): Total hints for statement: 1 (U - Unused (1)) --------------------------------------------------------------------------- 1 - SEL$1 U - dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS Note ----- - SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement Statistics ---------------------------------------------------------- 9 recursive calls 4 db block gets 41170 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2705 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.
So Automatic Indexing CAN create SQL PLan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really hard and difficult for it to do this effectively if the Automatic Indexing tasks and other database sessions have differing Dynamic Sampling settings as it does by default…
Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part I (The Prettiest Star)) September 25, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.1 comment so far
In my previous few blog posts, I’ve been discussing some issues in relation to how Automatic Indexes handle SQL statements that accesses skewed data. In this post, I’m going to setup the scenario in which Automatic Indexing can potentially use Baselines to help address some of these issues. BUT, as we’ll see, I’m having to manufacture things somewhat to make this work due to the problem of the Automatic Indexing task using Dynamic Sampling of level 11, whereas most usual database sessions do not.
To set things up, I’m going recap what I’ve previously discussed (but with a slight difference), by creating a table that has significant data skew on the CODE column, with most values very uncommon, but with a handful of values being very common:
SQL> create table space_oddity (id number constraint space_oddity_pk primary key, code number, name varchar2(142)); Table created. SQL> begin 2 for i in 1..2000000 loop 3 if mod(i,2) = 0 then 4 insert into space_oddity values(i, ceil(dbms_random.value(0,1000000)), 'David Bowie is really Ziggy Stardust and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history'); 5 else 6 insert into space_oddity values(i, mod(i,20)*10000, 'Ziggy Stardust is really David Bowie and his band are called The Spiders From Mars. Then came Aladdin Sane and the rest is history.'); 7 end if; 8 end loop; 9 commit; 10 end; 11 / PL/SQL procedure successfully completed.
So most CODE values will only occur a few times if at all, but a few values divisible by 10000 have many many occurrences within the table.
Importantly, we will initially collect statistics with NO histograms on the CODE column, which is the default behaviour anyways if no SQL has previous run with predicates on the column:
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we run a query based on a rare value for CODE:
SQL> set arraysize 5000 SQL> select * from space_oddity where code=25; Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 423 | 11356 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 3 | 423 | 11356 (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=25) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40974 consistent gets 0 physical reads 0 redo size 1018 bytes sent via SQL*Net to client 402 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 2 rows processed
Without an index, the CBO has no choice at this point but to perform a FTS. BUT note that the 2 rows returned is very similar to the 3 estimated rows, which would make an index likely the way to go if such an index existed.
However, the following SQL accesses many of the common values of CODE and returns many rows:
SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 32 | 4512 | 11425 (2)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 32 | 4512 | 11425 (2)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
Again, without an index in place, the CBO has no choice but to perform a FTS but this is almost certainly the way to go regardless. BUT without a histogram on the CODE column, the CBO has got the cardinality estimate way way off and thinks only 32 rows are to be returned and not the actual 1000011 rows.
So what does Automatic Indexing make of things. Let’s wait and have a look at the next Automatic Indexing Report:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 18-AUG-2020 15:57:14 Activity end : 18-AUG-2020 15:58:10 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 1 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 35.65 MB (35.65 MB / 0 B) Indexes dropped : 0 SQL statements verified : 1 SQL statements improved (improvement factor) : 1 (40984.3x) SQL plan baselines created : 0 Overall improvement factor : 40984.3x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ---------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ---------------------------------------------------------------------------- | BOWIE | SPACE_ODDITY | SYS_AI_82bdnqs7q8rtm | CODE | B-TREE | NONE | ----------------------------------------------------------------------------
So Automatic Indexing has indeed created the index (SYS_AI_82bdnqs7q8rtm) on the CODE column BUT this is based on only the one SQL statement:
VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 19sv1g6tt0g1y SQL Text : select * from space_oddity where code=25 Improvement Factor : 40984.3x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 5417408 139265 CPU Time (s): 1771880 7797 Buffer Gets: 327876 5 Optimizer Cost: 11356 5 Disk Reads: 649 2 Direct Writes: 0 0 Rows Processed: 16 2 Executions: 8 1
The Automatic Indexing task has correctly identified a significant improvement of 40984.3x when using an index on the SQL statement that returned just the 2 rows. The other SQL statement that returns many rows IS NOT MENTIONED.
This is because the Automatic Indexing tasks uses Dynamic Sampling Level=11, meaning it determines the more accurate cardinality estimate on the fly and correctly identifies that a vast number of rows are going to be returned. As a result, it correctly determines that the new Automatic Indexing if used would be detrimental to performance and would not be used by the CBO.
BUT most importantly, it also makes the assumption that the CBO would automatically likewise make this same decision to NOT use any such index in other database sessions and so there’s nothing to protect.
BUT this assumption is incorrect IF other database sessions don’t likewise use Dynamic Sampling with Level=11.
BUT by default, including in Oracle’s Autonomous Database Transaction Processing Cloud environment, the Dynamic Sampling Level is NOT set to 11, but the 2.
Therefore, most database sessions will not be able to determine the correct cardinality estimate on the fly and so will incorrectly assume the number of returned rows is much less than in reality and potentially use any such new Automatic Index inappropriately…
So if we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 2301175572 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 11356 | | | 1 | TABLE ACCESS FULL | SPACE_ODDITY | 3 | 423 | 11356 | 00:00:01 | ----------------------------------------------------------------------------- - With Auto Indexes ----------------------------- Plan Hash Value : 54782313 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 423 | 5 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | SPACE_ODDITY | 3 | 423 | 5 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_82bdnqs7q8rtm | 2 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=25) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
The new plan for the SQL returning 2 rows when using the new Automatic Index and is much more efficient with a significantly reduced cost (just 3 down from 11356).
But again, the plans for the SQL that returns many rows are not listed as the Automatic Indexing task has already determined that an index would make such a plan significantly less efficient.
If we now rerun the SQL the returns many rows (and BEFORE High Frequency Collection Statistics potentially kicks in):
SQL> select * from space_oddity where code in (10000, 30000, 50000, 70000, 90000, 110000, 130000, 150000, 170000, 190000); 1000011 rows selected. Execution Plan ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 32 | 4512 | 35 (0)| 00:00:01 | | 1 | INLIST ITERATOR | | | | | | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED| SPACE_ODDITY | 32 | 4512 | 35 (0)| 00:00:01 | |* 3 | INDEX RANGE SCAN | SYS_AI_82bdnqs7q8rtm | 32 | | 12 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 410422 consistent gets 0 physical reads 0 redo size 145536076 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
Note that the cardinality estimate is still way way wrong, thinking that just 32 rows are to be returned, when is fact 1000011 rows are returned.
As a result, the CBO has decided to incorrectly use the new Automatic Index. Incorrectly, in that the number of consistent gets has increased 10x from the previous FTS plan (410,422 now, up from 41,169).
One way to resolve this is to collect histograms on the CODE column (or wait for the High Frequency Stats Collection to kick in):
SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 2048’); PL/SQL procedure successfully completed.
If we now re-run this SQL:
SQL> select * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000); 1000011 rows selected. Execution Plan ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 996K| 133M| 11411 (1)| 00:00:01 | |* 1 | TABLE ACCESS FULL| SPACE_ODDITY | 996K| 133M| 11411 (1)| 00:00:01 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 41169 consistent gets 0 physical reads 0 redo size 13535504 bytes sent via SQL*Net to client 2678 bytes received via SQL*Net from client 202 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1000011 rows processed
The cardinality estimate is now much more accurate and the the execution plan now uses the more efficient FTS.
In Part II, we’ll look at how the Automatic Indexing tasks can be made to identify the dangers of a new index to SQLs that might degrade in performance and how it will create a Baseline to protect against any such SQL regressions….
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II ( Sleepwalk) September 21, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.1 comment so far
As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing processing thinks will happen with newly created Automatic Indexing and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).
As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, where it is actually used in normal database sessions (and perhaps incorrectly so). Where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness” is one such scenario where we might encounter this issue.
One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).
If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:
SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 101K| 2388K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - dynamic statistics used: dynamic sampling (level=AUTO) - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 609 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.
Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true) in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.
In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.
We currently have the following columns and statistics on the IGGY_POP table:
SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM -------------------- ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY
If we now collect Extended Statistics on both CODE1, CODE2 columns:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254'); PL/SQL procedure successfully completed. SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP'; COLUMN_NAME NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM ------------------------------ ------------ ---------- ----------- --------------- ID 9705425 0 254 HYBRID CODE1 100 .00000005 100 FREQUENCY CODE2 100 .00000005 100 FREQUENCY NAME 1 5.0210E-08 1 FREQUENCY SYS_STU#29QF8Y9BUDOW2HCDL47N44 99 .00000005 100 FREQUENCY
The CBO now has some idea on the cardinality if both columns are used within a predicate.
If we re-run the problematic query without the hint:
SQL> select * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 2343K| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 100K| 2343K| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 581 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
Again, the CBO is correctly the cardinality estimate of 100K rows and so is NOT using the Automatic Index.
However, we can still get ourselves in problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:
SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50000 | 878K | 575 (15) | 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 | 878K | 575 (15) | 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=1 AND "CODE2"=42) filter("CODE1"=1 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 368 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
We see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). This increased estimate is now deemed too expensive for the Automatic Index to retrieve and is now incorrectly using a FTS.
This because with a Frequency based histogram now in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned (100,000 x 0.5 = 50,000) if the values don’t exist but resided within the known min-max range of values.
So we need to be very careful HOW we potentially collect any additional statistics and its potential impact on other SQL statements.
As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…
Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part I (Neighborhood Threat) September 18, 2020
Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Explain Plan For Index, Extended Statistics, Full Table Scans, Histograms, Index Access Path, Oracle, Oracle General, Oracle Indexes.1 comment so far
Following on from my previous few posts on “data skew”, I’m now going to look at it from a slightly different perspective, where there is an inherent relationship between columns. The CBO has difficulties in recognising (by default) that some combinations of column values are far more common than other combinations, resulting in incorrect cardinality estimates and resultant poor execution plans.
As we’ll see, this skew in returned data can lead to poor execution plans due to the inappropriate use of newly created Automatic Indexes…
I’ll start by creating a simple table that has two columns of interest, CODE1 and CODE2:
SQL> create table iggy_pop (id number, code1 number, code2 number, name varchar2(42)); Table created. SQL> insert into iggy_pop select rownum, mod(rownum, 100)+1, mod(rownum, 100)+1, 'David Bowie' from dual connect by level <= 10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP'); PL/SQL procedure successfully completed.
Both columns CODE1 and CODE2 each have 100 distinct values, so that the possible combinations of data from both columns is 100 x 100 = 10,000. HOWEVER, the values of CODE1 and CODE2 are always the same and so there is in fact only 100 distinct combinations of data because of this inherent relationship between columns.
If we run the following query for a combination of data that exists:
SQL> select * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000| 24000| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000| 24000| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=42 AND "CODE2"=42) filter("CODE1"=42 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 581 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100000 rows processed
Without an index, the CBO has no choice but to use a FTS. However, the interesting thing to note is how the cardinality estimate is way wrong, with 100,000 rows returned but only 1000 rows estimated. The CBO incorrect assumes that 1/10000th of the data is being returned and not actual the 1/100 (1%).
If we run a query on a combination of data that doesn’t exist:
SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3288467 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000 | 18000| 575 (15)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 1000 | 18000| 575 (15)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE1"=1 AND "CODE2"=42) filter("CODE1"=1 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 40964 consistent gets 40953 physical reads 0 redo size 368 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
The CBO still estimates that 1000 rows are to be returned. However, with no rows returned, an index would be a much better alternative than the current FTS in this case.
Let’s now wait and see what the Automatic Indexing process makes of all this (following are highlights from the Auto Indexing Last Activity report):
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT -------------------------------------------------------------------------------- GENERAL INFORMATION ------------------------------------------------------------------------------- Activity start : 18-SEP-2020 01:24:17 Activity end : 18-SEP-2020 01:25:29 Executions completed : 1 Executions interrupted : 0 Executions with fatal error : 0 ------------------------------------------------------------------------------- SUMMARY (AUTO INDEXES) ------------------------------------------------------------------------------- Index candidates : 0 Indexes created (visible / invisible) : 1 (1 / 0) Space used (visible / invisible) : 134.22 MB (134.22 MB / 0 B) Indexes dropped : 0 SQL statements verified : 1 SQL statements improved (improvement factor) : 1 (41301.7x) SQL plan baselines created : 0 Overall improvement factor : 41301.7x ------------------------------------------------------------------------------- SUMMARY (MANUAL INDEXES) ------------------------------------------------------------------------------- Unused indexes : 0 Space used : 0 B Unusable indexes : 0 ------------------------------------------------------------------------------- INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | ------------------------------------------------------------------------------- | BOWIE | IGGY_POP | SYS_AI_1awkddqkwa4f8 | CODE1,CODE2 | B-TREE | NONE | ------------------------------------------------------------------------------- -------------------------------------------------------------------------------
So Oracle does indeed create an automatic index on the CODE1, CODE2 columns. However, notice that only 1 statement has been verified and not the above two statements that I had executed during the previous period.
VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : bdnf0barn3jk7 SQL Text : select code1, code2, name from iggy_pop where code1=1 and code2=42 Improvement Factor : 41301.7x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 72085 1342 CPU Time (s): 39272 679 Buffer Gets: 123907 3 Optimizer Cost: 575 4 Disk Reads: 122859 2 Direct Writes: 0 0 Rows Processed: 0 0 Executions: 3 1
So only the SQL that returned 0 rows has been reported. As expected, it runs much more efficiently with an index than via the previous FTS, with an Improvement Factor of some 41301.7x.
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 3288467 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 575 | | | 1 | TABLE ACCESS STORAGE FULL | IGGY_POP | 1000 | 18000 | 575 | 00:00:01 | -------------------------------------------------------------------------------- Notes ----- - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 2496796491 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 36 | 4 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | IGGY_POP | 2 | 36 | 4 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE1"=1 AND "CODE2"=42) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
If we look at the comparison between plans, the new plan of course uses the newly created Automatic Index.
The critical point to notice here however is that the cardinality estimates are almost spot for the new execution plan (2 rows is much closer to reality than the previous 1000).
The reason why it’s much more accurate is because the Auto Indexing process session uses the new Dynamic Sampling Level = 11. This enables the CBO to sample data on the fly and determine a much more accurate cardinality estimate than by default where the Dynamic Sampling Level=2.
This also explains why the other statement which returned many rows was not “verified”. Actually, it was but because the Auto Index process with Dynamic Sampling set to 11 correctly identified that too many rows were being returned to make any new index viable, this statement did NOT cause the new index to be kept.
So it was only the SQL that returned no rows that resulted in the newly created Automatic Index. The other statement was correctly determined by the Automatic Indexing process to run worse with the new index and so determined that the CBO would simply ignore the index if created.
BUT this assumption of the CBO ignoring the index is NOT correct as we’ll see…
If we look at the new Automatic Index:
SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='IGGY_POP'; INDEX_NAME AUT CON VISIBILIT COMPRESSION STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --- --------- ------------- -------- ---------- ----------- ----------------- SYS_AI_1awkddqkwa4f8 YES NO VISIBLE ADVANCED LOW VALID 10000000 15362 4083700
We can see the index is both VISIBLE and VALID and so can potentially be used now by ANY subsequent SQL statement.
Now the important thing to note is that the default for most sessions in a database is for Dynamic Sampling to be set to 2 and for Optimizer_Adaptive_Statistics=False. Importantly, this is also the case in Oracle’s Autonomous Transaction Processing Cloud service.
SQL> show parameter sampling NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ optimizer_dynamic_sampling integer 2 SQL> show parameter optimizer_adaptive NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ optimizer_adaptive_plans boolean TRUE optimizer_adaptive_reporting_only boolean FALSE optimizer_adaptive_statistics boolean FALSE
So this is DIFFERENT to the settings for the Automatic Indexing process. In a standard session, the CBO will NOT have the capability to accurately determine the correct cardinality estimates as we saw previously.
If we now re-run the SQL that returns no rows:
SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 2496796491 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1000 | 18000 | 413 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP | 1000 | 18000 | 413 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1000 | | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE1"=1 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 368 bytes sent via SQL*Net to client 377 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed
The execution uses the new index, because even though it STILL thinks 1000 rows are to be returned, that’s sufficiently few for the index to be costed the cheaper option.
When when re-run the SQL that returns many many rows:
SQL> select * from iggy_pop where code1=42 and code2=42; 100000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2496796491 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1000 | 24000 | 413 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| IGGY_POP | 1000 | 24000 | 413 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_1awkddqkwa4f8 | 1000 | | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE1"=42 AND "CODE2"=42) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Statistics ---------------------------------------------------------- 25 recursive calls 0 db block gets 41981 consistent gets 40953 physical reads 0 redo size 1092240 bytes sent via SQL*Net to client 581 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 100000 rows processed
Ouch. It also uses the new Automatic Index, because it also STILL thinks only 1000 rows are to be returned and just like the previous SQL statement, is determined to be the cheaper option.
BUT in this case it isn’t really the cheaper option, having to read the table potentially piecemeal at a time via the index, rather than more efficiently with fewer and larger multiblock reads via a FTS.
This is not really how Automatic is designed to work. Its meant to protect us from making SQL statements regress in performance BUT because there is a difference in how a normal session and the Automatic Indexing process determines the cost of execution plans, these scenarios can eventuate.
In my next blog I’ll look at how to address this specific scenario and then look at an example of how Automatic Indexing is really meant to work via the use of automated baselines…
Oracle 19c Automatic Indexing: Data Skew Part III (The Good Son) September 16, 2020
Posted by Richard Foote in 19c, 19c New Features, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Index Access Path, Oracle, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.add a comment
I’m going to expand just a tad on my previous posts on data skew and run a simple query that returns a few rows based on a column predicate AND another query on the same column that returns many rows.
The following table has a CODE column as with previous posts with the data heavily skewed:
SQL> create table bowie_skew (id number, code number, name varchar2(42)); Table created. SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000; 1000000 rows created. SQL> update bowie_skew set code = 9 where mod(id,3) = 0; 333333 rows updated. SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000; 10000 rows updated. SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000; 5000 rows updated. SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000; 1000 rows updated. SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000; 1000 rows updated. SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000; 1000 rows updated. SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000; 100 rows updated. SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000; 100 rows updated. SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000; 100 rows updated. SQL> commit; Commit complete.
I’ll next collect statistics with NO histogram, as I don’t think they’re required at this point:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'bowie_skew', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed.
If we look at the table data:
SQL> select code, count(*) from bowie_skew group by code order by code; CODE COUNT(*) ---------- ---------- 1 10000 2 5000 3 1000 4 1000 5 1000 6 100 7 100 8 100 9 327235 10 654465
The value “7” only has 100 associated rows, while the value “10” is very common with 654,465 rows.
But I currently have no histograms:
SQL> select column_name, num_buckets, histogram from user_tab_cols where table_name='BOWIE_SKEW'; COLUMN_NAME NUM_BUCKETS HISTOGRAM --------------- ----------- --------------- ID 1 NONE CODE 1 NONE NAME 1 NONE
If I run the following query with a CODE=7 predicate just once:
SQL> select * from bowie_skew where code=7; 100 rows selected. Execution Plan -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 1953K| 570 (7)| 00:00:01 | | 1 | PX COORDINATOR | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 100K| 1953K| 570 (7)| 00:00:01 | | 3 | PX BLOCK ITERATOR | | 100K| 1953K| 570 (7)| 00:00:01 | |* 4 | TABLE ACCESS STORAGE FULL| bowie_skew | 100K| 1953K| 570 (7)| 00:00:01 | --------------------------------------------------------------------------------------------
It uses a Full Table Scan (the CBO has no choice without an index) AND hopelessly gets the cardinality estimate wrong, thinking 100K are going to be returned (and not the 100 actual rows). So the CBO is unlikely to use an index anyways as it would be deemed too expensive to return so many rows.
I’ll now run the following query many times on the CODE=10 predicate that returns many rows:
SQL> select * from bowie_skew where code=10; 654465 rows selected. Execution Plan -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 1953K| 570 (7)| 00:00:01 | | 1 | PX COORDINATOR | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10000 | 100K| 1953K| 570 (7)| 00:00:01 | | 3 | PX BLOCK ITERATOR | | 100K| 1953K| 570 (7)| 00:00:01 | |* 4 | TABLE ACCESS STORAGE FULL| bowie_skew | 100K| 1953K| 570 (7)| 00:00:01 | --------------------------------------------------------------------------------------------
So again, no choice here with a FTS and we likely wouldn’t want to use an index anyways as it would be just too expensive.
If we check out what the Automatic Indexing process has done with such a workload:
SQL> select dbms_auto_index.report_last_activity() report from dual; REPORT INDEX DETAILS ------------------------------------------------------------------------------- The following indexes were created: *: invisible ------------------------------------------------------------------------------- -------------------------------------------------------------------------- | Owner | Table | Index | Key | Type | Properties | -------------------------------------------------------------------------- | BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE | -------------------------------------------------------------------------- ------------------------------------------------------------------------------- VERIFICATION DETAILS ------------------------------------------------------------------------------- The performance of the following statements improved: ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Parsing Schema Name : BOWIE SQL ID : 6fm3m8cg2jnun SQL Text : select * from bowie_skew where code=7 Improvement Factor : 46.6x Execution Statistics: ----------------------------- Original Plan Auto Index Plan ---------------------------- ---------------------------- Elapsed Time (s): 36653 1992 CPU Time (s): 33899 967 Buffer Gets: 4291 103 Optimizer Cost: 52 4 Disk Reads: 0 2 Direct Writes: 0 0 Rows Processed: 100 100 Executions: 1 1
An Automatic Index on the CODE column is created (SYS_AI_7psvzc164vbng), with ONLY the SQL based on the CODE=7 predicate listed in the report. The other query is indeed too expensive for a new index to be viable and so isn’t listed.
If we look at the Plans Section of the Automatic Indexing report:
PLANS SECTION --------------------------------------------------------------------------------------------- - Original ----------------------------- Plan Hash Value : 410492785 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 52 | | | 1 | TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 | 52 | 00:00:01 | -------------------------------------------------------------------------------------- Notes ----- - dop_reason = no expensive parallel operation - dop = 1 - px_in_memory_imc = no - px_in_memory = no - With Auto Indexes ----------------------------- Plan Hash Value : 140816325 ------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 119 | 2380 | 4 | 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW | 119 | 2380 | 4 | 00:00:01 | | * 2 | INDEX RANGE SCAN | SYS_AI_7psvzc164vbng | 100 | | 3 | 00:00:01 | ------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): ------------------------------------------ * 2 - access("CODE"=7) Notes ----- - Dynamic sampling used for this statement ( level = 11 )
The important point to note here is that the cardinality estimates are relatively accurate despite there being no histograms at this stage because the Automatic Indexing session uses Dynamic Sampling Level=11. Missing/inaccurate statistics are calculated on fly and this enables the session to accurately determine the size of the returned data set and that an index is indeed the more efficient access path.
So with mixed workloads, all it takes is one SQL executed once that demonstrably improves thanks to an index for the associated Automatic Index to be created as a VISIBLE/VALID index:
SQL> select index_name, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE_SKEW'; INDEX_NAME AUT VISIBILIT STATUS NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ --- --------- -------- ---------- ----------- ----------------- SYS_AI_7psvzc164vbng YES VISIBLE VALID 1000000 1537 8534
If we now run the query AFTER the histograms are subsequently created thanks to the High-Frequency Automatic Statistics Collection (see previous post), the new Automatic Index is now used:
SQL> select * from bowie_skew where code=7; 100 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 140816325 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 100 | 2000 | 4 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW | 100 | 2000 | 4 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | SYS_AI_7psvzc164vbng | 100 | | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CODE"=7) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 104 consistent gets 0 physical reads 0 redo size 2871 bytes sent via SQL*Net to client 359 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 100 rows processed
Note if the histogram is NOT yet collected, the CBO will not determine the correct cardinality estimate and will ignore the new Automatic Index (as previously discussed).
If we run again the query that returns many rows:
SQL> select * from bowie_skew where code=10; 654465 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 410492785 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 654K| 12M| 52 (16)| 00:00:01 | |* 1 | TABLE ACCESS STORAGE FULL| BOWIE_SKEW | 654K| 12M| 52 (16)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - storage("CODE"=10) filter("CODE"=10) Note ----- - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3725 consistent gets 0 physical reads 0 redo size 6549708 bytes sent via SQL*Net to client 1790 bytes received via SQL*Net from client 132 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 654465 rows processed
The new Automatic Index is correctly ignored by the CBO, as the query returns too many rows for the index to be viable.
So in this example, Automatic Indexing works exactly as it should. It creates a new Automatic Index for a query where it will indeed improve the performance, while other queries on the same column in which many more rows are returned are also run. For these other queries, the new Automatic Index is correctly not used as such an index would degrade the performance of the query.
In my next post, I’ll look at the first example with data skew where Automatic Indexing can be problematic…