CBO Costing Plans With Migrated Rows Part II (“New Killer Star”) March 28, 2023

Posted by Richard Foote in CBO, Index Access Path, Index statistics, Leaf Blocks, Migrated Rows, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard's Blog, ROWID.

I’ve spent the past few months discussing Migrated Rows, in large part thanks to an excellent 15 minute video by Connor McDonald on how ROWIDs can now be updated on the fly in Oracle Autonomous databases.

Well 14 such posts later, I have finally reached the end of this topic (for now at least). So, an average of about 1 post per minute of video 🙂

In my previous post, I discussed how the CBO costs execution plans with tables that have migrated rows, when the statistics are collected as recommended via the DBMS_STATS package. In summary, migrated rows are basically just ignored, with the CBO blissfully unaware of the existence of any such migrated rows.

As I discussed, if I want to easily see how many migrated rows I have in a table, I can potentially use the ANALYZE command as follows:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
     where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906              86          415            170        56186

As you can now see, the table currently has 56186 migrated rows (yes CHAIN_CNT can potentially count rows that simply can’t fit within a single block, but all these rows are definitely migrated rows as per the demo in my previous post).

Now, it had always been my belief that although you can use the ANALYZE command to count out these migrated rows, the CBO would simply ignore this statistic in its calculations.

But I was wrong.

If we now re-run the query from the previous post:

SQL> select * from bowie where id > 1 and id < 1001;

999 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------
select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |    999 |    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |    999 |    999 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

Statistics
-----------------------------------------------------------
            1 CPU used by this session
            1 CPU used when call started
            1 DB time
         9193 RM usage
            3 Requests to/from client
            2 SQL*Net roundtrips to/from client
          664 buffer is not pinned count
         1662 buffer is pinned count
          323 bytes received via SQL*Net from client
       171333 bytes sent via SQL*Net to client
            2 calls to get snapshot scn: kcmgss
            2 calls to kcmgcs
          666 consistent gets
            1 consistent gets examination
            1 consistent gets examination (fastpath)
          666 consistent gets from cache
          665 consistent gets pin
          665 consistent gets pin (fastpath)
            2 execute count
            1 index range scans
      5455872 logical read bytes from cache
          665 no work - consistent read gets
           39 non-idle wait count
            2 opened cursors cumulative
            1 opened cursors current
            2 parse count (total)
            2 process last non-idle time
            1 session cursor cache hits
          666 session logical reads
            1 sorts (memory)
         2024 sorts (rows)
          999 table fetch by rowid
          327 table fetch continued row
            3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------

select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |   302 (100)|    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |    999 |   141K|   302   (0)|    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |    999 |       |     4   (0)|    999 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    2 - access("ID">1 AND "ID"<1001)

 

We can see that the cost of the plan has now changed.

Although the cost of reading the index itself is still the same with a cost of 4, the overall cost of the plan has increased to 302 (previously it was 21).

So the difference in plan costs is 302 – 21 = 281. And it’s pretty easy to see where this extra comes from…

The extra cost is basically the query selectivity x the number of migrated rows, rounded up:

Extra cost = ceil(0.005 x 56186) = ceil(280.93) = 281.

So the index scan costing formula should really be updated to be:

Index Scan Cost = blevel +
                                    ceil(effective index selectivity x leaf_blocks) +
                                    ceil(effective table selectivity x clustering_factor) +
                                    ceil(effective table selectivity x chain_cnt)
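
As a quick sanity check, the updated formula can be plugged into a few lines of Python. This is only a sketch of the arithmetic as inferred in this post from the observed plan costs, not an official Oracle formula; the function and variable names are my own, and the statistics are those of the BOWIE demo.

```python
import math

def index_scan_cost(blevel, leaf_blocks, clustering_factor, chain_cnt, sel):
    """Index scan cost per the formula above (names are illustrative only)."""
    return (blevel
            + math.ceil(sel * leaf_blocks)         # index leaf block visits
            + math.ceil(sel * clustering_factor)   # table block visits
            + math.ceil(sel * chain_cnt))          # extra visits for migrated rows

# BOWIE demo statistics: BLEVEL=1, LEAF_BLOCKS=473, CLUSTERING_FACTOR=3250
cost_ignoring_chain_cnt = index_scan_cost(1, 473, 3250, 0, 0.005)
cost_with_chain_cnt = index_scan_cost(1, 473, 3250, 56186, 0.005)

print(cost_ignoring_chain_cnt)                        # 21
print(cost_with_chain_cnt)                            # 302
print(cost_with_chain_cnt - cost_ignoring_chain_cnt)  # 281
```

With CHAIN_CNT at 0 the result is the original cost of 21, and with CHAIN_CNT at 56186 it is the 302 reported in the plan above.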

Now, IMHO, this new cost is actually more accurate and better matches the true cost of now using the index, which requires 666 Consistent Gets (previously, before the rows migrated, the index plan required just 18 Consistent Gets).

So in some respects, this new cost might not be a bad thing. But then again, a sudden change in such costings due to a flood of new migrated rows might result in unexpected and undesired changes to plans that have been carefully crafted for statistics generated with the conventional DBMS_STATS collection method.

However, it’s not sufficient to simply collect fresh statistics using DBMS_STATS to get the previous CBO costings where migrated rows are ignored:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
     where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906              86          415            167        56186

Simply collecting fresh statistics does NOT clear out the CHAIN_CNT statistic and so the CBO costings remain the same as with the ANALYZE command:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------

select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |   302 (100)|    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |    999 |   141K|   302   (0)|    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |    999 |       |     4   (0)|    999 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

You need to first delete the table statistics to remove the CHAIN_CNT statistic (which of course comes with obvious dangers now there are no table statistics present) before you collect fresh statistics using DBMS_STATS:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
     where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906               0            0            167            0

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

The CHAIN_CNT statistic has finally been cleared to 0 and the CBO costings have now returned to what they were previously, when such migrated rows were ignored:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------

select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   163K|    21   (0)|    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |     4   (0)|    999 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

The CBO costs are now back to the previous 21.

So I’m a little in two minds about this. I think the statistics generated and used by the CBO are better with the ANALYZE command, but would still suggest collecting the necessary statistics using the recommended DBMS_STATS approach. Perhaps Oracle giving us the option to collect these additional statistics using DBMS_STATS might be a useful enhancement… 🤷‍♂️

Now, at one point in time, a long long time ago, I’m reasonably sure the CBO previously didn’t use the CHAIN_CNT statistic. However, it came as no real surprise when I researched when the CBO had started using the CHAIN_CNT statistic in its calculations, that Jonathan Lewis had already written on this subject way way back in April 2009 🙂

So Oracle definitely had this behaviour all the way back to at least 9i and continues to behave this way in 21c. Ah well, better late than never I guess to finally realise how all this actually works…

UPDATE (29 March 2023): Jonathan Lewis has kindly confirmed with me that CHAIN_CNT was definitely ignored back in version 8.1.7.4 and that this changed to the current behaviour in either 9.0 or 9.2.

CBO Costing Plans With Migrated Rows Part I (“Ignoreland”) March 21, 2023

Posted by Richard Foote in BLEVEL, CBO, Clustering Factor, Data Clustering, Index Access Path, Index Height, Index statistics, Leaf Blocks, Migrated Rows, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard's Blog, ROWID.

Whilst recently blogging about Migrated Rows and specifically changes to how ROWIDs are now maintained on the fly in Oracle Autonomous Databases, I made a discovery regarding how the Cost-Based Optimizer (CBO) costs such plans. This is one of the key reasons why I blog, not only to try and share odd titbits about how Oracle works, but also to hopefully learn much myself in the process.

Imagine my surprise in not only learning that Oracle and the CBO work differently to how I had always thought they worked in this respect, but that this behaviour has been the case since at least Oracle 9i.

In Part I, I’ll use the same example of migrated rows as I’ve used in the past few blog posts and initially show how the CBO generally costs such plans (and how I had incorrectly assumed it ALWAYS costed such plans).

Let’s start by creating and populating a tightly packed table (in an environment where ROWIDs are NOT updated on the fly):

SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0;

Table BOWIE created.

SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

I’ll next create an index on the well clustered ID column (as the rows are inserted in ID column order within the table):

SQL> create index bowie_id_i on bowie(id);

Index BOWIE_ID_I created.

Next, we’ll use the Oracle recommended method of collecting table/index statistics, by using the DBMS_STATS package:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      3268               0            0            111            0

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

 

Note the key index statistics here: BLEVEL=1, LEAF_BLOCKS=473 and the near perfect CLUSTERING_FACTOR=3250.

If we run the following query featuring a non-equality range predicate:

 

SQL> select * from bowie where id > 1 and id < 1001;

999 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------
select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |    999 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |    999 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |    999 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          1 DB time
       7678 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         16 buffer is not pinned count
       1983 buffer is pinned count
        323 bytes received via SQL*Net from client
     171383 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         18 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         18 consistent gets from cache
         17 consistent gets pin
         17 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     147456 logical read bytes from cache
         17 no work - consistent read gets
         40 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          2 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
         18 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
        999 table fetch by rowid
          3 user calls

We notice that the CBO indeed uses the index.

The key statistic to note here is that Consistent Gets is just 18, which is extremely low considering we’re returning 999 rows. This is because the index is currently extremely efficient, as it can fetch multiple rows by visiting the same table block thanks to the excellent clustering/ordering of the required ID column values (and also thanks to my high arraysize session setting).

If we look at the CBO costings for this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------

select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|    999 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   108K|    21   (0)|    999 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |     4   (0)|    999 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

 

I’ve previously discussed many times how the CBO costs index access paths, but it’s always useful to go over this again, as it’s the most common question I get asked when I visit customer sites.

The KEY statistic the CBO has to determine is the estimated Selectivity of the query (the estimated percentage of rows to be returned), as this is the driver of all the subsequent CBO calculations.

The Selectivity of this range-based predicate query is calculated as follows:

Selectivity = (Highest Bound Value – Lowest Bound Value) / (Highest Value – Lowest Value)
= (1001-1) /(200000-1)
= 1000/199999
=  approx. 0.005

Once Oracle has the selectivity, it can calculate the query Cardinality (estimated number of rows) as follows:

Cardinality = Selectivity x No of Rows

Cardinality = 0.005 x 200000 = 1000 rows
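
The selectivity and cardinality arithmetic above can be reproduced in a few lines of Python. This is just a sketch of the simple range-predicate calculation described here; the variable names are mine, and it assumes the ID column low/high values of 1 and 200000 from the demo, with no histogram involved.

```python
# Sketch of the range-predicate selectivity arithmetic described above.
# Values come from the BOWIE demo; variable names are illustrative only.
low_bound, high_bound = 1, 1001      # predicate: id > 1 and id < 1001
low_value, high_value = 1, 200000    # lowest/highest values in the ID column
num_rows = 200000                    # NUM_ROWS from USER_TABLES

selectivity = (high_bound - low_bound) / (high_value - low_value)
cardinality = round(selectivity * num_rows)

print(f"selectivity = {selectivity:.6f}")   # approx. 0.005
print(f"cardinality = {cardinality}")       # 1000
```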

This is our visual window into the likelihood that the CBO has made an accurate decision with its execution plan. If the cardinality estimates are reasonably accurate, then the CBO is likely to generate a good plan. If the cardinality estimates are way off, then the CBO is more likely to generate an inappropriate plan.

The CBO cardinality estimate in the above plan is 1000 rows, whereas the number of rows actually returned is 999 rows.

So indeed, the CBO has got the cardinality almost spot on (except for a trivial rounding error) and so we have a high degree of confidence that the CBO is using the correct selectivity estimates when they get plugged into the following CBO formula for costing an index range scan (using this selectivity of 0.005 and the index statistics listed above):

Index Scan Cost = (blevel + ceil(effective index selectivity x leaf_blocks)) + ceil(effective table selectivity x clustering_factor)

= (1 + ceil(0.005 x 473)) + ceil(0.005 x 3250)
= (1 + 3) + 17
= 4 + 17 = 21

So we can clearly see where the CBO gets its costings for both reading the index during the Index Range Scan (4) and for the plan as a whole (21).
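
For anyone wanting to plug in their own statistics, the same calculation can be sketched in Python. The function name and its return shape are my own invention for illustration; the inputs are the BLEVEL, LEAF_BLOCKS and CLUSTERING_FACTOR values listed above and the 0.005 selectivity.

```python
import math

def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    """Return (index_cost, table_cost, total_cost) for the formula above."""
    index_cost = blevel + math.ceil(selectivity * leaf_blocks)
    table_cost = math.ceil(selectivity * clustering_factor)
    return index_cost, table_cost, index_cost + table_cost

index_cost, table_cost, total_cost = index_range_scan_cost(
    blevel=1, leaf_blocks=473, clustering_factor=3250, selectivity=0.005)

print(index_cost, table_cost, total_cost)   # 4 17 21
```

Splitting out the index portion makes it easy to compare against the cost shown on the INDEX RANGE SCAN line of the plan (4) as well as the overall plan cost (21).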

The CBO cost of 21 very closely resembles the 18 consistent gets accessed when the plan is executed. This to me suggests that the CBO has indeed costed this plan very accurately and appropriately.

It’s interesting to note in the above execution plan that Oracle is attributing 100% of this cost of 21 to CPU (21 (100)). That will be a discussion for another day…

OK, let’s now perform an update on the table, increasing the size of the rows such that I generate a bunch of migrated rows:

SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

If we now collect fresh statistics again using DBMS_STATS:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906               0            0            167            0

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

 

We notice that none of the key statistics have changed, except for the number of Table Blocks (now 4906, previously it was 3268) and the Average Row Length has also increased (now 167, previously it was 111). Both of these can of course be attributed to the increase in the size of the values now stored in the NAME column following the Update.

Importantly, notice that collecting statistics via DBMS_STATS does NOT collect data for the CHAIN_CNT statistic; it remains at 0 even though many migrated rows were actually generated by the Update statement (as we’ll see below).

Increasing the Table Blocks will result in an associated increase in the cost of reading this table via a Full Table Scan (FTS).

We notice that none of the index-related statistics changed following the Update statement (as in this example, Oracle does NOT update the ROWIDs of any of the migrated rows, Oracle simply stores a pointer in the original block to denote the new physical location of the migrated rows as previously discussed).

So if we only INCREASE the cost of a FTS (via having more Table Blocks) but keep intact all the previous index related statistics, then the CBO is certainly going to again select the same Index Range Scan plan, as the plan will have the same (cheaper than FTS) costings as before.

If we re-run the query again:

SQL> select * from bowie where id > 1 and id < 1001;

999 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------
select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |    999 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
       7709 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
        664 buffer is not pinned count
       1662 buffer is pinned count
        323 bytes received via SQL*Net from client
     171500 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        666 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        666 consistent gets from cache
        665 consistent gets pin
        665 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
    5455872 logical read bytes from cache
        665 no work - consistent read gets
         39 non-idle wait count
          1 non-idle wait time
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
        666 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
        999 table fetch by rowid
        327 table fetch continued row
          3 user calls

We notice that indeed it’s the same Index Range Scan plan as before.

But we notice that the number of Consistent Gets has increased substantially to 666 (previously it was just 18). This large jump is due to the 327 table fetch continued row operations now required to access the rows that migrated following the Update. This number is then doubled (so 2 x 327 = 654) to give the approximate additional Consistent Gets we now need to perform, as Oracle needs to read an additional table block to access the migrated row’s new physical location AND then re-read the original table block to access the next row to be fetched (previously Oracle could read all the required consecutive rows from the same table block within the one consistent get).

So it’s now actually substantially more expensive to read the required 999 rows via this index due to this increase in necessary consistent gets.
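
To see how close this rule of thumb gets, here is the arithmetic as a small Python sketch. The figures are taken directly from the two session statistics listings in this post; the variable names are mine, and the doubling is the approximation described above, not an exact accounting of every buffer visit.

```python
# Rough check of the "two extra gets per continued-row fetch" approximation.
gets_before = 18        # consistent gets before the rows migrated
gets_after = 666        # consistent gets after the rows migrated
continued_rows = 327    # "table fetch continued row" statistic

estimated_extra_gets = 2 * continued_rows
observed_extra_gets = gets_after - gets_before

print(estimated_extra_gets)   # 654
print(observed_extra_gets)    # 648
```

The estimate of 654 is within a handful of gets of the 648 actually observed, which is as close as an approximation like this can be expected to get.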

But if we look at the actual cost of this plan now:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'b1vwpu2rgn8p5',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID b1vwpu2rgn8p5, child number 0
-------------------------------------

select * from bowie where id > 1 and id < 1001

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|    999 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   163K|    21   (0)|    999 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |     4   (0)|    999 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">1 AND "ID"<1001)

 

We notice that as expected (as none of the index-related statistics have changed), that despite being much more expensive to now use this index, the costs of this plan (4 for reading the index and 21 overall) remain unchanged.

I would argue that these CBO costs are no longer as accurate as the 21 total CBO cost does not so closely represent the actual 666 consistent gets now required.

Now, the value of 327 for the table fetch continued row statistic from the previous run is clear proof we do indeed have migrated rows following the Update statement.
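
As a minimal sketch, the table fetch continued row statistic can also be read directly for the current session. This assumes the session has been granted access to the V$MYSTAT and V$STATNAME views, which may not be the case in every environment:

```sql
-- Current session's value for the statistic that is incremented each time
-- Oracle has to follow a row to another block (a migrated or chained row).
-- Run this before and after the query of interest and compare the two values.
select n.name, s.value
from   v$mystat s
       join v$statname n on n.statistic# = s.statistic#
where  n.name = 'table fetch continued row';
```

The difference between the two values is the number of continued row fetches attributable to the query run in between.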

But if we want to confirm how many migrated rows we now have in the table, we can use the ANALYZE command to collect these additional statistics:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906              86          415            170        56186

 

We notice that we now have a CHAIN_CNT of 56186.

Now this statistic can represent any row that is not housed inside a single table block (for which there could be a number of possible reasons, such as a row simply being too long to fit in a single table block), but as all rows are still relatively tiny, we can be certain that indeed all 56186 chained rows represent migrated rows.
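
If we want to see the individual rows behind the CHAIN_CNT figure, one option is the LIST CHAINED ROWS clause of ANALYZE. The following is only a sketch: it assumes a CHAINED_ROWS table already exists in the current schema (it is normally created by running the utlchain.sql script), and like CHAIN_CNT it does not distinguish migrated rows from genuinely chained rows:

```sql
-- Record the head ROWID of every migrated/chained row in BOWIE.
-- Assumption: CHAINED_ROWS was created beforehand via utlchain.sql.
analyze table bowie list chained rows into chained_rows;

-- The number of rows recorded should line up with CHAIN_CNT.
select count(*)
from   chained_rows
where  table_name = 'BOWIE';

-- A small sample of the affected ROWIDs.
select head_rowid
from   chained_rows
where  table_name = 'BOWIE'
and    rownum <= 5;
```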

Now that I’ve gone and used ANALYZE, primarily to generate this CHAIN_CNT statistic, my previous understanding of how the CBO costs migrated rows crumbles away, as I’ll discuss in my next post…

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part III (“Dancing With The Big Boys”) March 9, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, ROWID.

In my previous post, I discussed how you can best reorg a table in which a significant number of migrated rows impact the Clustering Factor of important indexes, when such tables have ENABLE ROW MOVEMENT disabled.

In this post I’ll discuss resolving similar issues, but when ROWIDs are updated on the fly when rows are migrated in Oracle Autonomous Databases.

As I discussed previously, by updating indexes with the new ROWIDs when rows migrate, such indexes can potentially increase in size as they store both old/new index entries concurrently AND due to the increased likelihood of associated index block splits. Additionally, such indexes can also have their Clustering Factor directly impacted when migrated rows disrupt the otherwise tight clustering of specific columns.

As such, we may want to address these issues to improve the performance of impacted queries.  But it’s important we address these issues appropriately…

To illustrate all this, I’m going to re-run the same demo as my previous post, but on a table with ENABLE ROW MOVEMENT enabled.

I’ll start by creating and populating a tightly packed table with ENABLE ROW MOVEMENT enabled and with data inserted in ID column order:

SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT;

Table BOWIE2 created.

SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

I’ll now create an index on this well ordered/clustered ID column:

SQL> create index bowie2_id_i on bowie2(id);

Index BOWIE2_ID_I created.

Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:

SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

 

If we check the number of migrated rows:

SQL> analyze table bowie2 compute statistics;

Table BOWIE2 analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE2             200000      4654              82          367            169            0

We notice there are indeed 0 migrated rows. This is because in Oracle Autonomous Databases, the associated ROWIDs of migrated rows are updated on the fly in this scenario.
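
As a quick sanity check (a sketch, not part of the original demo), the ROW_MOVEMENT column of USER_TABLES can confirm that the table was indeed created with row movement enabled:

```sql
-- ROW_MOVEMENT is reported as ENABLED or DISABLED for each table.
select table_name, row_movement
from   user_tables
where  table_name = 'BOWIE2';
```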

If we check the current Clustering Factor of the index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE2');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      4654

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            2            945               109061

We can see that although the data was initially inserted in ID column order, we now have a relatively poor Clustering Factor at 109061 as the migrated rows have disrupted this previously perfect clustering.

We also notice that the BLEVEL has increased from 1 to now be 2 and the number of Leaf Blocks has increased to 945 from 473 after the rows migrated (as I discussed previously).
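
To put the Clustering Factor in context, it helps to compare it with the number of table blocks (the best possible value) and the number of rows (the worst possible value). A minimal sketch using the same dictionary views as above, where the cf_per_block alias is just an illustrative name:

```sql
-- A Clustering Factor close to TABLE_BLOCKS means well clustered data;
-- a value approaching TABLE_ROWS means poorly clustered data.
select i.index_name,
       i.clustering_factor,
       t.blocks   as table_blocks,
       t.num_rows as table_rows,
       round(i.clustering_factor / t.blocks, 1) as cf_per_block
from   user_indexes i
       join user_tables t on t.table_name = i.table_name
where  i.table_name = 'BOWIE2';
```

With the statistics listed above, the Clustering Factor of 109061 is roughly 23 times the 4654 table blocks.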

If we now run a query that returns 4200 rows from a 200,000 row table:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Note
-----
    - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          4 CPU used by this session
          4 CPU used when call started
          4 DB time
      37101 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        325 bytes received via SQL*Net from client
     461965 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
         14 calls to kcmgcs
       4572 consistent gets
       4572 consistent gets from cache
       4572 consistent gets pin
       4572 consistent gets pin (fastpath)
          2 execute count
   37453824 logical read bytes from cache
       4560 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       4572 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4560 table scan blocks gotten
     252948 table scan disk non-IMC rows gotten
     252948 table scan rows gotten
          1 table scans (short tables)
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

We can see that Oracle has decided to perform a Full Table Scan (FTS) and not use the index.

The Clustering Factor of the index on the ID column is now so bad that returning 4200 rows via the index is just too expensive. The FTS is now deemed the cheaper option by the CBO.

We notice that the CBO cost of the FTS is 1264.

If we run a query that forces the use of the index:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14531 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        348 bytes received via SQL*Net from client
     462143 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2665 consistent gets
          2 consistent gets examination
          2 consistent gets examination (fastpath)
       2665 consistent gets from cache
       2663 consistent gets pin
       2663 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   21831680 logical read bytes from cache
       2663 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          3 process last non-idle time
          2 session cursor cache count
       2665 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2314 (100)|   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2314 (1)|   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      22 (0)|   4200 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

The Index Range Scan plan has an overall cost of 2314, greater than the 1264 cost of the FTS plan.

Notice that the cost of using just the index within the plan is currently 22.

So the vast majority of the cost of this plan (2314 – 22 = 2292) is in Oracle having to access so many different table blocks due to the poor index Clustering Factor and NOT in the increased size of the index.
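
This split can be approximated with the commonly quoted I/O cost formula for an index range scan: blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor). Note the formula is an assumption here rather than something shown in the plan output, and it ignores the CPU component of the cost. A sketch using the index statistics listed earlier (BLEVEL 2, 945 leaf blocks, Clustering Factor 109061) and a selectivity of 4200/200000:

```sql
-- Approximate I/O cost of the hinted index range scan plan.
-- The selectivity of 4200/200000 is assumed from the 4200 estimated rows.
select 2 + ceil(4200/200000 * 945)     as index_cost,  -- blevel + leaf blocks visited
       ceil(4200/200000 * 109061)      as table_cost,  -- table blocks visited
       2 + ceil(4200/200000 * 945)
         + ceil(4200/200000 * 109061)  as total_cost
from   dual;
```

This should work out to 22 for the index and 2291 for the table, 2313 in total, which is in line with the 22 and 2314 reported in the plan (the small difference being the part of the cost this approximation leaves out).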

As I’ve discussed numerous times, you can potentially make an index smaller by rebuilding the index (if there’s free space within the index), but the impact on the Clustering Factor will be nothing but “disappointing”…
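
One way to gauge whether an index has free space worth reclaiming before a rebuild is ANALYZE ... VALIDATE STRUCTURE. This is only a sketch and comes with caveats: without the ONLINE keyword it blocks DML on the table while it runs, with ONLINE the INDEX_STATS view is not populated, and it may not be appropriate on a busy system:

```sql
-- Validate the index and populate INDEX_STATS for this session.
analyze index bowie2_id_i validate structure;

-- INDEX_STATS holds a single row, for the last index validated in this session.
-- A low PCT_USED or a high DEL_LF_ROWS suggests free space within the index.
select name, height, lf_blks, lf_rows, del_lf_rows, pct_used
from   index_stats;
```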

If we just rebuild the index:

SQL> alter index bowie2_id_i rebuild online;

Index BOWIE2_ID_I altered.

And now look at the new index related statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            1            473               109061

We notice that the index has indeed decreased in size, back to what it was before the rows migrated following the Update (Blevel=1 and Leaf Blocks=473).

But the Clustering Factor remains unchanged at 109061.

If we now re-run the query:

 

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          3 CPU used by this session
          3 CPU used when call started
          3 DB time
      31738 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        325 bytes received via SQL*Net from client
     461972 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
         14 calls to kcmgcs
       4572 consistent gets
       4572 consistent gets from cache
       4572 consistent gets pin
       4572 consistent gets pin (fastpath)
          2 execute count
   37453824 logical read bytes from cache
       4560 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          3 process last non-idle time
          2 session cursor cache count
       4572 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4560 table scan blocks gotten
     252948 table scan disk non-IMC rows gotten
     252948 table scan rows gotten
          1 table scans (short tables)
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

The CBO decides to still use a FTS instead of the index.

If we look at the cost now of using the index for this query:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2655 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2655 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          1 DB time
      13484 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        347 bytes received via SQL*Net from client
     461972 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2655 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2655 consistent gets from cache
       2654 consistent gets pin
       2654 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   21749760 logical read bytes from cache
       2654 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2655 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2303 (100)|   4200 |00:00:00.01 |    2655 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2303 (1)|   4200 |00:00:00.01 |    2655 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We notice the cost of the index plan has only gone down marginally, to 2303 (previously it was 2314).

This reduction of 11 in the CBO cost is due entirely to the index now being approximately half the size it was before the rebuild, which has reduced the cost of reading the index blocks within the execution plan to 11 (previously it was 22).

But the vast majority of the cost within the Index Range Scan plan again comes from accessing the table blocks, which remains unchanged due to the unchanged Clustering Factor.

To reduce the Clustering Factor, we need to change the clustering of the data within the TABLE.

So, to improve the performance of this potentially important query, we need to re-cluster the data just as we did in the example in my previous post when we had migrated rows listed and ROWIDs were not updated on the fly.

We can now add an appropriate Clustering Attribute before we perform the table reorg:

SQL> alter table bowie2 add clustering by linear order (id);

Table BOWIE2 altered.

SQL> alter table bowie2 move online;

Table BOWIE2 altered.
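
As a sketch of how the result of these two steps could be verified, the attribute clustering definition can be checked in USER_CLUSTERING_TABLES, and the index status in USER_INDEXES (an online move is expected to maintain the indexes, so the status should remain VALID):

```sql
-- Attribute clustering definition for the table.
select table_name, clustering_type, on_load, on_datamovement
from   user_clustering_tables
where  table_name = 'BOWIE2';

-- Indexes should still be usable after the online move.
select index_name, status
from   user_indexes
where  table_name = 'BOWIE2';
```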

If we now look at the Clustering Factor of this important index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            1            473                 4850

The Clustering Factor has been reduced down to the almost perfect 4850, down from the previous 109061.

If we now re-run the query:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
         90 Cached Commit SCN referenced
      11345 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         93 buffer is not pinned count
       8308 buffer is pinned count
        325 bytes received via SQL*Net from client
     462117 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        102 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        102 consistent gets from cache
        101 consistent gets pin
        101 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     835584 logical read bytes from cache
        101 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          2 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
        102 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

 

We can see the query now automatically uses the index and requires just 102 consistent gets, down from 4572 when it performed the FTS.

If we look at the cost of this new plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |   113 (100)|   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|     113 (0)|   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the plan has a cost of just 113, which is both much more accurate (close to the 102 consistent gets) and much less than the previous cost of 1264 for the FTS plan.
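
The new cost can be cross-checked against the current dictionary statistics. The sketch below assumes the commonly quoted approximation for index range scan I/O cost (blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor)) and a selectivity of 4200/200000. Neither is taken from the plan output, and the p and sel names are purely illustrative:

```sql
with p as (select 4200/200000 as sel from dual)  -- assumed predicate selectivity
select i.index_name,
       i.blevel + ceil(p.sel * i.leaf_blocks)       as index_cost,
       ceil(p.sel * i.clustering_factor)            as table_cost,
       i.blevel + ceil(p.sel * i.leaf_blocks)
                + ceil(p.sel * i.clustering_factor) as total_cost
from   user_indexes i
       cross join p
where  i.index_name = 'BOWIE2_ID_I';
```

With the statistics gathered after the reorg (BLEVEL 1, 473 leaf blocks, Clustering Factor 4850), this should give 11 for the index, 102 for the table and 113 overall, matching the plan.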

So in specific examples where migrated rows significantly impact the Clustering Factor of indexes important to our applications, including when ROWIDs are updated on the fly in Oracle Autonomous Databases, we may need to appropriately reorg such tables to repair the Clustering Factor of impacted indexes.

I’ve mentioned a number of times in this series how tables in Oracle Autonomous Databases with ENABLE ROW MOVEMENT have their ROWIDs updated on the fly when a row migrates. In my next post, I’ll discuss how even tables that don’t have the ENABLE ROW MOVEMENT clause set can still have their ROWIDs updated on the fly when a row migrates…

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part II (“Dancing Out In Space”) March 7, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, David Bowie, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Richard's Musings, ROWID.

In my previous post, I discussed how the clustering of data can be impacted if rows migrate and how this in turn can have a detrimental impact on the efficiency of associated indexes.

In this post, I’ll discuss what you can do (and not do) to remedy things in the relatively unlikely event that you hit this issue with migrated rows.

I’ll initially just discuss the example where the table is defined without ENABLE ROW MOVEMENT in the Transaction Processing Autonomous Database (and so does NOT update ROWIDs on the fly when a row migrates).

I’ll start by again creating and populating a tightly packed table, with the data inserted in ID column order:

SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0;

Table BOWIE created.

SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

I’ll now create an index on this well ordered/clustered ID column:

SQL> create index bowie_id_i on bowie(id);

Index BOWIE_ID_I created.

Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:

SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

 

If we check the number of migrated rows:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables
     where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906              86          414            170        56186

 

We notice there are indeed 56186 migrated rows.

If we check the current Clustering Factor of the index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4906

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

 

We notice the index still has an excellent Clustering Factor of just 3250. As the ROWIDs are NOT updated in this example when rows migrate, the index retains the same Clustering Factor as before the Update statement.

If we run the following query that returns 4200 rows (as per my previous post):

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          3 DB time
      24901 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2762 buffer is not pinned count
       7005 buffer is pinned count
        324 bytes received via SQL*Net from client
     461909 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2771 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2771 consistent gets from cache
       2770 consistent gets pin
       2770 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   22700032 logical read bytes from cache
       2770 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2771 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
       1366 table fetch continued row
          3 user calls

We can see the query currently uses 2771 consistent gets, which is significantly higher than it could be, as Oracle has to visit the original table block and then follow the pointer to the new location for any migrated row that needs to be retrieved.
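The extra work caused by the migrated rows shows up in the "table fetch continued row" statistic in the listing above (1366). As a minimal sketch, assuming the session has SELECT access to the V$MYSTAT and V$STATNAME views, the relevant counters for the current session can be picked out directly rather than scanning the full statistics listing:

```sql
-- Sketch only: show the row-fetch related statistics for the current session.
-- Assumes SELECT privilege on V$MYSTAT and V$STATNAME.
select n.name, s.value
from   v$mystat s
       join v$statname n on n.statistic# = s.statistic#
where  n.name in ('table fetch by rowid',
                  'table fetch continued row',
                  'consistent gets')
order  by n.name;
```

These values are cumulative for the session, so they need to be captured before and after running the query of interest and the difference taken.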

However, if we look at the cost of the current plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|      80 (0)|   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see it only has a cost of 80, as the CBO does not consider the additional accesses now required for these migrated rows. Because the Clustering Factor still looks near perfect, this cost is not particularly accurate and does not reflect the 2771 consistent gets now required.
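To see where the 80 comes from, here is a rough sketch using the widely quoted I/O costing approximation for an index range scan: blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor). The formula itself is an assumption here (it is not shown anywhere in this demo), as is the selectivity of 4200/200000 taken from the E-Rows estimate:

```sql
-- Sketch only: approximate index range scan I/O cost (assumed formula).
-- Inputs from the statistics above: BLEVEL=1, LEAF_BLOCKS=473, CLUSTERING_FACTOR=3250.
-- Assumed selectivity: 4200 estimated rows / 200000 rows = 0.021.
select 1 + ceil(4200/200000 * 473)                            as index_cost,
       1 + ceil(4200/200000 * 473) + ceil(4200/200000 * 3250) as total_cost
from   dual;
-- index_cost = 11, total_cost = 80
```

Nothing in this calculation refers to CHAIN_CNT or to the 56186 migrated rows, which is consistent with the cost remaining at 80 while the actual consistent gets climb to 2771.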

Now there are various ways we can look at fixing this issue with all these migrated rows requiring additional consistent gets to access.

One method is to capture all the ROWIDs of the migrated rows, copy these rows to a temporary holding table, delete these rows and then re-insert them all back into the table from the temporary table.

We can identify the migrated rows by creating the CHAINED_ROWS table as per the Oracle supplied UTLCHAIN.SQL script and then using the ANALYZE command to store their ROWIDs in this CHAINED_ROWS table:

SQL> create table CHAINED_ROWS (
     owner_name varchar2(128),
     table_name varchar2(128),
     cluster_name varchar2(128),
     partition_name varchar2(128),
     subpartition_name varchar2(128),
     head_rowid rowid,
     analyze_timestamp date
     );

Table CHAINED_ROWS created.

SQL> analyze table bowie list chained rows;

Table BOWIE analyzed.

SQL> select table_name, head_rowid from chained_rows where table_name='BOWIE' and rownum<=10;

   TABLE_NAME            HEAD_ROWID
_____________ _____________________
BOWIE         AAAqFjAAAAAE6CzAAP
BOWIE         AAAqFjAAAAAE6CzAAR
BOWIE         AAAqFjAAAAAE6CzAAU
BOWIE         AAAqFjAAAAAE6CzAAW
BOWIE         AAAqFjAAAAAE6CzAAZ
BOWIE         AAAqFjAAAAAE6CzAAb
BOWIE         AAAqFjAAAAAE6CzAAe
BOWIE         AAAqFjAAAAAE6CzAAg
BOWIE         AAAqFjAAAAAE6CzAAj
BOWIE         AAAqFjAAAAAE6CzAAl
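
With the head ROWIDs captured, the copy, delete and re-insert method described above might look something like the following. This is a minimal sketch only: the BOWIE_MIGRATED holding table name is hypothetical, and it assumes there are no triggers, foreign keys or concurrent DML on BOWIE that would complicate deleting and re-inserting the rows.

```sql
-- Sketch only: BOWIE_MIGRATED is a hypothetical holding table name.
-- 1. Copy the migrated rows into the holding table.
create table bowie_migrated as
select b.*
from   bowie b
where  b.rowid in (select c.head_rowid
                   from   chained_rows c
                   where  c.owner_name = user
                   and    c.table_name = 'BOWIE');

-- 2. Delete the migrated rows from the original table.
delete from bowie b
where  b.rowid in (select c.head_rowid
                   from   chained_rows c
                   where  c.owner_name = user
                   and    c.table_name = 'BOWIE');

-- 3. Re-insert them as brand new rows.
insert into bowie
select * from bowie_migrated;

commit;

-- 4. Clean up the holding table and the captured ROWIDs.
drop table bowie_migrated purge;

delete from chained_rows
where  owner_name = user
and    table_name = 'BOWIE';

commit;
```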

 

Another method we can now utilise is to simply MOVE ONLINE the table:

SQL> alter table bowie move online;

Table BOWIE altered.

 

If we now look at the number of migrated rows after the table reorg:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4936              56          838            169            0

 

We can see we no longer have any migrated rows.

BUT, if we now look at the Clustering Factor of this index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473               114560

 

We can see it has now significantly increased to 114560 (previously it was just 3250).

The problem of course is that with the ROWIDs now representing the correct new physical location of the previously migrated rows, the previously perfect clustering/ordering of the ID column has been impacted.
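As a rough illustration of why the Clustering Factor changes, the statistic can be approximated by reading the rows in index (ID) order and counting how often the table block changes from one row to the next. The query below is a sketch of that idea using the DBMS_ROWID package, not the exact algorithm DBMS_STATS uses, and it assumes the default TABLE_CACHED_BLOCKS preference of 1:

```sql
-- Sketch only: approximate the Clustering Factor of an index on BOWIE(ID).
-- Rows are ordered as the index orders them (ID, then ROWID) and a "hit" is
-- counted each time the file/block of a row differs from the previous row.
select count(*) as approx_clustering_factor
from  (select dbms_rowid.rowid_relative_fno(rowid) as file_no,
              dbms_rowid.rowid_block_number(rowid) as block_no,
              lag(dbms_rowid.rowid_relative_fno(rowid))
                over (order by id, rowid)          as prev_file_no,
              lag(dbms_rowid.rowid_block_number(rowid))
                over (order by id, rowid)          as prev_block_no
       from   bowie
       where  id is not null)
where  prev_block_no is null
or     block_no <> prev_block_no
or     file_no  <> prev_file_no;
```

Both the file and block components are compared so that a change of block is detected regardless of which datafile the row lives in.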

If we now re-run the query returning the 4200 rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1845943507

---------------------------------------------------------------------------------------------
| Id | Operation                  | Name  | Starts | E-Rows | A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |       |      1 |        |   4200 |00:00:00.02 |    4857 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE |      1 |   4200 |   4200 |00:00:00.02 |    4857 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Statistics
-----------------------------------------------------------
          3 CPU used by this session
          3 CPU used when call started
       4849 Cached Commit SCN referenced
          2 DB time
      25870 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        324 bytes received via SQL*Net from client
     461962 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          9 calls to kcmgcs
       4857 consistent gets
       4857 consistent gets from cache
       4857 consistent gets pin
       4857 consistent gets pin (fastpath)
          2 execute count
   39788544 logical read bytes from cache
       4850 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          2 process last non-idle time
          1 session cursor cache count
       4857 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4850 table scan blocks gotten
     200000 table scan disk non-IMC rows gotten
     200000 table scan rows gotten
          1 table scans (short tables)
          3 user calls

 

Oracle is now performing a Full Table Scan (FTS). The number of consistent gets, now at 4857, is actually worse than when we had the migrated rows (previously at 2771).

The Clustering Factor of the index on the ID column is now so poor that returning 4200 rows via this index is considered too expensive. The FTS is now deemed the cheaper option by the CBO.

If we look at the CBO cost of using this FTS plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1845943507

------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |       |      1 |        |       |  1340 (100)|   4200 |00:00:00.02 |    4857 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE |      1 |   4200 |   684K|    1340 (1)|   4200 |00:00:00.02 |    4857 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

We can see the cost of this plan is 1340.

If we compare this with the cost of using the (now deemed) inefficient index:

SQL> select /*+ index (bowie) */ * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID 9215hkzd3v1up, child number 0
-------------------------------------
select /*+ index (bowie) */ * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2784 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2784 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
       2741 Cached Commit SCN referenced
          2 DB time
      12633 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2775 buffer is not pinned count
       5626 buffer is pinned count
        345 bytes received via SQL*Net from client
     462170 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2784 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2784 consistent gets from cache
       2783 consistent gets pin
       2783 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   22806528 logical read bytes from cache
       2783 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          4 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2784 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'9215hkzd3v1up',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID 9215hkzd3v1up, child number 0
-------------------------------------
select /*+ index (bowie) */ * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |  2418 (100)|   4200 |00:00:00.01 |    2784 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|    2418 (1)|   4200 |00:00:00.01 |    2784 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the CBO cost of the index is now 2418, more than the 1340 cost of using the FTS.

So in scenarios where eliminating a significant number of migrated rows impacts the Clustering Factor, and so the efficiency, of vital indexes in our applications, we need to eliminate the migrated rows in a more thoughtful manner.

An option we have available is to first add an appropriate Clustering Attribute before we perform the table reorg:

SQL> alter table bowie add clustering by linear order (id);

Table BOWIE altered.

SQL> alter table bowie move online;

Table BOWIE altered.
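
Before relying on the reorg, it can be worth confirming that the clustering attribute is actually in place. A minimal sketch, assuming the USER_CLUSTERING_TABLES and USER_CLUSTERING_KEYS dictionary views are available in your release and expose the columns used here:

```sql
-- Sketch only: confirm the attribute clustering definition on BOWIE.
select table_name, clustering_type, on_load, on_datamovement
from   user_clustering_tables
where  table_name = 'BOWIE';

-- List the clustering column(s) in their defined order.
select table_name, detail_column, position
from   user_clustering_keys
where  table_name = 'BOWIE'
order  by position;
```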

 

If we now look at the Clustering Factor of this important index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 4850

 

The Clustering Factor has been reduced to an almost perfect 4850, down from the previous 114560.

If we now re-run the query:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
         89 Cached Commit SCN referenced
          1 DB time
      11249 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         93 buffer is not pinned count
       8308 buffer is pinned count
        324 bytes received via SQL*Net from client
     462165 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        102 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        102 consistent gets from cache
        101 consistent gets pin
        101 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     835584 logical read bytes from cache
        101 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
        102 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

We can see the query now automatically uses the index and requires just 102 consistent gets (down from 4857 when it performed the FTS and down from 2771 when we had the migrated rows).

If we look at the cost of this new plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |   113 (100)|   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|     113 (0)|   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the plan has a cost of just 113, which is much more accurate (it’s close to the 102 consistent gets) and much less than the previous cost of 1340 for the FTS plan.
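The three index costs seen in this demo can be lined up side by side with a rough sketch. It assumes the widely quoted I/O costing approximation for an index range scan (blevel + ceil(selectivity x leaf_blocks) + ceil(selectivity x clustering_factor)), which is not stated in this post, and a selectivity of 4200/200000 taken from the E-Rows estimates:

```sql
-- Sketch only: same assumed formula and selectivity (4200/200000) for each
-- Clustering Factor seen in this demo; BLEVEL=1 and LEAF_BLOCKS=473 throughout.
select cf as clustering_factor,
       1 + ceil(4200/200000 * 473) + ceil(4200/200000 * cf) as approx_cost
from  (select 3250   as cf from dual union all
       select 114560 as cf from dual union all
       select 4850   as cf from dual);
-- 3250 -> 80, 114560 -> 2417, 4850 -> 113
```

The 80 and 113 match the reported plan costs. The 2417 is one less than the 2418 reported for the hinted index plan, with the difference presumably being the small CPU component shown as (1) in that plan.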

So in specific scenarios where fixing migrated rows significantly impacts the Clustering Factor of indexes important to our applications, we have to be a little cleverer in how we address the migrated rows.

This can also be the case in the new scenario where Oracle automatically updates the ROWIDs of migrated rows, as I’ll discuss in my next post…

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part I (“Growin’ Up”) March 1, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, BLEVEL, CBO, Changing ROWID, Clustering Factor, Data Clustering, Hints, Index Access Path, Index Block Splits, Index Delete Operations, Index Height, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle Statistics, Oracle19c, Performance Tuning, Richard Foote Training, Richard's Blog, ROWID.

In my previous post I discussed how an index can potentially be somewhat inflated in size after ROWIDs are updated on the fly after a substantial number of rows are migrated.

However, there’s another key “factor” of an index that in some scenarios can be impacted by this new ROWID behaviour with regard to migrated rows.

To highlight this scenario, I’ll again start by creating and populating a table with ENABLE ROW MOVEMENT disabled:

SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0;

Table BOWIE created.

SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

I’ll next create an index on the ID column. The important aspect with the ID column is that the data is entered monotonically in ID column order, so the associated index will have an excellent (very low) Clustering Factor:

SQL> create index bowie_id_i on bowie(id);

Index BOWIE_ID_I created.

If we look at some key statistics of the table and index:

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      3268

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

We can see that the number of table blocks is 3268, the number of index leaf blocks is 473 and we indeed have a near perfect Clustering Factor of 3250.
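One quick way to judge a Clustering Factor is to compare it with the number of table blocks (roughly the best case) and the number of rows (roughly the worst case). A minimal sketch against the standard dictionary views, where the CF_TO_BLOCKS and CF_PCT_OF_ROWS column aliases are just illustrative names:

```sql
-- Sketch only: put the Clustering Factor in context of table blocks and rows.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows,
       round(i.clustering_factor / nullif(t.blocks, 0), 2)         as cf_to_blocks,
       round(100 * i.clustering_factor / nullif(t.num_rows, 0), 2) as cf_pct_of_rows
from   user_indexes i
       join user_tables t on t.table_name = i.table_name
where  i.table_name = 'BOWIE';
```

A CF_TO_BLOCKS ratio close to 1, as here (3250 against 3268 blocks), indicates the rows are stored in almost the same order as the index entries.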

If we run a couple of queries:

SQL> select * from bowie where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0
-------------------------------------
select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
       7353 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         16 buffer is not pinned count
       1985 buffer is pinned count
        324 bytes received via SQL*Net from client
     171305 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         18 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         18 consistent gets from cache
         17 consistent gets pin
         17 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     147456 logical read bytes from cache
         17 no work - consistent read gets
         38 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         18 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
          3 user calls

We can see that this first query, which returns 1000 rows, requires just 18 consistent gets, thanks primarily to the efficient index with the near perfect Clustering Factor.

If we look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0
-------------------------------------
select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   108K|      21 (0)|   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

We can see the plan has an accurate cost of just 21.

If we now run a similar query that returns a few more rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          1 DB time
      11353 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         59 buffer is not pinned count
       8342 buffer is pinned count
        324 bytes received via SQL*Net from client
     461834 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         68 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         68 consistent gets from cache
         67 consistent gets pin
         67 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     557056 logical read bytes from cache
         67 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         68 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

We can see that it required just 68 consistent gets to return 4200 rows, thanks to the excellent data clustering and the associated very low Clustering Factor.

If we look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   455K|      80 (0)|   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the cost of the plan is currently a relatively accurate 80.
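As a rough cross-check of these two costs, the sketch below applies the commonly cited approximation for the I/O cost of an index range scan (BLEVEL + ceil(LEAF_BLOCKS x selectivity) + ceil(CLUSTERING_FACTOR x selectivity)) to the index statistics for BOWIE_ID_I (BLEVEL 1, LEAF_BLOCKS 473, CLUSTERING_FACTOR 3250). The function name and the simple rows / NUM_ROWS selectivity are assumptions made for illustration; this is not how the CBO is literally implemented, and it ignores the CPU component of the cost.

```python
import math

NUM_ROWS = 200_000  # rows in the BOWIE table


def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    """Approximate I/O cost of an index range scan plus table access.

    Assumption: the commonly cited formula, not Oracle's actual code.
    Returns (index_cost, total_cost).
    """
    index_cost = blevel + math.ceil(leaf_blocks * selectivity)
    table_cost = math.ceil(clustering_factor * selectivity)
    return index_cost, index_cost + table_cost


for rows in (1000, 4200):
    # Assumed selectivity: fraction of the table's rows returned by the range.
    selectivity = rows / NUM_ROWS
    index_cost, total_cost = index_range_scan_cost(1, 473, 3250, selectivity)
    print(rows, index_cost, total_cost)
```

Under these assumptions, the sketch reproduces the index costs of 4 and 11 and the overall plan costs of 21 and 80 shown above.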

OK, let’s now perform an update on this table that generates a bunch of migrated rows:

SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

If we now look at the table and index statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4906

We can see that the table blocks value has increased to 4906 (previously 3268). As explained previously, this is due in large part to the increased NAME column values, and also to the pointers in the original table blocks that point to the new locations of the migrated rows.

This equates to an increase of approximately 50% in table blocks.

If we look at the current index statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

We can see that these values are all unchanged, as the ROWIDs in indexes remain unchanged when a row migrates and ENABLE ROW MOVEMENT is not set.

Therefore, when we re-run these same queries:

SQL> select * from bowie where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0
-------------------------------------
select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   1000 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   1000 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 DB time
       7967 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
        664 buffer is not pinned count
       1664 buffer is pinned count
        324 bytes received via SQL*Net from client
     171419 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        666 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        666 consistent gets from cache
        665 consistent gets pin
        665 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
    5455872 logical read bytes from cache
        665 no work - consistent read gets
         37 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
        666 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
        327 table fetch continued row
          3 user calls

The number of consistent gets has increased significantly to 666 (previously it was just 18).

Now we can attribute an increase of approximately 50% of the previous consistent gets (18 x 0.50 = 9) to the 50% increase in table blocks now required to store the rows, due to the increased row size.

We can also attribute an additional 327 consistent gets for the table fetch continued row value listed in the statistics, representing the extra consistent gets required to access the migrated rows from their new physical location.

But 18 + 9 + 327 = 354 still leaves us short of the new 666 consistent gets value.

The problem with having to visit another table block to get a row from its new location is that Oracle then has to re-access the original table block to get the next row (rather than reading multiple rows with the same consistent get).

So the number of consistent gets is actually going to increase by approximately 2 x table fetch continued row when accessing migrated rows (noting that the last migrated row in a block will only incur one additional consistent get, as the next table block accessed will differ regardless).
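To make this arithmetic concrete, here is a minimal sketch of the rough model just described: the original consistent gets, plus the approximate 50% growth in table blocks, plus two extra gets per table fetch continued row. The function name and the model itself are simplifications for illustration only, and the model will overestimate slightly because the last migrated row in each block only costs one extra get.

```python
def estimate_consistent_gets(base_gets, block_growth, continued_rows):
    """Rough estimate of consistent gets after rows have migrated.

    base_gets      -- consistent gets before the update
    block_growth   -- fractional increase in table blocks (0.50 = 50%)
    continued_rows -- the 'table fetch continued row' statistic
    """
    growth_gets = round(base_gets * block_growth)
    # Each migrated row costs a visit to its new block plus a re-visit
    # of the original block to continue with the next row.
    migrated_gets = 2 * continued_rows
    return base_gets + growth_gets + migrated_gets


estimate = estimate_consistent_gets(18, 0.50, 327)
print(estimate)        # 681
print(estimate - 666)  # 15 more than the 666 actually recorded
```

So the model lands within a few percent of the 666 consistent gets actually recorded for the 1000 row query.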

If we look at the new CBO cost for this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0

-------------------------------------

select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   163K|      21 (0)|   1000 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

 

We notice the CBO cost for this plan remains unchanged at 21.

This is totally to be expected, as the index statistics by which the cost of an index scan is calculated are unchanged.

Considering the rough “rule of thumb” is that the CBO cost of an index scan should be in the ball-park of the number of possible IOs, the fact the plan now uses 666 consistent gets highlights this cost of just 21 is no longer as accurate…

If we look at the second SQL that returns 4200 rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14103 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2762 buffer is not pinned count
       7005 buffer is pinned count
        324 bytes received via SQL*Net from client
     461947 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2771 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2771 consistent gets from cache
       2770 consistent gets pin
       2770 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   22700032 logical read bytes from cache
       2770 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
       2771 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
       1366 table fetch continued row
          3 user calls

We again notice consistent gets has increased significantly to 2771 (previously it was just 68). Again, these additional consistent gets can be attributed to the extra size of the table and the additional approximate 2 x 1366 table fetch continued row gets.
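One way to sanity-check the "approximately 2 x" figure is to work backwards from the recorded statistics of both queries: take the increase in consistent gets, subtract the share assumed to come from the roughly 50% growth in table blocks, and divide what remains by the table fetch continued row count. The sketch below does exactly that; the function name and the 50% growth allowance are assumptions for illustration.

```python
def extra_gets_per_continued_row(gets_before, gets_after, continued_rows,
                                 block_growth=0.50):
    """Extra consistent gets incurred per 'table fetch continued row'.

    Assumes block_growth (50% here) of the original gets is explained by
    the larger table rather than by row migration.
    """
    growth_gets = gets_before * block_growth
    migration_gets = gets_after - gets_before - growth_gets
    return migration_gets / continued_rows


queries = (
    ("id between 1 and 1000", 18, 666, 327),
    ("id between 1 and 4200", 68, 2771, 1366),
)
for label, before, after, continued in queries:
    ratio = extra_gets_per_continued_row(before, after, continued)
    print(label, round(ratio, 2))
```

Both queries come out at roughly 1.95 extra consistent gets per continued row, just under 2, which fits with the last migrated row in a block only incurring one additional get.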

If we now look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________

SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|      80 (0)|   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We again notice the CBO cost for this plan remains unchanged at 80, again totally expected as the underlying index statistics have remained unchanged after the update statement.

But again, not necessarily as accurate a cost as it was previously…
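To put a number on how far the costs have drifted, the short sketch below simply divides the actual Buffers value of each plan after the update by its unchanged CBO cost. The variable names are illustrative, and it leans on the rough rule of thumb mentioned earlier that the cost of an index scan should be in the ball-park of the IOs performed.

```python
# (CBO cost, actual buffers after the update) for each query
plans_after_update = {
    "id between 1 and 1000": (21, 666),
    "id between 1 and 4200": (80, 2771),
}

for query, (cost, buffers) in plans_after_update.items():
    # How many consistent gets are performed per unit of CBO cost.
    ratio = buffers / cost
    print(f"{query}: {ratio:.1f} gets per unit of cost")
```

Before the update, the same comparison was 18 gets against a cost of 21 and 68 gets against a cost of 80, so both plans now perform over 30 times the work their costs suggest.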

 

If we repeat this demo, but this time on a table created with ENABLE ROW MOVEMENT:

SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT;

Table BOWIE2 created.

SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> create index bowie2_id_i on bowie2(id);

Index BOWIE2_ID_I created.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      3268

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

        INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
__________________ _________ ______________ ____________________
BOWIE2_ID_I                1            473                 3250

 

The table and index statistics are currently identical to the previous demo.

If we run the same two equivalent queries:

 

SQL> select * from bowie2 where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
       7909 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         16 buffer is not pinned count
       1985 buffer is pinned count
        325 bytes received via SQL*Net from client
     171306 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         18 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         18 consistent gets from cache
         17 consistent gets pin
         17 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     147456 logical read bytes from cache
         17 no work - consistent read gets
         37 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         18 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   108K|      21 (0)|   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)



SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          2 DB time
      13157 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         59 buffer is not pinned count
       8342 buffer is pinned count
        325 bytes received via SQL*Net from client
     461838 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         68 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         68 consistent gets from cache
         67 consistent gets pin
         67 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     557056 logical read bytes from cache
         67 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         68 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   455K|      80 (0)|   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

With identical table/index statistics, we notice as expected that both SQLs have the same consistent gets and CBO costs as with the previous demo.

If we now repeat the equivalent Update statement:

SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

 

If we look at the table statistics:

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      4654

 

We notice the number of table blocks has increased to 4654 due to the increased row lengths, but not by as much as in the previous demo (where table blocks increased to 4906). In this scenario, Oracle does not have to store the row location pointers in the original blocks for the migrated rows.
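The difference in table growth between the two demos can be quantified with a few lines of arithmetic. The helper name below is purely illustrative; the block counts are the BLOCKS values reported by DBMS_STATS in each demo.

```python
def pct_growth(before, after):
    """Percentage increase from 'before' to 'after'."""
    return (after - before) / before * 100


BLOCKS_BEFORE = 3268        # both tables before the update
BLOCKS_NO_MOVEMENT = 4906   # BOWIE, ROWIDs not updated
BLOCKS_ROW_MOVEMENT = 4654  # BOWIE2, ENABLE ROW MOVEMENT

print(round(pct_growth(BLOCKS_BEFORE, BLOCKS_NO_MOVEMENT), 1))   # 50.1
print(round(pct_growth(BLOCKS_BEFORE, BLOCKS_ROW_MOVEMENT), 1))  # 42.4
print(BLOCKS_NO_MOVEMENT - BLOCKS_ROW_MOVEMENT)                  # 252
```

So the table grows by roughly 42% rather than 50% when the ROWIDs are updated, a saving of 252 blocks in this demo.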

If we look at the index statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            2            945               109061

We notice that these are substantially different from the first demo, where ROWIDs for migrated rows are not updated on the fly.

By now updating the ROWIDs, the indexes can possibly increase in size as they have to store both the previous and new ROWIDs in separate index entries and hence Oracle is more likely to perform additional index block splits (as I discussed in my previous post).

The LEAF_BLOCKS are now 945 (previously 473) and even the BLEVEL has increased from 1 to 2.

Additionally, and perhaps importantly for specific key indexes, the Clustering Factor value of indexes can also be impacted. Migrating rows and physically storing them in different locations can detrimentally impact the tight clustering of rows based on specific column values.

The Clustering Factor for the index on the monotonically increasing ID column has now increased significantly to 109061, up from the previously perfect 3250.

So columns that have naturally good clustering (e.g.: monotonically increasing values such as IDs and dates) or have been manually well clustered for performance purposes, can have the Clustering Factor of associated indexes detrimentally impacted by migrated rows.
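A quick way to judge how good or bad a Clustering Factor is, is to compare it with the number of table blocks (the best case) and the number of rows (the worst case). The sketch below does this for the BOWIE2_ID_I index before and after the update; the function name is illustrative and the two ratios are just a common rule of thumb, not anything the CBO reports.

```python
def clustering_ratios(clustering_factor, table_blocks, num_rows):
    """Return (CF per table block, CF per row).

    Close to 1.0 per block means near-perfect clustering; close to 1.0
    per row means almost every row needs a different block visit.
    """
    return clustering_factor / table_blocks, clustering_factor / num_rows


before = clustering_ratios(3250, 3268, 200_000)
after = clustering_ratios(109_061, 4654, 200_000)

print(round(before[0], 2), round(before[1], 3))  # 0.99 0.016
print(round(after[0], 2), round(after[1], 3))    # 23.43 0.545
```

Before the update the Clustering Factor is almost identical to the number of table blocks. Afterwards it is over 23 times the number of table blocks, with more than half of the index entries pointing to a different table block than the entry before them.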

If we re-run the first query:

SQL> select * from bowie2 where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   1000 |00:00:00.01 |     639 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   1000 |00:00:00.01 |     639 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       7 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          1 DB time
      15262 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
        634 buffer is not pinned count
       1367 buffer is pinned count
        325 bytes received via SQL*Net from client
     171421 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        639 consistent gets
          2 consistent gets examination
          2 consistent gets examination (fastpath)
        639 consistent gets from cache
        637 consistent gets pin
        637 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
    5234688 logical read bytes from cache
        637 no work - consistent read gets
         38 non-idle wait count
          1 non-idle wait time
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
        639 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
          3 user calls

I discussed in a previous post how by updating the ROWIDs of migrated rows we can improve performance, as Oracle can go directly to the correct new physical location of a migrated row.

But for some specific indexes, where data clustering is crucial, and we have a significant number of migrated rows, this might not necessarily be the case.

We can see that consistent gets have increased to 639 (previously it was just 21), which is not hugely different from the 666 consistent gets required to fetch the migrated rows when the ROWIDs were not updated in the first demo.
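To see where the extra consistent gets come from, the session statistics can be sampled before and after running the query. The following is a minimal sketch, assuming the session can query V$MYSTAT and V$STATNAME; the choice of statistic names is mine rather than part of the demo:

```sql
-- Snapshot a few session statistics; run this before and after the query
-- and compare the two sets of values.
select n.name, s.value
from   v$mystat s, v$statname n
where  s.statistic# = n.statistic#
and    n.name in ('consistent gets',
                  'table fetch by rowid',
                  'table fetch continued row')
order by n.name;
```

If "table fetch continued row" barely moves while "consistent gets" climbs, the additional reads are likely coming from visiting more distinct table blocks, rather than from following migrated row pointers.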

If we look at the CBO costings:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |   553 (100)|   1000 |00:00:00.01 |     639 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   163K|     553 (0)|   1000 |00:00:00.01 |     639 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |       |       7 (0)|   1000 |00:00:00.01 |       7 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

 

We can see the CBO cost has increased significantly to 553 (previously it was just 21).

With a much increased Clustering Factor, this will obviously impact the CBO costs of associated index scans.

In very extreme cases, these possible changes in the Clustering Factor can even impact the viability of using the index.
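A quick way to gauge how well clustered the data is, relative to an index, is to compare the index's Clustering Factor with the number of blocks and rows in the table. A minimal sketch, assuming current statistics exist on both the BOWIE2 table and its BOWIE2_ID_I index:

```sql
-- Compare the index's clustering factor with the table's block and row counts.
select i.index_name, i.blevel, i.leaf_blocks, i.clustering_factor,
       t.blocks, t.num_rows
from   user_indexes i, user_tables t
where  i.table_name = t.table_name
and    i.index_name = 'BOWIE2_ID_I';
```

A Clustering Factor close to BLOCKS suggests well clustered data, whereas a value approaching NUM_ROWS suggests poorly clustered data and hence more expensive index range scans.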

If we re-run the second query returning the 4200 rows:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

We can see that the CBO has now chosen to perform a Full Table Scan (FTS), rather than use the now less efficient index to return this number of rows.

If we look at the CBO costings of this FTS plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

The cost of the FTS plan is 1264.

If we compare this to a plan that uses the index:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14531 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        348 bytes received via SQL*Net from client
     462143 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2665 consistent gets
         2 consistent gets examination
         2 consistent gets examination (fastpath)
      2665 consistent gets from cache
      2663 consistent gets pin
      2663 consistent gets pin (fastpath)
         2 execute count
         1 index range scans
  21831680 logical read bytes from cache
      2663 no work - consistent read gets
        73 non-idle wait count
         2 opened cursors cumulative
         1 opened cursors current
         2 parse count (total)
         3 process last non-idle time
         2 session cursor cache count
      2665 session logical reads
         1 sorts (memory)
      2024 sorts (rows)
      4200 table fetch by rowid
         3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2314 (100)|   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2314 (1)|   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      22 (0)|   4200 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

The cost of using the index to retrieve the 4200 rows is 2314, more than the 1264 of the FTS.
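The two costs can also be compared side by side without actually executing either statement. This is just a sketch using EXPLAIN PLAN and the standard PLAN_TABLE; the STATEMENT_ID labels are made up for illustration, and the estimated costs may not exactly match those of the cursors shown above:

```sql
-- Cost the default plan and the index-hinted plan without executing either.
explain plan set statement_id = 'DEFAULT_PLAN' for
select * from bowie2 where id between 1 and 4200;

explain plan set statement_id = 'INDEX_PLAN' for
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

-- The row with id = 0 in each plan holds the overall statement cost.
select statement_id, cost
from   plan_table
where  id = 0
and    statement_id in ('DEFAULT_PLAN', 'INDEX_PLAN')
order by statement_id;
```

Whichever statement has the lower cost is the plan the CBO would favour when no hint is supplied.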

 

For the vast majority of indexes, updating the ROWIDs for migrated rows will result in better performance, as such indexes will be able to directly access the correct new physical location of migrated rows, rather than having to visit the original table block and then follow the stored pointer to the new table block.

But for some very specific indexes, where data clustering is crucial, AND we have a significant number of migrated rows, this might not necessarily be the case. The performance benefit might be minimal at best.

That’s more than enough for one post 🙂

In my next post, I’ll discuss how to potentially remedy these performance implications, both for tables with or without ENABLE ROW MOVEMENT enabled…

Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022

Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.
add a comment

In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.

In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.

To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:

SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id);

Index created.

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor
     FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I';

INDEX_NAME                        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
----------------------------- ---------- ----------- -----------------
RADIOHEAD_CODE_ID_I                    1         265             98619

If we now re-run the previous query:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_CODE_ID_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.

This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).

Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).

If we now look at the associated costings:

Effective Index Selectivity = CODE selectivity x ID selectivity

= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))

= 0.0001 x ((4000/9999) + 0.0002)

= 0.0001 x 0.40024

= 0.000040024

Effective Table Selectivity = same as Index Selectivity

= 0.000040024

 

The effective index selectivity of 0.000040024 is now much lower than the previous 0.40024, as the CBO can now consider the product of the selectivities of both columns.
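The arithmetic can be double-checked with a simple query against DUAL. This is only a sketch that hard-codes the figures used in the calculation above (it does not read them from the column statistics):

```sql
-- Effective index selectivity = CODE selectivity x ID selectivity
select (1/10000) * ((5000-1000)/(10000-1) + 2*(1/10000)) as effective_index_selectivity
from   dual;
```

This should return a value of approximately 0.000040024.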

If we now plug this improved effective index selectivity into the index path costing calculations:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)

= 1 + 1  + 4

= 2 + 4

= 6

Index Access Cost  = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)

= 2 + 4

= 6

We can see how the respective 2 and 6 improved CBO index costings are derived.
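The same calculation can be run directly against the index statistics. A minimal sketch, assuming the effective selectivity of 0.000040024 calculated above and ignoring any CPU component of the cost:

```sql
-- Index scan cost    = blevel + ceil(selectivity x leaf_blocks)
-- Index IO cost      = index scan cost + ceil(selectivity x clustering_factor)
select blevel + ceil(0.000040024 * leaf_blocks) as index_scan_cost,
       blevel + ceil(0.000040024 * leaf_blocks)
              + ceil(0.000040024 * clustering_factor) as index_io_cost
from   user_indexes
where  index_name = 'RADIOHEAD_CODE_ID_I';
```

With a BLEVEL of 1, 265 leaf blocks and a Clustering Factor in the region of 99,000, this should return 2 and 6 respectively.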

So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…

Automatic Indexing: JSON Expressions Part I (Making Plans For Nigel) April 13, 2022

Posted by Richard Foote in Automatic Indexing, Autonomous Database, CBO, Exadata, Function Based Indexes, Index statistics, JSON, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Virtual Columns.
1 comment so far

When Automatic Indexing was first released, one of the restrictions was that automatic indexes on JSON expressions were NOT supported.

However, the Oracle Database 21c doco mentions:

“Automatic indexes can be single or multi-column. They are considered for the following: Selected expressions (for example, JSON expressions)”.

So on my (admittedly dodgy) “Exadata” VM, I thought I’d check out how AI now indeed deals with JSON expressions.

I start by creating a simple little table that uses the new 21c JSON datatype and populate it with some JSON documents (note the PONumber key has effectively unique numeric values assigned):

SQL> CREATE TABLE bowie_json
       (id number,
        bowie_date date,
        bowie_order JSON);

SQL> insert into bowie_json
     select
     rownum,
     sysdate,
     '{"PONumber" : ' || rownum || ',
       "Reference" : "2022' || rownum || 'DBOWIE",
       "Requestor" : "David Bowie",
       "User" : "DBOWIE",
       "CostCenter" : "A42",
       "ShippingInstructions" : {"name" : "David Bowie",
                                 "Address": {"street" : "42 Ziggy Street",
                                             "city" : "Canberra",
                                              "state" : "ACT",
                                              "zipCode" : 2601,
                                              "country" : "Australia"},
                                 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
                                            {"type" : "Mobile", "number" : "417-555-1234"}]},
       "Special Instructions" : null,
       "AllowPartialShipment" : true,
       "LineItems" : [{"ItemNumber" : 1,
                       "Part" : {"Description" : "Hunky Dory",
                                 "UnitPrice" : 10.95},
                                  "Quantity" : 5.0},
                      {"ItemNumber" : 2,
                       "Part" : {"Description" : "Pin-Ups",
                                 "UnitPrice" : 10.95},
                                 "Quantity" : 3.0}]}'
from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_JSON');

PL/SQL procedure successfully completed.

As always, it’s important to ensure the table has statistics, as AI does not work properly without them.
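A quick sanity check that the statistics are indeed in place (and not stale) can look something like the following sketch, run as the table owner:

```sql
-- Confirm the table has statistics and whether they are flagged as stale.
select table_name, num_rows, blocks, last_analyzed, stale_stats
from   user_tab_statistics
where  table_name = 'BOWIE_JSON';
```

A NULL in LAST_ANALYZED would indicate that the table has no statistics at all.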

I then run a number of SQL statements, with different JSON expression based predicates, including:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

SQL> select * from bowie_json z where z.bowie_order.PONumber.number()=4242;

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber' returning number)=42;

Execution Plan
----------------------------------------------------------
Plan hash value: 1196930810

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            | 20000 |   12M |  34476 (1) | 00:00:02  |
|* 1 | TABLE ACCESS FULL | BOWIE_JSON | 20000 |   12M |  34476 (1) | 00:00:02  |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON
           , '$.PONumber' RETURNING NUMBER NULL ON ERROR)=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     259127 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

They all return just the one row, but must currently use a Full Table Scan with no indexes present.

So what does AI make of things?

The first thing to note is that running the AI last activity report generates the following error:

SQL> select dbms_auto_index.report_last_activity() report from dual;
ERROR:
ORA-30954: char 0 is invalid in json_value(BOWIE_ORDER, '$.PONumber' returning VA
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 177
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8676
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9226
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 167
ORA-06512: at line 1

no rows selected
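When the report itself errors like this, one possible alternative is to query the underlying Automatic Indexing dictionary views directly. The following is only a sketch, assuming access to the DBA views and assuming the column list below matches the DBA_AUTO_INDEX_IND_ACTIONS view in your release:

```sql
-- List the index-related actions Automatic Indexing has performed on the table.
select index_name, table_name, command, statement
from   dba_auto_index_ind_actions
where  table_name = 'BOWIE_JSON'
order by end_time;
```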

If we look at the indexes now present with the table:

SQL> select index_name, index_type, auto, visibility, status, num_rows, leaf_blocks, clustering_factor from user_indexes
where table_name='BOWIE_JSON';

INDEX_NAME                INDEX_TYPE                AUT VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------- ------------------------- --- --------- -------- ---------- ----------- -----------------
SYS_IL0000081096C00003$$  LOB                       NO  VISIBLE   VALID
SYS_AI_ayvj257jd93cv      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        5141            380000
SYS_AI_gpdkwzugdn055      FUNCTION-BASED NORMAL     YES VISIBLE   VALID       2000000        4596            200000

SQL> select index_name, column_expression from user_ind_expressions where table_name='BOWIE_JSON';

INDEX_NAME                COLUMN_EXPRESSION
------------------------- --------------------------------------------------------------------------------
SYS_AI_ayvj257jd93cv      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING VARCHAR2(4000) ERR
OR ON ERROR NULL ON EMPTY)

SYS_AI_gpdkwzugdn055      JSON_VALUE("BOWIE_ORDER" FORMAT OSON , '$.PONumber' RETURNING NUMBER ERROR ON ER
ROR NULL ON EMPTY)

We can see that AI has indeed created two new automatic indexes, one on the VARCHAR2 JSON expression and one on the NUMBER JSON expression.

If we re-run the SQLs, we notice 3 very important points. Note the following example was run soon after the automatic indexes were created:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |  8000 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

The first point to note is that the CBO now chooses to use the newly created automatic index. As only one row is returned, this is as one would hope.

But there are two other very important points/issues worth making about the above execution plan and associated costs and statistics. One is associated with new AI behaviour introduced in 21c and the other is associated with an old trap in relation to function-based indexes.

I’ll leave it to the discerning reader to spot these issues, before I cover them in Part II in the coming days…

Oracle 19c Automatic Indexing: Invisible/Valid Automatic Indexes (Bowie Rare) August 31, 2021

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Exadata, Index Access Path, Index statistics, Invisible Indexes, Invisible/Valid Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.
1 comment so far

In my previous post, I discussed how newly created Automatic Indexes can have one of three statuses, depending on the selectivity and effectiveness of the associated Automatic Index.

Indexes that improve performance sufficiently are created as Visible/Valid indexes and can be subsequently considered by the CBO. Indexes that are woeful and have no chance of improving performance are created as Invisible/Unusable indexes. Indexes that are considered potentially suitable, but ultimately don’t sufficiently improve performance, are created as Invisible/Valid indexes.
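A simple way to see how many Automatic Indexes currently sit in each of these three states is to group them by visibility and status. A minimal sketch, run as the owner of the indexes:

```sql
-- Count automatic indexes by their visibility/status combination.
select visibility, status, count(*) as index_count
from   user_indexes
where  auto = 'YES'
group by visibility, status
order by visibility, status;
```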

Automatic Indexes are created as Visible/Valid indexes when shown to improve performance sufficiently (as governed by the _AUTO_INDEX_IMPROVEMENT_THRESHOLD parameter). But as I rarely came across Invisible/Valid Automatic Indexes (except for when Automatic Indexing is set to “Report Only” mode), I was curious to determine approximately at what point such indexes are created by the Automatic Indexing process.
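The current Automatic Indexing mode can be confirmed from the configuration view. This is just a sketch, assuming access to DBA_AUTO_INDEX_CONFIG (note that hidden underscore parameters are not listed in this view):

```sql
-- Show whether Automatic Indexing is set to IMPLEMENT, REPORT ONLY or OFF.
select parameter_name, parameter_value
from   dba_auto_index_config
where  parameter_name = 'AUTO_INDEX_MODE';
```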

To investigate things, I created a table with columns that contain data with various levels of selectivity, some of which should fall inside and outside the range of viability of any associated index, based on the cost of the associated Full Table Scan.

The following table has 32 columns of interest, each with a slight variation of distinct values giving small differences in overall column selectivity:

SQL> create table bowie_stuff1 (id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, code21 number, code22 number, code23 number, code24 number, code25 number, code26 number, code27 number, code28 number, code29 number, code30 number, code31 number, code32 number, name varchar2(42));

Table created.

SQL> insert into bowie_stuff1 
select rownum, 
       mod(rownum, 900)+1, 
       mod(rownum, 1000)+1, 
       mod(rownum, 1100)+1, 
       mod(rownum, 1200)+1, 
       mod(rownum, 1300)+1, 
       mod(rownum, 1400)+1, 
       mod(rownum, 1500)+1, 
       mod(rownum, 1600)+1, 
       mod(rownum, 1700)+1, 
       mod(rownum, 1800)+1, 
       mod(rownum, 1900)+1, 
       mod(rownum, 2000)+1, 
       mod(rownum, 2100)+1, 
       mod(rownum, 2200)+1, 
       mod(rownum, 2300)+1, 
       mod(rownum, 2400)+1, 
       mod(rownum, 2500)+1, 
       mod(rownum, 2600)+1, 
       mod(rownum, 2700)+1, 
       mod(rownum, 2800)+1, 
       mod(rownum, 2900)+1, 
       mod(rownum, 3000)+1, 
       mod(rownum, 3100)+1, 
       mod(rownum, 3200)+1, 
       mod(rownum, 3300)+1, 
       mod(rownum, 3400)+1, 
       mod(rownum, 3500)+1, 
       mod(rownum, 3600)+1, 
       mod(rownum, 3700)+1, 
       mod(rownum, 3800)+1, 
       mod(rownum, 3900)+1, 
       mod(rownum, 4000)+1,
       'THE RISE AND FALL OF ZIGGY STARDUST' 
from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

As always, it’s important that statistics be collected for Automatic Indexing to function properly:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STUFF1', estimate_percent=>null);

PL/SQL procedure successfully completed.

 

So on a 10M row table, I have 32 columns with the number of distinct values varying by only 100 values per column (or by a selectivity of just 0.001%):

SQL> select column_name, num_distinct, density, histogram from dba_tab_columns where table_name='BOWIE_STUFF1' order by num_distinct;

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM
------------ ------------ ---------- ---------------
NAME                    1  .00000005 FREQUENCY
CODE1                 900    .001111 HYBRID
CODE2                1000       .001 HYBRID
CODE3                1100    .000909 HYBRID
CODE4                1200    .000833 HYBRID
CODE5                1300    .000769 HYBRID
CODE6                1400    .000714 HYBRID
CODE7                1500    .000667 HYBRID
CODE8                1600    .000625 HYBRID
CODE9                1700    .000588 HYBRID
CODE10               1800    .000556 HYBRID
CODE11               1900    .000526 HYBRID
CODE12               2000      .0005 HYBRID
CODE13               2100    .000476 HYBRID
CODE14               2200    .000455 HYBRID
CODE15               2300    .000435 HYBRID
CODE16               2400    .000417 HYBRID
CODE17               2500      .0004 HYBRID
CODE18               2600    .000385 HYBRID
CODE19               2700     .00037 HYBRID
CODE20               2800    .000357 HYBRID
CODE21               2900    .000345 HYBRID
CODE22               3000    .000333 HYBRID
CODE23               3100    .000323 HYBRID
CODE24               3200    .000312 HYBRID
CODE25               3300    .000303 HYBRID
CODE26               3400    .000294 HYBRID
CODE27               3500    .000286 HYBRID
CODE28               3600    .000278 HYBRID
CODE29               3700     .00027 HYBRID
CODE30               3800    .000263 HYBRID
CODE31               3900    .000256 HYBRID
CODE32               4000     .00025 HYBRID
ID               10000000          0 HYBRID
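To put these small differences into perspective, the approximate number of rows returned by an equality predicate on each column can be estimated from the statistics. The following is a rough sketch that assumes an even distribution of values and ignores any effect the histograms might have on the CBO's actual estimates:

```sql
-- Approximate rows per distinct value for each CODE column.
select c.column_name, c.num_distinct,
       round(t.num_rows / c.num_distinct) as approx_rows_per_value
from   user_tab_col_statistics c, user_tables t
where  c.table_name = t.table_name
and    c.table_name = 'BOWIE_STUFF1'
and    c.column_name like 'CODE%'
order by c.num_distinct;
```

On this 10M row table, that ranges from roughly 11,111 rows per value for CODE1 down to 2,500 rows per value for CODE32.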

I’ll next run the below queries (based on a simple equality predicate on each column) several times each in batches of 8 queries, so as to not swamp the Automatic Indexing process with potential new index requests (the ramifications of which I’ll discuss in another future post):

SQL> select * from bowie_stuff1 where code1=42;
SQL> select * from bowie_stuff1 where code2=42;
SQL> select * from bowie_stuff1 where code3=42;
SQL> select * from bowie_stuff1 where code4=42;
SQL> select * from bowie_stuff1 where code5=42;
...
SQL> select * from bowie_stuff1 where code31=42;
SQL> select * from bowie_stuff1 where code32=42;
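Rather than typing out all 32 statements by hand, they can be generated with a simple row generator. This is just a convenience sketch; the output would still need to be spooled or copied and then run in the desired batches:

```sql
-- Generate one equality query per CODE column (code1 through code32).
select 'select * from bowie_stuff1 where code' || level || '=42;' as stmt
from   dual
connect by level <= 32;
```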

 

If we now look at the statuses of the Automatic Indexes subsequently created:

SQL> select i.index_name, c.column_name, i.auto, i.constraint_index, i.visibility, i.status, i.num_rows, i.leaf_blocks, i.clustering_factor
from user_indexes i, user_ind_columns c
where i.index_name=c.index_name and i.table_name='BOWIE_STUFF1' order by visibility, status;

INDEX_NAME             COLUMN_NAME  AUT CON VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- ------------ --- --- --------- -------- ---------- ----------- -----------------
SYS_AI_5rw9j3d8pc422   CODE5        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_48q3j752csn1p   CODE4        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_9sgharttf3yr7   CODE3        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_8n92acdfbuh65   CODE2        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_brgtfgngu3cj9   CODE1        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_1tu5u4012mkzu   CODE11       YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_34b6zwgtm86rr   CODE12       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_gd0ccvdwwb4mk   CODE13       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_7k7wh28n3nczy   CODE14       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_67k2zjp09w101   CODE15       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_5fa6k6fm0k6wg   CODE10       YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_4624ju6bxsv57   CODE9        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_bstrdkkxqtj4f   CODE8        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_39xqjjar239zq   CODE7        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_6h0adp60faytk   CODE6        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_5u0bqdgcx52vh   CODE16       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_0hzmhsraqkcgr   CODE22       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_4x716k4mdn040   CODE21       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_6wsuwr7p6drsu   CODE20       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_b424tdjx82rwy   CODE19       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_3a2y07fqkzv8x   CODE18       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_8dp0b3z0vxzyg   CODE17       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_d95hnqayd7t08   CODE23       YES NO  VISIBLE   VALID      10000000       15366          10000000
SYS_AI_fry4zrxqtpyzg   CODE24       YES NO  VISIBLE   VALID      10000000       15366          10000000
SYS_AI_920asb69q1r0m   CODE25       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_026pa8880hnm2   CODE31       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_96xhzrguz2qpy   CODE32       YES NO  VISIBLE   VALID      10000000       15368          10000000
SYS_AI_3dq93cc7uxruu   CODE29       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_5nbz41xny8fvc   CODE28       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_fz4q9bhydu2qt   CODE27       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_0kwczzg3k3pfw   CODE26       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_4qd5tsab7fnwx   CODE30       YES NO  VISIBLE   VALID      10000000       15367          10000000

We can see we indeed have the 3 statuses of Automatic Indexes captured:

Columns with a selectivity equal to or worse than that of CODE5, which has 1300 distinct values, are created as Invisible/Unusable indexes. Returning 10M/1300 rows or a cardinality of approx. 7,692 or more rows is just too expensive for such indexes on this table to be viable. This represents a selectivity of approx. 0.077%.

Note how the index statistics for these Invisible/Unusable indexes are not accurate. They all have an estimated LEAF_BLOCKS of 21702 and a CLUSTERING_FACTOR of 4272987. However, we can see from the other indexes which are physically created that these are not correct and are substantially off the mark with the actual LEAF_BLOCKS being around 15364 and the CLUSTERING_FACTOR actually much worse at around 10000000.

Again, it's worthy of a future post to discuss how the Automatic Indexing process has to make (potentially inaccurate) guesstimates for these statistics in its analysis of index viability when such indexes don’t yet physically exist.

Columns with a selectivity equal to or better than that of CODE23, which has 3100 distinct values, are created as Visible/Valid indexes. Returning 10M/3100 rows or a cardinality of approx. 3,226 or fewer rows is cheap enough for such indexes on this table to be viable. This represents a selectivity of approx. 0.032%.

So in this specific example, only those columns with between 1400 and 3000 distinct values meet the “borderline” criteria in which the Automatic Indexing process creates Invisible/Valid indexes. This represents a very narrow selectivity range of only approx. 0.045% in which such Invisible/Valid indexes are created. Or for this specific example, only those columns that return between approx. 3,333 and 7,143 rows from the 10M row table.
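The cardinality and selectivity figures quoted above can be re-derived with a quick query. This is only a minimal sketch, assuming the values in each column are evenly distributed across the 10M rows (which is what the 10M/NDV arithmetic above implies); the inline list of distinct-value counts is simply the boundary values mentioned in the text:

```sql
-- Estimated rows returned per value and selectivity (as a percentage)
-- for a 10M row table, assuming an even distribution of values.
select ndv,
       round(10000000 / ndv) as est_rows,
       round(100 / ndv, 3)   as selectivity_pct
from  (select 1300 as ndv from dual union all
       select 1400 from dual union all
       select 3000 from dual union all
       select 3100 from dual)
order by ndv;
```

The first and last rows correspond to the Invisible/Unusable and Visible/Valid boundaries respectively, with the two middle rows marking the edges of the “borderline” Invisible/Valid range.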

Now the actual numbers and total range of selectivities for which Invisible/Valid Automatic Indexes are created of course depends on all sorts of factors, such as the size/cost of FTS of the table and not least the clustering of the associated data (which I’ve blogged about ad nauseam).

The point I want to make is that the range of viability for such Invisible/Valid indexes is relatively narrow and the occurrences of such indexes relatively rare in your databases. As such, the vast majority of Automatic Indexes are likely to be either Visible/Valid or Invisible/Unusable indexes.

It’s important to recognise this when you encounter such Invisible/Valid Automatic Indexes (outside of “REPORT ONLY” implementations), as it’s an indication that such an index is a borderline case that is currently NOT considered by the CBO (because of it being Invisible).
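If you want to see how common each status actually is in your own schema, something like the following could be used. It's only a sketch, not part of the original demo: the first query assumes you have access to the DBA_AUTO_INDEX_CONFIG view to confirm whether Automatic Indexing is running in IMPLEMENT or REPORT ONLY mode, and the second simply groups the same USER_INDEXES columns used in the listing above:

```sql
-- Confirm the current Automatic Indexing mode
-- (typically IMPLEMENT, REPORT ONLY or OFF).
select parameter_name, parameter_value
from   dba_auto_index_config
where  parameter_name = 'AUTO_INDEX_MODE';

-- Count Automatic Indexes by visibility/status combination.
select visibility, status, count(*) as index_count
from   user_indexes
where  auto = 'YES'
group by visibility, status
order by visibility, status;
```

In IMPLEMENT mode, any rows reported as INVISIBLE/VALID by the second query are the borderline cases worth keeping an eye on.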

However, this Invisible/Valid Automatic Index status should really change to either of the other two more common statuses in the near future.

I’ll expand on this point in a future post…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part III (Do Anything You Say) October 8, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Performance Tuning, Stale Statistics.

In Part I of this series, we saw how Automatic Indexing will not create a viable Automatic Index if there are stale or missing statistics on the underlying segments. In Part II we saw how these SQL statements effectively become blacklisted and when segment statistics are subsequently collected, Automatic Indexing will still not create viable Automatic Indexes when the SQL statements are re-run.

So how do we get Automatic Indexing to now kick in and create necessary indexes on these problematic SQLs?

As I’ve discussed previously in relation to blacklisted SQLs, we need to run a NEW SQL statement that hasn’t been blacklisted and that will result in the necessary index being created. An easy way to do this is just to include a new comment within the previous SQL to give the SQL a new signature.

If we now run the following “new” SQL statement (identical to the problematic SQL but with a comment embedded):

SQL> select /* new */ * from bowie_stale where code=42;

        ID       CODE NAME
---------- ---------- ------------------------------------------
   1000041         42 David Bowie
   6000041         42 David Bowie
        41         42 David Bowie
   3000041         42 David Bowie
   7000041         42 David Bowie
   8000041         42 David Bowie
   4000041         42 David Bowie
   9000041         42 David Bowie
   5000041         42 David Bowie
   2000041         42 David Bowie
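To confirm that the commented statement really is treated as a different SQL from the original, the shared pool can be checked. This is just a sketch that assumes you have select privileges on V$SQL and that both cursors are still cached; the LIKE pattern is my own and is only intended to match the two statements used in this demo:

```sql
-- The original and the commented statement should appear as separate
-- cursors, each with its own SQL_ID.
select sql_id, executions, sql_text
from   v$sql
where  sql_text like 'select%from bowie_stale where code=42'
order by sql_id;
```

If two distinct SQL_ID values are listed, the statement with the embedded comment is being tracked independently of the original problematic SQL.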

 

If we now wait to see what the next Automatic Indexing task makes of things:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 07-JUL-2020 06:34:49
Activity end                : 07-JUL-2020 06:35:54
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created (visible / invisible)        : 1 (1 / 0)
Space used (visible / invisible)             : 142.61 MB (142.61 MB / 0 B)
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved (improvement factor) : 1 (19787.7x)
SQL plan baselines created                   : 0
Overall improvement factor                   : 19787.7x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Owner | Table       | Index                | Key  | Type   | Properties |
---------------------------------------------------------------------------
| BOWIE | BOWIE_STALE | SYS_AI_300kk2unp8tr0 | CODE | B-TREE | NONE       |
---------------------------------------------------------------------------
-------------------------------------------------------------------------------

 

We see that the index on the CODE column (SYS_AI_300kk2unp8tr0) has now been created.

Further down the report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------
Parsing Schema Name : BOWIE
SQL ID              : du6psd0xmzpg5
SQL Text            : select /* new */ * from bowie_stale where code=42
Improvement Factor  : 19787.7x

Execution Statistics:
-----------------------------
                  Original Plan                Auto Index Plan
                  ---------------------------- ----------------------------
Elapsed Time (s): 137261                       2620
CPU Time (s):     84621                        1769
Buffer Gets:      277028                       13
Optimizer Cost:   544                          13
Disk Reads:       275947                       2
Direct Writes:    0                            0
Rows Processed:   70                           10
Executions:       7                            1

 

A new index was indeed created because of this new SQL statement, with a performance improvement of 19787.7x.

Further down the report to the Plans Section:

 

PLANS SECTION
---------------------------------------------------------------------------------------------

- Original
-----------------------------
Plan Hash Value : 65903426

-----------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost | Time        |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT          |             |      |       |  544 |             |
| 1 | TABLE ACCESS STORAGE FULL | BOWIE_STALE |   10 |   230 |  544 | 00:00:01    |
-----------------------------------------------------------------------------------

Notes
-----
- dop = 1
- px_in_memory_imc = no
- px_in_memory = no

- With Auto Indexes
-----------------------------
Plan Hash Value : 2558864466

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows | Bytes | Cost | Time       |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   10 |   230 |   13 | 00:00:01   |
|   1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_STALE          |   10 |   230 |   13 | 00:00:01   |
| * 2 | INDEX RANGE SCAN                    | SYS_AI_300kk2unp8tr0 |   10 |       |    3 | 00:00:01   |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 2 - access("CODE"=42)

Notes
-----
- Dynamic sampling used for this statement ( level = 11 )

 

We can see that the new plan uses the new Automatic Index, with a much lower CBO cost.

If we now look at the status of this index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0           YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       16891          10000000

 

We see that the index is now both VISIBLE and VALID (previously, it was INVISIBLE and UNUSABLE).

As such, the Automatic Index can now potentially be used by any SQL, including the previous problematic query.

So with a viable index now in place, if we re-run the initial problematic query:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2558864466

------------------------------------------------------------------------------------------------------------
| Id | Operation                          | Name                 | Rows | Bytes | Cost (%CPU)| Time        |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |                      |   10 |   230 |      14 (0)| 00:00:01    |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_STALE          |   10 |   230 |      14 (0)| 00:00:01    |
|* 2 | INDEX RANGE SCAN                   | SYS_AI_300kk2unp8tr0 |   10 |       |       3 (0)| 00:00:01    |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         14 consistent gets
          0 physical reads
          0 redo size
        738 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

We see that finally, the SQL uses the new Automatic Index and is indeed much more efficient as a result, with just 14 consistent gets required (when previously it was 39430 consistent gets).

So if ever you come across the scenario where an SQL does not have an Automatic Index created when clearly it should, it could be that it has been blacklisted and needs a different SQL to actually generate the necessary index.

To avoid some of these issues, make sure you do not have stale or missing statistics when reliant on Automatic Indexing. The new High Frequency Statistics Collection capability is designed specifically to avoid such a scenario.
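A simple check for stale or missing statistics on the tables you care about might look like the following. It's a minimal sketch using the STALE_STATS column of USER_TAB_STATISTICS, with the table name taken from this demo:

```sql
-- STALE_STATS is YES when Oracle considers the statistics stale,
-- NO when they are current, and NULL when no statistics exist at all.
select table_name, num_rows, last_analyzed, stale_stats
from   user_tab_statistics
where  table_name = 'BOWIE_STALE';
```

A NULL LAST_ANALYZED (missing statistics) or a STALE_STATS value of YES is a sign that statistics should be gathered before relying on Automatic Indexing for the table.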

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.

 

 

In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard to Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.

If we were now to collect the missing statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME             NUM_ROWS     BLOCKS LAST_ANAL
-------------------- ---------- ---------- ---------
BOWIE_STALE            10000000      39677 06-JUL-20

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                       10000000          0 HYBRID          06-JUL-20
CODE                       971092    .000001 HYBRID          06-JUL-20
NAME                            1 4.9416E-08 FREQUENCY       06-JUL-20

 

If we now repeatedly re-run the problematic query many times:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)|  Time       |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)|  00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)|  00:00:01   |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

 

The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.

If we wait for the next Automatic Indexing reporting period:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 06-JUL-2020 05:12:42
Activity end                : 06-JUL-2020 05:13:34
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 0
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

 

We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.

If we look at the index details on the table:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK         NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0   YES NO  INVISIBLE ADVANCED LOW  UNUSABLE   10000000       23058           4147514 

 

So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and cannot be used by the CBO.

NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios following stale statistics in which indexes were created, but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.

If we simply re-run the same queries that were run when the dependent object statistics were stale, the SQL is just ignored by the Automatic Indexing process.

As such, if we now subsequently re-run the problematic query again:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

Again, the CBO has no choice here, with no viable VALID/VISIBLE index present, but to perform a FTS, even though it’s getting the cardinality estimates spot on since statistics gathering.

 

In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part I (Dead Against It) October 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Exadata X8, Full Table Scans, High Frequency Statistics Collection, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Performance Tuning, Stale Statistics, Unusable Indexes.

A “golden rule” when working with Automatic Indexing is that things don’t work properly if there are stale statistics on the dependent objects. Stale statistics can of course be problematic but they can be particularly troublesome when dealing with Automatic Indexing.

In the Oracle Autonomous Database environments, this issue is addressed somewhat by the new High Frequency Statistics Collection capability, which helps to automatically collect stale statistics on a regular basis. However, in on-prem Exadata environments where this can more easily be turned off or collected less frequently, it’s a potential issue worth consideration.
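For on-prem Exadata environments, High Frequency Statistics Collection is controlled through DBMS_STATS global preferences. The following is only a sketch of how it might be checked and enabled, not something run as part of this demo; it assumes a 19c Exadata database and sufficient privileges, and the 900 second interval is purely an illustrative value:

```sql
-- Check whether High Frequency Statistics Collection is currently enabled.
select dbms_stats.get_prefs('AUTO_TASK_STATUS') as auto_task_status from dual;

-- Enable it, and (optionally) set how often the task runs, in seconds.
exec dbms_stats.set_global_prefs('AUTO_TASK_STATUS', 'ON');
exec dbms_stats.set_global_prefs('AUTO_TASK_INTERVAL', '900');
```

With the task enabled, stale statistics should be refreshed between the standard maintenance windows, reducing the chance of the Automatic Indexing task encountering stale or missing statistics.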

I’ll start with a simple little table, with a CODE column that has lots of distinct values:

SQL> create table bowie_stale (id number constraint bowie_stale_pk primary key, code number, name varchar2(42));

Table created.

SQL> insert into bowie_stale select rownum, mod(rownum, 1000000)+1, 'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

Importantly, I don’t collect statistics on this newly populated table…

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME        NUM_ROWS     BLOCKS LAST_ANAL
--------------- ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

If we now run the following query a number of times while there are no statistics on the table:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice but to use a FTS as I don’t yet have an index on the CODE column.

If I now wait for the next Automatic Indexing task to kick in AND if there are still NO statistics on the table:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 05-JUL-2020 06:36:31
Activity end                 : 05-JUL-2020 06:37:07
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates            : 1
Indexes created             : 0
Space used                  : 0 B
Indexes dropped             : 0
SQL statements verified     : 0
SQL statements improved     : 0
SQL plan baselines created  : 0
Overall improvement factor  : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

 

You can see that there was the one index candidate BUT no Automatic Index appears to have been created.

Assuming there are still no statistics:

 

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables where table_name='BOWIE_STALE';

TABLE_NAME                       NUM_ROWS    BLOCKS  LAST_ANAL
------------------------------ ---------- ---------- ---------
BOWIE_STALE

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                                           NONE
CODE                                         NONE
NAME                                         NONE

 

If we look now at what indexes exist on the table:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME                     AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK                 NO  YES VISIBLE   DISABLED      VALID
SYS_AI_300kk2unp8tr0           YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='BOWIE_STALE' order by index_name, column_position;

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
BOWIE_STALE_PK                 ID                                 1
SYS_AI_300kk2unp8tr0           CODE                               1

 

We notice there is now an Automatic Index BUT it remains in an UNUSABLE/INVISIBLE state. This means the index can’t be used by the CBO.

So if we now re-run the SQL query again:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |  437 | 21413 |    553 (16)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |  437 | 21413 |    553 (16)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          6  recursive calls
          0  db block gets
      39026  consistent gets
          0  physical reads
          0  redo size
        867  bytes sent via SQL*Net to client
        588  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

 

The CBO has no choice still but to use the FTS.

In Part II, we’ll see that once we get into this scenario, it can be a tad problematic to get ourselves out of it and get the Automatic Index created as we would like…

Oracle 19c Automatic Indexing: Data Skew Fixed By Baselines Part II (Sound And Vision) September 28, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Baselines, CBO, Data Skew, Exadata, Explain Plan For Index, Full Table Scans, Histograms, Index Access Path, Index statistics, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning.
1 comment so far

 

In my previous post, I discussed how the Automatic Indexing task, by using Dynamic Sampling Level=11, can determine the correct query cardinality estimates. It then assumes the CBO will likewise determine the correct cardinality estimate and NOT use an index if it would cause performance to regress.

However, if other database sessions DON’T use Dynamic Sampling at the same Level=11 and hence do NOT determine correct cardinality estimates, newly created Automatic Indexes might get used by the CBO inappropriately and result in inefficient execution plans.

Likewise, with incorrect CBO cardinality estimates, it might also be possible for newly created Automatic Indexes to NOT be used when they should be (as I’ve discussed previously).

These are potential issues if the Dynamic Sampling value differs between the Automatic Indexing task and other database sessions.
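To see which Dynamic Sampling level your own session is actually running with, something like the following can help. It's a sketch that assumes your user has been granted access to V$PARAMETER (in SQL*Plus or SQLcl, "show parameter optimizer_dynamic_sampling" is an alternative):

```sql
-- Sketch only: shows the current session's values for the parameters
-- discussed in these posts. A value of 2 for optimizer_dynamic_sampling
-- is the usual default for normal database sessions.
select name, value, isdefault
from   v$parameter
where  name in ('optimizer_dynamic_sampling',
                'optimizer_adaptive_statistics');
```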

One potential way to make things more consistent, and to see how Automatic Indexing behaves if it detects an execution plan where the CBO would use an Automatic Index that causes performance regression, is to disable Dynamic Sampling within the Automatic Indexing task.

This can be easily achieved by using the following hint which effectively disables Dynamic Sampling with the previous problematic query:

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1005K|   135M| 11411   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |  1005K|   135M| 11411   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41169  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

The query currently has good cardinality estimates (1005K vs 1000011 rows returned) only because we currently have histograms in place for the CODE column. As such, the query correctly uses a FTS.

However, if we now remove the histogram on the CODE column:

SQL> exec dbms_stats.gather_table_stats(null, 'SPACE_ODDITY', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

 

There is no way for the CBO to now determine the correct cardinality estimate because of the skewed data and missing histograms.
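A quick way to confirm the histogram really has gone is to check the column statistics. This is just a sketch, assuming you're connected as the table owner:

```sql
-- Sketch only: after gathering with SIZE 1, the CODE column would be
-- expected to report HISTOGRAM = 'NONE' with a single bucket.
select column_name, num_distinct, num_buckets, histogram
from   user_tab_col_statistics
where  table_name  = 'SPACE_ODDITY'
and    column_name = 'CODE';
```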

So what does the Automatic Indexing task make of things now? If we look at the next activity report:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-AUG-2020 16:42:33
Activity end                 : 18-AUG-2020 16:43:06
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                             : 0
Indexes created                              : 0
Space used                                   : 0 B
Indexes dropped                              : 0
SQL statements verified                      : 1
SQL statements improved                      : 0
SQL plan baselines created (SQL statements)  : 1 (1)
Overall improvement factor                   : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

We can see that it has verified this one new statement and has created 1 new SQL Plan Baseline as a result.

If we look at the Verification Details part of this report:

 

VERIFICATION DETAILS
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
The following SQL plan baselines were created:
-------------------------------------------------------------------------------
Parsing Schema Name     : BOWIE
SQL ID                  : 3yz8unzhhvnuz
SQL Text                : select /*+ dynamic_sampling(0) */ * from
space_oddity where code in (190000, 170000, 150000,
130000, 110000, 90000, 70000, 50000, 30000, 10000)
SQL Signature           : 3910785437403172730
SQL Handle              : SQL_3645e6a2952fcf7a
SQL Plan Baselines (1)  : SQL_PLAN_3cjg6naakzmvu198c05b9

We can see Automatic Indexing has created a new SQL Plan Baseline for our query with Dynamic Sampling set to 0 thanks to the hint.

Basically, the Automatic Indexing task has found a new query and determined the CBO would be inclined to use the index, because it now incorrectly assumes few rows are to be returned. It makes the poor cardinality estimate because there are currently no histograms in place AND because it can’t now use Dynamic Sampling to get a more accurate picture of things on the fly because it has been disabled with the dynamic_sampling(0) hint.

Using an Automatic Index over the current FTS plan would make the performance of the SQL regress.

Therefore, to protect the current FTS plan, Automatic Indexing has created a SQL Plan Baseline that effectively forces the CBO to use the current, more efficient FTS plan.

This can be confirmed by looking at the DBA_AUTO_INDEX_VERIFICATIONS view:

 

SQL> select execution_name, original_buffer_gets, auto_index_buffer_gets, status
from dba_auto_index_verifications where sql_id = '3yz8unzhhvnuz';

EXECUTION_NAME             ORIGINAL_BUFFER_GETS AUTO_INDEX_BUFFER_GETS STATUS
-------------------------- -------------------- ---------------------- ---------
SYS_AI_2020-08-18/16:42:33                41169                 410291 REGRESSED
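To put a number on the regression, the same view can be used to compare the two buffer gets figures directly. A sketch, using only the columns shown above (the REGRESSION_RATIO alias is my own name for illustration):

```sql
-- Sketch only: ratio of buffer gets with the Automatic Index vs. the
-- original plan. With the figures above (410291 vs. 41169), the index
-- plan does roughly ten times the work of the FTS.
select execution_name,
       original_buffer_gets,
       auto_index_buffer_gets,
       round(auto_index_buffer_gets / original_buffer_gets, 1) as regression_ratio,
       status
from   dba_auto_index_verifications
where  sql_id = '3yz8unzhhvnuz';
```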

 

If we now re-run the SQL again (noting we still don’t have histograms on the CODE column):

SQL> select /*+ dynamic_sampling(0) */ * from space_oddity where code in (190000, 170000, 150000, 130000, 110000, 90000, 70000, 50000, 30000, 10000);

1000011 rows selected.

Execution Plan
----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |    32 |  4512 | 11425   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| SPACE_ODDITY |    32 |  4512 | 11425   (2)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=10000 OR "CODE"=30000 OR "CODE"=50000 OR
           "CODE"=70000 OR "CODE"=90000 OR "CODE"=110000 OR "CODE"=130000 OR
           "CODE"=150000 OR "CODE"=170000 OR "CODE"=190000)

Hint Report (identified by operation id / Query Block Name / Object Alias):

Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
1 -  SEL$1
U -  dynamic_sampling(0) / rejected by IGNORE_OPTIM_EMBEDDED_HINTS

Note
-----

- SQL plan baseline "SQL_PLAN_3cjg6naakzmvu198c05b9" used for this statement

Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
      41170  consistent gets
          0  physical reads
          0  redo size
   13535504  bytes sent via SQL*Net to client
       2705  bytes received via SQL*Net from client
        202  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000011  rows processed

 

We can see the CBO is forced to use the SQL Plan Baseline “SQL_PLAN_3cjg6naakzmvu198c05b9” as created by the Automatic Indexing task to ensure the more efficient FTS is used and not the available Automatic Index.
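If you want to look at the baseline itself, the following is a minimal sketch using the SQL Handle and plan name from the report above. It assumes you have access to DBA_SQL_PLAN_BASELINES:

```sql
-- Sketch only: shows where the baseline came from and whether it is
-- enabled and accepted (both are needed for the CBO to use it).
select sql_handle, plan_name, origin, enabled, accepted, fixed
from   dba_sql_plan_baselines
where  sql_handle = 'SQL_3645e6a2952fcf7a';

-- Displays the execution plan stored in the baseline.
select *
from   table(dbms_xplan.display_sql_plan_baseline(
               sql_handle => 'SQL_3645e6a2952fcf7a',
               plan_name  => 'SQL_PLAN_3cjg6naakzmvu198c05b9',
               format     => 'BASIC'));
```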

So Automatic Indexing CAN create SQL Plan Baselines to protect SQL from performance regressions caused by inappropriate use of Automatic Indexes BUT it’s really difficult for it to do this effectively if the Automatic Indexing task and other database sessions have differing Dynamic Sampling settings, as they do by default…

Oracle 19c Automatic Indexing: CBO Incorrectly Using Auto Indexes Part II (Sleepwalk) September 21, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Data Skew, Dynamic Sampling, Exadata, Explain Plan For Index, Extended Statistics, Hints, Histograms, Index Access Path, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

As I discussed in Part I of this series, problems and inconsistencies can appear between what the Automatic Indexing process thinks will happen with newly created Automatic Indexes and what actually happens in other database sessions. This is because the Automatic Indexing process session uses a much higher degree of Dynamic Sampling (Level=11) than other database sessions use by default (Level=2).

As we saw in Part I, an SQL statement may be deemed to NOT use an index in the Automatic Indexing deliberations, whereas the index is actually used in normal database sessions (and perhaps incorrectly so). One scenario where we might encounter this issue is where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such “skewness”.

One option to get around this is to hint any such queries with a Dynamic Sampling value that matches that of the Automatic Indexing process (or sufficient to determine more accurate cardinality estimates).

If we re-run the problematic query from Part I (where a new Automatic Index was inappropriately used by the CBO) with such a Dynamic Sampling hint:

SQL> select /*+ dynamic_sampling(11) */ * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  101K|  2388K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        609 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

We can see that the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the Automatic Index.

Although these parameters can’t be changed in the Oracle Autonomous Database Cloud services, on the Exadata platform if using Automatic Indexing you might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING parameter to 11 (and/or OPTIMIZER_ADAPTIVE_STATISTICS=true) in order to be consistent with the Automatic Indexing process. These settings can obviously add significant overhead during parsing and so need to be set with caution.
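Given the parsing overhead, it's probably wise to try these settings in a single session first before considering anything system-wide. A sketch of what that might look like (on a platform where you're permitted to change these parameters):

```sql
-- Sketch only: session-level change, so the impact is limited to this
-- session while you compare plans and parse times.
alter session set optimizer_dynamic_sampling = 11;
alter session set optimizer_adaptive_statistics = true;

-- Re-run the problematic query without any hint and compare the plan
-- with the hinted version above.
select * from iggy_pop where code1=42 and code2=42;
```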

In this scenario where there is an inherent relationship between columns which the CBO is not detecting, the creation of Extended Statistics can be beneficial.

We currently have the following columns and statistics on the IGGY_POP table:

SQL> select column_name, num_distinct, density, num_buckets, histogram
from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME          NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
-------------------- ------------ ---------- ----------- ---------------
ID                        9705425          0         254 HYBRID
CODE1                         100  .00000005         100 FREQUENCY
CODE2                         100  .00000005         100 FREQUENCY
NAME                            1 5.0210E-08           1 FREQUENCY

 

If we now collect Extended Statistics on the (CODE1, CODE2) column group:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'IGGY_POP', method_opt=> 'FOR COLUMNS (CODE1,CODE2) SIZE 254');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, num_buckets, histogram from user_tab_cols where table_name='IGGY_POP';

COLUMN_NAME                    NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ----------- ---------------
ID                                  9705425          0         254 HYBRID
CODE1                                   100  .00000005         100 FREQUENCY
CODE2                                   100  .00000005         100 FREQUENCY
NAME                                      1 5.0210E-08           1 FREQUENCY
SYS_STU#29QF8Y9BUDOW2HCDL47N44           99  .00000005         100 FREQUENCY

 

The CBO now has some idea on the cardinality if both columns are used within a predicate.
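The system generated column name isn't very informative on its own. To confirm which columns it actually covers, a sketch such as the following can be used, assuming you're connected as the table owner:

```sql
-- Sketch only: maps the generated extension name (SYS_STU...) back to
-- the column group it was created for.
select extension_name, extension, creator, droppable
from   user_stat_extensions
where  table_name = 'IGGY_POP';
```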

If we re-run the problematic query without the hint:

 

SQL> select * from iggy_pop where code1=42 and code2=42;

100000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time        |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          |  100K|  2343K|    575 (15)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP |  100K|  2343K|    575 (15)| 00:00:01    |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=42 AND "CODE2"=42)
    filter("CODE1"=42 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
    1092240 bytes sent via SQL*Net to client
        581 bytes received via SQL*Net from client
         21 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
     100000 rows processed

 

Again, the CBO correctly determines the cardinality estimate of 100K rows and so is NOT using the Automatic Index.

However, we can still get ourselves into problems. If I now re-run the query that returns no rows and was previously correctly using the Automatic Index:

SQL> select code1, code2, name from iggy_pop where code1=1 and code2=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3288467

--------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time       |
--------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |          | 50000 |  878K |   575 (15) | 00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| IGGY_POP | 50000 |  878K |   575 (15) | 00:00:01   |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE1"=1 AND "CODE2"=42)
    filter("CODE1"=1 AND "CODE2"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      40964 consistent gets
      40953 physical reads
          0 redo size
        368 bytes sent via SQL*Net to client
        377 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

We see that the CBO is now getting this execution plan wrong, incorrectly estimating that 50,000 rows are to be returned (and not the 1000 rows it estimated previously). With this increased estimate, the Automatic Index is deemed too expensive and the CBO now incorrectly uses a FTS.

This is because with a Frequency based histogram now in place, Oracle assumes that a value which doesn’t exist in the histogram, but resides within the known min-max range of values, returns 50% of the lowest recorded frequency within the histogram (100,000 x 0.5 = 50,000).

So we need to be very careful HOW we potentially collect any additional statistics and their potential impact on other SQL statements.
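The arithmetic can be checked against the histogram itself. The following is only a sketch: it assumes the extension name shown earlier (SYS_STU#29QF8Y9BUDOW2HCDL47N44), and that the statistics were gathered on all rows so that the endpoint counts are actual row counts rather than sample counts. The BUCKET_ROWS and EST_ROWS_FOR_MISSING_VALUE aliases are my own names for illustration:

```sql
-- Sketch only: in a Frequency histogram, ENDPOINT_NUMBER is cumulative,
-- so the rows for each value are the difference from the previous endpoint.
with buckets as (
  select endpoint_value,
         endpoint_number
           - lag(endpoint_number, 1, 0) over (order by endpoint_number) as bucket_rows
  from   user_tab_histograms
  where  table_name  = 'IGGY_POP'
  and    column_name = 'SYS_STU#29QF8Y9BUDOW2HCDL47N44'
)
-- Half of the lowest recorded frequency is the estimate described above
-- for a value missing from the histogram but within the min-max range.
select min(bucket_rows)     as lowest_frequency,
       min(bucket_rows) / 2 as est_rows_for_missing_value
from   buckets;
```

With a lowest frequency of 100,000, this gives the 50,000 rows seen in the execution plan.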

 

As I’ll discuss next, another alternative to get more consistent behavior with Automatic Indexing in these types of scenarios is to make the Automatic Indexing processing session appear more like other database sessions…

Oracle 19c Automatic Indexing: Data Skew Part I (A Saucerful of Secrets) September 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Data Skew, Full Table Scans, Histograms, Index Access Path, Index statistics, Low Cardinality, Oracle Blog, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

When it comes to Automatic Indexes, things can become particularly interesting when dealing with data skew (meaning that some column values are much less common than other column values). The next series of blog posts will look at a number of different scenarios in relation to how Automatic Indexing works with data that is skewed and not uniformly distributed.

I’ll start with a simple little example, that has an interesting little twist at the end.

The following table has a CODE column, which has 10 distinct values that are widely skewed, with some values much less common than others:

SQL> create table bowie_skew (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie_skew select rownum, 10, 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> update bowie_skew set code = 9 where mod(id,3) = 0;

333333 rows updated.

SQL> update bowie_skew set code = 1 where mod(id,2) = 0 and id between 1 and 20000;

10000 rows updated.

SQL> update bowie_skew set code = 2 where mod(id,2) = 0 and id between 30001 and 40000;

5000 rows updated.

SQL> update bowie_skew set code = 3 where mod(id,100) = 0 and id between 300001 and 400000;

1000 rows updated.

SQL> update bowie_skew set code = 4 where mod(id,100) = 0 and id between 400001 and 500000;

1000 rows updated.

SQL> update bowie_skew set code = 5 where mod(id,100) = 0 and id between 600001 and 700000;

1000 rows updated.

SQL> update bowie_skew set code = 6 where mod(id,1000) = 0 and id between 700001 and 800000;

100 rows updated.

SQL> update bowie_skew set code = 7 where mod(id,1000) = 0 and id between 800001 and 900000;

100 rows updated.

SQL> update bowie_skew set code = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;

100 rows updated.

SQL> commit;

Commit complete.

 

I’ll collect statistics on this table, but explicitly NOT collect histograms, so that the CBO will have no idea that the data is actually skewed. Note if I collected statistics with the default size, there would still be no histograms, as the column has yet to be used within an SQL predicate and so has no column usage recorded.
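If you want to check whether any column usage has been recorded for the table, DBMS_STATS can report on it. A minimal sketch, assuming SQL*Plus or SQLcl and that you're connected as the table owner:

```sql
-- Sketch only: SET LONG is a SQL*Plus/SQLcl setting so the CLOB returned
-- by the report is not truncated on display.
set long 100000

-- Lists the columns of BOWIE_SKEW that have been used in predicates
-- (equality, range, join, etc.). A column with no recorded usage is not
-- a histogram candidate under the default SIZE AUTO.
select dbms_stats.report_col_usage(user, 'BOWIE_SKEW') as col_usage
from   dual;
```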

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_SKEW', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We can clearly see that some CODE values (such as “6”) have relatively few rows, with only 100 occurrences:

SQL> select code, count(*) from bowie_skew group by code order by code;

      CODE   COUNT(*)
---------- ----------
         1      10000
         2       5000
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327235
        10     654465

 

As I explicitly collected statistics with SIZE 1, we currently have NO histograms in the table:

SQL> select column_name, num_buckets, histogram from user_tab_cols
where table_name='BOWIE_SKEW';

COLUMN_NAME     NUM_BUCKETS HISTOGRAM
--------------- ----------- ---------------
ID                        1 NONE
CODE                      1 NONE
NAME                      1 NONE

 

Let’s now run the following query with a predicate on CODE=6, returning just 100 rows:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |   100K|  1953K|   570   (7)| 00:00:01 |
|   1 |  PX COORDINATOR              |            |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000   |   100K|  1953K|   570   (7)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |            |   100K|  1953K|   570   (7)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_SKEW |   100K|  1953K|   570   (7)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - storage("CODE"=6)
    filter("CODE"=6)

Statistics
----------------------------------------------------------
         6  recursive calls
         0  db block gets
      3781  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         0  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

The CBO has no choice but to use a FTS as I currently have no indexes on the CODE column. Note also that the CBO has got its cardinality estimates way wrong, expecting 100,000 rows and not the actual 100 rows, as I have no histograms on the CODE column.
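The 100K estimate is simply what you get when the CBO has no histogram and assumes an even distribution: the number of rows divided by the number of distinct values (1,000,000 / 10 = 100,000). A sketch that reproduces the calculation from the dictionary statistics (the EST_ROWS_PER_VALUE alias is my own name for illustration):

```sql
-- Sketch only: without a histogram, the estimated rows for an equality
-- predicate is approximately NUM_ROWS / NUM_DISTINCT.
select t.num_rows,
       c.num_distinct,
       round(t.num_rows / c.num_distinct) as est_rows_per_value
from   user_tables t
join   user_tab_col_statistics c
on     c.table_name = t.table_name
where  t.table_name  = 'BOWIE_SKEW'
and    c.column_name = 'CODE';
```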

So let’s now wait 15 minutes or so and see what the Automatic Indexing process decides to do. Following are portions of the next Auto Indexing report:

INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------
| Owner | Table      | Index                | Key  | Type   | Properties |
--------------------------------------------------------------------------
| BOWIE | BOWIE_SKEW | SYS_AI_7psvzc164vbng | CODE | B-TREE | NONE       |
--------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : fn4shnphu4bvj
SQL Text             : select * from bowie_skew where code=6
Improvement Factor   : 41.1x

Execution Statistics:
-----------------------------

                   Original Plan                 Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  119596                        322
CPU Time (s):      100781                        322
Buffer Gets:       11347                         103
Optimizer Cost:    570                           4
Disk Reads:        0                             0
Direct Writes:     0                             0
Rows Processed:    100                           100
Executions:        1                             1

 

So we can see that yes, Auto Indexing has decided to create a new index here on the CODE column (“SYS_AI_7psvzc164vbng“) as it improves the performance of the query by a factor of 41.1x.

If we look further down the Auto Indexing report and compare the execution plans:

 

PLANS SECTION
---------------------------------------------------------------------------------------------
- Original
-----------------------------
Plan Hash Value  : 3374004665
-----------------------------------------------------------------------------------------
| Id | Operation                      | Name       | Rows   | Bytes   | Cost | Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |            |        |         |  570 |          |
|  1 |  PX COORDINATOR                |            |        |         |      |          |
|  2 |    PX SEND QC (RANDOM)         | :TQ10000   | 100000 | 2000000 |  570 | 00:00:01 |
|  3 |     PX BLOCK ITERATOR          |            | 100000 | 2000000 |  570 | 00:00:01 |
|  4 |      TABLE ACCESS STORAGE FULL | BOWIE_SKEW | 100000 | 2000000 |  570 | 00:00:01 |
-----------------------------------------------------------------------------------------

- With Auto Indexes
-----------------------------
Plan Hash Value  : 140816325
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100 |  2000 |    4 | 00:00:01 |
|   1 |   TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_SKEW           |  100 |  2000 |    4 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_7psvzc164vbng |  100 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------

* 2 - access("CODE"=6)

Notes
-----

- Dynamic sampling used for this statement ( level = 11 )

 

We can see that the new execution plan indeed uses the index BUT interestingly, it has a correct cardinality estimate of 100 and not 100,000 as per the original plan.

Now this can be explained in that the Automatic Indexing process uses a Dynamic Sampling level of 11, meaning it can calculate the correct cardinality on the fly. This can cause differences between what the Automatic Indexing process thinks the CBO costs will be vs. the CBO costs in a default database session that uses the (usually default) Dynamic Sampling level of 2 (as I’ve discussed previously).

BUT when I now rerun the SQL query:

SQL> select * from bowie_skew where code=6;

100 rows selected.

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |   100 |  2000 |     4   (0)|
|   1 |  PX COORDINATOR                       |                      |       |       |            |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |   100 |  2000 |     4   (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE_SKEW           |   100 |  2000 |     4   (0)|
|   4 |     BUFFER SORT                       |                      |       |       |            |
|   5 |      PX RECEIVE                       |                      |   100 |       |     3   (0)|
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |   100 |       |     3   (0)|
|   7 |        PX SELECTOR                    |                      |       |       |            |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_7psvzc164vbng |   100 |       |     3   (0)|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

8 - access("CODE"=6)

Statistics
----------------------------------------------------------
        12  recursive calls
         0  db block gets
       103  consistent gets
         0  physical reads
         0  redo size
      2796  bytes sent via SQL*Net to client
       654  bytes received via SQL*Net from client
         8  SQL*Net roundtrips to/from client
         2  sorts (memory)
         0  sorts (disk)
       100  rows processed

 

We notice the new Automatic Index is now used BUT also that the CBO has now determined the correct cardinality estimate of 100. But how is this possible when I haven’t recalculated the table statistics?

I’ll explain in my next post.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part III (Star) August 11, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Internals, Index statistics, Oracle, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
2 comments

In Part I we looked at a scenario where an index was deemed to be too inefficient for Automatic Indexing to create a VALID index, because of the poor clustering of data within the table.

In Part II we improved the data clustering but the previous SQLs could still not generate a new Automatic Index because they had effectively been blacklisted.

So how do we get Automatic Indexing to improve the performance of these queries?

Basically, we need to run new SQL statements that differ from those previously run, and so have not been blacklisted, so that the Automatic Indexing process can kick in and create the necessary indexes.

For example, if we now run the following SQL statements that have not previously been run:

select * from nickcave where code=1;

select * from nickcave where code=2;

select * from nickcave where code=3;

 

And if we now wait for the next Automatic Indexing task to run and look at the following (partial) Automatic Indexing report:

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 22-JUN-2020 04:26:31
Activity end                 : 22-JUN-2020 04:27:25
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                              : 0
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 167.77 MB (167.77 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 3
SQL statements improved (improvement factor)  : 3 (76x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 76x


INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table    | Index                | Key  | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | NICKCAVE | SYS_AI_dh8pumfww3f4r | CODE | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 5k1wmtu7um5q9
SQL Text             : select * from nickcave where code=1
Improvement Factor   : 76x

Execution Statistics:
-----------------------------

                   Original Plan                   Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  1725103                       106145
CPU Time (s):      1534305                       62314
Buffer Gets:       291835                        779
Optimizer Cost:    9125                          792
Disk Reads:        0                             197
Direct Writes:     0                             0
Rows Processed:    500000                        100000
Executions:        5                             1

 

We can see that an index has indeed now been created on the CODE column because one of the new statements is now deemed to be 76x more efficient thanks to the new index.

If we look at the details of this new Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  VISIBLE   DISABLED      VALID      10000000       19518             57983

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='NICKCAVE'
order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that the index is now indeed VALID and VISIBLE, with a much improved Clustering Factor of just 57983.
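
As a reminder of why the Clustering Factor is so sensitive to how the table data is ordered, here is a minimal Python sketch of the commonly described way the statistic is counted: walk the index entries in key order and add 1 each time the table block referenced differs from the block referenced by the previous entry. This is a simplified model for illustration only, not Oracle's actual code; the function name and the tiny block lists are made up for the example.

```python
def clustering_factor(table_blocks_in_index_order):
    """Count table block changes while walking index entries in key order.

    Simplified model: each element is the table block that an index
    entry's rowid points to, listed in index (key, rowid) order.
    """
    cf = 0
    previous_block = None
    for block in table_blocks_in_index_order:
        if block != previous_block:
            cf += 1
            previous_block = block
    return cf


# Well clustered: consecutive index entries point to the same table blocks.
well_clustered = [1, 1, 1, 2, 2, 2, 3, 3, 3]
# Poorly clustered: consecutive index entries jump between table blocks.
poorly_clustered = [1, 2, 3, 1, 2, 3, 1, 2, 3]

print(clustering_factor(well_clustered))    # 3
print(clustering_factor(poorly_clustered))  # 9
```

The same nine rows in the same three blocks give a value of 3 when well clustered (close to the number of table blocks) and 9 when poorly clustered (close to the number of rows), which is the difference the re-clustering in Part II was designed to achieve.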

If we now re-run one of the newer SQL statements:

SQL> select * from nickcave where code=1;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("CODE"=1)

Statistics
----------------------------------------------------------
          12  recursive calls
           0  db block gets
         779  consistent gets
           0  physical reads
         176  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

We notice the SQL statement is now indeed using this new Automatic Index.
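
To see roughly where the cost of 792 comes from, the sketch below applies the commonly cited approximation for the I/O cost of an index range scan: BLEVEL + ceil(selectivity x LEAF_BLOCKS) + ceil(selectivity x CLUSTERING_FACTOR). The LEAF_BLOCKS, CLUSTERING_FACTOR and NUM_ROWS values are those listed above, and the selectivity is 100000 of 10000000 rows. The BLEVEL of 2 is an assumption, as it isn't shown in the index statistics above, and the function names are purely illustrative.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating point rounding."""
    return -(-numerator // denominator)


def index_range_scan_io_cost(blevel, leaf_blocks, clustering_factor,
                             rows_selected, num_rows):
    """Return (index_cost, table_cost, total) for the rough I/O formula."""
    index_cost = blevel + ceil_div(rows_selected * leaf_blocks, num_rows)
    table_cost = ceil_div(rows_selected * clustering_factor, num_rows)
    return index_cost, table_cost, index_cost + table_cost


index_cost, table_cost, total = index_range_scan_io_cost(
    blevel=2,  # ASSUMPTION: BLEVEL is not shown in the post
    leaf_blocks=19518,
    clustering_factor=57983,
    rows_selected=100_000,
    num_rows=10_000_000,
)
print(index_cost, table_cost, total)  # 198 580 778
```

The result of 778 is in the same ballpark as the 792 in the plan, and the index component of 198 is close to the 205 shown for the INDEX RANGE SCAN step. An exact match isn't expected, as the plan figures also include a CPU component and the BLEVEL here is a guess. Note that the table access component (580) dominates the total, which is why the Clustering Factor matters so much to the viability of this index.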

If we now re-run our original SQL statement, which had been using a FTS execution plan and for which Automatic Indexing could not create a VALID index because the data clustering within the table was too poor when it was originally run:

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    8 - access("CODE"=42)

Statistics
----------------------------------------------------------
          14  recursive calls
           4  db block gets
         780  consistent gets
         198  physical reads
       15224  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

This query is now also finally using the newly created index, because the CBO now also deems an index-based execution plan to be more efficient.

The moral of the story: Automatic Indexing may initially deem a potential index not efficient enough to be created. However, things may change, such as the clustering of the table data (or the distribution of data values, etc.), which may make a new index viable. This though requires a NEW SQL statement to be executed, so that a non-blacklisted SQL can invoke the Automatic Indexing process to create the necessary Automatic Index.

Of course, things may change in the future. Future releases may have the facility to automatically re-cluster the data in tables optimally based on existing workloads, and may also have a mechanism to identify that things have sufficiently “changed” such that SQL statements that previously “failed” from an Automatic Indexing perspective warrant re-evaluation.

This has only been tested up to Oracle Database version 19.5 in Oracle Autonomous Database environments.