jump to navigation

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part III (“Dancing With The Big Boys”) March 9, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, ROWID.
add a comment

In my previous post, I discussed how you can best reorg a table that has a significant number of migrated rows impact the Clustering Factor of important indexes, when such tables have the ENABLED ROW MOVEMENT disabled.

In this post I’ll discuss resolving similar issues, but when ROWIDs are updated on the fly when rows are migrated in Oracle Autonomous Databases.

As I discussed previously, by updating indexes with the new ROWIDs when rows migrate, such indexes can potentially increase in size as they store both old/new index entries concurrently AND due to the increased likelihood of associated index block splits. Additionally, such indexes can also have their Clustering Factor directly impacted when migrated rows disrupt the otherwise tight clustering of specific columns.

As such, we may want to address these issues to improve the performance of impacted queries.  But it’s important we address these issues appropriately…

To illustrate all this, I’m going to re-run the same demo as my previous post, but on a table with ENABLE ROW MOVEMENT enabled.

I’ll start by creating and populating a tightly packed table with ENABLE ROW MOVEMENT enabled and with data inserted in ID column order:

SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT;

Table BOWIE2 created.

SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

I’ll now create an index on this well ordered/clustered ID column:

SQL> create index bowie2_id_i on bowie2(id);

Index BOWIE2_ID_I created.

Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:

SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

 

If we check the number of migrated rows:

SQL> analyze table bowie2 compute statistics;

Table BOWIE2 analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE2             200000      4654              82          367            169            0

We notice there are indeed 0 migrated rows. This is because in Oracle Autonomous Databases, the associated ROWIDs of migrated rows as updated on the fly in this scenario.

If we check the current Clustering Factor of the index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE2');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      4654

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            2            945               109061

We can see that although the data was initially inserted in ID column order, we now have a relatively poor Clustering Factor at 109061 as the migrated rows have disrupted this previously perfect clustering.

We also notice that the BLEVEL has increased from 1 to now be 2 and the number of Leaf Blocks has increased to 945 from 473 after the rows migrated (as I discussed previously).

If we now run a query that returns 4200 rows from a 200,000 row table:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Note
-----
    - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          4 CPU used by this session
          4 CPU used when call started
          4 DB time
      37101 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        325 bytes received via SQL*Net from client
     461965 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
         14 calls to kcmgcs
       4572 consistent gets
       4572 consistent gets from cache
       4572 consistent gets pin
       4572 consistent gets pin (fastpath)
          2 execute count
   37453824 logical read bytes from cache
       4560 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       4572 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4560 table scan blocks gotten
     252948 table scan disk non-IMC rows gotten
     252948 table scan rows gotten
          1 table scans (short tables)
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

We can see that Oracle has decided to perform a Full Table Scan (FTS) and not use the index.

The Clustering Factor of the ID column is now so bad, that returning 4200 rows via such an index is just too expensive. The FTS is now deemed the cheaper option by the CBO.

We notice that the CBO cost of the FTS is 1264.

If we run a query that forces the use of the index:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14531 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        348 bytes received via SQL*Net from client
     462143 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2665 consistent gets
          2 consistent gets examination
          2 consistent gets examination (fastpath)
       2665 consistent gets from cache
       2663 consistent gets pin
       2663 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   21831680 logical read bytes from cache
       2663 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          3 process last non-idle time
          2 session cursor cache count
       2665 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0

-------------------------------------

select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2314 (100)|   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2314 (1)|   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      22 (0)|   4200 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

The cost of the Index Range Scan plan has an overall cost of 2314, greater than the 1264 cost of the FTS plan.

Notice that the cost of using just the index within the plan is currently 22.

So the vast majority of the cost of this plan (2314 – 22 = 2292) is in Oracle having to access so many different table blocks due to the poor index Clustering Factor and NOT in the increased size of the index.

As I’ve discussed numerous times, you can potentially make an index smaller by rebuilding the index (if there’s free space within the index), but the impact on the Clustering Factor will be nothing but “disappointing”…

If we just rebuild the index:

SQL> alter index bowie2_id_i rebuild online;

Index BOWIE2_ID_I altered.

And now look at the new index related statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            1            473               109061

We notice that the index has indeed decreased in size, back to what is was before the row migrated following the Update (Blevel=1 and Leaf Blocks=473).

But the Clustering Factor remains unchanged at 109061.

If we now re-run the query:

 

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          3 CPU used by this session
          3 CPU used when call started
          3 DB time
      31738 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        325 bytes received via SQL*Net from client
     461972 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
         14 calls to kcmgcs
       4572 consistent gets
       4572 consistent gets from cache
       4572 consistent gets pin
       4572 consistent gets pin (fastpath)
          2 execute count
   37453824 logical read bytes from cache
       4560 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          3 process last non-idle time
          2 session cursor cache count
       4572 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4560 table scan blocks gotten
     252948 table scan disk non-IMC rows gotten
     252948 table scan rows gotten
          1 table scans (short tables)
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

The CBO decides to still use a FTS instead of the index.

If we look at the cost now of using the index for this query:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2655 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2655 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          1 DB time
      13484 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        347 bytes received via SQL*Net from client
     461972 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2655 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2655 consistent gets from cache
       2654 consistent gets pin
       2654 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   21749760 logical read bytes from cache
       2654 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2655 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0

-------------------------------------

select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2303 (100)|   4200 |00:00:00.01 |    2655 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2303 (1)|   4200 |00:00:00.01 |    2655 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We notice the cost of the index has only moderately gone down to 2303 (previously it was 2314).

This reduction of 11 in the CBO cost is due entirely to the fact the index is now approximately 1/2 the size as it was before the index rebuild and has thus reduced the cost of reading the index blocks to 11 within the execution plan (previously it was 22).

But the vast majority of the cost within the Index Range Scan plan comes again with accessing the table blocks, which remains unchanged due to the unchanged Clustering Factor.

To reduce the Clustering Factor, we need to change the clustering of the data with the TABLE.

So, to improve the performance of this potentially important query, we need to re-cluster the data just as we did in the example in my previous post when we had migrated rows listed and ROWIDs were not updated on the fly.

We can now add an appropriate Clustering Attribute before we perform the table reorg:

SQL> alter table bowie2 add clustering by linear order (id);

Table BOWIE2 altered.

SQL> alter table bowie2 move online;

Table BOWIE2 altered.

If we now look at the Clustering Factor of this important index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            1            473                 4850

The Clustering Factor has been reduced down to the almost perfect 4850, down from the previous 109061.

If we now re-run the query:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
         90 Cached Commit SCN referenced
      11345 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         93 buffer is not pinned count
       8308 buffer is pinned count
        325 bytes received via SQL*Net from client
     462117 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        102 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        102 consistent gets from cache
        101 consistent gets pin
        101 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     835584 logical read bytes from cache
        101 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          2 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
        102 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

 

We can see the query now automatically uses the index and only requires just 102 consistent gets, down from 4572 when it performed the FTS.

If we look at the cost of this new plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |   113 (100)|   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|     113 (0)|   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the plan has a cost of just 113, which is both much more accurate and close to the 102 consistent gets and much less than the previous cost of 1340 for the FTS plan.

So in specific examples where migrated rows significantly impact the Clustering Factor of indexes important to our applications, including when ROWIDs are updated on the fly in Oracle Autonomous Databases, we may need to appropriately reorg such tables to repair the Clustering Factor of impacted indexes.

I’ve mentioned a number of times in this series how tables in Oracle Autonomous Databases with ENABLE ROW MOVEMENT have their ROWIDs updated on the fly when a row migrates. In my next post, I’ll discuss how even tables that don’t have the ENABLE ROW MOVEMENT clause set can still have their ROWIDs updated on the fly when a row migrates…

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part II (“Dancing Out In Space”) March 7, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Changing ROWID, Clustering Factor, Data Clustering, David Bowie, Full Table Scans, Index Access Path, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Richard's Musings, ROWID.
1 comment so far

In my previous post, I discussed how the clustering of data can be impacted if rows migrate and how this in turn can have a detrimental impact on the efficiency of associated indexes.

In this post, I’ll discuss what you can do (and not do) to remedy things in the relatively unlikely event that you hit this issue with migrated rows.

I’ll just discuss initially the example where the table is defined without ENABLE ROW MOVEMENT enabled in the Transaction Processing Autonomous Database (and so does NOT update ROWIDs on the fly when a row migrates).

I’ll start by again creating and populating a tightly packed table, with the data inserted in ID column order:

SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0;

Table BOWIE created.

SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

I’ll now create an index on this well ordered/clustered ID column:

SQL> create index bowie_id_i on bowie(id);

Index BOWIE_ID_I created.

Next, I’ll update the table, increasing the size of the rows such that I generate a bunch of migrated rows:

SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

 

If we check the number of migrated rows:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables

where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4906              86          414            170        56186

 

We notice there are indeed 56186 migrated rows.

If we check the current Clustering Factor of the index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4906

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

 

We notice the index still has an excellent Clustering Factor of just 3250. As the ROWIDs are NOT updated in this example when rows migrate, the index retains the same Clustering Factor as before the Update statement.

If we run the following query that returns 4200 rows (as per my previous post):

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          3 DB time
      24901 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2762 buffer is not pinned count
       7005 buffer is pinned count
        324 bytes received via SQL*Net from client
     461909 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2771 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2771 consistent gets from cache
       2770 consistent gets pin
       2770 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   22700032 logical read bytes from cache
       2770 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2771 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
       1366 table fetch continued row
          3 user calls

We can see the query currently uses 2771 consistent gets, which is significantly higher than it could be, as Oracle has to visit the original table block and then follow the pointer to the new location for any migrated row that needs to be retrieved.

However, if we look at the cost of the current plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|      80 (0)|   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see it only has a cost of 80, as Oracle does not consider the additional accesses required now for these migrated rows. With such a perfect Clustering Factor, this cost is not particularly accurate and does not represent the true cost of the 2771 consistent gets now required.

Now there are various ways we can look at fixing this issue with all these migrated rows requiring additional consistent gets to access.

One method is to capture all the ROWIDs of the migrated rows, copy these rows to a temporary holding table, delete these rows and then re-insert them all back into the table from the temporary table.

We can identify the migrated rows by creating the CHAIN_ROWS table as per the Oracle supplied UTLCHAIN.SQL script and then use the ANALYZE command to store their ROWIDs in this CHAIN_ROWS table:

SQL> create table CHAINED_ROWS (
2 owner_name varchar2(128),
3 table_name varchar2(128),
4 cluster_name varchar2(128),
5 partition_name varchar2(128),
6 subpartition_name varchar2(128),
7 head_rowid rowid,
8 analyze_timestamp date
9* );

Table CHAINED_ROWS created.

SQL> analyze table bowie list chained rows;

Table BOWIE analyzed.

SQL> select table_name, head_rowid from chained_rows where table_name='BOWIE' and rownum<=10;

   TABLE_NAME            HEAD_ROWID
_____________ _____________________
BOWIE         AAAqFjAAAAAE6CzAAP
BOWIE         AAAqFjAAAAAE6CzAAR
BOWIE         AAAqFjAAAAAE6CzAAU
BOWIE         AAAqFjAAAAAE6CzAAW
BOWIE         AAAqFjAAAAAE6CzAAZ
BOWIE         AAAqFjAAAAAE6CzAAb
BOWIE         AAAqFjAAAAAE6CzAAe
BOWIE         AAAqFjAAAAAE6CzAAg
BOWIE         AAAqFjAAAAAE6CzAAj
BOWIE         AAAqFjAAAAAE6CzAAl

 

Another method we can now utilise is to simply MOVE ONLINE the table:

SQL> alter table bowie move online;

Table BOWIE altered.

 

If we now look at the number of migrated rows after the table reorg:

SQL> analyze table bowie compute statistics;

Table BOWIE analyzed.

SQL> select table_name, num_rows, blocks, empty_blocks, avg_space, avg_row_len, chain_cnt from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS    EMPTY_BLOCKS    AVG_SPACE    AVG_ROW_LEN    CHAIN_CNT
_____________ ___________ _________ _______________ ____________ ______________ ____________
BOWIE              200000      4936              56          838            169            0

 

We can see we no longer have any migrated rows.

BUT, if we now look at the Clustering Factor of this index:

SQL> execute dbms_stats.delete_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473               114560

 

We can see it has now significantly increased to 114560 (previously it was just 3250).

The problem of course is that if the ROWIDs now represent the correct new physical location of the migrated rows, the previously perfect clustering/ordering of the ID column has been impacted.

If we now re-run the query returning the 4200 rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1845943507

---------------------------------------------------------------------------------------------
| Id | Operation                  | Name  | Starts | E-Rows | A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |       |      1 |        |   4200 |00:00:00.02 |    4857 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE |      1 |   4200 |   4200 |00:00:00.02 |    4857 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

Statistics
-----------------------------------------------------------
          3 CPU used by this session
          3 CPU used when call started
       4849 Cached Commit SCN referenced
          2 DB time
      25870 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
          2 buffer is not pinned count
        324 bytes received via SQL*Net from client
     461962 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          9 calls to kcmgcs
       4857 consistent gets
       4857 consistent gets from cache
       4857 consistent gets pin
       4857 consistent gets pin (fastpath)
          2 execute count
   39788544 logical read bytes from cache
       4850 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          2 process last non-idle time
          1 session cursor cache count
       4857 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4850 table scan blocks gotten
     200000 table scan disk non-IMC rows gotten
     200000 table scan rows gotten
          1 table scans (short tables)
          3 user calls

 

Oracle is now performing a Full Table Scan (FTS). The number of consistent gets now at 4857 is actually worse than when we had the migrated rows (previously at 2771)

The Clustering Factor of the ID column is now so bad, that returning 4200 rows via such an index is just too expensive. The FTS is now deemed the cheaper option by the CBO.

If we look at the CBO cost of using this FTS plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1845943507

------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name  | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |       |      1 |        |       |  1340 (100)|   4200 |00:00:00.02 |    4857 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE |      1 |   4200 |   684K|    1340 (1)|   4200 |00:00:00.02 |    4857 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

We can see the cost of this plan is 1340.

If we compare this with the cost of using the (now deemed) inefficient index:

SQL> select /*+ index (bowie) */ * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID 9215hkzd3v1up, child number 0
-------------------------------------
select /*+ index (bowie) */ * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2784 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2784 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
       2741 Cached Commit SCN referenced
          2 DB time
      12633 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2775 buffer is not pinned count
       5626 buffer is pinned count
        345 bytes received via SQL*Net from client
     462170 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2784 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
       2784 consistent gets from cache
       2783 consistent gets pin
       2783 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
   22806528 logical read bytes from cache
       2783 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          4 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
       2784 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'9215hkzd3v1up',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID 9215hkzd3v1up, child number 0

-------------------------------------

select /*+ index (bowie) */ * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |  2418 (100)|   4200 |00:00:00.01 |    2784 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|    2418 (1)|   4200 |00:00:00.01 |    2784 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the CBO cost of the index is now 2418, more than the 1340 cost of using the FTS.

So in the scenario where by migrating a significant number of rows, we impact the Clustering Factor and so the efficiency of vital indexes in our applications, we need to eliminate the migrated rows in a more thoughtful manner.

An option we have available is to first add an appropriate Clustering Attribute before we perform the table reorg:

SQL> alter table bowie add clustering by linear order (id);

Table BOWIE altered.

SQL> alter table bowie move online;

Table BOWIE altered.

 

If we now look at the Clustering Factor of this important index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4936

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 4850

 

The Clustering Factor has been reduced down to the almost perfect 4850, down from the previous 114560.

If we now re-run the query:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)


Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
         89 Cached Commit SCN referenced
          1 DB time
      11249 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         93 buffer is not pinned count
       8308 buffer is pinned count
        324 bytes received via SQL*Net from client
     462165 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        102 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        102 consistent gets from cache
        101 consistent gets pin
        101 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     835584 logical read bytes from cache
        101 no work - consistent read gets
         72 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          1 session cursor cache count
          1 session cursor cache hits
        102 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

We can see the query now automatically uses the index and only requires just 102 consistent gets (down from 4857 when it performed the FTS and down from 2771 when we had the migrated rows).

If we look at the cost of this new plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |   113 (100)|   4200 |00:00:00.01 |     102 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|     113 (0)|   4200 |00:00:00.01 |     102 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the plan has a cost of just 113, which is both much more accurate and close to the 102 consistent gets and much less than the previous cost of 1340 for the FTS plan.

So in specific scenarios where by having migrated rows we significantly impact the Clustering Factor of indexes important to our applications, we have to be a little cleverer in how we address the migrated rows.

This can also the case in the new scenario where Oracle automatically updates the ROWIDs of migrated rows, as I’ll discuss in my next post…

Possible Impact To Clustering Factor Now ROWIDs Are Updated When Rows Migrate Part I (“Growin’ Up”) March 1, 2023

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, BLEVEL, CBO, Changing ROWID, Clustering Factor, Data Clustering, Hints, Index Access Path, Index Block Splits, Index Delete Operations, Index Height, Index Internals, Index Rebuild, Index statistics, Leaf Blocks, Migrated Rows, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle Statistics, Oracle19c, Performance Tuning, Richard Foote Training, Richard's Blog, ROWID.
2 comments

In my previous post I discussed how an index can potentially be somewhat inflated in size after ROWIDs are updated on the fly after a substantial number of rows are migrated.

However, there’s another key “factor” of an index that in some scenarios can be impacted by this new ROWID behaviour with regard migrated rows.

To highlight this scenario, I’ll again start by creating and populating a table with ENABLE ROW MOVEMENT disabled:

SQL> create table bowie(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0;

Table BOWIE created.

SQL> insert into bowie SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

I’ll next create an index on the ID column. The important aspect with the ID column is that the data is entered monotonically in ID column order, so the associated index will have an excellent (very low) Clustering Factor:

SQL> create index bowie_id_i on bowie(id);

Index BOWIE_ID_I created.

If we look at some key statistics of the table and index:

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      3268

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

We can see that the number of table blocks is 3268, the number of index leaf blocks is 473 and we indeed have a near perfect Clustering Factor of 3250.

If we run a couple of queries:

SQL> select * from bowie where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0
-------------------------------------
select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
       7353 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         16 buffer is not pinned count
       1985 buffer is pinned count
        324 bytes received via SQL*Net from client
     171305 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         18 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         18 consistent gets from cache
         17 consistent gets pin
         17 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     147456 logical read bytes from cache
         17 no work - consistent read gets
         38 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         18 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
          3 user calls

We can see for this first query that returns 1000 rows, it requires just 18 consistent gets, thanks primarily due to the efficient index with the perfect Clustering Factor.

If we look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0

-------------------------------------

select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   108K|      21 (0)|   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

We can see the plan has an accurate cost of just 21.

If we now run a similar query that returns a few more rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          1 DB time
      11353 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         59 buffer is not pinned count
       8342 buffer is pinned count
        324 bytes received via SQL*Net from client
     461834 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         68 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         68 consistent gets from cache
         67 consistent gets pin
         67 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     557056 logical read bytes from cache
         67 no work - consistent read gets
         73 non-idle wait count
         2 opened cursors cumulative
         1 opened cursors current
         2 parse count (total)
         1 process last non-idle time
         2 session cursor cache count
        68 session logical reads
         1 sorts (memory)
      2024 sorts (rows)
      4200 table fetch by rowid
         3 user calls

We can see that it only required just 68 consistent gets to return 4200 rows, thanks to the excellent data clustering and associated very low Clustering Factor.

If we look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   455K|      80 (0)|   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We can see the cost of the plan is currently a relatively accurate 80.

OK, let’s now perform an update on this table that generates a bunch of migrated rows:

SQL> update bowie set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

If we now look at the table and index statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE              200000      4906

We can see that the table blocks value has increased to 4906 (previously 3268). This as explained previously is to due in large part to the increased NAME column values and also due to the pointers in the original table blocks that point to the new locations of the migrated rows.

This relates to approximately a 50% increase in table blocks.

If we look at the current index statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE';

   INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
_____________ _________ ______________ ____________________
BOWIE_ID_I            1            473                 3250

We can see that these values are all unchanged, as the ROWIDs in indexes remain unchanged when a row migrates, when ENABLE ROW MOVEMENT is not set.

Therefore, when we re-run these same queries:

SQL> select * from bowie where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0
-------------------------------------
select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   1000 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   1000 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 DB time
       7967 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
        664 buffer is not pinned count
       1664 buffer is pinned count
        324 bytes received via SQL*Net from client
     171419 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        666 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
        666 consistent gets from cache
        665 consistent gets pin
        665 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
    5455872 logical read bytes from cache
        665 no work - consistent read gets
         37 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
        666 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
        327 table fetch continued row
          3 user calls

The number of consistent gets has increased significantly to 666 (previously it was just 18).

Now we can attributed an increase of approximately 50% of the previous consistent gets (18 x 0.50 = 9) due to the 50% increase in table blocks required now to store the rows due to the increased row size.

We can also attribute an additional 327 consistent gets for the table fetch continued row value listed in the statistics, representing the extra consistent gets required to access the migrated rows from their new physical location.

But 18 + 9 + 327 = 354 still leaves us short of the new 666 consistent gets value.

The problem with having to visit another table block to get a row from its new location is that it means Oracle has to re-access again the original table block to get the next row (rather than reading multiple rows with the same consistent get).

So it’s actually approximately 2 x table fetch continued row, by which the number of consistent gets is going to increase when accessing migrated rows (noting that the last migrated row in a block will only incur a additional consistent get as the next table block accessed will differ regardless).

If we look at the new CBO cost for this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gz5u92hmjwz1h',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________________________________
SQL_ID gz5u92hmjwz1h, child number 0

-------------------------------------

select * from bowie where id between 1 and 1000

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |     666 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   1000 |   163K|      21 (0)|   1000 |00:00:00.01 |     666 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

 

We notice the CBO cost for this plan remains unchanged at 21.

This is totally to be expected, as the index statistics by which the cost of an index scan is calculated are unchanged.

Considering the rough “rule of thumb” is that the CBO cost of an index scan should be in the ball-park of the number of possible IOs, the fact the plan now uses 666 consistent gets highlights this cost of just 21 is no longer as accurate…

If we look at the second SQL that returns 4200 rows:

SQL> select * from bowie where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_______________________________________________________________________________________________________________
SQL_ID c376kdhy5b0x9, child number 0
-------------------------------------
select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows | A-Rows | A-Time     | Buffers |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14103 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2762 buffer is not pinned count
       7005 buffer is pinned count
        324 bytes received via SQL*Net from client
      461947 bytes sent via SQL*Net to client
           2 calls to get snapshot scn: kcmgss
           2 calls to kcmgcs
        2771 consistent gets
           1 consistent gets examination
           1 consistent gets examination (fastpath)
        2771 consistent gets from cache
        2770 consistent gets pin
        2770 consistent gets pin (fastpath)
           2 execute count
           1 index range scans
    22700032 logical read bytes from cache
        2770 no work - consistent read gets
          72 non-idle wait count
           2 opened cursors cumulative
           1 opened cursors current
           2 parse count (total)
           1 process last non-idle time
           2 session cursor cache count
        2771 session logical reads
           1 sorts (memory)
        2024 sorts (rows)
        4200 table fetch by rowid
        1366 table fetch continued row
           3 user calls

We again notice consistent gets has increased significantly to 2771 (previously it was just 68). Again, these additional consistent gets can not be attributed to the extra size of the table and the additional approximate 2 x 1366 table fetch continued row gets.

If we now look at the cost of this plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'c376kdhy5b0x9',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________________
____________

SQL_ID c376kdhy5b0x9, child number 0

-------------------------------------

select * from bowie where id between 1 and 4200

Plan hash value: 1405654398

---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name       | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
---------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |            |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |    2771 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE      |      1 |   4200 |   684K|      80 (0)|   4200 |00:00:00.01 |    2771 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
---------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

We again notice the CBO cost for this plan remains unchanged at 80, again totally expected as the underlying index statistics have remain unchanged after the update statement.

But again, not necessary as accurate a cost as it was previously…

 

If we repeat this demo, but this time on a table with ENABLE ROW MOVEMENT enabled:

SQL> create table bowie2(id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, name varchar2(142)) PCTFREE 0 ENABLE ROW MOVEMENT;

Table BOWIE2 created.

SQL> insert into bowie2 SELECT rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, rownum, 'BOWIE' FROM dual CONNECT BY LEVEL <= 200000;

200,000 rows inserted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

SQL> create index bowie2_id_i on bowie2(id);

Index BOWIE2_ID_I created.

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME    NUM_ROWS    BLOCKS
_____________ ___________ _________
BOWIE2             200000      3268

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

        INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
__________________ _________ ______________ ____________________
BOWIE2_ID_I                1            473                 3250

 

The table and index statistics are currently identical to the previous demo.

If we run the same two equivalent queries:

 

SQL> select * from bowie2 where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       4 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
       7909 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         16 buffer is not pinned count
       1985 buffer is pinned count
        325 bytes received via SQL*Net from client
     171306 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         18 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         18 consistent gets from cache
         17 consistent gets pin
         17 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     147456 logical read bytes from cache
         17 no work - consistent read gets
         37 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         18 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
     3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |    21 (100)|   1000 |00:00:00.01 |      18 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   108K|      21 (0)|   1000 |00:00:00.01 |      18 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |       |       4 (0)|   1000 |00:00:00.01 |       4 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)



SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          2 DB time
      13157 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
         59 buffer is not pinned count
       8342 buffer is pinned count
        325 bytes received via SQL*Net from client
     461838 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
         68 consistent gets
          1 consistent gets examination
          1 consistent gets examination (fastpath)
         68 consistent gets from cache
         67 consistent gets pin
         67 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
     557056 logical read bytes from cache
         67 no work - consistent read gets
         73 non-idle wait count
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
         68 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       4200 table fetch by rowid
          3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |    80 (100)|   4200 |00:00:00.01 |      68 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   455K|      80 (0)|   4200 |00:00:00.01 |      68 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      11 (0)|   4200 |00:00:00.01 |      11 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

 

With identical table/index statistics, we notice as expected that both SQLs have the same consistent gets and CBO costs as with the previous demo.

If we now repeat the equivalent Update statement:

SQL> update bowie2 set name='THE RISE AND FALL OF BOWIE STARDUST AND THE SPIDERS FROM MARS';

200,000 rows updated.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE2', estimate_percent=> null, no_invalidate=>false);

PL/SQL procedure successfully completed.

 

If we look at the table statistics:

SQL> select table_name, num_rows, blocks from user_tables where table_name='BOWIE2';

   TABLE_NAME   NUM_ROWS     BLOCKS
_____________ ___________ _________
BOWIE2             200000      4654

 

We notice the number of table blocks has increased to 4654 due to the increased row lengths, but not as much as with the previous demo (where table blocks increased to 4906) as in this scenario, Oracle does not have to store the row location pointers in the original blocks for the migrated rows.

If we look at the index statistics:

SQL> select index_name, blevel, leaf_blocks, clustering_factor from user_indexes where table_name='BOWIE2';

    INDEX_NAME    BLEVEL    LEAF_BLOCKS    CLUSTERING_FACTOR
______________ _________ ______________ ____________________
BOWIE2_ID_I            2            945               109061

We notice that these are substantially different from the first demo, where ROWIDs for migrated rows are not updated on the fly.

By now updating the ROWIDs, the indexes can possibly increase in size as they have to store both the previous and new ROWIDs in separate index entries and hence Oracle is more likely to perform additional index block splits (as I discussed in my previous post).

The LEAF_BLOCKS are now 945 (previously 473) and even the BLEVEL has increased from 1 to 2.

Additionally, and perhaps importantly for specific key indexes, the Clustering Factor value of indexes can also be impacted. By migrating rows and physically storing them in different locations, this can potentially detrimentally impact the tight clustering of rows based on specific column values.

The Clustering Factor for the index on the monotonically increased ID column has now increased significantly to 109061, up from the previously perfect 3250.

So columns that have naturally good clustering (e.g.: monotonically increasing values such as IDs and dates) or have been manually well clustered for performance purposes, can have the Clustering Factor of associated indexes detrimentally impacted by migrated rows.

If we re-run the first query:

SQL> select * from bowie2 where id between 1 and 1000;

1,000 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   1000 |00:00:00.01 |     639 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   1000 |00:00:00.01 |     639 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |   1000 |00:00:00.01 |       7 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          1 CPU used by this session
          1 CPU used when call started
          1 DB time
      15262 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
        634 buffer is not pinned count
       1367 buffer is pinned count
        325 bytes received via SQL*Net from client
     171421 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
        639 consistent gets
          2 consistent gets examination
          2 consistent gets examination (fastpath)
        639 consistent gets from cache
        637 consistent gets pin
        637 consistent gets pin (fastpath)
          2 execute count
          1 index range scans
    5234688 logical read bytes from cache
        637 no work - consistent read gets
         38 non-idle wait count
          1 non-idle wait time
          2 opened cursors cumulative
          1 opened cursors current
          2 parse count (total)
          1 process last non-idle time
          2 session cursor cache count
        639 session logical reads
          1 sorts (memory)
       2024 sorts (rows)
       1000 table fetch by rowid
          3 user calls

I discussed in a previous post how by updating the ROWIDs of migrated rows we can improve performance, as Oracle can go directly to the correct new physical location of a migrated row.

But for some specific indexes, where data clustering is crucial, and we have a significant number migrated rows, this might not necessarily be the case.

We can see consistent gets here has increased to 639 (previously is was just 21), and so not hugely different from the 666 consistent gets required to fetch the migrated rows when the ROWIDs were not updated in the first demo.

If we look at the CBO costings:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'gtkw2704bxj7q',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID gtkw2704bxj7q, child number 0

-------------------------------------

select * from bowie2 where id between 1 and 1000

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |   553 (100)|   1000 |00:00:00.01 |     639 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   1000 |   163K|     553 (0)|   1000 |00:00:00.01 |     639 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   1000 |       |       7 (0)|   1000 |00:00:00.01 |       7 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=1000)

 

We can see the CBO cost has increased significantly to 553 (previously it was just 21).

With a much increased Clustering Factor, this will obviously impact the CBO costs of associated index scans.

In very extreme cases, these possible changes in the Clustering Factor can even impact the viability of using the index.

If we re-run the second query returning the 4200 rows:

SQL> select * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

----------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows | A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   4200 |00:00:00.02 |    4572 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

We can see that the CBO has now chosen to perform a Full Table Scan (FTS), rather than use the now less efficient index to return this number of rows.

If we look at the CBO costings of this FTS plan:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'25qktyn35b662',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
______________________________________________________________________________________________________________________
SQL_ID 25qktyn35b662, child number 0
-------------------------------------
select * from bowie2 where id between 1 and 4200

Plan hash value: 1495904576

-------------------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT           |        |      1 |        |       |  1264 (100)|   4200 |00:00:00.02 |    4572 |
|* 1 |  TABLE ACCESS STORAGE FULL | BOWIE2 |      1 |   4200 |   684K|    1264 (1)|   4200 |00:00:00.02 |    4572 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(("ID"<=4200 AND "ID">=1))
       filter(("ID"<=4200 AND "ID">=1))

 

The cost of the FTS plan is 1264.

If we compare this is a plan that used the index:

SQL> select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200;

4,200 rows selected.

PLAN_TABLE_OUTPUT
________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0
-------------------------------------
select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

-------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows | A-Rows | A-Time     | Buffers |
-------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |   4200 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=1 AND "ID"<=4200)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of no expensive parallel operation

Statistics
-----------------------------------------------------------
          2 CPU used by this session
          2 CPU used when call started
          2 DB time
      14531 RM usage
          3 Requests to/from client
          2 SQL*Net roundtrips to/from client
       2646 buffer is not pinned count
       5755 buffer is pinned count
        348 bytes received via SQL*Net from client
     462143 bytes sent via SQL*Net to client
          2 calls to get snapshot scn: kcmgss
          2 calls to kcmgcs
       2665 consistent gets
         2 consistent gets examination
         2 consistent gets examination (fastpath)
      2665 consistent gets from cache
      2663 consistent gets pin
      2663 consistent gets pin (fastpath)
         2 execute count
         1 index range scans
  21831680 logical read bytes from cache
      2663 no work - consistent read gets
        73 non-idle wait count
         2 opened cursors cumulative
         1 opened cursors current
         2 parse count (total)
         3 process last non-idle time
         2 session cursor cache count
      2665 session logical reads
         1 sorts (memory)
      2024 sorts (rows)
      4200 table fetch by rowid
         3 user calls

SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(sql_id=>'bzm2vhchqpq7w',format=>'ALLSTATS LAST +cost +bytes'));

PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________________________________
SQL_ID bzm2vhchqpq7w, child number 0

-------------------------------------

select /*+ index (bowie2) */ * from bowie2 where id between 1 and 4200

Plan hash value: 3243780227

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                            | Name        | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows | A-Time     | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |             |      1 |        |       |  2314 (100)|   4200 |00:00:00.01 |    2665 |
|  1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE2      |      1 |   4200 |   684K|    2314 (1)|   4200 |00:00:00.01 |    2665 |
|* 2 |   INDEX RANGE SCAN                   | BOWIE2_ID_I |      1 |   4200 |       |      22 (0)|   4200 |00:00:00.01 |      21 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ID">=1 AND "ID"<=4200)

 

The cost of using the index to retrieve the 4200 rows is 2310, more than the 1264 of the FTS.

 

For the vast majority of indexes, updating the ROWIDs for migrated rows will result in better performance, as such indexes will be able to directly access the correct new physical location of migrated rows, rather than having to visit the original table block and then follow the stored pointer to the new table block.

But for some very specific indexes, where data clustering is crucial, AND we have a significant number migrated rows, this might not necessarily be the case. The performance benefit might be minimal at best.

That’s more than enough for one post 🙂

In my next post, I’ll discuss how to potentially remedy these performance implications, both for tables with or without ENABLE TABLE MOVEMENT enabled…

When Does A ROWID Change? Part IV (“Mass Production”) December 21, 2022

Posted by Richard Foote in Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Changing ROWID, Clustering Factor, Data Clustering, Flashback, Move Partitions, Oracle, Oracle Blog, Oracle Cloud, Oracle General, Oracle Indexes, Partitioning, Richard's Blog, ROWID.
1 comment so far

In Part II in this series, I discussed how the update of the partitioned key column of a row that results in the row being moved to a different partition, will result in the ROWID of such rows changing.

However, there a quite a number of other user initiated actions in which ROWIDs can easily change (as indeed discussed in Connor McDonald’s video on this subject).

Some of these include:

  • Moving a table or partition, as this results in the segment being reorganised, with all associated rows being physically relocated and their associated ROWIDs changing
  • Altering a non-partitioned table such that it be now be partitioned, which again results in the physical relocation of all rows and their ROWIDs changing (which BTW, can potentially occur on a Autonomous Database without any user intervention)
  • Altering the partitioning strategy of a partitioned table, again changes the physical location of all rows
  • Hybrid Columnar Compression (HCC), which by packing rows more tightly, can more likely result in the physical relocation of a row during subsequent DML statements
  • Altering a table to Shrink Space, which attempts to move rows between table blocks to pack rows more tightly, again potentially resulting in rows physically moving and the changing of their associated ROWIDs
  • Flashback of a table, which results in rows being deleted and inserted and hence the change of their associated ROWIDs

I’ll illustrate an example of all this, with one of the key reasons why you may want to re-organise a table (and implicitly change all the ROWIDs of a table).

I’ll start by creating and populating a simple little table, with a CODE column that has very poorly clustered data:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum, 500), 'DAVID BOWIE' from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

Let’s now create an index on this CODE column:

SQL> create index bowie_code_i on bowie(code);

Index created.

We take note of the ROWIDs of a few random rows:

SQL> select id, rowid from bowie where id in (42, 4242, 424242) order by id;

        ID ROWID
---------- ------------------
        42 AAASn1AAMAAAgB2AAp
      4242 AAASn1AAMAAAgCHACL
    424242 AAASn1AAMAAAgbtAAJ

If we run a simple query with a predicate based on the CODE column:

SQL> select * from bowie where code=42;

2000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       | 2000 | 42000 |    1004 (2)| 00:00:01 |
| * 1 |  TABLE ACCESS FULL | BOWIE | 2000 | 42000 |    1004 (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CODE"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       3596 consistent gets
          0 physical reads
          0 redo size
      20757 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       2000 rows processed

We notice the CBO has chosen to ignore the index and use a FTS instead, even though only 2000 rows in a 1M row table (just 0.2%) are returned.

Why?

Because the clustering of the CODE data is terrible, with the required values littered throughout the table. If we look at the Clustering Factor of the index:

SQL> select index_name, leaf_blocks, clustering_factor from user_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME   LEAF_BLOCKS CLUSTERING_FACTOR
------------ ----------- -----------------
BOWIE_CODE_I        2063           1000000

We notice the index has the worst possible Clustering Factor value of 1000000.

So to improve the performance of this (say critical) query, we can add a Clustering Attribute to this table based on the CODE column and then reorganise the table:

SQL> alter table bowie add clustering by linear order (code);

Table altered.

SQL> alter table bowie move online;

Table altered.

If we now look at the Clustering Factor of the index:

SQL> select index_name, leaf_blocks, clustering_factor from user_indexes where index_name='BOWIE_CODE_I';

INDEX_NAME   LEAF_BLOCKS CLUSTERING_FACTOR
------------ ----------- -----------------
BOWIE_CODE_I        2063              3568

We can see it has substantially improved, down to just 3568 from the previous 1000000 value, as the data is now perfectly clustered based on the CODE column.

If we now re-run the query:

SQL> select * from bowie where code=42;

2000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 853003755

----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              | 2000 | 42000 |      15 (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        | 2000 | 42000 |      15 (0)| 00:00:01 |
| * 2 |   INDEX RANGE SCAN                   | BOWIE_CODE_I | 2000 |       |       7 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CODE"=42) 

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
         17 consistent gets
          0 physical reads
          0 redo size
      50735 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       2000 rows processed

The CBO now choses to use the index and the query is much more efficient as a result (consistent gets down to just 17 from the previous 3596).

So all is now much better, except for any application that was reliant on using ROWIDs to fetch the data, as all ROWIDs have now changed:

SQL> select id, rowid from bowie where id in (42, 4242, 424242) order by id;

        ID ROWID
---------- ------------------
        42 AAASn6AAMAAAACvAEf
      4242 AAASn6AAMAAAiRaAA4
    424242 AAASn6AAMAAAiRWAEQ

So there are many ways in which the ROWID of a row can potentially change.

And now there’s another key manner in which a ROWID can very easily change in Oracle Autonomous Database environments, as I’ll next discuss…

Oracle 19c Automatic Indexing: Invisible/Valid Automatic Indexes (Bowie Rare) August 31, 2021

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Exadata, Index Access Path, Index statistics, Invisible Indexes, Invisible/Valid Indexes, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle Statistics, Oracle19c, Unusable Indexes.
1 comment so far

In my previous post, I discussed how newly created Automatic Indexes can have one of three statuses, depending the selectivity and effectiveness of the associated Automatic Index.

Indexes that improve performance sufficiently are created as Visible/Valid indexes and can be subsequently considered by the CBO. Indexes that are woeful and have no chance of improving performance are created as Invisible/Unusable indexes. Indexes considered potentially suitable but ultimately don’t sufficiently improve performance, are created as Invisible/Valid indexes.

Automatic Indexes are created as Visible/Valid indexes when shown to improve performance (by the _AUTO_INDEX_IMPROVEMENT_THRESHOLD parameter). But as I rarely came across Invisible/Valid Automatic Indexes (except for when Automatic Indexing is set to “Report Only” mode), I was curious to determine approximately at what point were such indexes created by the Automatic Indexing process.

To investigate things, I created a table with columns that contain data with various levels of selectivity, some of which should fall inside and outside the range of viability of any associated index, based on the cost of the associated Full Table Scan.

The following table has 32 columns of interest, each with a slight variation of distinct values giving small differences in overall column selectivity:

SQL> create table bowie_stuff1 (id number, code1 number, code2 number, code3 number, code4 number, code5 number, code6 number, code7 number, code8 number, code9 number, code10 number, code11 number, code12 number, code13 number, code14 number, code15 number, code16 number, code17 number, code18 number, code19 number, code20 number, code21 number, code22 number, code23 number, code24 number, code25 number, code26 number, code27 number, code28 number, code29 number, code30 number, code31 number, code32 number, name varchar2(42));

Table created.

SQL> insert into bowie_stuff1 
select rownum, 
       mod(rownum, 900)+1, 
       mod(rownum, 1000)+1, 
       mod(rownum, 1100)+1, 
       mod(rownum, 1200)+1, 
       mod(rownum, 1300)+1, 
       mod(rownum, 1400)+1, 
       mod(rownum, 1500)+1, 
       mod(rownum, 1600)+1, 
       mod(rownum, 1700)+1, 
       mod(rownum, 1800)+1, 
       mod(rownum, 1900)+1, 
       mod(rownum, 2000)+1, 
       mod(rownum, 2100)+1, 
       mod(rownum, 2200)+1, 
       mod(rownum, 2300)+1, 
       mod(rownum, 2400)+1, 
       mod(rownum, 2500)+1, 
       mod(rownum, 2600)+1, 
       mod(rownum, 2700)+1, 
       mod(rownum, 2800)+1, 
       mod(rownum, 2900)+1, 
       mod(rownum, 3000)+1, 
       mod(rownum, 3100)+1, 
       mod(rownum, 3200)+1, 
       mod(rownum, 3300)+1, 
       mod(rownum, 3400)+1, 
       mod(rownum, 3500)+1, 
       mod(rownum, 3600)+1, 
       mod(rownum, 3700)+1, 
       mod(rownum, 3800)+1, 
       mod(rownum, 3900)+1, 
       mod(rownum, 4000)+1,
       'THE RISE AND FALL OF ZIGGY STARDUST' 
from dual connect by level >=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

As always, it’s important that statistics be collected for Automatic Indexing to function properly:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STUFF1', estimate_percent=>null);

PL/SQL procedure successfully completed.

 

So on a 10M row table, I have 32 columns with the number of distinct values varying by only 100 values per column (or by a selectivity of just 0.001%):

SQL> select column_name, num_distinct, density, histogram from dba_tab_columns where table_name='BOWIE_STUFF1' order by num_distinct;

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM
------------ ------------ ---------- ---------------
NAME                    1  .00000005 FREQUENCY
CODE1                 900    .001111 HYBRID
CODE2                1000       .001 HYBRID
CODE3                1100    .000909 HYBRID
CODE4                1200    .000833 HYBRID
CODE5                1300    .000769 HYBRID
CODE6                1400    .000714 HYBRID
CODE7                1500    .000667 HYBRID
CODE8                1600    .000625 HYBRID
CODE9                1700    .000588 HYBRID
CODE10               1800    .000556 HYBRID
CODE11               1900    .000526 HYBRID
CODE12               2000      .0005 HYBRID
CODE13               2100    .000476 HYBRID
CODE14               2200    .000455 HYBRID
CODE15               2300    .000435 HYBRID
CODE16               2400    .000417 HYBRID
CODE17               2500      .0004 HYBRID
CODE18               2600    .000385 HYBRID
CODE19               2700     .00037 HYBRID
CODE20               2800    .000357 HYBRID
CODE21               2900    .000345 HYBRID
CODE22               3000    .000333 HYBRID
CODE23               3100    .000323 HYBRID
CODE24               3200    .000312 HYBRID
CODE25               3300    .000303 HYBRID
CODE26               3400    .000294 HYBRID
CODE27               3500    .000286 HYBRID
CODE28               3600    .000278 HYBRID
CODE29               3700     .00027 HYBRID
CODE30               3800    .000263 HYBRID
CODE31               3900    .000256 HYBRID
CODE32               4000     .00025 HYBRID
ID               10000000          0 HYBRID

I’ll next run the below queries (based on a simple equality predicate on each column) several times each in batches of 8 queries, so as to not swamp the Automatic Indexing process with potential new index requests (the ramifications of which I’ll discuss in another future post):

SQL> select * from bowie_stuff1 where code1=42;
SQL> select * from bowie_stuff1 where code2=42;
SQL> select * from bowie_stuff1 where code3=42;
SQL> select * from bowie_stuff1 where code4=42;
SQL> select * from bowie_stuff1 where code5=42;
...
SQL> select * from bowie_stuff1 where code31=42;
SQL> select * from bowie_stuff1 where code32=42;

 

If we now look at the statuses of the Automatic Indexes subsequently created:

SQL> select i.index_name, c.column_name, i.auto, i.constraint_index, i.visibility, i.status, i.num_rows, i.leaf_blocks, i.clustering_factor
from user_indexes i, user_ind_columns c
where i.index_name=c.index_name and i.table_name='BOWIE_STUFF1' order by visibility, status;

INDEX_NAME             COLUMN_NAME  AUT CON VISIBILIT STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- ------------ --- --- --------- -------- ---------- ----------- -----------------
SYS_AI_5rw9j3d8pc422   CODE5        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_48q3j752csn1p   CODE4        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_9sgharttf3yr7   CODE3        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_8n92acdfbuh65   CODE2        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_brgtfgngu3cj9   CODE1        YES NO  INVISIBLE UNUSABLE   10000000       21702           4272987
SYS_AI_1tu5u4012mkzu   CODE11       YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_34b6zwgtm86rr   CODE12       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_gd0ccvdwwb4mk   CODE13       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_7k7wh28n3nczy   CODE14       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_67k2zjp09w101   CODE15       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_5fa6k6fm0k6wg   CODE10       YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_4624ju6bxsv57   CODE9        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_bstrdkkxqtj4f   CODE8        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_39xqjjar239zq   CODE7        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_6h0adp60faytk   CODE6        YES NO  INVISIBLE VALID      10000000       15364          10000000
SYS_AI_5u0bqdgcx52vh   CODE16       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_0hzmhsraqkcgr   CODE22       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_4x716k4mdn040   CODE21       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_6wsuwr7p6drsu   CODE20       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_b424tdjx82rwy   CODE19       YES NO  INVISIBLE VALID      10000000       15366          10000000
SYS_AI_3a2y07fqkzv8x   CODE18       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_8dp0b3z0vxzyg   CODE17       YES NO  INVISIBLE VALID      10000000       15365          10000000
SYS_AI_d95hnqayd7t08   CODE23       YES NO  VISIBLE   VALID      10000000       15366          10000000
SYS_AI_fry4zrxqtpyzg   CODE24       YES NO  VISIBLE   VALID      10000000       15366          10000000
SYS_AI_920asb69q1r0m   CODE25       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_026pa8880hnm2   CODE31       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_96xhzrguz2qpy   CODE32       YES NO  VISIBLE   VALID      10000000       15368          10000000
SYS_AI_3dq93cc7uxruu   CODE29       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_5nbz41xny8fvc   CODE28       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_fz4q9bhydu2qt   CODE27       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_0kwczzg3k3pfw   CODE26       YES NO  VISIBLE   VALID      10000000       15367          10000000
SYS_AI_4qd5tsab7fnwx   CODE30       YES NO  VISIBLE   VALID      10000000       15367          10000000

We can see we indeed have the 3 statuses of Automatic Indexes captured:

Columns with a selectivity equal or worse to that of COL5 with 1300 distinct values are created as Invisible/Unusable indexes. Returning 10M/1300 rows or a cardinality of approx. 7,693 or more rows is just too expensive for such indexes on this table to be viable. This represents a selectivity of approx. 0.077%.

Note how the index statistics for these Invisible/Unusable indexes are not accurate. They all have an estimated LEAF_BLOCKS of 21702 and a CLUSTERING_FACTOR of 4272987. However, we can see from the other indexes which are physically created that these are not correct and are substantially off the mark with the actual LEAF_BLOCKS being around 15364 and the CLUSTERING_FACTOR actually much worse at around 10000000.

Again worthy of a future post to discuss how Automatic Indexing processing has to make (potentially inaccurate) guesstimates for these statistics in its analysis of index viability when such indexes don’t yet physically exist.

Columns with a selectivity equal or better to that of COL23 which has 3100 distinct values are created as Visible/Valid indexes. Returning 10M/3100 rows or a cardinality of approx. 3226 or less rows is cheap enough for such indexes on this table to be viable. This represents a selectivity of approx. 0.032%.

So in this specific example, only those columns between 1400 and 3000 distinct values meet the “borderline” criteria in which the Automatic Indexing process creates Invisible/Valid indexes. This represents a very very narrow selectivity range of only approx. 0.045% in which such Invisible/Valid indexes are created. Or for this specific example, only those columns that return approx. between 3,333 and 7,143 rows from the 10M row table.

Now the actual numbers and total range of selectivities for which Invisible/Valid Automatic Indexes are created of course depends on all sorts of factors, such as the size/cost of FTS of the table and not least the clustering of the associated data (which I’ve blogged about ad nauseam).

The point I want to make is that the range of viability for such Invisible/Valid indexes is relatively narrow and the occurrences of such indexes relatively rare in your databases. As such, the vast majority of Automatic Indexes are likely to be either Visible/Valid or Invisible/Unusable indexes.

It’s important to recognised this when you encounter such Invisible/Valid Automatic Indexes (outside of “REPORT ONLY” implementations), as it’s an indication that such an index is a borderline case that is currently NOT considered by the CBO (because of it being Invisible).

However, this Invisible/Valid Automatic Index status should really change to either of the other two more common statuses in the near future.

I’ll expand on this point in a future post…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part III (Star) August 11, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Internals, Index statistics, Oracle, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
2 comments

In Part I we looked at a scenario where an index was deemed to be too inefficient for Automatic Indexing to create a VALID index, because of the poor clustering of data within the table.

In Part II we improved the data clustering but the previous SQLs could still not generate a new Automatic Index because they had effectively been blacklisted.

So how do we get Automatic Indexing to improve the performance of these queries?

Basically, we need to run some new SQL statements to those previously run which have not been blacklisted, that can make the Automatic Indexing process kick in and create the necessary indexes.

For example, if we now run the following SQL statements that have not previously run:

select * from nickcave where code=1;

select * from nickcave where code=2;

select * from nickcave where code=3;

 

And now wait for the next Automatic Indexing process period and look at the following (partial) Automatic Indexing report:

 

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 22-JUN-2020 04:26:31
Activity end                 : 22-JUN-2020 04:27:25
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                              : 0
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 167.77 MB (167.77 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 3
SQL statements improved (improvement factor)  : 3 (76x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 76x


INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table    | Index                | Key  | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | NICKCAVE | SYS_AI_dh8pumfww3f4r | CODE | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 5k1wmtu7um5q9
SQL Text             : select * from nickcave where code=1
Improvement Factor   : 76x

Execution Statistics:
-----------------------------

                   Original Plan                   Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  1725103                       106145
CPU Time (s):      1534305                       62314
Buffer Gets:       291835                        779
Optimizer Cost:    9125                          792
Disk Reads:        0                             197
Direct Writes:     0                             0
Rows Processed:    500000                        100000
Executions:        5                             1

 

We can see that an index has indeed now been created on the CODE column because one of the new statements is now deemed to be 76x more efficient thanks to the new index.

If we look at details of this new Automatic Index:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  VISIBLE   DISABLED      VALID      10000000       19518             57983

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='NICKCAVE'
order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that the index is now indeed VALID and VISIBLE with a much improved Clustering Factor at just 57983.

If we now re-run newer SQL statement:

 

SQL> select * from nickcave where code=1;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("CODE"=1)

Statistics
----------------------------------------------------------
          12  recursive calls
           0  db block gets
         779  consistent gets
           0  physical reads
         176  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

We notice the SQL statement is now indeed using this new Automatic Index.

If we now re-run our original SQL statement that had been using a FTS execution plan and that we couldn’t make Automatic Indexing create a VALID index because when originally run, the data clustering was too poor within the table:

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    8 - access("CODE"=42)

Statistics
----------------------------------------------------------
          14  recursive calls
           4  db block gets
         780  consistent gets
         198  physical reads
       15224  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

This query is now also finally using the newly created index, because the CBO now too deems it to be more efficient with an index based execution plan.

The moral of the story. Automatic Indexing may initially deem a potential index to not be efficient enough to be created. However, things may change such as the clustering of table data (or the distribution of data values, etc. etc.) that may make a new index now viable. This though requires a NEW SQL statement to be executed, such that a non-blacklisted SQL can invoke the Automatic Indexing process to create the necessary Automatic Index.

Of course, things may change in the future. Future releases may have the facility to automatically re-cluster the data in tables optimally based on existing workloads and may also have a mechanism to identify that things have sufficient “changed” such that previously “failed” SQL statements from an Automatic Indexing perspective may warrant reevaluation.

This has only been tested up to version Oracle Database 19.5 of the Oracle Autonomous Database environments.

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part II (Wild Is The Wind) August 10, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Oracle Indexes, Performance Tuning.
2 comments

 

In my previous post, I discussed a scenario in which Oracle Automatic Indexing refused to create a VALID index, because the resultant index was too inefficient to access the necessary rows due to the poor clustering of data within the table.

If the performance of such an SQL were critical for business requirements, there is a way to address this scenario, by re-clustering the data within the table to align itself with the index. Although the re-clustering table operation can now be very easily performed online since Oracle Database 12.2 (without having to use the dbms_redefinition process), this is NOT automatically performed within the Autonomous Database self-tuning framework (yet).

But it’s an activity we can perform manually to improve the performance of such critical SQLs as follows:

SQL> alter table nickcave add clustering by linear order(code);

Table altered.

SQL> alter table nickcave move online;

Table altered.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

With the data in the table now perfectly aligned with the index, we would ordinarily now expect the index to be more efficient method to retrieve this 1% of the data.

However, if we now re-run the previously executed SQLs each a number of times:

 

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

And now wait until next Automatic Indexing process period:

SQL> select dbms_auto_index.report_last_activity() report from dual;

...

We notice that the Automatic Index is still NOT mentioned in the Automatic Indexing reports and still remains UNUSABLE:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE          0           0                 0

 

So what’s going on?

In order to prevent the same SQLs from being continually re-evaluated to see if an index might be preferable, the Automatic Indexing process puts previously evaluated SQLs on a type of blacklist and therefore don’t get subsequently re-evaluated.

So although the new clustering of the data within the table would now likely warrant the creation of a new index, if we just run the some SQLs as previously, nothing changes. No Automatic Index is created and the SQLs remain in their current “sub-optimal” state.

In Part III, we’ll look at how to finally get Automatic Indexing to create these indexes and improve the performance of these queries…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part I (Don’t Look Down) August 6, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, Clustering Factor, Full Table Scans, Index Rebuild, Index statistics, Oracle, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Oracle19c, Performance Tuning.
4 comments

I’ve discussed many times the importance of data clustering in relation to the efficiency of indexes. With respect to the efficiency of Automatic Indexes including their usage within Oracle’s Autonomous Database environments, data clustering is just as important.

The following demo was run on an Oracle 19c database within the Oracle Autonomous Database Transaction Processing Cloud environment.

I begin by creating a simple table that has the key column CODE, in which data is populated in a manner where the data is very poorly clustered:

 

SQL> create table nickcave (id number, code number, name varchar2(42));

Table created.

SQL> insert into nickcave select rownum, mod(rownum, 100), 'Nick Cave and the Bad Seeds'
     from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'NICKCAVE');

PL/SQL procedure successfully completed.

 

So we have 100 evenly distributed distinct CODE values but they’re all distributed throughout the table.

The following SQL statement is basically returning just 1% of the data and is executed a number of times:

 

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows    | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     100K|  3613K|  9125   (5)| 00:00:01|
|   1 |  PX COORDINATOR              |          |         |       |            |         |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000 |     100K|  3613K|  9125   (5)| 00:00:01|
|   3 |    PX BLOCK ITERATOR         |          |     100K|  3613K|  9125   (5)| 00:00:01|
|*  4 |     TABLE ACCESS STORAGE FULL| NICKCAVE |     100K|  3613K|  9125   (5)| 00:00:01|
------------------------------------------------------------------------------------------

Without an index, the CBO currently has no choice but to use a Full Table Scan to access the table. So we wait for the next Automatic Index process to kick in:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

The Automatic Indexing report makes no mention of Automatic Indexes on the NICKCAVE table…

If we look to see if any indexes have actually been created:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor 
     from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='NICKCAVE'
     order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that yes, an Automatic Index (SYS_AI_dh8pumfww3f4r) has been created on the CODE column of the NICKCAVE table BUT it remains in an INVISIBLE, UNUSABLE state.

So Automatic Indexing considered an index on CODE, created it in an INVISIBLE, USABLE state but when testing it, failed in that it found it to be less efficient than the current FTS and so reverted the Automatic Index back to an UNUSABLE index.

Therefore, if we run a bunch of other similar SQL statements such as the following:

SQL> select * from nickcave where code=24;

SQL> select * from nickcave where code=42;

SQL> select * from nickcave where code=13;

 

They all use the FTS as again, the CBO has no choice with no VALID index on the CODE column available.

If we keep checking the Automatic Indexing report:

SQL> select dbms_auto_index.report_last_activity() report from dual;

 

There’s still no mention of an index on the CODE column. The existing Automatic Index remains in an UNUSABLE state:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  INVISIBLE DISABLED      UNUSABLE   10000000       20346           4158302

 

Basically, the index remains ineffective because with a Clustering Factor of 4158302, it’s just too inefficient to return the 1% (100000 rows) of the table.

Even in an Autonomous Database environment, nothing will automatically change with this scenario.

In my next post, we’ll look at how we can improve the performance of this query and get an Automatic Index to actually kick in with a USABLE index…

Answer: Anything Wrong With Query Performance? (Red Right Hand) April 11, 2018

Posted by Richard Foote in 12c, Attribute Clustering, Clustering Factor, Oracle Indexes.
add a comment

red right hand

I of course attract a highly clever readership :). As some have commented, for a single table to require 1000+ consistent gets to retrieve 1000 rows implies that each row needs to be accessed from a different block. This in turn implies the Clustering Factor for this index to be relatively bad and the associated index relatively inefficient.

If this query is very infrequently executed, then no real damage done and the index is likely a better alternative than a Full Table Scan.

However, if this query was executed very frequently (maybe 100’s of times per second), if this query featured as one of the top consuming CPU queries in an AWR report, then you could be burning more CPU than necessary. Maybe a lot lot more CPU…

Improving database performance is of course desirable but reducing a significant amount of CPU usage is always a good thing. For a start you usually pay database licenses and cloud subscriptions based on CPU consumption. The less CPU your systems use, the more head-room you have in case anything goes wrong as running out of CPU usually means performance hell for your database systems. Less CPU means more time until you need to update your infrastructure, more database systems you can run in your current environment, more time until you need to pay for more database licenses, more time until you have to increase your cloud subscriptions etc.

I have assisted many customers in significantly improving performance, in delaying IT investments costs by significantly reducing CPU wastage. Often this is based on improving queries that individually perform adequately and often when the number of rows to number of consistent gets/logical reads ratios appear OK.

So in this particular example, although things are currently deemed hunky dory,  this query can potentially be significantly improved. The root issue here is an index that has a terrible Clustering Factor being used to retrieve a significant number of rows, while being executed a significant number of times.

If we look at the current Clustering Factor:

SQL> select index_name, clustering_factor from user_indexes

where table_name='MAJOR_TOM';

INDEX_NAME           CLUSTERING_FACTOR
-------------------- -----------------
MAJOR_TOM_CODE_I               2000000

At 2000000, it’s about as bad as it can get.

As I’ve discussed previously, Oracle now has a nice way of being able change the clustering of a table by adding a Clustering Attribute to a table (12.1) and by the reorganising the table online (12.2):

SQL> alter table major_tom add clustering by linear order(code);

Table altered.

SQL> alter table major_tom move online;

Table altered.

If we look at the Clustering Factor of the index now:

SQL> select index_name, clustering_factor from user_indexes where table_name='MAJOR_TOM';

INDEX_NAME           CLUSTERING_FACTOR
-------------------- -----------------
MAJOR_TOM_CODE_I                  7322

It’s now about as good as it can get at just 7322.

If we now re-run the “problematic” query:

SQL> select * from major_tom where code=42;

1000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 4132562429

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  | 1000 | 21000 |       9 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | MAJOR_TOM        | 1000 | 21000 |       9 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN            | MAJOR_TOM_CODE_I | 1000 |       |       5 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
   12 consistent gets
    0 physical reads
    0 redo size
26208 bytes sent via SQL*Net to client
  608 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
 1000 rows processed

The number of consistent gets has plummeted from 1006 to just 12, which is about as good as it gets when retrieving 1000 rows.

Of course the impact this change has on other queries on the table based on other columns needs to be carefully considered. But we have now potentially significantly reduced the overall CPU consumption of our database (especially if we tackle other problem queries in a similar manner).

If you have attended by “Oracle Indexing Internals and Best Practices” seminar, you already know all this as this is one of many key messages from the seminar 🙂

Improve Data Clustering on Multiple Columns Concurrently (Two Suns in the Sunset) March 12, 2018

Posted by Richard Foote in 12c, Attribute Clustering, Clustering Factor, Online DDL, Oracle Indexes.
3 comments

I’ve had a couple of recent discussions around clustering and how if you attempt to improve the clustering of a table based on a column, you thereby ruin the current clustering that might exist for a different column. The common wisdom being you can only order the data one way and if you change the order, you might improve things for one column but totally stuff things up for another.

However, that’s not strictly correct. Depending on the characteristics of your data, you can potentially order (or interleave) data based on multiple columns concurrently. It’s quite possible to have good or good enough clustering on multiple columns and this is extremely important for indexes, as the efficiency of an index can be directly impacted by the clustering of data on the underlining tables.

So to illustrate, I’m going to create a table that initially has terrible clustering on two unrelated columns (code and grade) :

SQL> create table ziggy (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into ziggy select rownum, mod(rownum, 100)+1, ceil(dbms_random.value(0,100)), 'ZIGGY STARDUST'
from dual connect by level  commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> 'ZIGGY', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> create index ziggy_grade_i on ziggy(grade);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes
where table_name='ZIGGY';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY_CODE_I                   1748800    4000000
ZIGGY_GRADE_I                  1572829    4000000

So with values for both columns distributed all throughout the table, the Clustering Factor of both the CODE and GRADE indexes are both quite poor (values of 1748800 and 1572829 respectively). Even though both columns have 100 distinct values (and so a selectivity of 1%), the CBO will likely consider the indexes too inefficient to use:

SQL> select * from ziggy where code=42;

40000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU) | Time    |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 40000 | 1054K |   4985 (10) | 00:00:01|
| * 1 | TABLE ACCESS FULL | ZIGGY | 40000 | 1054K |   4985 (10) | 00:00:0 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=42)

Statistics
----------------------------------------------------------
       0 recursive calls
       0 db block gets
   20292 consistent gets
       0 physical reads
       0 redo size
 1058750 bytes sent via SQL*Net to client
   29934 bytes received via SQL*Net from client
    2668 SQL*Net roundtrips to/from client
       0 sorts (memory)
       0 sorts (disk)
  40000 rows processed

SQL> select * from ziggy where grade=42;

40257 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 40000 | 1054K |  5021 (10) | 00:00:01 |
| * 1 | TABLE ACCESS FULL | ZIGGY | 40000 | 1054K |  5021 (10) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GRADE"=42)

Statistics
----------------------------------------------------------
       0 recursive calls
       0 db block gets
   20307 consistent gets
       0 physical reads
       0 redo size
 1065641 bytes sent via SQL*Net to client
   30121 bytes received via SQL*Net from client
    2685 SQL*Net roundtrips to/from client
       0 sorts (memory)
       0 sorts (disk)
   40257 rows processed

So even though the CBO has got the row estimates just about spot on, in both cases a Full Table Scan was chosen.

Let’s create another table based on the table above but this time order the data in CODE column order:

SQL> create table ziggy2 as select * from ziggy order by code;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> 'ZIGGY2', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index ziggy2_code_i on ziggy2(code);

Index created.

SQL> create index ziggy2_grade_i on ziggy2(grade);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes 

where table_name='ZIGGY2';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY2_CODE_I                    17561    4000000
ZIGGY2_GRADE_I                 1577809    4000000

We can see that by doing so, we have significantly reduced the Clustering Factor of the CODE index (down from 1748800 to just 17561) . The GRADE index though has changed little as there’s little co-relation between the CODE and GRADE columns.

If we now run the same query with the CODE based predicate:

SQL> select * from ziggy2 where code=42;

40000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 16801974

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name          | Rows  | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |               | 40000 | 1054K |     264 (4) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY2        | 40000 | 1054K |     264 (4) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY2_CODE_I | 40000 |       |      84 (5) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
       0 recursive calls
       0 db block gets
     273 consistent gets
       0 physical reads
       0 redo size
 1272038 bytes sent via SQL*Net to client
     685 bytes received via SQL*Net from client
       9 SQL*Net roundtrips to/from client
       0 sorts (memory)
       0 sorts (disk)
   40000 rows processed

The CBO has not only used the index, but the query is much more efficient as a result, with just 273 consistent gets required to retrieve 40000 rows.

However the query based on the GRADE predicate still uses a FTS:

SQL> select * from ziggy2 where grade=42;

40257 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1810052534

----------------------------------------------------------------------------
| Id | Operation         | Name   | Rows  | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |        | 40000 | 1054K |   4920 (10) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | ZIGGY2 | 40000 | 1054K |   4920 (10) | 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("GRADE"=42)

Statistics
----------------------------------------------------------
      0 recursive calls
     11 db block gets
  17602 consistent gets
      0 physical reads
      0 redo size
 434947 bytes sent via SQL*Net to client
    696 bytes received via SQL*Net from client
     10 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
  40257 rows processed

Now if we decide that actually the query based on GRADE is far more important to the business, we could of course reorder the data again. The following is yet another table, this time based on the CODE sorted ZIGGY2 table, but inserted in GRADE column order:

SQL> create table ziggy3 as select * from ziggy2 order by grade;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> 'ZIGGY3', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index ziggy3_code_i on ziggy3(code);

Index created.

SQL> create index ziggy3_grade_i on ziggy3(grade);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes 

where table_name='ZIGGY3';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY3_CODE_I                    30231    4000000
ZIGGY3_GRADE_I                   17582    4000000

We notice we now have an excellent, very low Clustering Factor for the GRADE index (down to just 17582). But notice also the Clustering Factor for CODE. Although it has increased from 17561 to 30231, it’s nowhere near as bad as it was initially when is was a massive 1748800.

The point being that with the data already ordered on CODE, Oracle inserting the data in GRADE order effectively had the data already sub-ordered on CODE. So we end up with perfect clustering on the GRADE column and “good enough” clustering on CODE as well.

If we now run the same queries again:

SQL> select * from ziggy3 where code=42;

40000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1004048030

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name          | Rows  | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |               | 40000 | 1054K |     392 (3) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY3        | 40000 | 1054K |     392 (3) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY3_CODE_I | 40000 |       |      84 (5) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
       0 recursive calls
       0 db block gets
     401 consistent gets
       0 physical reads
       0 redo size
 1272038 bytes sent via SQL*Net to client
     685 bytes received via SQL*Net from client
       9 SQL*Net roundtrips to/from client
       0 sorts (memory)
       0 sorts (disk)
   40000 rows processed

With the CODE based query, the CBO still uses the index and performance is still quite good with consistent gets having  gone up a tad (401 up from 273). However, we now have the scenario where the GRADE based query is also efficient with the index access also selected by the CBO:

SQL> select * from ziggy3 where grade=42;

40257 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 844233985

------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name           | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                | 40000 | 1054K |     264 (4) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY3         | 40000 | 1054K |     264 (4) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY3_GRADE_I | 40000 |       |      84 (5) | 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("GRADE"=42)

Statistics
----------------------------------------------------------
       0 recursive calls
       0 db block gets
     278 consistent gets
       0 physical reads
       0 redo size
 1280037 bytes sent via SQL*Net to client
     696 bytes received via SQL*Net from client
      10 SQL*Net roundtrips to/from client
       0 sorts (memory)
       0 sorts (disk)
   40257 rows processed

We are relying here however on how Oracle actually loads the data on the non-sorted columns, so we can guarantee good clustering on both these columns by simply ordering the data on both columns. Here’s table number 4 with data explicitly sorted on both columns (the values of CODE sub-sorted within the ordering of GRADE):

SQL> create table ziggy4 as select * from ziggy3 order by grade, code;

Table created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=> 'ZIGGY4', method_opt=> 'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index ziggy4_code_i on ziggy4(code);

Index created.

SQL> create index ziggy4_grade_i on ziggy4(grade);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes 

where table_name='ZIGGY4';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY4_CODE_I                    27540    4000000
ZIGGY4_GRADE_I                   17583    4000000

We notice we have a near perfect Clustering Factor on the GRADE column (just 17583) and a “good enough” Clustering Factor on the CODE column (27540).

With 12c Rel 2, we can effectively “fix” the original poorly clustered table online on both columns by adding an appropriate Clustering Attribute to the table (new in 12.1) and performing a subsequent Online table reorg (new in 12.2):

SQL> alter table ziggy add clustering by linear order (grade, code);

Table altered.

SQL> alter table ziggy move online;

Table altered.

SQL> select index_name, clustering_factor, num_rows from user_indexes

where table_name='ZIGGY';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY_CODE_I                     27525    4000000
ZIGGY_GRADE_I                    17578    4000000

We now have the same excellent Clustering Factor values as we had in the previous example.

Depending on data characteristics, you could potentially use the Interleave Clustering Attribute for good enough Clustering Factor values on your multiple columns, rather than perfect clustering on specific columns.

So it is entirely possible to have the necessary data ordering you need for effective data accesses on multiple columns concurrently.

12.2 Online Conversion of a Non-Partitioned Table to a Partitioned Table (A Small Plot Of Land) March 27, 2017

Posted by Richard Foote in 12c Release 2 New Features, Attribute Clustering, Clustering Factor, Online DDL, Oracle, Oracle Indexes, Partitioning.
4 comments

Image result for outside bowie

In my previous post, I discussed how you can now move heap tables online with Oracle Database 12.2 and how this can be very beneficial in helping to address issues with the Clustering Factor of key indexes.

A problem with this technique is that is requires the entire table to be effectively reorganised when most of the data might already be well clustered. It would be much more efficient if we could somehow only move and reorganise just the portion of a table that has poorly clustered data introduced to the table since the last reorg.

Partitioning the table appropriately would help to address this disadvantage but converting a non-partitioned table to be partitioned can be a pain. To do this online with as little complication as possible one could use the dbms_redefintion package which has improved with latter releases.

However, with Oracle Database 12.2, there is now an even easier, more flexible method of performing such a conversion.

Using the same table definition and data as from my previous post, I’m going to first create a couple of additional indexes (on the ID column and on the DATE_CREATED column) :


SQL> create unique index ziggy_id_i on ziggy(id);

Index created.

SQL> create index ziggy_date_created_i on ziggy(date_created);

Index created.

To convert a non-partitioned table to a partitioned table online, we can now use this new extension to the ALTER TABLE syntax:


SQL> alter table ziggy
2 modify partition by range (date_created)
3 (partition p1 values less than (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
4 partition p2 values less than (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
5 partition p3 values less than (maxvalue)) online;

Table altered.

How simple is that !! We now have a table that is range partitioned based on the DATE_CREATED column and this conversion was performed online.

We notice not only is the table now partitioned with all the indexes remaining Valid, but the index based on the partitioning key (DATE_CREATED) has also been implicitly converted to be a Local partitioned index:


SQL> select table_name, status, partitioned from dba_tables
where table_name='ZIGGY';

TABLE_NAME   STATUS   PAR
------------ -------- ---
ZIGGY        VALID    YES

SQL> select index_name, status, partitioned, num_rows
from dba_indexes where table_name='ZIGGY';

INDEX_NAME           STATUS   PAR   NUM_ROWS
-------------------- -------- --- ----------
ZIGGY_DATE_CREATED_I      N/A YES    2000000
ZIGGY_CODE_I VALID             NO    2000000
ZIGGY_ID_I VALID               NO    2000000

SQL> select index_name, partition_name, status, leaf_blocks from dba_ind_partitions
     where index_name like 'ZIGGY%';

INDEX_NAME           PARTITION_NAME  STATUS   LEAF_BLOCKS
-------------------- --------------- -------- -----------
ZIGGY_DATE_CREATED_I              P1   USABLE         865
ZIGGY_DATE_CREATED_I              P2   USABLE        1123
ZIGGY_DATE_CREATED_I              P3   USABLE        1089

SQL> select index_name, partitioning_type, partition_count, locality
from dba_part_indexes where table_name='ZIGGY';

INDEX_NAME           PARTITION PARTITION_COUNT LOCALI
-------------------- --------- --------------- ------
ZIGGY_DATE_CREATED_I     RANGE               3 LOCAL

As part of the table conversion syntax, we have the option to also update all the associated indexes and partition them in any manner we may want. For example:


SQL> alter table ziggy
2 modify partition by range (date_created)
3 (partition p1 values less than (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
4 partition p2 values less than (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
5 partition p3 values less than (maxvalue)) online
6 update indexes
7 (ziggy_code_i local,
8 ziggy_id_i global partition by range (id)
9 (partition ip1 values less than (maxvalue)));

Table altered.

In this example, not only are we converting the non-partitioned table to be partitioned, but we’re also explicitly converting the index on the CODE column to be a Locally partitioned index and the index on the ID column to be Globally partitioned in its own manner.

If we look at the definition of these indexes, we see that they also have all been converted to partitioned indexes online along with the table:


SQL> select table_name, status, partitioned from dba_tables
where table_name='ZIGGY';

TABLE_NAME   STATUS   PAR
------------ -------- ---
ZIGGY           VALID YES

SQL> select index_name, status, partitioned from dba_indexes
where table_name = 'ZIGGY';

INDEX_NAME           STATUS   PAR
-------------------- -------- ---
ZIGGY_CODE_I              N/A YES
ZIGGY_ID_I                N/A YES
ZIGGY_DATE_CREATED_I      N/A YES

SQL> select index_name, partitioning_type, partition_count, locality
from dba_part_indexes where table_name='ZIGGY';

INDEX_NAME           PARTITION PARTITION_COUNT LOCALI
-------------------- --------- --------------- ------
ZIGGY_CODE_I             RANGE               3 LOCAL
ZIGGY_ID_I               RANGE               1 GLOBAL
ZIGGY_DATE_CREATED_I     RANGE               3 LOCAL

If we look at the Clustering Factor of the important CODE column index, we see that all partitions have an excellent Clustering Factor as all partitions have just been created.


SQL> select partition_name, num_rows, clustering_factor from dba_ind_partitions
where index_name='ZIGGY_CODE_I';

PARTITION_NAME         NUM_ROWS CLUSTERING_FACTOR
-------------------- ---------- -----------------
P1                       490000              2275
P2                       730000              3388
P3                       780000              3620

However, if we now add new rows to the table as would occur with a real application, the data from the “current” partition results in the Clustering Factor “eroding” over time for this partition.


SQL> insert into ziggy select 2000000+rownum, mod(rownum,100), sysdate, 'DAVID BOWIE'
from dual connect by level <= 500000; 500000 rows created. SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_index_stats(ownname=>null,indname=>'ZIGGY_CODE_I');

PL/SQL procedure successfully completed.

SQL> select partition_name, num_rows, clustering_factor from dba_ind_partitions
     where index_name='ZIGGY_CODE_I';

PARTITION_NAME         NUM_ROWS CLUSTERING_FACTOR
-------------------- ---------- -----------------
P1                       490000              2275
P2                       730000              3388
P3                      1280000            238505

As discussed previously, the Clustering Attribute has no effect with standard DML operations. Therefore, the efficiency of the CODE index reduces over time in the partition where new data is being introduced. The Clustering Factor has now substantially increased from 3620 to 238505. Note for all the other partitions where there are no modifications to the data, the Clustering Factor remains excellent.

Having the table/index partitioned means we can therefore periodically reorg just the problematic partition:


SQL> alter table ziggy move partition p3 update indexes online;

Table altered.

SQL> select partition_name, num_rows, clustering_factor from dba_ind_partitions
     where index_name='ZIGGY_CODE_I';

PARTITION_NAME         NUM_ROWS CLUSTERING_FACTOR
-------------------- ---------- -----------------
P1                       490000              2275
P2                       730000              3388
P3                      1280000              5978

The Clustering Factor for this partition has now reduced substantially from 238505 to just 5978.

For those of you with the Partitioning database option, the ability in 12.2 to now so easily convert a non-partitioned table to be partitioned, along with its associated indexes is just brilliant 🙂

12.1.0.2 Introduction to Zone Maps Part III (Little By Little) November 24, 2014

Posted by Richard Foote in 12c, Attribute Clustering, Oracle Indexes, Zone Maps.
1 comment so far

I’ve previously discussed the new Zone Map database feature and how they work in a similar manner to Exadata Storage indexes.

Just like Storage Indexes (and conventional indexes for that manner), they work best when the data is well clustered in relation to the Zone Map or index. By having the data in the table ordered in the same manner as the Zone Map, the ranges of the min/max values for each 8M “zone” in the table can be as narrow as possible, making them more likely to eliminate zone accesses.

On the other hand, if the data in the table is not well clustered, then the min/max ranges within the Zone Map can be extremely wide, making their effectiveness limited.

In my previous example on the ALBUM_ID column in my first article on this subject, the data was extremely well clustered and so the associated Zone Map was very effective. But what if the data is poorly clustered ?

To illustrate, I’m going to create a Zone Map based on the poorly clustered ARTIST_ID column, which has its values randomly distributed throughout the whole table:

SQL> create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id);
 create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id)
 *
 ERROR at line 1:
 ORA-31958: fact table "BOWIE"."BIG_BOWIE" already has a zonemap
 "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" on it

Another difference between an index and Zone Map is that there can only be the one Zone Map defined per table, but a Zone Map can include multiple columns. As I already have a Zone Map defined on just the ALBUM_ID column, I can’t just create another.

So I’ll drop the current Zone Map and create a new one based on both the ARTIST_ID and ALBUM_ID columns:

SQL> drop materialized zonemap big_bowie_album_id_zm;

Materialized zonemap dropped.

SQL> create materialized zonemap big_bowie_zm on big_bowie(album_id, artist_id);

Materialized zonemap created.
    
 SQL> select measure, position_in_select, agg_function, agg_column_name
 from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ZM';

MEASURE              POSITION_IN_SELECT AGG_FUNCTION  AGG_COLUMN_NAME
 -------------------- ------------------ ------------- --------------------
 "BOWIE"."BIG_BOWIE".                  5 MAX           MAX_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  4 MIN           MIN_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  3 MAX           MAX_1_ALBUM_ID
 "ALBUM_ID"

"BOWIE"."BIG_BOWIE".                  2 MIN           MIN_1_ALBUM_ID
 "ALBUM_ID"

So this new Zone Map has min/max details on each zone in the table for both the ARTIST_ID and ALBUM_ID columns.

The min/max ranges of a Zone Map provides an excellent visual representation of the clustering of the data. If I select Zone Map details of the ALBUM_ID column (see partial listing below):

SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$
 ---------- -------------- -------------- ----------
 3.8586E+11              1              2      66234
 3.8586E+11              5              6      56715
 3.8586E+11              7              7      76562
 3.8586E+11              7              8      76632
 3.8586E+11              8              9      76633
 3.8586E+11             21             22      75615
 3.8586E+11             29             29      75582
 3.8586E+11             31             32      75545
 3.8586E+11             35             36      75617
 3.8586E+11             43             44      75615
 ...

3.8586E+11             76             77      75615
 3.8586E+11             79             80      75615
 3.8586E+11             86             87      75616
 3.8586E+11             88             89      75618
 3.8586E+11             97             97      75771
 3.8586E+11            100            100      15871

134 rows selected.

As the data in the table is effectively ordered based on the ALBUM_ID column (and so is extremely well clustered in relation to this column), the min/max ranges for each zone is extremely narrow. Each zone basically only contains one or two different values of ALBUM_ID and so if I’m just after a specific ALBUM_ID value, the Zone Map is very effective in eliminating zones from having to be accessed. Just what we want.

However, if we look at the Zone Map details of the poorly clustered ARTIST_ID column (again just a partial listing):

SQL> select zone_id$, min_2_artist_id, max_2_artist_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_2_ARTIST_ID MAX_2_ARTIST_ID ZONE_ROWS$
 ---------- --------------- --------------- ----------
 3.8586E+11            3661           98244      66234
 3.8586E+11               1          100000      56715
 3.8586E+11            5273           81834      76562
 3.8586E+11               1          100000      76632
 3.8586E+11               1          100000      76633
 3.8586E+11               1          100000      75615
 3.8586E+11            2383           77964      75582
 3.8586E+11               1          100000      75545
 3.8586E+11               1          100000      75617
 3.8586E+11               1          100000      75615
 ...

3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75616
 3.8586E+11               1          100000      75618
 3.8586E+11            4848           80618      75771
 3.8586E+11           84130          100000      15871

134 rows selected.

We notice the ranges for most of the zones is extremely large, with many actually having a min value of 1 (the actual minimum) and a max of 100000 (the actual maximum). This is a worst case scenario as a specific required value could potentially reside in most of the zones, thereby  forcing Oracle to visit most zones and making the Zone Map totally ineffective.

If we run a query searching for a specific ARTIST_ID:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.69

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
 | Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 ----------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
 |*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ARTIST_ID"=42)
 filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
 BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ARTIST_ID" > :1 OR
 zm."MAX_2_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ZM" zm WHERE
 zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
 "ARTIST_ID"=42)

      
 Statistics
 ----------------------------------------------------------
 141  recursive calls
 0  db block gets
 101614  consistent gets
 0  physical reads
 0  redo size
 5190  bytes sent via SQL*Net to client
 618  bytes received via SQL*Net from client
 8  SQL*Net roundtrips to/from client
 0  sorts (memory)
 0  sorts (disk)
 100  rows processed

We notice we are forced to perform a very high number of consistent gets (101,614) when returning just 100 rows, much higher than the 2,364 consistent gets required to return a full 100,000 rows for a specific ALBUM_ID and not far from the 135,085 consistent gets when performing a full table scan.

We need to improve the performance of these queries based on the ARTIST_ID column …

Let’s drop this zone map:

SQL> drop materialized zonemap big_bowie_zm;

Materialized zonemap dropped.

and change the physical clustering of the data in the table so that the data is primarily now clustered in ARTIST_ID order:

 

SQL> alter table big_bowie add clustering by linear order(artist_id, album_id) with materialized zonemap;

Table altered.

So we have added a clustering attribute to this table (previously discussed here) and based a new Zone Map on this clustering at the same time.

SQL> select zonemap_name from dba_zonemaps where fact_table='BIG_BOWIE';

ZONEMAP_NAME
---------------
ZMAP$_BIG_BOWIE

SQL> select zonemap_name, pruning, with_clustering, invalid, stale, unusable
from dba_zonemaps where zonemap_name = 'ZMAP$_BIG_BOWIE';

ZONEMAP_NAME    PRUNING  WITH_CLUSTERING INVALID STALE   UNUSABLE
--------------- -------- --------------- ------- ------- --------
ZMAP$_BIG_BOWIE ENABLED  YES             NO      NO      NO

However, as we haven’t actually reorganized the table, the rows in the table are still clustered the same as before:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.8586E+11             43             44      75615
3.8586E+11              1              2      66234
3.8586E+11             81             82      75615
3.8586E+11             29             29      75582
3.8586E+11             50             50      75481
3.8586E+11             90             91      75484
3.8586E+11              5              6      56715
3.8586E+11              7              8      76632
3.8586E+11              8              9      76633
3.8586E+11             16             16      75481
...

3.8586E+11             44             44      75480
3.8586E+11             82             83      75616
3.8586E+11            100            100      15871
3.8586E+11             34             35      75576
3.8586E+11             14             15      75615
3.8586E+11             33             34      75616
3.8586E+11              3              5      75707

134 rows selected.

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.8586E+11               1          100000      75545
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75617
3.8586E+11               1          100000      75911
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11             132           75743      75612
3.8586E+11               1          100000      75615
...

3.8586E+11               1          100000      66296
3.8586E+11               1          100000      75615
3.8586E+11            2360           96960      75701
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11           23432           98911      75480
3.8586E+11               1          100000      75791
3.8586E+11           21104           96583      75480

134 rows selected.

But if we now reorganise the table so that the clustering attribute can take effect:

SQL> alter table big_bowie move;

Table altered.

We notice the characteristics of the Zone Map has change dramatically. The previously well clustered ALBUM_ID now has a totally ineffective Zone Map with all the ranges effectively consisting of the full min/max values:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.9704E+11              1            142      21185
3.9704E+11              1            100       9452
3.9704E+11              1            100      76516
3.9704E+11              1            100      75501
3.9704E+11              1            100      75497
3.9704E+11              1            100      75501
3.9704E+11              1            100      75499
3.9704E+11              1            100      75504
3.9704E+11              1            100      75500
3.9704E+11              1            100      75501
...

3.9704E+11              1            100      75503
3.9704E+11              1            100      75498
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75794

144 rows selected.

While the previously ineffective Zone Map on the ARTIST_ID column is now much more effective with significantly smaller min/max ranges for each zone:

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.9704E+11              67            1036      21185
3.9704E+11            2359            2453       9452
3.9704E+11            8341            9106      76516
3.9704E+11           18933           19688      75501
3.9704E+11           22708           23463      75497
3.9704E+11           26483           27238      75501
3.9704E+11           27238           27993      75499
3.9704E+11           33278           34033      75504
3.9704E+11           36674           40449      75500
3.9704E+11           38563           39318      75501
...

3.9704E+11           49888           50643      75503
3.9704E+11           62723           63478      75498
3.9704E+11           77824           78579      75501
3.9704E+11           82354           83109      75501
3.9704E+11           88394           89149      75501
3.9704E+11           93679           94434      75501
3.9704E+11           98211           98969      75794

144 rows selected.

The same query now runs so much faster as the Zone Map can eliminate almost all zones from being accessed:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ARTIST_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ARTIST_ID" > :1 OR
zm."MAX_1_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
"ARTIST_ID"=42)
Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
175  consistent gets
0  physical reads
0  redo size
5190  bytes sent via SQL*Net to client
618  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100  rows processed

Consistent gets has reduced dramatically down to just 175 from the previously massive 101,614.

As is common with changing the clustering of data, what improves one thing makes something else significantly worse. The previously efficient accesses based on the ALBUM_ID column is now nowhere near as efficient as before:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:01.27

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ALBUM_ID" > :1 OR
zm."MAX_2_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)

Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
141568  consistent gets
0  physical reads
0  redo size
4399566  bytes sent via SQL*Net to client
73878  bytes received via SQL*Net from client
6668  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100000  rows processed

We now have to perform a whopping 141,568 consistent gets up from the previous 2,364 consistent gets.

So Zone Maps, like database indexes and Exadata Storage Indexes, can be extremely beneficial in reducing I/O but their effectiveness is very much dependant on the clustering of the underlining data.

12.1.0.2 Introduction to Attribute Clustering (The Division Bell) August 26, 2014

Posted by Richard Foote in 12c, Attribute Clustering, Clustering Factor, Oracle Indexes.
6 comments

One of the really cool new features introduced in 12.1.0.2 is Attribute Clustering. This new table based attribute allows you to very easily cluster data in close physical proximity based on the content of specific columns.

As I’ve discussed many times, indexes love table data that is physically clustered in a similar manner to the index as it can significantly improve the efficiency of such indexes. A low Clustering Factor (CF) makes an index more viable and is one of the more important considerations in CBO calculations.

But not only database indexes benefit from well cluster data. Other index structures such as Exadata Storage Indexes and the new Zone Maps (to be discussed in future articles) all benefit from well clustered data. Additionally, compression is likely to be much more effective with data that is well clustered and this in turns also impacts the efficiency of In-memory data (again, to be discussed in future articles).

So having the capability to now easily cluster data in regular heap tables has potentially many benefits.

To illustrate, I’m first going to create a table with data that is not well clustered at all. The CODE column has data that is basically evenly distributed throughout the whole table structure:

SQL> create table ziggy (id number, code number, name varchar2(30));

Table created.

SQL> insert into ziggy select rownum, mod(rownum,100), 'DAVID BOWIE' from dual connect by level >= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=>null,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

I’ll next create an index on this CODE column and check out its default CF:

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes
where index_name='ZIGGY_CODE_I';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY_CODE_I                    703133    2000000

For a table with 2 million rows, a CF of some 703,133 is very high and the index is going to be very inefficient when retrieving high numbers of rows.

Let’s run a query that returns a specific CODE value, approx. 1% of all the data (note I’ve set a large arraysize to minimize unnecessary fetches and resultant consistent  gets):

SQL> set arraysize 5000

SQL> select * from ziggy where code = 42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

-----------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       | 20000 |   390K|   383  (17)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| ZIGGY | 20000 |   390K|   383  (17)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
filter("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
15212  consistent gets
0  physical reads
0  redo size
211208  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

The CBO has chosen a Full Table Scan and has decided to not use the index. If we hint the SQL:

SQL> select /*+ index (ziggy, ziggy_code_i) */ * from ziggy where code = 42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3294205578

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              | 20000 |   390K|  7081   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        | 20000 |   390K|  7081   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
7081  consistent gets
41  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

At a cost of 7081, the index is way more expensive than the 383 cost for the FTS. The poor clustering of the CODE data within the table has made the index non viable.

Let’s now create another table, but this one with a clustering attribute set on the CODE column:

SQL> create table ziggy2 (id number, code number, name varchar2(30))

clustering by linear order (code) without materialized zonemap;

Table created.

The CLUSTERING BY LINEAR ORDER clause orders data in the table based on the specified columns, in this case the CODE column. Up to 10 columns can be included using this particular technique (there are other attribute clustering options which I’ll again cover in later articles, yes I’ll be writing quite a few new articles) 🙂 WITHOUT MATERIALIZED ZONEMAP means I don’t want to create these new Zone Maps index structures at this stage which could potentially reduce the amount of table storage needed to be accessed (again, I’ll discuss these at another time).

You must use a direct path insert to make use of attribute clustering (or reorganize the table as we’ll see).

So lets insert the exact same data into this new ZIGGY2 table via a straight direct path sub-select:

SQL> insert /*+ append */ into ziggy2 select * from ziggy;

2000000 rows created.

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  0arqdyc9vznpg, child number 0
-------------------------------------
insert /*+ append */ into ziggy2 select * from ziggy

Plan hash value: 1975011999

--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Rows  | Bytes |TempSpc| Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |       |       |       |       | 10596 (100)|          |
|   1 |  LOAD AS SELECT                  |       |       |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |       |  2000K|    38M|       | 10596   (3)| 00:00:01 |
|   3 |    SORT ORDER BY                 |       |  2000K|    38M|    61M| 10596   (3)| 00:00:01 |
|   4 |     TABLE ACCESS STORAGE FULL    | ZIGGY |  2000K|    38M|       |   376  (16)| 00:00:01 |
--------------------------------------------------------------------------------------------------

SQL> commit;

Commit complete.

Notice the SORT ORDER BY step in the insert execution plan. This implicitly sorts the incoming data in CODE order to satisfy the attribute clustering requirement.

If we create an index on this table and examine the CF:

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY2',
estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select index_name, clustering_factor, num_rows
from user_indexes where index_name='ZIGGY2_CODE_I';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY2_CODE_I                     7072    2000000

We notice the default CF is indeed significantly lower at just 7072 than the previous value of 703133.

If we now run the equivalent query as before on this table:

SQL> select * from ziggy2 where code=42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 16801974

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               | 20000 |   390K|   114   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY2        | 20000 |   390K|   114   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY2_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
121  consistent gets
41  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

We notice the CBO has now decided to use the index. This is due to the cost of the index based execution plan being just 114, significantly lower than the previous index cost of 7081 or the FTS at a cost of 383. Just as importantly, the resultant number of consistent gets has also significantly reduced to just 121, significantly less than the previous 7081 consistent gets when using the index. So the index is indeed much more efficient to use and the CBO costs for this is just about spot on. The end result is that performance has improved.

So how to now likewise improve the performance of the first table? Simple add the attribute clustering and reorganize the table:

SQL> alter table ziggy add clustering by linear order(code) without materialized zonemap;

Table altered.

SQL> alter table ziggy move;

Table altered.

SQL> alter index ziggy_code_i rebuild;

Index altered.

SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name='ZIGGY_CODE_I';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
ZIGGY_CODE_I                 7134    2000000

So as expected, the CF has likewise reduced. So if we now run the query:

SQL> select * from ziggy where code=42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3294205578

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              | 20000 |   390K|   115   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        | 20000 |   390K|   115   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
121  consistent gets
0  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

The query likewise uses the index and with far less consistent gets and performance is significantly better.

So attribute clustering provides a nice mechanism by which data in a heap table (or importantly within a partition or sub-partition) can be physically clustered in a manner that can be potentially beneficial in various scenarios. Of course, the decision on how to actually cluster the data and on which columns is somewhat crucial 🙂

12.1.0.2 Released With Cool Indexing Features (Short Memory) July 25, 2014

Posted by Richard Foote in 12c, Advanced Index Compression, Attribute Clustering, Database In-Memory, Zone Maps.
2 comments

Oracle Database 12.1.0.2 has finally been released and it has a number of really exciting goodies from an indexing perspective which include:

  • Database In-Memory Option, which enables specific portions of the database to be in dual format, in both the existing row based format and additionally into an efficient memory only columnar based format. This in turn enables analytical based processing to access the real-time data in the In-Memory Store extremely fast, potentially faster and more effectively than via standard analytical based database indexes.
  • Advanced Index Compression, which allows Oracle to automatically choose the appropriate compression method for each individual leaf block, rather than having to manually select a single compression method across the whole index. This makes compressing an index a breeze and much more effective than previously possible.
  • Zone Maps, which enables Storage Index like capabilities to be manually configured and physically implemented inside the database, to eliminate unnecessary accesses of table storage via much smaller objects than conventional database indexes.
  • Attribute Clustering, a new table attribute which enables much better clustering of table data and we all know how both compression and index structures love table data to be well clustered.

These are all topics I’ll be covering in the coming weeks so stay tuned 🙂